## Problem Statement:
The study of highly uniform handwriting and book typologies is a particularly interesting
case in manuscript studies (paleography and codicology). In such cases, examining basic
layout features, chiefly those related to the structure of the page and the use of
available space, can help distinguish between similar scribal hands. Within this
framework, the task is to define a set of layout features and build a pattern recognition
system that identifies the scribes who worked together to transcribe a single medieval
Latin text. The discriminative power of each feature also needs to be tested, to see
whether selecting a subset of features for each scribe, chosen specifically to distinguish
him from the others, improves the results. This per-scribe approach also allows a simple
reject option for unreliably classified samples, namely those that are assigned to no
scribe or to more than one scribe. The original experiments used a large database of
digital images from the so-called "Avila Bible", a giant Latin copy of the entire Bible
produced during the twelfth century between Italy and Spain, and showed that the
proposed method works. The pages of the Avila Bible were photographed, and the features
in this dataset were derived from those photographs. In total, 12 scribes contributed to
the Bible. Our goal is therefore to solve a classification problem: predict which scribe
wrote a given sample.
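
The reject option described above can be sketched in a few lines. This is only an illustration of the decision rule, not the original authors' implementation: it assumes one binary detector per scribe, and the helper name `assign_scribe` and the example votes are hypothetical.

```python
from typing import Dict, Optional


def assign_scribe(votes: Dict[str, bool]) -> Optional[str]:
    """Apply the reject rule to per-scribe detector outputs.

    `votes` maps a scribe name to True if that scribe's (hypothetical)
    binary detector accepted the sample. The sample is assigned only when
    exactly one detector accepts it; otherwise it is rejected (None).
    """
    accepted = [name for name, fired in votes.items() if fired]
    return accepted[0] if len(accepted) == 1 else None


# Exactly one detector fires: the sample is assigned to that scribe.
print(assign_scribe({"Marcus": True, "Clarius": False, "Philippus": False}))
# No detector fires: the sample is rejected.
print(assign_scribe({"Marcus": False, "Clarius": False, "Philippus": False}))
# Two detectors fire: the sample is ambiguous and rejected.
print(assign_scribe({"Marcus": True, "Clarius": True, "Philippus": False}))
```

Returning `None` for rejected samples keeps them separate from the confident predictions, so they can be counted or inspected on their own.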

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
sns.set(rc={"figure.figsize":(15,6)})
pd.set_option("display.max_columns", None)

In [2]:
data = pd.read_csv("reliable author identification in avila bible.csv")

In [3]:
data

Unnamed: 0,Intercolumnar_distance,upper_margin,lower_margin,exploitation,row_number,modular_ratio,inter_linear_spacing,weight,peak_number,modular_ratio/inter_linear_spacing,class
0,0.241386,0.109171,-0.127126,0.380626,0.172340,0.314889,0.484429,0.316412,0.188810,0.134922,Marcus
1,0.303106,0.352558,0.082701,0.703981,0.261718,-0.391033,0.408929,1.045014,0.282354,-0.448209,Clarius
2,-0.116585,0.281897,0.175168,-0.152490,0.261718,-0.889332,0.371178,-0.024328,0.905984,-0.877830,Philippus
3,-0.326430,-0.652394,0.384996,-1.694222,-0.185173,-1.138481,-0.232828,-1.747116,-1.183175,-0.807380,Philippus
4,-0.437525,-0.471816,0.463236,-0.545248,0.261718,-0.972381,0.824183,-3.108388,-2.991700,-1.141030,Philippus
...,...,...,...,...,...,...,...,...,...,...,...
12012,0.093260,-0.087108,-2.268081,-0.164963,0.261718,0.148790,0.333428,0.587587,0.219991,0.072596,Marcus
12013,-0.215336,0.101320,0.235627,-0.280585,0.261718,-1.719828,-0.308329,1.008086,-0.154186,-1.302496,Philippus
12014,0.031541,0.297600,-3.210528,-0.583590,-0.721442,-0.224934,0.333428,0.664239,0.687713,-0.224659,Marcus
12015,0.266074,0.580242,0.114709,-0.165469,0.261718,0.024215,0.446679,0.428536,0.375899,-0.103698,Marcus
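
The `Unnamed: 0` column in the output above looks like a row index that was saved along with the CSV rather than a layout feature. A minimal sketch, assuming that is the case, of how `index_col=0` keeps it out of the feature columns; the two-row inline CSV is a stand-in built from the first rows shown above, with only two of the feature columns.

```python
import io

import pandas as pd

# Stand-in for the real file: an empty first header field, as written by
# DataFrame.to_csv() when the index is saved (assumption about the source file).
sample_csv = (
    ",Intercolumnar_distance,upper_margin,class\n"
    "0,0.241386,0.109171,Marcus\n"
    "1,0.303106,0.352558,Clarius\n"
)

# Default read: the saved index shows up as an extra "Unnamed: 0" column.
raw = pd.read_csv(io.StringIO(sample_csv))

# Reading with index_col=0 uses that column as the row index instead.
clean = pd.read_csv(io.StringIO(sample_csv), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```

The same effect could be had by dropping the column after loading; either way it should not be passed to a classifier as a feature.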


In [5]:
data["class"].unique()

array(['Marcus', 'Clarius', 'Philippus', 'Mongucus', 'Ubuntius',
       'Coronavirucus', 'Esequlius', 'Paithonius'], dtype=object)