# 1. Modifiers and objects: agreement

In [1]:
import json

import pandas as pd

from sklearn.metrics import cohen_kappa_score
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix

## Interannotator agreement

On the basis of the existing secondary literature (in history and linguistics), we manually defined two sets of commonly occurring objects and their premodifiers, each in a normalized spelling of the headword. Two annotators independently categorized each set. The modifiers were categorized as "descriptive" or "evaluative"; the objects were tagged along five classes, which reflect current debates about these artefacts and were chosen to give a reasonable distribution of the objects over the classes.

### Modifiers: agreement

The first independent categorization of the modifiers was provided by ADM:

In [2]:
mod_df = pd.read_excel('../annotations/mods_ADM.xlsx')
modifiers = sorted(set(mod_df[mod_df['modifier'] == 'MOD']['headword']))

The second independent categorization of the modifiers was provided by LF:

In [3]:
mod_df2 = pd.read_excel('../annotations/mods_LF.xlsx')
mod_df2

Unnamed: 0,headword,modifier,E/D,variants
0,adapted,MOD,D,adapted (90) | adanted (2) | -adapted (1) | ad...
1,admired,MOD,E,admired (43) | admited (6) | admitad (1) | dmi...
2,airy,MOD,E,airy (185)
3,ancient,MOD,E,ancient (41) | antient (22) | anclent (6) | an...
4,antique,MOD,E,antique (53) | antiqee (1) | pantique (1) | an...
...,...,...,...,...
277,worsted,MOD,D,worfted (72) | worfted- (15) | worftead (9) | ...
278,worsted-damask,MOD,D,worfted-damafk (79) | werfted-damafk (2) | wor...
279,writing,MOD,D,writing (127) | wrlting (15) | writing- (6) | ...
280,wrought,MOD,D,wrought (46) | wroughe (1)


The modifier list initially comprised 282 distinct items:

In [4]:
print(len(mod_df))
print(len(mod_df) == len(mod_df2))

282
True
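
Equal lengths alone do not guarantee that the two sheets list the headwords in the same order, and `cohen_kappa_score` compares labels position by position. A minimal sketch of a stricter check follows; the helper name `check_same_order` is hypothetical and not part of the original notebook, and the input below is a toy list.

```python
def check_same_order(headwords_a, headwords_b):
    """Raise if the two sheets do not list identical headwords in identical order."""
    a, b = list(headwords_a), list(headwords_b)
    if len(a) != len(b):
        raise ValueError(f'length mismatch: {len(a)} vs {len(b)}')
    for pos, (x, y) in enumerate(zip(a, b)):
        if x != y:
            raise ValueError(f'row {pos}: {x!r} vs {y!r}')
    return True


# Toy input for illustration; in the notebook one would pass
# mod_df['headword'] and mod_df2['headword'] instead.
print(check_same_order(['adapted', 'admired', 'airy'],
                       ['adapted', 'admired', 'airy']))
```

If the check fails, the two frames would need to be aligned on `headword` before the labels are compared.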


Cohen's $\kappa$ for these class labels is:

In [5]:
print('kappa:', cohen_kappa_score(mod_df['E/D'], mod_df2['E/D']))

kappa: 0.8407293851863805


This $\kappa$ statistic is a scalar between -1 and 1: larger positive values indicate stronger agreement, while values close to zero (or negative scores) mean that the observed agreement might be due to chance. In this case, agreement can be considered at least "strong" [[ref](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900052/#:~:text=Cohen%20suggested%20the%20Kappa%20result,1.00%20as%20almost%20perfect%20agreement.)].

In [6]:
enc = LabelEncoder().fit(mod_df['E/D'])
A_int = list(enc.transform(mod_df['E/D']))
B_int = list(enc.transform(mod_df2['E/D']))
cm = confusion_matrix(A_int, B_int)
cf = pd.DataFrame(cm, columns=enc.classes_, index=enc.classes_)
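
Fitting the `LabelEncoder` on the first annotator's labels alone means that `transform` raises a `ValueError` if the second annotator used a label the first never did. A minimal alternative sketch builds the table from the union of both label sets; `confusion_table` is a hypothetical helper, and the toy labels below (including `'X'`) are invented for illustration.

```python
from collections import Counter


def confusion_table(labels_a, labels_b):
    """Cross-tabulate two parallel label sequences (rows: first annotator, columns: second)."""
    classes = sorted(set(labels_a) | set(labels_b))
    counts = Counter(zip(labels_a, labels_b))
    table = [[counts[(a, b)] for b in classes] for a in classes]
    return classes, table


# Toy labels; 'X' is used by the second annotator only.
first = ['D', 'D', 'E', 'E', 'D']
second = ['D', 'E', 'E', 'X', 'D']
classes, table = confusion_table(first, second)
print(classes)
for name, row in zip(classes, table):
    print(name, row)
```

The diagonal holds the agreements; a row or column of zeros signals a label that only one annotator used.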

We write the confusion matrix `cf` to LaTeX for the paper:

In [7]:
print(cf.to_latex())

\begin{tabular}{lrr}
\toprule
{} &    D &   E \\
\midrule
D &  167 &   4 \\
E &   17 &  94 \\
\bottomrule
\end{tabular}
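
The $\kappa$ reported above can be recomputed by hand from this confusion matrix as $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance given the row and column totals. The sketch below hard-codes the four counts from the table; the variable names are illustrative.

```python
# Counts copied from the confusion matrix above (rows: ADM, columns: LF).
table = [[167, 4],
         [17, 94]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Observed agreement: share of items on the diagonal.
p_o = sum(table[i][i] for i in range(len(table))) / n
# Chance agreement: product of the marginal proportions, summed over classes.
p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 4))  # 0.8407
```

The result matches the value returned by `cohen_kappa_score`, which confirms that the matrix and the score describe the same 282 items.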



As is clear from the confusion matrix, there was relatively more disagreement regarding the evaluative class, which is unsurprising: 17 of the 21 disagreements concern items that ADM labelled evaluative and LF descriptive. During the adjudication phase, the annotators attempted to resolve instances of disagreement through qualitative discussion. This led to the following distribution of evaluative and descriptive items, with a considerable skew towards the "descriptive" modifiers:

In [8]:
mod_df = pd.read_excel('../annotations/mods_ADJ.xlsx')
mod_df['DEF'].value_counts()

D    182
E    100
Name: DEF, dtype: int64

In [9]:
for gr, grdf in mod_df.groupby('DEF'):
    print(rf'\item[{gr}] ' + ' - '.join(set(grdf['headword'])))  # raw f-string: '\i' is not a valid escape

\item[D] blue - rosewood - window - drawing - glass - ebony - lisbon - moreen - couch - brussels - persian - chintz - breakfast - cheney - damask - sleeping - crimson - chelsea - english - hollands - cabriole - carved - panned - four-stall - dresden - french - parlour - brass - spanish - double - indigo - upholftery - drinking - fowling - card - dressing - silk - sconces - mohair - japan - winged - pembroke - kitchen - brilliant - check - black - muslin - dimity - built - pewter - toned - inlaid - eight - musical - elbow - italian - copper - camblet - miscellancous - jamaica - wrought - four-post - household - writing - madeira - arched - diamond - wearing - general - bay - cotton - dining - satin - servants - gilt - oriental - bowed - feather - red - marseilles - wood - sundry - serges - singular - coloured - private - various - domestic - german - several - bronze - variety - copyhold - double-key'd - brick-built - grey - leasehold - mahogany - metal - flemish - golden- - harrateen -

These are the modifiers that we will work with below:

In [10]:
print('modifiers:', ' - '.join(modifiers[:15]) + ' [...]')

modifiers: adapted - admired - airy - ancient - antique - arable - arched - attached - bay - beautiful - billiard - black - blue - bordered - bowed [...]


In [11]:
print('modifiers:', ' - '.join(modifiers))

modifiers: adapted - admired - airy - ancient - antique - arable - arched - attached - bay - beautiful - billiard - black - blue - bordered - bowed - brass - breakfast - brick-built - brilliant - broad - bronze - brown - brussels - built - cabriole - calico - camblet - capital - card - carpeting - carved - celebrated - check - cheerful - chelsea - cheney - chestnut - chinese - chintz - circular - clean - clever - coach- - coloured - comfortable - commodious - common - compact - complete - condition - convenience - convenient - copper - copyhold - cornices - cotton - couch - crimson - culinary - curious - damask - desirable - detached - diamond - dimity - dining - domestic - double - double-key'd - draught - drawing - dresden - dressing - drinking - dutch - dwelling - easy - eating - ebony - eight - eight-day - elbow - elegant - eligible - eminent - enamelled - english - exceeding - exceedingly - excellent - exquisite - extensive - family - fancy - farming - fashionable - feather - fiel

### Objects: agreement

We applied the same procedure to the object classes, which are economic rather than linguistic categories. 139 distinct words were considered.

The second annotator for this task (BS) differed from the second annotator in the previous task, because this task required specific historical expertise.

In [12]:
obj_df = pd.read_excel('../annotations/objects_ADM.xlsx')
obj_df['category'] = obj_df['category'].str.lower()
obj_df

Unnamed: 0,headword,modifier,category,variants
0,apartment,OBJ,real estate,apartments (106) | aparsments (1) | apurtments...
1,assortment,OBJ,no,assortment (76) | affurtment (3) | assoriment ...
2,attic,OBJ,real estate,atties (45) | attie (11) | attles (6) | attics...
3,barrel,OBJ,appliances/utensils,barrel (88) | barrels (25) | bartel (4) | barr...
4,bath,OBJ,real estate,bath (280) | bath- (6) | bate (1) | baixe (1)
...,...,...,...,...
134,vault,OBJ,no,vaults (93) | aults (19) | vauits (5) | vault ...
135,villa,OBJ,real estate,villa (277) | vilia (3) | vhla (3) | villan (2...
136,wardrobe,OBJ,furniture,wardrubes (28) | wardrabes (25) | wardrube (19...
137,warehouse,OBJ,real estate,warehoufes (139) | warehoufe (123) | warehouse...


In [13]:
print(set(obj_df['category']))
print(len(set(obj_df['category'])))

{'animal accessories', 'accessories', 'real estate', 'appliances/utensils', 'instrument', 'clothing/fabrics', 'decoration', 'haberdashery', 'no', 'animal', 'furniture', 'tableware'}
12


In [14]:
obj_df2 = pd.read_excel('../annotations/objects_BS.xlsx')
obj_df2['category'] = obj_df2['category'].str.lower()

In [15]:
print('kappa:', cohen_kappa_score(obj_df['category'], obj_df2['category']))

kappa: 0.8182214472537053


In [16]:
enc = LabelEncoder().fit(obj_df['category'])
A_int = list(enc.transform(obj_df['category']))
B_int = list(enc.transform(obj_df2['category']))
cm = confusion_matrix(A_int, B_int)
cf = pd.DataFrame(cm, columns=enc.classes_, index=enc.classes_)

In [17]:
print(cf.to_latex())

\begin{tabular}{lrrrrrrrrrrrr}
\toprule
{} &  accessories &  animal &  animal accessories &  appliances/utensils &  clothing/fabrics &  decoration &  furniture &  haberdashery &  instrument &  no &  real estate &  tableware \\
\midrule
accessories         &            5 &       0 &                   0 &                    0 &                 0 &           0 &          0 &             0 &           0 &   0 &            0 &          0 \\
animal              &            0 &       2 &                   0 &                    0 &                 0 &           0 &          0 &             0 &           0 &   0 &            0 &          0 \\
animal accessories  &            0 &       0 &                   2 &                    0 &                 0 &           0 &          0 &             0 &           0 &   0 &            0 &          0 \\
appliances/utensils &            0 &       0 &                   0 &                    8 &                 0 &           0 &          0 &             0

In this case, $\kappa$ dropped slightly, but was still "substantial". Cases of disagreement were again resolved through adjudication.

The final (and relatively skewed) distribution of category labels over the remaining categories, after excluding the "no" label, looks as follows:

In [18]:
obj_df = pd.read_excel('../annotations/objects_ADJ.xlsx')
obj_df['DEF'] = obj_df['DEF'].str.lower()
obj_df = obj_df[obj_df['DEF'] != 'no']
obj_df['DEF'].value_counts()

real estate            46
furniture              25
decoration             18
clothing/fabric        10
appliances/utensils     9
tableware               6
accessories             5
instrument              4
animal/accessories      4
Name: DEF, dtype: int64

We write these groupings to LaTeX for inclusion in the appendix:

In [19]:
for gr, grdf in obj_df.groupby('DEF'):
    print(rf'\item[{gr}] ' + ' - '.join(set(grdf['headword'])))  # raw f-string: '\i' is not a valid escape

\item[accessories] trinket - locket - bracelet - earring - jewellery
\item[animal/accessories] pony - horse - saddle - harness
\item[appliances/utensils] mangle - hearth - utensil - butts - fire-arm - pistol - stove - chimney - barrel
\item[clothing/fabric] matress - shawl - clothes - sheet - hose - mercery - habderdashery - counterpane - drapery - handkerchief
\item[decoration] books - candelabra - pillar - carpet - vase - candlestick - frame - picture - lustre - screen - globe - shell - boxes - plant - cut-glass - chimney-glass - chandelier - lamp
\item[furniture] cabinet-work - bookcase - press - bedstead - chest - pantry - couch - closet - canopy - sideboard - desk - cellarets - bureau - sofa - wardrobe - drawers - secretaire - commode - dining-tables - library-case - chaise - settee - chair - furniture - cabinet
\item[instrument] harpsichord - pianoforte - piano - instrument
\item[real estate] chamber - dining-room - lots - cellar - apartment - orchard - coach-house - farm-house -

After excluding the "not applicable" category, the final objects that we will use in this study are therefore:

In [20]:
objects = sorted(set(obj_df[obj_df['modifier'] == 'OBJ']['headword']))
print('objects:', ' - '.join(objects[:15]) + ' [...]')

objects: apartment - attic - barrel - bath - bed-room - bedchamber - bedstead - bookcase - books - bottles - boxes - bracelet - brewhouse - buildings - bureau [...]


We persist the objects and modifiers to disk as JSON for reuse in the subsequent notebooks:

In [21]:
# Map each headword to its adjudicated label.
mod_lookup = dict(zip(mod_df['headword'], mod_df['DEF']))

with open('../annotations/modifiers.json', 'w') as f:
    json.dump(mod_lookup, f)

In [22]:
# Map each headword to its adjudicated category.
obj_lookup = dict(zip(obj_df['headword'], obj_df['DEF']))

with open('../annotations/objects.json', 'w') as f:
    json.dump(obj_lookup, f)
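
A subsequent notebook can read these lookups back with `json.load`. The sketch below is a self-contained round trip through a temporary file; the two entries are toy values, and the temporary path merely stands in for `../annotations/modifiers.json`.

```python
import json
import os
import tempfile

toy_lookup = {'blue': 'D', 'rosewood': 'D'}  # toy entries for illustration

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'modifiers.json')  # stand-in for the real path
    with open(path, 'w') as f:
        json.dump(toy_lookup, f)
    with open(path) as f:
        restored = json.load(f)

print(restored == toy_lookup)  # True
```

Because both keys and values are plain strings, the dictionary survives the round trip unchanged.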