# TF-IDF

We'll rely on term frequency times inverse document frequency ([TF-IDF](https://web.stanford.edu/~jurafsky/slp3/11.pdf)), a powerful weighting scheme built on the [bag-of-words model](https://web.stanford.edu/~jurafsky/slp3/B.pdf), to measure lexical similarity between documents while disregarding word order. Let's start by generating a similarity matrix for the separate constituent parts of _Stjórn_.
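
To make "disregarding word order" concrete, here is a minimal pure-Python sketch that is independent of the scikit-learn pipeline used in this notebook. The toy sentences and the helper `cosine` are invented for illustration and use raw counts rather than TF-IDF weights; the point is only that two texts containing the same words in a different order receive identical bag-of-words vectors.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# Hypothetical toy documents, not taken from the corpus:
doc_a = Counter("oc gud sagdi uerdi lios".split())
doc_b = Counter("lios uerdi sagdi gud oc".split())  # same words, reversed order
doc_c = Counter("oc konungr sagdi".split())         # only partial overlap

same_words = cosine(doc_a, doc_b)
partial = cosine(doc_a, doc_c)
print(round(same_words, 6), round(partial, 6))
```

The reordered pair is indistinguishable to the model, whereas the partially overlapping pair scores somewhere between 0 and 1.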

In [None]:
import os,glob,json
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [6]:
def normalize(target):
    # This dict limits orthographical variation beyond the rule sets
    # of stjorn-extract.ipynb and menota-extract.ipynb:
    matrix = {
        'j': 'i',
        'v': 'u',
        # Use of d for ð in AM 226 rather spoils the results,
        # so instead of normalizing ð to þ we will normalize
        # both to d (resulting in strange forms of course):
        #'ð': 'þ',
        'ð': 'd',
        'þ': 'd',
        'á': 'a',
        'ǽ': 'æ',
        'é': 'e',
        'í': 'i',
        'ó': 'o',
        'ú': 'u',
        'ý': 'y',
        'ǿ': 'ø',
        'k': 'c', # rather than vice versa, because of Latin (e.g. Lucifer)
        '[': '',
        ']': ''
        }
    for k,v in matrix.items():
        target = target.replace(k, v)
    return target

titles = ['prologue', 'introduction', 'gn', 'ex', 'lv', 'nm', 'dt', 'ios', 'idc', 'rt', '1sm', '2sm', '3rg', '4rg']
tokens = []
for title in titles:
    with open(f"nlp/{title}.txt") as raw:
        document = raw.read().replace('\n', ' ')
        tokens.extend(document.split())

work_indices = {
    'stjorn1': (650,124417),
    'stjorn2': (124417,147678),
    'stjorn3': (147678,156943,160719),
    'stjorn4': (156943,160719)
}

stjorn = dict()
for _work, _range in work_indices.items():
    if len(_range) == 2:
        stjorn[_work] = normalize(' '.join(tokens[_range[0]:_range[1]]))
    else:
        stjorn[_work] = normalize(' '.join(tokens[_range[0]:_range[1]] + tokens[_range[2]:]))

menota = dict()
for text in glob.glob('../menota/dipl/*txt'):
    ref = os.path.basename(text).replace('.txt', '')
    with open(text) as doc:
        # We'll subject Menota to the same normalization standard as Stjórn:
        menota[ref] = normalize(doc.read().replace('\n', ''))
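
As a quick check of what this normalization does to individual forms, the sketch below applies an excerpt of the same mapping to a few sample words. The sample words and the helper `normalize_sketch` are illustrative only, and `str.translate` is used here as a single-pass stand-in for the replacement loop above; the two agree for this excerpt because no replacement value is itself a key.

```python
# Excerpt of the mapping used in normalize(); sample words are illustrative.
mapping = {
    'j': 'i', 'v': 'u',
    'ð': 'd', 'þ': 'd',
    'á': 'a', 'é': 'e', 'í': 'i', 'ó': 'o', 'ú': 'u', 'ý': 'y',
    'k': 'c',
    '[': '', ']': '',
}
table = str.maketrans(mapping)

def normalize_sketch(text):
    """Apply the excerpted character mapping in a single pass."""
    return text.translate(table)

samples = ['þeirra', 'meðr', 'kyni', 'því[at]', 'jacob']
normalized = {word: normalize_sketch(word) for word in samples}
for word, form in normalized.items():
    print(f"{word} -> {form}")
```

Forms such as "deirra" and "medr" in the rankings further down are products of this conflation rather than manuscript spellings in their own right.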

Note the arguments passed to the vectorizer class below. `min_df` sets the minimum number of documents in which a term has to appear in order to be included in the model. Since terms exclusive to single compositions are among the things that interest us, we'll leave this at `1`. `max_df` sets the relative document frequency above which a term is ignored: a value of `0.8` ignores words occurring in over 80 percent of documents. Changing this value strongly affects the document similarity scores downstream, and drastic changes affect the similarity rankings as well. A strict threshold is at any rate required to gain an insight into the relevance of individual terms: leaving `max_df` at its default of `1.0` would lead the model to conclude that "oc" is the most meaningful term in many of our documents, while "æigi" remains the top-ranking term for _Stjórn III_ as well as the _Norwegian Homily Book_ even at a value as low as `0.4`. We can therefore leave the threshold high in the _Stjórn_-internal comparison, but may want to set a lower threshold for the larger corpus.
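
The effect of `max_df` on a small corpus can be sketched without scikit-learn. The snippet below is an illustrative re-implementation, not the library's code: it assumes the rule as scikit-learn documents it, namely that a float `max_df` is a proportion of documents and that terms whose document frequency is strictly higher are ignored. The toy documents and the helper `kept_terms` are invented.

```python
def kept_terms(docs, max_df):
    """Return the terms whose share of documents does not exceed max_df."""
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc.split()):  # count each term once per document
            df[term] = df.get(term, 0) + 1
    return {term for term, n in df.items() if n <= max_df * n_docs}

# Four hypothetical documents: "oc" occurs in 4/4, "gud" in 3/4, "medr" in 1/4.
docs = [
    "oc gud medr",
    "oc gud",
    "oc gud",
    "oc",
]
kept_08 = kept_terms(docs, 0.8)  # cutoff: 0.8 * 4 = 3.2 documents
kept_05 = kept_terms(docs, 0.5)  # cutoff: 0.5 * 4 = 2.0 documents
print(sorted(kept_08), sorted(kept_05))
```

With only four documents, as in the _Stjórn_-internal comparison, a threshold of `0.8` under this rule can remove nothing but the terms present in all four, since three out of four is still only 75 percent.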

In [7]:
vectorizer = TfidfVectorizer(min_df=1, max_df=0.8)
model = vectorizer.fit_transform(stjorn.values())
df = pd.DataFrame(cosine_similarity(model), stjorn.keys(), stjorn.keys())
df

| | stjorn1 | stjorn2 | stjorn3 | stjorn4 |
|---|---|---|---|---|
| stjorn1 | 1.0 | 0.05932 | 0.682042 | 0.425337 |
| stjorn2 | 0.05932 | 1.0 | 0.083006 | 0.025865 |
| stjorn3 | 0.682042 | 0.083006 | 1.0 | 0.479767 |
| stjorn4 | 0.425337 | 0.025865 | 0.479767 | 1.0 |


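After eliminating such variation as vowel length marks and the þ/ð distinction, _Stjórn I_, _III_, and _IV_ come out moderately to strongly similar to one another in this matrix (0.43 to 0.68), while _Stjórn II_ stays below 0.1 against all three, with its lowest score (0.03) against _Stjórn IV_.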

Now let's first add _Konungs skuggsjá_ from Menota, as well as Unger's own edition of the _Norwegian Homily Book_. Fingers crossed that we have got the normalization standard of the former to approach Unger's methods reasonably well.

In [8]:
# We want only those parts of Unger's NHB matched in Menota:
nhb_titles = ['alcuin', 'hom', 'olafr', 'visio', 'paternoster', 'anhang1']
nhb = ''
for title in nhb_titles:
    filepath = f'../nhb/nlp/{title}.txt'
    with open(filepath) as doc:
        nhb = nhb + normalize(doc.read().replace('\n', ''))
stjorn_plus = list(stjorn.values())
stjorn_plus.extend([menota['nks235g_konungs_skuggsja'], nhb])
model = vectorizer.fit_transform(stjorn_plus)
# Row and column labels: the four parts of Stjórn, then the two additions
labels = list(stjorn.keys()) + ['ks', 'nhb']
df = pd.DataFrame(cosine_similarity(model), labels, labels)
df

| | stjorn1 | stjorn2 | stjorn3 | stjorn4 | ks | nhb |
|---|---|---|---|---|---|---|
| stjorn1 | 1.0 | 0.245597 | 0.320679 | 0.343395 | 0.007072 | 0.048417 |
| stjorn2 | 0.245597 | 1.0 | 0.229203 | 0.301802 | 0.084316 | 0.034099 |
| stjorn3 | 0.320679 | 0.229203 | 1.0 | 0.332619 | 0.014582 | 0.210626 |
| stjorn4 | 0.343395 | 0.301802 | 0.332619 | 1.0 | 0.001074 | 0.072545 |
| ks | 0.007072 | 0.084316 | 0.014582 | 0.001074 | 1.0 | 0.055996 |
| nhb | 0.048417 | 0.034099 | 0.210626 | 0.072545 | 0.055996 | 1.0 |


_Stjórn III_ and _Konungs skuggsjá_ share cognate material in the vernacular, but either too little of it or with too little spelling agreement for the pair to stand out in this matrix (0.015). In fact, the _Norwegian Homily Book_ matches _Stjórn III_ more closely (0.21) than _Konungs skuggsjá_ does, which may be explained at least in part by the closer subject match for those parts of _Stjórn III_ not reflected in _Konungs skuggsjá_.

Next, let's model all of Menota along with _Stjórn_. For now we'll leave Unger's _Homily Book_ in alongside the Menota edition as a proof of method.

In [14]:
vectorizer = TfidfVectorizer(min_df=1, max_df=0.6)
corpus = []
titles = []
for k,v in stjorn.items():
    titles.append(k)
    corpus.append(v)
titles.append('nhb')
corpus.append(nhb)
for k,v in menota.items():
    titles.append(k)
    corpus.append(v)
model = vectorizer.fit_transform(corpus)
df = pd.DataFrame(cosine_similarity(model), titles, titles)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
df.sort_values(by=['stjorn3'], ascending=False)

Unnamed: 0,stjorn1,stjorn2,stjorn3,stjorn4,nhb,nraNorrFragm75_kross_saga,am132_egils_saga,am162btheta_njals_saga,nraNorrFragm64_barlaams_saga,nraNorrFragm81A_benedikts_regla,am1056IX_konungs_skuggsja_fragment,am78_kristinrettir,am63_heimskringla3,dg4-7_strengleikar,am132_droplaugasona_saga,am132_kormaks_saga,nraNorrFragm72x76_dialogar,nraNorrFragm53_haralds_saga_hardrada,am132_finnboga_saga,nraNorrFragm70_agotu_saga,nraNorrFragm62_karlamagnuss_saga,nraNorrFragm60A_stjorn,am132_fostbraedra_saga,lbsFragm82_olafs_saga_helga,nraNorrFragm58B_konungs_skuggsja,nraNorrFragm60C_stjorn,holmPerg30_landslog,am619_norwegian_homily_book,nraNorrFragm57_jons_saga_helga,nraNorrFragm69_nikulass_saga,am56_landslog,wolfAug9-10_egils_saga,nraNorrFragm66_thomass_saga,holmPerg17_thomass_saga,am383I_thorlaks_saga,holmPerg4_thidreks_saga,am132_njals_saga,am36_heimskringla2,am544_voluspa,am162bkappa_njals_saga,am305_landslog,nraNorrFragm58C_konungs_skuggsja,am132_olkofra_thattr,konungs_skuggsja_am243ba,nraNorrFragm54_sverris_saga,nraNorrFragm55B_hakonar_saga,nraNorrFragm79_mariu_saga,gks2365_voluspa,am243balpha_konungs_skuggsja,nraNorrFragm51_fagrskinna,am132_viga-glums_saga,am279a_gragas,am677_gregory,am132_laxdoela_saga,am302_landslog,am178_thidreks_saga,nraNorrFragm81B_benedikts_regla,am132_bandamanna_saga,nraNorrFragm71_gregors_saga_pafa,am655_laeknisbok,am519a_alexanders_saga,holmPerg34_landslog,am162balpha_njals_saga,am113_islendingabok,nraNorrFragm7_landslog,nraNorrFragm67_thomass_saga,nraNorrFragm56_thorgils_saga,nks235g_konungs_skuggsja,am132_hallfredar_saga,am35_heimskringla1,am242_codex_wormianus,nraNorrFragm78_mariu_saga,konungs_skuggsja_fragment_am1056xi,dg8II_olafs_saga,nraNorrFragm80_pals_saga,nraNorrFragm63_karlamagnuss_saga,nraNorrFragm77_dialogar,am28_codex_runicus,holmPerg34_boejarlog,dg8I_landslog,nraNorrFragm60B_stjorn,nraNorrFragm55A_hakonar_saga,skbA120_marys_complaint,nraNorrFragm59_rimbegla,nraNorrFragm65_floress_saga,nraNorrFragm52_olafs_saga_helga_hin_elzta,holmPerg6_barlaams_saga,nraNorrFragm68_brendanuss_saga,nraNorrFragm61_karlamagnuss_saga,nraNorrFragm58A_konungs_skuggsja
stjorn3,0.312301,0.274925,1.0,0.335113,0.255194,0.029514,0.233396,0.096043,0.110605,0.035001,0.017462,0.110909,0.111568,0.220797,0.131979,0.077022,0.056118,0.111631,0.092018,0.030573,0.194054,0.04732,0.130464,0.197158,0.173695,0.233327,0.091197,0.250506,0.125361,0.069946,0.112693,0.270733,0.137239,0.179653,0.04831,0.284971,0.173084,0.1272,0.068341,0.059824,0.042693,0.039952,0.103513,0.164425,0.180277,0.100596,0.11216,0.038482,0.164425,0.039807,0.13465,0.04388,0.067121,0.184134,0.044956,0.030339,0.109532,0.097506,0.077531,0.027792,0.172252,0.059283,0.056386,0.026999,0.141413,0.180012,0.094936,0.065172,0.135512,0.123365,0.237395,0.109988,0.017462,0.169611,0.107091,0.176795,0.112791,0.028241,0.0413,0.2006,0.158172,0.189536,0.010052,0.032681,0.032716,0.125467,0.141543,0.078644,0.126338,0.077348
stjorn4,0.334208,0.32063,0.335113,1.0,0.094347,0.012554,0.122759,0.045897,0.035702,0.015217,0.011482,0.04099,0.052128,0.080421,0.089178,0.045586,0.023553,0.034345,0.053268,0.011128,0.097666,0.050217,0.084528,0.053657,0.043357,0.051158,0.033748,0.089026,0.050085,0.037556,0.041142,0.139042,0.034169,0.103573,0.02196,0.088559,0.117359,0.057418,0.037778,0.033918,0.016733,0.015181,0.074796,0.054163,0.100377,0.027084,0.028593,0.019015,0.054163,0.015462,0.085139,0.013238,0.026303,0.102419,0.016362,0.012874,0.032252,0.055506,0.031931,0.009776,0.050662,0.034566,0.025938,0.011914,0.036866,0.064411,0.031581,0.022767,0.075785,0.058358,0.093268,0.040878,0.011482,0.054284,0.054259,0.047925,0.05048,0.005588,0.022548,0.04884,0.016721,0.078266,0.002868,0.038535,0.012178,0.03237,0.059039,0.039102,0.036768,0.027549
stjorn1,1.0,0.312637,0.312301,0.334208,0.093017,0.01981,0.103857,0.06035,0.043895,0.020408,0.039041,0.027957,0.075876,0.102881,0.067388,0.042794,0.045419,0.036157,0.057299,0.023214,0.11084,0.190631,0.070986,0.04389,0.099508,0.071635,0.024864,0.088199,0.073693,0.066345,0.02867,0.122039,0.134836,0.366871,0.038182,0.095045,0.089132,0.081529,0.073644,0.047524,0.024749,0.033503,0.049808,0.052551,0.114779,0.039044,0.074494,0.03446,0.052551,0.030421,0.079936,0.016734,0.043987,0.102553,0.02253,0.016249,0.021166,0.057351,0.044073,0.037017,0.079276,0.068363,0.047178,0.018276,0.032478,0.11689,0.060996,0.04901,0.051809,0.081893,0.160517,0.076868,0.039041,0.038471,0.100841,0.077518,0.072364,0.023305,0.047021,0.035658,0.03746,0.065061,0.010688,0.066791,0.028202,0.047072,0.135757,0.047604,0.040471,0.062369
holmPerg4_thidreks_saga,0.095045,0.139487,0.284971,0.088559,0.197059,0.019265,0.182649,0.092433,0.090676,0.047065,0.027479,0.252822,0.210678,0.271562,0.063317,0.066179,0.049743,0.130257,0.067086,0.052702,0.181085,0.010751,0.091162,0.243372,0.062936,0.129734,0.209304,0.198096,0.098651,0.046583,0.248106,0.220022,0.116727,0.143637,0.074716,1.0,0.130645,0.226257,0.033856,0.054146,0.182378,0.143865,0.06341,0.181354,0.180572,0.139117,0.048346,0.042513,0.181354,0.060524,0.09292,0.044352,0.049078,0.138778,0.195627,0.05931,0.120011,0.088216,0.066316,0.042971,0.224149,0.166656,0.044907,0.023152,0.18515,0.107659,0.065443,0.110028,0.099943,0.211991,0.186284,0.062239,0.027479,0.349478,0.115261,0.185953,0.069071,0.075771,0.130057,0.243792,0.049771,0.173111,0.034259,0.040715,0.081981,0.187243,0.246954,0.058534,0.124447,0.145922
stjorn2,0.312637,1.0,0.274925,0.32063,0.082524,0.014909,0.12303,0.043435,0.032473,0.014399,0.008515,0.055243,0.070826,0.099942,0.10351,0.058179,0.03439,0.040495,0.066319,0.030147,0.109625,0.015486,0.112869,0.042657,0.078489,0.076285,0.046096,0.10327,0.063455,0.053592,0.061896,0.118705,0.061348,0.025991,0.053335,0.139487,0.165271,0.058737,0.053077,0.044254,0.059664,0.084416,0.096416,0.052066,0.13729,0.062824,0.077793,0.023284,0.052066,0.034786,0.114005,0.008076,0.053146,0.143239,0.056657,0.057182,0.022167,0.089265,0.023131,0.019718,0.071302,0.198458,0.071057,0.012148,0.053904,0.086384,0.02711,0.068128,0.097457,0.062064,0.085028,0.056543,0.008515,0.082602,0.149032,0.070498,0.069567,0.006441,0.175979,0.086185,0.048282,0.056002,0.004098,0.061531,0.055528,0.062275,0.171602,0.036269,0.033177,0.079843
wolfAug9-10_egils_saga,0.122039,0.118705,0.270733,0.139042,0.109224,0.027983,0.701568,0.099415,0.054676,0.021769,0.006739,0.058911,0.124513,0.125335,0.167667,0.092361,0.041586,0.148046,0.09359,0.014689,0.137519,0.019006,0.134006,0.258869,0.04682,0.11368,0.048361,0.111209,0.117214,0.068801,0.058766,1.0,0.073352,0.055426,0.041264,0.220022,0.186654,0.145685,0.039599,0.076352,0.019125,0.040371,0.108674,0.071679,0.19021,0.097758,0.034152,0.029246,0.071679,0.012469,0.140851,0.050333,0.035813,0.227352,0.019924,0.024504,0.046554,0.150983,0.049988,0.018178,0.120363,0.078246,0.070999,0.037709,0.051984,0.09312,0.251877,0.023537,0.143487,0.158756,0.18772,0.058822,0.006739,0.103827,0.074624,0.146293,0.075748,0.006162,0.054462,0.065333,0.04898,0.238339,0.003064,0.030986,0.020352,0.148846,0.071361,0.061405,0.110376,0.047609
nhb,0.093017,0.082524,0.255194,0.094347,1.0,0.020726,0.040396,0.027983,0.176499,0.070031,0.033817,0.280182,0.094331,0.245621,0.031657,0.021995,0.076815,0.015287,0.024157,0.071144,0.109798,0.032455,0.033858,0.043926,0.092393,0.036567,0.221763,0.926713,0.055084,0.03737,0.276289,0.109224,0.078763,0.260345,0.0311,0.197059,0.051967,0.109936,0.024949,0.022993,0.170628,0.072849,0.022827,0.19351,0.043525,0.019779,0.079268,0.030362,0.19351,0.050067,0.036676,0.057587,0.077996,0.04617,0.141147,0.016925,0.258953,0.029471,0.104897,0.024407,0.084312,0.12279,0.016956,0.025005,0.215861,0.165641,0.016436,0.075179,0.026091,0.102765,0.167586,0.125212,0.033817,0.400428,0.092587,0.045056,0.067551,0.02563,0.07949,0.260152,0.012463,0.037185,0.014717,0.02208,0.061177,0.061082,0.22759,0.108806,0.040832,0.070541
am619_norwegian_homily_book,0.088199,0.10327,0.250506,0.089026,0.926713,0.019254,0.0634,0.026549,0.179516,0.064712,0.031699,0.258642,0.063915,0.230041,0.04208,0.026915,0.092836,0.020871,0.030977,0.080686,0.107889,0.031009,0.062786,0.055398,0.11687,0.068096,0.20196,1.0,0.054943,0.03541,0.249519,0.111209,0.076259,0.245499,0.02873,0.198096,0.071838,0.080366,0.023722,0.021894,0.152792,0.096479,0.037061,0.177609,0.058832,0.039782,0.084401,0.029177,0.177609,0.048555,0.047203,0.05343,0.104116,0.082242,0.125778,0.020955,0.241075,0.050991,0.105555,0.023138,0.131014,0.117941,0.019929,0.022588,0.196907,0.161446,0.019816,0.09464,0.036072,0.073057,0.17474,0.12158,0.031699,0.350464,0.128253,0.052294,0.084426,0.023446,0.077047,0.23896,0.012054,0.045459,0.013469,0.02076,0.064458,0.065724,0.207085,0.105575,0.045022,0.129069
am242_codex_wormianus,0.160517,0.085028,0.237395,0.093268,0.167586,0.013735,0.187044,0.081064,0.075037,0.050981,0.016844,0.093993,0.104237,0.135142,0.121074,0.136515,0.066993,0.069542,0.104305,0.023147,0.196674,0.032547,0.120639,0.124571,0.08164,0.074182,0.080509,0.17474,0.079345,0.065514,0.099082,0.18772,0.079875,0.114926,0.041703,0.186284,0.160044,0.113426,0.158543,0.069567,0.038048,0.063774,0.095125,0.110334,0.109559,0.048024,0.088257,0.090127,0.110334,0.020562,0.142361,0.059344,0.062066,0.188328,0.0401,0.046117,0.081021,0.125744,0.112916,0.035593,0.171667,0.060726,0.073297,0.064846,0.069763,0.085615,0.077957,0.032951,0.116214,0.11584,1.0,0.07254,0.016844,0.137282,0.101528,0.126373,0.120282,0.015957,0.038733,0.103582,0.053178,0.094614,0.005619,0.084096,0.032766,0.105928,0.08166,0.089276,0.071926,0.107129
am132_egils_saga,0.103857,0.12303,0.233396,0.122759,0.040396,0.024592,1.0,0.110579,0.022509,0.013282,0.008138,0.025102,0.125854,0.121128,0.239308,0.151345,0.044885,0.168421,0.161711,0.014122,0.098976,0.017027,0.237511,0.27703,0.048504,0.115886,0.018802,0.0634,0.117009,0.063498,0.022302,0.701568,0.05663,0.070418,0.044286,0.182649,0.295717,0.145488,0.046146,0.08189,0.020357,0.045683,0.174288,0.038313,0.205988,0.099661,0.055875,0.027021,0.038313,0.04721,0.238129,0.022452,0.042147,0.367731,0.018526,0.026097,0.018208,0.239743,0.037249,0.019722,0.180126,0.043332,0.084137,0.058749,0.023238,0.090098,0.30413,0.031259,0.189321,0.173233,0.187044,0.054089,0.008138,0.063843,0.078625,0.15298,0.097958,0.006226,0.046216,0.023684,0.055727,0.224533,0.002729,0.019334,0.020801,0.138635,0.075242,0.065135,0.104949,0.065544


The score for the two editions of the _Norwegian Homily Book_ may serve as our proof of method: with any `max_df` setting, and whether or not we normalize &lt;þ&gt; and &lt;ð&gt; to &lt;d&gt;, their similarity falls in the range 0.93 to 0.97. Since this pairs one of Unger's editions with a Menota transcription, just as our comparison of _Stjórn_ with the remainder of the Menota corpus does, we may be confident that the scores give a fair indication of lexical similarity, to the extent that the TF-IDF measure can provide one.

Let's take the constituent parts of _Stjórn_ one at a time and rank the Menota material by similarity. At this point, having demonstrated the validity of our method at least for editions of the same manuscript, we may remove Unger's _Homily Book_ from the data set:

In [10]:
nhb_index = titles.index('nhb')
corpus.pop(nhb_index)
titles.pop(nhb_index)
model = vectorizer.fit_transform(corpus)
df = pd.DataFrame(cosine_similarity(model), titles, titles)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

def rank(subject):
    sorted_df = df.sort_values([subject], ascending=False)
    print(sorted_df[subject][:10].to_string()) # Limiting output for brevity

In [15]:
for i in work_indices.keys():
    print(f"{i}:")
    print("--------------------------------------------")
    rank(i)
    print('')

stjorn1:
--------------------------------------------
stjorn1                        1.000000
holmPerg17_thomass_saga        0.366871
stjorn4                        0.334208
stjorn2                        0.312637
stjorn3                        0.312301
nraNorrFragm60A_stjorn         0.190631
am242_codex_wormianus          0.160517
holmPerg6_barlaams_saga        0.135757
nraNorrFragm66_thomass_saga    0.134836
wolfAug9-10_egils_saga         0.122039

stjorn2:
--------------------------------------------
stjorn2                     1.000000
stjorn4                     0.320630
stjorn1                     0.312637
stjorn3                     0.274925
holmPerg34_landslog         0.198458
holmPerg34_boejarlog        0.175979
holmPerg6_barlaams_saga     0.171602
am132_njals_saga            0.165271
nraNorrFragm80_pals_saga    0.149032
am132_laxdoela_saga         0.143239

stjorn3:
--------------------------------------------
stjorn3                        1.000000
stjorn4                   

These are complex results. The redaction of _Thómass saga erkibyskups_ transmitted in Holm. Perg. 17 was "uden Tvivl tilbleven i Norge" ("without doubt composed in Norway") as judged by its language and syntax, but probably later in the second half of the 13th century (Unger iii); "Oversættelsens Stil minder undertiden om Kongespeilet" ("the style of the translation is at times reminiscent of _Konungs skuggsjá_", ibid.). Its Norwegian origin alone does not explain an affinity with _Stjórn I_ as against _Stjórn III_ (the Norwegian ancestry of both is thought to lie at a remove of at least two generations), while the date of _Thómass saga_ would rather associate it with the latter. As for the similarity of style between _Thómass saga_ and _Konungs skuggsjá_ observed by Unger, he cannot have meant a matter of style reflected in the cosine similarity of their TF-IDF vectors within this corpus, as that is decidedly low:

In [21]:

df['holmPerg17_thomass_saga'].loc['nks235g_konungs_skuggsja']

0.045528920560418525

Since TF-IDF gives the greatest weight to forms that are frequent in one document but rare across the corpus, our next step is to investigate which forms are conspicuously associated with the constituent parts of _Stjórn_. The outcome of this query is affected even more strongly by our `max_df` setting, and this is the test that prompted the conflation of the characters &lt;þðd&gt; in preprocessing above, as Unger edited _Stjórn I_ from AM 226, which tends to use &lt;d&gt; for &lt;ð&gt;. This, as well as the other normalization strategies in this notebook and in those it depends on, should be kept in mind when consulting the rankings below of the most striking forms in each constituent part of _Stjórn_. It should also be remembered that function words with widely shared spellings have been eliminated by the `max_df` setting:

In [12]:
scores = pd.DataFrame(model.toarray(), titles, vectorizer.get_feature_names_out())
for i in work_indices.keys():
    print(f"{i}:")
    print("------------------------")
    print(scores.loc[i].sort_values(ascending=False)[:20])
    print('')

stjorn1:
------------------------
medr          0.547546
aa            0.371156
edr           0.248992
di            0.206674
diat          0.197833
eptir         0.146892
man           0.106545
ders          0.102619
scylld        0.101959
deirra        0.100019
tima          0.096744
uaru          0.093733
scolastica    0.091207
sealfs        0.084231
ioseph        0.083724
hefir         0.082703
taladi        0.078276
iacob         0.074836
sagdi         0.073801
eru           0.073116
Name: stjorn1, dtype: float64

stjorn2:
------------------------
aa          0.452333
dier        0.303516
moyses      0.263091
firir       0.230147
uit         0.225063
suo         0.206921
mællti      0.195620
scaltu      0.171302
aaron       0.166038
yfir        0.158607
drottinn    0.153031
cyni        0.118664
brutt       0.114661
balaam      0.107145
moysen      0.103471
munu        0.100776
deirra      0.092715
duiat       0.092427
drottins    0.084714
sculu       0.079317
Name: stjorn2, dtype:

_Stjórn I_ stood out for its form "medr" *before* the conflation of &lt;þðd&gt;, and this remains its most remarkable form afterwards, occurring as it does in 16 out of 85 Menota items. What is also striking is that the exact form "medr", so frequently found in AM 226, occurs three times in the Holm. Perg. 17 manuscript of _Thómass saga erkibyskups_, while the AM 132 text of _Laxdœla saga_ and the Norr. Fragm. 58B fragment of _Konungs skuggsjá_ have it once each. "Medr" is thus not only a form that sets (the AM 226 text of) _Stjórn I_ apart, it is also one of the forms associating it with _Thómass saga_.
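
Counts of this kind can be obtained with a simple tally over the tokenized documents. The following is a minimal sketch with made-up data: the dictionary `toy_corpus` and the helper `form_counts` are hypothetical stand-ins for the normalized `menota` dictionary built above, and tokens are split on whitespace only, which is simpler than the vectorizer's own tokenization.

```python
def form_counts(corpus, form):
    """Map each document title to the number of times `form` occurs in it,
    leaving out documents in which it does not occur at all."""
    counts = {}
    for title, text in corpus.items():
        n = text.split().count(form)
        if n:
            counts[title] = n
    return counts

# Hypothetical miniature corpus (titles and texts invented):
toy_corpus = {
    'doc_a': 'medr gudi oc medr monnum medr',
    'doc_b': 'med gudi oc monnum',
    'doc_c': 'hann for medr honum',
}
hits = form_counts(toy_corpus, 'medr')
print(f"{len(hits)} of {len(toy_corpus)} documents:", hits)
```

The number of keys in the result gives the document frequency of the form, and the values give its occurrences per document.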