# TF-IDF

We'll rely on term frequency times inverse document frequency ([TF-IDF](https://web.stanford.edu/~jurafsky/slp3/11.pdf)), a weighting scheme built on the [bag-of-words model](https://web.stanford.edu/~jurafsky/slp3/B.pdf), to measure lexical similarity between documents while disregarding word order. Let's start by generating a similarity matrix for the separate constituent parts of _Stjórn_.
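
For reference, the weighting can be written out as follows. This assumes scikit-learn's default `TfidfVectorizer` settings (smoothed IDF, L2 normalization), which the cells below do not override; $n$ is the number of documents, $\mathrm{tf}(t,d)$ the count of term $t$ in document $d$, and $\mathrm{df}(t)$ the number of documents containing $t$.

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w_{t,d} = \mathrm{tf}(t,d)\,\mathrm{idf}(t),
\qquad
\cos(d_1, d_2) = \frac{\sum_t w_{t,d_1}\, w_{t,d_2}}{\sqrt{\sum_t w_{t,d_1}^2}\,\sqrt{\sum_t w_{t,d_2}^2}}
$$

Under these defaults each document vector is scaled to unit length, so the cosine similarity reduces to a dot product and every document scores 1.0 against itself.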

In [None]:
import glob
import json
import os
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
def normalize(target):
    # This dict limits orthographical variation beyond the rule sets
    # of stjorn-extract.ipynb and menota-extract.ipynb:
    matrix = {
        'j': 'i',
        'v': 'u',
        # Experiment with *either* normalizing ð to þ,
        # or else both to d (because AM 226 often uses d for ð):
        'ð': 'þ',
        #'ð': 'd',
        #'þ': 'd',
        'á': 'a',
        'ǽ': 'æ',
        'ę': 'æ',
        'é': 'e',
        'í': 'i',
        'ó': 'o',
        'ú': 'u',
        'ý': 'y',
        'ǿ': 'ø',
        'k': 'c', # rather than vice versa, because of Latin (e.g. Lucifer)
        '[': '',
        ']': ''
        }
    for k,v in matrix.items():
        target = target.replace(k, v)
    return target

titles = ['prologue', 'introduction', 'gn', 'ex', 'lv', 'nm', 'dt', 'ios', 'idc', 'rt', '1sm', '2sm', '3rg', '4rg']
tokens = []
for title in titles:
    with open(f"nlp/{title}.txt", encoding="utf-8") as raw:  # assumes UTF-8 text files
        document = raw.read().replace('\n', ' ')
        tokens.extend(document.split())

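# Token offsets of each part within `tokens`. A two-value tuple is a single
# slice tokens[start:end]; a three-value tuple (start, end, resume) joins
# tokens[start:end] with tokens[resume:], so that stjorn3 skips the span
# assigned to stjorn4.
work_indices = {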
    'stjorn1': (650,124417),
    'stjorn2': (124417,147678),
    'stjorn3': (147678,156943,160719),
    'stjorn4': (156943,160719)
}

stjorn = dict()
for _work, _range in work_indices.items():
    if len(_range) == 2:
        stjorn[_work] = normalize(' '.join(tokens[_range[0]:_range[1]]))
    else:
        stjorn[_work] = normalize(' '.join(tokens[_range[0]:_range[1]] + tokens[_range[2]:]))

menota = dict()
for text in glob.glob('../menota/dipl/*.txt'):
    ref = os.path.basename(text).replace('.txt', '')
    with open(text, encoding='utf-8') as doc:  # assumes UTF-8 text files
        # We'll subject Menota to the same normalization standard as Stjórn:
        menota[ref] = normalize(doc.read().replace('\n', ''))
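
To illustrate what the normalization does to individual word forms, here is a minimal sketch applying the same active substitutions in a single pass with `str.translate`. This is a swapped-in technique rather than the chained `str.replace` calls above; the two agree here only because no replacement character is itself a key in the table. The name `normalize_sketch` and the sample words are chosen for illustration.

```python
# Same active rules as `matrix` above, written with escapes so that every
# key is guaranteed to be a single precomposed code point.
RULES = str.maketrans({
    'j': 'i',
    'v': 'u',
    '\u00f0': '\u00fe',  # ð -> þ
    '\u00e1': 'a',       # á
    '\u01fd': '\u00e6',  # ǽ -> æ
    '\u0119': '\u00e6',  # ę -> æ
    '\u00e9': 'e',       # é
    '\u00ed': 'i',       # í
    '\u00f3': 'o',       # ó
    '\u00fa': 'u',       # ú
    '\u00fd': 'y',       # ý
    '\u01ff': '\u00f8',  # ǿ -> ø
    'k': 'c',
    '[': None,           # editorial brackets are deleted
    ']': None,
})

def normalize_sketch(target):
    """Apply all substitutions in one pass over the string."""
    return target.translate(RULES)

for sample in ['v\u00edkingr', 'ma\u00f0r', 'konungr', '[oc]']:
    print(sample, '->', normalize_sketch(sample))
# víkingr -> uicingr
# maðr -> maþr
# konungr -> conungr
# [oc] -> oc
```

Note that the table only lists lowercase characters, so capitals pass through this step unchanged.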

Note the arguments passed to the vectorizer class below. `min_df` sets the minimum number of documents in which a term has to appear in order to be included in the model. Since terms exclusive to single compositions are among the things that interest us, we'll leave this at `1`. `max_df` sets the relative document frequency above which a term is ignored: a value of `0.8` ignores words occurring in over 80 percent of documents. Changing this value substantially changes the document similarity scores downstream, and drastic changes have a pronounced effect on the document similarity rankings as well. A strict threshold is at any rate required to gain an insight into the relevance of individual terms: leaving `max_df` at its default of `1.0` would lead the model to conclude that "oc" is the most meaningful term in many of our documents, while "æigi" remains the top-ranking term for _Stjórn III_ as well as the _Norwegian Homily Book_ even with a setting as low as `0.4`. We can therefore leave the threshold high in the _Stjórn_-internal comparison, but may want to set a lower one for the larger corpus.
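
The effect of the two cutoffs can be sketched in plain Python. The function below is a hand-rolled approximation, not scikit-learn's code: it assumes `min_df` is an absolute document count and `max_df` a proportion, tokenizes on whitespace only, and uses a hypothetical five-document toy corpus.

```python
from collections import Counter

def vocabulary_after_cutoffs(docs, min_df=1, max_df=0.8):
    """Return the terms whose document frequency survives both cutoffs."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document, however often it occurs there.
        doc_freq.update(set(doc.split()))
    max_count = max_df * n_docs  # proportion converted to a document count
    return {term for term, freq in doc_freq.items()
            if min_df <= freq <= max_count}

# Hypothetical toy corpus: "oc" occurs in 5 of 5 documents, "gud" in 4 of 5.
toy_docs = [
    'oc gud mællti',
    'oc gud for',
    'oc gud sagdi',
    'oc gud moyses',
    'oc dauid conungr',
]

strict = vocabulary_after_cutoffs(toy_docs, max_df=0.8)
lenient = vocabulary_after_cutoffs(toy_docs, max_df=1.0)
print('oc' in strict, 'gud' in strict, 'oc' in lenient)
# False True True
```

A term is dropped only when its document frequency exceeds the cutoff, so "gud" at exactly 80 percent is kept. By the same arithmetic, on a four-document corpus `max_df=0.8` corresponds to 3.2 documents and therefore removes only the terms present in all four.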

In [None]:
vectorizer = TfidfVectorizer(min_df=1, max_df=0.8)
model = vectorizer.fit_transform(stjorn.values())
df = pd.DataFrame(cosine_similarity(model), stjorn.keys(), stjorn.keys())
df

Unnamed: 0,stjorn1,stjorn2,stjorn3,stjorn4
stjorn1,1.0,0.063975,0.510216,0.405235
stjorn2,0.063975,1.0,0.054947,0.057555
stjorn3,0.510216,0.054947,1.0,0.334973
stjorn4,0.405235,0.057555,0.334973,1.0


After eliminating such variation as vowel length marks and the þ/ð distinction, these are now all pretty similar to one another, with the biggest difference between _Stjórn II_ and _III_.
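
To make the numbers in such a matrix less opaque, here is a minimal pure-Python sketch of the computation on a hypothetical three-document toy corpus. It assumes scikit-learn's default smoothed IDF and L2 normalization but tokenizes on whitespace only, so it approximates what `TfidfVectorizer` and `cosine_similarity` do rather than reproducing their code; the function names are invented for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one unit-length {term: weight} dict per document."""
    n_docs = len(docs)
    counts = [Counter(doc.split()) for doc in docs]
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for count in counts for term in count)
    idf = {term: math.log((1 + n_docs) / (1 + freq)) + 1
           for term, freq in doc_freq.items()}
    vectors = []
    for count in counts:
        raw = {term: tf * idf[term] for term, tf in count.items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({term: w / norm for term, w in raw.items()})
    return vectors

def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())

# Hypothetical toy documents, not taken from the corpus.
toy = ['gud mællti uit moyses', 'gud mællti uit aaron', 'dauid conungr for heim']
vecs = tfidf_vectors(toy)
for i, u in enumerate(vecs):
    print([round(cosine(u, v), 3) for v in vecs])
```

The printed matrix is symmetric with 1.0 on the diagonal; the first two toy documents share three of their four terms and score well above zero, while the third shares none with either and scores exactly 0.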

Now let's first add _Konungs skuggsjá_ from Menota, as well as Unger's own edition of the _Norwegian Homily Book_. Fingers crossed that we have got the normalization standard of the former to approach Unger's methods reasonably well.

In [4]:
# We want only those parts of Unger's NHB matched in Menota:
nhb_titles = ['alcuin', 'hom', 'olafr', 'visio', 'paternoster', 'anhang1']
nhb = ''
for title in nhb_titles:
    filepath = f'../nhb/nlp/{title}.txt'
    with open(filepath, encoding='utf-8') as doc:  # assumes UTF-8 text files
        nhb = nhb + normalize(doc.read().replace('\n', ''))
stjorn_plus = list(stjorn.values())
stjorn_plus.extend([menota['nks235g_konungs_skuggsja'], nhb])
model = vectorizer.fit_transform(stjorn_plus)
# Row and column labels in the same order as the documents in stjorn_plus:
labels = list(stjorn.keys()) + ['ks', 'nhb']
df = pd.DataFrame(cosine_similarity(model), index=labels, columns=labels)
df

Unnamed: 0,stjorn1,stjorn2,stjorn3,stjorn4,ks,nhb
stjorn1,1.0,0.308887,0.211326,0.370354,0.01092,0.045486
stjorn2,0.308887,1.0,0.145243,0.358827,0.067233,0.022487
stjorn3,0.211326,0.145243,1.0,0.209011,0.219577,0.414135
stjorn4,0.370354,0.358827,0.209011,1.0,0.000887,0.056356
ks,0.01092,0.067233,0.219577,0.000887,1.0,0.22991
nhb,0.045486,0.022487,0.414135,0.056356,0.22991,1.0


_Stjórn III_ and _Konungs skuggsjá_ share material that is cognate within the vernacular, but either too little of it or with too little spelling agreement for the pair to stand out in this matrix (0.22). In fact, the _Norwegian Homily Book_ matches _Stjórn III_ more closely (0.41) than _Konungs skuggsjá_ does, which may be explained at least in part by the closer subject match for those parts of _Stjórn III_ not reflected in _Konungs skuggsjá_.

Next, let's model all of Menota along with _Stjórn_. For now we'll leave Unger's _Homily Book_ in alongside the Menota edition as a proof of method.

In [5]:
vectorizer = TfidfVectorizer(min_df=1, max_df=0.7)
corpus = []
titles = []
for k,v in stjorn.items():
    titles.append(k)
    corpus.append(v)
titles.append('nhb')
corpus.append(nhb)
for k,v in menota.items():
    titles.append(k)
    corpus.append(v)
model = vectorizer.fit_transform(corpus)
df = pd.DataFrame(cosine_similarity(model), titles, titles)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
df.sort_values(by=['stjorn3'], ascending=False)

Unnamed: 0,stjorn1,stjorn2,stjorn3,stjorn4,nhb,nraNorrFragm75_kross_saga,am132_egils_saga,am162btheta_njals_saga,nraNorrFragm64_barlaams_saga,nraNorrFragm81A_benedikts_regla,am1056IX_konungs_skuggsja_fragment,am78_kristinrettir,am63_heimskringla3,dg4-7_strengleikar,am132_droplaugasona_saga,am132_kormaks_saga,nraNorrFragm72x76_dialogar,nraNorrFragm53_haralds_saga_hardrada,am132_finnboga_saga,nraNorrFragm70_agotu_saga,dg4-7_pamphilius_saga,nraNorrFragm62_karlamagnuss_saga,dg4-7_eliss_saga,nraNorrFragm60A_stjorn,am132_fostbraedra_saga,lbsFragm82_olafs_saga_helga,nraNorrFragm58B_konungs_skuggsja,nraNorrFragm60C_stjorn,holmPerg30_landslog,am619_norwegian_homily_book,nraNorrFragm57_jons_saga_helga,nraNorrFragm69_nikulass_saga,am56_landslog,wolfAug9-10_egils_saga,nraNorrFragm66_thomass_saga,holmPerg17_thomass_saga,am383I_thorlaks_saga,holmPerg4_thidreks_saga,am132_njals_saga,am36_heimskringla2,am544_voluspa,am162bkappa_njals_saga,am305_landslog,nraNorrFragm58C_konungs_skuggsja,am132_olkofra_thattr,nraNorrFragm54_sverris_saga,nraNorrFragm55B_hakonar_saga,nraNorrFragm79_mariu_saga,gks2365_voluspa,am243balpha_konungs_skuggsja,nraNorrFragm51_fagrskinna,am132_viga-glums_saga,am279a_gragas,am677_gregory,am132_laxdoela_saga,am302_landslog,am178_thidreks_saga,am113b_islendingabok,nraNorrFragm81B_benedikts_regla,am132_bandamanna_saga,nraNorrFragm71_gregors_saga_pafa,am655_laeknisbok,am519a_alexanders_saga,holmPerg34_landslog,am162balpha_njals_saga,nraNorrFragm7_landslog,nraNorrFragm67_thomass_saga,nraNorrFragm56_thorgils_saga,nks235g_konungs_skuggsja,am132_hallfredar_saga,am35_heimskringla1,am242_codex_wormianus,meII2_frostathingslog,nraNorrFragm78_mariu_saga,dg8II_olafs_saga,nraNorrFragm80_pals_saga,nraNorrFragm63_karlamagnuss_saga,nraNorrFragm77_dialogar,am28_codex_runicus,holmPerg34_boejarlog,dg8I_landslog,nraNorrFragm60B_stjorn,nraNorrFragm55A_hakonar_saga,skbA120_marys_complaint,nraNorrFragm59_rimbegla,nraNorrFragm65_floress_saga,nraNorrFragm52_olafs_saga_helga_hin_elzta,holmPerg6_barlaams_saga,nraNorrFragm68_brendanuss_saga,nraNorrFragm61_karlamagnuss_saga,nraNorrFragm58A_konungs_skuggsja
stjorn3,0.311182,0.219514,1.0,0.285717,0.308298,0.051853,0.315115,0.150851,0.131202,0.044639,0.019026,0.109576,0.1795,0.284217,0.187201,0.12104,0.103604,0.164539,0.150069,0.051606,0.225499,0.22355,0.253942,0.058605,0.199703,0.202287,0.184094,0.234922,0.096487,0.302921,0.14935,0.135471,0.110962,0.32967,0.169093,0.207535,0.051131,0.308221,0.250663,0.200322,0.088958,0.085066,0.049603,0.054939,0.149192,0.208857,0.108692,0.155278,0.058133,0.174281,0.043339,0.221837,0.061055,0.0834,0.264872,0.051185,0.038565,0.047312,0.108638,0.158565,0.087231,0.04909,0.189452,0.078972,0.070448,0.144056,0.250753,0.148263,0.082328,0.192172,0.188593,0.301264,0.122493,0.187897,0.178733,0.18136,0.187684,0.191177,0.028564,0.054698,0.134532,0.152632,0.22713,0.022186,0.046605,0.074384,0.140658,0.18692,0.095261,0.147208,0.096194
wolfAug9-10_egils_saga,0.159822,0.130853,0.32967,0.145503,0.15012,0.037293,0.726285,0.13457,0.066862,0.029944,0.00851,0.063357,0.182126,0.18703,0.213219,0.12585,0.087182,0.199553,0.143565,0.021417,0.119238,0.163334,0.138182,0.019941,0.183264,0.266295,0.069,0.131074,0.061714,0.151433,0.149808,0.138662,0.064008,1.0,0.107268,0.090672,0.048763,0.247308,0.24296,0.203925,0.048811,0.092972,0.02796,0.053269,0.14689,0.219993,0.106511,0.068655,0.044444,0.080408,0.016369,0.211566,0.063629,0.049593,0.288904,0.028131,0.033262,0.059261,0.051585,0.192278,0.064463,0.039393,0.136532,0.093356,0.082122,0.068482,0.15161,0.295589,0.039866,0.191488,0.211447,0.243896,0.07444,0.134015,0.116395,0.118662,0.156265,0.145169,0.009016,0.067209,0.070302,0.056526,0.270869,0.008079,0.042484,0.073365,0.166711,0.093493,0.068212,0.130578,0.059732
am132_egils_saga,0.151675,0.125608,0.315115,0.12629,0.095964,0.036256,1.0,0.161351,0.047554,0.023609,0.009822,0.033274,0.202871,0.192585,0.289021,0.190429,0.108581,0.236492,0.221019,0.022717,0.07424,0.14,0.115386,0.020307,0.292828,0.286833,0.075207,0.139439,0.036906,0.112741,0.158881,0.15153,0.031598,0.726285,0.09802,0.106515,0.053131,0.222085,0.365262,0.225057,0.054003,0.105318,0.02912,0.057297,0.218278,0.232477,0.107496,0.097174,0.044206,0.053325,0.048923,0.32471,0.039006,0.060135,0.436712,0.027152,0.035255,0.08439,0.027113,0.294376,0.061948,0.049096,0.194727,0.062303,0.096535,0.044345,0.159131,0.347901,0.046571,0.248027,0.242742,0.250478,0.034374,0.141235,0.081347,0.133781,0.1644,0.175665,0.009335,0.060607,0.034141,0.062803,0.263904,0.008342,0.032119,0.076232,0.165721,0.097713,0.071256,0.129635,0.08359
stjorn1,1.0,0.330322,0.311182,0.39797,0.125234,0.04108,0.151675,0.11579,0.051587,0.028672,0.012156,0.029382,0.113498,0.164954,0.102046,0.071954,0.056286,0.055043,0.090715,0.030298,0.080533,0.117172,0.085076,0.055734,0.127785,0.04759,0.199816,0.069529,0.032223,0.117916,0.100514,0.124722,0.030278,0.159822,0.1439,0.176305,0.029812,0.107531,0.143834,0.124956,0.067064,0.102463,0.025253,0.039614,0.078287,0.16401,0.063582,0.095345,0.050146,0.048951,0.03117,0.13549,0.033995,0.053716,0.149723,0.023642,0.029906,0.023318,0.028706,0.097559,0.040024,0.046198,0.085568,0.071339,0.096002,0.037342,0.144402,0.09417,0.050552,0.090294,0.116796,0.20848,0.028501,0.117743,0.045973,0.156907,0.159889,0.131305,0.020508,0.048133,0.06024,0.050251,0.095897,0.014226,0.077269,0.077926,0.045205,0.124952,0.044685,0.048212,0.067333
nhb,0.125234,0.065408,0.308298,0.091437,1.0,0.043273,0.095964,0.064876,0.185338,0.07801,0.033625,0.267624,0.136238,0.273668,0.069483,0.05337,0.090402,0.032185,0.060604,0.086495,0.270342,0.130372,0.260075,0.045735,0.088309,0.055261,0.103873,0.045044,0.209814,0.934206,0.075651,0.069451,0.26389,0.15012,0.094497,0.258775,0.036564,0.211421,0.107104,0.156607,0.044482,0.041497,0.167351,0.085836,0.058406,0.071912,0.028553,0.113866,0.053205,0.200831,0.052925,0.090539,0.071788,0.086393,0.102459,0.141467,0.01962,0.031438,0.244264,0.06652,0.106228,0.031154,0.108051,0.138667,0.033403,0.21267,0.223741,0.051655,0.092165,0.065319,0.141566,0.220362,0.228597,0.169557,0.394627,0.157718,0.054054,0.130899,0.025451,0.088192,0.23772,0.017999,0.070443,0.018722,0.036111,0.066797,0.067793,0.269056,0.122153,0.059666,0.090676
holmPerg4_thidreks_saga,0.107531,0.138206,0.308221,0.090999,0.211421,0.048422,0.222085,0.115175,0.107766,0.050654,0.029532,0.25829,0.232864,0.310338,0.100424,0.093397,0.072051,0.157768,0.105584,0.067807,0.232557,0.202036,0.25921,0.016775,0.127022,0.254227,0.064718,0.13811,0.215243,0.214263,0.115316,0.08362,0.253928,0.247308,0.142845,0.163921,0.081347,1.0,0.180143,0.249903,0.054986,0.073115,0.190567,0.154248,0.089567,0.193884,0.140546,0.086859,0.069754,0.189523,0.064142,0.14556,0.046184,0.063691,0.192502,0.20422,0.061239,0.034969,0.117656,0.128567,0.072652,0.061311,0.254419,0.189158,0.058178,0.197627,0.13607,0.089301,0.123886,0.133485,0.238257,0.214672,0.174131,0.096118,0.358743,0.126426,0.196422,0.093524,0.080103,0.147842,0.237901,0.05744,0.184289,0.042746,0.047917,0.103936,0.198496,0.265207,0.070142,0.153347,0.171
am619_norwegian_homily_book,0.117916,0.083204,0.302921,0.087553,0.934206,0.041286,0.112741,0.0608,0.190337,0.073653,0.031691,0.251926,0.105445,0.258445,0.076639,0.056043,0.103549,0.03599,0.064614,0.095067,0.259156,0.130009,0.250248,0.043368,0.111194,0.065124,0.122898,0.073202,0.195379,1.0,0.073824,0.064886,0.243707,0.151433,0.09092,0.245771,0.033845,0.214263,0.120254,0.126203,0.041797,0.038672,0.151611,0.107144,0.069663,0.08369,0.045901,0.115861,0.050377,0.189248,0.051631,0.095972,0.06816,0.109705,0.130627,0.128015,0.023347,0.028388,0.231393,0.083101,0.108031,0.029081,0.149481,0.133339,0.033848,0.19782,0.214887,0.052467,0.109295,0.071909,0.111486,0.226206,0.215477,0.162554,0.353188,0.185155,0.058877,0.142393,0.023263,0.085364,0.222806,0.017352,0.076258,0.017162,0.033717,0.068065,0.071136,0.247249,0.119098,0.062068,0.143907
am242_codex_wormianus,0.20848,0.10357,0.301264,0.114735,0.220362,0.045486,0.250478,0.119508,0.091128,0.053067,0.017097,0.095766,0.154781,0.219038,0.168093,0.173664,0.09327,0.098956,0.149946,0.036047,0.186286,0.219308,0.150624,0.040555,0.179807,0.134773,0.106384,0.084608,0.085936,0.226206,0.098179,0.100548,0.100277,0.243896,0.103501,0.138354,0.04764,0.214672,0.214862,0.168892,0.177334,0.078006,0.045961,0.075821,0.128983,0.144756,0.056069,0.131347,0.117484,0.116775,0.024514,0.203724,0.073959,0.073112,0.251392,0.047133,0.046999,0.073398,0.084475,0.163254,0.11429,0.052665,0.192625,0.079938,0.080577,0.082215,0.1498,0.120167,0.051176,0.163381,0.165093,1.0,0.112789,0.131992,0.149431,0.155919,0.135014,0.193588,0.017528,0.051325,0.100284,0.056913,0.132889,0.009807,0.094005,0.053786,0.103332,0.111393,0.095949,0.090116,0.128794
stjorn4,0.39797,0.35968,0.285717,1.0,0.091437,0.026073,0.12629,0.069654,0.033347,0.018091,0.002659,0.039825,0.060474,0.105301,0.097264,0.054846,0.030313,0.04239,0.064637,0.012127,0.068604,0.087363,0.078234,0.006916,0.097623,0.050682,0.133495,0.043463,0.041063,0.087553,0.062337,0.079624,0.039225,0.145503,0.04634,0.043802,0.014164,0.090999,0.121952,0.068102,0.03781,0.109717,0.016891,0.018224,0.07591,0.126888,0.048355,0.040315,0.031,0.047454,0.015366,0.099343,0.01895,0.042995,0.113727,0.016701,0.029193,0.017471,0.031242,0.066127,0.03063,0.016582,0.053618,0.035555,0.105909,0.034422,0.071053,0.050772,0.023368,0.084626,0.066973,0.114735,0.046116,0.060416,0.051086,0.071441,0.140061,0.067578,0.00499,0.024705,0.079049,0.026705,0.080476,0.004862,0.050239,0.059558,0.031305,0.051765,0.030166,0.03846,0.032618
dg4-7_strengleikar,0.164954,0.10089,0.284217,0.105301,0.273668,0.152778,0.192585,0.106022,0.101753,0.035597,0.027516,0.13379,0.135469,1.0,0.162913,0.11787,0.075433,0.096634,0.136311,0.085506,0.274068,0.14471,0.34163,0.011437,0.162268,0.094166,0.098124,0.094178,0.128842,0.258445,0.09656,0.119349,0.142666,0.18703,0.13478,0.211395,0.049804,0.310338,0.206315,0.155082,0.109916,0.077956,0.111817,0.110368,0.101632,0.110157,0.072879,0.186938,0.13442,0.165284,0.100044,0.173689,0.033764,0.066533,0.22817,0.100152,0.042648,0.029204,0.100213,0.113879,0.055748,0.054158,0.214249,0.116122,0.070334,0.121572,0.146429,0.08986,0.093465,0.139364,0.163931,0.219038,0.106659,0.145448,0.300731,0.092338,0.137456,0.111309,0.025877,0.075927,0.12021,0.045021,0.103004,0.027593,0.023871,0.131199,0.091194,0.300945,0.086818,0.102398,0.138036


The score for the two editions of the _Norwegian Homily Book_ may serve as our proof of method: with any `max_df` setting, and whether or not we normalize &lt;þ&gt; and &lt;ð&gt; to &lt;d&gt;, their similarity falls in the range 0.93 to 0.97. Since this pairs an edition of Unger's with a Menota transcription, just as our comparison of _Stjórn_ with the remainder of the Menota corpus does, we may be confident that the scores give a fair indication of lexical similarity, to the extent that the TF-IDF measure can provide one.

Let's take the constituent parts of _Stjórn_ one at a time and rank the Menota material by similarity. At this point, having demonstrated the validity of our method at least for editions of the same manuscript, we may remove Unger's _Homily Book_ from the data set:

In [6]:
nhb_index = titles.index('nhb')
corpus.pop(nhb_index)
titles.pop(nhb_index)
model = vectorizer.fit_transform(corpus)
df = pd.DataFrame(cosine_similarity(model), titles, titles)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

def rank(subject):
    sorted_df = df.sort_values([subject], ascending=False)
    print(sorted_df[subject][:13].to_string()) # Limiting output for brevity

In [7]:
for i in work_indices.keys():
    print(f"{i}:")
    print("--------------------------------------------")
    rank(i)
    print('')

stjorn1:
--------------------------------------------
stjorn1                             1.000000
stjorn4                             0.398167
stjorn2                             0.329877
stjorn3                             0.311565
am242_codex_wormianus               0.208877
nraNorrFragm58B_konungs_skuggsja    0.199956
holmPerg17_thomass_saga             0.175020
dg4-7_strengleikar                  0.165271
nraNorrFragm54_sverris_saga         0.164292
wolfAug9-10_egils_saga              0.160967
nraNorrFragm63_karlamagnuss_saga    0.160011
nraNorrFragm80_pals_saga            0.157136
am132_egils_saga                    0.152953

stjorn2:
--------------------------------------------
stjorn2                             1.000000
stjorn4                             0.359272
stjorn1                             0.329877
stjorn3                             0.219284
nraNorrFragm58B_konungs_skuggsja    0.187332
holmPerg34_landslog                 0.175640
am132_njals_saga                    

These are complex results. The redaction of _Thómass saga erkibyskups_ transmitted in Holm. Perg. 17 was "uden Tvivl tilbleven i Norge" ("without doubt produced in Norway") as judged by its language and syntax, but probably later in the second half of the 13th century (Unger iii); "Oversættelsens Stil minder undertiden om Kongespeilet" ("the style of the translation is at times reminiscent of the King's Mirror"; ibid.). The fact of its Norwegian origin alone does not explain an affinity with _Stjórn I_ against _Stjórn III_ (both of whose Norwegian ancestry is thought to be at two generations' remove at least), while the date of _Thómass saga_ would rather associate it with the latter. As for the similarity of style between _Thómass saga_ and _Konungs skuggsjá_ as observed by Unger, he cannot have meant a matter of style reflected in the cosine similarity of their TF-IDF vectors within this corpus, as that is decidedly low:

In [8]:

df['holmPerg17_thomass_saga'].loc['nks235g_konungs_skuggsja']

0.058561162086225624

Since TF-IDF rewards word forms that are unusual within the corpus, our next step is to investigate which forms are conspicuously associated with the constituent parts of _Stjórn_. The outcome of this query is affected even more strongly by our `max_df` setting, and this is the test that prompted the conflation of the characters &lt;þðd&gt; in preprocessing above, as Unger edited _Stjórn I_ from AM 226, which tends to use &lt;d&gt; for &lt;ð&gt;. This, as well as the other normalization strategies in this notebook and in those it depends on, should be kept in mind when consulting the rankings below of the most striking forms in each constituent part of _Stjórn_. It should also be remembered that commonly spelled function words have been eliminated by the `max_df` setting:

In [9]:
scores = pd.DataFrame(model.toarray(), titles, vectorizer.get_feature_names_out())
for i in work_indices.keys():
    print(f"{i}:")
    print("------------------------")
    print(scores.loc[i].sort_values(ascending=False)[:20])
    print('')

stjorn1:
------------------------
medr      0.531886
aa        0.307348
edr       0.232817
fyrir     0.225208
þi        0.187701
gud       0.178210
þiat      0.171045
eptir     0.127508
hafdi     0.107135
honum     0.099942
meþr      0.099569
man       0.091378
scylld    0.090109
þers      0.088724
guds      0.088033
þeirra    0.087901
tima      0.083897
taladi    0.083809
sagdi     0.082886
uaru      0.082568
Name: stjorn1, dtype: float64

stjorn2:
------------------------
aa          0.384853
þier        0.269518
gud         0.252151
moyses      0.240537
med         0.215359
firir       0.199686
uit         0.197960
suo         0.183912
mun         0.182671
mællti      0.168344
scaltu      0.154965
aaron       0.153237
yfir        0.139124
drottinn    0.136138
eda         0.107792
cyni        0.107689
brutt       0.101818
balaam      0.095157
moysen      0.094956
ydr         0.094455
Name: stjorn2, dtype: float64

stjorn3:
------------------------
æigi        0.264773
dauid       0.2

_Stjórn I_ stood out for its form "medr" *before* the conflation of &lt;þðd&gt;, but it continues to be its most remarkable form after, occurring as it does in 16 out of 85 Menota items. What is also striking is that the exact form "medr" so frequently found in AM 226 occurs three times in the Holm. Perg. 17 manuscript of _Thómass saga erkibyskups_, while the AM 132 text of _Laxdœla saga_ and the Norr. Fragm. 58B fragment of _Konungs skuggsjá_ have it once each. "Medr" is thus not only a form that sets (the AM 226 text of) _Stjórn I_ apart, it is also one of the forms associating it with _Thómass saga_.
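
Counts of this kind can be gathered with a few lines of plain Python. The following is a minimal sketch on a hypothetical toy corpus: the helper name `form_occurrences` and the sample texts are invented for illustration, and it matches whitespace-delimited tokens exactly, which may differ slightly from the vectorizer's own tokenization.

```python
from collections import Counter

def form_occurrences(corpus, form):
    """Map each document name to how often `form` occurs in it as a token.

    Documents in which the form does not occur are omitted.
    """
    hits = {}
    for name, text in corpus.items():
        n = Counter(text.split())[form]
        if n:
            hits[name] = n
    return hits

# Hypothetical stand-in for a dict such as `menota` (name -> normalized text).
toy_corpus = {
    'doc_a': 'medr gudi oc medr monnum',
    'doc_b': 'meþr gudi',
    'doc_c': 'hann for medr honum',
}

hits = form_occurrences(toy_corpus, 'medr')
print(f"{len(hits)} of {len(toy_corpus)} documents: {hits}")
# 2 of 3 documents: {'doc_a': 2, 'doc_c': 1}
```

The length of the returned dict gives the document frequency of the form, and its values give the per-document counts.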