# Building a super-fast Latin dictionary
SymSpell is very fast but you have to provide it with a frequency list generated from a large corpus in the language you need. Compared to the many high-quality frequency lists for modern languages (see the [links on Wolf Garbe’s Github page](https://github.com/wolfgarbe/SymSpell#frequency-dictionaries-in-other-languages)), which are based on corpora with billions of words, the Latin lists are quite small and focussed on Antiquity.

Since we want to use SymSpell with late medieval texts, we have to generate our own frequency list.

## Extract words from a Latin corpus
Prerequisite: Download the "latin library" included in CLTK (Classical Language Toolkit) from https://github.com/cltk/lat_text_latin_library (click on the button `Code` and select `Download ZIP`, then unzip the file).


In [1]:
# Save the full path to the unzipped folder:
#corpus_path = "E:/Latin Corpus/lat_text_latin_library-master"              # Windows path
corpus_path = "/home/markus/cltk_data/latin/text/latin_text_latin_library/" # Linux path

In [2]:
import glob
from collections import Counter
import pickle
import re

In [3]:
# Iterate through all txt files in the corpus and collect
# the lines of text (except the first 16 and the last 8 lines
# of each file because they contain English metadata):
latin_corpus = []
file_counter = 0
for path in glob.iglob(f"{corpus_path}/**/*.txt", recursive=True):
    # Open every txt file and collect all lines of text
    # except the first 16 and the last 8 lines of each file. 
    with open(path, "r", encoding="utf-8") as f:
        file_counter += 1
        lines = f.readlines()
        for line in lines[16:-8]:
            latin_corpus.append(line)
print(f"Processed {file_counter} files and saved them in the latin_corpus variable.")

Processed 2141 files and saved them in the latin_corpus variable.


In [4]:
# Clean and normalize the lines and save them as a txt file.
# (THIS CAN TAKE SOME TIME depending on the size of your corpus!)
def normalize(text):
    normalization_patterns = {'v': 'u',
                             'V': 'U',
                             'j': 'i',
                             'J': 'I',
                             'ë': 'e',
                             'æ': 'ae',
                             'Æ': 'ae',
                             'œ': 'oe',
                             'Œ': 'Oe'}
    for pattern, replacement in normalization_patterns.items():
        text = re.sub(pattern, replacement, text)
    return text

cleaned_corpus = []

for line in latin_corpus:
    # Extract a list of words without punctuation, Greek letters etc.
    words = re.findall(r'[a-zA-Z]+', line)
    for word in words:
        word = normalize(word.lower())
        cleaned_corpus.append(word)

In [5]:
# Count the unique words:
COUNTS = Counter(cleaned_corpus)
print("Total # of words: ", sum(COUNTS.values()))
print("# of unique words:", len(COUNTS))
tops = COUNTS.most_common(50)
print("The 50 most frequent words:")
for k, v in tops:
    print(f"- {k}: {v}")

Total # of words:  12829228
# of unique words: 408383
The 50 most frequent words:
- et: 423445
- in: 261233
- est: 159844
- non: 159224
- ad: 128567
- ut: 114154
- quod: 100594
- cum: 97511
- si: 90942
- qui: 88798
- de: 77016
- a: 72548
- sed: 71776
- quae: 62046
- ex: 57890
- quam: 54185
- per: 48669
- esse: 47938
- nec: 43930
- sunt: 42173
- hoc: 42086
- uel: 40507
- se: 40341
- enim: 40096
- aut: 39569
- autem: 38608
- ab: 38604
- etiam: 36559
- eius: 33401
- quid: 31204
- sit: 31143
- atque: 29387
- quo: 28738
- me: 26821
- te: 26511
- quia: 26451
- ac: 26137
- ne: 25753
- tamen: 25441
- id: 25214
- dig: 24838
- ita: 24296
- haec: 23494
- iam: 22922
- nam: 22428
- eo: 22173
- pro: 21703
- eum: 21661
- mihi: 20324
- uero: 20216


In [6]:
# Save the counts in a txt file (<term> <count>\n<term> <count>\n…)
with open("frequency_dictionary_la.txt", "w", encoding="utf-8") as f:
    for word in COUNTS:
        f.write(f"{word} {COUNTS[word]}\n")

## Check the collected words
Unfortunately, the files are quite messy and sometimes suffer from very strange spelling and/or numerous transcription errors so that it fails in correcting some simple mistakes. The dictionary thinks, for example, that "mar" and "tio" were valid Latin words, which is not the case! 

In [7]:
import os
import pickle
from symspellpy.symspellpy import SymSpell, Verbosity
from hunspell import Hunspell

In [8]:
sym_spell = SymSpell(3,7)

In [9]:
# Load SymSpell dictionary from the frequency file we created above:
# (format: <term> <count>\n<term> <count>\n…)
# THIS CAN TAKE SOME TIME depending on the size of your corpus!

dict_path = "frequency_dictionary_la.txt"
#dict_path = "U:/frequency_dictionary_la.txt"
sym_spell.load_dictionary(dict_path, 0, 1)

# Cf. https://symspellpy.readthedocs.io/en/latest/examples/dictionary.html

True

In [10]:
from pprint import pprint
for word in ["mar", "tio", "chriftus"]:
    print(word)
    pprint([str(s) for s in sym_spell.lookup(word, Verbosity.CLOSEST)])

mar
['mar, 0, 59']
tio
['tio, 0, 10']
chriftus
['christus, 1, 1693']


To get rid of the typos and erroneous words, we check every word with another dictionary. Hunspell provides a very fast algorithm and there is a very good [Latin dictionary by Karl Zeiler and Jean-Pierre Sutto](https://extensions.libreoffice.org/en/extensions/show/latin-spelling-and-hyphenation-dictionaries) that you can use with Hunspell. 

In [11]:
h = Hunspell('la_LA', hunspell_data_dir='./test_flask_app/my_app/dictionaries')

In [12]:
# Check every word in SymSpell with Hunspell.
# Collect and count the wrong words.
counter = {"correct": 0, "wrong": 0}
wrong_words = {}
for word, count in sym_spell._words.items():
    hunspell = h.spell(word)
    if hunspell: 
        counter['correct'] += 1
    else:
        counter['wrong'] += 1
        wrong_words[word] = count
    #print(word, count, hunspell)

In [13]:
# Let's see how many words were considered correct and wrong:
counter

{'correct': 283255, 'wrong': 125128}

In [14]:
# OK, Hunspell does not recognize Roman numbers ... and some other stuff:
wrong_words

{'onimo': 1,
 'romanos': 1214,
 'iii': 5146,
 'iiii': 1997,
 'u': 5677,
 'uii': 2137,
 'uiii': 2329,
 'uiiii': 502,
 'x': 2274,
 'xi': 1290,
 'xii': 1803,
 'xiii': 1134,
 'xiiii': 433,
 'xu': 1765,
 'xui': 976,
 'xuii': 905,
 'xuiii': 764,
 'xuiiii': 96,
 'xx': 887,
 'xxi': 513,
 'xxii': 486,
 'xxiii': 408,
 'xxiiii': 95,
 'xxu': 525,
 'xxui': 356,
 'xxuii': 340,
 'xxuiii': 333,
 'romanaeque': 9,
 'octauianum': 27,
 'romana': 986,
 'reipublicae': 409,
 'romani': 3297,
 'romanum': 1098,
 'hispanos': 52,
 'graecorum': 521,
 't': 2018,
 'neglens': 1,
 'tiberi': 88,
 'quinctio': 111,
 'puberatem': 1,
 'sallustius': 151,
 'insituto': 1,
 'aramatam': 1,
 'homero': 101,
 'tydeum': 1,
 'fortiroem': 1,
 'mantuanus': 18,
 'martio': 156,
 'longioribusm': 1,
 'gynacea': 1,
 'sertorio': 26,
 'incuriosius': 5,
 'requiranda': 1,
 'lacedaemonii': 137,
 'athenienses': 264,
 'tactica': 1,
 'paruissimis': 3,
 'cornelius': 442,
 'frontinus': 6,
 'traiani': 128,
 'hadrianique': 2,
 'ix': 1038,
 'praestingi

You can preserve at least some of the words categorized as "wrong" by applying some tricks on words that are not recognized at first glance (e.g. by replacing “ci“ → “ti”, “e“ → “ae”, capitalizing the word, etc.).

In [15]:
# Delete false positives from the list of wrong words: 
# spelling variations, capitalization:
def do_tricks(word):
    if len(word) < 4: # with short words, do_tricks produces errors
        return False
    tricks = {"ci": "ti",
              "ti": "ci",    # fidutia
              "e": "ae",     # predicare
              "e": "oe",
              "oe": "e",     # foemina, foelix
              "ichi": "ihi", # nichil, nichilum
              "semet": "",   # semetipsos etc.
              "y": "i",      # ydoneus, consyderatus, paradysi
             }
    for pattern, replacement in tricks.items():
        trick_from_left = word.replace(pattern, replacement, 1)
        trick_from_right = replacement.join(word.rsplit(pattern, 1))
        for trick in [trick_from_left, trick_from_right]:
            if h.spell(trick):
                # Exit the for-loop as soon as a solution is found:
                return word
    return False
        

wrong_words_cleaned = {}
correct_words = {}
for word, count in wrong_words.items():
    if do_tricks(word):
        correct_words[word] = count
    elif len(word) > 2 and h.spell(word.capitalize()):
        correct_words[word] = count
    else:
        wrong_words_cleaned[word] = count

print(f"Input:  {len(wrong_words)} wrong words.")
print(f"Output: {len(wrong_words_cleaned)} wrong words.")
print(f"False positives: {len(wrong_words)-len(wrong_words_cleaned)} words.")

Input:  125128 wrong words.
Output: 95284 wrong words.
False positives: 29844 words.


In [16]:
# OK, that looks better already...
correct_words

{'romanos': 1214,
 'romanaeque': 9,
 'octauianum': 27,
 'romana': 986,
 'romani': 3297,
 'romanum': 1098,
 'hispanos': 52,
 'graecorum': 521,
 'tiberi': 88,
 'quinctio': 111,
 'sallustius': 151,
 'homero': 101,
 'tydeum': 1,
 'mantuanus': 18,
 'martio': 156,
 'sertorio': 26,
 'lacedaemonii': 137,
 'athenienses': 264,
 'cornelius': 442,
 'frontinus': 6,
 'traiani': 128,
 'hadrianique': 2,
 'pompei': 486,
 'martium': 43,
 'africanus': 443,
 'numantinos': 11,
 'illyrico': 120,
 'diocletianus': 1510,
 'iouianos': 1,
 'herculianos': 1,
 'romanus': 1352,
 'gothorum': 290,
 'alanorum': 22,
 'hunnorumque': 2,
 'gothos': 101,
 'pannonicos': 2,
 'hadriani': 109,
 'martius': 61,
 'lacedaemonios': 84,
 'marsos': 20,
 'samnites': 166,
 'pelignos': 1,
 'epiri': 67,
 'macedones': 171,
 'thessali': 38,
 'persis': 175,
 'indiam': 64,
 'dacos': 18,
 'moesos': 6,
 'thracas': 34,
 'martem': 163,
 'hannibali': 144,
 'caesarem': 1166,
 'antoninum': 84,
 'aurelium': 31,
 'parthico': 28,
 'alsiensibus': 1,
 '

In [17]:
wrong_words_cleaned

{'onimo': 1,
 'iii': 5146,
 'iiii': 1997,
 'u': 5677,
 'uii': 2137,
 'uiii': 2329,
 'uiiii': 502,
 'x': 2274,
 'xi': 1290,
 'xii': 1803,
 'xiii': 1134,
 'xiiii': 433,
 'xu': 1765,
 'xui': 976,
 'xuii': 905,
 'xuiii': 764,
 'xuiiii': 96,
 'xx': 887,
 'xxi': 513,
 'xxii': 486,
 'xxiii': 408,
 'xxiiii': 95,
 'xxu': 525,
 'xxui': 356,
 'xxuii': 340,
 'xxuiii': 333,
 'reipublicae': 409,
 't': 2018,
 'neglens': 1,
 'puberatem': 1,
 'insituto': 1,
 'aramatam': 1,
 'fortiroem': 1,
 'longioribusm': 1,
 'gynacea': 1,
 'incuriosius': 5,
 'requiranda': 1,
 'tactica': 1,
 'paruissimis': 3,
 'ix': 1038,
 'praestingit': 1,
 'gnei': 4,
 'transeuntur': 1,
 'republica': 131,
 'aptir': 1,
 'fortitor': 1,
 'utiliitatis': 1,
 'catafractis': 7,
 'que': 2335,
 'obpugnatione': 8,
 'maximianus': 1532,
 'salitio': 1,
 'lx': 240,
 'uergilio': 63,
 'gratiani': 35,
 'catafractas': 8,
 'catafracta': 1,
 'rempublicam': 161,
 'proximior': 20,
 'appelatur': 2,
 'solunt': 1,
 'exrrcitatione': 1,
 'hout': 233,
 'haines'

In [18]:
h.spell("confideremus")

True

In [19]:
# Let's store our result and write the wrong words to file:
with open("wrong_words_cleaned.txt", "w", encoding="utf-8") as f:
    line = []
    for word, count in wrong_words_cleaned.items():
        f.write(word+"\n")

## Apply corrections to the SymSpell dictionary
Now that we have a list of wrong words, we should delete them from SymSpell. Furthermore we can add/delete some hand-picked words I collected during the last two years.

In [20]:
# Delete the false entries we found above:
# THIS MAY TAKE SOME TIME depending on the size of your word list!
for word, count in wrong_words_cleaned.items():
    sym_spell.delete_dictionary_entry(word)
print(f"The cleaned dictionary has {len(sym_spell._words)} entries.")

The cleaned dictionary has 313099 entries.


In [21]:
# Add some hand-picked words to the dictionary:
counter = 0
with open("manually_added_words.csv", "r", encoding="utf-8") as f:
    for word in f.readlines():
        if word.strip() in sym_spell._words:
            pass
        else:
            counter += 1
            sym_spell.create_dictionary_entry(word.strip(), 10)
print(f"Added {counter} words.")

Added 834 words.


In [22]:
# Delete some hand-picked words from the dictionary:
counter = 0
with open("manually_deleted_words.csv", "r", encoding="utf-8") as f:
    for word in f.readlines():
        if word.strip() in sym_spell._words:
            sym_spell.delete_dictionary_entry(word.strip())
            counter += 1
print(f"Deleted {counter} words.")

Deleted 52 words.


In [23]:
# Save the improved SymSpell dictionary as a pickle stream
# Help on pickle: https://docs.python.org/3/library/pickle.html

with open("symspell_dictionary_LA.pickle", "wb") as f:
    sym_spell.save_pickle_stream(f)
print(f"The cleaned dictionary has {len(sym_spell._words)} entries.")

The cleaned dictionary has 313881 entries.


## Build bigram list
The SymSpell dictionary uses a bigram list to decide whether two tokens belong together forming one word or not. Example: neither the dictionary nor the list of bigrams contain "sempermaior" (which is a mistake) whereas "semper maior" (correct) is in the bigram list but not the dictionary (which includes both words individually).

To build a proper list of bigrams, we should to be aware of punctuation. Therefore, we transform the corpus into a list of sentences, i.e. we tokenize the corpus. Then, we take each sentence and collect the bigrams. 

In [24]:
# Let's build a tokenizer and transform our corpus into a list of sentences.
import re

def tokenize_sentences(list_of_lines, sentence_separators=r"([,;.?!:\(\)\[\]])\s?"):
    """ Transforms a list of text lines into a list of sentences 
        using the sentence_separators (i.e. a regular expression). """

    list_of_lines.append("#") # add dummy entry to simplify the moving window logic below
    sentences = [" "]         # add dummy entry to simplify the moving window logic below
    first_line = True

    for line in list_of_lines:
        line = line.strip() # get rid of spaces, line breaks etc. at the beginning/end of the line.

        # Delete abbreviations like "a.d.":
        line = re.sub(r"(\w)(\.)(\w)(\.)?", r"", line)

        # Identify the tokens using the sentence_separators and split the line:
        line = re.sub(sentence_separators, r"\1$TOKEN$", line)
        junks = re.split(r"\$TOKEN\$", line)

        # Iterate over the junks and decide if and how to join them:
        for junk in junks:
            junk = junk.strip()
            if junk != "" and junk not in "([":
                if sentences[-1][-1] in "[-=]":
                    # The previous line ends with a hyphen: cut the hyphen off and join:
                    sentences[-1] = sentences[-1][:-1] + junk
                elif sentences[-1][-1] in ",;.?!:)]":
                    # The previous line ends with punctuation: cut it off and
                    # add the current junk to the list of sentences:
                    sentences[-1] = sentences[-1][:-1]
                    sentences.append(junk)
                elif sentences[-1][-1] in "([":
                    # The previous line ends with an opening parenthesis/bracket:
                    # Cut the parenthesis/bracket and add the junk as a new sentence:
                    sentences[-1] = sentences[-1][:-2]
                    sentences.append(junk)
                else:
                    # No hyphen in the previous line: add a space:
                    sentences[-1] = sentences[-1] + " " + junk

        first_line = False
        
    # Fix the first and the last sentence:
    sentences[0] = sentences[0].strip()
    sentences = sentences[:-1]

    return sentences # return the sentences except the dummy

test = ["Hello World! This is\n", "Python, i.e. the most popu-\n", "lar (and eastiest to learn)\n","programming\n", "language today!\n", "(so far)", "Here we go!", "We are splitting", "sentences."]

tokenized_corpus = tokenize_sentences(latin_corpus)

In [25]:
# Let's extract the bigrams from the tokenized_corpus.
# THIS CAN TAKE A WHILE.
raw_bigrams = []
for sentence in tokenized_corpus:
    # Extract a list of words without punctuation, Greek letters etc.
    # (This is not a perfect solution because it produces some bad bigrams
    # because if the sentence includes numbers, Greek letters or other unusual 
    # characters, they will be ignored, which can lead to bigrams that are not
    # actually included in the corpus. We would need to build a more complex
    # tokenizer to solve this problem.)
    words = re.findall(r'[a-zA-Z]+', sentence.lower())
    if len(words) > 1:
        for idx, word in enumerate(words[1:]):
            # Filter out erroneous words by checking them with SymSpell:
            if words[idx] in sym_spell._words and word in sym_spell._words:
                raw_bigrams.append(f"{words[idx]} {word}")

# Count the unique bigrams:
BIGRAM_COUNTS = Counter(raw_bigrams)
print("Total # of bigrams: ", sum(BIGRAM_COUNTS.values()))
print("# of unique bigrams:", len(BIGRAM_COUNTS))
bigram_tops = BIGRAM_COUNTS.most_common(50)
print("The 50 most frequent bigrams:")
for k, v in bigram_tops:
    print(f"- {k}: {v}")        

Total # of bigrams:  8751089
# of unique bigrams: 4244833
The 50 most frequent bigrams:
- et in: 11525
- ad ed: 8528
- ulpianus ad: 8358
- non est: 7161
- id est: 5355
- et si: 4903
- si quis: 4702
- est in: 4371
- usque ad: 4285
- et non: 3953
- et ad: 3805
- quod si: 3742
- sed et: 3699
- ab eo: 3669
- est et: 3663
- in quo: 3660
- et cum: 3650
- non potest: 3640
- quod in: 3635
- qui in: 3551
- ut in: 3461
- non solum: 3260
- in eo: 3193
- et de: 3184
- si non: 3147
- sed etiam: 3144
- quae in: 2938
- paulus ad: 2917
- quod non: 2910
- in hoc: 2890
- nam et: 2730
- in qua: 2638
- si quid: 2624
- eo quod: 2521
- quod est: 2459
- est quod: 2435
- hoc est: 2394
- id quod: 2385
- in ea: 2365
- rei publicae: 2349
- in quibus: 2250
- ad eum: 2238
- non esse: 2237
- et ideo: 2227
- est enim: 2209
- ita ut: 2202
- non enim: 2191
- in his: 2166
- qui non: 2095
- neque enim: 2091


In [26]:
# Save the bigrams as a txt file (<term_part_1> <term_part_2> <count>\n…):
# THIS MAY TAKE SOME TIME! The resulting file is quite big (~ 80 MB)
with open("frequency_dictionary_bigrams_la.txt", "w", encoding="utf-8") as f:
    for bigram in BIGRAM_COUNTS:
        f.write(f"{bigram} {BIGRAM_COUNTS[bigram]}\n")

## Integrate the bigrams into the SymSpell dictionary

In [27]:
# Load the list of unique bigrams from the frequency file we created above:
# (format: <term_part_1> <term_part_2> <count>\n…)
# THIS MAY TAKE SOME TIME depending on the size of your corpus!

bigram_path = "frequency_dictionary_bigrams_la.txt"
#bigram_path = "U:/frequency_dictionary_bigrams_la.txt"
sym_spell.load_bigram_dictionary(bigram_path, 0, 2)

# Cf. https://symspellpy.readthedocs.io/en/latest/examples/dictionary.html
# Cf. https://symspellpy.readthedocs.io/en/latest/api/symspellpy.html#symspellpy.symspellpy.SymSpell.load_bigram_dictionary

True

## Properly saving the word frequencies and bigrams as a pickle file
There is one problem with the `save_pickle_stream` method of SymSpellPy: it does not save the bigrams! (The [documentation](https://symspellpy.readthedocs.io/en/latest/api/symspellpy.html#symspellpy.symspellpy.SymSpell.save_pickle_stream) tells us so as well!) To overcome this problem we have to save and load the pickled dictionary using custom functions:

In [28]:
import pickle
from symspellpy.symspellpy import SymSpell, Verbosity

In [29]:
def my_save_pickle(sym_spell, pickled_counts):
    """ Custom saver for SymSpellPy dictionaries that are pre-pickled. 
        The built-in pickle function of SymSpellPy compresses the pickle
        stream, which slows down loading the pickle file. Therefore, this
        functions save an un-compressed stream. Since the built-in load_pickle
        function will not work with un-compressed streams, we need to build 
        custom loading function as well. """
    with open(pickled_counts, "wb") as f:
        pickle_data = {
                "deletes": sym_spell._deletes,
                "words": sym_spell._words,
                "max_length": sym_spell._max_length,
                "data_version": sym_spell.data_version,
                "bigrams": sym_spell._bigrams,
            }
        pickle.dump(pickle_data, f)
        print("Pickled", len(sym_spell._words), "words and", len(sym_spell.bigrams), "bigrams to", pickled_counts)

    return True

In [30]:
my_save_pickle(sym_spell, "symspell_dictionary_LA.pickle")

Pickled 313881 words and 4244833 bigrams to symspell_dictionary_LA.pickle


True

# Test the new SymSpell dictionary
Let's load the new frequency lists into a new SymSpell dictionary and give it a test!

In [31]:
# Load SymSpell dictionary from pickled file (takes some seconds!):
def load_my_pickle(dict_path):
        """ Custom loader for SymSpellPy dictionaries that are pre-pickled
            with my custom safer. The pickle stream is not compressed (reduces loading
            time) and the pickle also contains bigrams (not only unique words). """
        with open(dict_path, "rb") as f:
            pickle_data = pickle.load(f)
            return pickle_data

new_sym_spell = SymSpell(3, 7)
        
dict_path = "symspell_dictionary_LA.pickle"
pickled_data = load_my_pickle(dict_path)

new_sym_spell._deletes = pickled_data["deletes"]
new_sym_spell._words = pickled_data["words"]
new_sym_spell._bigrams = pickled_data['bigrams']
new_sym_spell._max_length = pickled_data['max_length']
new_sym_spell.data_version = pickled_data['data_version']

In [32]:
f"The new dictionary contains {len(new_sym_spell._words)} words and {len(new_sym_spell._bigrams)} bigrams"

'The new dictionary contains 313881 words and 4244833 bigrams'

Let's how the improved dictionary deals with words it had problems with at the beginning:

In [33]:
for word in ["cio", "cum", "dig", "iohele"]:
    try:
        print(word, new_sym_spell._words[word])
    except:
        print(word, "ERROR")

cio 10
cum 97511
dig ERROR
iohele 10


In [34]:
from pprint import pprint
for word in ["mar", "tio", "cio", "chriftus"]:
    print(word)
    pprint([str(s) for s in sym_spell.lookup(word, Verbosity.CLOSEST)])

mar
['mare, 1, 2636',
 'mari, 1, 1707',
 'par, 1, 1347',
 'mai, 1, 1188',
 'mars, 1, 165',
 'maro, 1, 47',
 'mas, 1, 41',
 'far, 1, 38',
 'lar, 1, 15',
 'car, 1, 15',
 'nar, 1, 12',
 'mara, 1, 4',
 'mary, 1, 1']
tio
['tuo, 1, 3323',
 'otio, 1, 419',
 'pio, 1, 270',
 'io, 1, 257',
 'tiro, 1, 110',
 'aio, 1, 105',
 'tito, 1, 75',
 'dio, 1, 60',
 'fio, 1, 23',
 'ito, 1, 16',
 'cio, 1, 10',
 'trio, 1, 4',
 'lio, 1, 4',
 'teo, 1, 4',
 'tior, 1, 3',
 'teio, 1, 3',
 'sio, 1, 3',
 'bio, 1, 3',
 'uio, 1, 3',
 'tino, 1, 2',
 'itio, 1, 2',
 'atio, 1, 2',
 'tibo, 1, 1',
 'timo, 1, 1']
cio
['cio, 0, 10']
chriftus
['christus, 1, 1693']
