This notebook collects the code I have written to convert the raw texts I have acquired into properly formatted .tess files.

Pending items to load:
- Augustine De Trinitate
- Aristotle Physics, Organon
- Gregory of Nyssa
- Gregory of Nazianzen
- Evagrius

# Maximus opera omnia (Word)

This version starts from a single Word document containing the whole opera omnia, and includes a clunky special case to ensure that the Ad Thalassium is formatted correctly. It is still fairly buggy: it relies on the Word styles to assign the tags, and the source file does not use them consistently. I have tried to clean up the file and make it consistent, but cannot guarantee that it is perfect yet. The .tess files currently included in the maximus-confessor package have been corrected by hand. They include all of Maximus' works except the Scholia on Pseudo-Dionysius (which are mostly by John of Scythopolis anyway, and are very complicated to format well). The only major outstanding problem is that the Opuscula have non-standard numbering.

Note that the final `elif` branch of the script was added to verify that no styles have been used that fall outside of the "if" statements: it prints the current title and the style name of every paragraph it cannot classify. Heading 5 is excluded from this report. It is used frequently, but only (or mostly) in headings added by the editor that I want to skip anyway.
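
A complementary check, run before any conversion, is to tally how often each style name occurs, so that stray styles stand out at a glance. The following is only a minimal sketch: `tally_styles` is a hypothetical helper, and the stand-in objects exist so that it runs without the Word file. It assumes only that each paragraph exposes `style.name`, as the script below already relies on.

```python
from collections import Counter
from types import SimpleNamespace

def tally_styles(paragraphs):
    """Count how many paragraphs use each style name."""
    return Counter(p.style.name for p in paragraphs)

# Stand-in objects (invented for illustration) so the sketch runs without a .docx file.
fake = [SimpleNamespace(style=SimpleNamespace(name=n))
        for n in ["Heading 3", "Normal", "Normal", "Heading 5"]]

# Print the styles from most to least frequent.
for name, count in tally_styles(fake).most_common():
    print(f"{count:5d}  {name}")
```

With python-docx, the same function should accept `document.paragraphs` directly in place of the stand-in list.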

In [2]:
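from docx import Document

folder = "/home/administrador/maximus-confessor/texts/maximus/"
document = Document(folder + "Agios Maximos.docx")
titles = []
# Scratch file: receives any text that precedes the first work title.
f = open('test.tess', 'w', encoding='utf-8')
is_thal = False
# Initialise the counters so that text before the first heading cannot raise a NameError.
title, part, line = '', 0, 0
for paragraph in document.paragraphs:
    style = paragraph.style.name
    if style == "Heading 3":
        # A new work begins: close the previous file and open one named after the title.
        f.close()
        f = open(folder + 'maximus_confessor.' + paragraph.text + '.tess', 'w', encoding='utf-8')
        titles.append(paragraph.text)
        title = paragraph.text[:2]  # default tag: first two letters of the title
        if paragraph.text == 'ΠΡΟΣ ΘΑΛΑΣΣΙΟΝ':
            is_thal = True
            title = 'ad. thal.'
        elif paragraph.text == 'ΠΕΡΙ ΔΙΑΦΟΡΩΝ ΑΠΟΡΙΩΝ ':
            is_thal = False
        part = 0
        line = 0
    elif style == "Heading 4":
        # A new part within the current work: restart the line count.
        part += 1
        line = 0
    elif style == "Normal":
        if len(paragraph.text) <= 1:
            continue  # skip empty paragraphs instead of writing an empty tagged line
        line += 1
        # In the Ad Thalassium, parts 1 and 2 are tagged 'prol' and 'epist';
        # the numbered parts start from the third.
        if is_thal and part == 1:
            part_name = 'prol'
        elif is_thal and part == 2:
            part_name = 'epist'
        elif is_thal and part > 2:
            part_name = part - 2
        else:
            part_name = part
        if part != 0:
            f.write('<' + title + ' ' + str(part_name) + '.' + str(line) + '> ' + paragraph.text + '\n')
        else:
            # Works without Heading 4 divisions are numbered by line only.
            f.write('<' + title + ' ' + str(line) + '> ' + paragraph.text + '\n')
    elif style != "Heading 5":
        # Report any style not handled above (Heading 5 is skipped on purpose).
        print(title)
        print(style)
f.close()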

ΠΕ
Body Text Indent 3
ΠΕ
Body Text Indent 3
ΣΧ
Heading 6
ΕΠ
Body Text Indent 3
ΕΠ
Body Text Indent 3
ΕΠ
Body Text Indent 3
ΚΕ
Body Text Indent 3
ΠΡ
Heading 2
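
Since the generated files still needed correction by hand, a quick format check may help locate problem lines. This is a rough sketch, not part of the package: `find_bad_lines` is a hypothetical helper, and the pattern assumes that every line should have the shape `<tag locus> text` that the script above writes. The sample lines are invented.

```python
import re

# Assumed line shape, taken from what the script above writes: "<tag locus> text".
TESS_LINE = re.compile(r"^<[^<>]+ [^<> ]+> \S.*$")

def find_bad_lines(lines):
    """Return (line_number, line) pairs that do not match the assumed shape."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Completely empty lines are ignored; everything else must match.
        if stripped and not TESS_LINE.match(stripped):
            bad.append((number, stripped))
    return bad

# Invented sample: one good line, one tag without text, one line without a tag.
sample = ["<ad. thal. prol.1> Τῷ ὁσιωτάτῳ\n", "<ΠΕ 1.2> \n", "no tag here\n"]
print(find_bad_lines(sample))
```

To check a real file, the lines could be read with `open(path, encoding="utf-8")` and passed to the same function.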


# Corpus Corporum

This script is designed to import the Septuagint, but can easily be modified to load other texts from Corpus Corporum.

In [41]:
import xml.etree.ElementTree as ET

folder = "/home/administrador/maximus-confessor/texts/grc/"

def convert_xml_to_tess(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    namespace = {"tei": "http://www.tei-c.org/ns/1.0"}

    tess_output = ""

    for div1 in root.findall(".//tei:body/tei:div1", namespace):
        div1_id = div1.get("id")
        for div2 in div1.findall(".//tei:div2", namespace):
            chap_number = div2.get("n")
            for p in div2.findall(".//tei:p", namespace):
                for milestone in p.findall(".//tei:milestone[@unit='verse']", namespace):
                    verse_number = milestone.get("n")
                    # The verse text is the milestone's tail, i.e. the text that follows the empty tag.
                    text = (milestone.tail or "").strip()
                    tess_line = f"<old_test. {div1_id}.{chap_number}.{verse_number}> {text}\n"
                    tess_output += tess_line

    return tess_output

if __name__ == "__main__":
    input_xml_file = folder+"LXX.xml"  # Replace with your input XML file
    output_tess_file = folder+"LXX.tess"  # Replace with your desired output file name

    tess_output = convert_xml_to_tess(input_xml_file)

    with open(output_tess_file, "w", encoding="utf-8") as output_file:
        output_file.write(tess_output)

    print(f"Conversion completed. Output saved to {output_tess_file}")


Conversion completed. Output saved to /home/administrador/maximus-confessor/texts/grc/LXX.tess
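
To adapt the script to another Corpus Corporum text, it may help to try the extraction on a small in-memory fragment first. The sketch below makes several assumptions: the sample XML is invented and merely mirrors the `div1`/`div2`/`milestone` layout that the script expects, and `verses_to_tess` with its `prefix` parameter is a hypothetical variant of `convert_xml_to_tess`, not the function used above.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample, shaped like the structure the script above expects.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div1 id="Gen"><div2 n="1"><p>
<milestone unit="verse" n="1"/>first verse text
<milestone unit="verse" n="2"/>second verse text
</p></div2></div1>
</body></text></TEI>"""

def verses_to_tess(xml_string, prefix="old_test."):
    """Return one .tess line per verse milestone (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    lines = []
    for div1 in root.findall(".//tei:body/tei:div1", TEI_NS):
        book = div1.get("id")
        for div2 in div1.findall(".//tei:div2", TEI_NS):
            chapter = div2.get("n")
            for ms in div2.findall(".//tei:milestone[@unit='verse']", TEI_NS):
                # The verse text follows the empty milestone tag, so it is the tail.
                text = (ms.tail or "").strip()
                lines.append(f"<{prefix} {book}.{chapter}.{ms.get('n')}> {text}")
    return lines

for tess_line in verses_to_tess(SAMPLE):
    print(tess_line)
```

Passing a different `prefix` is the only change needed for the tag. A text with different division elements would also need the `findall` paths adjusted.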


# Perseus

In [122]:
import requests
import re
import xml.etree.ElementTree as ET

folder = "/home/administrador/maximus-confessor/texts/grc/"

def download_xml(urn_id):
    url = f"https://scaife.perseus.org/library/{urn_id}/cts-api-xml/"
    response = requests.get(url, timeout=30)  # avoid hanging indefinitely on a stalled connection

    if response.status_code == 200:
        return response.text
    else:
        print(f"Failed to download XML. HTTP Status Code: {response.status_code}")
        return None

def clean_xml(xml_string):
    # Remove <lb/>, <pb/> and <milestone/> tags and <note>...</note> elements, then replace runs of six tabs with a space
    cleaned_xml = re.sub(r'<lb[^>]+/>|<pb[^>]+/>|<milestone[^>]+/>|<note[^>]+>[\s\S]*?</note>', '', xml_string)
    cleaned_xml = re.sub(r'\t\t\t\t\t\t', ' ', cleaned_xml)
    return cleaned_xml

def convert_perseus_xml_to_tess(perseus_xml,title_id='GRC'):
    root = ET.fromstring(perseus_xml)
    namespace = {"tei": "http://www.tei-c.org/ns/1.0"}

    tess_output = ""
    # Paragraphs are numbered continuously: the counter is not reset at each chapter.
    verse_number = 1

    for chapter in root.findall(".//tei:div[@subtype='chapter']",namespace):
        chap_number = chapter.get("n")
        for p in chapter.findall(".//tei:p",namespace):
            # p.text is None when the paragraph starts with a child element; guard against that.
            text = ''.join((p.text or '').strip().split('\n'))
            tess_line = f"<{title_id} {chap_number}.{verse_number}> {text}\n"
            tess_output += tess_line
            verse_number += 1

    return tess_output




In [123]:
if __name__ == "__main__":
    urn_id = "urn:cts:greekLit:tlg0086.tlg006.1st1K-grc1"
    #urn_id = "urn:cts:greekLit:tlg2034.tlg006.opp-grc1:1"  # Replace with the desired urn id
    perseus_xml = download_xml(urn_id)

    if perseus_xml is not None:
        cleaned_xml = clean_xml(perseus_xml)
        tess_output = convert_perseus_xml_to_tess(cleaned_xml,title_id='categories')

        output_tess_file = folder+"aristotle.categories.tess"
        #output_tess_file = folder+"porphyry.isagoge.tess"  # Replace with your desired output file name

        with open(output_tess_file, "w", encoding="utf-8") as output_file:
            output_file.write(tess_output)

        print(f"Conversion completed. Output saved to {output_tess_file}")

Conversion completed. Output saved to /home/administrador/maximus-confessor/texts/grc/aristotle.categories.tess
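
One caveat about `convert_perseus_xml_to_tess`: `p.text` holds only the text that precedes the first child element, so if any inline element survives `clean_xml`, the rest of the paragraph is silently dropped. The sketch below illustrates the difference on an invented fragment. `paragraph_text` is a hypothetical helper, and unlike the function above it normalises all whitespace to single spaces.

```python
import xml.etree.ElementTree as ET

# Invented fragment: an inline <q> element remains inside the paragraph.
p = ET.fromstring("<p>alpha <q>beta</q> gamma</p>")

def paragraph_text(elem):
    """Join all text inside elem, including text nested in child elements."""
    return " ".join("".join(elem.itertext()).split())

print(repr(p.text))        # only the text before the first child element
print(paragraph_text(p))   # the full paragraph text
```

Whether this matters depends on which inline elements the Perseus files actually contain, so it is worth checking a converted file against its source before relying on it.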
