CHANGES:
- NEW TEXT FILES:
    * renamed for easier reference; renaming removed the need for a test that looked for a way to find the doc_id from the filename
    * added to a project-internal folder, which is more end-user-friendly and allows for checking and troubleshooting this code
- UPDATED METADATA FILE:
    * All the relevant metadata now comes directly from the file, so there is no need to reintroduce static data with code on each run.
- MERGED SCRIPTS:
    * New approach to LAGT makes a lot of previous code obsolete / unnecessary. Preprocessing was reformulated into a test of text / metadata correspondence and merged into the same file for an easier total run
    * Combined cells that can run at once to reduce manual running
- ENHANCED HUMAN-READABILITY:
    * re.search is deliberately left "unsafe" (no guard for a missing match), so it acts as a notification if some filenames are not properly formatted. Renaming made further use of regex unnecessary and left the code cleaner.
    * Removed testing / legacy code, which is unnecessary in the final code; if needed, it can be accessed from earlier versions.
    * New markdown cells and comments
- FUTURE-PROOFING:
    * More print statements to monitor the workflow in case the manually added texts or metadata change

TO BE ADDED:
- total token counts; the earlier approach based on raw text won't work as the new texts contain section tags
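
One possible way to get raw-text token counts despite the section tags is to strip the tags before counting. The sketch below assumes the tags have the same `<...>` shape that the notebook splits on later, and that whitespace-separated tokens are an acceptable approximation. `count_raw_tokens` is a hypothetical helper, not part of the current code.

```python
import re

# Assumption: section tags look like <1>, <1.1>, etc. (same pattern the notebook splits on)
TAG = re.compile(r"<[^>]+>")


def count_raw_tokens(raw_text: str) -> int:
    """Count whitespace-separated tokens after removing <...> section tags."""
    without_tags = TAG.sub(" ", raw_text)
    return len(without_tags.split())


# Invented sample in the style of the raw text files
sample = "Μαρτύριον τοῦ ἁγίου ἀποστόλου Πέτρου. <1> Κυριακῆς οὔσης"
print(count_raw_tokens(sample))
```

Counting from the tokenized pickles instead would give counts that match the NLP model's tokenization rather than whitespace splitting.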

In [1]:
import os
import re
import shutil
import pickle
import unicodedata
import pandas as pd
from datetime import datetime
import spacy
from spacy.language import Language

In [2]:
# pip install ipywidgets

#!sudo pip install https://huggingface.co/Jacobo/grc_proiel_trf/resolve/main/grc_proiel_trf-3.7.5-py3-none-any.whl
# pip install https://huggingface.co/Jacobo/grc_proiel_trf/resolve/main/grc_proiel_trf-3.7.5-py3-none-any.whl # If you haven't enabled sudo for the IDE, this seems to work for .venv install

#If it still doesn't work (there were issues in the huggingface release), use GitHub instructions instead: https://github.com/jmyerston/greCy

# Tokenize & lemmatize EXPRECCE texts

## Test that the text files and metadata match (optional; also creates a dataframe with the full texts)

In [3]:
def load_metadata(path):
    metadata = pd.read_csv(path, sep=",")
    if not metadata.empty:
        print("Metadata dataframe created")
    else:
        print("Metadata dataframe is empty")
    return metadata


def load_texts(path):
    exprecce_texts = []
    succeeded = []
    failed = []

    for fn in os.listdir(path):
        if fn.endswith(".txt"):
            try:
                doc_id = re.search(r"(tlg\d{4}\.tlg\d{3})", fn).group(1)
                file_path = os.path.join(path, fn)  # join dir + filename
                with open(file_path, encoding="utf-8") as f:
                    string = f.read()
                exprecce_texts.append({"doc_id": doc_id, "string": string})
                succeeded.append(doc_id)
            except Exception as e:
                print(f"!!!!!! Error processing {fn}: {e}!!!!!!")
                failed.append(fn)  # record the failure so the summary below is accurate

    print(f"\nText dataframe created.\nSuccessfully processed {succeeded}.\n")
    if len(failed) == 0:
        print("All texts processed successfully.")
    else:
        print(f"Failed: {failed}")
    exprecce_texts_df = pd.DataFrame(exprecce_texts)
    return exprecce_texts_df


def merge_data(metadata_df, texts_df):
    """Merge metadata and texts, keep everything, create author_id only if missing"""
    # Outer merge to keep everything
    merged = pd.merge(metadata_df, texts_df, on="doc_id", how="outer")
    merged.sort_values("doc_id", inplace=True)

    # Notify missing text
    missing_texts = merged[merged['string'].isna()]['doc_id'].tolist()
    if missing_texts:
        print(
            f"Warning: {len(missing_texts)} metadata rows have no text: {missing_texts}. Add the missing text files to the folder.")
    else:
        print("All the metadata rows have a corresponding text")

    # Notify missing metadata
    missing_metadata = merged[
        merged['doc_id'].isin(texts_df['doc_id'])
        & merged['string'].notna()
        & merged['author_id'].isna()
    ]['doc_id'].tolist()
    if missing_metadata:
        print(
            f"Warning: {len(missing_metadata)} text files have no metadata: {missing_metadata}. Add metadata to the metadata file and run again.")
    else:
        print("All the texts have corresponding metadata rows.")

    # Fill author_id for rows with no metadata using doc_id
    merged['author_id'] = merged['author_id'].fillna(merged['doc_id'].str.split(".").str[0])

    print(f"Merged dataframe has {len(merged)} rows.")
    return merged

### MAIN EXECUTION FOR CHECKING

# Read EXPRECCE metadata from a manually created .csv file into a dataframe and raw text files into another dataframe
metadata_df = load_metadata("../data/sources/exprecce/exprecce_metadata_v2.csv")
texts_df = load_texts("../data/sources/exprecce/exprecce_raw_data_v2/")

# Merge the dataframes based on doc_id
exprecce_df = merge_data(metadata_df, texts_df)
# Show the result
exprecce_df

Metadata dataframe created

Text dataframe created.
Successfully processed ['tlg2062.tlg041', 'tlg0304.tlg001', 'tlg3150.tlg002', 'tlg2005.tlg001', 'tlg2062.tlg511', 'tlg2062.tlg043', 'tlg2015.tlg001', 'tlg2062.tlg050', 'tlg0390.tlg001', 'tlg2017.tlg065', 'tlg2012.tlg001', 'tlg0388.tlg004', 'tlg0391.tlg001', 'tlg0389.tlg001', 'tlg2040.tlg035', 'tlg0388.tlg002', 'tlg2011.tlg001', 'tlg2040.tlg034', 'tlg2010.tlg001', 'tlg2798.tlg005', 'tlg5451.tlg001', 'tlg2008.tlg001', 'tlg2007.tlg001', 'tlg0384.tlg002'].

All texts processed successfully.
All the metadata rows have a corresponding text
All the texts have corresponding metadata rows.
Merged dataframe has 24 rows.


Unnamed: 0,author_id,doc_id,grela_id,author,title,raw_date,not_before,not_after,lagt_genre,edition,edition_link,lagt_provenience,textsource,grela_source,string
0,tlg0304,tlg0304.tlg001,lagt_tlg0304.tlg001,,Acta et martyrium Apollonii,A.D. 2/4,101,400.0,Hagiogr.,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...,christian,exprecce,lagt,Μαρτύριον τοῦ ἁγίου καὶ πανευφήμου ἀποστόλου Ἀ...
1,tlg0384,tlg0384.tlg002,lagt_tlg0384.tlg002,,Acta Justini et septem sodalium (recensio B),A.D. 2/3,101,300.0,Hagiogr.,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...,christian,exprecce,lagt,"Μαρτύριον τῶν ἁγίων μαρτύρων Ἰουστίνου, Χαρίτω..."
2,tlg0388,tlg0388.tlg002,lagt_tlg0388.tlg002,,Martyrium Pauli,A.D. 2,101,200.0,"Apocryph., Hagiogr.","R.A. Lipsius & Bonnet, M., Acta apostolorum ap...",https://archive.org/details/ntapoc-acta-aposto...,christian,exprecce,lagt,Μαρτύριον τοῦ ἁγίου ἀποστόλου Παύλου. <1> Ἦσα...
3,tlg0388,tlg0388.tlg004,lagt_tlg0388.tlg004,,Acta Pauli et Theclae,A.D. 2,101,200.0,"Apocryph., Hagiogr.","R.A. Lipsius & Bonnet, M., Acta apostolorum ap...",https://archive.org/details/ntapoc-acta-aposto...,christian,exprecce,lagt,Πράξεις Παύλου καὶ Θέκλης. <1> Ἀναβαίνοντος Πα...
4,tlg0389,tlg0389.tlg001,lagt_tlg0389.tlg001,,Martyrium Petri,A.D. 2,101,200.0,"Apocryph., Hagiogr.","R.A. Lipsius & Bonnet, M., Acta apostolorum ap...",https://archive.org/details/ntapoc-acta-aposto...,christian,exprecce,lagt,Μαρτύριον τοῦ ἁγίου ἀποστόλου Πέτρου. <1> Κυρι...
5,tlg0390,tlg0390.tlg001,lagt_tlg0390.tlg001,,"Martyrium sanctorum Carpi, Papyli et Agathonicae",A.D. 2,101,200.0,Hagiogr.,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...,christian,exprecce,lagt,Μαρτύριον τῶν ἁγίων Κάρπου Παπύλου καὶ Ἀγαθονί...
6,tlg0391,tlg0391.tlg001,lagt_tlg0391.tlg001,,Acta Scillitanorum martyrum sive passio Sperat...,A.D. 2-3,101,300.0,Hagiogr.,"J. Armitage Robinson, The Passion of S. Perpet...",https://archive.org/details/MN5140ucmf_2/page/...,christian,exprecce,lagt,ΜΑΡΤΥΡΙΟΝ ΤΟΥ ΑΓΙΟΥ ΚΑΙ ΚΑΛΛΙΝΙΚΟΥ ΜΑΡΤΥΡΟΣ ΣΠ...
7,tlg2005,tlg2005.tlg001,lagt_tlg2005.tlg001,,Martyrium Pionii presbyteri et sodalium,A.D. 4,301,400.0,Hagiogr.,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...,christian,exprecce,lagt,Μαρτύριον τοῦ ἁγίου Πιονίου τοῦ πρεσβυτέρου κα...
8,tlg2007,tlg2007.tlg001,lagt_tlg2007.tlg001,,Martyrium Potamiaenae et Basilidis,post A.D. 3,301,,Hagiogr.,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...,christian,exprecce,lagt,"Ἕβδομος ἐν τούτοις ἀριθμείσθω Βασιλείδης, τὴν ..."
9,tlg2008,tlg2008.tlg001,lagt_tlg2008.tlg001,,Martyrium Cononis,post A.D. 4,401,,Hagiogr.,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...,christian,exprecce,lagt,<1.1> Πάλιν ὦ τῆς δυσσεβοῦς κρίσεως. Μετὰ τὸ τ...
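
The correspondence check could also be expressed with the `indicator` option of the pandas merge, which labels each row by the side it came from. This is a minimal sketch on invented toy data (the doc_ids and titles below are hypothetical), not a replacement for `merge_data`.

```python
import pandas as pd

# Hypothetical toy data, not the project's real metadata or texts
metadata_df = pd.DataFrame(
    {"doc_id": ["tlg0001.tlg001", "tlg0002.tlg001"], "title": ["A", "B"]}
)
texts_df = pd.DataFrame(
    {"doc_id": ["tlg0002.tlg001", "tlg0003.tlg001"], "string": ["β", "γ"]}
)

# indicator=True adds a "_merge" column: left_only, right_only, or both
merged = metadata_df.merge(texts_df, on="doc_id", how="outer", indicator=True)

# Metadata rows without a text file
missing_texts = merged.loc[merged["_merge"] == "left_only", "doc_id"].tolist()
# Text files without a metadata row
missing_metadata = merged.loc[merged["_merge"] == "right_only", "doc_id"].tolist()

print("No text:", missing_texts)
print("No metadata:", missing_metadata)
```

With this approach the check does not depend on which columns happen to be empty in the metadata file.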


## Process the text files into .pickles

In [4]:
### LOAD NLP MODEL AND DEFINE A CUSTOM SENTENCIZER
nlp = spacy.load('grc_proiel_trf')
nlp.disable_pipe("parser")

@Language.component("custom_sentencizer")
def custom_sentencizer(doc, max_words=11):
    strong_punct = {".", "!", "?", ";", "·"}
    for token in doc:
        token.is_sent_start = False
    start = 0
    for i, token in enumerate(doc):
        if token.text in strong_punct:
            if i + 1 < len(doc):
                doc[i + 1].is_sent_start = True
            start = i + 1
        elif token.text == "," and i + 1 < len(doc):
            nxt1 = doc[i + 1]
            nxt2 = doc[i + 2] if i + 2 < len(doc) else None
            trigger1 = nxt1.pos_ in {"CCONJ", "SCONJ", "ADP"}
            trigger2 = nxt2.pos_ in {"CCONJ"} if nxt2 else False
            if trigger1 or trigger2:
                current_len = i - start + 1
                if current_len >= max_words:
                    doc[i + 1].is_sent_start = True
                    start = i + 1
                    continue
                j = i + 1
                while j < len(doc) and not (
                        doc[j].text in strong_punct or
                        (doc[j].text == "," and j + 1 < len(doc) and doc[j + 1].pos_ in {"CCONJ", "SCONJ"})
                ):
                    j += 1
                if j - start >= max_words:
                    doc[i + 1].is_sent_start = True
                    start = i + 1
    return doc

nlp.add_pipe("custom_sentencizer", after="tagger")
print("NLP model loaded and custom sentencizer added")
spacy.__version__

NLP model loaded and custom sentencizer added


'3.8.7'
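
To make the splitting rule easier to follow without loading the model, here is a pure-Python restatement of the same logic over `(text, pos)` pairs. It is an illustrative sketch: `sentence_starts` and the toy tokens are hypothetical, and the POS tags are written by hand rather than produced by the tagger.

```python
STRONG_PUNCT = {".", "!", "?", ";", "·"}


def sentence_starts(tokens, max_words=11):
    """Return the indices at which a new sentence starts.

    tokens is a list of (text, pos) pairs; index 0 always starts a sentence.
    """
    n = len(tokens)
    starts = [0] if tokens else []
    start = 0  # index where the current sentence began
    for i, (text, _pos) in enumerate(tokens):
        if text in STRONG_PUNCT:
            # Strong punctuation always closes the sentence
            if i + 1 < n:
                starts.append(i + 1)
            start = i + 1
        elif text == "," and i + 1 < n:
            next_pos = tokens[i + 1][1]
            next2_pos = tokens[i + 2][1] if i + 2 < n else None
            if next_pos in {"CCONJ", "SCONJ", "ADP"} or next2_pos == "CCONJ":
                # Split if the sentence so far is already long enough
                if i - start + 1 >= max_words:
                    starts.append(i + 1)
                    start = i + 1
                    continue
                # Otherwise look ahead to the next candidate boundary
                j = i + 1
                while j < n and not (
                    tokens[j][0] in STRONG_PUNCT
                    or (tokens[j][0] == "," and j + 1 < n
                        and tokens[j + 1][1] in {"CCONJ", "SCONJ"})
                ):
                    j += 1
                # Split if the stretch up to that boundary would be too long
                if j - start >= max_words:
                    starts.append(i + 1)
                    start = i + 1
    return starts


# Hand-tagged toy input (hypothetical)
toy = [("a", "NOUN"), ("b", "VERB"), (",", "PUNCT"), ("καί", "CCONJ"),
       ("c", "NOUN"), ("d", "VERB"), (".", "PUNCT"), ("e", "NOUN")]
print(sentence_starts(toy, max_words=4))
print(sentence_starts(toy))
```

A lower `max_words` makes the comma rule fire more often, which is why the toy example only splits at the comma when the limit is reduced.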

In [5]:
### DEFINITIONS & FUNCTIONS

_PUNCT = ".,;:!?··’'" # Punctuation to preserve
_ALLOWED_BLOCKS = r"\u0370-\u03FF\u1F00-\u1FFF\u0300-\u036F" # Greek & Coptic + Greek Extended + Combining Diacritics
_EXTRA_GREEK = r"\u1FBD\u0374\u0375" # Optional extra Greek marks (safe to keep)
_CTRL_FMT = re.compile(r"[\u0000-\u001F\u007F-\u009F\u200B\u200C\u200D\u2060\uFEFF]") # Remove controls/format chars (keeps normal space)
_REMOVE_DISALLOWED = re.compile(rf"[^{_ALLOWED_BLOCKS}{_EXTRA_GREEK} {re.escape(_PUNCT)}]") # Allow-list removal: keep Greek blocks + extra marks + ASCII space + chosen punctuation
_COLLAPSE_SPACES = re.compile(r" +") # Collapse runs of spaces
_WORDCHAR = r"A-Za-z0-9\u0370-\u03FF\u1F00-\u1FFF" # Treat "word char" here as: ASCII letters/digits + Greek ranges
_FINAL_POS = re.compile(rf"ϲ(?=[^{_WORDCHAR}]|$)") # Use the word-final class for deciding final sigma after lunate sigma


def grave_to_acute(string: str) -> str:
    grave = "\u0300"
    acute = "\u0301"
    nfd = unicodedata.normalize("NFD", string or "")
    return unicodedata.normalize("NFC", nfd.replace(grave, acute))


def clean_string(string: str) -> str:
    s = unicodedata.normalize("NFC", string) # Normalize to NFC
    s = grave_to_acute(s) # Map grave → acute (polytonic consistency)
    s = _FINAL_POS.sub("ς", s)  # Normalize lunate sigma ϲ: final ϲ → ς
    s = s.replace("ϲ", "σ") # Remaining ϲ → σ
    s = s.replace(", —", ", ").replace(", –", ", ").replace(". —", ". ").replace(". –", ". ") # Drop dashes that follow a comma or period (dashes are not in the punctuation allow-list)
    s = s.replace("—", " ").replace("–", " ") # Replace remaining dashes with a space; some texts use them like commas
    s = s.replace("...", ".") # Make starts of lacunae etc. into sentence ends
    s = s.replace(".,", ",") # Correct artefacts from the previous step
    s = _CTRL_FMT.sub("", s)  # Drop control/format codepoints, keep ordinary space
    s = _REMOVE_DISALLOWED.sub(" ", s) # Allow-list: keep only Greek blocks + combining + extra marks + chosen punctuation + space
    s = _COLLAPSE_SPACES.sub(" ", s).strip() # Collapse multiple spaces
    s = re.sub(r"\s+([.,;:!?··])", r"\1", s) # Remove possible spaces before punctuation
    s = s.replace("·", ".").replace("·", ".") # Normalize Greek middle dot / ano teleia to period
    return s


def split_text_by_divisions(raw_text):
    matches = list(re.finditer(r"<([^>]+)>", raw_text))
    chunks = []
    last_pos = 0
    last_div = 0  # text before first tag
    for m in matches:
        text_chunk = raw_text[last_pos:m.start()]
        chunks.append((last_div, text_chunk))
        last_div = m.group(1)
        last_pos = m.end()

    # remaining text after last tag
    chunks.append((last_div, raw_text[last_pos:]))
    return chunks


# Function to process a single text and write pickle
def data_to_pickles(doc_id, raw_text, target_path):
    all_sentences = []
    sent_counter = 0  # global sentence index

    # Split into division chunks
    chunks = split_text_by_divisions(raw_text)
    for division_number, chunk_text in chunks:
        cleaned_chunk = clean_string(chunk_text)
        if not cleaned_chunk.strip():
            continue
        doc = nlp(cleaned_chunk)
        for sent in doc.sents:
            token_data = [
                (t.text, t.lemma_.lower(), t.pos_, {"div": division_number}, t.idx - sent[0].idx, t.idx - sent[0].idx + len(t))
                for t in sent
            ]
            all_sentences.append((doc_id, sent_counter, sent.text, token_data))
            sent_counter += 1

    # Save pickle
    with open(os.path.join(target_path, doc_id + ".pickle"), "wb") as f:
        pickle.dump(all_sentences, f)

def process_txt_files(target_path):
    # Process files directly from .txt
    raw_path = "../data/sources/exprecce/exprecce_raw_data_v2/"
    os.makedirs(target_path, exist_ok=True)

    succeeded = []
    failed = []

    for fn in os.listdir(raw_path):
        if not fn.endswith(".txt"):
            continue
        try:
            doc_id = re.search(r"(tlg\d{4}\.tlg\d{3})", fn).group(1)
            with open(os.path.join(raw_path, fn), encoding="utf-8") as f:
                raw_text = f.read()
            data_to_pickles(doc_id, raw_text, target_path)
            succeeded.append(doc_id)
        except Exception as e:
            print(f"Error processing {fn}: {e}")
            failed.append(fn)

    print(f"Processed successfully into pickles: {succeeded}")
    if failed:
        print(f"Failed: {failed}")
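
To see what the division splitting returns, the sketch below runs a standalone copy of `split_text_by_divisions` on an invented sample string. Text before the first tag gets the integer `0`, while tagged sections keep their label as a string.

```python
import re


def split_text_by_divisions(raw_text):
    """Split raw text into (division_label, text) chunks at <...> tags."""
    chunks = []
    last_pos = 0
    last_div = 0  # text before the first tag
    for m in re.finditer(r"<([^>]+)>", raw_text):
        chunks.append((last_div, raw_text[last_pos:m.start()]))
        last_div = m.group(1)
        last_pos = m.end()
    chunks.append((last_div, raw_text[last_pos:]))  # text after the last tag
    return chunks


# Invented sample: a title followed by two tagged sections
sample = "Τίτλος. <1> Πρῶτον. <1.1> Δεύτερον."
chunks = split_text_by_divisions(sample)
for division, text in chunks:
    print(repr(division), repr(text))
```

Because the labels mix `0` (int) with `'1'`, `'1.1'` (str), downstream code that sorts or compares divisions may want to convert them with `str()` first.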

In [6]:
### MAIN EXECUTION
target_path = "../data/output/exprecce/pickles"
process_txt_files(target_path)

Processed successfully into pickles: ['tlg2062.tlg041', 'tlg0304.tlg001', 'tlg3150.tlg002', 'tlg2005.tlg001', 'tlg2062.tlg511', 'tlg2062.tlg043', 'tlg2015.tlg001', 'tlg2062.tlg050', 'tlg0390.tlg001', 'tlg2017.tlg065', 'tlg2012.tlg001', 'tlg0388.tlg004', 'tlg0391.tlg001', 'tlg0389.tlg001', 'tlg2040.tlg035', 'tlg0388.tlg002', 'tlg2011.tlg001', 'tlg2040.tlg034', 'tlg2010.tlg001', 'tlg2798.tlg005', 'tlg5451.tlg001', 'tlg2008.tlg001', 'tlg2007.tlg001', 'tlg0384.tlg002']
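
Two of the normalization steps applied before tokenization can be checked in isolation. The sketch below reuses the notebook's approach for grave accents and lunate sigma; `normalize_sigma` is a hypothetical wrapper name, and the sample words are invented.

```python
import re
import unicodedata

LUNATE = "\u03f2"  # ϲ GREEK LUNATE SIGMA SYMBOL
WORDCHAR = r"A-Za-z0-9\u0370-\u03FF\u1F00-\u1FFF"
# A lunate sigma is word-final if followed by a non-word character or end of string
FINAL_LUNATE = re.compile(rf"{LUNATE}(?=[^{WORDCHAR}]|$)")


def grave_to_acute(string: str) -> str:
    """Replace combining grave accents with acute accents."""
    nfd = unicodedata.normalize("NFD", string or "")
    return unicodedata.normalize("NFC", nfd.replace("\u0300", "\u0301"))


def normalize_sigma(string: str) -> str:
    """Map word-final lunate sigma to ς and every other lunate sigma to σ."""
    string = FINAL_LUNATE.sub("\u03c2", string)  # final position: ς
    return string.replace(LUNATE, "\u03c3")      # elsewhere: σ


# Invented samples (unaccented for the sigma case to keep the focus on sigma)
sigma_sample = f"λογο{LUNATE} {LUNATE}οφο{LUNATE}."
print(normalize_sigma(sigma_sample))
print(grave_to_acute("κα\u1f76"))  # iota with grave
```

The final-sigma substitution has to run before the general replacement, otherwise every lunate sigma would already have become σ.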


In [11]:
# Source and destination directories
src_dir = "../data/output/exprecce/pickles/"
dst_dir = "/srv/data/greek/exprecce_sentences_2025-10/"

# Create destination directory if it doesn't exist
os.makedirs(dst_dir, exist_ok=True)

# Copy all contents (files + subfolders)
for item in os.listdir(src_dir):
    s = os.path.join(src_dir, item)
    d = os.path.join(dst_dir, item)
    if os.path.isdir(s):
        shutil.copytree(s, d, dirs_exist_ok=True)
    else:
        shutil.copy2(s, d)

print(os.listdir(dst_dir))
print("✅ All files copied successfully!")

['tlg0389.tlg001.pickle', 'tlg2062.tlg050.pickle', 'tlg2017.tlg065.pickle', 'tlg2012.tlg001.pickle', 'tlg0304.tlg001.pickle', 'tlg2798.tlg005.pickle', 'tlg0388.tlg002.pickle', 'tlg2062.tlg043.pickle', 'tlg2010.tlg001.pickle', 'tlg2008.tlg001.pickle', 'tlg2062.tlg041.pickle', 'tlg0391.tlg001.pickle', 'tlg2007.tlg001.pickle', 'tlg0384.tlg002.pickle', 'tlg2011.tlg001.pickle', 'tlg2005.tlg001.pickle', 'tlg2062.tlg511.pickle', 'tlg2040.tlg035.pickle', 'tlg2040.tlg034.pickle', 'tlg5451.tlg001.pickle', 'tlg3150.tlg002.pickle', 'tlg2015.tlg001.pickle', 'tlg0388.tlg004.pickle', 'tlg0390.tlg001.pickle']
✅ All files copied successfully!
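
Since `shutil.copytree` accepts `dirs_exist_ok=True` (Python 3.8+), the copy loop could probably be reduced to a single call. The sketch below demonstrates this on throwaway temporary folders; `mirror_directory` and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def mirror_directory(src_dir, dst_dir):
    """Copy all files and subfolders from src_dir into dst_dir."""
    # Creates dst_dir if needed and overwrites files that already exist there
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
    return sorted(os.listdir(dst_dir))


# Demo on temporary folders so nothing in the project is touched
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "pickles")
    dst = os.path.join(tmp, "published")
    os.makedirs(os.path.join(src, "nested"))
    with open(os.path.join(src, "example.pickle"), "wb") as f:
        f.write(b"placeholder")
    copied = mirror_directory(src, dst)
    print(copied)
```

Like the loop above, this does not delete files that exist only in the destination.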


In [15]:
with open(os.path.join(dst_dir, "tlg0389.tlg001.pickle"), "rb") as f:
    sent_data = pickle.load(f)
sent_data[:10]

[('tlg0389.tlg001',
  0,
  'Μαρτύριον τοῦ ἁγίου ἀποστόλου Πέτρου.',
  [('Μαρτύριον', 'μαρτύριος', 'NOUN', {'div': 0}, 0, 9),
   ('τοῦ', 'ὁ', 'DET', {'div': 0}, 10, 13),
   ('ἁγίου', 'ἅγιος', 'ADJ', {'div': 0}, 14, 19),
   ('ἀποστόλου', 'ἀποστόλου', 'NOUN', {'div': 0}, 20, 29),
   ('Πέτρου', 'πέτρος', 'PROPN', {'div': 0}, 30, 36),
   ('.', '.', 'PUNCT', {'div': 0}, 36, 37)]),
 ('tlg0389.tlg001',
  1,
  'Κυριακῆς οὔσης, ὁμιλοῦντος τοῦ Πέτρου τοῖς ἀδελφοῖς,',
  [('Κυριακῆς', 'κυριακός', 'ADJ', {'div': '1'}, 0, 8),
   ('οὔσης', 'εἰμί', 'AUX', {'div': '1'}, 9, 14),
   (',', ',', 'PUNCT', {'div': '1'}, 14, 15),
   ('ὁμιλοῦντος', 'ὁμιλοῦντος', 'VERB', {'div': '1'}, 16, 26),
   ('τοῦ', 'ὁ', 'DET', {'div': '1'}, 27, 30),
   ('Πέτρου', 'πέτρος', 'PROPN', {'div': '1'}, 31, 37),
   ('τοῖς', 'ὁ', 'DET', {'div': '1'}, 38, 42),
   ('ἀδελφοῖς', 'ἀδελφός', 'NOUN', {'div': '1'}, 43, 51),
   (',', ',', 'PUNCT', {'div': '1'}, 51, 52)]),
 ('tlg0389.tlg001',
  2,
  'καί προτρέποντος εἰς τήν τοῦ Χριστοῦ πίστ
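
Each pickled sentence is a tuple `(doc_id, sentence_index, sentence_text, token_data)`, and the last two fields of every token are character offsets relative to the start of the sentence. The sketch below checks that reading against the first sentence shown above; `offsets_match` and `divisions` are hypothetical helper names.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so each accented letter is a single code point."""
    return unicodedata.normalize("NFC", text)


# First sentence from the output above, re-typed by hand
sentence = (
    "tlg0389.tlg001",
    0,
    nfc("Μαρτύριον τοῦ ἁγίου ἀποστόλου Πέτρου."),
    [
        (nfc("Μαρτύριον"), "μαρτύριος", "NOUN", {"div": 0}, 0, 9),
        (nfc("τοῦ"), "ὁ", "DET", {"div": 0}, 10, 13),
        (nfc("ἁγίου"), "ἅγιος", "ADJ", {"div": 0}, 14, 19),
        (nfc("ἀποστόλου"), "ἀποστόλου", "NOUN", {"div": 0}, 20, 29),
        (nfc("Πέτρου"), "πέτρος", "PROPN", {"div": 0}, 30, 36),
        (".", ".", "PUNCT", {"div": 0}, 36, 37),
    ],
)


def offsets_match(sentence_tuple):
    """True if slicing the sentence text by each token's offsets gives the token."""
    _doc_id, _sent_id, sent_text, token_data = sentence_tuple
    return all(
        sent_text[start:end] == text
        for text, _lemma, _pos, _meta, start, end in token_data
    )


def divisions(sentence_tuple):
    """Sorted division labels (as strings) that occur in the sentence."""
    return sorted({str(token[3]["div"]) for token in sentence_tuple[3]})


print(offsets_match(sentence))
print(divisions(sentence))
```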

### EXTRA: Dataframe with metadata from the pickles

In [12]:
# Build final sentence dataframe from pickles & merge metadata
def build_df_from_pickles(source_path, target_path):
    records = []
    for fname in sorted(os.listdir(source_path)):
        if not fname.endswith(".pickle"):
            continue
        with open(os.path.join(source_path, fname), "rb") as f:
            sent_data = pickle.load(f)
        for doc_id, sent_id, sent_text, token_data in sent_data:
            records.append({
                "doc_id": doc_id,
                "sentence_id": f"lagt_{doc_id}_{sent_id}",
                "position": sent_id,
                "text": sent_text,
                "tokens": token_data,
                "sentence_token_count": len(token_data),
            })

    # Load metadata and make a dataframe from the pickle info
    metadata_df = pd.read_csv("../data/sources/exprecce/exprecce_metadata_v2.csv", sep=",")
    if metadata_df.empty:
        raise ValueError("Metadata dataframe is empty")
    print(f"Metadata loaded: {len(metadata_df)} rows")
    df_sentences = pd.DataFrame(records)

    # Merge and reorder
    metadata_cols = [c for c in metadata_df.columns if c != "string"]
    df_merged = df_sentences.merge(metadata_df[metadata_cols], on="doc_id", how="left")
    newcolumns = ["author_id", "doc_id", "grela_id", "sentence_id", "position", "text", "tokens", "sentence_token_count", "author", "title", "raw_date", "not_before", "not_after", "lagt_genre", "lagt_provenience", "grela_source", "textsource", "edition", "edition_link"]
    df_merged = df_merged[newcolumns]

    # Save final CSV
    date = datetime.now().strftime("%Y%m%d_%H%M")
    df_sentences.to_csv(os.path.join(target_path, f"exprecce_sentences_{date}.csv"), index=True)
    df_merged.to_csv(os.path.join(target_path, f"exprecce_complete_{date}.csv"), index=True)
    return df_merged

In [13]:
################
# MAIN EXECUTION
################
source_path = "../data/output/exprecce/pickles"
target_path = "../data/output/exprecce/"
os.makedirs(target_path, exist_ok=True)
df_merged = build_df_from_pickles(source_path, target_path)

# Preview
print(f"Total sentences processed: {len(df_merged)}")
df_merged


Metadata loaded: 24 rows
Total sentences processed: 5002


Unnamed: 0,author_id,doc_id,grela_id,sentence_id,position,text,tokens,sentence_token_count,author,title,raw_date,not_before,not_after,lagt_genre,lagt_provenience,grela_source,textsource,edition,edition_link
0,tlg0304,tlg0304.tlg001,lagt_tlg0304.tlg001,lagt_tlg0304.tlg001_0,0,Μαρτύριον τοῦ ἁγίου καί πανευφήμου ἀποστόλου Ἀ...,"[(Μαρτύριον, μαρτύριος, NOUN, {'div': 0}, 0, 9...",12,,Acta et martyrium Apollonii,A.D. 2/4,101,400.0,Hagiogr.,christian,lagt,exprecce,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...
1,tlg0304,tlg0304.tlg001,lagt_tlg0304.tlg001,lagt_tlg0304.tlg001_1,1,Ἐπί Κομόδου βασιλέως γεναμένου διωγμοῦ κατά τῶ...,"[(Ἐπί, ἐπί, ADP, {'div': 0}, 0, 3), (Κομόδου, ...",16,,Acta et martyrium Apollonii,A.D. 2/4,101,400.0,Hagiogr.,christian,lagt,exprecce,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...
2,tlg0304,tlg0304.tlg001,lagt_tlg0304.tlg001,lagt_tlg0304.tlg001_2,2,"Ἀπολλώς δέ ὁ ἀπόστολος, ἀνήρ ὤν εὐλαβής, Ἀλεξα...","[(Ἀπολλώς, ἀπολλώς, PROPN, {'div': 0}, 0, 7), ...",20,,Acta et martyrium Apollonii,A.D. 2/4,101,400.0,Hagiogr.,christian,lagt,exprecce,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...
3,tlg0304,tlg0304.tlg001,lagt_tlg0304.tlg001,lagt_tlg0304.tlg001_3,3,"Οὗ προσαχθέντος, Περέννιος ὁ ἀνθύπατος εἶπεν.","[(Οὗ, ὅς, PRON, {'div': '1'}, 0, 2), (προσαχθέ...",8,,Acta et martyrium Apollonii,A.D. 2/4,101,400.0,Hagiogr.,christian,lagt,exprecce,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...
4,tlg0304,tlg0304.tlg001,lagt_tlg0304.tlg001,lagt_tlg0304.tlg001_4,4,"Ἀπολλώ, Χριστιανός εἶ;","[(Ἀπολλώ, ἀπολλώ, PROPN, {'div': '1'}, 0, 6), ...",5,,Acta et martyrium Apollonii,A.D. 2/4,101,400.0,Hagiogr.,christian,lagt,exprecce,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4997,tlg5451,tlg5451.tlg001,lagt_tlg5451.tlg001,lagt_tlg5451.tlg001_133,133,τῷ δέ δυναμένῳ πάντας ἡμᾶς εἰσαγαγεῖν τῇ ἑαυτο...,"[(τῷ, ὁ, DET, {'div': '8.4'}, 0, 2), (δέ, δέ, ...",16,,Passio sancti Sabae Gothi (sub auctore Athanar...,A.D. 4,301,400.0,Hagiogr.,christian,lagt,exprecce,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...
4998,tlg5451,tlg5451.tlg001,lagt_tlg5451.tlg001,lagt_tlg5451.tlg001_134,134,"δόξα,","[(δόξα, δόξα, NOUN, {'div': '8.4'}, 0, 4), (,,...",2,,Passio sancti Sabae Gothi (sub auctore Athanar...,A.D. 4,301,400.0,Hagiogr.,christian,lagt,exprecce,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...
4999,tlg5451,tlg5451.tlg001,lagt_tlg5451.tlg001,lagt_tlg5451.tlg001_135,135,"τιμή,","[(τιμή, τιμή, NOUN, {'div': '8.4'}, 0, 4), (,,...",2,,Passio sancti Sabae Gothi (sub auctore Athanar...,A.D. 4,301,400.0,Hagiogr.,christian,lagt,exprecce,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...
5000,tlg5451,tlg5451.tlg001,lagt_tlg5451.tlg001,lagt_tlg5451.tlg001_136,136,"κράτος, μεγαλωσύνη,","[(κράτος, κράτος, NOUN, {'div': '8.4'}, 0, 6),...",4,,Passio sancti Sabae Gothi (sub auctore Athanar...,A.D. 4,301,400.0,Hagiogr.,christian,lagt,exprecce,"Rudolf Knopf, Ausgewählte Märtyrerakten, 3. ne...",https://archive.org/details/ausgewahltemarty00...
