Greek Works and Books
This page is the reference for loading real Greek texts into pyaegean: the
classical literary corpus (Homer, Plato, Herodotus…) and the Greek New Testament.
You'd come here when you want to stop typing Greek by hand and instead pull a whole
work — or a single book, chapter, or line-range — straight into a Corpus you can
tokenize, scan, tag, and count.
Two doors lead in:
- greek.load_work("tlg0012.tlg001"): any work in the Perseus canonical-greekLit / First1KGreek collections, addressed by a CTS id (the tlgAAAA.tlgBBB scheme explained below).
- greek.load_nt("John"): the Greek New Testament (Nestle 1904), with a gold lemma, morphology, Strong's number, and a gloss already attached to every word.
Both fetch their text once to a local cache and then work offline, and both return
the same standard Corpus object you get everywhere else in pyaegean. Once a work
is loaded, everything on Greek NLP applies to it.
A work id works everywhere now. As of 0.8.2 you can hand a Greek work id straight to almost any aegean command, no Python required. aegean stats tlg0012.tlg001, aegean export tlg0012.tlg002 -f csv -o odyssey.csv, and aegean db build tlg0012.tlg001 -o iliad.db all resolve the id through load_work for you. Anywhere a command takes a CORPUS argument it now accepts a registered id (lineara, nt, …), a Greek work id (tlg0012.tlg001), a path to a saved .json/.db corpus, or - for JSON on stdin. §5 below puts that to work.
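The four accepted forms of a CORPUS argument can be told apart by shape alone. The sketch below only illustrates that idea: classify_corpus_arg is a hypothetical helper, not a pyaegean function, and both the order of the checks and the id pattern are assumptions.

```python
import re

# Assumed shape of a Greek work id: two dotted halves such as tlg0012.tlg001.
WORK_ID = re.compile(r"^[a-z]+\d+[a-z]?\.[a-z]+\d+[a-z]?$")


def classify_corpus_arg(arg: str) -> str:
    """Guess which kind of CORPUS argument a string is (hypothetical helper)."""
    if arg == "-":
        return "stdin"            # JSON piped in on standard input
    if arg.lower().endswith((".json", ".db")):
        return "saved-corpus"     # a corpus file written earlier
    if WORK_ID.match(arg):
        return "work-id"          # would be handed to load_work
    return "registered-id"        # e.g. lineara, nt


for arg in ("lineara", "tlg0012.tlg001", "iliad.db", "-"):
    print(f"{arg!r:18} -> {classify_corpus_arg(arg)}")
```

The file-extension test runs before the id test so that a saved file named after a work id is still treated as a file.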
New to all this? Start with Getting Started, then the Tutorial — it walks through loading the Iliad end to end. For the command-line forms, see CLI; for the licences and cache details, see Data & Provenance.
Every classical Greek work has a stable catalogue address. pyaegean uses the same one the scholarly world uses — the CTS ("Canonical Text Services") id — so the id you find in a citation or in the Scaife Viewer is exactly the id you paste here.
A work id has two halves joined by a dot:
tlg0012 . tlg001
└──┬──┘   └─┬──┘
author    work
group     within
          that author
- The first half (tlg0012) names the author / text group: Homer.
- The second half (tlg001) names one work by that author: tlg001 is the Iliad, tlg002 is the Odyssey.
So tlg0012.tlg001 is read as "Homer, work 1 = the Iliad." The numbers are
arbitrary catalogue numbers, not anything you'd guess — you look them up. The
tlg prefix comes from the Thesaurus Linguae Graecae, whose numbering this
scheme inherits. (A few First1KGreek works use a stoa prefix instead of tlg;
the dotted shape is the same.)
You never have to memorise these. The next two sections show how to list the common ones (built in) and find any other (one website).
The dot matters. load_work splits on it: the part before the dot is the author directory, the part after is the work file. Pass something without a dot and you get a clear error telling you the expected shape (tlgGROUP.tlgWORK).
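As a rough illustration of that split, here is a minimal sketch. split_work_id is a hypothetical helper, and the ValueError it raises is an assumption; the exception pyaegean actually raises may differ.

```python
def split_work_id(work_id: str) -> tuple[str, str]:
    """Split 'tlg0012.tlg001' into ('tlg0012', 'tlg001') (hypothetical helper)."""
    group, dot, work = work_id.partition(".")
    if not dot or not group or not work:
        # No dot, or an empty half: report the expected shape.
        raise ValueError(
            f"expected an id shaped like tlgGROUP.tlgWORK, got {work_id!r}"
        )
    return group, work


print(split_work_id("tlg0012.tlg001"))  # ('tlg0012', 'tlg001')
```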
pyaegean ships a small, hand-verified catalogue of well-known works so you can discover ids without leaving Python. Every id below was confirmed to resolve against the live source. It's a starting point, not the whole canon (see §3 for everything else).
from aegean import greek
works = greek.popular_works()
len(works) # 25
works[0] # {'id': 'tlg0012.tlg001', 'author': 'Homer', 'title': 'Iliad'}
popular_works() is pure metadata with no download, so it works offline and is instant. Each entry is a plain dict with id, author, and title.
aegean greek works prints the same catalogue as a table and a copy-paste hint:
Popular Greek works
┌────────────────┬──────────────┬──────────────────────────────────┐
│ id             │ author       │ title                            │
├────────────────┼──────────────┼──────────────────────────────────┤
│ tlg0012.tlg001 │ Homer        │ Iliad                            │
│ tlg0012.tlg002 │ Homer        │ Odyssey                          │
│ …              │ …            │ …                                │
└────────────────┴──────────────┴──────────────────────────────────┘
Load one with, e.g.: aegean greek work tlg0012.tlg001 --ref 1.1-1.10
This is a curated subset — the full canon is at https://scaife.perseus.org
Add --json for machine-readable output: aegean greek works --json.
This is the complete list, pulled from the live popular_works() function:
| id | author | title |
|---|---|---|
| tlg0012.tlg001 | Homer | Iliad |
| tlg0012.tlg002 | Homer | Odyssey |
| tlg0020.tlg001 | Hesiod | Theogony |
| tlg0020.tlg002 | Hesiod | Works and Days |
| tlg0085.tlg004 | Aeschylus | Seven Against Thebes |
| tlg0085.tlg005 | Aeschylus | Agamemnon |
| tlg0085.tlg006 | Aeschylus | Libation Bearers |
| tlg0011.tlg001 | Sophocles | Trachiniae |
| tlg0011.tlg002 | Sophocles | Antigone |
| tlg0011.tlg003 | Sophocles | Ajax |
| tlg0011.tlg004 | Sophocles | Oedipus Tyrannus |
| tlg0006.tlg001 | Euripides | Cyclops |
| tlg0006.tlg002 | Euripides | Alcestis |
| tlg0006.tlg003 | Euripides | Medea |
| tlg0019.tlg002 | Aristophanes | Knights |
| tlg0019.tlg003 | Aristophanes | Clouds |
| tlg0016.tlg001 | Herodotus | Histories |
| tlg0003.tlg001 | Thucydides | History of the Peloponnesian War |
| tlg0032.tlg002 | Xenophon | Memorabilia |
| tlg0032.tlg006 | Xenophon | Anabasis |
| tlg0059.tlg002 | Plato | Apology |
| tlg0059.tlg003 | Plato | Crito |
| tlg0059.tlg004 | Plato | Phaedo |
| tlg0059.tlg030 | Plato | Republic |
| tlg0086.tlg010 | Aristotle | Nicomachean Ethics |
Notice the pattern: works by one author share the first half of the id. All four
Sophocles plays are tlg0011.tlg00x; all four Plato dialogues here are
tlg0059.tlg0xx. That's the tlgAAAA group at work.
This is the curated short list. To search the full 1,778-work discovery index
in-package, use catalog() / aegean greek catalog — see
§3.
load_work accepts any Perseus canonical-greekLit / First1KGreek CTS id, not
just the 25 above. There are two ways to find an id — a built-in search, and the
official web browser.
pyaegean bundles a complete discovery index of every work that has a Greek
(-grc) edition in those two collections — 1,778 works (768 from Perseus
canonical-greekLit, 1,010 from First1KGreek), far more than the 25 highlights in
popular_works(). It is pure bundled metadata (id, author, English title, Greek
title, source), so searching it needs no network and is instant. Anything it
lists, load_work can fetch.
catalog(query=None, *, author=None, title=None, source=None) returns a list of
dicts, each with id, author, title, greek_title, and source. The filters
combine (all must match); query is free-text across id, author, and either title.
from aegean import greek
len(greek.catalog()) # 1778 (768 perseus + 1010 first1k)
len(greek.catalog(author="plato")) # 39 — every Plato work in the open repos
greek.catalog(title="Ἀντιγόνη") # search by Greek title too
# → [{'id': 'tlg0011.tlg002', 'author': 'Sophocles', 'title': 'Antigone',
# 'greek_title': 'Ἀντιγόνη', 'source': 'perseus'}]
len(greek.catalog("herodotus")) # 2 — free-text across id/author/title
len(greek.catalog(source="first1k")) # 1010 — limit to one collection
greek.catalog()[0]
# → {'id': 'tlg0001.tlg001', 'author': 'Apollonius Rhodius',
# 'title': 'Argonautica', 'greek_title': 'Argonautica', 'source': 'perseus'}
From the shell, aegean greek catalog [QUERY] mirrors it (--author/-a,
--title/-t, --source, --limit/-n rows to show, --output/-o to save,
--json):
aegean greek catalog --author plato # filter by author (39 matches)
aegean greek catalog sappho # free-text query (a no-match is reported plainly)
aegean greek catalog --author homer -o homer_works.csv # wrote 49 works to homer_works.csv
A query with no matches reports it plainly rather than printing an empty table:
aegean greek catalog sappho
# No works match. Try a looser filter, or browse https://scaife.perseus.org
(sappho returns nothing because Sappho's Greek text isn't openly digitized in either collection; see the coverage note below.)
The catalogue is honest about coverage. It lists exactly what these open repositories actually hold at the pinned commit, not the entire theoretical canon. Some authors whose Greek text isn't openly digitized there (e.g. Sappho) simply won't appear; that's the same set load_work can reach, no surprises.
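The filter rules described for catalog() (every filter given must match; query is free text across id, author, and both titles) might be sketched as follows. filter_catalog is a hypothetical stand-in, the two sample rows are the ones quoted on this page, and case-insensitive substring matching is an assumption rather than a statement of how pyaegean implements the search.

```python
ROWS = [
    {"id": "tlg0001.tlg001", "author": "Apollonius Rhodius",
     "title": "Argonautica", "greek_title": "Argonautica", "source": "perseus"},
    {"id": "tlg0011.tlg002", "author": "Sophocles",
     "title": "Antigone", "greek_title": "Ἀντιγόνη", "source": "perseus"},
]


def filter_catalog(rows, query=None, *, author=None, title=None, source=None):
    """Return the rows that satisfy every filter supplied (hypothetical)."""

    def has(needle, *fields):
        # Assumed matching rule: case-insensitive substring.
        needle = needle.casefold()
        return any(needle in field.casefold() for field in fields)

    out = []
    for r in rows:
        if query and not has(query, r["id"], r["author"], r["title"], r["greek_title"]):
            continue
        if author and not has(author, r["author"]):
            continue
        if title and not has(title, r["title"], r["greek_title"]):
            continue
        if source and r["source"] != source:
            continue
        out.append(r)
    return out


print([r["id"] for r in filter_catalog(ROWS, author="sophocles")])
print([r["id"] for r in filter_catalog(ROWS, title="Ἀντιγόνη")])
print(filter_catalog(ROWS, "sappho"))  # no match gives an empty list
```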
For the canonical web view — or to confirm an edition — use Scaife:
- Go to the Scaife Viewer: https://scaife.perseus.org. This is the official browser for these exact collections.
- Search for your author or work and open it.
- Read the CTS id out of the URL or the work's citation. A Scaife URN looks like urn:cts:greekLit:tlg0012.tlg001.perseus-grc2; the tlg0012.tlg001 middle is the id you pass to load_work (drop the urn:cts:greekLit: prefix and the trailing edition label).
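If you collect URNs in bulk, the trimming described in the last step can be automated. This is a minimal sketch only; work_id_from_urn is a hypothetical helper and is not part of pyaegean.

```python
def work_id_from_urn(urn: str) -> str:
    """Return the 'tlgAAAA.tlgBBB' part of a CTS URN (hypothetical helper)."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    # parts[3] is 'textgroup.work[.edition]'; keep the first two dotted pieces.
    pieces = parts[3].split(".")
    if len(pieces) < 2:
        raise ValueError(f"URN has no work component: {urn!r}")
    return ".".join(pieces[:2])


print(work_id_from_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc2"))
# tlg0012.tlg001
```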
That's it — there's no separate registry to install. If you pass an id that can't be found in either collection, you get an actionable error rather than a silent empty result:
greek.load_work("tlg9999.tlg999")
# aegean.data.DataNotAvailableError: could not fetch 'tlg9999.tlg999' (...).
# Works are addressed as 'tlgGROUP.tlgWORK', e.g. tlg0012.tlg001 (Iliad).
Some works exist in more than one digital edition, and a few exist in both collections. Two optional arguments let you steer:
| argument | values | what it does |
|---|---|---|
| source | "auto" (default), "perseus", "first1k" | which collection to search. "auto" tries Perseus first, then First1KGreek. |
| edition | a full filename, or a fragment like "perseus-grc2" | pin a specific edition file when a work has several; otherwise the highest-numbered -grc* Greek edition wins. |
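The edition rule in the table (a fragment pins a file; otherwise the highest-numbered -grc* edition wins) could look roughly like this. pick_edition is a hypothetical helper and the file listing is invented for illustration; only the perseus-grc2 label appears on this page.

```python
import re


def pick_edition(filenames, edition=None):
    """Choose one edition file from a work's file listing (hypothetical)."""
    if edition is not None:
        # A full filename or a fragment such as "perseus-grc2" both match here.
        matches = [name for name in filenames if edition in name]
        if not matches:
            raise ValueError(f"no edition matching {edition!r}")
        return matches[0]

    def grc_number(name):
        m = re.search(r"-grc(\d+)", name)
        return int(m.group(1)) if m else -1

    greek = [name for name in filenames if grc_number(name) >= 0]
    if not greek:
        raise ValueError("no -grc* Greek edition found")
    return max(greek, key=grc_number)


# Illustrative listing only; real directories may hold different files.
files = [
    "tlg0012.tlg001.perseus-eng3.xml",
    "tlg0012.tlg001.perseus-grc1.xml",
    "tlg0012.tlg001.perseus-grc2.xml",
]
print(pick_edition(files))                  # tlg0012.tlg001.perseus-grc2.xml
print(pick_edition(files, "perseus-grc1"))  # tlg0012.tlg001.perseus-grc1.xml
```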
# force the First1KGreek copy, and a specific edition file fragment
greek.load_work("tlg0086.tlg005", source="first1k", edition="1st1K-grc1")
With no ref, you get the whole work as one Document per top-level part: a book of the Iliad, a chapter run of a prose work.
from aegean import greek
corpus = greek.load_work("tlg0012.tlg001") # network on first use, then cached
len(corpus) # 24 (the 24 books of the Iliad)
corpus.documents[0].id # 'tlg0012.tlg001:1'
corpus.documents[0].meta.name # 'Ἰλιάς — Book 1'
sum(len(d.tokens) for d in corpus) # 127339
ref is a citation address that matches the work's own structure. The shape
mirrors how classicists cite: book, then chapter or line, separated by dots, with
an optional range after a hyphen.
| ref | meaning | example work |
|---|---|---|
| "1" | one top-level part (a book) | Iliad book 1 |
| "1.2" | a nested division (book 1, chapter 2) | Herodotus 1.2 |
| "1.1-1.50" | a verse line-range across two full addresses | Iliad book 1, lines 1–50 |
| "1.1-50" | the same range, shorthand (the hi end inherits the lo prefix) | Iliad book 1, lines 1–50 |
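To make the shorthand rule concrete, here is a small sketch of how such a ref could be parsed into a low and a high address. parse_ref is a hypothetical helper that handles numeric parts only; it is not pyaegean's own parser.

```python
def parse_ref(ref: str):
    """Parse '1', '1.2', '1.1-1.50' or '1.1-50' into (lo, hi) tuples of ints."""
    lo_text, dash, hi_text = ref.partition("-")
    lo = tuple(int(part) for part in lo_text.split("."))
    if not dash:
        return lo, lo  # a single address selects just itself
    hi = tuple(int(part) for part in hi_text.split("."))
    if len(hi) < len(lo):
        # Shorthand: the high end inherits the leading parts of the low end.
        hi = lo[: len(lo) - len(hi)] + hi
    return lo, hi


print(parse_ref("1"))         # ((1,), (1,))
print(parse_ref("1.1-1.50"))  # ((1, 1), (1, 50))
print(parse_ref("1.1-50"))    # ((1, 1), (1, 50))
```

Both range spellings come out as the same pair of addresses, which is why the last two rows of the table select the same lines.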
A worked, verified example (the opening of the Iliad):
from aegean import greek
corpus = greek.load_work("tlg0012.tlg001", ref="1.1-1.10")
len(corpus) # 1 (one Document for the selected range)
doc = corpus.documents[0]
doc.id # 'tlg0012.tlg001:1.1-1.10'
doc.meta.name # 'Ἰλιάς — 1.1-1.10'
len(doc.lines) # 10
# first verse, joined back to text:
" ".join(t.text for t in doc.tokens if t.line_no == 0)
# 'μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος'
For a prose work, the second component is a chapter rather than a verse line:
greek.load_work("tlg0016.tlg001", ref="1.2") # Herodotus, book 1, chapter 2
# list the well-known ids first if you need one
aegean greek works
# then load a section
aegean greek work tlg0012.tlg001 --ref 1.1-1.10
Output (verified, from cache):
tlg0012.tlg001
┌──────────────┬───────────────────────────────────────────┐
│ field        │ value                                     │
├──────────────┼───────────────────────────────────────────┤
│ documents    │ 1                                         │
│ tokens       │ 78                                        │
│ first        │ tlg0012.tlg001:1.1-1.10                   │
│ name         │ Ἰλιάς — 1.1-1.10                          │
│ source       │ PerseusDL/canonical-greekLit (…grc2.xml)  │
│ data_version │ PerseusDL/canonical-greekLit@d4fab69a2c26 │
└──────────────┴───────────────────────────────────────────┘
aegean greek work flags:
| flag | meaning | default |
|---|---|---|
| WORK_ID (argument) | the CTS id, e.g. tlg0012.tlg001 | required |
| --ref | section to select: 1, 1.2, 1.1-1.50 | whole work |
| --source | auto, perseus, or first1k | auto |
| --edition | pick a specific edition file | best -grc* |
| --output / -o | write the corpus to a JSON file | print summary |
| --json | machine-readable summary on stdout | table |
Save a whole work to disk to reuse without re-fetching:
aegean greek work tlg0012.tlg001 -o iliad.json # wrote 24 documents to iliad.json
Editorial <note> and <bibl> material is neither dropped nor mixed into the running text: it rides along in doc.meta.notes, so the text you analyse is clean while the apparatus stays available.
Loading a work re-reads and re-parses it on every run (from the local cache once it has been fetched). For anything you'll come back to (searching it, joining it with another work, sharing it), write it once into a SQLite database and read from that. The database carries the documents and their tokens, plus an FTS5 full-text index, so searches are instant and need no network.
The key change in 0.8.2 is that aegean db build (and combine, stats, export,
… — anything taking a CORPUS) accepts a Greek work id directly, so you can do
all of this without writing a line of Python.
aegean db build tlg0012.tlg001 -o iliad.db
# fetches/parses the Iliad once, then: wrote 24 documents to iliad.db
tlg0012.tlg001 is resolved through load_work for you (network on first use, then
the cache). Add --no-fts to skip the full-text index if you only want the raw
tables.
The Python equivalent — load it yourself, then save:
from aegean import greek
greek.load_work("tlg0012.tlg001").to_sql("iliad.db") # fts=True by default
aegean combine merges several corpora into a single saved corpus. Each source is
resolved like any other corpus argument, so two work ids become one database:
aegean combine tlg0012.tlg001 tlg0012.tlg002 -o homer.db
# wrote 48 documents to homer.db (merged 2 sources)
That's the Iliad (24 books) and the Odyssey (24 books), all of Homer, in one
file. Write -o homer.json instead for a portable JSON corpus rather than a
database. The merged corpus's provenance names every source it was built from,
so the trail back to Perseus stays intact.
If two sources share a document id, --on-conflict decides what happens —
error (the default, refuse and tell you), first, last, or suffix (keep both,
disambiguating the later id):
aegean combine a.json b.json -o merged.db --on-conflict last
The same thing in Python uses combine([...]) or Corpus.merge(*others):
from aegean import greek, combine
iliad = greek.load_work("tlg0012.tlg001")
odyssey = greek.load_work("tlg0012.tlg002")
homer = combine([iliad, odyssey]) # dedupe="error" by default
len(homer) # 48
homer.provenance.source # 'Merged corpus (aegean.combine)'
homer.to_sql("homer.db")
# equivalently, from one corpus:
homer = iliad.merge(odyssey, dedupe="error")
Once it's a database, full-text search is one command and needs no network. It prints the document, the token position, and the matched text:
aegean db search homer.db μῆνιν
'μῆνιν' in homer.db
┌──────────────────┬─────┬───────┐
│ doc              │ pos │ text  │
├──────────────────┼─────┼───────┤
│ tlg0012.tlg001:1 │ 0   │ μῆνιν │
└──────────────────┴─────┴───────┘
μῆνιν — "wrath", the very first word of the Iliad. Add --limit N to cap hits
(default 50) or --json for machine-readable output. The query is a literal token
or phrase, matched through the FTS5 index.
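For readers who have not met FTS5, the idea behind the search can be reproduced with Python's built-in sqlite3 module. The table layout below is a hypothetical one-row-per-token schema chosen for the sketch, not the schema pyaegean writes, and it assumes your SQLite build includes FTS5 (most do).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: one row per token; only the text column is indexed.
con.execute(
    "CREATE VIRTUAL TABLE tokens USING fts5(doc UNINDEXED, pos UNINDEXED, text)"
)
rows = [
    ("tlg0012.tlg001:1", 0, "μῆνιν"),
    ("tlg0012.tlg001:1", 1, "ἄειδε"),
    ("tlg0012.tlg001:1", 2, "θεὰ"),
]
con.executemany("INSERT INTO tokens VALUES (?, ?, ?)", rows)

# Double quotes make FTS5 read the query as a literal phrase, not query syntax.
query = '"μῆνιν"'
hits = con.execute(
    "SELECT doc, pos, text FROM tokens WHERE tokens MATCH ? LIMIT 50",
    (query,),
).fetchall()
for doc, pos, text in hits:
    print(doc, pos, text)
```

The LIMIT 50 mirrors the default cap mentioned above; everything runs in memory, so nothing touches the network.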
(Here it is on a small offline corpus, so you can run it yourself end to end:)
aegean db build lineara -o lineara.db # wrote 1721 documents to lineara.db
aegean db search lineara.db KU-RO --limit 3
# 'KU-RO' in lineara.db
# ┌───────┬─────┬───────┐
# │ doc   │ pos │ text  │
# ├───────┼─────┼───────┤
# │ HT9a  │ 25  │ KU-RO │
# │ HT9b  │ 20  │ KU-RO │
# │ HT11a │ 7   │ KU-RO │
# └───────┴─────┴───────┘
To extend a database you've already built (say you started with the Iliad and now want the Odyssey in the same file), db add upserts the new source: documents whose id already exists are replaced, new ids are added, and the FTS index is refreshed.
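The upsert behaviour (replace rows whose id exists, add the rest) is a standard SQLite pattern. The sketch below shows it on a hypothetical documents table with placeholder names; it is not pyaegean's schema and it leaves out the FTS refresh.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table: the document id is the primary key.
con.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, name TEXT)")


def upsert(con, docs):
    """Insert new ids and replace rows whose id already exists."""
    con.executemany(
        "INSERT INTO documents (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        docs,
    )
    con.commit()


# First build, then a second pass that re-sends one id and adds a new one.
upsert(con, [("tlg0012.tlg001:1", "Iliad 1"), ("tlg0012.tlg001:2", "Iliad 2")])
upsert(con, [("tlg0012.tlg001:2", "Iliad 2 (reloaded)"),
             ("tlg0012.tlg002:1", "Odyssey 1")])

for row in con.execute("SELECT id, name FROM documents ORDER BY id"):
    print(row)
```

After the second call the table holds three rows, with the repeated id carrying the newer name.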
aegean db build tlg0012.tlg001 -o homer.db # start with the Iliad
aegean db add tlg0012.tlg002 -o homer.db # added/updated 24 documents in homer.db
The source can be a work id, a registered corpus id, a .json/.db file, or -.
In Python it's the append=True flag on either saver:
from aegean import greek
greek.load_work("tlg0012.tlg001").to_sql("homer.db") # build
greek.load_work("tlg0012.tlg002").to_sql("homer.db", append=True) # upsert in
Corpus.subset(ids) returns a new corpus with only the documents you name, which is handy before saving or combining. It records a "subset: N of M documents by id" note in the provenance so the slice stays honest:
from aegean import greek
iliad = greek.load_work("tlg0012.tlg001")
books_1_3 = iliad.subset([f"tlg0012.tlg001:{n}" for n in (1, 2, 3)])
len(books_1_3) # 3
books_1_3.provenance.notes[-1] # '...subset: 3 of 24 documents by id'
books_1_3.to_sql("iliad_opening.db")
The reverse direction, too. A query over the inscription corpora can save its matches as a reusable corpus: aegean query lineara --where word-prefix=KU -o ku_words.db writes the matched inscriptions as a .json/.db you can then db search or combine. See CLI for the full query language.
greek.load_nt(...) is the Koine counterpart to load_work. It returns a Corpus
of the Greek NT (Nestle 1904) where every token already carries gold
annotations: a lemma, a Robinson-style morph parse, a strongs number, a
reconciled UD upos, the normalized form, and (where available) a short gloss.
You don't have to run a tagger — it's all there.
One book ships inside the package (Philemon), so load_nt("Philemon") works
fully offline; the other 26 books fetch to cache on first use.
ref works just like load_work's, but reads as chapter.verse:
| ref | selects |
|---|---|
| "3" | chapter 3 |
| "3.16" | chapter 3, verse 16 |
| "3.16-3.18" | verses 3:16–3:18 |
| "3.16-18" | the same range, shorthand |
| "3-5" | chapters 3 through 5 |
A complete, verified, offline example (Philemon is the bundled book):
from aegean import greek
corpus = greek.load_nt("Philemon", ref="1.1-1.3") # no network: bundled book
len(corpus) # 1 (one Document per chapter)
doc = corpus.documents[0]
doc.id # 'Phlm 1'
doc.meta.name # 'Philemon 1'
len(doc.tokens) # 41
t = doc.tokens[0]
t.text # 'Παῦλος'
t.annotations
# {'lemma': 'Παῦλος', 'morph': 'N-NSM', 'strongs': '3972',
# 'normalized': 'Παῦλος', 'upos': 'NOUN', 'ref': 'Phlm.1.1', 'gloss': 'Paul'}
Turn the per-token annotations into a table (every field becomes a column):
import pandas as pd
from aegean import greek
corpus = greek.load_nt("Philemon", ref="1.1")
rows = [{"text": t.text, **t.annotations} for t in corpus.documents[0].tokens]
pd.DataFrame(rows)[["text", "lemma", "morph", "upos", "strongs", "gloss"]].head()
# text lemma morph upos strongs gloss
# Παῦλος Παῦλος N-NSM NOUN 3972 Paul
# δέσμιος δέσμιος N-NSM NOUN 1198 one bound, a prisoner
# Χριστοῦ Χριστός N-GSM NOUN 5547 anointed, the Messiah, the Christ
# Ἰησοῦ Ἰησοῦς N-GSM NOUN 2424 Jesus
# καὶ καί CONJ CCONJ 2532 and, even, also, namely
Other forms:
greek.load_nt("John") # whole Gospel of John (fetches on first use)
greek.load_nt("John", ref="1.1-18") # John 1:1–18
greek.load_nt("Rom", ref="8") # Romans chapter 8
greek.load_nt() # the whole 27-book NT
The signature is load_nt(book=None, *, ref=None, force=False). force=True re-fetches even if cached; passing a ref without a book raises an error (you can't address a verse across all 27 books).
| field | what it is |
|---|---|
| lemma | dictionary headword (gold, from Nestle 1904) |
| morph | Robinson-style morphology tag, e.g. N-NSM, V-PAI-3S |
| strongs | Strong's number (e.g. 3972 = Paul) |
| upos | coarse Universal Dependencies POS, reconciled from morph |
| normalized | accent/diacritic-normalized form |
| ref | canonical address of the token, e.g. Phlm.1.1 |
| gloss | brief English gloss (bundled Dodson lexicon, when available) |
The Robinson→UPOS mapping that fills upos is exposed if you want it directly:
from aegean.scripts.greek.nt import robinson_to_upos
robinson_to_upos("N-NSM") # 'NOUN'
robinson_to_upos("V-PAI-3S") # 'VERB'
robinson_to_upos("T-NSM") # 'DET'
robinson_to_upos("CONJ") # 'CCONJ'
robinson_to_upos("N-PRI") # 'PROPN' (proper noun)
robinson_to_upos("A-NUI") # 'NUM' (indeclinable numeral)
nt_books() lists every book in canonical order together with the abbreviations
load_nt (and gloss-nt) will accept for it. It's pure metadata — no download.
from aegean import greek
books = greek.nt_books()
len(books) # 27
books[3] # {'name': 'John', 'aliases': ['john', 'jn', 'jhn']}
Either the canonical name or any alias is accepted (case-insensitive; spaces and dots are ignored, so "1 Cor", "1cor", "1Cor." all resolve to 1Cor). The name is what shows up in document ids.
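The matching rule (ignore case, spaces, and dots, then look the result up among the accepted names) can be sketched in a few lines. resolve_book and the three-book table are illustrative only; the real list comes from nt_books().

```python
# Three rows copied from the accepted-names table on this page.
BOOKS = {
    "John": ["john", "jn", "jhn"],
    "1Cor": ["1corinthians", "1cor", "1co"],
    "Phlm": ["philemon", "phlm", "phm"],
}


def _key(name: str) -> str:
    # Normalise the way the text describes: case, spaces, and dots are ignored.
    return name.casefold().replace(" ", "").replace(".", "")


LOOKUP = {}
for canonical, aliases in BOOKS.items():
    for alias in [canonical, *aliases]:
        LOOKUP[_key(alias)] = canonical


def resolve_book(name: str) -> str:
    """Map any accepted spelling to the canonical book name (hypothetical)."""
    try:
        return LOOKUP[_key(name)]
    except KeyError:
        raise ValueError(f"unknown NT book {name!r}") from None


for name in ("1 Cor", "1cor", "1Cor.", "Philemon", "JN"):
    print(name, "->", resolve_book(name))
```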
On the command line:
aegean greek nt-books # the table below
aegean greek nt-books --json # same data as JSON
| book | accepted names |
|---|---|
| Matt | matthew, matt, mt |
| Mark | mark, mk, mrk |
| Luke | luke, lk, luk |
| John | john, jn, jhn |
| Acts | acts, act |
| Rom | romans, rom, rm |
| 1Cor | 1corinthians, 1cor, 1co |
| 2Cor | 2corinthians, 2cor, 2co |
| Gal | galatians, gal, ga |
| Eph | ephesians, eph |
| Phil | philippians, phil, php |
| Col | colossians, col |
| 1Thess | 1thessalonians, 1thess, 1th |
| 2Thess | 2thessalonians, 2thess, 2th |
| 1Tim | 1timothy, 1tim, 1ti |
| 2Tim | 2timothy, 2tim, 2ti |
| Titus | titus, tit |
| Phlm | philemon, phlm, phm |
| Heb | hebrews, heb |
| Jas | james, jas, jms |
| 1Pet | 1peter, 1pet, 1pe |
| 2Pet | 2peter, 2pet, 2pe |
| 1John | 1john, 1jn, 1jhn |
| 2John | 2john, 2jn, 2jhn |
| 3John | 3john, 3jn, 3jhn |
| Jude | jude, jud |
| Rev | revelation, rev, rv, apocalypse |
An unrecognised name gives a helpful error:
greek.load_nt("Genesis")
# ValueError: unknown NT book 'Genesis'; use a name or abbreviation like
# 'John', 'Jn', 'Matthew', '1Cor', 'Rev'
(Genesis is in the Hebrew Bible, not the Greek NT; load_nt covers the 27 NT books only.)
Looking up a single word's Koine gloss without loading a book? That's aegean greek gloss-nt / greek.lookup_nt(...), documented on Greek NLP; it uses the bundled Dodson lexicon and needs no download.
Both loaders return a standard Corpus, so the rest of the toolkit just works:
from aegean import greek
corpus = greek.load_work("tlg0012.tlg001", ref="1.1-1.10")
line = " ".join(t.text for t in corpus.documents[0].tokens if t.line_no == 0)
line # 'μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος'
greek.scan_hexameter(line).pattern # scan it as dactylic hexameter
greek.syllabify("Ἀχιλῆος") # syllabify a word from it
See Greek NLP for syllabification, accentuation, metrical scansion, tagging, lemmatization, and parsing; every one of those applies to a loaded work.
Each fetched work is pinned to a specific upstream commit, so the same id gives you the same text tomorrow. The commit is recorded on the corpus:
corpus = greek.load_work("tlg0012.tlg001", ref="1")
corpus.provenance.data_version # 'PerseusDL/canonical-greekLit@d4fab69a2c26'
corpus.provenance.license # 'CC BY-SA 4.0 (Perseus Digital Library)'
| collection | repository | licence |
|---|---|---|
| canonical-greekLit | PerseusDL/canonical-greekLit | CC BY-SA 4.0 (Perseus Digital Library) |
| First1KGreek | OpenGreekAndLatin/First1KGreek | CC BY-SA 4.0 (Open Greek and Latin) |
| New Testament | biblicalhumanities/Nestle1904 | CC0 (morphology/lemmas/Strong's); base text public domain |
Environment variables let you track a newer upstream state or authenticate large-scale discovery:
| variable | effect |
|---|---|
| PYAEGEAN_GREEKLIT_REF | override the pinned canonical-greekLit commit |
| PYAEGEAN_FIRST1K_REF | override the pinned First1KGreek commit |
| PYAEGEAN_NT_CORPUS_URL | point load_nt at an alternate NT corpus asset |
| PYAEGEAN_GITHUB_TOKEN / GITHUB_TOKEN | authenticate first-time work discovery (the GitHub contents API is rate-limited to 60 req/hour unauthenticated) |
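An override of this kind usually amounts to "use the environment variable if it is set, otherwise the pinned default". The sketch below shows that pattern; the function names are hypothetical, and the precedence between the two token variables is an assumption.

```python
import os

# Pinned commit as it appears in data_version on this page.
DEFAULT_GREEKLIT_REF = "d4fab69a2c26"


def greeklit_ref(env=os.environ) -> str:
    """Return the commit to fetch from: the override if set, else the pin."""
    return env.get("PYAEGEAN_GREEKLIT_REF") or DEFAULT_GREEKLIT_REF


def github_token(env=os.environ):
    """Return a token if one is configured, else None."""
    # Assumed precedence: the pyaegean-specific variable wins over GITHUB_TOKEN.
    return env.get("PYAEGEAN_GITHUB_TOKEN") or env.get("GITHUB_TOKEN")


print(greeklit_ref({}))                                 # d4fab69a2c26
print(greeklit_ref({"PYAEGEAN_GREEKLIT_REF": "main"}))  # main
```

Passing a plain dict in place of os.environ keeps the sketch testable without touching your real environment.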
Files are fetched to your cache, never bundled or re-hosted (except the single offline NT sample book). Full details are on Data & Provenance.
- First use of a non-cached work needs the network. After that it's read from the local cache and works offline. The one exception that's offline from the start is the bundled NT book, Philemon.
- TEI structures vary. ref follows each work's own <div> nesting and <l> line numbering. A ref that doesn't match the work's structure raises a clear ValueError rather than guessing.
- popular_works() is a curated subset, not the canon. 25 entries for quick discovery; the full 1,778-work discovery index is catalog() / aegean greek catalog (§3), the web browser is the Scaife Viewer, and any valid id loads.
- Line numbering is the edition's, not invented. Verse <l> lines are filtered by their numeric n; non-numeric or unnumbered lines won't match a numeric range.
- NT annotations are gold and fixed (Nestle 1904 morphology), independent of pyaegean's own taggers, which makes them useful as a benchmark as well as a corpus.
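The point about numeric line numbers is easiest to see on a toy fragment. This sketch uses only the standard library; the XML is a made-up, namespace-free stand-in for real TEI, and lines_in_range is a hypothetical helper rather than pyaegean's parser.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: real TEI files are namespaced and far richer.
TEI = """
<div n="1">
  <l n="1">line one</l>
  <l n="2">line two</l>
  <l n="2a">an interpolated line</l>
  <l>an unnumbered line</l>
  <l n="3">line three</l>
</div>
"""


def lines_in_range(div, lo, hi):
    """Keep only <l> elements whose n is a plain number within lo..hi."""
    selected = []
    for line in div.iter("l"):
        n = line.get("n", "")
        if n.isdigit() and lo <= int(n) <= hi:
            selected.append((int(n), line.text))
    return selected


div = ET.fromstring(TEI)
print(lines_in_range(div, 1, 3))
# [(1, 'line one'), (2, 'line two'), (3, 'line three')]
```

The line numbered 2a and the unnumbered line both fall out of the range, which is the behaviour the caveat describes.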
For the broader picture of what is and isn't in scope, see Limitations.
- Greek NLP: everything you can run on a loaded work
- CLI: the aegean greek work / works / nt-books commands
- Tutorial: a guided, end-to-end load of the Iliad
- Data & Provenance: caches, licences, pinned commits
- Limitations: scope and known gaps