Greek Works and Books

This page is the reference for loading real Greek texts into pyaegean: the classical literary corpus (Homer, Plato, Herodotus…) and the Greek New Testament. You'd come here when you want to stop typing Greek by hand and instead pull a whole work — or a single book, chapter, or line-range — straight into a Corpus you can tokenize, scan, tag, and count.

Two doors lead in:

greek.load_work("tlg0012.tlg001") — any work in the Perseus canonical-greekLit / First1KGreek collections, addressed by a CTS id (the tlgAAAA.tlgBBB scheme explained below).
greek.load_nt("John") — the Greek New Testament (Nestle 1904), with a gold lemma, morphology, Strong's number, and a gloss already attached to every word.

Both fetch their text once to a local cache and then work offline, and both return the same standard Corpus object you get everywhere else in pyaegean. Once a work is loaded, everything on Greek NLP applies to it.

A work id works everywhere now. As of 0.8.2 you can hand a Greek work id straight to almost any aegean command — no Python required. aegean stats tlg0012.tlg001, aegean export tlg0012.tlg002 -f csv -o odyssey.csv, and aegean db build tlg0012.tlg001 -o iliad.db all resolve the id through load_work for you. Anywhere a command takes a CORPUS argument it now accepts a registered id (lineara, nt, …), a Greek work id (tlg0012.tlg001), a path to a saved .json/.db corpus, or - for JSON on stdin. §5 below puts that to work.

New to all this? Start with Getting Started, then the Tutorial — it walks through loading the Iliad end to end. For the command-line forms, see CLI; for the licences and cache details, see Data & Provenance.

1. The `tlgAAAA.tlgBBB` work-id scheme, in plain language

Every classical Greek work has a stable catalogue address. pyaegean uses the same one the scholarly world uses — the CTS ("Canonical Text Services") id — so the id you find in a citation or in the Scaife Viewer is exactly the id you paste here.

A work id has two halves joined by a dot:

tlg0012 . tlg001
└──┬──┘   └──┬──┘
 author     work
 group     within
           that author

The first half (tlg0012) names the author / text group — Homer.
The second half (tlg001) names one work by that author — tlg001 is the Iliad, tlg002 is the Odyssey.

So tlg0012.tlg001 is read as "Homer, work 1 = the Iliad." The numbers are arbitrary catalogue numbers, not anything you'd guess — you look them up. The tlg prefix comes from the Thesaurus Linguae Graecae, whose numbering this scheme inherits. (A few First1KGreek works use a stoa prefix instead of tlg; the dotted shape is the same.)

You never have to memorise these. The next two sections show how to list the common ones (built in) and find any other (one website).

The dot matters. load_work splits on it: the part before the dot is the author directory, the part after is the work file. Pass something without a dot and you get a clear error telling you the expected shape (tlgGROUP.tlgWORK).

2. The built-in catalogue — `popular_works()`

pyaegean ships a small, hand-verified catalogue of well-known works so you can discover ids without leaving Python. Every id below was confirmed to resolve against the live source. It's a starting point, not the whole canon (see §3 for everything else).

Python

from aegean import greek

works = greek.popular_works()
len(works)            # 25
works[0]              # {'id': 'tlg0012.tlg001', 'author': 'Homer', 'title': 'Iliad'}

popular_works() is pure metadata — no download, so it works offline and is instant. Each entry is a plain dict with id, author, and title.

CLI

aegean greek works

…prints the same catalogue as a table and a copy-paste hint:

                        Popular Greek works
┌────────────────┬──────────────┬──────────────────────────────────┐
│ id             │ author       │ title                            │
├────────────────┼──────────────┼──────────────────────────────────┤
│ tlg0012.tlg001 │ Homer        │ Iliad                            │
│ tlg0012.tlg002 │ Homer        │ Odyssey                          │
│ …              │ …            │ …                                │
└────────────────┴──────────────┴──────────────────────────────────┘

Load one with, e.g.:  aegean greek work tlg0012.tlg001 --ref 1.1-1.10
This is a curated subset — the full canon is at https://scaife.perseus.org

Add --json for machine-readable output: aegean greek works --json.

The full catalogue (id → author → title)

This is the complete list, pulled from the live popular_works() function:

id	author	title
`tlg0012.tlg001`	Homer	Iliad
`tlg0012.tlg002`	Homer	Odyssey
`tlg0020.tlg001`	Hesiod	Theogony
`tlg0020.tlg002`	Hesiod	Works and Days
`tlg0085.tlg004`	Aeschylus	Seven Against Thebes
`tlg0085.tlg005`	Aeschylus	Agamemnon
`tlg0085.tlg006`	Aeschylus	Libation Bearers
`tlg0011.tlg001`	Sophocles	Trachiniae
`tlg0011.tlg002`	Sophocles	Antigone
`tlg0011.tlg003`	Sophocles	Ajax
`tlg0011.tlg004`	Sophocles	Oedipus Tyrannus
`tlg0006.tlg001`	Euripides	Cyclops
`tlg0006.tlg002`	Euripides	Alcestis
`tlg0006.tlg003`	Euripides	Medea
`tlg0019.tlg002`	Aristophanes	Knights
`tlg0019.tlg003`	Aristophanes	Clouds
`tlg0016.tlg001`	Herodotus	Histories
`tlg0003.tlg001`	Thucydides	History of the Peloponnesian War
`tlg0032.tlg002`	Xenophon	Memorabilia
`tlg0032.tlg006`	Xenophon	Anabasis
`tlg0059.tlg002`	Plato	Apology
`tlg0059.tlg003`	Plato	Crito
`tlg0059.tlg004`	Plato	Phaedo
`tlg0059.tlg030`	Plato	Republic
`tlg0086.tlg010`	Aristotle	Nicomachean Ethics

Notice the pattern: works by one author share the first half of the id. All four Sophocles plays are tlg0011.tlg00x; all four Plato dialogues here are tlg0059.tlg0xx. That's the tlgAAAA group at work.

This is the curated short list. To search the full 1,778-work discovery index in-package, use catalog() / aegean greek catalog — see §3.

3. Finding any other work

load_work accepts any Perseus canonical-greekLit / First1KGreek CTS id, not just the 25 above. There are two ways to find an id — a built-in search, and the official web browser.

The built-in catalogue — `catalog()` / `aegean greek catalog`

pyaegean bundles a complete discovery index of every work that has a Greek (-grc) edition in those two collections — 1,778 works (768 from Perseus canonical-greekLit, 1,010 from First1KGreek), far more than the 25 highlights in popular_works(). It is pure bundled metadata (id, author, English title, Greek title, source), so searching it needs no network and is instant. Anything it lists, load_work can fetch.

catalog(query=None, *, author=None, title=None, source=None) returns a list of dicts, each with id, author, title, greek_title, and source. The filters combine (all must match); query is free-text across id, author, and either title.

from aegean import greek

len(greek.catalog())                       # 1778   (768 perseus + 1010 first1k)
len(greek.catalog(author="plato"))         # 39     — every Plato work in the open repos
greek.catalog(title="Ἀντιγόνη")            # search by Greek title too
# → [{'id': 'tlg0011.tlg002', 'author': 'Sophocles', 'title': 'Antigone',
#     'greek_title': 'Ἀντιγόνη', 'source': 'perseus'}]
len(greek.catalog("herodotus"))            # 2      — free-text across id/author/title
len(greek.catalog(source="first1k"))       # 1010   — limit to one collection

greek.catalog()[0]
# → {'id': 'tlg0001.tlg001', 'author': 'Apollonius Rhodius',
#    'title': 'Argonautica', 'greek_title': 'Argonautica', 'source': 'perseus'}

From the shell, aegean greek catalog [QUERY] mirrors it (--author/-a, --title/-t, --source, --limit/-n rows to show, --output/-o to save, --json):

aegean greek catalog --author plato        # filter by author (39 matches)
aegean greek catalog sappho                # free-text query (a no-match is reported plainly)
aegean greek catalog --author homer -o homer_works.csv   # wrote 49 works to homer_works.csv

A query with no matches reports it plainly rather than printing an empty table:

aegean greek catalog sappho
# No works match. Try a looser filter, or browse https://scaife.perseus.org

(sappho returns nothing because Sappho's Greek text isn't openly digitized in either collection — see the coverage note below.)

The catalogue is honest about coverage. It lists exactly what these open repositories actually hold at the pinned commit — not the entire theoretical canon. Some authors whose Greek text isn't openly digitized there (e.g. Sappho) simply won't appear; that's the same set load_work can reach, no surprises.

The Scaife Viewer (the web browser for these collections)

For the canonical web view — or to confirm an edition — use Scaife:

Go to the Scaife Viewer: https://scaife.perseus.org. This is the official browser for these exact collections.
Search for your author or work and open it.
Read the CTS id out of the URL or the work's citation. A Scaife URN looks like urn:cts:greekLit:tlg0012.tlg001.perseus-grc2 — the tlg0012.tlg001 middle is the id you pass to load_work (drop the urn:cts:greekLit: prefix and the trailing edition label).

That's it — there's no separate registry to install. If you pass an id that can't be found in either collection, you get an actionable error rather than a silent empty result:

greek.load_work("tlg9999.tlg999")
# aegean.data.DataNotAvailableError: could not fetch 'tlg9999.tlg999' (...).
# Works are addressed as 'tlgGROUP.tlgWORK', e.g. tlg0012.tlg001 (Iliad).

Picking the edition and the source

Some works exist in more than one digital edition, and a few exist in both collections. Two optional arguments let you steer:

argument	values	what it does
`source`	`"auto"` (default), `"perseus"`, `"first1k"`	which collection to search. `"auto"` tries Perseus first, then First1KGreek.
`edition`	a full filename, or a fragment like `"perseus-grc2"`	pin a specific edition file when a work has several; otherwise the highest-numbered `-grc*` Greek edition wins.

# force the First1KGreek copy, and a specific edition file fragment
greek.load_work("tlg0086.tlg005", source="first1k", edition="1st1K-grc1")

4. Loading a work and addressing parts of it (`ref`)

Whole work

With no ref, you get the whole work as one Document per top-level part — a book of the Iliad, a chapter run of a prose work.

from aegean import greek

corpus = greek.load_work("tlg0012.tlg001")   # network on first use, then cached
len(corpus)                                   # 24   (the 24 books of the Iliad)
corpus.documents[0].id                        # 'tlg0012.tlg001:1'
corpus.documents[0].meta.name                 # 'Ἰλιάς — Book 1'
sum(len(d.tokens) for d in corpus)            # 127339

Selecting a section with `ref`

ref is a citation address that matches the work's own structure. The shape mirrors how classicists cite: book, then chapter or line, separated by dots, with an optional range after a hyphen.

`ref`	meaning	example work
`"1"`	one top-level part (a book)	Iliad book 1
`"1.2"`	a nested division (book 1, chapter 2)	Herodotus 1.2
`"1.1-1.50"`	a verse line-range across two full addresses	Iliad book 1, lines 1–50
`"1.1-50"`	the same range, shorthand (the hi end inherits the lo prefix)	Iliad book 1, lines 1–50

A worked, verified example (the opening of the Iliad):

from aegean import greek

corpus = greek.load_work("tlg0012.tlg001", ref="1.1-1.10")
len(corpus)                          # 1   (one Document for the selected range)
doc = corpus.documents[0]
doc.id                               # 'tlg0012.tlg001:1.1-1.10'
doc.meta.name                        # 'Ἰλιάς — 1.1-1.10'
len(doc.lines)                       # 10
# first verse, joined back to text:
" ".join(t.text for t in doc.tokens if t.line_no == 0)
# 'μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος'

For a prose work, the middle component is a chapter rather than a verse line:

greek.load_work("tlg0016.tlg001", ref="1.2")   # Herodotus, book 1, chapter 2

The same thing on the command line

# list the well-known ids first if you need one
aegean greek works

# then load a section
aegean greek work tlg0012.tlg001 --ref 1.1-1.10

Output (verified, from cache):

                          tlg0012.tlg001
┌──────────────┬───────────────────────────────────────────┐
│ field        │ value                                     │
├──────────────┼───────────────────────────────────────────┤
│ documents    │ 1                                         │
│ tokens       │ 78                                        │
│ first        │ tlg0012.tlg001:1.1-1.10                   │
│ name         │ Ἰλιάς — 1.1-1.10                          │
│ source       │ PerseusDL/canonical-greekLit (…grc2.xml)  │
│ data_version │ PerseusDL/canonical-greekLit@d4fab69a2c26 │
└──────────────┴───────────────────────────────────────────┘

aegean greek work flags:

flag	meaning	default
`WORK_ID` (argument)	the CTS id, e.g. `tlg0012.tlg001`	required
`--ref`	section to select: `1`, `1.2`, `1.1-1.50`	whole work
`--source`	`auto`, `perseus`, or `first1k`	`auto`
`--edition`	pick a specific edition file	best `-grc*`
`--output` / `-o`	write the corpus to a JSON file	print summary
`--json`	machine-readable summary on stdout	table

Save a whole work to disk to reuse without re-fetching:

aegean greek work tlg0012.tlg001 -o iliad.json   # wrote 24 documents to iliad.json

Notes carried along

Editorial <note> and <bibl> material is not dropped and not mixed into the running text — it rides along in doc.meta.notes so the text you analyse is clean while the apparatus stays available.

5. Put works into a database

Loading a work re-fetches and re-parses it every run. For anything you'll come back to — searching it, joining it with another work, sharing it — write it once into a SQLite database and read from that. The database carries the documents and their tokens, plus an FTS5 full-text index, so searches are instant and need no network.

The key change in 0.8.2 is that aegean db build (and combine, stats, export, … — anything taking a CORPUS) accepts a Greek work id directly, so you can do all of this without writing a line of Python.

One work → one database

aegean db build tlg0012.tlg001 -o iliad.db
# fetches/parses the Iliad once, then:  wrote 24 documents to iliad.db

tlg0012.tlg001 is resolved through load_work for you (network on first use, then the cache). Add --no-fts to skip the full-text index if you only want the raw tables.

The Python equivalent — load it yourself, then save:

from aegean import greek
greek.load_work("tlg0012.tlg001").to_sql("iliad.db")   # fts=True by default

All of Homer in one database — `combine`

aegean combine merges several corpora into a single saved corpus. Each source is resolved like any other corpus argument, so two work ids become one database:

aegean combine tlg0012.tlg001 tlg0012.tlg002 -o homer.db
# wrote 48 documents to homer.db (merged 2 sources)

That's the Iliad (24 books) and the Odyssey (24 books) — all of Homer — in one file. Write -o homer.json instead for a portable JSON corpus rather than a database. The merged corpus's provenance names every source it was built from, so the trail back to Perseus stays intact.

If two sources share a document id, --on-conflict decides what happens — error (the default, refuse and tell you), first, last, or suffix (keep both, disambiguating the later id):

aegean combine a.json b.json -o merged.db --on-conflict last

The same thing in Python — combine([...]), or Corpus.merge(*others):

from aegean import greek, combine

iliad   = greek.load_work("tlg0012.tlg001")
odyssey = greek.load_work("tlg0012.tlg002")

homer = combine([iliad, odyssey])          # dedupe="error" by default
len(homer)                                  # 48
homer.provenance.source                     # 'Merged corpus (aegean.combine)'
homer.to_sql("homer.db")

# equivalently, from one corpus:
homer = iliad.merge(odyssey, dedupe="error")

Search the database — `db search`

Once it's a database, full-text search is one command and needs no network. It prints the document, the token position, and the matched text:

aegean db search homer.db μῆνιν

 'μῆνιν' in homer.db
┌────────────────┬─────┬───────┐
│ doc            │ pos │ text  │
├────────────────┼─────┼───────┤
│ tlg0012.tlg001:1 │ 0 │ μῆνιν │
└────────────────┴─────┴───────┘

μῆνιν — "wrath", the very first word of the Iliad. Add --limit N to cap hits (default 50) or --json for machine-readable output. The query is a literal token or phrase, matched through the FTS5 index.

(Here it is on a small offline corpus, so you can run it yourself end to end:)

aegean db build lineara -o lineara.db        # wrote 1721 documents to lineara.db
aegean db search lineara.db KU-RO --limit 3
#  'KU-RO' in lineara.db
#  ┌───────┬─────┬───────┐
#  │ doc   │ pos │ text  │
#  ├───────┼─────┼───────┤
#  │ HT9a  │ 25  │ KU-RO │
#  │ HT9b  │ 20  │ KU-RO │
#  │ HT11a │ 7   │ KU-RO │
#  └───────┴─────┴───────┘

Grow an existing database — `db add`

To extend a database you've already built — say you started with the Iliad and now want the Odyssey in the same file — db add upserts the new source: documents whose id already exists are replaced, new ids are added, and the FTS index is refreshed.

aegean db build tlg0012.tlg001 -o homer.db   # start with the Iliad
aegean db add  tlg0012.tlg002 -o homer.db    # added/updated 24 documents in homer.db

The source can be a work id, a registered corpus id, a .json/.db file, or -. In Python it's the append=True flag on either saver:

from aegean import greek

greek.load_work("tlg0012.tlg001").to_sql("homer.db")              # build
greek.load_work("tlg0012.tlg002").to_sql("homer.db", append=True) # upsert in

Taking just part of a corpus — `subset`

Corpus.subset(ids) returns a new corpus with only the documents you name — handy before saving or combining. It records a subset: N of M documents by id note in the provenance so the slice stays honest:

from aegean import greek

iliad = greek.load_work("tlg0012.tlg001")
books_1_3 = iliad.subset([f"tlg0012.tlg001:{n}" for n in (1, 2, 3)])
len(books_1_3)                       # 3
books_1_3.provenance.notes[-1]       # '...subset: 3 of 24 documents by id'
books_1_3.to_sql("iliad_opening.db")

The reverse direction, too. A query over the inscription corpora can save its matches as a reusable corpus: aegean query lineara --where word-prefix=KU -o ku_words.db writes the matched inscriptions as a .json/.db you can then db search or combine. See CLI for the full query language.

6. The Greek New Testament — `load_nt()`

greek.load_nt(...) is the Koine counterpart to load_work. It returns a Corpus of the Greek NT (Nestle 1904) where every token already carries gold annotations: a lemma, a Robinson-style morph parse, a strongs number, a reconciled UD upos, the normalized form, and (where available) a short gloss. You don't have to run a tagger — it's all there.

One book ships inside the package (Philemon), so load_nt("Philemon") works fully offline; the other 26 books fetch to cache on first use.

Loading a book, chapter, verse, or range

ref works just like load_work's, but reads as chapter.verse:

`ref`	selects
`"3"`	chapter 3
`"3.16"`	chapter 3, verse 16
`"3.16-3.18"`	verses 3:16–3:18
`"3.16-18"`	the same range, shorthand
`"3-5"`	chapters 3 through 5

A complete, verified, offline example (Philemon is the bundled book):

from aegean import greek

corpus = greek.load_nt("Philemon", ref="1.1-1.3")   # no network: bundled book
len(corpus)                          # 1   (one Document per chapter)
doc = corpus.documents[0]
doc.id                               # 'Phlm 1'
doc.meta.name                        # 'Philemon 1'
len(doc.tokens)                      # 41

t = doc.tokens[0]
t.text                               # 'Παῦλος'
t.annotations
# {'lemma': 'Παῦλος', 'morph': 'N-NSM', 'strongs': '3972',
#  'normalized': 'Παῦλος', 'upos': 'NOUN', 'ref': 'Phlm.1.1', 'gloss': 'Paul'}

Turn the per-token annotations into a table (every field becomes a column):

import pandas as pd
from aegean import greek

corpus = greek.load_nt("Philemon", ref="1.1")
rows = [{"text": t.text, **t.annotations} for t in corpus.documents[0].tokens]
pd.DataFrame(rows)[["text", "lemma", "morph", "upos", "strongs", "gloss"]].head()
#      text    lemma morph  upos strongs                             gloss
#    Παῦλος   Παῦλος N-NSM  NOUN    3972                              Paul
#   δέσμιος  δέσμιος N-NSM  NOUN    1198             one bound, a prisoner
#   Χριστοῦ  Χριστός N-GSM  NOUN    5547 anointed, the Messiah, the Christ
#     Ἰησοῦ   Ἰησοῦς N-GSM  NOUN    2424                             Jesus
#       καὶ      καί  CONJ CCONJ    2532           and, even, also, namely

Other forms:

greek.load_nt("John")                 # whole Gospel of John  (fetches on first use)
greek.load_nt("John", ref="1.1-18")   # John 1:1–18
greek.load_nt("Rom", ref="8")         # Romans chapter 8
greek.load_nt()                       # the whole 27-book NT

load_nt(book=None, *, ref=None, force=False) — force=True re-fetches even if cached; passing a ref without a book raises (you can't address a verse across all 27 books).

Per-token annotation fields

field	what it is
`lemma`	dictionary headword (gold, from Nestle 1904)
`morph`	Robinson-style morphology tag, e.g. `N-NSM`, `V-PAI-3S`
`strongs`	Strong's number (e.g. `3972` = Paul)
`upos`	coarse Universal Dependencies POS, reconciled from `morph`
`normalized`	accent/diacritic-normalized form
`ref`	canonical address of the token, e.g. `Phlm.1.1`
`gloss`	brief English gloss (bundled Dodson lexicon, when available)

The Robinson→UPOS mapping that fills upos is exposed if you want it directly:

from aegean.scripts.greek.nt import robinson_to_upos
robinson_to_upos("N-NSM")     # 'NOUN'
robinson_to_upos("V-PAI-3S")  # 'VERB'
robinson_to_upos("T-NSM")     # 'DET'
robinson_to_upos("CONJ")      # 'CCONJ'
robinson_to_upos("N-PRI")     # 'PROPN'  (proper noun)
robinson_to_upos("A-NUI")     # 'NUM'    (indeclinable numeral)

7. The 27 books and the names you can use

nt_books() lists every book in canonical order together with the abbreviations load_nt (and gloss-nt) will accept for it. It's pure metadata — no download.

from aegean import greek
books = greek.nt_books()
len(books)         # 27
books[3]           # {'name': 'John', 'aliases': ['john', 'jn', 'jhn']}

Any of the canonical name or any alias is accepted (case-insensitive; spaces and dots are ignored — "1 Cor", "1cor", "1Cor." all resolve to 1Cor). The name is what shows up in document ids.

On the command line:

aegean greek nt-books          # the table below
aegean greek nt-books --json   # same data as JSON

The full book table

book	accepted names
Matt	matthew, matt, mt
Mark	mark, mk, mrk
Luke	luke, lk, luk
John	john, jn, jhn
Acts	acts, act
Rom	romans, rom, rm
1Cor	1corinthians, 1cor, 1co
2Cor	2corinthians, 2cor, 2co
Gal	galatians, gal, ga
Eph	ephesians, eph
Phil	philippians, phil, php
Col	colossians, col
1Thess	1thessalonians, 1thess, 1th
2Thess	2thessalonians, 2thess, 2th
1Tim	1timothy, 1tim, 1ti
2Tim	2timothy, 2tim, 2ti
Titus	titus, tit
Phlm	philemon, phlm, phm
Heb	hebrews, heb
Jas	james, jas, jms
1Pet	1peter, 1pet, 1pe
2Pet	2peter, 2pet, 2pe
1John	1john, 1jn, 1jhn
2John	2john, 2jn, 2jhn
3John	3john, 3jn, 3jhn
Jude	jude, jud
Rev	revelation, rev, rv, apocalypse

An unrecognised name gives a helpful error:

greek.load_nt("Genesis")
# ValueError: unknown NT book 'Genesis'; use a name or abbreviation like
# 'John', 'Jn', 'Matthew', '1Cor', 'Rev'

(Genesis is in the Hebrew Bible, not the Greek NT — load_nt covers the 27 NT books only.)

Looking up a single word's Koine gloss without loading a book? That's aegean greek gloss-nt / greek.lookup_nt(...), documented on Greek NLP — it uses the bundled Dodson lexicon and needs no download.

8. What you can do once a work is loaded

Both loaders return a standard Corpus, so the rest of the toolkit just works:

from aegean import greek

corpus = greek.load_work("tlg0012.tlg001", ref="1.1-1.10")
line = " ".join(t.text for t in corpus.documents[0].tokens if t.line_no == 0)
line                                  # 'μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος'

greek.scan_hexameter(line).pattern    # scan it as dactylic hexameter
greek.syllabify("Ἀχιλῆος")            # syllabify a word from it

See Greek NLP for syllabification, accentuation, metrical scansion, tagging, lemmatization, and parsing — every one of those applies to a loaded work.

9. Reproducibility and provenance

Each fetched work is pinned to a specific upstream commit, so the same id gives you the same text tomorrow. The commit is recorded on the corpus:

corpus = greek.load_work("tlg0012.tlg001", ref="1")
corpus.provenance.data_version   # 'PerseusDL/canonical-greekLit@d4fab69a2c26'
corpus.provenance.license        # 'CC BY-SA 4.0 (Perseus Digital Library)'

collection	repository	licence
canonical-greekLit	`PerseusDL/canonical-greekLit`	CC BY-SA 4.0 (Perseus Digital Library)
First1KGreek	`OpenGreekAndLatin/First1KGreek`	CC BY-SA 4.0 (Open Greek and Latin)
New Testament	`biblicalhumanities/Nestle1904`	CC0 (morphology/lemmas/Strong's); base text public domain

Environment variables let you track a newer upstream state or authenticate large-scale discovery:

variable	effect
`PYAEGEAN_GREEKLIT_REF`	override the pinned canonical-greekLit commit
`PYAEGEAN_FIRST1K_REF`	override the pinned First1KGreek commit
`PYAEGEAN_NT_CORPUS_URL`	point `load_nt` at an alternate NT corpus asset
`PYAEGEAN_GITHUB_TOKEN` / `GITHUB_TOKEN`	authenticate first-time work discovery (the GitHub contents API is rate-limited to 60 req/hour unauthenticated)

Files are fetched to your cache, never bundled or re-hosted (except the single offline NT sample book). Full details are on Data & Provenance.

10. Limitations and honest notes

First use of a non-cached work needs the network. After that it's read from the local cache and works offline. The one exception that's offline from the start is the bundled NT book, Philemon.
TEI structures vary. ref follows each work's own <div> nesting and <l> line numbering. A ref that doesn't match the work's structure raises a clear ValueError rather than guessing.
popular_works() is a curated subset, not the canon. 25 entries for quick discovery; the full 1,778-work discovery index is catalog() / aegean greek catalog (§3), the web browser is the Scaife Viewer, and any valid id loads.
Line numbering is the edition's, not invented. Verse <l> lines are filtered by their numeric n; non-numeric or unnumbered lines won't match a numeric range.
NT annotations are gold and fixed (Nestle 1904 morphology), independent of pyaegean's own taggers — useful as a benchmark as well as a corpus.

For the broader picture of what is and isn't in scope, see Limitations.

Greek Works and Books

Greek Works and Books

1. The tlgAAAA.tlgBBB work-id scheme, in plain language

2. The built-in catalogue — popular_works()

Python

CLI

The full catalogue (id → author → title)

3. Finding any other work

The built-in catalogue — catalog() / aegean greek catalog

The Scaife Viewer (the web browser for these collections)

Picking the edition and the source

4. Loading a work and addressing parts of it (ref)

Whole work

Selecting a section with ref

The same thing on the command line

Notes carried along

5. Put works into a database

One work → one database

All of Homer in one database — combine

Search the database — db search

Grow an existing database — db add

Taking just part of a corpus — subset

6. The Greek New Testament — load_nt()

Loading a book, chapter, verse, or range

Per-token annotation fields

7. The 27 books and the names you can use

The full book table

8. What you can do once a work is loaded

9. Reproducibility and provenance

10. Limitations and honest notes

See also

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pyaegean

Clone this wiki locally

1. The `tlgAAAA.tlgBBB` work-id scheme, in plain language

2. The built-in catalogue — `popular_works()`

The built-in catalogue — `catalog()` / `aegean greek catalog`

4. Loading a work and addressing parts of it (`ref`)

Selecting a section with `ref`

All of Homer in one database — `combine`

Search the database — `db search`

Grow an existing database — `db add`

Taking just part of a corpus — `subset`

6. The Greek New Testament — `load_nt()`