CLI

The `aegean` CLI

The whole toolkit from the command line — corpora, Greek NLP, analysis, data, and the (exploratory) AI layer — without writing Python.

pip install "pyaegean[cli]"     # typer + rich; the core stays zero-dependency
aegean --help

Three conventions hold everywhere:

- `--json` on every command prints one machine-readable JSON document, so results pipe into `jq`, files, or other tools.
- `-` reads stdin wherever a command takes a `TEXT` argument, so commands compose in shell pipelines.
- Exit codes script cleanly: `0` success, `1` domain error (one line on stderr), `2` usage error. `balance --strict` exits `1` on any unbalanced total.
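
The exit-code convention lends itself to a small wrapper in scripts. The following is a minimal sketch for a POSIX shell; `run_aegean` is a hypothetical helper name, not something pyaegean ships, and the messages it prints are illustrative only.

```shell
# Hypothetical helper (not part of pyaegean): run one aegean command and
# report which class of failure occurred, following the 0/1/2 convention above.
run_aegean() {
    aegean_status=0
    aegean "$@" || aegean_status=$?
    case "$aegean_status" in
        0) ;;  # success: nothing to report
        1) echo "domain error: aegean printed the reason on stderr" >&2 ;;
        2) echo "usage error: check the command's arguments" >&2 ;;
        *) echo "unexpected exit status: $aegean_status" >&2 ;;
    esac
    return "$aegean_status"
}

# Usage: stop a batch job as soon as one tablet fails to balance.
# run_aegean balance lineara HT13 --strict || exit 1
```

The wrapper passes the original status through, so callers can still branch on it.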

On Windows, set `PYTHONUTF8=1` and use a UTF-8 console so Greek renders correctly.

Corpus commands

aegean info lineara                         # size, provenance, license, citation
aegean load lineara --site "Haghia Triada" -o ht.json   # filter → lossless JSON
aegean show lineara HT13                    # one document, line by line
aegean search lineara "KU-*-RO"             # wildcard sign-pattern word search
aegean stats lineara --top 10               # word frequencies (--signs for signs)
aegean dispersion damos --top 10            # Gries' DP: evenly spread vs concentrated
aegean keyness damos --site Pylos           # key vocabulary of a subset vs the rest
aegean cache                                # the opt-in analysis cache (--clear to wipe)
aegean plot keyness damos --site Pylos -o pylos.png   # any of: freq | dispersion |
aegean plot scansion "ἄνδρα μοι ἔννεπε…" -o scan.svg  # keyness | network | balance | scansion
aegean balance lineara HT13                 # KU-RO reconciliation (--strict exits 1)
aegean cite lineara --site "Haghia Triada"  # cite the exact subset (--style bibtex|apa)
aegean export lineara -f csv -o lineara.csv # json | csv | parquet | epidoc
aegean geo lineara                          # located find-sites (+ --output sites.geojson)
aegean sign lineara KU                      # one sign: glyph, codepoint, sound value
aegean bridge linearb po-me                 # po-me → ποιμήν (shepherd)

Every command takes a corpus name: the bundled `lineara` / `linearb` / `cypriot` / `cyprominoan` / `greek`, or the fetched-on-demand `damos` (the full ~5,900-tablet Linear B corpus) and `sigla` (the SigLA Linear A dataset). Both fetched corpora are CC BY-NC-SA and are downloaded to your cache on first use. `aegean stats damos --top 10` works exactly like its `lineara` counterpart.
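
Because the corpus name is just an argument, one loop can run the same report over every bundled corpus. This is a sketch only; `stats_bundled` is a hypothetical helper, and it uses nothing beyond the `stats` command and `--top` flag listed above.

```shell
# Hypothetical helper: print the top-N word frequencies for each bundled corpus.
stats_bundled() {
    top=${1:-10}
    for corpus in lineara linearb cypriot cyprominoan greek; do
        echo "== $corpus =="
        # Keep going if one corpus fails; note the failure on stderr.
        aegean stats "$corpus" --top "$top" || echo "(skipped: $corpus)" >&2
    done
}

# Usage:
# stats_bundled 5
```

Adding `damos` or `sigla` to the list should work the same way, at the cost of a download on first use.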

The compound query engine takes repeated `--where field=value` rows (prefix `or:` to OR a row, `!` to negate it; `--fields` lists the field registry):

aegean query lineara --where "site-is=Haghia Triada" --where "or:id-contains=ZA" \
       --output-kind words --json

Query results and filtered subsets print their citation, so the exact result set used in a paper is one `--json | jq .citation` away.
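
One way to use this in practice is to store the full result next to its citation. The sketch below assumes `jq` is installed; `save_query` and the output file names are hypothetical, and the only field it relies on is the `.citation` key mentioned above.

```shell
# Hypothetical helper: run a query and save both the full JSON result and
# its citation, so the result set used in a paper stays reproducible.
# All arguments after the output prefix go to `aegean query` unchanged.
save_query() {
    out=$1
    shift
    aegean query "$@" --json > "$out.json" || return $?
    jq .citation "$out.json" > "$out.citation.json"
}

# Usage:
# save_query ht-words lineara --where "site-is=Haghia Triada" --output-kind words
```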

Greek NLP (`aegean greek …`)

The zero-dependency stages work immediately:

aegean greek betacode "mh=nin a)/eide qea/"          # μῆνιν ἄειδε θεά
aegean greek normalize "λόγoς kai μh=νιν" --lenient  # repairs OCR artifacts, warns on stderr
aegean greek tokenize "ἐν ἀρχῇ ἦν ὁ λόγος." [--sentences]
aegean greek syllabify εἰσφέρω                       # εἰσ-φέ-ρω (compound exception)
aegean greek accent λόγος                            # paroxytone
aegean greek quantities πατρός                       # syllable quantities
aegean greek scan "ἄνδρα μοι ἔννεπε, Μοῦσα, πολύτροπον, ὃς μάλα πολλὰ"
aegean greek scan "ὦ κοινὸν αὐτάδελφον Ἰσμήνης κάρα" --meter trimeter  # iambic dialogue
aegean greek ipa "λόγος" --period koine              # reconstructed pronunciation
aegean greek tag "ἐν ἀρχῇ ἦν ὁ λόγος."               # UPOS per token
aegean greek lemmatize "μῆνιν ἄειδε θεά"             # lemma per word
aegean greek morph λόγον                             # candidate morphological parses
aegean greek pipeline "ἐν ἀρχῇ ἦν ὁ λόγος." --json   # per-token records, one call

Backend flags stand in for the `use_*()` activations. Each may download or build to the cache on first use (a note goes to stderr); after that, everything runs offline:

aegean greek pipeline "ἐν ἀρχῇ ἦν ὁ λόγος." --neural   # the joint neural pipeline
aegean greek parse "ἐν ἀρχῇ ἦν ὁ λόγος" --neural       # UD dependency tree
aegean greek tag "…" --treebank --tagger               # AGDT lookup + perceptron tagger
aegean greek gloss λόγος                               # LSJ gloss (~270 MB first use)
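
Since the first-use note goes to stderr, stdout stays a clean JSON document even on the run that triggers a download. A minimal sketch of keeping the two streams apart; `pipeline_to_files` and the file names `tokens.json` and `notes.log` are hypothetical.

```shell
# Hypothetical helper: run the neural pipeline, writing the JSON from stdout
# to tokens.json and any first-use download/build notes from stderr to notes.log.
pipeline_to_files() {
    text=$1
    aegean greek pipeline "$text" --neural --json > tokens.json 2> notes.log
}

# Usage:
# pipeline_to_files "ἐν ἀρχῇ ἦν ὁ λόγος." && cat notes.log >&2
```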

Real Greek works load on demand (Perseus canonical-greekLit / First1KGreek, CC BY-SA, commit-pinned, cached):

aegean greek work tlg0012.tlg001                 # the Iliad: 24 books, ~127k tokens
aegean greek work tlg0012.tlg001 -o iliad.json   # as a round-trippable corpus file
aegean greek work tlg0012.tlg001 --ref 1.1-1.50  # just book 1, lines 1–50

`aegean greek eval ud --treebank perseus --split test --neural` reproduces the published numbers through the official CoNLL 2018 evaluator (heavy: it fetches gold data and the model); `eval proiel|tagger|lemmatizer|parser` cover the other measured evaluations.

Analysis (`aegean analyze …`)

Exploratory surface analyses over the undeciphered material — evidence to weigh, not conclusions:

aegean analyze distance KU-RO KI-RO          # weighted phonetic distance
aegean analyze align KU-RO KI-RO             # per-position alignment
aegean analyze compare po-me ποιμήν          # cross-script: Linear B vs Greek by sound
aegean analyze nearest qa-si-re-u greek      # rank a corpus's words by sound (→ βασιλεύς)
aegean analyze assoc lineara KU-RO KI-RO     # χ², G², Fisher, PMI over shared documents
aegean analyze cooccur lineara KU-RO         # what shares a tablet with KU-RO
aegean analyze clusters lineara              # stem + productive-suffix clusters
aegean analyze structure lineara [HT13]      # accounting/libation/list/text census

`compare` and `nearest` take `--script-a`/`--script-b` (`greek` · `lineara` · `linearb` · `cypriot`) and `--fold-aspiration` (θ/φ/χ → t/p/k, fairer against syllabic spelling).
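
To see what `--fold-aspiration` changes, the same comparison can be run with and without it. This sketch assumes that `--script-a` describes the first word and `--script-b` the second, which the text above does not state; `compare_folded` is a hypothetical helper.

```shell
# Hypothetical helper: compare a Linear B spelling with a Greek word twice,
# once as-is and once with aspirates folded (θ/φ/χ → t/p/k).
# Assumption: --script-a describes the first word, --script-b the second.
compare_folded() {
    echo "# without --fold-aspiration"
    aegean analyze compare "$1" "$2" --script-a linearb --script-b greek
    echo "# with --fold-aspiration"
    aegean analyze compare "$1" "$2" --script-a linearb --script-b greek --fold-aspiration
}

# Usage (ka-ko is the Linear B spelling of χαλκός, "bronze"):
# compare_folded ka-ko χαλκός
```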

Data (`aegean data …`)

aegean data list      # the fetchable datasets (sizes, licenses)
aegean data fetch grc-joint    # pre-fetch (e.g. before going offline); sha256-verified
aegean data cache     # cache location + contents (override with PYAEGEAN_CACHE)
aegean data versions --json > data-versions.json   # pin every dataset's sha256 for a paper
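
A pinned `data-versions.json` can later be checked against the current cache. The sketch below assumes the `--json` output is byte-for-byte stable when no dataset has changed (not confirmed above); `check_data_pins` is a hypothetical helper built on the standard `diff` tool.

```shell
# Hypothetical helper: compare the current dataset hashes against a pinned
# file written earlier with `aegean data versions --json > data-versions.json`.
# Assumption: the JSON output is identical when nothing has changed.
check_data_pins() {
    pinned=${1:-data-versions.json}
    if aegean data versions --json | diff "$pinned" - > /dev/null; then
        echo "datasets match $pinned"
    else
        echo "datasets differ from $pinned" >&2
        return 1
    fi
}

# Usage:
# check_data_pins data-versions.json || exit 1
```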

AI (`aegean ai …`) — exploratory, key-gated

Generative commands need a provider extra (`pip install "pyaegean[anthropic]"`) and its API key in the environment; every result is labeled exploratory and carries its grounding:

aegean ai providers
aegean ai translate "ἐν ἀρχῇ ἦν ὁ λόγος"             # grounded hybrid translation
aegean ai translate "KU-RO 130" --script lineara      # exploratory (undeciphered!)
aegean ai gloss "μῆνιν ἄειδε θεά"                     # interlinear gloss
aegean ai hypotheses "A-TA-I-*301-WA-JA" --corpus lineara   # cautious decipherment hypotheses
aegean ai ask "What is KU-RO?" --corpus lineara --trace      # --trace audits the grounding
aegean ai extract "OLE S 1" --fields commodity,amount        # structured JSON, ready for jq
aegean ai eval --provider anthropic                          # grounding-fidelity eval

Add `--trace` to any of these to print the grounding provenance under the answer: the local corpus, lexicon, and analysis facts the model was given, grouped by source.
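
Because the generative commands are key-gated, a script can check for the key before calling them. This is a sketch under one loud assumption: that the Anthropic provider reads the conventional `ANTHROPIC_API_KEY` variable. The page only says the key lives in the environment. `ai_ask` is a hypothetical helper.

```shell
# Hypothetical guard: refuse to call a generative command when no key is set.
# Assumption: the anthropic provider reads ANTHROPIC_API_KEY.
ai_ask() {
    if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
        echo "ANTHROPIC_API_KEY is not set; skipping aegean ai" >&2
        return 1
    fi
    aegean ai ask "$1" --corpus lineara --trace
}

# Usage:
# ai_ask "What is KU-RO?"
```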

Recipes

Reconcile every account in the Linear A corpus and keep the failures:

aegean balance lineara --json | jq '[.[] | select(.balances | not)]' > discrepancies.json

Lemmatize a file of Greek, one lemma per line:

cat chapter.txt | aegean greek lemmatize - --json | jq -r '.[].lemma'

Scan a poem line-by-line, keeping only the lines that scan:

while read -r line; do aegean greek scan "$line" --json 2>/dev/null | jq -r .pattern; done < poem.txt
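
A variant of the loop above keeps the original text of each line that scans, rather than its pattern. It is a sketch that assumes `aegean greek scan` exits non-zero for a line it cannot scan, in keeping with the exit-code convention; `scanning_lines` is a hypothetical helper.

```shell
# Hypothetical variant: print the original text of each line that scans.
# Assumption: `aegean greek scan` exits non-zero for a line it cannot scan.
scanning_lines() {
    while IFS= read -r line; do
        # As a precaution, </dev/null keeps aegean away from the loop's input.
        if aegean greek scan "$line" --json < /dev/null > /dev/null 2>&1; then
            printf '%s\n' "$line"
        fi
    done < "$1"
}

# Usage:
# scanning_lines poem.txt > scans.txt
```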

Map the corpus's find-sites and cite the subset you used:

aegean geo lineara --output sites.geojson
aegean cite lineara --site "Zakros" --style bibtex >> paper.bib
