FAQ
Common questions and the small snags that trip people up — especially if Python is new to you. If your problem isn't here, please open an issue.
Python can't find the library. Almost always one of:
- Your virtual environment isn't active. Re-activate it (you do this every new terminal session) with .venv\Scripts\Activate.ps1 on Windows or source .venv/bin/activate on macOS/Linux, then try again. See Getting Started, Step 3.
- It isn't installed in this environment. Run pip install pyaegean.
- You installed with one Python and are running another. Check with pip show pyaegean and python --version.
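The three causes above can be checked from inside Python itself. The following is a minimal sketch using only the standard library; the helper name diagnose_import is hypothetical and not part of pyaegean. It reports which interpreter is running, whether a virtual environment looks active, and whether the aegean module is importable from that interpreter.

```python
import importlib.util
import sys


def diagnose_import(module_name: str = "aegean") -> dict:
    """Collect the facts behind most 'module not found' errors.

    Hypothetical helper for illustration; not part of pyaegean.
    """
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        # The interpreter that is actually running this code.
        "interpreter": sys.executable,
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "venv_active": sys.prefix != sys.base_prefix,
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in diagnose_import().items():
        print(f"{key}: {value}")
```

If venv_active is False, or interpreter points somewhere other than your project's .venv, the environment is the likely culprit rather than the install.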
Don't install system-wide or use sudo. Make a virtual environment and install into it; no special permissions are needed.
Run this once, then activate again:
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned
The core library has no hard third-party dependencies. pandas is an optional extra
for DataFrame output — install it with pip install "pyaegean[data]". It's imported
only when you call a DataFrame feature, which is part of why import aegean stays
fast.
pip install --upgrade pyaegean
Python 3.10 or newer. Check with python --version.
That's a display/font issue, not a data problem — the text is correct underneath.
- Best fix: use Jupyter or a modern editor (VS Code), which render polytonic Greek cleanly.
- In a Windows terminal: run chcp 65001 to switch it to UTF-8 first, and use a font with Greek coverage.
- If you write Greek to a file, open it as UTF-8.
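For the last point, a small standard-library sketch of writing and reading polytonic Greek with an explicit encoding. The file name is made up for illustration; the key detail is passing encoding="utf-8" on both sides, since the platform default encoding is not UTF-8 on every system.

```python
import tempfile
from pathlib import Path

text = "μῆνιν ἄειδε θεά"

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative file name; any path works.
    path = Path(tmp) / "greek_sample.txt"
    # Name the encoding explicitly instead of relying on the platform default.
    path.write_text(text, encoding="utf-8")
    round_tripped = path.read_text(encoding="utf-8")

# The text survives the round trip unchanged when both sides agree on UTF-8.
print(round_tripped == text)
```

If the same file looks garbled in another program, check that program's encoding setting before suspecting the data.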
You don't need one. Type Beta Code — the standard ASCII transliteration used by the TLG and Perseus — and convert:
from aegean import greek
greek.betacode_to_unicode("mh=nin") # 'μῆνιν'
See Greek NLP → Beta Code for the full key.
Unicode has more than one way to encode the same accented letter. Normalise first:
greek.normalize("ό") # canonical NFC form
No. The core library, the full Linear A corpus, and the Greek pipeline all work
offline. A few opt-in things touch the network on first use, then cache:
the fetched corpora — aegean.load("damos") (the full ~5,900-tablet Linear B corpus,
~2 MB) and aegean.load("sigla") (the SigLA Linear A dataset, ~1 MB) —
data.fetch(...) for large extra assets (the facsimile images), the optional AI layer,
and the opt-in Greek backends. The treebank/LSJ/tagger/lemmatizer/parser backends now
fetch small prebuilt artifacts — greek.use_lsj() a ~15 MB index (not 270 MB of
Perseus TEI), and greek.use_treebank() / use_tagger() / use_lemmatizer() /
use_parser() one shared ~15 MB AGDT-derived bundle (no 75 MB download or local
training) — falling back to building from source if an asset is unreachable. The
[neural] models are larger: greek.use_neural_lemmatizer() (~232 MB) and
greek.use_neural_pipeline() (~518 MB). Everything else, including the rule-based
pipeline, works fully offline.
Only for the AI Layer (translation, glossing, decipherment
hypotheses). Everything else — analysis, scansion, morphology, statistics — needs
no key and no account. To use AI, install a provider extra and set its key, e.g.
pip install "pyaegean[anthropic]" and ANTHROPIC_API_KEY.
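Before calling an AI feature, it can help to confirm the key is actually visible to the Python process. This is a minimal sketch with the standard library only; has_api_key is a hypothetical helper, not a pyaegean function, and it assumes nothing about how the library itself reads the key beyond the ANTHROPIC_API_KEY variable named above.

```python
import os


def has_api_key(var_name: str = "ANTHROPIC_API_KEY", environ=None) -> bool:
    """Return True if the variable is set to a non-blank value.

    Hypothetical helper for illustration; `environ` can be swapped in for testing.
    """
    env = os.environ if environ is None else environ
    # Treat an empty or whitespace-only value the same as a missing one.
    return bool(env.get(var_name, "").strip())


if __name__ == "__main__":
    if has_api_key():
        print("ANTHROPIC_API_KEY is set")
    else:
        print("ANTHROPIC_API_KEY is missing; set it in this terminal session first")
```

A key exported in one terminal is not visible in another, so run the check in the same session where you start Python.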
from aegean import data
data.cache_dir() # the cache location (override with the PYAEGEAN_CACHE env var)
No, and the library is built to keep you honest about this. Linear A is undeciphered. The phonetic values come from Linear B as a working convention, and every analytical or AI method is labeled exploratory: evidence to weigh, never a translation. Treat results as leads for a human expert, not answers.
It's a hypothesis from a language model, returned as an ExploratoryResult with
provenance and an unmistakable exploratory label. Useful for ideas; never citable
as fact. Always verify against primary scholarship.
The default rule/seed engines are an offline baseline: high-precision on closed
classes (article, prepositions, pronouns…) and regular paradigms, but they miss
irregular, third-declension, contract, and most open-class forms — and they tell you
when a result is reconstructed (lemma_certain=False).
Several opt-in backends raise accuracy well past that baseline. The strongest is the
neural pipeline — greek.use_neural_pipeline() (the [neural] extra): one joint
model for POS, morphology, UD dependency parsing, and lemmatization, state of the art
on the UD Ancient Greek benchmarks (96.9 UPOS / 96.1 UFeats / 94.4 lemma / 89.2 UAS /
84.4 LAS on the Perseus test fold, measured end-to-end from raw text — see
the neural pipeline). The lighter tiers:
greek.use_treebank() supplies attested, correctly-accented lemmas, full morphology,
and gold POS for forms attested in the AGDT; greek.use_tagger() generalizes POS at
~84% on unseen forms; greek.use_neural_lemmatizer() (a GreTa seq2seq) reaches 76.3%
on unseen forms, while the zero-dependency greek.use_lemmatizer() (edit-trees +
perceptron) reaches ~40%; greek.use_parser() is a pure-Python dependency parser
(~0.67 UAS / 0.57 LAS on projective AGDT). Quantify any combination on your own gold
set with benchmark.compare_modes(). For meaning, opt into greek.use_lsj() (LSJ
glossing). See Treebank-backed mode and Morphological analysis.
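The figures above separate attested forms from unseen ones, and the same split is worth making on your own gold set. The following is a plain-Python sketch of that measurement, not the implementation of benchmark.compare_modes(); the function name, the input shapes, and the tiny sample data are all assumptions made for illustration.

```python
def accuracy_by_seen(gold, predicted_lemmas, training_forms):
    """Lemma accuracy, split by whether the form was seen in training.

    gold: list of (form, gold_lemma) pairs.
    predicted_lemmas: list of predicted lemmas, aligned with gold.
    training_forms: set of forms the backend has already seen.
    Hypothetical helper for illustration only.
    """
    if len(gold) != len(predicted_lemmas):
        raise ValueError("gold and predictions must be aligned token by token")
    buckets = {"seen": [0, 0], "unseen": [0, 0]}  # [correct, total]
    for (form, gold_lemma), predicted in zip(gold, predicted_lemmas):
        key = "seen" if form in training_forms else "unseen"
        buckets[key][1] += 1
        buckets[key][0] += int(gold_lemma == predicted)
    # None marks an empty bucket, so it is not mistaken for 0% accuracy.
    return {k: (c / t if t else None) for k, (c, t) in buckets.items()}


# Made-up sample: three tokens, one of which was seen in training.
gold = [("μῆνιν", "μῆνις"), ("ἄειδε", "ἀείδω"), ("θεά", "θεά")]
predicted = ["μῆνις", "ἄειδω", "θεά"]
scores = accuracy_by_seen(gold, predicted, training_forms={"μῆνιν"})
print(scores == {"seen": 1.0, "unseen": 0.5})
```

Reporting the two buckets separately keeps a high score on memorised forms from hiding weak generalisation.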
Every corpus carries its citation, and the repo ships a CITATION.cff:
import aegean
corpus = aegean.load("lineara")
corpus.cite() # one line; also corpus.cite("bibtex") / corpus.cite("apa")
# Godart, L. & Olivier, J.-P. (1976–1985). Recueil des inscriptions en linéaire A. — https://github.com/mwenge/lineara.xyz
The citation follows the exact subset you used: a filtered corpus records what was filtered, and query results record the query:
corpus.filter(site="Haghia Triada").cite()
# … — https://github.com/mwenge/lineara.xyz [subset: filter(site='Haghia Triada') → 1110 of 1721 documents]
results = corpus.query([FilterRow("word-prefix", "KU")], output="words")
results.cite() # … [query: Word starts with: KU → N words]
See Data & Provenance for full licensing and attribution.
- Bugs / feature requests: GitHub Issues. Please include your pyaegean version (python -c "import aegean; print(aegean.__version__)").
- How a function behaves: the per-domain reference pages (Linear A, Analysis, Greek NLP, AI Layer).
- Contributing a fix or a script plugin: Development.
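When filing a bug, the version alone often isn't enough to reproduce it. The sketch below gathers a few extra details with the standard library; bug_report_info is a hypothetical helper, and it assumes only that the package is distributed under the name pyaegean, as in the install commands above.

```python
import platform
import sys
from importlib import metadata


def bug_report_info(dist_name: str = "pyaegean") -> str:
    """Build a short environment summary to paste into an issue.

    Hypothetical helper for illustration; not part of pyaegean.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed for this interpreter.
        version = "not installed in this environment"
    return "\n".join([
        f"{dist_name}: {version}",
        f"python: {platform.python_version()}",
        f"platform: {platform.platform()}",
        f"interpreter: {sys.executable}",
    ])


if __name__ == "__main__":
    print(bug_report_info())
```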