FAQ
Common questions and the small snags that trip people up — especially if Python is new to you. If your problem isn't here, please open an issue.
Python can't find the library. Almost always one of:
- Your virtual environment isn't active. Re-activate it (you do this in every new terminal session): .venv\Scripts\Activate.ps1 on Windows, source .venv/bin/activate on macOS/Linux. Then try again. See Getting Started, Step 3.
- It isn't installed in this environment. Run pip install pyaegean.
- You installed with one Python and are running another. Check with pip show pyaegean and python --version.
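The three causes above can also be checked from inside Python itself. The following is a minimal stdlib-only sketch, assuming the import name is aegean (as in the examples on this page); the helper name environment_report is made up for illustration and is not part of the library.

```python
import importlib.util
import sys


def environment_report(module_name: str = "aegean") -> dict:
    """Describe the running interpreter and whether it can find a module."""
    return {
        # Path of the Python binary that is executing this code.
        "executable": sys.executable,
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # find_spec returns None when a top-level module cannot be found.
        "installed": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If executable points outside your .venv folder, or installed is False, the environment you activated is not the one running your code.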
Don't install system-wide or use sudo. Make a
virtual environment
and install into it — no special permissions needed.
If PowerShell refuses to run the activation script, run this once, then activate again:
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned

The core library has no hard third-party dependencies. pandas is an optional extra
for DataFrame output — install it with pip install "pyaegean[data]". It's imported
only when you call a DataFrame feature, which is part of why import aegean stays
fast.
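The deferred import described above is a common Python pattern. Here is a generic sketch of it, not pyaegean's actual code: the function name to_dataframe and the sample row are invented for illustration.

```python
def to_dataframe(rows):
    """Build a DataFrame, importing pandas only when this function is called."""
    try:
        # Deferred import: modules that never call this function never load pandas.
        import pandas as pd
    except ImportError as exc:
        raise ImportError(
            'pandas is required for DataFrame output; try: pip install "pyaegean[data]"'
        ) from exc
    return pd.DataFrame(rows)


if __name__ == "__main__":
    try:
        print(to_dataframe([{"sign": "A", "count": 3}]))
    except ImportError as err:
        print(err)
```

Because the import sits inside the function body, the cost of loading pandas is paid only by code that asks for a DataFrame.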
pip install --upgrade pyaegean

Python 3.10 or newer. Check with python --version.
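The same version check can be done from within a script. This is a small sketch using only the standard library; the names MIN_VERSION and python_is_supported are illustrative.

```python
import sys

MIN_VERSION = (3, 10)  # the minimum stated in this FAQ


def python_is_supported(version_info=sys.version_info) -> bool:
    """True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


if not python_is_supported():
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} is too old; 3.10 or newer is needed.")
```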
That's a display/font issue, not a data problem — the text is correct underneath.
- Best fix: use Jupyter or a modern editor (VS Code), which render polytonic Greek cleanly.
- In a Windows terminal: run chcp 65001 to switch it to UTF-8 first, and use a font with Greek coverage.
- If you write Greek to a file, open it as UTF-8.
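For the file case, naming the encoding explicitly on both write and read avoids depending on the platform default. A minimal sketch, with a made-up helper name and a temporary file:

```python
import tempfile
from pathlib import Path


def roundtrip_utf8(text: str, path: Path) -> str:
    """Write text as UTF-8, then read it back the same way."""
    path.write_text(text, encoding="utf-8")
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sample = "μῆνιν ἄειδε θεά"
    restored = roundtrip_utf8(sample, Path(tmp) / "greek.txt")
    print(restored == sample)  # True: the text survives unchanged
```

The built-in open() accepts the same encoding="utf-8" argument.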
You don't need one. Type Beta Code — the standard ASCII transliteration used by the TLG and Perseus — and convert:
from aegean import greek
greek.betacode_to_unicode("mh=nin") # 'μῆνιν'

See Greek NLP → Beta Code for the full key.
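To give a feel for how Beta Code maps onto Unicode, here is a deliberately tiny toy converter. It is not the library's implementation and it skips capitals, final sigma, and most of the standard. It only shows the idea: each ASCII symbol becomes a Greek letter or a combining mark, and NFC normalisation then fuses them. The name toy_betacode is invented for this sketch.

```python
import unicodedata

# Lowercase letters only; a small subset of the full Beta Code key.
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ", "h": "η",
    "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν", "c": "ξ",
    "o": "ο", "p": "π", "r": "ρ", "s": "σ", "t": "τ", "u": "υ", "f": "φ",
    "x": "χ", "y": "ψ", "w": "ω",
}
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
}


def toy_betacode(text: str) -> str:
    """Map each symbol, then compose letters with their marks via NFC."""
    mapped = "".join(LETTERS.get(ch) or DIACRITICS.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", mapped)


print(toy_betacode("mh=nin"))
```

For real text, use greek.betacode_to_unicode, which follows the full key.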
Unicode has more than one way to encode the same accented letter. Normalise first:
greek.normalize("ό") # canonical NFC form

No. The core library, the full Linear A corpus, and the Greek pipeline all work
offline. A few opt-in things touch the network on first use, then cache:
data.fetch(...) for large extra assets (the facsimile images), the optional AI layer,
and the opt-in Greek backends — greek.use_treebank() (~75 MB AGDT), greek.use_lsj()
(~270 MB Perseus LSJ), greek.use_parser() (downloads the AGDT if needed, then
trains), and greek.use_neural_lemmatizer() (~232 MB ONNX model). Everything else,
including the rule-based pipeline, works fully offline.
Only for the AI Layer (translation, glossing, decipherment
hypotheses). Everything else — analysis, scansion, morphology, statistics — needs
no key and no account. To use AI, install a provider extra and set its key, e.g.
pip install "pyaegean[anthropic]" and ANTHROPIC_API_KEY.
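Before calling anything in the AI layer, it can help to confirm that the key is visible to your Python process. This is a stdlib-only sketch; the helper name api_key_is_set is made up, and how the library itself reads the key is not shown here.

```python
import os


def api_key_is_set(var_name: str = "ANTHROPIC_API_KEY") -> bool:
    """True if the environment variable exists and is not blank."""
    return bool(os.environ.get(var_name, "").strip())


if not api_key_is_set():
    print("ANTHROPIC_API_KEY is not set; set it before using the AI layer.")
```

Environment variables set in one terminal session are not visible in another, so run this check in the same session you run your script from.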
from aegean import data
data.cache_dir() # the cache location (override with the PYAEGEAN_CACHE env var)

No, and the library is built to keep you honest about this. Linear A is undeciphered. The phonetic values come from Linear B as a working convention, and every analytical or AI method is labeled exploratory: evidence to weigh, never a translation. Treat results as leads for a human expert, not answers.
It's a hypothesis from a language model, returned as an ExploratoryResult with
provenance and an unmistakable exploratory label. Useful for ideas; never citable
as fact. Always verify against primary scholarship.
The default rule/seed engines are an offline baseline: high-precision on closed
classes (article, prepositions, pronouns…) and regular paradigms, but they miss
irregular, third-declension, contract, and most open-class forms — and they tell you
when a result is reconstructed (lemma_certain=False).
Several opt-in backends raise accuracy well past that baseline. greek.use_treebank()
supplies attested, correctly-accented lemmas, full morphology, and gold POS for forms
attested in the AGDT. For forms outside the treebank, greek.use_tagger() generalizes
POS at ~84% on unseen forms, and lemmatization generalizes too:
greek.use_neural_lemmatizer() (a GreTa seq2seq, the [neural] extra) reaches 76.3%
on unseen forms, while the zero-dependency greek.use_lemmatizer() (edit-trees +
perceptron) reaches ~40%. Quantify any combination on your own gold set with
benchmark.compare_modes(). For meaning, opt into greek.use_lsj() (LSJ glossing);
for syntax, greek.use_parser() (a dependency parser, ~0.67 UAS / 0.57 LAS on
projective AGDT). See Treebank-backed mode
and Morphological analysis.
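For readers new to the parser metrics quoted above: UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct, and LAS (labeled attachment score) is the share whose head and relation label are both correct. The sketch below computes both on a made-up four-token example. It is not benchmark.compare_modes(), and the library's own evaluation may differ in details such as punctuation handling.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for aligned lists of (head_index, relation) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have one entry per token")
    if not gold:
        raise ValueError("need at least one token")
    # UAS counts tokens whose head matches; LAS also requires the label to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(tuple(g) == tuple(p) for g, p in zip(gold, predicted))
    return head_hits / len(gold), full_hits / len(gold)


# Invented data: head index 0 stands for the root; labels are illustrative.
gold = [(2, "ATR"), (0, "PRED"), (2, "OBJ"), (3, "ATR")]
pred = [(2, "ATR"), (0, "PRED"), (2, "SBJ"), (2, "ATR")]

uas, las = attachment_scores(gold, pred)
print(uas, las)  # 0.75 0.5
```

Here three of four heads are right (UAS 0.75), but one of those three carries the wrong label, so only two tokens are fully correct (LAS 0.5).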
Every corpus carries its citation, and the repo ships a CITATION.cff:
import aegean

corpus = aegean.load("lineara")
corpus.provenance.cite()
# Godart, L. & Olivier, J.-P. (1976–1985). Recueil des inscriptions en linéaire A. — https://github.com/mwenge/lineara.xyz

See Data & Provenance for full licensing and attribution.
- Bugs / feature requests: GitHub Issues.
- How a function behaves: the per-domain reference pages — Linear A, Analysis, Greek NLP, AI Layer.
- Contributing a fix or a script plugin: Development.