-
Notifications
You must be signed in to change notification settings - Fork 0
Development
git clone https://github.com/ryanpavlicek/pyaegean
cd pyaegean
pip install -e ".[dev]"Every change must pass the full gate before it lands on main:
pytest # full test suite
python -c "import aegean; print(len(aegean.load('lineara')))" # 1721
ruff check src tests # lint (clean)
mypy # type-check (clean; enforced in CI)
python -m build && python -m twine check dist/*
python scripts/check_footprint.py --wheel "dist/*.whl" # wheel = code + JSON only
python scripts/check_footprint.py # import-clean + import-fastCI (GitHub Actions) runs ruff, mypy (enforcing), the pytest matrix across
Python 3.10–3.13, a build job with twine check, and a footprint job
(scripts/check_footprint.py) that asserts import aegean loads no heavy deps,
imports fast, and that the wheel ships only code + JSON.
-
Lazy heavy imports.
numpy/pandas,onnxruntime, and provider SDKs are imported inside the functions that need them. Keepimport aegeaninstant. - No bundled binaries. Large assets go through the download-to-cache layer.
- Exploratory labeling. Any cross-linguistic, accounting, decipherment, or AI method documents its caveat and is labeled unverified at point of use.
-
Faithful ports. Behavior ported from the Linear A Workbench asserts against
shared golden fixtures (
tests/fixtures/golden/) so the Python port can't silently diverge.
- Subclass
core.Script(id,name,sign_inventory,tokenize) andregister()it. - Register a corpus loader with
core.corpus.register_loader(id, fn). - Import your plugin from
aegean/scripts/__init__.pyso it registers onimport aegean.
See aegean/scripts/lineara and aegean/scripts/greek for worked examples, and
the Architecture page for the layering rules.
src/aegean/
core/ model corpus script provenance numerals
scripts/ lineara/{loader,inventory,phonetic} greek/{loader,inventory}
linearb/{loader,inventory,phonetic,lexicon,epidoc}
cypriot/{loader,inventory,phonetic,lexicon}
cyprominoan/{loader,inventory} (undeciphered — signs only)
analysis/ distance align collocation morphology patterns query accounting structure
greek/ normalize tokenize syllabify accent prosody meter phonology pos morphology lemmatize
pipeline (one-call records) treebank (AGDT lexicon) lexicon (LSJ)
syntax (dependency parser) neural_lemmatizer (GreTa seq2seq)
joint (neural pipeline) mst udfeats
proiel (out-of-AGDT eval) ud (UD eval) benchmark (harness)
io/ epidoc (TEI export) tabular (CSV/Parquet export)
cli/ the `aegean` command ([cli] extra; typer+rich, imported only by the console script)
ai/ client cache providers grounding capabilities
translate/ (hybrid lexicon+LLM)
data/ bundled/{lineara,linearb,cypriot,cyprominoan,greek}/*.json + fetch()/cache
tests/ fixtures/golden/ (+ parity, property, and corpus tests)
wiki/ this documentation (published to the GitHub wiki by CI)
docs/ benchmarks.md, methodology.md, large-corpora.md + the API-reference site (mkdocs)
These pages live in wiki/ in the main repo and are published to the GitHub
wiki automatically by .github/workflows/wiki.yml on push to main. Edit the
Markdown here and open a normal change — do not edit the wiki UI directly, or
the next sync will overwrite it.
Start here
Aegean scripts
Greek
Capabilities
Reference