Geography

aegean.geo maps a corpus's find-sites to coordinates and hands you the corpus back as a geopandas GeoDataFrame — so you can ask where things are: where a word clusters, how far a script reaches, how a corpus spreads across Crete and the wider Aegean. You'd reach for it to draw a distribution map, run a spatial join, or export your find-sites as GeoJSON for QGIS, a web map, or a linked-open-data project.

It's an opt-in extra (pip install "pyaegean[geo]" — geopandas + shapely). import aegean stays instant and dependency-free; geopandas and shapely are imported lazily, only when you call a geo function, and that call raises a clear error if the extra isn't installed:

ImportError: geographic analysis needs the optional dependencies: pip install 'pyaegean[geo]'

Everything here is also reachable from the command line with aegean geo if you'd rather not write Python — see CLI. The table view (aegean geo CORPUS) works with just the core install; only GeoJSON export pulls in the [geo] extra.

Quick start (Python)

import aegean
from aegean import geo

corpus = aegean.load("lineara")

geo.to_geodataframe(corpus)                 # one row per inscription with a mapped site
geo.to_geodataframe(corpus, level="site")   # one row per site + its inscription count
geo.word_distribution(corpus, "KU-RO")      # the sites where KU-RO is attested, with counts
geo.site_coordinates()                       # the raw site -> coordinate gazetteer (no extra needed)

Each of the first three returns a geopandas.GeoDataFrame in EPSG:4326 (WGS84 lat/lon) with a geometry column of points — ready for .plot(), spatial joins, or export. Inscriptions whose site isn't in the gazetteer are silently dropped (see Coverage).

What's in the box

Object	What it gives you	Needs `[geo]`?
`geo.to_geodataframe(corpus)`	GeoDataFrame, one row per inscription	yes
`geo.to_geodataframe(corpus, level="site")`	GeoDataFrame, one row per site + count	yes
`geo.word_distribution(corpus, word)`	GeoDataFrame of sites where `word` occurs	yes
`geo.site_coordinates()`	`dict[str, SiteCoord]` — the bundled gazetteer	no
`geo.SiteCoord`	a single site's coordinates + Pleiades id	no

Corpus → GeoDataFrame

to_geodataframe is the workhorse. It walks the corpus, looks each inscription's meta.site up in the gazetteer, and builds point geometry from the matched coordinates. Two granularities:

`level="inscription"` — one row per text

import aegean
from aegean import geo

corpus = aegean.load("lineara")
gdf = geo.to_geodataframe(corpus)            # level="inscription" is the default

gdf.shape        # (1718, 7)
list(gdf.columns)
# ['id', 'site', 'label', 'region', 'period', 'pleiades', 'geometry']
gdf.crs          # <Geographic 2D CRS: EPSG:4326> ...
gdf.head(3)
#     id           site          label region period  pleiades             geometry
# 0  HT1  Haghia Triada  Haghia Triada  crete   LMIB  589672.0  POINT (24.79 35.06)
# 1  HT2  Haghia Triada  Haghia Triada  crete   LMIB  589672.0  POINT (24.79 35.06)
# 2  HT3  Haghia Triada  Haghia Triada  crete   LMIB  589672.0  POINT (24.79 35.06)

(1721 inscriptions in the Linear A corpus, 1718 of them with a site in the gazetteer.)

`level="site"` — one row per find-site

gdf = geo.to_geodataframe(corpus, level="site")

gdf.shape        # (52, 6)
list(gdf.columns)
# ['site', 'label', 'region', 'pleiades', 'inscriptions', 'geometry']
gdf.head(8)
#             site             label  region     pleiades  inscriptions             geometry
# 0  Haghia Triada     Haghia Triada   crete     589672.0          1110  POINT (24.79 35.06)
# 1         Khania            Khania   crete     589886.0           226  POINT (24.02 35.51)
# 2       Phaistos          Phaistos   crete     589987.0            66  POINT (24.81 35.05)
# 3        Knossos           Knossos   crete  781961476.0            59   POINT (25.16 35.3)
# 4         Zakros            Zakros   crete  650881089.0            53   POINT (26.26 35.1)
# 5    Palaikastro       Palaikastro   crete  213924739.0            25   POINT (26.27 35.2)
# 6          Malia             Malia   crete     589922.0            22  POINT (25.49 35.29)
# 7          Thera  Thera (Akrotiri)  aegean     599478.0            18   POINT (25.4 36.36)

Site rows come back sorted by inscription count (most prolific first). The total here (52) is the number of distinct located sites in the Linear A corpus.

Columns

Column	Levels	Type	Meaning
`id`	inscription	str	the inscription id (e.g. `HT1`)
`site`	both	str	the corpus's `meta.site` label (the gazetteer key)
`label`	both	str	the gazetteer's display name (may be fuller, e.g. `Iouktas (Mt Juktas)`)
`region`	both	str	one of the six region codes
`period`	inscription	str	the inscription's `meta.period` (e.g. `LMIB`)
`pleiades`	both	int / null	the Pleiades place id, if aligned
`inscriptions`	site	int	number of inscriptions from this site
`count`	(word_distribution)	int	number of inscriptions at this site that contain the word
`geometry`	all	Point	EPSG:4326 point, `POINT (lon lat)`

Note: pleiades arrives as a float in the GeoDataFrame (e.g. 589672.0) because the column holds nulls for unaligned sites and pandas promotes integer-with-nulls to float. The underlying value is still the integer place id; SiteCoord.pleiades gives you the clean int.

level only accepts "inscription" or "site"; anything else is a ValueError:

geo.to_geodataframe(corpus, level="county")
# ValueError: level must be 'inscription' or 'site'; got 'county'

CLI equivalent

aegean geo prints a located-sites table by default (no [geo] extra needed) and writes GeoJSON with --output. The CLI defaults to --level site.

aegean geo lineara
#        lineara: 52 located site(s) of 52
# ┌──────────────────┬───────┬───────┬───────────┐
# │ site             │ lat   │ lon   │ pleiades  │
# ├──────────────────┼───────┼───────┼───────────┤
# │ Apodoulou        │ 35.16 │ 24.73 │ 119143959 │
# │ Arkhalkhori      │ 35.15 │ 25.27 │ 220781958 │
# │ Armenoi          │ 35.3  │ 24.5  │           │
# │ ...              │       │       │           │
# └──────────────────┴───────┴───────┴───────────┘

aegean geo linearb
#     linearb: 3 located site(s) of 3
# ┌─────────┬───────┬───────┬───────────┐
# │ site    │ lat   │ lon   │ pleiades  │
# ├─────────┼───────┼───────┼───────────┤
# │ Knossos │ 35.3  │ 25.16 │ 781961476 │
# │ Mycenae │ 37.73 │ 22.75 │ 570491    │
# │ Pylos   │ 37.03 │ 21.7  │ 570640    │
# └─────────┴───────┴───────┴───────────┘

Machine-readable rows with --json:

aegean geo lineara --json
# [{"site": "Apodoulou", "lat": 35.16, "lon": 24.73, "pleiades": 119143959}, ... ]
# one object per located site; pleiades is "" when the site isn't aligned

`aegean geo` flags

Flag	Default	What it does
`CORPUS` (argument)	—	corpus id: `lineara`, `linearb`, `cypriot`, `cyprominoan`, `greek`, or a fetched corpus (`nt`, `damos`, `sigla`)
`--level`	`site`	`site` or `inscription` (only affects GeoJSON export)
`--output`, `-o`	—	write GeoJSON to this path instead of printing the table (needs `[geo]`)
`--json`	off	machine-readable JSON rows on stdout (table mode)
`--help`, `-h`	—	show usage and exit

Where a word shows up — `word_distribution`

word_distribution answers "where, across the corpus, does this word turn up?" It returns a site-level GeoDataFrame with a per-site count, sorted most-frequent first — exactly what you want to map a single term.

import aegean
from aegean import geo

corpus = aegean.load("lineara")
wd = geo.word_distribution(corpus, "KU-RO")      # the LA "total" word

wd.shape         # (3, 6)
list(wd.columns) # ['site', 'label', 'region', 'pleiades', 'count', 'geometry']
wd
#             site          label region   pleiades  count             geometry
# 0  Haghia Triada  Haghia Triada  crete     589672     32  POINT (24.79 35.06)
# 1       Phaistos       Phaistos  crete     589987      1  POINT (24.81 35.05)
# 2         Zakros         Zakros  crete  650881089      1   POINT (26.26 35.1)

The match is exact on the word token's surface form (t.text == word), so use the corpus's own transliteration (here, dash-joined sign sequences like KU-RO). See Linear A and Analysis for how to find the words worth mapping.

Edge case: if a word has zero hits the result has no rows, and geopandas can't infer the geometry column on an empty frame, so the call raises rather than returning an empty GeoDataFrame. Check that the word is attested first (e.g. with the corpus's concordance / counts).

There's no dedicated CLI subcommand for word_distribution — it's a Python-only helper.

The gazetteer

geo.site_coordinates() returns the bundled site → coordinate table — a dict[str, SiteCoord] keyed by the corpus's meta.site label. This is the one geo function that needs no extra; it's plain data. Coordinates are approximate (site-level, ~1 km), drawn from standard archaeological references — fine for mapping, not survey work.

from aegean import geo

coords = geo.site_coordinates()
len(coords)                       # 56
coords["Haghia Triada"]
# SiteCoord(name='Haghia Triada', lat=35.06, lon=24.79, region='crete', pleiades=589672)

`SiteCoord`

A frozen dataclass. Fields:

Field	Type	Meaning
`name`	str	display name (may be fuller than the corpus's site label)
`lat`	float	latitude, WGS84
`lon`	float	longitude, WGS84
`region`	str	one of the six region codes below
`pleiades`	int / None	Pleiades place id, if aligned (default `None`)
`pleiades_uri`	property → str / None	full `https://pleiades.stoa.org/places/<id>` URI, or `None`

sc = coords["Haghia Triada"]
sc.lat, sc.lon            # (35.06, 24.79)
sc.region                 # 'crete'
sc.pleiades               # 589672
sc.pleiades_uri           # 'https://pleiades.stoa.org/places/589672'

coords["Pyrgos"].pleiades_uri   # None  (not aligned)

The gazetteer covers the find-sites in all four Aegean-script corpora — the Cretan and Aegean Linear A sites, plus Pylos (Linear B), Cyprus, and the Cypro-Minoan sites Enkomi and Ugarit — and a few outliers like Tel Haror (Negev) and Margiana (Turkmenistan).

Regions

region is a controlled vocabulary of six values. The breakdown of the 56 gazetteer sites:

Region	Sites	What it covers
`crete`	40	the island of Crete (the bulk of Linear A)
`aegean`	5	the Aegean islands (Thera, Kea, Milos, Kythera, Samothrace)
`mainland`	4	the Greek mainland (Mycenae, Tiryns, Pylos, Hagios Stefanos)
`anatolia`	2	the Anatolian coast (Miletus, Troy)
`levant`	4	Cyprus and the Levantine coast (Enkomi, Ugarit, Tel Haror, Cyprus)
`remote`	1	far-flung outliers (Margiana, Turkmenistan)

Pleiades alignment

33 of the 56 find-sites are aligned to a Pleiades place id, for linked-open-data work. Every id is verified by coordinate — the Pleiades representative point is within a few km of ours and its description matches the site — so a match is confirmed, never guessed. It lives on SiteCoord.pleiades (an int), with SiteCoord.pleiades_uri giving the full https://pleiades.stoa.org/places/<id> URI, and surfaces as a pleiades column in the GeoDataFrames from to_geodataframe / word_distribution.

geo.site_coordinates()["Haghia Triada"].pleiades_uri
# 'https://pleiades.stoa.org/places/589672'

The remaining 23 sites are mostly minor findspots, peak sanctuaries, and caves not yet in Pleiades — left null, and listed as upstream-contribution candidates in docs/pleiades-candidates.md.

A few of the major aligned sites:

Site	Region	Pleiades id
Haghia Triada	crete	589672
Khania	crete	589886
Phaistos	crete	589987
Knossos	crete	781961476
Zakros	crete	650881089
Malia	crete	589922
Thera (Akrotiri)	aegean	599478
Pylos (Palace of Nestor)	mainland	570640
Mycenae	mainland	570491
Miletus	anatolia	599799
Enkomi	levant	13818291

Pull the full machine-readable list straight from the CLI:

aegean geo lineara --json
# every located site, with "pleiades" set to the id (or "" if unaligned)

GeoJSON export

A GeoDataFrame serialises to GeoJSON the standard geopandas way — both from Python and the CLI. The output is a FeatureCollection in EPSG:4326; the GeoDataFrame columns become each feature's properties, and geometry becomes a GeoJSON Point.

From the CLI

aegean geo lineara --level site -o la_sites.geojson
# wrote 52 features to la_sites.geojson

aegean geo lineara --level inscription -o la_inscriptions.geojson
# wrote 1718 features to la_inscriptions.geojson

The first feature of the site-level export:

{
  "id": "0",
  "type": "Feature",
  "properties": {
    "site": "Haghia Triada",
    "label": "Haghia Triada",
    "region": "crete",
    "pleiades": 589672.0,
    "inscriptions": 1110
  },
  "geometry": { "type": "Point", "coordinates": [24.79, 35.06] }
}

From Python

gdf = geo.to_geodataframe(corpus, level="site")

# (a) a GeoJSON string in memory:
gdf.to_json()[:60]
# '{"type": "FeatureCollection", "features": [{"id": "0", "type'

# (b) straight to a file (any geopandas-supported driver):
gdf.to_file("la_sites.geojson", driver="GeoJSON")

From there it drops straight into QGIS, a web map (Leaflet/Mapbox), or any GeoJSON-aware tool. Other tabular exports (CSV, Parquet, EpiDoc, SQLite) for the corpus itself live under aegean export; the geo path is specifically for spatial GeoJSON.

Plotting

A GeoDataFrame plots in one line (with matplotlib installed — that's the separate [viz] extra):

gdf = geo.to_geodataframe(corpus, level="site")
ax = gdf.plot()        # the find-sites as points
# overlay on a basemap of your choice, or size points by `inscriptions`:
gdf.plot(markersize="inscriptions" and gdf["inscriptions"] / 5)

For a quick word map, plot a word_distribution frame and scale by count:

wd = geo.word_distribution(corpus, "KU-RO")
wd.plot(markersize=wd["count"] * 10)

pyaegean doesn't ship its own basemap — bring your own (contextily, a shapefile of Crete, etc.). For non-spatial plots (sign frequencies, period histograms) see aegean.viz / CLI's aegean plot.

Coverage

to_geodataframe and word_distribution only emit rows for sites that are in the gazetteer; anything else is dropped. Per corpus:

Corpus	Docs	Located sites	Notes
`lineara`	1721	52 of 52 site labels	1718 docs have a mapped site; the rest have no/unknown site
`linearb`	18	3 of 3 (Knossos, Mycenae, Pylos)	the bundled Linear B sample
`cypriot`	2	1 of 1	small bundled sample
`cyprominoan`	2	2 of 2 (Enkomi, Ugarit)	small bundled sample

The gazetteer holds 56 sites total — more than any single corpus uses — so it already covers find-sites across all four scripts. The few Linear A inscriptions with no row simply carry no usable meta.site value.

Notes & limitations

Coordinates are approximate (~1 km, site-level). They're for mapping and distribution analysis, not for survey work or anything that needs trench-level precision. Don't measure distances and report metres.
Unmapped inscriptions are dropped silently in to_geodataframe / word_distribution. Compare len(corpus) against the GeoDataFrame's row count if you need to know how many were excluded; the CLI table prints "N located site(s) of M" so you can see the gap directly.
word_distribution raises on a zero-hit word rather than returning an empty frame (geopandas can't infer geometry on no rows). Check the word is attested first.
pleiades shows as a float in the GeoDataFrame because the column mixes ids with nulls; the id is still integral. Use SiteCoord.pleiades for the clean int.
23 sites have no Pleiades id — mostly minor findspots, peak sanctuaries, and caves. They're tracked as upstream-contribution candidates, not errors.
word_distribution matches the exact surface form. It won't normalise or fuzzy-match; pass the word as the corpus transliterates it.

See Limitations for the project-wide caveats.

Provenance

Coordinates are compiled from standard archaeological references (GORILA, Younger, public gazetteers) via the Linear A Research Workbench (Apache-2.0). See Data & Provenance and NOTICE.

Geography

Geography

Quick start (Python)

What's in the box

Corpus → GeoDataFrame

level="inscription" — one row per text

level="site" — one row per find-site

Columns

CLI equivalent

aegean geo flags

Where a word shows up — word_distribution

The gazetteer

SiteCoord

Regions

Pleiades alignment

GeoJSON export

From the CLI

From Python

Plotting

Coverage

Notes & limitations

Provenance

See also

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pyaegean

Clone this wiki locally

`level="inscription"` — one row per text

`level="site"` — one row per find-site

`aegean geo` flags

Where a word shows up — `word_distribution`

`SiteCoord`