geoextent v0.13.0
The biggest release since the project went public: 79 commits, 23+ new content providers, a new plain-text source backed by spaCy NER, journal article landing-page support, and EPSG:4326-native coordinate order on by default.
Cite: 10.5281/zenodo.3925693 · PyPI: geoextent 0.13.0 · Docs: https://nuest.github.io/geoextent/
Highlights
🆕 Plain-text source via spaCy NER (#112)
A new handle_text handler runs spaCy NER over plain-text inputs, resolves place mentions through a gazetteer (Nominatim by default — no API key), and resolves named geological time periods through a bundled ICS GTS2020 chart (178 eons, eras, periods, epochs, ages). Calendar dates, decade and century envelopes, range expressions ("between 2010 and 2015"), and deep-time periods (Holocene, Mesozoic Era, Late Cretaceous) all flow into bbox / tbox. Pre-CE temporal extents use signed ISO 8601 year strings.
- New optional install extra: pip install geoextent[nlp]
- New from_text() API; new --text STRING / - (stdin) / file / directory inputs
- New --annotate {auto,ansi,brackets,off} for in-terminal match highlighting
- Standoff offsets contract: every mention carries char_start / char_end indices into an NFC-normalised source_text echo
See the text extraction how-to and the highlighting how-to.
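The standoff contract can be illustrated without geoextent itself. The sketch below is not the library's implementation: the helper find_mention and its dict layout are hypothetical, the end-exclusive offset is an assumption, and only the char_start / char_end / source_text names come from the notes above. It shows why offsets are taken against the NFC-normalised text rather than the raw input.

```python
import unicodedata


def find_mention(raw_text: str, mention: str) -> dict:
    """Locate `mention` in the NFC-normalised form of `raw_text`.

    Hypothetical helper for illustration only; geoextent's real mention
    records may carry more fields.
    """
    source_text = unicodedata.normalize("NFC", raw_text)
    needle = unicodedata.normalize("NFC", mention)
    start = source_text.find(needle)
    if start < 0:
        raise ValueError(f"{mention!r} not found")
    return {
        "source_text": source_text,       # the normalised echo
        "char_start": start,
        "char_end": start + len(needle),  # assumed end-exclusive
    }


# The place name is written with a combining diaeresis (decomposed form),
# so the raw string is one code point longer than its NFC form.
raw = "Samples from Mu\u0308nster, 2015"
m = find_mention(raw, "M\u00fcnster")

# Slicing the echoed source_text with the offsets recovers the mention.
print(m["source_text"][m["char_start"]:m["char_end"]])
```

Because the offsets index the echoed source_text, a consumer never needs to repeat the normalisation step to highlight a match.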
🆕 Journal landing-page support: journals/ umbrella (#76)
A new journals/ content-provider package extracts spatial and temporal extent from journal article landing pages with the ojsGeo (Open Journal Systems) or janeway_geometadata plugins. Source-preference priority is richer-geometry-first: JSON-LD spatialCoverage → <link rel="alternate" type="application/geo+json"> → DC.SpatialCoverage (GeoJSON / WKT) → DC.box → ISO 19139 EX_GeographicBoundingBox → OJS administrativeUnits → ICBM / geo.position points.
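The richer-geometry-first rule amounts to walking an ordered list and taking the first source the page actually provides. A minimal sketch, assuming the parsed page is available as a dict keyed by source name; the key strings and pick_spatial_source are hypothetical labels, not identifiers from the journals/ package.

```python
from typing import Any, Optional

# Order mirrors the preference chain described above; the key names are
# illustrative labels only.
SOURCE_PRIORITY = [
    "jsonld_spatial_coverage",
    "link_alternate_geojson",
    "dc_spatial_coverage",
    "dc_box",
    "iso19139_bounding_box",
    "ojs_administrative_units",
    "icbm_geo_position",
]


def pick_spatial_source(found: dict[str, Any]) -> Optional[tuple[str, Any]]:
    """Return (name, payload) for the highest-priority non-empty source."""
    for name in SOURCE_PRIORITY:
        payload = found.get(name)
        if payload:  # skip missing or empty entries
            return name, payload
    return None


# A page that exposes both a DC.box and an ICBM point.
page = {
    "icbm_geo_position": (51.96, 7.63),
    "dc_box": "northlimit=52.1; southlimit=51.8; westlimit=7.4; eastlimit=7.8",
}
choice = pick_spatial_source(page)
print(choice[0])  # the box outranks the point
```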
The existing Pensoft provider was refactored into the same hierarchy (public API unchanged). Article DOIs are lifted from the HTML head — JSON-LD identifier, citation_doi, prism.doi, DC.Identifier — and fed into --external-metadata enrichment so a user can pass a journal article URL (not a DOI) and still get CrossRef / DataCite records.
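As a rough illustration of DOI lifting, the sketch below scans meta tags in the HTML head using only the standard library. It is an assumption-based example, not the provider's code: DoiMetaParser and lift_doi are hypothetical names, JSON-LD handling is left out, and the DOI 10.1234/example is a placeholder.

```python
from html.parser import HTMLParser
from typing import Optional

# Meta names checked, in the order listed above (JSON-LD omitted here).
DOI_META_NAMES = ("citation_doi", "prism.doi", "dc.identifier")
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


class DoiMetaParser(HTMLParser):
    """Collect the first content value of each DOI-bearing <meta> name."""

    def __init__(self) -> None:
        super().__init__()
        self.values: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name in DOI_META_NAMES and content and name not in self.values:
            self.values[name] = content.strip()


def lift_doi(html: str) -> Optional[str]:
    """Return a bare DOI from the first matching meta tag, or None."""
    parser = DoiMetaParser()
    parser.feed(html)
    for name in DOI_META_NAMES:
        value = parser.values.get(name, "")
        for prefix in RESOLVER_PREFIXES:
            if value.lower().startswith(prefix):
                value = value[len(prefix):]
                break
        if value.startswith("10."):  # crude check that this looks like a DOI
            return value
    return None


html = (
    '<head>'
    '<meta name="DC.Identifier" content="https://journal.example/article/42"/>'
    '<meta name="prism.doi" content="doi:10.1234/example"/>'
    '</head>'
)
print(lift_doi(html))
```

The "10." prefix check matters because DC.Identifier often holds a plain URL rather than a DOI, so a non-DOI value should not stop the search.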
See the journals examples page.
🆕 28 new content providers
Repositories: InvenioRDM (generalised from Zenodo, covers CaltechDATA, TU Wien, Frei-Data, GEO Knowledge Hub, TU Graz, Materials Cloud Archive, FDAT, DataPLANT ARChive, KTH, Prism, NYU Ultraviolet) · B2SHARE (EUDAT) · Mendeley Data · 4TU.ResearchData · RADAR · NSF Arctic Data Center · DEIMS-SDR (follows external DOIs) · HALO DB · GBIF · SEANOE · UKCEH · NFDI4Earth Knowledge Hub · DataONE (covers KNB, PISCO, EDI/LTER, NEON, BCO-DMO, ESS-DIVE …) · Wikidata (via SPARQL) · Dataverse instances: ioerDATA, heiDATA, Edmond · CSW-based: BAW, MDI-DE, GDI-DE · STAC (any compliant Collection) · Generic CKAN (data.gov.uk, GovData.de, data.gov.au, …) · GitHub, GitLab, Forgejo/Gitea, Software Heritage · GeoScienceWorld · journals/ umbrella (OJS, Janeway, Pensoft re-folded)
🆕 New file-format support
- Cloud Optimized GeoTIFF (COG) over HTTP(S) via GDAL /vsicurl/: header-only metadata extraction without downloading the file (#11)
- Point clouds (LAS/LAZ) via laspy: header-only bbox extraction and temporal extent from creation date
- Esri File Geodatabase (.gdb) via GDAL's OpenFileGDB driver
- Zarr (.zarr) V2 and V3 via GDAL's Zarr driver (#9)
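For the remote COG case, header-only extraction comes down to two steps: point GDAL at the URL through /vsicurl/ and derive the bounding box from the geotransform and raster size. The sketch below keeps the arithmetic GDAL-free so it runs anywhere; vsicurl_path and bbox_from_geotransform are hypothetical helper names, and how geoextent does this internally is not stated in these notes.

```python
def vsicurl_path(url: str) -> str:
    """Prefix an HTTP(S) URL so GDAL reads it through its /vsicurl/ handler."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL")
    return "/vsicurl/" + url


def bbox_from_geotransform(gt, width, height):
    """Return (minx, miny, maxx, maxy) in the raster's own CRS.

    `gt` is a 6-element GDAL geotransform. All four corners are transformed
    so that rotated rasters are handled as well.
    """
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    xs = [gt[0] + col * gt[1] + row * gt[2] for col, row in corners]
    ys = [gt[3] + col * gt[4] + row * gt[5] for col, row in corners]
    return min(xs), min(ys), max(xs), max(ys)


# With GDAL installed, the inputs would come from the file header:
#   from osgeo import gdal
#   ds = gdal.Open(vsicurl_path("https://host.example/scene.tif"))
#   bbox_from_geotransform(ds.GetGeoTransform(), ds.RasterXSize, ds.RasterYSize)
# Here a made-up north-up geotransform stands in for a real dataset.
gt = (7.0, 0.01, 0.0, 52.0, 0.0, -0.01)
bbox = bbox_from_geotransform(gt, width=100, height=50)
print(bbox)
```

The result is in the raster's native CRS; reprojecting to EPSG:4326 is a separate step not shown here.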
🆕 Other features
- --metadata-first smart strategy: try metadata-only extraction, fall back to data download. Automatic metadata fallback after empty data downloads is on by default (opt out with --no-metadata-fallback).
- --time-format for configurable temporal output: date-only (default), ISO 8601, or any strftime format (#39)
- -p / --parallel thread-based parallel file extraction within directories; workers=N API parameter (#34)
- progress_callback structured progress for from_file() / from_directory() / from_remote(), with three built-in callbacks (tqdm, logging, list-collecting) (#80)
- --map, --preview, --map-dim for static map preview on OpenStreetMap tiles (terminal display via term-image; pip install geoextent[preview]) (#35)
- --join to merge multiple --output exports into one file (GPKG / GeoJSON / CSV)
- AppImage for portable Linux distribution: single-file executable bundling Python + GDAL + PROJ via conda-forge + appimagetool; built on every tag push (#40)
- Interactive download-size confirmation (--max-download-size with download_size_soft_limit=True)
- Temporal extent from raster files: NetCDF CF time dimensions, GeoTIFF TIFFTAG_DATETIME, ACDD time_coverage_start/end, band-level ACQUISITIONDATETIME (#22)
- GeoCSV format: CoordX/CoordY columns, .prj sidecars, WKT geometry columns, EarthScope #-prefixed metadata (#52)
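The metadata-first strategy in the list above is a plain try-then-fall-back pattern. A minimal sketch, assuming two caller-supplied extractors; extract_metadata_first and the stub functions are hypothetical and do not mirror geoextent's internal function names.

```python
from typing import Callable, Optional

Extent = dict  # e.g. {"bbox": [...], "tbox": [...]}


def extract_metadata_first(
    source: str,
    from_metadata: Callable[[str], Optional[Extent]],
    from_data: Callable[[str], Optional[Extent]],
) -> tuple[Optional[Extent], str]:
    """Try the cheap metadata path; download data only if that yields nothing."""
    extent = from_metadata(source)
    if extent and (extent.get("bbox") or extent.get("tbox")):
        return extent, "metadata"
    return from_data(source), "data"


# Stub extractors standing in for real provider calls.
def no_metadata(_source: str) -> Optional[Extent]:
    return None  # the repository record carries no usable extent


def download_and_scan(_source: str) -> Optional[Extent]:
    return {"bbox": [51.84, 7.47, 52.06, 7.77]}


extent, path_taken = extract_metadata_first(
    "10.1234/example", no_metadata, download_and_scan
)
print(path_taken, extent)
```

Returning the path taken alongside the extent keeps it visible to the caller whether a download was needed.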
⚠️ Breaking changes
- Default coordinate order is now EPSG:4326 native [lat, lon] for plain bounding boxes. Output bbox is [minlat, minlon, maxlat, maxlon] instead of [minlon, minlat, maxlon, maxlat]. GeoJSON output continues to use [lon, lat] per RFC 7946. Pass --legacy (or legacy=True in the API) to keep the previous [lon, lat] order for plain bboxes.
- API renamed to snake_case (PEP 8): from_file(), from_directory(), from_remote(). The camelCase aliases (fromFile, fromDirectory, fromRemote) and from_repository() are removed. Internal handler modules and functions are also snake_case.
- Drop support for bare numeric Zenodo record IDs (e.g. 820562); use the DOI (10.5281/zenodo.820562) or URL (https://zenodo.org/records/820562).
- --placename default service changed from geonames (which requires GEONAMES_USERNAME) to nominatim (no API key). Use --placename-service geonames to opt back.
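Downstream code that stored bboxes from earlier releases may need a one-off conversion to the new order. The helpers below are a hypothetical migration sketch, not part of geoextent; they only reorder the four numbers as described in the first breaking change.

```python
def legacy_to_native(bbox):
    """[minlon, minlat, maxlon, maxlat] -> [minlat, minlon, maxlat, maxlon]."""
    minlon, minlat, maxlon, maxlat = bbox
    return [minlat, minlon, maxlat, maxlon]


def native_to_legacy(bbox):
    """[minlat, minlon, maxlat, maxlon] -> [minlon, minlat, maxlon, maxlat]."""
    minlat, minlon, maxlat, maxlon = bbox
    return [minlon, minlat, maxlon, maxlat]


# A bbox stored in the old [lon, lat] order, with illustrative values.
old_bbox = [7.47, 51.84, 7.77, 52.06]
new_bbox = legacy_to_native(old_bbox)
print(new_bbox)
```

The swap is its own inverse, so applying it twice returns the original list; the two names exist only to make call sites self-documenting.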
Selected bug fixes
- Reject raster / vector files whose coordinates are projected but falsely declared WGS84
- Validate bboxes against WGS84 ranges before merging
- --convex-hull strips multi-hundred-KB admin polygons from place_names provenance after consuming them
- --placename runs only on the final merged extent (no more per-file duplicate warnings)
- Fix convex_hull_merge for degenerate single-point and two-point line-segment hulls
- Text-file detection now combines extension, MIME, basename allowlist (README, LICENSE, CHANGELOG) and a content sniff; it runs NER on extensionless project files while explicitly skipping source code (.py, .js, .sh, .yaml, …)
- --geojsonio now reports the precise endpoint and cause when the geojson.io Gist fallback fails (commonly anonymous-gist 401)
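The range validation mentioned in the fixes above can be pictured as a simple bounds check. This is an illustrative sketch under stated assumptions, not geoextent's code: is_valid_wgs84_bbox is a hypothetical name, and allowing minlon > maxlon (a box crossing the antimeridian) is an assumption made here.

```python
def is_valid_wgs84_bbox(bbox, legacy=False):
    """Check that a bbox lies within WGS84 latitude/longitude ranges.

    Expects [minlat, minlon, maxlat, maxlon], or the legacy
    [minlon, minlat, maxlon, maxlat] order when legacy=True.
    """
    if len(bbox) != 4:
        return False
    if legacy:
        minlon, minlat, maxlon, maxlat = bbox
    else:
        minlat, minlon, maxlat, maxlon = bbox
    lat_ok = -90 <= minlat <= maxlat <= 90
    # minlon > maxlon is tolerated here to allow antimeridian-crossing boxes.
    lon_ok = -180 <= minlon <= 180 and -180 <= maxlon <= 180
    return lat_ok and lon_ok


# Projected (metre-based) coordinates mislabelled as WGS84 fail the check.
print(is_valid_wgs84_bbox([51.84, 7.47, 52.06, 7.77]))
print(is_valid_wgs84_bbox([5745000, 400000, 5750000, 410000]))
```

A check like this also catches the first fix in the list, since easting/northing values in metres fall far outside the degree ranges.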
Full diff: v0.12.0...v0.13.0
Full changelog: https://nuest.github.io/geoextent/changelog.html#id1
Install
pip install geoextent==0.13.0
# Optional extras
pip install 'geoextent[nlp]==0.13.0' # spaCy NER for text inputs
pip install 'geoextent[preview]==0.13.0' # static map / terminal preview
AppImage: see the Linux AppImage how-to.
Citing this release: Nüst, Daniel; Garzón, Sebastian; Drechsler, Lars and Qamaz, Yousef. (2026, May 15). geoextent (Version v0.13.0). Zenodo. https://doi.org/10.5281/zenodo.3925693