This repository has been archived by the owner on Mar 11, 2024. It is now read-only.

etalab/geozones

GeoZones

Simplistic spatial/administrative referential.

For documentation on the French administrative levels, please see the LISEZMOI.md file.

This project is a set of tools to produce a shared spatial/administrative referential based on open datasets.

The purpose is to be embeddable in applications for autocompletion. It aims neither for universality (country levels are not comparable) nor for precision (most sourced datasets have a 100m precision).

These tools work on and export WGS84 spatial data.

Requirements

This project uses MongoDB 2.6+ and GDAL as main tooling. Build tools are written in Python 3 and make use of:

  • click
  • PyMongo
  • Fiona
  • Shapely

The web interface requires Flask.

Translations require Babel and the Transifex client.

Getting started

There are many ways to get a development environment started.

Assuming you have Virtualenv and MongoDB installed and configured on your computer:

$ git clone https://github.com/etalab/geozones.git
$ cd geozones
$ virtualenv -p /bin/python3 .
$ source bin/activate
$ pip install -e .
$ geozones -h

There is a docker-compose.yml file providing a MongoDB instance. You can also run the entire tool in Docker. See "Using in docker" for more details.

Model

There are two main models:

  • level hierarchies
  • zone/territories

GeoZones uses MongoDB as working storage.

Levels

They define relationships between levels and their names. They are not stored in the database, but they are exported with the following properties:

| Property | Description |
|---|---|
| id | A string identifier for the level (e.g. country, fr:commune...) |
| label | The human-readable string representation in English (e.g. World). * |
| admin_level | An administrative scale index (0 is the biggest and 100 the smallest level) |
| parents | The list of known parent level identifiers |

*: Labels are optionally translatable
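
As an illustration of the level model, here is a minimal Python sketch. It assumes a hypothetical `Level` dataclass that mirrors the exported properties; this is not the project's actual class. The identifiers and administrative levels are taken from the level tables in this README, while the labels and parent links are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Hypothetical container mirroring the exported level properties."""
    id: str
    label: str
    admin_level: int
    parents: tuple = ()


# Identifiers and admin levels come from the tables in this README;
# the labels and the parent links are illustrative assumptions.
LEVELS = [
    Level("fr:commune", "French commune", 80, ("fr:departement",)),
    Level("country", "Country", 20, ("country-group",)),
    Level("fr:departement", "French departement", 60, ("fr:region",)),
    Level("country-group", "Country group", 10),
    Level("fr:region", "French region", 40, ("country",)),
]


def sort_levels(levels):
    """Order levels from the biggest (lowest index) to the smallest."""
    return sorted(levels, key=lambda level: level.admin_level)


def missing_parents(levels):
    """Return the parent identifiers that no level in the list defines."""
    known = {level.id for level in levels}
    return {p for level in levels for p in level.parents if p not in known}


if __name__ == "__main__":
    print([level.id for level in sort_levels(LEVELS)])
    print(missing_parents(LEVELS))
```

A check such as `missing_parents` can be handy when contributing country-specific levels, since a typo in a parent identifier would otherwise go unnoticed.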

You can contribute your country-specific levels. Currently geozones supports the following levels:

Common levels

| identifier | administrative level | description |
|---|---|---|
| country-group | 10 | Groups of countries (World, EU...) |
| country | 20 | A country |
| country-subset | 30 | An administrative subset of a country |

French levels

| identifier | administrative level | description |
|---|---|---|
| fr:region | 40 | Regions of France |
| fr:epci | 68 | Intercommunalities of France |
| fr:departement | 60 | Departements of France |
| fr:collectivite | 60 | French overseas collectivities |
| fr:arrondissement | 70 | Arrondissements of France |
| fr:commune | 80 | Communes of France |
| fr:canton | 98 | Cantons of France |
| fr:iris | 98 | IRIS of France |

Luxembourgish levels

| identifier | administrative level | description |
|---|---|---|
| lu:district | 40 | Districts of Luxembourg |
| lu:canton | 60 | Cantons of Luxembourg |
| lu:commune | 80 | Communes of Luxembourg |

Zones

A zone is a spatial polygon for a given level. It has at least one unique code (unique within its level) and a name. It can have many known keys, which are not necessarily unique (e.g. postal codes can be shared by many towns).
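
Because keys are not necessarily unique, a lookup by key has to return a list of zones rather than a single one. The following is a minimal sketch, assuming zones are plain dictionaries shaped like the exported properties. The sample zones, their codes and the `postal` key name are made up for illustration.

```python
from collections import defaultdict

# Made-up sample zones; only the overall shape follows the exported properties.
ZONES = [
    {"id": "fr:commune:00001", "level": "fr:commune", "code": "00001",
     "name": "Town A", "keys": {"postal": ["12345"]}},
    {"id": "fr:commune:00002", "level": "fr:commune", "code": "00002",
     "name": "Town B", "keys": {"postal": ["12345", "12346"]}},
]


def index_by_key(zones, key):
    """Map each value of a given key to the ids of all zones sharing it."""
    index = defaultdict(list)
    for zone in zones:
        values = zone.get("keys", {}).get(key, [])
        # Accept both a single value and a list of values.
        if isinstance(values, str):
            values = [values]
        for value in values:
            index[value].append(zone["id"])
    return dict(index)


if __name__ == "__main__":
    print(index_by_key(ZONES, "postal"))
```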

Labels are optionally translatable.

Some zones are defined as an aggregation of other zones. They are called aggregations in geozones and are built after all data is loaded.

The following properties are exported in the GeoJSON output:

| Property | Description |
|---|---|
| id | A unique identifier defined by <level>:<code>[@creation] |
| code | The zone unique identifier in this level |
| level | The level identifier |
| name | The zone display name (can be translatable) |
| population | Estimated/approximate population (optional) |
| area | Estimated/approximate area in km² (optional) |
| wikidata | A Wikidata node identifier (optional) |
| wikipedia | A Wikipedia reference (optional) |
| dbpedia | A DBPedia reference (optional) |
| flag | A DBPedia reference to a flag (optional) |
| blazon | A DBPedia reference to a blazon (optional) |
| keys | A dictionary of known keys/codes for this zone |
| parents | A list of every known parent zone identifier |
| ancestors | A list of ancestors (optional) |
| successors | A list of successors (optional) |
| validity | A date range validity (start/end) (optional) |

Note that you can choose via the keys option which properties you would like to export during the distribution step.
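
To make the `<level>:<code>[@creation]` identifier pattern and the property selection more concrete, here is a small Python sketch. The helper names are hypothetical and not part of geozones. It assumes that codes contain neither `:` nor `@`, and that the creation part is a date string; the sample values are made up.

```python
def zone_id(level, code, creation=None):
    """Build an identifier following the <level>:<code>[@creation] pattern."""
    base = f"{level}:{code}"
    return f"{base}@{creation}" if creation else base


def parse_zone_id(identifier):
    """Split an identifier into (level, code, creation or None)."""
    base, _, creation = identifier.partition("@")
    # The level itself may contain a colon (e.g. fr:commune), so split
    # on the last one to isolate the code.
    level, _, code = base.rpartition(":")
    return level, code, creation or None


def pick_properties(zone, keys):
    """Keep only the requested properties that the zone actually has."""
    return {key: zone[key] for key in keys if key in zone}


if __name__ == "__main__":
    # Made-up code and creation date, for illustration only.
    identifier = zone_id("fr:commune", "12345", "1970-01-01")
    print(identifier)
    print(parse_zone_id(identifier))
```

Splitting on the last colon is a design choice that keeps namespaced levels such as `fr:commune` intact.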

Translations

Level names and some territories are translatable. They are provided as gettext files. Translations are handled on Transifex.

Here’s the workflow:

# Ensure you have the optional tools to process translations
$ pip install -e .[i18n]
# Extract translatable labels
$ pybabel extract -F babel.cfg -o geozones/translations/geozones.pot .
# Push updated translations template to Transifex
$ tx push -s
# Fetch last translations from Transifex
$ tx pull
# Compile translations for packaging/distribution
$

To add an extra language:

$ pybabel init -D geozones -i geozones/translations/geozones.pot -d geozones/translations -l <language code>
$ tx push -t -l <language code>

Commands

A set of commands is provided for the build process. You can list them all with:

$ geozones --help

download

Download the required datasets. Datasets will be stored into a downloads subdirectory.

load

Load and process datasets into the database.

aggregate

Perform zone aggregations for zones defined as aggregations of others.

postprocess

Perform some non-geospatial processing (e.g. set the postal codes, attach the parents…).

The --exclude and --only options make it possible to run only a subset of the postprocess functions.
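
The selection logic behind such options can be sketched as follows. This is not the project's implementation: the function, the postprocessor names and the rule that exclusions are applied after the `only` filter are all assumptions made for the example.

```python
# Hypothetical postprocessor names, in their registration order.
POSTPROCESSORS = ["postal_codes", "parents", "population"]


def select_postprocessors(available, only=(), exclude=()):
    """Return the names to run, preserving the registration order."""
    unknown = (set(only) | set(exclude)) - set(available)
    if unknown:
        raise ValueError(f"Unknown postprocessor(s): {sorted(unknown)}")
    # An empty "only" means "everything"; exclusions are applied last.
    selected = [name for name in available if not only or name in only]
    return [name for name in selected if name not in exclude]


if __name__ == "__main__":
    print(select_postprocessors(POSTPROCESSORS, only=["parents"]))
    print(select_postprocessors(POSTPROCESSORS, exclude=["parents"]))
```

Rejecting unknown names early avoids silently running nothing because of a typo on the command line.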

dist

Dump the produced dataset as GeoJSON files for distribution. Files are dumped in a build subdirectory.

full

All in one task equivalent to:

# Perform all tasks from download to distribution
$ geozones download preload load aggregate postprocess dist

explore

Serve a web interface to explore the generated data.

status

Display some useful information and statistics.

Commands are chainable so you can write:

# Perform all tasks from download to distribution
$ geozones download load -d aggregate postprocess dist dist -s status

sourceslist

Generate a dataset download list for external usage.

This allows using an external download manager, for example.

Example: using 10 parallel processes with curl:

mkdir download && cd download && geozones sourceslist | xargs -P 10 -n 1 curl -O
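
The same idea can be sketched in Python with a thread pool instead of xargs and curl. This assumes that sourceslist prints one URL per line, which is what the xargs invocation above relies on. The function names are hypothetical, and the `fetcher` parameter exists only so that the download step can be swapped out, for instance in tests.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def filename_for(url):
    """Derive a local file name from the last path segment of the URL."""
    return os.path.basename(urlsplit(url).path) or "index"


def fetch(url, directory):
    """Download one URL into the directory and return the local path."""
    path = os.path.join(directory, filename_for(url))
    urllib.request.urlretrieve(url, path)
    return path


def download_all(urls, directory=".", workers=10, fetcher=fetch):
    """Download every URL using a pool of worker threads."""
    os.makedirs(directory, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in the same order as the input URLs.
        return list(pool.map(lambda url: fetcher(url, directory), urls))
```

It could then be fed from the command output, for example by reading the URLs from standard input and calling `download_all(urls, "download")`.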

logos

Fetch zones logos/flags/blazons from Wikipedia when available.

Options

serialization

You can export data in (Geo)JSON or msgpack formats.

The msgpack format consumes more CPU on deserialization, but it avoids using many gigabytes of RAM because the data can be iterated over without loading the whole file.

Reused datasets

Using in docker

Middleware only

If you only want a MongoDB instance in docker and continue using a native Python environment, just use the provided docker-compose.yml as it is:

docker-compose up -d

Your MongoDB instance will be available on localhost:27017.

Complete stack

If you want to run the entire application within docker, you can use a docker-compose.override.yml to add an extra docker instance for geozones.

A sample docker-compose.override.yml is provided in docker-compose.geozones.yml.

cp docker-compose.{geozones,override}.yml
docker-compose up -d

Your MongoDB instance will be available on localhost:27017 and the explore interface on localhost:5000.

Then you can run any geozones command with docker-compose run geozones <command>.

Example:

docker-compose run geozones status

Possible improvements

Build

  • Incremental downloads, maybe with checksum check
  • Global post-processor
  • Post-processor dependencies
  • Audit trail
  • Distribute GeoZones as a standalone Python executable
  • Some quality check tools

Fields

  • Global weight = f(population, area, level)

Output

  • Different precision output
  • Localized JSON outputs (outputs are English-only right now)
  • Translations as distributable JSON (as an alternative to the current PO/MO format)
  • Translations as Python package
  • Model versioning
  • Statistics/coverages in levels

Web interface

  • Querying
  • Only fetch zones for viewport (less intensive for lower layers)
  • A full web-service as a separate project