GeoZones

Simplistic spatial/administrative referential.

For documentation on French administrative levels, see the LISEZMOI.md file (in French).

This project is a set of tools to produce a shared spatial/administrative referential based on open datasets.

The purpose is to be embeddable in applications for autocompletion. It aims for neither universality (country levels are not comparable) nor precision (most source datasets have a 100m precision).

These tools work on and export WGS84 spatial data.

Requirements

This project uses MongoDB 2.6+ and GDAL as main tooling. Build tools are written in Python 3 and make use of:

click
PyMongo
Fiona
Shapely

The web interface requires Flask.

Translations require Babel and the Transifex client.

Getting started

There are many ways to get a development environment started.

Assuming you have Virtualenv and MongoDB installed and configured on your computer:

$ git clone https://github.com/etalab/geozones.git
$ cd geozones
$ virtualenv -p /bin/python3 .
$ source bin/activate
$ pip install -e .
$ geozones -h

There is a docker-compose.yml file providing a MongoDB instance. You can also run the entire tool in Docker. See Using in docker for more details.

Model

There are two main models:

level hierarchies
zone/territories

GeoZones uses MongoDB as working storage.

Levels

They define relationships between levels and their names. They are not stored in the database but they are exported with the following properties:

Property	Description
id	A string identifier for the level (e.g. `country`, `fr:commune`...)
label	The human-readable name in English (e.g. `World`). *
admin_level	An administrative scale index (0 is the biggest and 100 the smallest level)
parents	The list of known parent level identifiers

*: Labels are optionally translatable
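To make the level model concrete, here is a minimal Python sketch of how exported levels could be represented and walked. The `Level` class and the `ancestors_of` helper are illustrative names, not part of geozones. The identifiers and administrative levels follow this README, while the labels and `parents` values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Illustrative mirror of the exported level properties."""
    id: str
    label: str
    admin_level: int
    parents: tuple = ()


# Sample data: ids and admin levels follow this README;
# labels and parents are assumed for the example.
LEVELS = {
    level.id: level
    for level in (
        Level("country-group", "Country group", 10),
        Level("country", "Country", 20, ("country-group",)),
        Level("fr:region", "French region", 40, ("country",)),
        Level("fr:departement", "French departement", 60, ("fr:region",)),
        Level("fr:commune", "French commune", 80, ("fr:departement",)),
    )
}


def ancestors_of(level_id, levels=LEVELS):
    """Return every level reachable through `parents`, biggest level first."""
    seen = set()
    stack = list(levels[level_id].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(levels[current].parents)
    # A lower admin_level means a bigger territory.
    return sorted(seen, key=lambda lid: levels[lid].admin_level)


print(ancestors_of("fr:commune"))
```

Sorting on `admin_level` gives a stable biggest-to-smallest order even when a level has several parents.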

You can contribute your country-specific levels. Currently geozones supports the following levels:

Common levels

identifier	administrative level	description
`country-group`	10	Groups of countries (`World`, `UE`...)
`country`	20	A country
`country-subset`	30	An administrative subset of a country

French levels

identifier	administrative level	description
`fr:region`	40	Regions of France
`fr:epci`	68	Intercommunality of France
`fr:departement`	60	Departements of France
`fr:collectivite`	60	French overseas collectivities
`fr:arrondissement`	70	Arrondissements of France
`fr:commune`	80	Communes of France
`fr:canton`	98	Cantons of France
`fr:iris`	98	Iris of France

Luxembourgish levels

identifier	administrative level	description
`lu:district`	40	Districts of Luxembourg
`lu:canton`	60	Cantons of Luxembourg
`lu:commune`	80	Communes of Luxembourg

Zones

A zone is a spatial polygon for a given level. It has at least one unique code (unique within its level) and a name. It can have many known keys, which are not necessarily unique (e.g. a postal code can be shared by many towns).
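Because keys are not unique, a lookup by key has to return a list of zones rather than a single one. The following Python sketch shows one way to build such an index. The sample zones are made up, and the `postal` key name and the `index_by_key` helper are assumptions for illustration only.

```python
from collections import defaultdict

# Made-up sample zones; the "postal" key name is an assumption.
zones = [
    {"id": "fr:commune:00001", "name": "Town A",
     "keys": {"postal": ["12345"]}},
    {"id": "fr:commune:00002", "name": "Town B",
     "keys": {"postal": ["12345", "12346"]}},
]


def index_by_key(zones, key_name):
    """Map each key value to the ids of all zones that carry it."""
    index = defaultdict(list)
    for zone in zones:
        values = zone.get("keys", {}).get(key_name, [])
        if isinstance(values, str):  # tolerate a single, non-list value
            values = [values]
        for value in values:
            index[value].append(zone["id"])
    return dict(index)


by_postal = index_by_key(zones, "postal")
print(by_postal["12345"])
```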

Labels are optionally translatable.

Some zones are defined as an aggregation of other zones. They are called aggregations in geozones and are built after all data is loaded.

The following properties are exported in the GeoJSON output:

Property	Description
id	A unique identifier defined by `<level>:<code>[@creation]`
code	The zone unique identifier in this level
level	The level identifier
name	The zone display name (can be translatable)
population	Estimated/approximate population (optional)
area	Estimated/approximate area in km² (optional)
wikidata	A Wikidata node identifier (optional)
wikipedia	A Wikipedia reference (optional)
dbpedia	A DBPedia reference (optional)
flag	A DBPedia reference to a flag (optional)
blazon	A DBPedia reference to a blazon (optional)
keys	A dictionary of known keys/code for this zone
parents	A list of every known parent zone identifier
ancestors	A list of ancestors (optional)
successors	A list of successors (optional)
validity	A date range validity (`start`/`end`) (optional)
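As an illustration of the `<level>:<code>[@creation]` identifier scheme, here is a small Python sketch that splits an identifier into its parts. `ZoneId` and `parse_zone_id` are hypothetical helpers, the sample identifiers are made up, and the sketch assumes that a code contains neither `:` nor `@`. The format of the creation suffix is not specified here, so a date-like string is used as a placeholder.

```python
from typing import NamedTuple, Optional


class ZoneId(NamedTuple):
    level: str
    code: str
    creation: Optional[str]


def parse_zone_id(zone_id: str) -> ZoneId:
    """Split `<level>:<code>[@creation]` into its parts.

    Assumes the code contains neither ':' nor '@'.
    """
    base, at, creation = zone_id.partition("@")
    # The level may itself contain ':' (e.g. fr:commune), so split on the last one.
    level, colon, code = base.rpartition(":")
    if not colon or not level or not code:
        raise ValueError(f"not a zone identifier: {zone_id!r}")
    return ZoneId(level, code, creation if at else None)


print(parse_zone_id("fr:commune:75056@1942-01-01"))
print(parse_zone_id("country:fr"))
```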

Note that you can choose via the keys option which properties you would like to export during the distribution step.

Translations

Level names and some territories are translatable. They are provided as gettext files. Translations are handled on Transifex.

Here’s the workflow:

# Ensure you have the optional tools to process translations
$ pip install -e .[i18n]
# Extract translatable labels
$ pybabel extract -F babel.cfg -o geozones/translations/geozones.pot .
# Push updated translations template to Transifex
$ tx push -s
# Fetch last translations from Transifex
$ tx pull
# Compile translations for packaging/distribution
$

To add an extra language:

$ pybabel init -D geozones -i geozones/translations/geozones.pot -d geozones/translations -l <language code>
$ tx push -t -l <language code>

Commands

A set of commands is provided for the build process. You can list them all with:

$ geozones --help

`download`

Download the required datasets. Datasets will be stored into a downloads subdirectory.

`load`

Load and process datasets into database.

`aggregate`

Perform zones aggregations for zones defined as aggregation of others.

`postprocess`

Perform some non-geospatial processing (e.g. set the postal codes, attach the parents…).

The --exclude and --only options make it possible to run only a subset of the postprocess functions.
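One plausible reading of these two options is sketched below in Python. This is not the geozones implementation: `select_postprocessors` is a hypothetical helper, the function names in the sample list are placeholders, and the rule that `only` is applied before `exclude` is an assumption.

```python
def select_postprocessors(available, only=(), exclude=()):
    """Return the names to run, preserving the order of `available`."""
    unknown = (set(only) | set(exclude)) - set(available)
    if unknown:
        raise ValueError(f"unknown postprocessor(s): {sorted(unknown)}")
    # An empty `only` means "no restriction".
    selected = [name for name in available if not only or name in only]
    return [name for name in selected if name not in exclude]


# Placeholder names, not the actual geozones postprocess functions.
available = ["attach_parents", "set_postal_codes", "compute_population"]
print(select_postprocessors(available, only=["set_postal_codes"]))
print(select_postprocessors(available, exclude=["compute_population"]))
```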

`dist`

Dump the produced dataset as GeoJSON files for distribution. Files are dumped in a build subdirectory.
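Since the stated purpose of the dataset is autocompletion, here is a minimal Python sketch of how an application might consume a distributed GeoJSON file. It assumes the file is a GeoJSON FeatureCollection whose features carry the zone properties described in the Zones section. The inline sample stands in for a real file, and `build_index` and `suggest` are hypothetical helpers.

```python
import bisect
import json

# Inline stand-in for a file from the build directory (made-up features).
SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null,
   "properties": {"id": "country:fr", "name": "France", "level": "country"}},
  {"type": "Feature", "geometry": null,
   "properties": {"id": "country:fi", "name": "Finland", "level": "country"}},
  {"type": "Feature", "geometry": null,
   "properties": {"id": "country:lu", "name": "Luxembourg", "level": "country"}}
]}
"""


def build_index(geojson):
    """Return a sorted list of (lowercased name, zone id) pairs."""
    return sorted(
        (feature["properties"]["name"].lower(), feature["properties"]["id"])
        for feature in geojson["features"]
    )


def suggest(index, prefix, limit=10):
    """Return up to `limit` zone ids whose name starts with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first entry that is not smaller than the prefix.
    start = bisect.bisect_left(index, (prefix,))
    matches = []
    for name, zone_id in index[start:start + limit]:
        if not name.startswith(prefix):
            break
        matches.append(zone_id)
    return matches


index = build_index(json.loads(SAMPLE))
print(suggest(index, "f"))
```

A sorted list with binary search keeps the example dependency-free; a real application would more likely delegate prefix search to its own search backend.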

`full`

All in one task equivalent to:

# Perform all tasks from download to distribution
$ geozones download preload load aggregate postprocess dist

`explore`

Serve a web interface to explore the generated data.

`status`

Display some useful information and statistics.

Commands are chainable so you can write:

# Perform all tasks from download to distribution
$ geozones download load -d aggregate postprocess dist dist -s status

`sourceslist`

Generate a dataset download list for external usage.

This allows using an external download manager, for example.

Example: running 10 parallel curl processes with xargs:

mkdir download && cd download && geozones sourceslist | xargs -P 10 -n 1 curl -O
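The same idea can be sketched in Python with a thread pool, for environments where xargs or curl is not available. This is only an illustration: it assumes that `geozones sourceslist` prints whitespace-separated URLs (as the xargs usage above suggests), and `filename_from_url`, `fetch`, `download_all` and `main` are hypothetical helpers, not part of geozones.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def filename_from_url(url):
    """Mimic `curl -O`: name the local file after the last URL path segment."""
    return os.path.basename(urlsplit(url).path)


def fetch(url, directory="download"):
    """Download one URL into `directory` and return the local path."""
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, filename_from_url(url))
    urlretrieve(url, target)
    return target


def download_all(urls, fetch_one=fetch, workers=10):
    """Run `fetch_one` over every URL using `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, urls))


def main(stream):
    """Read whitespace-separated URLs from `stream` and download them all."""
    for path in download_all(stream.read().split()):
        print(path)
```

Saved under a name of your choice (say `fetch_sources.py`) with a final `main(sys.stdin)` call after `import sys`, it could be fed with `geozones sourceslist | python fetch_sources.py`.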

`logos`

Fetch zones logos/flags/blazons from Wikipedia when available.

Options

`serialization`

You can export data in (Geo)JSON or msgpack formats.

The msgpack format costs more CPU on deserialization, but it avoids holding many gigabytes in RAM because the data can be iterated over without loading the whole file.

Reused datasets

Using in docker

Middleware only

If you only want a MongoDB instance in docker and continue using a native Python environment, just use the provided docker-compose.yml as it is:

docker-compose up -d

Your MongoDB instance will be available on localhost:27017.

Complete stack

If you want to run the entire application within docker, you can use a docker-compose.override.yml to add an extra docker instance for geozones.

A sample docker-compose.override.yml is provided in docker-compose.geozones.yml.

cp docker-compose.{geozones,override}.yml
docker-compose up -d

Your MongoDB instance will be available on localhost:27017 and the explore interface on localhost:5000.

Then you can run any geozones command with docker-compose run geozones <command>.

Ex:

docker-compose run geozones status

Possible improvements

Build

Incremental downloads, maybe with checksum check
Global post-processor
Post-processor dependencies
Audit trail
Distribute GeoZones as a standalone Python executable
Some quality check tools

Fields

Global weight = f(population, area, level)

Output

Different precision output
Localized JSON outputs (outputs are English-only right now)
Translations as distributable JSON (as an alternative to the current PO/MO format)
Translations as Python package
Model versioning
Statistics/coverages in levels

Web interface

Querying
Only fetch zones for viewport (less intensive for lower layers)
A full web-service as a separate project
