https://topo.transport.data.gouv.fr/ is a semantic database of transit objects (stops, lines, networks…).
This program is a toolkit to populate that database from a GTFS file.
It can be used to populate missing features in the database.
The tool is designed to be idempotent: importing the same file twice, or importing different files from the same producer, won’t generate any duplicates.
Transit topo tools are written in Rust.
You need an up-to-date Rust toolchain (commonly installed with rustup).
Note: all binaries expose a --help CLI argument that documents all the available arguments.
You can use the import-gtfs tool to import a GTFS file into TOPO.
Identifiers of entities can be the same across different producers, which is why you must specify which producer is providing the GTFS.
The producer must already exist in the transport TOPO instance.
cargo run --release --bin import-gtfs -- --api <url of the wikibase api> --sparql <url of the sparql api> --producer <id of the producer> -i <path to gtfs.zip>
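The role of the producer can be illustrated with a toy model. This is only a sketch and not how import-gtfs is implemented: the `TopoStore` class, the `upsert` method and the ids used below are all hypothetical. It shows why scoping identifiers by producer avoids collisions, and why a second import of the same object does not create a duplicate.

```python
from typing import Dict, Tuple


class TopoStore:
    """Hypothetical in-memory stand-in for TOPO, keyed by (producer id, GTFS id)."""

    def __init__(self) -> None:
        self._entities: Dict[Tuple[str, str], str] = {}
        self._next_id = 1

    def upsert(self, producer: str, gtfs_id: str) -> str:
        """Return the entity id, creating the entity only if it is missing."""
        key = (producer, gtfs_id)
        if key not in self._entities:
            self._entities[key] = f"Q{self._next_id}"
            self._next_id += 1
        return self._entities[key]

    def __len__(self) -> int:
        return len(self._entities)


store = TopoStore()
first = store.upsert("Q10", "stop_1")
again = store.upsert("Q10", "stop_1")  # same producer, same id: reused
other = store.upsert("Q11", "stop_1")  # same id, other producer: new entity
```

Here `first` and `again` refer to the same entity, while `other` is a distinct one even though the GTFS identifier is identical.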
You can use the entities tool to add or search for entities in TOPO.
This can be useful to explore or manage TOPO from the command line.
You can search for entities that have given claims with the search endpoint.
E.g. to get the id of the item with the topo_id_id (P1) "route":
cargo run --bin entities -- search --api <url of the wikibase api> --sparql <url of the sparql api> --claim 'P1="route"'
Note: the --claim value is passed directly to the SPARQL endpoint, so you need to know a bit of SPARQL to use it.
Note: strings must be quoted with "", and URLs enclosed in <>.
- Query entities with the label "bob" (note the "" around the label, and the @en suffix that selects the English label):
  cargo run --bin entities -- search --api <url of the wikibase api> --sparql <url of the sparql api> --claim 'rdfs:label="bob"@en'
- Query all producers:
  cargo run --bin entities -- search --api <url of the wikibase api> --sparql <url of the sparql api> --claim '@instance_of=@producer'
- Query entities that have property P42 with the value https://transport.data.gouv.fr/datasets/5bfd2e81634f4122b3023260, which is of type url (note the <> around the URL):
  cargo run --bin entities -- search --api <url of the wikibase api> --sparql <url of the sparql api> --claim 'P42=<https://transport.data.gouv.fr/datasets/5bfd2e81634f4122b3023260>'
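The quoting rules for --claim values are easy to get wrong when the command line is built by a script. The helper below is a minimal sketch of those rules; `format_claim` and its `kind` parameter are hypothetical names and not part of the entities tool.

```python
def format_claim(prop: str, value: str, kind: str = "string", lang: str = "") -> str:
    """Build a --claim expression.

    kind is "string" (quoted with ""), "url" (wrapped in <>) or "raw"
    (passed through unchanged, e.g. for @producer).
    """
    if kind == "string":
        # Escape backslashes and double quotes as in SPARQL string literals.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
        if lang:
            rendered += f"@{lang}"
    elif kind == "url":
        rendered = f"<{value}>"
    elif kind == "raw":
        rendered = value
    else:
        raise ValueError(f"unknown kind: {kind}")
    return f"{prop}={rendered}"


label_claim = format_claim("rdfs:label", "bob", lang="en")
producer_claim = format_claim("@instance_of", "@producer", kind="raw")
```

The resulting string can then be passed as a single shell argument after --claim.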
You can create entities with the create endpoint.
- Create a property "data_gouv_url" of type url:
  cargo run --bin entities create "data_gouv_url" --type urlproperty
- Create an item "bob", which is an instance of producer (and we want only one producer named "bob"), with a property data_gouv_url set to "https://www.data.gouv.fr/datasets/5dc41db9634f417610c24a9d" (if the property does not exist yet, it is created):
  cargo run --bin entities create "bob" --type item --unique-claim "@instance_of=@producer" --claim "$(cargo run --bin entities create "data_gouv_url" --type urlproperty)=https://www.data.gouv.fr/datasets/5dc41db9634f417610c24a9d"
To build the project, run:
make build
The integration tests rely on Docker and docker-compose, so you need both tools installed.
To run the tests:
make test
Note: Docker needs root privileges, so you might need to run this command with elevated privileges (or use other, more controversial, means).
This project needs a running wikibase instance. For development purposes, you can use the provided docker-compose setup.
To set up a wikibase instance, you can use the Makefile target:
make docker-up
Note: the docker files are split between a minimal one (used in the integration tests) and another one that makes local use easier. If you want to run a custom docker-compose command, use:
docker-compose -f tests/minimal-docker-compose.yml -f local-compose.yml <your-command>
The wikibase instance takes a while to start (several minutes). You can check that the services are available by querying the wikibase API:
curl --head http://localhost:8181/api.php # This needs to return an HTTP response with a `200` status code
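Since the startup can take several minutes, the check can be repeated until it succeeds. The following is a minimal sketch of such a polling loop; `api_is_up` and `wait_for_api` are hypothetical helpers, and the number of attempts and the delay are arbitrary assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def api_is_up(url: str) -> bool:
    """Send a HEAD request (like `curl --head`) and check for a 200 status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or error status: not ready yet.
        return False


def wait_for_api(
    url: str,
    probe: Callable[[str], bool] = api_is_up,
    attempts: int = 60,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe `url` up to `attempts` times, sleeping `delay` seconds in between."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example (requires the docker services to be starting):
# wait_for_api("http://localhost:8181/api.php")
```

The `probe` and `sleep` parameters are injectable so that the loop can be exercised without a running instance.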
When the service is available, you can prepopulate the database with the mandatory data (such as the instance of property):
cargo run --release --bin prepopulate -- --api http://localhost:8181/api.php
The idea is that each GTFS provider needs to have its own producer page in transit_topo.
This way, all data added by this producer will be attached to it.
To create a producer, you can use the cli tool provided:
cargo run --release --bin entities -- create <name of the producer> --type item --unique-claim @instance_of=@producer --api http://localhost:8181/api.php --sparql http://localhost:8989/bigdata/sparql
The CLI tool will give you an id. Note this id, as it is needed by the other CLI tools.
Note: if you forget the id, you can run the CLI tool again; it will not recreate a producer with the same label.
Once this is done, you can import the GTFS:
cargo run --release --bin import-gtfs -- --api http://localhost:8181/api.php --sparql http://localhost:8989/bigdata/sparql --producer <id of the producer> -i <path to gtfs.zip>
All datasets from https://transport.data.gouv.fr/ have been loaded in https://topo.transport.data.gouv.fr/.
A graphical interface to test queries can be found here. Queries can also be sent directly to the SPARQL endpoint: https://sparql.topo.transport.data.gouv.fr/bigdata/sparql?format=json&query=<query>
You can query all the stops around a point, and the lines that pass through each stop, with this query (or with curl):
select ?place ?gtfsName ?gtfsId ?location ?lineLabel ?modeLabel
WHERE {
  ?place wdt:P7 ?gtfsName.
  ?place wdt:P2 ?gtfsId.
  ?place wdt:P15 ?line.
  ?line wdt:P8 ?mode.
  SERVICE wikibase:around {
    ?place wdt:P50 ?location.
    bd:serviceParam wikibase:center "Point(-1.0253558 45.6309576)"^^geo:wktLiteral.
    bd:serviceParam wikibase:radius "0.5" .
    bd:serviceParam wikibase:distance ?dist.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
You can query all the TOPO 'hardcoded' relations (those can be items or properties) with this query:
select ?o ?topoProp
WHERE {
  ?o wdt:P1 ?topoProp. # wdt:P1 is the property used to mark all properties / items used by TOPO
}
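When a query is sent directly to the SPARQL endpoint, it has to be URL-encoded into the `query` parameter. The snippet below is a small sketch of that step using only the standard library; `sparql_url` is a hypothetical helper, and the query is a one-line version of the one above.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://sparql.topo.transport.data.gouv.fr/bigdata/sparql"


def sparql_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Return the GET URL for `query`, asking for a JSON response."""
    return f"{endpoint}?{urlencode({'format': 'json', 'query': query})}"


TOPO_RELATIONS = "select ?o ?topoProp WHERE { ?o wdt:P1 ?topoProp. }"
url = sparql_url(TOPO_RELATIONS)
```

The resulting `url` can be fetched with curl or any HTTP client.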
You can run a bulk query to get all stops with their routes by paginating the results (be careful, these queries can become quite slow) with this query (or just with curl):
select ?place ?gtfsName ?gtfsId ?location ?lineLabel ?modeLabel
WHERE {
  ?place wdt:P7 ?gtfsName.
  ?place wdt:P2 ?gtfsId.
  ?place wdt:P15 ?line.
  ?line wdt:P8 ?mode.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?gtfsName
LIMIT 1000
OFFSET 500
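Pagination amounts to repeating the same ordered query while moving OFFSET forward. The sketch below shows one way to drive that loop; `page_query` and `fetch_all` are hypothetical helpers, and `fetch_page` stands for whatever function sends a query to the endpoint and returns the rows (it is not provided here). It assumes the offset advances by one full page each time.

```python
from typing import Callable, Iterator, List

BASE_QUERY = """
select ?place ?gtfsName ?gtfsId ?location ?lineLabel ?modeLabel
WHERE {
  ?place wdt:P7 ?gtfsName.
  ?place wdt:P2 ?gtfsId.
  ?place wdt:P15 ?line.
  ?line wdt:P8 ?mode.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?gtfsName
"""


def page_query(base: str, limit: int, offset: int) -> str:
    """Append LIMIT and OFFSET clauses to an ordered query."""
    return f"{base.strip()}\nLIMIT {limit}\nOFFSET {offset}"


def fetch_all(
    fetch_page: Callable[[str], List[dict]],
    base: str = BASE_QUERY,
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every row, requesting one page at a time until a short page."""
    offset = 0
    while True:
        rows = fetch_page(page_query(base, page_size, offset))
        yield from rows
        if len(rows) < page_size:
            return
        offset += page_size
```

The ORDER BY clause matters here: without a stable ordering, successive pages are not guaranteed to be consistent with each other.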