GTFS to Linked Connections
Transforms a GTFS file into a directed acyclic graph of actual connections.
A connection is the combination of a departure and its successive arrival of the same trip. Our goal is to retrieve a list of connections that is sorted by departure time. Because every connection moves forward in time, this sorted list forms a Directed Acyclic Graph, and route planning algorithms can be run directly on it.
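As an illustration of that definition, here is a minimal Python sketch (not the gtfs2lc implementation) that pairs each departure with the next arrival of the same trip and sorts the result by departure time. The `Connection` class, the function name and the sample rows are hypothetical; the column names follow stop_times.txt, and times are assumed to be zero-padded HH:MM:SS strings so that they sort chronologically as text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    trip_id: str
    departure_stop: str
    departure_time: str  # assumed zero-padded "HH:MM:SS"
    arrival_stop: str
    arrival_time: str


def connections_from_stop_times(stop_times):
    """Pair each departure with the next arrival of the same trip."""
    by_trip = {}
    for row in stop_times:
        by_trip.setdefault(row["trip_id"], []).append(row)

    connections = []
    for trip_id, rows in by_trip.items():
        # Within a trip, stops are visited in stop_sequence order.
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        for dep, arr in zip(rows, rows[1:]):
            connections.append(Connection(
                trip_id,
                dep["stop_id"], dep["departure_time"],
                arr["stop_id"], arr["arrival_time"],
            ))

    # The sorted list is what a route planner scans from start to end.
    connections.sort(key=lambda c: c.departure_time)
    return connections


# Hypothetical sample data: two trips, deliberately out of order.
STOP_TIMES = [
    {"trip_id": "T2", "stop_id": "B", "stop_sequence": "1",
     "arrival_time": "08:05:00", "departure_time": "08:05:00"},
    {"trip_id": "T1", "stop_id": "A", "stop_sequence": "1",
     "arrival_time": "08:00:00", "departure_time": "08:00:00"},
    {"trip_id": "T1", "stop_id": "B", "stop_sequence": "2",
     "arrival_time": "08:10:00", "departure_time": "08:11:00"},
    {"trip_id": "T2", "stop_id": "C", "stop_sequence": "2",
     "arrival_time": "08:20:00", "departure_time": "08:20:00"},
    {"trip_id": "T1", "stop_id": "C", "stop_sequence": "3",
     "arrival_time": "08:25:00", "departure_time": "08:25:00"},
]

for connection in connections_from_stop_times(STOP_TIMES):
    print(connection)
```

A trip with n stops yields n - 1 connections, so the three-stop trip T1 above contributes two connections and T2 contributes one.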
More information and live demo at https://linkedconnections.org
Converting your GTFS to (linked) connections
Step 0: Installation
Install it using the Node Package Manager (npm).
npm install -g gtfs2lc
Step 1: Discover a GTFS file
If you haven’t yet picked a GTFS file you want to work with, different repositories exist. Our favorite ones:
Yet, you may also directly ask your local public transport authority for a copy.
Mind that we have not tested our code with all GTFS files yet, and there are known limitations.
Step 2: Unzip your GTFS
You can use your favorite unzipper; for example, unzip gtfs.zip should work fine.
Step 3: Order and clean your CSV files
This process now runs automatically, so you can skip to Step 4. You can still run it independently using the enclosed bash script gtfs2lc-clean <path>. Besides cleaning and sorting, it also unifies newlines and removes UTF-8 artifacts.
If Step 4 does not give the desired result, you might want to tweak the script manually. For our script to work:
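To give an idea of what this kind of cleaning involves, here is a small Python sketch. It is not the gtfs2lc-clean script itself: the function name is hypothetical, "UTF-8 artifacts" is assumed here to mean a leading byte order mark, and dropping blank lines is an extra assumption.

```python
def clean_csv_text(raw: bytes) -> str:
    """Normalize the raw bytes of a GTFS CSV file into clean text."""
    # "utf-8-sig" drops a leading byte order mark if one is present.
    text = raw.decode("utf-8-sig")
    # Unify Windows (\r\n) and lone carriage-return newlines to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: blank lines carry no data and can be dropped.
    lines = [line for line in text.split("\n") if line.strip()]
    return "\n".join(lines) + "\n"


sample = b"\xef\xbb\xbftrip_id,stop_id\r\nT1,A\r\n\r\nT1,B\r"
print(repr(clean_csv_text(sample)))
```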
- stop_times.txt must be ordered by
- calendar.txt must be ordered by
- calendar_dates.txt must be ordered by
Step 4: Generate connections!
Successfully finished the previous steps? Then you can now generate actual departure and arrival pairs (connections) as follows:
gtfs2lc /path/to/extracted/gtfs -f json
We also support other formats, such as csv.
For big GTFS files, your memory may not be sufficient. Luckily, we’ve implemented a way to use your hard disk instead of your RAM. You can enable this with an option:
gtfs2lc /path/to/extracted/gtfs -f json --store LevelStore
Step 5: Generate Linked Connections!
When you download a new GTFS file, all identifiers in there might change and conflict with your previous export. Therefore, we need to think about a way to create global identifiers for the connections, trips, routes and stops in our system. As we are publishing our data on the Web, we will also use Web addresses for these global identifiers.
See baseUris-example.json for an example of URI templates that shows what a stable identifier strategy could look like. Copy it and edit it to your liking. For a more detailed explanation of how to use the URI templates, see the description at our GTFS-RT2LC tool, which uses the same strategy.
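The idea behind URI templates can be sketched in a few lines of Python. This is an illustration only: the template strings, the placeholder names and the `expand` helper are hypothetical and use a simplified `{name}` syntax, so check baseUris-example.json for the syntax the tool actually accepts.

```python
import re
from urllib.parse import quote

# Hypothetical templates; example.org stands in for your own domain.
TEMPLATES = {
    "stop": "http://example.org/stops/{stop_id}",
    "connection": "http://example.org/connections/"
                  "{departure_date}/{departure_stop}/{trip_id}",
}


def expand(template: str, values: dict) -> str:
    """Fill every {name} placeholder with its percent-encoded value."""
    def substitute(match):
        return quote(str(values[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


print(expand(TEMPLATES["stop"], {"stop_id": "8892007"}))
print(expand(TEMPLATES["connection"], {
    "departure_date": "20240115",
    "departure_stop": "stop A",
    "trip_id": "T1",
}))
```

As long as the values fed into a template are stable across GTFS exports, the resulting URI is stable too, which is the point of the strategy.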
Now you can generate Linked Data in JSON-LD as follows:
gtfs2lc /path/to/extracted/gtfs -f jsonld -b baseUris.json
That’s it! Want to serve your Linked Connections over HTTP? Take a look at our work over here: The Linked Connections Server (WIP)
Post-processing: joining connections and adding nextConnection properties
In GTFS, joining and splitting trains are handled in a horrible way. See https://support.google.com/transitpartners/answer/7084064?hl=en for more details.
In Linked Connections, we can solve this gracefully by adding a nextConnection array to every connection. A splitting train then indicates two nextConnection items on the last connection before the split.
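A minimal Python sketch of what that looks like as data. Only the nextConnection property comes from the description above; the `@id` values, the other field names and the `link_split` helper are assumptions made for illustration.

```python
import json


def link_split(last_shared, branches):
    """Return a copy of the last shared connection that points at the
    first connection of every branch after the split."""
    linked = dict(last_shared)
    linked["nextConnection"] = [branch["@id"] for branch in branches]
    return linked


# Hypothetical connections: a train runs A -> B, then splits towards C and D.
last_shared = {
    "@id": "http://example.org/connections/1",
    "departureStop": "A",
    "arrivalStop": "B",
}
branches = [
    {"@id": "http://example.org/connections/2a",
     "departureStop": "B", "arrivalStop": "C"},
    {"@id": "http://example.org/connections/2b",
     "departureStop": "B", "arrivalStop": "D"},
]

linked = link_split(last_shared, branches)
# One JSON object per line, as in a newline-delimited file.
print(json.dumps(linked))
```

A connection that is not followed by a split would carry a single item in the same array.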
On your newline-delimited jsonld file, you can run this script to make that work:
Next to the jsonld format, we’ve also implemented the “mongold” format. It can be used directly by the mongoimport command as follows:
gtfs2lc /path/to/extracted/gtfs -f mongold -b baseUris.json | mongoimport -c myconnections
Mind that only MongoDB version 2.6 and later is supported, and that this format does not currently work well together with the post-processing step of joining trips.
Even more options
For more options, check
How it works (for contributors)
We first convert
stop_times.txt to connection rules called
Service dates are processed through
calendar.txt, that was processed at the same time.
In the final step, the connection rules are expanded into connections by joining the days, service ids and connectionRules.
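The join in this final step can be sketched as follows. This is a simplified Python illustration, not the code in this repository: the function names, field names and sample data are hypothetical, the service day is assumed to start at midnight, and time zones are ignored.

```python
from datetime import date, datetime, time, timedelta


def gtfs_time_to_delta(value: str) -> timedelta:
    """Convert a GTFS time to an offset from the start of the service day.
    Hours may exceed 23 for trips that run past midnight."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def expand_rules(rules, service_dates):
    """Join every connection rule with the days its service id runs on."""
    connections = []
    for rule in rules:
        for day in service_dates.get(rule["service_id"], []):
            midnight = datetime.combine(day, time.min)
            connections.append({
                "trip_id": rule["trip_id"],
                "departureStop": rule["departure_stop"],
                "arrivalStop": rule["arrival_stop"],
                "departureTime": midnight + gtfs_time_to_delta(rule["departure_time"]),
                "arrivalTime": midnight + gtfs_time_to_delta(rule["arrival_time"]),
            })
    connections.sort(key=lambda c: c["departureTime"])
    return connections


# Hypothetical input: one rule, active on two service days.
RULES = [{
    "trip_id": "T1", "service_id": "WEEKDAY",
    "departure_stop": "A", "arrival_stop": "B",
    "departure_time": "23:50:00", "arrival_time": "24:10:00",
}]
SERVICE_DATES = {"WEEKDAY": [date(2024, 1, 15), date(2024, 1, 16)]}

for connection in expand_rules(RULES, SERVICE_DATES):
    print(connection["departureTime"], "->", connection["arrivalTime"])
```

Each rule produces one connection per service day, and an arrival time such as 24:10:00 rolls over to the next calendar day.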
Post-processing steps work directly on the output stream, and can map the output stream to Linked Data. Connections2JSONLD is the main class to look at.
Another post-processing step is introduced to fix joining and splitting trips.
Not yet implemented
At this moment we've only implemented a conversion from the Stop Times to connections. However, in future work we will also implement a system for describing trips and routes, a system for transit stops and a system for transfers in Linked Data.
frequencies.txt is not supported at this time. We hope to support this in the future though.