Creates a RocksDB key-value store of each version of OSM objects found in OSM history files. This history index can then be used to augment GeoJSON files of OSM objects with an @history property that records all previous edits.
osm-wayback is currently designed to support large(ish)-scale historical analysis of OpenStreetMap edits, specifically focused on how objects change over time (and who is editing them).
Current Development Notes:
The history index is keyed by object ID and version (with separate column families for nodes, ways, and relations).
add_history will look up every previous version of an object passed into it. If an object is passed in at version 3, it will look up versions 1, 2, and 3; this is necessary for the tag comparisons. If a version 4 exists in the index, it will not be included, because version 3 is the version that was passed in.
add_history is driven by a stream of current, valid GeoJSON objects; deleted objects are not yet supported.
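The lookup rule above can be pictured with a small in-memory model. This is a minimal Python sketch, not osm-wayback's C++ code: one dict per object type stands in for each RocksDB column family, and the (ID, version) tuple key and the record fields (type, id, version) are hypothetical stand-ins for the real key encoding and schema.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the RocksDB index: one dict per column family,
# keyed by (object ID, version). The real key encoding may differ.
Index = Dict[str, Dict[Tuple[int, int], dict]]


def build_index(history: List[dict]) -> Index:
    """Store every version of every object, separated by object type."""
    index: Index = {"node": {}, "way": {}, "relation": {}}
    for obj in history:
        index[obj["type"]][(obj["id"], obj["version"])] = obj
    return index


def lookup_history(index: Index, obj_type: str, obj_id: int, version: int) -> List[dict]:
    """Return versions 1..version of an object, as described for add_history.

    Versions newer than the one passed in are ignored even if they exist in
    the index; versions missing from the index are skipped.
    """
    family = index[obj_type]
    return [family[(obj_id, v)] for v in range(1, version + 1) if (obj_id, v) in family]


if __name__ == "__main__":
    history = [
        {"type": "way", "id": 42, "version": v, "tags": {"highway": "road"}}
        for v in range(1, 5)
    ]
    idx = build_index(history)
    print([h["version"] for h in lookup_history(idx, "way", 42, 3)])  # [1, 2, 3]
```

Here version 4 of way 42 is present in the index but is not returned, because version 3 is the version that was asked for.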
Install mason to manage dependencies:
git submodule init
git submodule update
Then build with:
mkdir build
cd build
cmake ..
make
To use the run.sh script, also run the following:
.mason/mason install osmium 1.9.1
.mason/mason link osmium 1.9.1
.mason/mason install tippecanoe 1.31.0
.mason/mason link tippecanoe 1.31.0
cd geometry-reconstruction
npm install
A canned workflow:
The run.sh script automates all of the steps to turn OSM history files into historical vector tiles with only 2 inputs: an OSM history file and an output path prefix.
For example, to generate historical vector tiles from the Albany example file included in the example directory:
$ ./run.sh example/history_of_albany.osh.pbf example/albany
This will create the following files in the example directory (in the following order):
| File | Description |
|---|---|
| albany.osm.pbf | Latest version of (all) objects in the input history file |
| albany.geojsonseq | GeoJSON sequence of objects exported by osmium export with the … |
| albany_INDEX | The RocksDB index of the input history file |
| albany.history | Each OSM object from albany.geojsonseq, with its @history added |
| albany.history.geometries | Each feature from … |
| … | Each feature from … |
| … | Historical vector tiles rendered at zoom 15 for albany! |
Note that once run, each of these files is standalone, and the files can be deleted in the order they are generated: each file is used only as the input to the next step. This workflow follows from each utility here relying on standalone input. For example, you could build a North America INDEX and then look up history for just new_york.geojsonseq. Looking up node locations always requires a second pass after histories are built. Separating these files and steps adds negligible time cost and allows tag-only history analysis.
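As an illustration of tag-only analysis, the sketch below compares the tags of consecutive versions of one object. It is not taken from osm-wayback; the input (a list of tag dictionaries ordered from version 1 upward) and the added/deleted/modified output shape are assumptions made for this Python example.

```python
from typing import Dict, List


def tag_changes(history: List[Dict[str, str]]) -> List[dict]:
    """Compare each version's tags with the previous version's tags.

    `history` holds one tag dictionary per version, version 1 first.
    Returns one record per version after the first.
    """
    changes = []
    for prev, curr in zip(history, history[1:]):
        added = {k: v for k, v in curr.items() if k not in prev}
        deleted = {k: v for k, v in prev.items() if k not in curr}
        # Keys present in both versions whose values differ: [old, new].
        modified = {k: [prev[k], curr[k]] for k in curr.keys() & prev.keys() if prev[k] != curr[k]}
        changes.append({"added": added, "deleted": deleted, "modified": modified})
    return changes
```

For a road that gains a name tag in version 2, the first record would have "name" under added and empty deleted and modified dictionaries.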
The complete workflow
First, build a historical lookup index.
Note: For large files (country or planet), increase the ulimit so that RocksDB can have many files open at once (>4000 for the full planet history file).
build_lookup_index INDEX_DIR OSM_HISTORY_FILE
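The ulimit note above refers to the per-process open-file limit. If you drive build_lookup_index from a script, a minimal POSIX-only sketch using Python's standard resource module can raise the soft RLIMIT_NOFILE limit before launching it, since child processes inherit the limit. The 4096 target is an illustrative value based on the >4000 figure above, and the hard limit still caps what an unprivileged user can set.

```python
import resource

# Illustrative target based on the ">4000 open files" guidance above.
TARGET_OPEN_FILES = 4096


def raise_open_file_limit(target: int = TARGET_OPEN_FILES) -> int:
    """Raise the soft RLIMIT_NOFILE limit toward `target`, capped by the hard limit.

    Returns the soft limit in effect afterwards. Processes started later
    (for example build_lookup_index via subprocess) inherit this limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Never lower an existing limit; RLIM_INFINITY is checked explicitly
    # because its numeric value is platform dependent.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Raising only the soft limit keeps the change within what the operating system already permits; going beyond the hard limit needs a system-level change.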
Second, pass a stream of GeoJSON features as produced by osmium export to add_history:
cat features.geojsonseq | add_history INDEX_DIR
The output is a stream of augmented GeoJSON features with an additional @history array (see HISTORICAL_SCHEMA.md for more on the schema of @history). Note: If a feature is not in the input file, its history will not be in the output file.
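The add_history contract (one output feature per input feature, nothing else) can be sketched as a stream transform. This Python sketch is not the tool's implementation: the lookup callback is hypothetical and stands in for the RocksDB query, placing @history under properties is an assumption based on it being described as a property, the @id property name is assumed, and the reader tolerates an optional RFC 8142 record-separator character at the start of each line.

```python
import json
from typing import Callable, Iterable, Iterator, List

# RFC 8142 record separator; GeoJSON text sequences may prefix each record with it.
RS = "\x1e"


def read_geojsonseq(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one GeoJSON feature per non-empty line, tolerating a leading RS."""
    for line in lines:
        text = line.lstrip(RS).strip()
        if text:
            yield json.loads(text)


def add_history_stream(
    features: Iterable[dict], lookup: Callable[[dict], List[dict]]
) -> Iterator[dict]:
    """Attach an @history array to each input feature's properties.

    Output is driven entirely by the input: a feature that is not in the
    input stream never appears in the output.
    """
    for feature in features:
        props = feature.setdefault("properties", {})
        props["@history"] = lookup(feature)
        yield feature


if __name__ == "__main__":
    sample = [
        RS + '{"type": "Feature", "properties": {"@id": 1}, "geometry": null}\n',
        "\n",
    ]

    # Hypothetical lookup standing in for the RocksDB index query.
    def fake_lookup(feature: dict) -> List[dict]:
        return [{"version": 1}]

    for out in add_history_stream(read_geojsonseq(sample), fake_lookup):
        print(json.dumps(out))
```

Because the transform is a generator, it processes one feature at a time, which suits the line-delimited streams used throughout this workflow.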
Depending on the value of the variable …, a fourth column family storing node locations can be created during build_lookup_index. If the node location column family exists, the HISTORY GEOJSONSEQ may be passed to add_geometry. This function looks up every version of every node in each historical version of the object. It adds nodeLocations as a top-level dictionary, keyed by node ID and then by changeset ID for each node.
cat <HISTORY GEOJSONSEQ> | add_geometry <ROCKSDB>
This creates a line-delimited stream of GeoJSON OSM objects with the nodeLocations dictionary added.
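The nodeLocations layout (node ID, then changeset ID) can be sketched as follows. The NodeVersion record, the nodes field on each way version, and the [lon, lat] value format are hypothetical choices for this Python example, not add_geometry's actual schema; keys are strings because JSON object keys must be strings.

```python
import json
from typing import Dict, Iterable, List, NamedTuple

# node ID -> changeset ID -> [lon, lat]
NodeLocations = Dict[str, Dict[str, List[float]]]


class NodeVersion(NamedTuple):
    """Hypothetical record for one version of a node read from the index."""
    node_id: int
    version: int
    changeset: int
    lon: float
    lat: float


def build_node_locations(
    way_history: List[dict], node_versions: Iterable[NodeVersion]
) -> NodeLocations:
    """Collect every known location of every node referenced by any way version.

    If one node has several versions in the same changeset, the last one
    iterated overwrites the earlier ones.
    """
    referenced = {ref for way_version in way_history for ref in way_version["nodes"]}
    locations: NodeLocations = {}
    for nv in node_versions:
        if nv.node_id in referenced:
            locations.setdefault(str(nv.node_id), {})[str(nv.changeset)] = [nv.lon, nv.lat]
    return locations


if __name__ == "__main__":
    way_history = [
        {"version": 1, "changeset": 100, "nodes": [1, 2]},
        {"version": 2, "changeset": 110, "nodes": [1, 2, 3]},
    ]
    node_versions = [
        NodeVersion(1, 1, 100, 0.0, 0.0),
        NodeVersion(1, 2, 105, 0.5, 0.0),
        NodeVersion(2, 1, 100, 1.0, 1.0),
        NodeVersion(3, 1, 110, 2.0, 2.0),
        NodeVersion(9, 1, 100, 5.0, 5.0),  # never referenced by the way; dropped
    ]
    print(json.dumps(build_node_locations(way_history, node_versions), sort_keys=True))
```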
Reconstructing historical geometries (available for nodes and ways) is then done in a separate Node.js process:
node geometry-reconstruction/index.js <HISTORY GEOJSONSEQ with Node Locations>
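Given nodeLocations, one way version's geometry can be assembled by placing each of its nodes at the location it had when that version was created. This Python sketch makes a stated simplification: it treats a larger changeset ID as a later edit, which only approximates edit order (a changeset can stay open while later-numbered ones are used), and it uses the hypothetical nodes and changeset fields on each way version. It is not the geometry-reconstruction algorithm.

```python
from typing import Dict, List, Optional

# node ID -> changeset ID -> [lon, lat]
NodeLocations = Dict[str, Dict[str, List[float]]]


def location_at(changesets: Dict[str, List[float]], changeset: int) -> Optional[List[float]]:
    """Return the location recorded in the latest changeset <= `changeset`."""
    best_cs, best_loc = -1, None
    for cs, loc in changesets.items():
        if best_cs < int(cs) <= changeset:
            best_cs, best_loc = int(cs), loc
    return best_loc


def way_geometry(way_version: dict, node_locations: NodeLocations) -> Optional[dict]:
    """Build a GeoJSON LineString for one way version, or None if a node is unplaced."""
    coords = []
    for ref in way_version["nodes"]:
        loc = location_at(node_locations.get(str(ref), {}), way_version["changeset"])
        if loc is None:
            return None  # no location known at or before this changeset
        coords.append(loc)
    return {"type": "LineString", "coordinates": coords}


if __name__ == "__main__":
    node_locations = {"1": {"100": [0.0, 0.0], "105": [0.5, 0.0]}, "2": {"100": [1.0, 1.0]}}
    print(way_geometry({"changeset": 101, "nodes": [1, 2]}, node_locations))
```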
Currently, multiple output types are supported; see geometry-reconstruction/README.md for more information about the following output types:
1. Every major and minor version is an independent object (best for rendering historical geometries).
2. Entries in the @history array each have a geometry attribute (best for historical analysis).
3. The @history object is a TopoJSON object, storing every version of the object (more efficient than 2).
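Output type 1 can be pictured as follows, assuming that a major version is a way version and a minor version is a change to one of its node locations before the next way version. That definition, the changeset-ordering approximation, the nodes and changeset fields, and the @version/@minorVersion/@changeset property names are assumptions for this Python sketch, not geometry-reconstruction's documented behavior.

```python
from typing import Dict, Iterator, List, Optional

# node ID -> changeset ID -> [lon, lat]
NodeLocations = Dict[str, Dict[str, List[float]]]


def _location_at(changesets: Dict[str, List[float]], changeset: int) -> Optional[List[float]]:
    """Location from the latest changeset <= `changeset` (larger ID assumed later)."""
    best_cs, best_loc = -1, None
    for cs, loc in changesets.items():
        if best_cs < int(cs) <= changeset:
            best_cs, best_loc = int(cs), loc
    return best_loc


def expand_versions(way_id: int, history: List[dict], node_locations: NodeLocations) -> Iterator[dict]:
    """Yield one independent Feature per major (way) and minor (node-only) version."""
    for i, major in enumerate(history):
        start = major["changeset"]
        end = history[i + 1]["changeset"] if i + 1 < len(history) else None
        # Minor versions: changesets that moved one of this version's nodes
        # after the way version was created and before the next way version.
        minor_changesets = sorted({
            int(cs)
            for ref in major["nodes"]
            for cs in node_locations.get(str(ref), {})
            if int(cs) > start and (end is None or int(cs) < end)
        })
        for minor, cs in enumerate([start] + minor_changesets):
            coords = [_location_at(node_locations.get(str(ref), {}), cs) for ref in major["nodes"]]
            if any(c is None for c in coords):
                continue  # skip versions with an unplaceable node
            yield {
                "type": "Feature",
                "geometry": {"type": "LineString", "coordinates": coords},
                "properties": {
                    "@id": way_id,
                    "@version": major["version"],
                    "@minorVersion": minor,
                    "@changeset": cs,
                },
            }
```

Emitting each version as its own feature trades storage for simplicity: a renderer can filter on a version property without unpacking any nested history.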