000507f Jul 10, 2018
@joto @simonpoole @simon04 @mojodna

Osmium Node Module Tutorial

Using the node-osmium Library

Install with

npm install osmium

and add

var osmium = require('osmium');

in your JavaScript file to access it.

OSM File Access

node-osmium can read all the popular types of OSM files: the XML format, either uncompressed (.osm) or compressed with gzip (.osm.gz) or bzip2 (.osm.bz2); the PBF format (.osm.pbf); files with (.osh) or without (.osm) historic data; and change files (.osc). It can also read OSM files with changesets.

To open such a file you need a File object:

var file = new osmium.File("some_file.osm");

Osmium will detect the file format from the filename suffix. It understands all the usual suffixes. If it doesn't understand the suffix, you can add a second parameter with the format. For instance the following example will force the uncompressed XML format:

var file = new osmium.File("some_file", "osm");

This will force PBF:

var file = new osmium.File("some_file", "pbf");

You can read from STDIN, but you have to set the format then:

var file = new osmium.File("-", "pbf");

You can also read directly from a URL like this:

var file = new osmium.File("http://example.com/data.osm", "osm");

Accessing OSM Data From a Node Buffer

Sometimes you have OSM data not in a file on disk but somewhere else. If you can get it into a node.Buffer you can get it into Osmium. Simply give the buffer to the osmium.File constructor instead of a file name:

var fs = require('fs');
var buffer = fs.readFileSync("some_file.osm.pbf"); // or fill a Buffer from any other source
var file = new osmium.File(buffer, "pbf"); // buffer contents in PBF format

Creating a Reader

Once you have defined a File you can open a Reader to access its data:

var file = new osmium.File("data.osm");
var reader = new osmium.Reader(file);

In this simple case you can also directly open the Reader with a file name:

var reader = new osmium.Reader("data.osm");

But if you need to specify a format or want to read from a node.Buffer you have to initialize the osmium.File first.

Defining Which Object Types to Read

OSM files usually contain nodes, ways, and/or relations. Some special OSM files can contain changeset entries. If you only want to read some of those object types, you can specify which by adding an extra object parameter when initializing the Reader.

In case you are only interested in nodes and ways it would look like this:

var reader = new osmium.Reader("data.osm", { node: true, way: true });

The default is to read all the different object types. But to improve performance it is recommended to only read the object types you really need. That way other object types are discarded on the C++ side, avoiding the costly C++-to-JavaScript conversion.
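
If the set of object types is only known at run time (for example from command line arguments), the options object can be built from a list of names. The following is a minimal sketch: readerOptions is a hypothetical helper, not part of node-osmium, and it assumes the option keys are exactly the four type names mentioned above.

```javascript
// Hypothetical helper (not part of node-osmium): turn a list of type
// names into the options object passed to the Reader constructor.
// Assumption: the valid keys are the four object types named above.
var KNOWN_TYPES = ['node', 'way', 'relation', 'changeset'];

function readerOptions(types) {
    var options = {};
    types.forEach(function(type) {
        if (KNOWN_TYPES.indexOf(type) === -1) {
            throw new Error('unknown OSM object type: ' + type);
        }
        options[type] = true;
    });
    return options;
}

// Intended use, assuming osmium was loaded with require('osmium'):
//   var reader = new osmium.Reader("data.osm", readerOptions(['node', 'way']));
console.log(readerOptions(['node', 'way']));
```

Rejecting unknown names early avoids silently reading nothing because of a misspelled type.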

Getting the File Header

The Reader object gives you access to the OSM file header:

var reader = new osmium.Reader("data.osm");
var header = reader.header();

The header object contains the generator, the software that created this file:

header.generator; // ===> 'CGImap 0.2.0'

And it contains the bounding box(es) of the data. (OSM files can have zero or more bounding boxes, usually they have one.)

var bounds = header.bounds[0];
var min_lon = bounds.left();
var max_lon = bounds.right();
var min_lat = bounds.bottom();
var max_lat = bounds.top();
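
The four accessor calls above can be collapsed into a plain array if the bounding box needs to be stored or passed on. This is only a sketch: boundsToArray is a hypothetical helper, and the stand-in object exists solely so the code runs without an OSM file.

```javascript
// Hypothetical helper: convert a bounds object (anything exposing the
// left(), right(), bottom() and top() methods shown above) into a plain
// [min_lon, min_lat, max_lon, max_lat] array.
function boundsToArray(bounds) {
    return [bounds.left(), bounds.bottom(), bounds.right(), bounds.top()];
}

// Stand-in for a real bounds object; the values are made up.
var fake_bounds = {
    left: function() { return -120.2; },
    right: function() { return -120.1; },
    bottom: function() { return 48.4; },
    top: function() { return 48.5; }
};

console.log(boundsToArray(fake_bounds));
```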

Defining a Handler

Osmium will generate events for each object it reads. Those events can be handled by your code. For this you need to define a handler and define some callbacks on it:

var handler = new osmium.Handler();
handler.on('node', function(node) {
    console.log("got a node with id ", node.id);
});

There are callbacks like the above for each type of OSM object: node, way, relation, and changeset.

Sometimes you need to run initialization or finalization code. Use the init and done callbacks:

handler.on('init', function() { ... });
handler.on('done', function() { ... });

There are also "before" and "after" callbacks called between objects of different types:

handler.on('before_nodes', function() { ... });
handler.on('after_nodes', function() { ... });
handler.on('before_ways', function() { ... });
...
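
As a small illustration of registering several callbacks at once, the sketch below attaches one counting callback per object type. attachCounters is a hypothetical helper. Node's EventEmitter is used purely as a stand-in that also has an on() method, so the code runs without osmium; a real osmium.Handler is driven by osmium.apply(), not by emit().

```javascript
// Hypothetical helper: register one counting callback per object type on
// anything that offers on(name, callback), such as an osmium.Handler.
function attachCounters(handler) {
    var counts = { node: 0, way: 0, relation: 0 };
    Object.keys(counts).forEach(function(type) {
        handler.on(type, function() {
            counts[type] += 1;
        });
    });
    return counts;
}

// Stand-in for osmium.Handler (assumption: only on() is needed here).
var EventEmitter = require('events').EventEmitter;
var fake_handler = new EventEmitter();
var counts = attachCounters(fake_handler);

// Simulate the events that reading a file would generate.
fake_handler.emit('node', { id: 1 });
fake_handler.emit('node', { id: 2 });
fake_handler.emit('way', { id: 10 });

console.log(counts);
```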

Handler Options

Sometimes you are only interested in nodes with tags. But the majority of nodes do not have any tags at all. As an optimization you can tell the handler to only give you tagged nodes:

var handler = new osmium.Handler();
handler.options({ 'tagged_nodes_only': true });

Accessing OSM Objects

Through the node, way and relation callbacks you get access to the OSM objects.

OSM objects are read-only and you cannot create them yourself. Think of them not as real objects, but as convenient accessors to the data in the OSM file. You cannot keep the objects around outside the handler callback. If you need to retain some data from the objects, extract the data you need and put it in your own data structure. OSM files can be quite large, so make sure you store the data efficiently.

Accessing OSM Object Attributes

Each object has the usual attributes:

var handler = new osmium.Handler();
handler.on('node', function(node) {
    console.log(node.id);        // unique id of this object
    console.log(node.version);   // version of this object
    console.log(node.changeset); // changeset id of this object
    console.log(node.uid);       // user id
    console.log(node.user);      // name of user
});

Accessing the timestamp when this object version was created is a bit more complex. To get a Date object with this information call the timestamp() function:

    console.log(node.timestamp());

Because this is an expensive operation and often a full Date object is not needed, you can also get the timestamp as a simple number counting the seconds since midnight January 1, 1970:

    console.log(node.timestamp_seconds_since_epoch);

The node.visible attribute tells you whether an object is visible or has been deleted. For normal OSM files it is always true, but if the file contains old versions of the OSM data ("history file") or is a "change file" (.osc), this attribute can be true or false.

The same works for ways and relations. Changeset objects are different and not documented here.
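
Because the objects cannot be kept after the callback returns, one option is to copy the attributes you need into a plain object. The sketch below does that and keeps the cheap numeric timestamp, converting it to a Date only when needed. snapshot and toDate are hypothetical helpers, and the stand-in node holds made-up values.

```javascript
// Hypothetical helper: copy the attributes described above into a plain
// object that is safe to keep after the handler callback has returned.
function snapshot(object) {
    return {
        id: object.id,
        version: object.version,
        changeset: object.changeset,
        uid: object.uid,
        user: object.user,
        visible: object.visible,
        // cheap numeric timestamp instead of a full Date object
        timestamp: object.timestamp_seconds_since_epoch
    };
}

// A Date can still be built later; Date expects milliseconds.
function toDate(seconds_since_epoch) {
    return new Date(seconds_since_epoch * 1000);
}

// Stand-in for an object passed to a handler callback.
var fake_node = {
    id: 17, version: 3, changeset: 99, uid: 5, user: 'someone',
    visible: true, timestamp_seconds_since_epoch: 1531180800
};

var kept = snapshot(fake_node);
console.log(kept.id, toDate(kept.timestamp).toISOString());
```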

Accessing the Tags

Of course you can also get access to the tags:

way.tags(); // ==> { "highway": "residential", "maxspeed": "50" }

or ask for a specific key:

way.tags("highway"); // ==> "residential"

Use the second form if you are only interested in a few tags, because it is faster.
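
When a handful of keys is wanted, the single-key form can be wrapped in a small loop. This is a sketch under one assumption that the text above does not confirm: that asking for a missing key yields undefined (or null). pickTags is a hypothetical helper and the stand-in way only mimics the two call forms shown above.

```javascript
// Hypothetical helper: fetch only the wanted keys with tags(key).
// Assumption: a missing key yields undefined (or null).
function pickTags(object, keys) {
    var result = {};
    keys.forEach(function(key) {
        var value = object.tags(key);
        if (value !== undefined && value !== null) {
            result[key] = value;
        }
    });
    return result;
}

// Stand-in way that mimics both forms of tags() described above.
var fake_way = {
    tags: function(key) {
        var all = { highway: 'residential', maxspeed: '50' };
        return key === undefined ? all : all[key];
    }
};

console.log(pickTags(fake_way, ['highway', 'name']));
```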

Accessing Node Locations And the Coordinates Property

OSM node objects contain the location of the node, the coordinates. You can access them in several ways:

node.lon;           // ==> Number between -180.0 and 180.0
node.lat;           // ==> Number between -90.0 and 90.0
node.coordinates;   // ==> Coordinates object

The Coordinates object returned by the coordinates property has lon and lat properties:

var c = node.coordinates;
console.log(c.lon);
console.log(c.lat);

Unlike the Node object, the Coordinates object can be created by you and it can be copied around and used like any normal JavaScript object.

Accessing Way Nodes

Ways have a reference to the IDs of the nodes they use:

way.node_refs();  // ==> [12345, 629375, 273054] (Array with node IDs)
way.node_refs(1); // ==> 629375  (ID of 2nd node)

You can not access a node object this way, only the ID. See below for accessing the node locations.

If all you need is the number of referenced nodes, use the nodes_count property of the way object:

way.nodes_count;  // ==> number of nodes in this way
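
The node references are enough to tell whether a way is closed, that is, whether its first and last node are the same. The following is a minimal sketch; isClosedWay and fakeWay are hypothetical names, and the stand-in way only imitates node_refs() and nodes_count as described above.

```javascript
// Hypothetical helper: a way is closed when its first and last node
// references are the same ID. Ways with fewer than two node references
// are treated as not closed here.
function isClosedWay(way) {
    if (way.nodes_count < 2) {
        return false;
    }
    return way.node_refs(0) === way.node_refs(way.nodes_count - 1);
}

// Stand-in way built from a list of node IDs.
function fakeWay(ids) {
    return {
        nodes_count: ids.length,
        node_refs: function(index) {
            return index === undefined ? ids.slice() : ids[index];
        }
    };
}

console.log(isClosedWay(fakeWay([1, 2, 3, 1]))); // true
console.log(isClosedWay(fakeWay([1, 2, 3])));    // false
```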

Accessing Relation Members

Relations have data about their members. To access them use the members() function:

relation.members();   // ==> Array of members
relation.members(2);  // ==> Third member

A member is represented as an Object with three keys:

  1. type - the type of the member ('n', 'w', or 'r')
  2. ref - the ID of the member
  3. role - the role of the member

If all you need is the number of members, use the members_count property of the relation object:

relation.members_count;  // ==> number of members in this relation
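
To show how the three member keys can be used, the sketch below counts the members of a relation by type. countMembersByType is a hypothetical helper, and the stand-in relation with its made-up members only mimics the members() call described above.

```javascript
// Hypothetical helper: count relation members per type code
// ('n' = node, 'w' = way, 'r' = relation).
function countMembersByType(relation) {
    var counts = { n: 0, w: 0, r: 0 };
    relation.members().forEach(function(member) {
        counts[member.type] += 1;
    });
    return counts;
}

// Stand-in relation; the member data is made up.
var fake_relation = {
    members_count: 3,
    members: function(index) {
        var all = [
            { type: 'w', ref: 101, role: 'outer' },
            { type: 'w', ref: 102, role: 'inner' },
            { type: 'n', ref: 7, role: 'label' }
        ];
        return index === undefined ? all : all[index];
    }
};

console.log(countMembersByType(fake_relation));
```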

The LocationHandler

OSM ways contain only references to the nodes, but in most cases you don't need the ID but the location. This is where the LocationHandler comes into play. Initialize and call the LocationHandler like this:

var reader = new osmium.Reader("some_file.osm");
var location_handler = new osmium.LocationHandler();
var handler = new osmium.Handler();
// set up your handler callbacks
osmium.apply(reader, location_handler, handler);

This will call the location handler before your handler. When the location handler encounters nodes, it stores their locations. Later, when it reaches the ways in the same file, it "adds" those locations to the ways, so that your handler sees ways with node locations attached. You can access them like this:

way.node_coordinates();

This will return an Array with osmium.Coordinates objects. It is also the basis for the geometry functions described below.
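
As an example of working with the returned array, the sketch below computes the bounding box of a way from its coordinates. coordinatesBBox is a hypothetical helper; it only relies on the lon and lat properties of each element, so plain objects are used as stand-ins.

```javascript
// Hypothetical helper: bounding box of an array of objects that have
// lon and lat properties. Returns null for an empty array.
function coordinatesBBox(coordinates) {
    if (coordinates.length === 0) {
        return null;
    }
    var bbox = {
        min_lon: Infinity, min_lat: Infinity,
        max_lon: -Infinity, max_lat: -Infinity
    };
    coordinates.forEach(function(c) {
        bbox.min_lon = Math.min(bbox.min_lon, c.lon);
        bbox.min_lat = Math.min(bbox.min_lat, c.lat);
        bbox.max_lon = Math.max(bbox.max_lon, c.lon);
        bbox.max_lat = Math.max(bbox.max_lat, c.lat);
    });
    return bbox;
}

// Intended use inside a way callback (with a LocationHandler in place):
//   handler.on('way', function(way) {
//       var bbox = coordinatesBBox(way.node_coordinates());
//   });
var sample = [{ lon: 1.56, lat: 9.20 }, { lon: 4.56, lat: 10.29 }];
console.log(coordinatesBBox(sample));
```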

The LocationHandler can use different strategies for storing the node locations. Which strategy is the best depends on the size of the input file and the amount of main memory you have. Here are some of the available options:

  • sparse_mem_array (default) - use this for small (city) to medium (country) sized data files.
  • sparse_mmap_array - a bit more efficient than sparse_mem_array, but only available on Linux.
  • dense_mmap_array - best memory efficiency for large countries and planet sized data files. You will need main memory of size (8 bytes times the highest node ID); for a planet that's currently on the order of 32 GByte of main memory! Only available on Linux.
  • sparse_file_array - use this for small (city) to medium (country) sized data files if you are short on memory.
  • dense_file_array - best memory efficiency for large countries and planet sized data files, but uses hard disk instead of memory. Use for very large data files if you are on OSX (and therefore can't use the "mmap" type) or if you don't have enough main memory. Will, of course, be slow compared to the other strategies.

To set the strategy initialize the LocationHandler with its name:

var location_handler = new osmium.LocationHandler("dense_mmap_array");
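
The memory rule of thumb for the dense strategies (8 bytes times the highest node ID) is easy to turn into a quick estimate. The sketch below does only that arithmetic; the function names are hypothetical and the node ID of four billion is an illustrative value chosen to match the 32 GByte figure quoted above, not a measured one.

```javascript
// Rough estimate from the rule of thumb above: the dense strategies need
// 8 bytes for every node ID up to the highest one in the data.
var BYTES_PER_LOCATION = 8;

function denseArrayBytes(highest_node_id) {
    return highest_node_id * BYTES_PER_LOCATION;
}

function toGigabytes(bytes) {
    return bytes / 1e9;
}

// Illustrative value only: a highest node ID of four billion.
console.log(toGigabytes(denseArrayBytes(4e9)) + ' GByte'); // 32 GByte
```

If the estimate exceeds the available main memory, one of the file based strategies listed above is the fallback.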

Geometry Functions

Usually you don't want (arrays of) raw coordinates, but you want geometries in some standard format. Osmium supports WKT (Well Known Text), WKB (Well Known Binary) and GeoJSON representations of geometries for nodes and ways. You have to use a LocationHandler as described above for these functions to work for ways.

WKT

The wkt() function called on a node or way returns a String with a representation of the geometry:

node.wkt(); // ===> "POINT(1.56 9.20)"
way.wkt();  // ===> "LINESTRING(1.56 9.20, 4.56 10.29)"

See http://en.wikipedia.org/wiki/Well-known_text for a description of the WKT and WKB format.

WKB

The wkb() function called on a node or way returns a node.Buffer with a binary representation of the geometry.

node.wkb();
way.wkb();

See http://en.wikipedia.org/wiki/Well-known_text for a description of the WKT and WKB format.

GeoJSON

The geojson() function called on a node or way returns an Object with a representation of the geometry according to the GeoJSON spec:

node.geojson();

will result in something like this:

{
    type: 'Point',
    coordinates: [-120.1891610, 48.4655800]
}

and

way.geojson();

will result in something like this:

{
    type: 'LineString',
    coordinates: [
        [-120.1796227, 48.4798110],
        [-120.1787663, 48.4802976]
    ]
}

This is, of course, only the "geometry part" of a full feature; you have to add the rest yourself. See the demo/geojson-stream example for a complete program using the GeoJSON function.
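
One way to add "the rest" is to wrap the geometry in a GeoJSON Feature and use the tags as properties. This is a sketch, not the demo program mentioned above: toFeature is a hypothetical helper, using the OSM id as the feature id is a design choice made here, and the stand-in node only imitates the id, tags() and geojson() members described in this tutorial.

```javascript
// Hypothetical helper: build a GeoJSON Feature from an object that
// offers id, tags() and geojson() as described in this tutorial.
function toFeature(object) {
    return {
        type: 'Feature',
        id: object.id,
        geometry: object.geojson(),
        properties: object.tags()
    };
}

// Stand-in node; the values are made up.
var fake_tagged_node = {
    id: 42,
    tags: function() { return { amenity: 'cafe' }; },
    geojson: function() {
        return { type: 'Point', coordinates: [-120.189161, 48.46558] };
    }
};

// One feature per line is a convenient format for streaming output.
console.log(JSON.stringify(toFeature(fake_tagged_node)));
```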

Working with Multipolygons

OSM doesn't have a data type for areas or polygons. Instead areas are stored as closed ways (i.e. first node == last node) or relations tagged as type=multipolygon. Osmium can hide this complexity and create pseudo-objects called "Areas" that are either based on closed ways or on those multipolygon relations. Usually OSM files have to be read twice to allow this, though. In the first pass, relations are read and prepared; in the second pass, nodes and ways are read and everything is assembled. In addition to the node, way, and relation callbacks, you can define an area callback.

Here is an example:

var handler = new osmium.Handler();

handler.on('area', function(area) {
    var landuse = area.tags('landuse');
    if (landuse) {
        console.log(area.wkt() + ' ' + landuse);
    }
});

var mp = new osmium.MultipolygonCollector();

var reader = new osmium.Reader(input_filename);
mp.read_relations(reader);
reader.close();

reader = new osmium.Reader(input_filename);
var location_handler = new osmium.LocationHandler();
osmium.apply(reader, location_handler, handler, mp.handler(handler));
reader.close();

The code looks a bit complicated, because it is modelled after the C++ it is based on. It is likely we'll make this easier at some point. You can not re-use the reader from the first pass, you have to create a new one for the second pass.

Working With Buffers

Instead of calling apply() you can call read() on the Reader and you'll get an osmium.Buffer with zero or more OSM entities in it. If there is no more data in the file, read() returns undefined:

var reader = new osmium.Reader("foo.osm");
var buffer;
while (buffer = reader.read()) {
    // do something here
}

You can call apply() with a Buffer and handlers, just like you would do with the Reader:

osmium.apply(buffer, handler1, handler2);

There are no guarantees how many entities are in one of those buffers. They can even be empty, but usually they will contain a few thousand entities, so they are "small" in comparison to a big OSM data file. So if you call apply() on a Buffer it will return much sooner than when calling apply() on the Reader. This allows you some amount of control over how fast the input is read.

var reader = new osmium.Reader("foo.osm");
var buffer;
while (buffer = reader.read()) {
    osmium.apply(buffer, handler1, handler2);
    // wait here if needed to slow down reading the file
}

Note that calling apply() repeatedly on the buffers in this way, instead of once on the Reader, changes the behaviour of the init, done, before_*, and after_* callbacks: they are called separately for each apply().

Iterating Over the Contents of a Buffer

Instead of calling apply() you can iterate over the contents of the buffer by calling next():

var object;
while (object = buffer.next()) {
    console.log(object.id);
}

buffer.next() returns undefined when there is no more data.

If you use the buffer.next() call, you'll get the next object, whatever that is. You have to check with object instanceof osmium.Node or similar whether you got the type you want. Note that you cannot use the LocationHandler this way.
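
The instanceof check can be sketched as a small tally over a buffer. tallyBuffer is a hypothetical helper that takes the constructors to test against as a parameter, for example { node: osmium.Node, way: osmium.Way }. Only osmium.Node is named in the text above, so osmium.Way is an assumption. Stand-in constructors and a stand-in buffer are used so the code runs without osmium.

```javascript
// Hypothetical helper: walk a buffer with next() and count objects per
// type. `types` maps a label to a constructor to test with instanceof.
function tallyBuffer(buffer, types) {
    var labels = Object.keys(types);
    var counts = { other: 0 };
    labels.forEach(function(label) { counts[label] = 0; });

    var object;
    while ((object = buffer.next())) {
        var matched = false;
        for (var i = 0; i < labels.length && !matched; i++) {
            if (object instanceof types[labels[i]]) {
                counts[labels[i]] += 1;
                matched = true;
            }
        }
        if (!matched) {
            counts.other += 1;
        }
    }
    return counts;
}

// Stand-ins for the osmium classes and for an osmium.Buffer.
function FakeNode() {}
function FakeWay() {}
function fakeBuffer(objects) {
    var position = 0;
    return { next: function() { return objects[position++]; } };
}

var tally = tallyBuffer(
    fakeBuffer([new FakeNode(), new FakeWay(), new FakeNode(), {}]),
    { node: FakeNode, way: FakeWay }
);
console.log(tally);
```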

Together you can read the content of a file like this:

var reader = new osmium.Reader("foo.osm");
var buffer;
while (buffer = reader.read()) {
    var object;
    while (object = buffer.next()) {
        // do something here with object
    }
}

Creating an Osmium Buffer From a Node Buffer

You can also create an osmium.Buffer from a node.Buffer:

var node_buffer = new node.Buffer;
// fill it with OSM data somehow
var osmium_buffer = new osmium.Buffer(node_buffer);

This buffer can now be used like the buffers we got from the Reader.

A Complete Example

Finally, here is a complete example to get you started. It parses an OSM file and uses a node handler callback to count the total number of nodes:

var osmium = require('osmium');
var reader = new osmium.Reader("test/data/winthrop.osm");
var handler = new osmium.Handler();
var nodes = 0;
handler.on('node', function(node) {
    ++nodes;
});
osmium.apply(reader, handler);
console.log(nodes);

Result:

1525