# Working and Analysing OpenStreetMap

## for Data Science Projects

by Nikolai Janakiev [@njanakiev](https://twitter.com/njanakiev/)

# OpenStreetMap

![](assets/osm.png)

# OpenStreetMap in Numbers ([Source](https://www.openstreetmap.org/stats/data_stats.html))

- __Started 2004__
- Number of users: __7.393.705__
- Number of uploaded GPS points: __8.554.200.885__
- Number of nodes: __6.849.901.910__
- Number of ways: __759.507.867__
- Number of relations: __8.873.864__

# OpenStreetMap Elements

![](assets/osm_elements.png)

# Data Structure of OpenStreetMap

- The native format is XML; large extracts are typically distributed in the more compact binary PBF format, which is based on [Protocol Buffers](https://developers.google.com/protocol-buffers).

```xml
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmium/1.8.0">
  <bounds minlat="47.04774" minlon="9.471078" maxlat="47.27128" maxlon="9.636217"/>
  <node id="26863444" version="6" timestamp="2019-05-19T15:23:01Z" lat="47.1666716" lon="9.5608307">
    <tag k="ele" v="2122"/>
    <tag k="name" v="Kuhgrat"/>
    <tag k="natural" v="peak"/>
    <tag k="wikidata" v="Q4244296"/>
  </node>
  ...
```
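
As a minimal sketch of how such a node can be read programmatically, the snippet below parses the node from the excerpt above with Python's standard library. The helper name `parse_node` is hypothetical and only serves the illustration; real extracts are usually too large for this approach and are better handled with dedicated tools.

```python
import xml.etree.ElementTree as ET

# Node element copied from the XML excerpt above.
node_xml = """
<node id="26863444" version="6" timestamp="2019-05-19T15:23:01Z" lat="47.1666716" lon="9.5608307">
  <tag k="ele" v="2122"/>
  <tag k="name" v="Kuhgrat"/>
  <tag k="natural" v="peak"/>
  <tag k="wikidata" v="Q4244296"/>
</node>
"""

def parse_node(elem):
    """Convert a <node> element into a plain dictionary (illustrative helper)."""
    return {
        "id": int(elem.get("id")),
        "lat": float(elem.get("lat")),
        "lon": float(elem.get("lon")),
        # Tags are key-value pairs stored in the k and v attributes.
        "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
    }

node = parse_node(ET.fromstring(node_xml))
print(node["tags"]["name"], node["lat"], node["lon"])
```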

# OpenStreetMap Ways

```xml
  <way id="4781367" version="3" timestamp="2012-05-14T16:51:34Z">
    <nd ref="30604007"/>
    <nd ref="1752681738"/>
    <nd ref="30604015"/>
    <nd ref="30604017"/>
    <nd ref="30604019"/>
    <nd ref="1752681852"/>
    <nd ref="1752681861"/>
    <nd ref="30604020"/>
    <nd ref="1743684563"/>
    <tag k="name" v="In den Äusseren"/>
    <tag k="highway" v="residential"/>
  </way>
  ...
```

# OpenStreetMap Relations

```xml
  <relation id="12473573" version="1" timestamp="2021-03-21T14:35:25Z">
    <member type="way" ref="920149763" role="inner"/>
    <member type="way" ref="44963556" role="outer"/>
    <tag k="addr:city" v="Triesen"/>
    <tag k="addr:country" v="LI"/>
    <tag k="addr:housenumber" v="16"/>
    <tag k="addr:postcode" v="9495"/>
    <tag k="addr:street" v="Gässle"/>
    <tag k="amenity" v="school"/>
    <tag k="building" v="school"/>
    <tag k="name" v="Primarschule Triesen"/>
    <tag k="type" v="multipolygon"/>
    <tag k="wheelchair" v="yes"/>
  </relation>
  ...
</osm>
```
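
The following sketch shows one possible way to read ways and relations from such a file with Python's standard library. The XML is a shortened copy of the way and relation above, wrapped in an `<osm>` root; the dictionary layout is an assumption chosen for the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copies of the way and relation shown above.
osm_xml = """
<osm version="0.6">
  <way id="4781367">
    <nd ref="30604007"/>
    <nd ref="1752681738"/>
    <nd ref="30604015"/>
    <tag k="highway" v="residential"/>
  </way>
  <relation id="12473573">
    <member type="way" ref="920149763" role="inner"/>
    <member type="way" ref="44963556" role="outer"/>
    <tag k="type" v="multipolygon"/>
  </relation>
</osm>
"""
root = ET.fromstring(osm_xml)

# A way is an ordered list of node references.
ways = {
    int(w.get("id")): [int(nd.get("ref")) for nd in w.findall("nd")]
    for w in root.findall("way")
}

# A relation lists its members as (type, id, role) triples.
relations = {
    int(r.get("id")): [
        (m.get("type"), int(m.get("ref")), m.get("role"))
        for m in r.findall("member")
    ]
    for r in root.findall("relation")
}

print(ways[4781367])
print(relations[12473573])
```

Ways only reference node IDs, so the coordinates of a way have to be looked up from the referenced nodes in a second step.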

![OSM Nodes](assets/osm_node_example.png)

![OSM Ways](assets/osm_way_example.png)

![OSM Relations](assets/osm_relation_example.png)

# Search for OpenStreetMap Elements

__Nominatim__ ([nominatim.openstreetmap.org](https://nominatim.openstreetmap.org/ui/search.html)) is the official OpenStreetMap search engine for forward and reverse geocoding.

Access elements directly via:

- `www.openstreetmap.org/node/[OSM_ID]`
- `www.openstreetmap.org/way/[OSM_ID]`
- `www.openstreetmap.org/relation/[OSM_ID]`

__Warning__: OpenStreetMap IDs can change over time!
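
The URL patterns above can be built with a small helper, sketched below. The function names are hypothetical. The Nominatim URL assumes the public `/search` endpoint with the `q`, `format` and `limit` parameters; the snippet only builds the URLs and does not send any request. When actually querying the public Nominatim instance, its usage policy (identifying User-Agent, low request rate) applies.

```python
from urllib.parse import urlencode

OSM_TYPES = ("node", "way", "relation")

def element_url(osm_type, osm_id):
    """Return the openstreetmap.org page of an element (illustrative helper)."""
    if osm_type not in OSM_TYPES:
        raise ValueError(f"unknown element type: {osm_type!r}")
    return f"https://www.openstreetmap.org/{osm_type}/{osm_id}"

def nominatim_search_url(query, limit=1):
    """Build a forward-search URL for the public Nominatim instance."""
    params = urlencode({"q": query, "format": "jsonv2", "limit": limit})
    return f"https://nominatim.openstreetmap.org/search?{params}"

print(element_url("node", 26863444))
print(nominatim_search_url("Primarschule Triesen"))
```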

# Metadata in OpenStreetMap

- Stored as key-value pairs, e.g. [Key:amenity](https://wiki.openstreetmap.org/wiki/Key:amenity)

![](assets/osm_key_amenity.png)

# Loading Data from OpenStreetMap

# Data Sources

- __Planet.osm__ ([planet.openstreetmap.org](https://planet.openstreetmap.org/)) Full OpenStreetMap data set of the whole world
    - planet-latest.osm.pbf __(57 GB)__
    - planet-latest.osm.bz2 __(102 GB)__

- __Geofabrik__ ([download.geofabrik.de](http://download.geofabrik.de/)) Free and regularly updated OpenStreetMap extracts of various regions around the world

- __BBBike__ ([download.bbbike.org](https://download.bbbike.org/))
    - [OSM ready extracts](https://download.bbbike.org/osm/bbbike/)
    - [OSM custom extracts](https://extract.bbbike.org/)
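
A regional extract can also be fetched from a script. The sketch below assumes that Geofabrik extracts follow the URL pattern `<region>-latest.osm.pbf` (as for Liechtenstein at the time of writing); the helper names and the `data` target directory are hypothetical. The download call itself is left commented out so that the snippet runs without network access.

```python
import shutil
import urllib.request
from pathlib import Path

def geofabrik_url(region_path):
    """Build the URL of a Geofabrik extract, e.g. for 'europe/liechtenstein'."""
    return f"https://download.geofabrik.de/{region_path}-latest.osm.pbf"

def download(url, dest_dir="data"):
    """Stream a file into dest_dir and return the local path."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as f:
        shutil.copyfileobj(response, f)
    return dest

url = geofabrik_url("europe/liechtenstein")
print(url)
# download(url)  # uncomment to fetch the file into data/
```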

# TagInfo ([taginfo.openstreetmap.org](https://taginfo.openstreetmap.org/))

- Statistics on OpenStreetMap tags, updated daily

![](assets/taginfo.png)

# Overpass API

```python
import requests

overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
    [out:json];
    area["ISO3166-1"="AT"][admin_level=2]->.search;
    node[amenity="restaurant"](area.search);
    out count;
"""
response = requests.get(overpass_url, params={'data': overpass_query})
response.json()
```

```
{'version': 0.6,
 'generator': 'Overpass API 0.7.56.9 76e5016d',
 'osm3s': {'timestamp_osm_base': '2021-04-12T08:42:11Z',
  'timestamp_areas_base': '2021-04-12T07:52:27Z',
  'copyright': 'The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.'},
 'elements': [{'type': 'count',
   'id': 0,
   'tags': {'nodes': '12610',
    'ways': '0',
    'relations': '0',
    'areas': '0',
    'total': '12610'}}]}
```
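
The counts in an `out count;` response arrive as strings. A minimal sketch for turning them into integers is shown below, using a trimmed copy of the response above as input; the helper name `extract_counts` is hypothetical.

```python
# Trimmed copy of the response shown above.
result = {
    "version": 0.6,
    "elements": [
        {
            "type": "count",
            "id": 0,
            "tags": {"nodes": "12610", "ways": "0", "relations": "0",
                     "areas": "0", "total": "12610"},
        }
    ],
}

def extract_counts(overpass_json):
    """Return the counts of an `out count;` response as integers."""
    for element in overpass_json.get("elements", []):
        if element.get("type") == "count":
            return {key: int(value) for key, value in element["tags"].items()}
    raise ValueError("no count element in response")

counts = extract_counts(result)
print(counts["nodes"])
```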

# PyOsmium

```python
import osmium

class RestaurantHandler(osmium.SimpleHandler):
    def __init__(self):
        osmium.SimpleHandler.__init__(self)
        self.num_restaurants = 0

    def node(self, n):
        if n.tags.get('amenity') == 'restaurant':
            self.num_restaurants += 1

handler = RestaurantHandler()
handler.apply_file('data/liechtenstein-latest.osm.pbf')
print('Number of Restaurants: ', handler.num_restaurants)
```

```
Number of Restaurants:  45
```


# ogr2ogr

```bash
ogrinfo data/Wien.osm.pbf
```

```
INFO: Open of `data/Wien.osm.pbf'
      using driver `OSM' successful.
1: points (Point)
2: lines (Line String)
3: multilinestrings (Multi Line String)
4: multipolygons (Multi Polygon)
5: other_relations (Geometry Collection)
```


# osmconf.ini

The GDAL OSM driver is configured through `osmconf.ini`. On Linux with Anaconda, locate the file with `find ~/anaconda3/envs/wdl/ | grep osmconf.ini`; in this environment it is at `~/anaconda3/envs/wdl/share/gdal/osmconf.ini`. The default configuration can be seen [here](https://github.com/OSGeo/gdal/blob/master/gdal/data/osmconf.ini).

Add desired tags in the `attributes` setting under each layer. Available layers are:

- points
- lines
- multilinestrings
- multipolygons
- other_relations
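
The edit described above can be scripted. The sketch below appends tags to the `attributes` line of one layer section. The helper `add_attributes` is hypothetical, and the `sample` text is a heavily shortened stand-in for a real `osmconf.ini`, not its actual contents.

```python
def add_attributes(conf_text, layer, new_tags):
    """Append tags to the `attributes=` line of one layer section."""
    out, section = [], None
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Remember which layer section the following lines belong to.
            section = stripped[1:-1]
        elif section == layer and stripped.startswith("attributes="):
            current = [t for t in stripped.split("=", 1)[1].split(",") if t]
            merged = current + [t for t in new_tags if t not in current]
            line = "attributes=" + ",".join(merged)
        out.append(line)
    return "\n".join(out) + "\n"

# Shortened, illustrative configuration (assumption, not the real file).
sample = """\
[points]
osm_id=yes
attributes=name,barrier,highway
other_tags=yes

[lines]
osm_id=yes
attributes=name,highway
"""

updated = add_attributes(sample, "points", ["amenity", "name"])
print(updated)
```

Writing the result to a copy of the file and pointing GDAL at it through the `OSM_CONFIG_FILE` configuration option avoids modifying the installed default.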

# Convert OpenStreetMap data with ogr2ogr

Convert OpenStreetMap data to GeoPackage with:

```bash
ogr2ogr -f "GPKG" \
    data/vienna-amenities.gpkg \
    data/Wien.osm.pbf \
    -where "amenity is not null" \
    points \
    -nln amenity
```

# GeoPandas

Load geospatial data with [GeoPandas](https://geopandas.org/):

```python
import geopandas as gpd

gdf = gpd.read_file("data/vienna-amenities.gpkg", 
    driver='GPKG')
gdf.head(5)
```

| | osm_id | name | barrier | highway | ref | address | is_in | place | man_made | amenity | other_tags | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1634625 | | | | | | | | | recycling | `"recycling:glass_bottles"=>"yes","recycling:gr...` | POINT (16.29701 48.18111) |
| 1 | 15079895 | | | | | | | | | telephone | | POINT (16.28689 48.19691) |
| 2 | 15337840 | OMV | | | | | | | | fuel | `"addr:city"=>"Wien","addr:country"=>"AT","addr...` | POINT (16.27995 48.19791) |
| 3 | 29801740 | McDonald's | | | | | | | | fast_food | `"addr:city"=>"Wien","addr:country"=>"AT","addr...` | POINT (16.39496 48.23346) |
| 4 | 31582372 | | | | | | | | | parking | `"access"=>"private","parking"=>"surface","whee...` | POINT (16.14163 48.19482) |
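
Tags that are not listed in `attributes` end up in the `other_tags` column as a `"key"=>"value"` string. A minimal sketch for turning such a string into a dictionary is shown below. The helper `parse_other_tags` is hypothetical, and the regular expression is a simplification that tolerates backslash escapes but does not unescape them.

```python
import re

# Matches one "key"=>"value" pair; backslash-escaped characters are allowed
# inside the quotes.
PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"=>"((?:[^"\\]|\\.)*)"')

def parse_other_tags(value):
    """Parse an other_tags string into a dict; non-strings give an empty dict."""
    if not isinstance(value, str):
        return {}
    return {key: val for key, val in PAIR.findall(value)}

tags = parse_other_tags('"addr:city"=>"Wien","addr:country"=>"AT"')
print(tags)
```

Applied to the whole column, for example with `gdf["other_tags"].apply(parse_other_tags)`, this gives one dictionary per row; missing values become empty dictionaries.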


# OpenStreetMap Data Completeness

- Barrington-Leigh, Christopher, and Adam Millard-Ball. ["The world’s user-generated road map is more than 80% complete."](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0180698) PloS one 12.8 (2017): e0180698.

- List of OpenStreetMap Completeness resources: [wiki.openstreetmap.org/wiki/Completeness](https://wiki.openstreetmap.org/wiki/Completeness)