# Data Science with OpenStreetMap and Wikidata

### Nikolai Janakiev [@njanakiev](https://twitter.com/njanakiev/)

# Outline

## Part I: Wikidata and OpenStreetMap

- Differences between Wikidata and OpenStreetMap
- Ways to connect data between Wikidata and OpenStreetMap

## Part II: Data Science with Wikidata and OSM

- Libraries and Tools Used
- Exhibition of Various Analyses and Results

# OpenStreetMap Elements

![OSM Elements](assets/osm_elements.png)

# Metadata in OpenStreetMap

![OSM Key Amenity](assets/osm_key_amenity.png)

![OSM Salzburg](assets/osm_salzburg.png)

# Wikidata is a Knowledge Graph

![Knowledge Graph](assets/knowledge_graph.png)

![Wikidata Map](assets/wikidata_july_2019.png)
[Wikidata Map July 2019](https://addshore.com/2019/07/wikidata-map-july-2019/) by Addshore

# Semantic Web and Linked Data

- Linking data sets and entities on the Web

- Core W3C standard: [Resource Description Framework (RDF)](https://en.wikipedia.org/wiki/Resource_Description_Framework)

- __1,229 datasets__ with __16,125 links__ (as of June 2018)

![Linked Open Data](assets/linked_open_data.png)

[https://lod-cloud.net/](https://lod-cloud.net/)

![Wikipedia Wikidata Link](assets/wikipedia_wikidata_link.png)

![Wikidata Data Model](assets/wikidata_data_model.png)

# Querying Wikidata with SPARQL

- [https://query.wikidata.org/](https://query.wikidata.org/)

![Wikidata Query](assets/wikidata_query.png)

# All Windmills in Wikidata

```sparql
SELECT ?item ?itemLabel ?image ?location ?country ?countryLabel
WHERE {
  ?item wdt:P31 wd:Q38720.
  OPTIONAL { ?item wdt:P18 ?image. }
  OPTIONAL { ?item wdt:P625 ?location. }
  OPTIONAL { ?item wdt:P17 ?country. }
  SERVICE wikibase:label { 
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
  }
}
```
[Query link](https://w.wiki/5cv)

![Wikidata Windmills](assets/wikidata_windmills.png)
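
A query like the one above can also be run from Python. The following is a minimal sketch using only the standard library, assuming the public endpoint at `https://query.wikidata.org/sparql` and the standard SPARQL JSON results format. The helper names (`parse_point`, `flatten_bindings`, `run_query`), the user agent string, and the sample values are illustrative assumptions, not part of the talk.

```python
import json
import re
import urllib.parse
import urllib.request

WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"

# Coordinates (P625) come back as WKT literals such as "Point(4.64 52.38)",
# with longitude first and latitude second
POINT_PATTERN = re.compile(r"Point\(([-+0-9.eE]+) ([-+0-9.eE]+)\)")


def parse_point(wkt):
    """Return (longitude, latitude) from a WKT point, or None if it does not match."""
    match = POINT_PATTERN.fullmatch(wkt.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def flatten_bindings(payload):
    """Turn SPARQL JSON results into a list of plain {variable: value} dicts."""
    rows = []
    for binding in payload["results"]["bindings"]:
        # OPTIONAL variables without a match are simply absent from the binding
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


def run_query(query, user_agent="example-script/0.1 (hypothetical contact)"):
    """Send a query to the Wikidata endpoint and return the flattened rows."""
    url = WIKIDATA_SPARQL_URL + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return flatten_bindings(json.load(response))


# Offline sample in the SPARQL JSON results format (entity IDs are placeholders)
sample = {
    "head": {"vars": ["item", "location"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0"},
         "location": {"type": "literal", "value": "Point(4.64 52.38)"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"}},
    ]},
}
rows = flatten_bindings(sample)
```

Calling `run_query` with the windmill query text would return one dictionary per result row. Rows without coordinates lack the `location` key, so downstream code should use `row.get("location")`.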

# Linking OpenStreetMap with Wikidata?

![OSM Wikidata Bridge](assets/osm_wikidata_bridge.jpg)

[File:WdOsm-semanticBridge.jpg](https://wiki.openstreetmap.org/wiki/File:WdOsm-semanticBridge.jpg)

# OpenStreetMap to Wikidata

- `wikidata=*`, `wikipedia=*` tags _(stable)_
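
As a rough illustration of following the `wikidata=*` tag from OSM to Wikidata, the sketch below collects QIDs from elements shaped like Overpass API JSON output (`type`, `id`, `tags`). It assumes a tag may hold several QIDs separated by semicolons. The function names, element ids, and QIDs are placeholders chosen for this example.

```python
import re

# A Wikidata item identifier: "Q" followed by a positive integer
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def wikidata_ids(tags):
    """Return the QIDs found in an element's `wikidata` tag, skipping malformed values."""
    raw = tags.get("wikidata", "")
    candidates = [part.strip() for part in raw.split(";")]
    return [qid for qid in candidates if QID_PATTERN.fullmatch(qid)]


def index_by_wikidata(elements):
    """Map each QID to the (type, id) pairs of the OSM elements that reference it."""
    index = {}
    for element in elements:
        for qid in wikidata_ids(element.get("tags", {})):
            index.setdefault(qid, []).append((element["type"], element["id"]))
    return index


# Placeholder elements in the shape of Overpass API JSON output
elements = [
    {"type": "node", "id": 1, "tags": {"man_made": "windmill", "wikidata": "Q100"}},
    {"type": "way", "id": 2, "tags": {"building": "yes", "wikidata": "Q100; Q200"}},
    {"type": "node", "id": 3, "tags": {"amenity": "cafe"}},
]
index = index_by_wikidata(elements)
```

The resulting index can then be joined against rows returned from a Wikidata query by QID.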

# Wikidata to OpenStreetMap

- [OSM relation ID (P402)](https://www.wikidata.org/wiki/Property:P402) _(unstable)_ 

- [OSM tag or key (P1282)](https://www.wikidata.org/wiki/Property:P1282) mapping of OSM key-values to Wikidata entities (e.g. [lighthouse](https://www.wikidata.org/wiki/Q39715) and [Tag:man_made=lighthouse](https://wiki.openstreetmap.org/wiki/Tag:man_made=lighthouse))

- [Permanent ID](https://wiki.openstreetmap.org/wiki/Permanent_ID) proposal
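
To make the P1282 mapping concrete, here is a small sketch that parses values of that property and checks them against an OSM element's tags. It assumes values are written as `Key:<key>` or `Tag:<key>=<value>`, matching the lighthouse example above. Other value forms are ignored, and both function names are hypothetical.

```python
def parse_osm_tag_or_key(value):
    """Split a P1282 value into (kind, key, tag_value), or return None if unrecognised."""
    kind, _, rest = value.partition(":")
    if kind == "Key" and rest:
        # Keys may contain colons themselves, e.g. "addr:housenumber"
        return ("Key", rest, None)
    if kind == "Tag" and "=" in rest:
        key, _, tag_value = rest.partition("=")
        return ("Tag", key, tag_value)
    return None


def matches(parsed, tags):
    """Check whether an OSM element's tags satisfy a parsed P1282 value."""
    if parsed is None:
        return False
    kind, key, tag_value = parsed
    if kind == "Key":
        return key in tags
    return tags.get(key) == tag_value


lighthouse = parse_osm_tag_or_key("Tag:man_made=lighthouse")
is_lighthouse = matches(lighthouse, {"man_made": "lighthouse", "name": "Example"})
```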

# Data Science

# Tools and Libraries Used

- [Jupyter](https://jupyter.org/) - interactive notebook development environment
- [PostGIS](https://postgis.net/) - spatial database extender for [PostgreSQL](http://postgresql.org/)

# Python Libraries

- [NumPy](https://www.numpy.org/) - numerical and scientific computing
- [Pandas](https://pandas.pydata.org/) - data analysis library
- [Matplotlib](https://matplotlib.org/) - 2D plotting library
- [Shapely](https://shapely.readthedocs.io/en/stable/manual.html) - analysis and manipulation of [GEOS](http://trac.osgeo.org/geos/) features
- [GeoPandas](http://geopandas.org/) - Pandas extension for spatial operations and geometric types
- [PySAL](https://pysal.org/) - spatial analysis library
- [Datashader](http://datashader.org/) - graphics pipeline system for large datasets
- [Keras](https://keras.io/) - high-level deep learning library

<img alt="wikidata europe points" src="assets/wikidata_europe_points.png" style="width: 80%; height: 80%;" />

<img alt="wikidata europe osm points" src="assets/wikidata_europe_osm_points.png" style="width: 80%; height: 80%;" />

<img alt="wikidata europe most common instances" src="assets/wikidata_europe_most_common_instances.png" style="width: 90%; height: 90%;" />

<img alt="wikidata europe companies most common instances" src="assets/wikidata_europe_companies_most_common_instances.png" style="width: 90%; height: 90%;" />

<img alt="wikidata uk companies most common instances" src="assets/wikidata_uk_companies_most_common_instances.png" style="width: 90%; height: 90%;" />

# Image Classification with Keras

- [Keras Applications](https://keras.io/applications/)

```python
import numpy as np
from keras.applications import vgg16
from keras.preprocessing.image import load_img, img_to_array

# load the VGG16 model with weights pre-trained on ImageNet
vgg_model = vgg16.VGG16(weights='imagenet')

# load the image, resized to the 224x224 input size VGG16 expects
original = load_img('cat.jpg', target_size=(224, 224))
numpy_image = img_to_array(original)

# add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
image_batch = np.expand_dims(numpy_image, axis=0)

# prepare the image for the VGG model
processed_image = vgg16.preprocess_input(image_batch.copy())

# get the predicted probabilities for each class
predictions = vgg_model.predict(processed_image)

# convert the probabilities to class labels
# (the top 5 predictions are returned by default)
labels = vgg16.decode_predictions(predictions)
```

# Classifying Wikimedia Commons Images

![wikidata companies images classification](assets/wikidata_companies_images_classification.png)
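
One possible way to summarise classifications across many images is to count the top label per image. The sketch below assumes input in the shape returned by `decode_predictions`: one list per image of `(class_id, label, score)` tuples, sorted by descending score. The function name, the `min_score` threshold, and the sample values are made up for illustration and are not results from the talk.

```python
from collections import Counter


def top_label_counts(decoded, min_score=0.0):
    """Count how often each label is the top prediction with at least `min_score`."""
    counts = Counter()
    for predictions in decoded:
        if not predictions:
            continue
        # The first entry is the highest-scoring class for this image
        _, label, score = predictions[0]
        if score >= min_score:
            counts[label] += 1
    return counts


# Made-up predictions for three images (class ids are placeholders)
decoded = [
    [("n00000001", "castle", 0.81), ("n00000002", "church", 0.10)],
    [("n00000003", "tower", 0.40), ("n00000001", "castle", 0.35)],
    [("n00000001", "castle", 0.55), ("n00000003", "tower", 0.30)],
]
counts = top_label_counts(decoded, min_score=0.5)
```

Raising `min_score` discards uncertain classifications at the cost of covering fewer images.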

# Data Science with OpenStreetMap and Wikidata

### Nikolai Janakiev [@njanakiev](https://twitter.com/njanakiev/)

- Slides @ [https://janakiev.com/slides/data-science-osm-wikidata](https://janakiev.com/slides/data-science-osm-wikidata)

## Resources

- [Wikidata - OpenStreetMap Wiki](https://wiki.openstreetmap.org/wiki/Wikidata)
- FOSSGIS 2016: [OpenStreetMap und Wikidata](https://www.youtube.com/watch?v=Zcv_7t7RcNM) - Michael Maier
- FOSDEM 2019: [Linking OpenStreetMap and Wikidata: A semi-automated, user-assisted editing tool](https://www.youtube.com/watch?v=UWcZ1WKXHNo) - Edward Betts
- [WDTools](https://github.com/njanakiev/wdtools) - Wikidata Utilities and Tools