# OpenStreetMap Data Case Study

## Map Area
I decided to explore and edit the OSM data of [Hamburg, Germany](https://en.wikipedia.org/wiki/Hamburg), 
the city I live and work in and would like to contribute to. The OSM extract was downloaded [via Mapzen](https://mapzen.com/data/metro-extracts/metro/hamburg_germany/).


## Problems Encountered in the Data

The downloaded data was passed through the subset creation code, and only `node`, `way` and `relation` elements were kept.
Running `data.py` on the remaining data surfaced the following problems:

- Some elements lack a `user` name or `uid` attribute
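The subset creation code itself is not shown in this section. For reference, here is a minimal sketch of how such a sample can be produced, assuming the common approach of streaming the file and keeping every k-th top-level `node`, `way` or `relation` element. `get_element`, `write_sample` and the default `k=10` are hypothetical names and values, not necessarily what produced `sample.osm`:

```python
import xml.etree.ElementTree as ET

def get_element(osm_file, tags=("node", "way", "relation")):
    """Yield top-level elements whose tag is in `tags`, one at a time."""
    context = iter(ET.iterparse(osm_file, events=("start", "end")))
    _, root = next(context)  # first event is the start of the <osm> root
    for event, elem in context:
        if event == "end" and elem.tag in tags:
            yield elem
            root.clear()  # detach processed elements so memory stays roughly flat

def write_sample(osm_file, sample_file, k=10):
    """Write every k-th node/way/relation of osm_file into sample_file (hypothetical helper)."""
    with open(sample_file, "wb") as out:
        out.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<osm>\n')
        for i, element in enumerate(get_element(osm_file)):
            if i % k == 0:
                # lowercase "utf-8" means no XML declaration is emitted per element
                out.write(ET.tostring(element, encoding="utf-8"))
        out.write(b"</osm>\n")
```

With the paths defined in the notebook, a call would look like `write_sample(OSMFILE, OSMSAMPLE, k=10)`. Clearing the root after each yielded element is what keeps a city-sized extract from being held in memory all at once.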

```python
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
import pprint
import re
from collections import defaultdict
import csv
import cerberus
import schema
import string
import sqlite3

OSMFILE = "/Users/lt/Git/portfolio-projects/open_streetmap_data_wrangling/hamburg.osm"
OSMSAMPLE = "/Users/lt/Git/portfolio-projects/open_streetmap_data_wrangling/sample.osm"
```

```python
def count_tags(filename):
    """Count how often each element tag occurs in the OSM file."""
    output = defaultdict(int)
    for event, elem in ET.iterparse(filename):  # 'end' events by default
        output[elem.tag] += 1
    return output

tags = count_tags(OSMSAMPLE)
print("Tag counts:")
pprint.pprint(tags)
```

```
Tag counts:
defaultdict(int,
            {'member': 12081,
             'nd': 375075,
             'node': 282964,
             'osm': 1,
             'relation': 899,
             'tag': 227612,
             'way': 51875})
```
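`count_tags` lets `iterparse` build the complete tree, so every element stays in memory until parsing ends. That is fine for the sample, but for the full `hamburg.osm` a streaming variant may be preferable. A minimal sketch, assuming only the counts are needed afterwards; `count_tags_streaming` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_tags_streaming(filename):
    """Count element tags like count_tags(), clearing finished top-level elements."""
    counts = Counter()
    for _, elem in ET.iterparse(filename):  # 'end' events only, by default
        counts[elem.tag] += 1
        if elem.tag in ("node", "way", "relation"):
            # Children (<tag>, <nd>, <member>) were already counted on their own
            # 'end' events, so they can be dropped. The emptied element itself
            # stays attached to the root, so memory still grows, just more slowly.
            elem.clear()
    return counts
```

Because children end before their parent, each `<tag>` or `<nd>` is counted before its enclosing element is cleared, so the totals match those of `count_tags`.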

```python
def process_map_users(filename):
    """Return the set of unique user ids (uid attributes) in the file."""
    users = set()
    for _, element in ET.iterparse(filename):
        uid = element.get("uid")
        if uid:  # elements without a uid are skipped
            users.add(uid)
    return users
```

```python
users = process_map_users(OSMSAMPLE)
print('Number of contributors:', len(users))
```

```
Number of contributors: 2675
```
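`process_map_users` only adds elements that carry a `uid`, so the elements flagged above as lacking user information are silently left out of the 2675 count. A minimal sketch to quantify that gap, assuming the check should cover only `node`, `way` and `relation` elements; `count_missing_user_info` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_missing_user_info(filename, tags=("node", "way", "relation")):
    """Count top-level elements lacking a 'uid' or a 'user' attribute."""
    missing = Counter()
    for _, elem in ET.iterparse(filename):
        if elem.tag in tags:
            if elem.get("uid") is None:
                missing["uid"] += 1
            if elem.get("user") is None:
                missing["user"] += 1
    return missing
```

Calling it on `OSMSAMPLE` returns a `Counter` with separate totals for missing `uid` and missing `user`, which shows whether the two attributes always go missing together.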


```python
# Patterns used to classify tag keys (the "k" attribute of <tag> elements)
lower = re.compile(r'^([a-z]|_)*$')                           # lowercase letters and underscores only
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')          # same, with exactly one colon
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')  # characters treated as problematic

def key_type(element, keys):
    """Increment the bucket in `keys` that matches this element's tag key."""
    if element.tag == "tag":
        k_tag = element.get('k')
        if lower.search(k_tag):
            keys["lower"] += 1
        elif lower_colon.search(k_tag):
            keys["lower_colon"] += 1
        elif problemchars.search(k_tag):
            keys["problemchars"] += 1
        else:
            keys["other"] += 1
    return keys

def process_map(filename):
    keys = {"lower": 0, "lower_colon": 0, "problemchars": 0, "other": 0}
    for _, element in ET.iterparse(filename):
        keys = key_type(element, keys)
    return keys
```

```python
keys = process_map(OSMSAMPLE)
print("Regex check of tags:")
pprint.pprint(keys)
```

```
Regex check of tags:
{'lower': 116420, 'lower_colon': 107144, 'other': 4044, 'problemchars': 4}
```
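The four buckets are easier to interpret with concrete keys. A minimal sketch, reusing the three regexes from `key_type` and wrapping the same if/elif chain in a hypothetical `classify_key()` helper that classifies a single key string:

```python
import re

lower = re.compile(r'^([a-z]|_)*$')
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')

def classify_key(k):
    """Return the bucket key_type() would increment for one tag key (hypothetical helper)."""
    if lower.search(k):
        return "lower"
    if lower_colon.search(k):
        return "lower_colon"
    if problemchars.search(k):
        return "problemchars"
    return "other"

for key in ["highway", "addr:street", "name:etymology:wikidata", "ISO3166-1", "note 2"]:
    print(key, "->", classify_key(key))
```

The order of the checks matters: a key with a second colon, uppercase letters, digits or a hyphen fails both "lower" patterns but contains none of the `problemchars`, so it ends up in "other" rather than "problemchars".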
