# **Data Wrangling with MongoDB**
###### by Ly Vinh Hung (Tommy) in fulfillment of Udacity’s Data Analyst Nanodegree, Project 3

## Project Summary <a name="top"></a>
Name: Ly Vinh Hung

**Map area:**
+ Location: San Jose, California
+ <a href=https://s3.amazonaws.com/metro-extracts.mapzen.com/san-jose_california.osm.bz2> Mapzen URL for San Jose, U.S.A. </a> 

Objective: Audit and clean the OSM dataset, convert it from XML to JSON format, and analyze insights within the data.

**References:**

Udacity "Data Wrangling with MongoDB" - Lesson 6

<a href=http://www.cceo.org/addressing/documents/StreetAbbreviationsGuide.pdf> CCEO Street Abbreviations Guide PDF </a> 

<a href=https://docs.mongodb.org/manual/reference/program/mongoimport/> MongoDB mongoimport Reference </a> 


<hr>

Table of Contents 
----
1. [Data Audit](#audit)
2. [Problems Encountered](#problems)
    - [Street address abbreviations](#street)
    - [Zip codes](#postal)
3. [Data Overview with MongoDB](#data_overview)
4. [Additional Insight using MongoDB](#exploration)
5. [Conclusion](#conclusion)

<hr>

<h2><a name="audit"></a> **1. Data Audit**</h2>

In [13]:
import xml.etree.cElementTree as ET
import pprint
import re
import codecs
import json
import collections
import pymongo

In [8]:
import os
datadir = "data"
datafile = "san-jose_california.osm"
cal_data = os.path.join(datadir, datafile)

> I parse the San Jose dataset with ElementTree and use the `count_tags` function to count each element type, which gives an overall picture of the data's structure.

In [10]:
#Parse through the file with ElementTree and count the number of unique element types to understand overall structure.
def count_tags(filename):
    tags = {}
    for event, elem in ET.iterparse(filename):
        if elem.tag in tags:
            tags[elem.tag] += 1
        else:
            tags[elem.tag] = 1
    return tags
cal_tags = count_tags(cal_data)
pprint.pprint(cal_tags)

{'bounds': 1,
 'member': 7489,
 'nd': 1165081,
 'node': 997402,
 'osm': 1,
 'relation': 848,
 'tag': 542050,
 'way': 124444}
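> `ET.iterparse` keeps every parsed element attached to the tree, so counting a 220 MB file this way leaves the whole parsed tree in memory. The sketch below does the same tally with `collections.Counter` and `elem.clear()`, run on a tiny made-up OSM snippet. `SAMPLE_OSM` and `count_tags_streaming` are illustrative names and are not part of the project. The sketch imports `xml.etree.ElementTree` because `cElementTree` was removed in Python 3.9.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny, made-up OSM snippet used only for illustration
SAMPLE_OSM = b"""<osm>
  <bounds minlat="37.1" minlon="-122.1" maxlat="37.5" maxlon="-121.7"/>
  <node id="1" lat="37.33" lon="-121.89" uid="10" user="alice">
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="37.34" lon="-121.88" uid="11" user="bob"/>
  <way id="3" uid="10" user="alice">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def count_tags_streaming(source):
    """Count element types, clearing each element once it has been counted."""
    counts = Counter()
    for _, elem in ET.iterparse(source):  # default: 'end' events only
        counts[elem.tag] += 1
        elem.clear()  # drop attributes and children we no longer need
    return counts


counts = count_tags_streaming(io.BytesIO(SAMPLE_OSM))
print(dict(counts))
```

> In the notebook, the call would be `count_tags_streaming(cal_data)`. The counts should match `count_tags`, because each element is counted before it is cleared. The cleared elements stay attached to the root as empty placeholders, so this approach reduces memory use but does not eliminate it.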


> For the following functions, `key_type` and `process_map`, we check the "k"
value of each `<tag>` to see whether it can be a valid key in MongoDB, and whether there are any other potential problems.
As in the earlier quiz, we would like to change the data
model and expand the "addr:street" type of keys into a dictionary like this:
`{"address": {"street": "Some value"}}`.
So we have to check whether we have such tags, and whether any tags contain
problematic characters.

> The `key_type` function counts each of four tag categories in a dictionary:
+ "lower", for tags that contain only lowercase letters and underscores and are valid,
+ "lower_colon", for otherwise valid tags with a colon in their names,
+ "problemchars", for tags with problematic characters, and
+ "other", for tags that fall into none of the three categories above.


In [13]:
import re

lower = re.compile(r'^([a-z]|_)*$')
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')


def key_type(element, keys):
    if element.tag == "tag":
        for tag in element.iter('tag'):
            k = tag.get('k')
            if lower.search(k):
                keys['lower'] += 1
            elif lower_colon.search(k):
                keys['lower_colon'] += 1
            elif problemchars.search(k):
                keys['problemchars'] += 1
            else:
                keys['other'] += 1
    return keys


def process_map(filename):
    keys = {"lower": 0, "lower_colon": 0, "problemchars": 0, "other": 0}
    for _, element in ET.iterparse(filename):
        keys = key_type(element, keys)

    return keys

cal_keys = process_map(cal_data)
pprint.pprint(cal_keys)

{'lower': 286485, 'lower_colon': 238128, 'other': 17437, 'problemchars': 0}
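> A key lands in "other" when it matches none of the three patterns. The sketch below applies the same regexes to a few hypothetical keys; the `classify_key` helper is illustrative. It shows typical key shapes that fall through: keys with digits, keys with uppercase letters, and keys with more than one colon, such as the `addr:street:*` keys that `shape_element` handles later.

```python
import re

# Same patterns as in key_type above
lower = re.compile(r'^([a-z]|_)*$')
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')


def classify_key(k):
    """Return the key_type category for a single tag key."""
    if lower.search(k):
        return 'lower'
    elif lower_colon.search(k):
        return 'lower_colon'
    elif problemchars.search(k):
        return 'problemchars'
    return 'other'


# Hypothetical example keys chosen to hit each category
for k in ['name', 'addr:street', 'addr:street:name', 'name_1', 'FIXME', 'fixme?']:
    print(k, '->', classify_key(k))
```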


> The next task is a fun one: find out how many unique users have contributed to the San Jose map. So far, 1,022 unique users have worked on it.



In [15]:
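#People involved in editing the map: collect the uid of every element that has one.
def process_map(filename):
    users = set()
    for _, element in ET.iterparse(filename):
        # node, way and relation elements carry the uid of the user who made that version
        if 'uid' in element.attrib:
            users.add(element.attrib['uid'])
    return users
users = process_map(cal_data)
len(users)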

1022

<hr>

[<div align="center">Back to top</div>](#top)

<h2><a name="problems"></a> **2. Problems Encountered**</h2>

<h3><a name="street"></a> **2.1 Street Address Abbreviations**</h3>

> The main problem we encountered in this dataset comes from inconsistent street name abbreviations. In the following code, we build a regex that matches the last word of the street name, which is usually the street type. We then define an `expected` list of street types that need no cleaning, and a `mapping` dictionary from abbreviations to their full forms.

In [43]:
from collections import defaultdict

street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)

expected = ["Avenue", "Boulevard", "Commons", "Court", "Drive", "Lane", "Parkway", 
                         "Place", "Road", "Square", "Street", "Trail"]

mapping = {'Ave'  : 'Avenue',
           'Blvd' : 'Boulevard',
           'Dr'   : 'Drive',
           'Ln'   : 'Lane',
           'Pkwy' : 'Parkway',
           'Rd'   : 'Road',
           'Rd.'   : 'Road',
           'St'   : 'Street',
           'street' :"Street",
           'Ct'   : "Court",
           'Cir'  : "Circle",
           'Cr'   : "Court",
           'ave'  : 'Avenue',
           'Hwg'  : 'Highway',
           'Hwy'  : 'Highway',
           'Sq'   : "Square"}



> + `audit_street_type` searches the street name for the regex. If there is a match and the matched street type is not in the `expected` list, it adds the street name to the set stored under that street type.
+ `is_street_name` checks whether the tag's `k` attribute is "addr:street".
+ `audit` collects the results of the previous two functions over all node and way elements and returns them. We then pretty-print the audit output. With the list of all abbreviated street types, we can fill in our `mapping` dictionary in preparation for converting these street names into their proper form.


In [15]:
def audit_street_type(street_types, street_name):
    m = street_type_re.search(street_name)
    if m:
        street_type = m.group()
        if street_type not in expected:
            street_types[street_type].add(street_name)

def is_street_name(elem):
    return (elem.attrib['k'] == "addr:street")

def audit(osmfile):
    street_types = defaultdict(set)
    with open(osmfile, "r") as osm_file:
        # Use the default 'end' events: at a 'start' event, ElementTree does not
        # guarantee that the element's <tag> children have been parsed yet
        for event, elem in ET.iterparse(osm_file):
            if elem.tag == "node" or elem.tag == "way":
                for tag in elem.iter("tag"):
                    if is_street_name(tag):
                        audit_street_type(street_types, tag.attrib['v'])
    return street_types
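> This sketch checks what `street_type_re` actually extracts without parsing the whole file. It runs a condensed `audit_street_type` on a handful of street values. Most of the values come from the audit output in this section. `'Zanker Rd.'` and `'Blake Avenue'` are hypothetical, and `expected` is shortened for brevity.

```python
import re
from collections import defaultdict

# Same pattern as above; the expected list is shortened for this sketch
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)
expected = ["Avenue", "Boulevard", "Court", "Drive", "Lane", "Road", "Street"]


def audit_street_type(street_types, street_name):
    """Group street names by their last token when that token is unexpected."""
    m = street_type_re.search(street_name)
    if m and m.group() not in expected:
        street_types[m.group()].add(street_name)


names = ['Homestead Rd', 'Zanker Rd.', 'Hwy 17 PM 7.1',
         'West Evelyn Avenue Suite #114', 'Meridian Ave', 'Blake Avenue']
street_types = defaultdict(set)
for name in names:
    audit_street_type(street_types, name)
print(dict(street_types))
```

> The trailing period is part of the match (`'Rd.'`), which is why `mapping` needs a separate `'Rd.'` key. Trailing numbers such as `'114'` (a suite number) and `'7.1'` are also reported as street types, even though no mapping entry can fix them.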


In [27]:
cal_street_types = audit(cal_data)

> We then take a brief look at the uncleaned street names using pretty print:

In [29]:
pprint.pprint(dict(cal_street_types))

{'0.1': set(['Ala 680 PM 0.1']),
 '1': set(['Stewart Drive Suite #1']),
 '114': set(['West Evelyn Avenue Suite #114']),
 '7.1': set(['Hwy 17 PM 7.1']),
 'Alameda': set(['The Alameda']),
 'Ave': set(['1425 E Dunne Ave',
             'Blake Ave',
             'Cabrillo Ave',
             'Cherry Ave',
             'Foxworthy Ave',
             'Meridian Ave',
             'N Blaney Ave',
             'Saratoga Ave',
             'Seaboard Ave',
             'The Alameda Ave',
             'Walsh Ave']),
 'Barcelona': set(['Calle de Barcelona']),
 'Bascom': set(['S. Bascom']),
 'Bellomy': set(['Bellomy']),
 'Blvd': set(['McCarthy Blvd',
              'Mission College Blvd',
              'N McCarthy Blvd',
              'Palm Valley Blvd',
              'Santa Teresa Blvd',
              'Stevens Creek Blvd']),
 'CA': set(['Zanker Rd., San Jose, CA', 'Zanker Road, San Jose, CA']),
 'Cir': set(['Celadon Cir']),
 'Circle': set(['Bobolink Circle',
                'Calabazas Creek Circle',
  

> The last step of the process is the `update_name` function. It takes an old street name and, if the street type appears in `mapping`, replaces that abbreviation with its full form.

In [44]:
def update_name(name, mapping, regex):
    m = regex.search(name)
    if m:
        street_type = m.group()
        if street_type in mapping:
            name = re.sub(regex, mapping[street_type], name)

    return name

for street_type, ways in cal_street_types.iteritems():
    for name in ways:
        better_name = update_name(name, mapping, street_type_re)
        print name, "=>", better_name

Winchester => Winchester
Gaundabert Ln => Gaundabert Lane
Park Circle West => Park Circle West
Vanderbilt Court West => Vanderbilt Court West
Saratoga Los Gatos Rd => Saratoga Los Gatos Road
Homestead Rd => Homestead Road
West Evelyn Avenue Suite #114 => West Evelyn Avenue Suite #114
Hwy 17 PM 7.1 => Hwy 17 PM 7.1
Blossom Hill => Blossom Hill
Devonshire Way => Devonshire Way
Flicker Way => Flicker Way
Saich Way => Saich Way
Clifden Way => Clifden Way
Arata Way => Arata Way
Shelburne Way => Shelburne Way
Erin Way => Erin Way
Normandy Way => Normandy Way
John Way => John Way
Brahms Way => Brahms Way
Senate Way => Senate Way
Lilac Way => Lilac Way
Primrose Way => Primrose Way
Cisco Way => Cisco Way
Marie P. DeBartolo Way => Marie P. DeBartolo Way
Dunnock Way => Dunnock Way
Squirewood Way => Squirewood Way
Ward Way => Ward Way
Forge Way => Forge Way
Prince Edward Way => Prince Edward Way
Moreland Way => Moreland Way
Almaden Express Way => Almaden Express Way
Allison Way => Allison Way
Big 

<hr>

[<div align="center">Back to top</div>](#top)

<h3><a name="postal"></a> **2.2 Zip Codes**</h3>

> We can reuse part of the street abbreviation code and modify it slightly for zip codes. Although most zip codes are correct, many still deviate from the plain 5-digit format. Examples include a "CA" prefix, a ZIP+4 suffix, and stray invisible characters.

In [65]:
from collections import defaultdict

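def audit_zipcode(zipcodes, zipcode):
    # Group every zip code by its first two characters. Non-numeric prefixes
    # such as "CA" form their own groups, and unusual formats inside the
    # "94" and "95" groups can be spotted when the groups are printed.
    twoDigits = zipcode[0:2]
    zipcodes[twoDigits].add(zipcode)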
        
def is_zipcode(elem):
    return (elem.attrib['k'] == "addr:postcode")

def audit_zip(osmfile):
    zipcodes = defaultdict(set)
    with open(osmfile, "r") as osm_file:
        # Default 'end' events, so each element's <tag> children are complete
        for event, elem in ET.iterparse(osm_file):
            if elem.tag == "node" or elem.tag == "way":
                for tag in elem.iter("tag"):
                    if is_zipcode(tag):
                        audit_zipcode(zipcodes, tag.attrib['v'])

    return zipcodes

cal_zipcode = audit_zip(cal_data)



In [66]:
pprint.pprint(dict(cal_zipcode))

{'94': set(['94084',
            '94085',
            '94086',
            '94086-6406',
            '94087',
            u'94087\u200e',
            '94088-3707',
            '94089',
            '94089-2701',
            '94807']),
 '95': set(['95002',
            '95008',
            '95013',
            '95014',
            '95014-0200',
            '95014-0202',
            '95014-0236',
            '95014-0238',
            '95014-0240',
            '95014-030',
            '95014-0337',
            '95014-0353',
            '95014-0355',
            '95014-0358',
            '95014-0400',
            '95014-0431',
            '95014-0433',
            '95014-0434',
            '95014-0436',
            '95014-0437',
            '95014-0438',
            '95014-0439',
            '95014-0440',
            '95014-0444',
            '95014-0445',
            '95014-0446',
            '95014-0447',
            '95014-0448',
            '95014-0449',
            '95014-0450',
       

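> The output of the zip code cleaning is summarized below. Plain 5-digit codes and ZIP+4 codes (5 digits, a hyphen and 4 digits) are valid and need no reformatting. For that reason, `update_name` only rewrites values with a leading "CA" state prefix. For every other value it returns `None`, which means the value is left as it is.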

In [68]:

def update_name(zipcode):
    # Leading run of letters, e.g. "CA" in "CA 95054" ('' if the value starts with a digit)
    prefix = re.match(r'[a-zA-Z]*', zipcode).group()
    if prefix == "CA":
        digits = re.findall(r'\d+', zipcode)
        if len(digits) == 2:
            # ZIP+4, e.g. "CA 94088-3453" -> "94088-3453"
            return digits[0] + "-" + digits[1]
        elif digits:
            # Plain ZIP, e.g. "CA 95054" -> "95054"
            return digits[0]
    # Anything else is not reformatted and returns None
    return None

for street_type, ways in cal_zipcode.iteritems():
    for name in ways:
        better_name = update_name(name)
        print name, "=>", better_name



CA 95054 => 95054
CA 94088-3453 => 94088-3453
CA 95110 => 95110
CA 95113 => 95113
CA 95116 => 95116
CA 94085 => 94085
95014-1899 => None
95014-5398 => None
95014-3456 => None
95014-3457 => None
95014-1968 => None
95014-1960 => None
95014-1961 => None
95014-1962 => None
95014-1963 => None
95014-1964 => None
95014-1965 => None
95014-1966 => None
95037-4530 => None
95014-4664 => None
95014-4665 => None
95014-4666 => None
95014-4667 => None
95014-0549 => None
95014-0548 => None
95014-4662 => None
95014-4663 => None
95014-0545 => None
95014-0544 => None
95014-0547 => None
95014-0546 => None
95014-0541 => None
95014-0540 => None
95014-0543 => None
95014-0542 => None
95014-3012 => None
95014-3010 => None
95014-3011 => None
95014-3014 => None
95014-3015 => None
95014-3018 => None
95014-2947 => None
95014-4449 => None
95014-2945 => None
95014-2944 => None
95014-2943 => None
95014-2942 => None
95014-2941 => None
95014-2940 => None
95014-4440 => None
95014-4441 => None
95014-4442 => None
95014-44
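> `update_name` only rewrites values with a "CA" prefix, so a format check is a useful complement. The sketch below assumes that only plain 5-digit codes and ZIP+4 codes count as valid. `VALID_ZIP` and `is_valid_zip` are illustrative names. The check flags values from the audit output above such as `'95014-030'`, a truncated ZIP+4, and `'94087\u200e'`, which ends in an invisible left-to-right mark.

```python
import re

# Plain 5-digit ZIP or ZIP+4 ("95014" or "95014-0200"), and nothing else;
# \Z anchors at the very end of the string
VALID_ZIP = re.compile(r'[0-9]{5}(-[0-9]{4})?\Z')


def is_valid_zip(zipcode):
    """Return True if zipcode is a plain 5-digit code or a ZIP+4 code."""
    return VALID_ZIP.match(zipcode) is not None


# Values that appear in the zip code audit and cleaning output above
samples = [u'95014', u'95014-0200', u'95014-030', u'CA 95054', u'94087\u200e']
for z in samples:
    print(repr(z), is_valid_zip(z))
```

> Unlike `$`, the `\Z` anchor does not accept a trailing newline. The `[0-9]` classes keep the check to ASCII digits.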

<hr>

[<div align="center">Back to top</div>](#top)

##### Preparing for MongoDB by converting XML to JSON

> In order to transform the data from XML to JSON, we follow these rules:
+ Process only two types of top-level tags: "node" and "way".
+ All attributes of "node" and "way" are turned into regular key/value pairs, with two exceptions. Attributes in the CREATED array are added under a key "created". Latitude and longitude are added to a "pos" array for use in geospatial indexing, and the values inside "pos" must be floats, not strings.
+ If a second-level tag's "k" value contains problematic characters, it is ignored.
+ If a second-level tag's "k" value starts with "addr:", it is added to a dictionary "address".
+ If a second-level tag's "k" value does not start with "addr:" but contains ":", it is processed
  the same as any other tag.
+ If there is a second ":" that separates the type/direction of a street,
  the tag should be ignored.

> Finally, the `process_map` function applies `shape_element` to every element and writes the results to a JSON file, one document per line.

In [None]:
import re
import codecs
import json

lower = re.compile(r'^([a-z]|_)*$')
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')
address_regex = re.compile(r'^addr\:')
street_regex = re.compile(r'^street')

CREATED = [ "version", "changeset", "timestamp", "user", "uid"]


def shape_element(element):
    node = {}
    if element.tag == "node" or element.tag == "way" :
        node['type'] = element.tag
        # initialize empty address
        address = {}
        # parsing through attributes
        for a in element.attrib:
            if a in CREATED:
                if 'created' not in node:
                    node['created'] = {}
                node['created'][a] = element.get(a)
            elif a in ['lat', 'lon']:
                continue
            else:
                node[a] = element.get(a)
        # populate position
        if 'lat' in element.attrib and 'lon' in element.attrib:
            node['pos'] = [float(element.get('lat')), float(element.get('lon'))]

        # parse second-level tags for nodes
        for e in element:
            # parse second-level tags for ways and populate `node_refs`
            if e.tag == 'nd':
                if 'node_refs' not in node:
                    node['node_refs'] = []
                if 'ref' in e.attrib:
                    node['node_refs'].append(e.get('ref'))

            # throw out not-tag elements and elements without `k` or `v`
            if e.tag != 'tag' or 'k' not in e.attrib or 'v' not in e.attrib:
                continue
            key = e.get('k')
            val = e.get('v')

            # skip problematic characters
            if problemchars.search(key):
                continue

            # parse address k-v pairs
            elif address_regex.search(key):
                key = key.replace('addr:', '')
                address[key] = val

            # catch-all
            else:
                node[key] = val
        # compile address
        if len(address) > 0:
            node['address'] = {}
            street_full = None
            street_dict = {}
            street_format = ['prefix', 'name', 'type']
            # parse through address objects
            for key in address:
                val = address[key]
                if street_regex.search(key):
                    if key == 'street':
                        street_full = val
                    elif 'street:' in key:
                        street_dict[key.replace('street:', '')] = val
                else:
                    node['address'][key] = val
            # assign street_full or fallback to compile street dict
            if street_full:
                node['address']['street'] = street_full
            elif len(street_dict) > 0:
                # join the parts in prefix/name/type order, skipping any that are missing
                node['address']['street'] = ' '.join([street_dict[key] for key in street_format
                                                      if key in street_dict])
        return node
    else:
        return None


def process_map(file_in, pretty = False):
    file_out = "{0}.json".format(file_in)
    data = []
    with codecs.open(file_out, "w") as fo:
        for _, element in ET.iterparse(file_in):
            el = shape_element(element)
            if el:
                data.append(el)
                if pretty:
                    fo.write(json.dumps(el, indent=2)+"\n")
                else:
                    fo.write(json.dumps(el) + "\n")
    return data
process_map(cal_data)
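> `shape_element` copies the `addr:*` values as they are, so the street and zip code fixes from section 2 do not reach the JSON file on their own. One way to fold them in is a small hook such as the hypothetical `clean_address` below, called on `node['address']` just before `shape_element` returns. This is a sketch with a condensed `mapping`. It only strips a "CA " prefix (with a space) from postcodes.

```python
import re

# Condensed copies of the street regex and mapping from section 2.1
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)
mapping = {'Ave': 'Avenue', 'Blvd': 'Boulevard', 'Rd': 'Road', 'Rd.': 'Road', 'St': 'Street'}


def clean_address(address):
    """Hypothetical helper: return a copy of an address dict with the street
    type expanded and a leading "CA " removed from the postcode."""
    cleaned = dict(address)
    street = cleaned.get('street')
    if street:
        m = street_type_re.search(street)
        if m and m.group() in mapping:
            # Replace only the matched last token
            cleaned['street'] = street[:m.start()] + mapping[m.group()]
    postcode = cleaned.get('postcode')
    if postcode and postcode.startswith('CA '):
        cleaned['postcode'] = postcode[3:].strip()
    return cleaned


print(clean_address({'street': 'Zanker Rd.', 'postcode': 'CA 95054'}))
```

> Building the new street string from `m.start()` replaces only the matched last token. Working on a copy leaves the input dictionary untouched.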

<hr>

[<div align="center">Back to top</div>](#top)

<h2><a name="data_overview"></a> **3. Data Overview with MongoDB**</h2>

In [151]:
import signal
import subprocess
pro = subprocess.Popen('mongod', preexec_fn = os.setsid)

In [70]:
from pymongo import MongoClient

db_name = 'openstreetmap'

# Connect to Mongo DB
client = MongoClient('localhost:27017')
db = client[db_name]

In [79]:
# Build mongoimport command instead of using homebrew
collection = cal_data[:cal_data.find('.')]
json_file = cal_data + '.json'

mongoimport_cmd = 'mongoimport -h 127.0.0.1:27017 ' + \
                  '--db ' + db_name + \
                  ' --collection ' + collection + \
                  ' --file ' + json_file

# Before importing, drop the collection if it already exists
if collection in db.collection_names():
    print 'Dropping collection: ' + collection
    db[collection].drop()
    
# Execute the command
print 'Executing: ' + mongoimport_cmd
subprocess.call(mongoimport_cmd.split())

Executing: mongoimport -h 127.0.0.1:27017 --db openstreetmap --collection data/san-jose_california --file data/san-jose_california.osm.json


0
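> `mongoimport_cmd.split()` works here only because none of the paths contain spaces. The sketch below passes the arguments as a list instead and uses mongoimport's `--drop` option in place of the manual `drop()` call. The helper name `build_mongoimport_args` is an assumption. So is the choice to strip the `data/` directory from the collection name; the notebook itself imported into `data/san-jose_california`.

```python
import os


def build_mongoimport_args(json_file, db_name, collection, host='127.0.0.1:27017'):
    """Build the mongoimport argument list; --drop asks mongoimport to drop
    the target collection before importing."""
    return ['mongoimport', '--host', host,
            '--db', db_name,
            '--collection', collection,
            '--file', json_file,
            '--drop']


json_file = os.path.join('data', 'san-jose_california.osm.json')
# Hypothetical collection name without the "data/" directory prefix
collection = os.path.basename(json_file).split('.')[0]
args = build_mongoimport_args(json_file, 'openstreetmap', collection)
print(' '.join(args))
```

> To run the import, call `subprocess.call(args)`. No shell parsing is involved, so paths with spaces are passed through intact.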

In [80]:
sanjose_california = db[collection]


#### File sizes

In [85]:
import os
print 'The original OSM file is {} MB'.format(os.path.getsize(cal_data)/1.0e6) # convert from bytes to megabytes
print 'The JSON file is {} MB'.format(os.path.getsize(cal_data + ".json")/1.0e6) # convert from bytes to megabytes

The original OSM file is 220.389582 MB
The JSON file is 251.264751 MB


#### Number of documents

In [81]:
sanjose_california.find().count()


1121846

#### Number of unique users

In [83]:
len(sanjose_california.distinct('created.user'))


1017

#### Number of Nodes and Ways

In [106]:
print "Number of nodes:",sanjose_california.find({'type':'node'}).count()
print "Number of ways:",sanjose_california.find({'type':'way'}).count()

Number of nodes: 997394
Number of ways: 124414
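> The two counts above add up to 1,121,808. That is 38 fewer than the 1,121,846 documents in the collection, and 38 fewer than the 997,402 nodes plus 124,444 ways counted in section 1 (8 nodes and 30 ways are missing). One plausible explanation lies in `shape_element`: its catch-all branch `node[key] = val` lets an OSM tag with `k="type"` overwrite the `type` field set from `element.tag`. The sketch below reproduces that collision with a condensed, hypothetical `shape_minimal` and a hypothetical `type` tag. It also shows one possible guard, which renames any tag key that clashes with a field name `shape_element` sets itself. The `RESERVED` set is an assumption.

```python
import xml.etree.ElementTree as ET

# Field names that shape_element sets itself (assumed list for this sketch)
RESERVED = {'type', 'id', 'created', 'pos', 'node_refs', 'address'}


def shape_minimal(element, guard=False):
    """Condensed version of shape_element's type and tag handling."""
    node = {'type': element.tag}
    for tag in element.iter('tag'):
        key = tag.get('k')
        if guard and key in RESERVED:
            key = 'tag_' + key  # hypothetical renaming that keeps the tag value
        node[key] = tag.get('v')
    return node


way = ET.fromstring('<way id="1"><tag k="type" v="multipolygon"/></way>')
print(shape_minimal(way))
print(shape_minimal(way, guard=True))
```

> In the notebook, `sanjose_california.find({'type': {'$nin': ['node', 'way']}}).count()` should show whether such documents exist.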


#### Top 5 contributors

In [122]:
result = sanjose_california.aggregate( [
                                        { "$group" : {"_id" : "$created.user", 
                                        "count" : { "$sum" : 1} } },
                                        { "$sort" : {"count" : -1} }, 
                                        { "$limit" : 5 } ] )

print(list(result))


[{u'count': 270237, u'_id': u'nmixter'}, {u'count': 167163, u'_id': u'mk408'}, {u'count': 81442, u'_id': u'Bike Mapper'}, {u'count': 73982, u'_id': u'n76_cupertino_import'}, {u'count': 60124, u'_id': u'n76'}]
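> These counts show how concentrated the editing is. The sketch below reuses the printed numbers and the 1,121,846 documents counted above. It shows that the top five accounts created roughly 58% of all documents, and `nmixter` alone roughly 24%.

```python
# Document counts copied from the aggregation output above
top5 = [('nmixter', 270237), ('mk408', 167163), ('Bike Mapper', 81442),
        ('n76_cupertino_import', 73982), ('n76', 60124)]
total_docs = 1121846  # result of sanjose_california.find().count()

for user, count in top5:
    print('{:<22} {:6.1%}'.format(user, count / float(total_docs)))
top5_share = sum(count for _, count in top5) / float(total_docs)
print('Top 5 combined: {:.1%}'.format(top5_share))
```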


<h2><a name="exploration"></a> **4. Further Data Exploration with MongoDB**</h2>

#### Top 10 amenities in San Jose

In [123]:
amenity = sanjose_california.aggregate([{'$match': {'amenity': {'$exists': 1}}}, \
                                {'$group': {'_id': '$amenity', \
                                            'count': {'$sum': 1}}}, \
                                {'$sort': {'count': -1}}, \
                                {'$limit': 10}])
print(list(amenity))


[{u'count': 1593, u'_id': u'parking'}, {u'count': 832, u'_id': u'restaurant'}, {u'count': 542, u'_id': u'school'}, {u'count': 415, u'_id': u'fast_food'}, {u'count': 341, u'_id': u'place_of_worship'}, {u'count': 214, u'_id': u'fuel'}, {u'count': 213, u'_id': u'cafe'}, {u'count': 170, u'_id': u'bank'}, {u'count': 166, u'_id': u'bicycle_parking'}, {u'count': 163, u'_id': u'bench'}]


#### Top 5 cuisines in San Jose

In [125]:
cuisine = sanjose_california.aggregate([{"$match":{"amenity":"restaurant"}},
                      {"$group":{"_id":{"Food":"$cuisine"},
                                 "count":{"$sum":1}}},
                      {"$project":{"_id":0,
                                  "Food":"$_id.Food",
                                  "Count":"$count"}},
                      {"$sort":{"Count":-1}},
                      # 6 rows: the None group collects restaurants without a cuisine tag
                      {"$limit":6}])
print(list(cuisine))

[{u'Food': None, u'Count': 242}, {u'Food': u'mexican', u'Count': 75}, {u'Food': u'chinese', u'Count': 61}, {u'Food': u'vietnamese', u'Count': 51}, {u'Food': u'pizza', u'Count': 50}, {u'Food': u'japanese', u'Count': 38}]


#### Top 10 postcodes in San Jose

In [133]:
postcode = sanjose_california.aggregate( [ 
    { "$match" : { "address.postcode" : { "$exists" : 1} } }, 
    { "$group" : { "_id" : "$address.postcode", "count" : { "$sum" : 1} } },  
    { "$sort" : { "count" : -1}},
      {"$limit":10}] )
print(list(postcode))

[{u'count': 325, u'_id': u'95014'}, {u'count': 231, u'_id': u'95070'}, {u'count': 208, u'_id': u'94087'}, {u'count': 181, u'_id': u'94086'}, {u'count': 140, u'_id': u'95051'}, {u'count': 88, u'_id': u'95127'}, {u'count': 83, u'_id': u'95129'}, {u'count': 76, u'_id': u'95035'}, {u'count': 74, u'_id': u'95135'}, {u'count': 73, u'_id': u'95125'}]


#### Number of users who contributed only one document

In [141]:
users = sanjose_california.aggregate( [
    { "$group" : {"_id" : "$created.user", 
                "count" : { "$sum" : 1} } },
    { "$group" : {"_id" : "$count",
                "num_users": { "$sum" : 1} } },
    { "$sort" : {"_id" : 1} },
    { "$limit" : 1} ] )
print(list(users))

[{u'num_users': 207, u'_id': 1}]
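> The pipeline groups twice. It first groups by `created.user` to count each user's documents, then groups by that count to count the users. Sorting on `_id` ascending and keeping one row returns the smallest document count (1 here) and how many users have it (207). The sketch below performs the same two steps in plain Python on a hypothetical list of user names.

```python
from collections import Counter

# Hypothetical created.user values, one per document
doc_users = ['alice', 'alice', 'bob', 'carol', 'carol', 'carol', 'dave']

# Stage 1 ($group by user): number of documents per user
docs_per_user = Counter(doc_users)
# Stage 2 ($group by that count): number of users with each document count
users_per_count = Counter(docs_per_user.values())

# $sort by _id ascending, then $limit 1: the smallest count and its number of users
smallest = min(users_per_count)
print(smallest, users_per_count[smallest])
```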


#### Top 5 building types

In [142]:
building = sanjose_california.aggregate([
       {'$match': {'building': { '$exists': 1}}}, 
        {'$group': {'_id': '$building',
                    'count': {'$sum': 1}}}, 
        {'$sort': {'count': -1}},
        {'$limit': 5}])
print(list(building))

[{u'count': 53176, u'_id': u'yes'}, {u'count': 4857, u'_id': u'house'}, {u'count': 3924, u'_id': u'residential'}, {u'count': 413, u'_id': u'apartments'}, {u'count': 325, u'_id': u'roof'}]


<hr>

[<div align="center">Back to top</div>](#top)

<h2><a name="conclusion"></a> **5. Conclusion**</h2>

> **_Ideas to improve data quality of OSM:_**

> When we audited the data, it was clear that the dataset is fairly clean, although there are minor errors caused by human input. Still, with more than a thousand contributors (1,022 unique user IDs in the raw OSM file), some human input errors are unavoidable, such as the inconsistent street abbreviations and zip code formats found in section 2. I'd recommend a structured input form so that everyone enters data in the same format. Alternatively, a more robust script could clean the data regularly, for example on a bi-weekly basis. Moreover, we can incentivize users by gamifying the contribution process (e.g. recognizing users with the most contributions and the fewest errors). We can then build a recommendation engine on top of this data (restaurant recommendations, building ...).
>
> Last, since OpenStreetMap is an open-source project, many areas are still unexplored: people tend to focus on certain key areas and leave other parts outdated. We can address this by cross-referencing and cross-validating missing data against other databases such as the Google API. Since each node has coordinates (latitude and longitude), this process should be feasible.

> **_Potential cost of the implementation:_**

> A few potential issues may arise from implementing these solutions. The main one is the effort required to engineer all of these processes. The cost of creating, auditing and maintaining these initiatives could be overwhelming and may require a dedicated team responsible for them.