## 1. Area of investigation 
Map Area: New York, Manhattan area (USA)

<img src="pictures/New York.png" height="500" width="800">

Because my computer has limited computing power, I wrote a custom query in <a href="http://overpass-turbo.eu/">Overpass turbo</a> so that the resulting OSM file has a manageable size.
I limited the extraction to elements tagged as restaurants.

Here is the query:

````javascript
[out:xml][timeout:25];
(
  node["amenity"="restaurant"]({{bbox}});
  way["amenity"="restaurant"]({{bbox}});
  relation["amenity"="restaurant"]({{bbox}});
);
out body;
>;
out skel qt;
````
The resulting extract is very small (1.12 MB):
````shell
ls -l new-york_new-york.osm
-rw-r-----@ 1 mic0331  staff  1121968 Mar 13 16:40 new-york_new-york.osm
````
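
The `{{bbox}}` placeholder in the query is an Overpass turbo shortcut for the map area currently shown in the browser, so the query cannot be sent as-is outside that tool. As a minimal sketch, assuming the same query should be built from a script, the placeholder can be filled in with explicit coordinates in Overpass order (south, west, north, east). The helper name `build_query` and the coordinates in the usage line are hypothetical and are not the bounding box used for this extract.

```python
# Same query as above, with the Overpass turbo {{bbox}} shortcut
# replaced by a str.format placeholder.
OVERPASS_TEMPLATE = """[out:xml][timeout:25];
(
  node["amenity"="restaurant"]({bbox});
  way["amenity"="restaurant"]({bbox});
  relation["amenity"="restaurant"]({bbox});
);
out body;
>;
out skel qt;
"""


def build_query(south, west, north, east):
    """Return the restaurant query for an explicit bounding box.

    Overpass expects the order (south, west, north, east).
    """
    bbox = "{},{},{},{}".format(south, west, north, east)
    return OVERPASS_TEMPLATE.format(bbox=bbox)


# Hypothetical coordinates, for illustration only.
query = build_query(40.70, -74.02, 40.80, -73.93)
print(query)
```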

In [110]:
from __future__ import absolute_import

osm_file = "./data/new-york_new-york.osm"

<hr>

## 2. `mapparser.py` results

This script parses the OSM file iteratively to find out which element types are present and how many of each there are.

In [111]:
from scripts import mapparser

mapparser.run(osm_file)

{'member': 2,
 'meta': 1,
 'nd': 5175,
 'node': 6386,
 'note': 1,
 'osm': 1,
 'relation': 1,
 'tag': 14000,
 'way': 576}
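
The source of `mapparser.py` is not shown here. As a minimal sketch of how such a count might be produced, assuming an approach based on `xml.etree.ElementTree.iterparse` (the function name `count_tags` is hypothetical):

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Count how often each element name occurs in an XML file.

    `source` may be a filename or a binary file object. iterparse
    yields each element once it is complete, so the whole file never
    has to be held as one fully populated tree.
    """
    counts = Counter()
    for _, elem in ET.iterparse(source):  # default: "end" events only
        counts[elem.tag] += 1
        elem.clear()  # drop children and attributes already counted
    return dict(counts)


# Tiny in-memory document standing in for a real .osm file.
sample = (b'<osm><node id="1"><tag k="amenity" v="restaurant"/></node>'
          b'<way id="2"><nd ref="1"/></way></osm>')
print(count_tags(io.BytesIO(sample)))
```

Calling `count_tags(osm_file)` on the extract should give a dictionary of the same shape as the one above.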


<hr>

## 3. `tags.py` results

Check the `k` attribute of each `<tag>` element to see whether it can be used as a valid key in MongoDB, and whether it shows any other potential problems.

In [112]:
from scripts import tags

tags.run(osm_file)

{'lower': 8636, 'lower_colon': 5363, 'other': 1, 'problemchars': 0}
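
The four categories can be illustrated with a short sketch. The regular expressions below are assumptions, since the source of `tags.py` is not shown here: `lower` is taken to mean lowercase letters and underscores only, `lower_colon` the same with exactly one colon, and `problemchars` any key containing a character that is awkward in a key name (MongoDB field names, for instance, have restrictions on `.` and a leading `$`). The function name `key_type` is hypothetical.

```python
import re

# Assumed patterns; the ones in tags.py may differ.
LOWER = re.compile(r"^([a-z]|_)*$")
LOWER_COLON = re.compile(r"^([a-z]|_)*:([a-z]|_)*$")
PROBLEMCHARS = re.compile(r"""[=+/&<>;'"?%#$@,. \t\r\n]""")


def key_type(key):
    """Classify a tag key into one of the four categories."""
    if LOWER.search(key):
        return "lower"
    if LOWER_COLON.search(key):
        return "lower_colon"
    if PROBLEMCHARS.search(key):
        return "problemchars"
    return "other"


for key in ["amenity", "addr:street", "bad key", "FIXME"]:
    print(key, "->", key_type(key))
```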


<hr>

## 4. `audit.py` results

Audit the OSM file for unexpected street types, then update the `mapping` variable so that each unexpected street type is rewritten to the appropriate one from the expected list.
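
As a minimal sketch of what such an audit might do, assuming the street type is simply the last whitespace-separated token of the street name: names are grouped by that token, and tokens found in an expected list are skipped. The names `STREET_TYPE_RE`, `EXPECTED` and `audit_street_names`, as well as the contents of `EXPECTED`, are hypothetical; the real `audit.py` reads the names from the OSM file rather than from a list.

```python
import re
from collections import defaultdict

# Last token of the name, including any trailing punctuation ("Ave.", "Ave,").
STREET_TYPE_RE = re.compile(r"\b\S+\.?$")

# Hypothetical list of street types considered correct.
EXPECTED = {"Street", "Avenue", "Road", "Boulevard", "Place", "Drive"}


def audit_street_names(names, expected=EXPECTED):
    """Group street names by their last token, skipping expected types."""
    street_types = defaultdict(set)
    for name in names:
        match = STREET_TYPE_RE.search(name)
        if match and match.group() not in expected:
            street_types[match.group()].add(name)
    return street_types


names = ["5th Ave", "6th Ave", "Franklin Ave,", "Fire Island Ave.",
         "Avenue B", "Main Street"]
for street_type, members in sorted(audit_street_names(names).items()):
    print(street_type, sorted(members))
```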

In [113]:
from scripts import audit
import pprint

st_types = audit.audit(osm_file)
pprint.pprint(dict(st_types))      

{'1': {'US 1'},
 '10': {'Nwe Jersey 10'},
 '15': {'Rt 15'},
 '17': {'New Jersey 17'},
 '206': {'RT 206'},
 '27': {'Route 27'},
 '35': {'Route 35'},
 '36': {'NJ Route 36'},
 '46': {'Route 46', 'US 46'},
 '683': {'County Road 683'},
 'Alley': {'Freeman Alley'},
 'Americas': {'Avenue Of The Americas'},
 'Ave': {'5th Ave',
         '6th Ave',
         'Centennial Ave',
         'East Rock Ave',
         'Mount Hope Ave',
         'Plainfield Ave',
         'W Crescent Ave',
         'W Mt Pleasant Ave'},
 'Ave,': {'Franklin Ave,'},
 'Ave.': {'Fire Island Ave.',
          'Franklin Ave.',
          'Springfield Ave.',
          'Washington Ave.'},
 'Avene': {'Madison Avene'},
 'B': {'Avenue B'},
 'Blvd': {'Bell Blvd',
          'College Point Blvd',
          'Manorhaven Blvd',
          'Orchard Beach Blvd',
          'Queens Blvd',
          'Woodhaven Blvd'},
 'Broadway': {'West Broadway', 'East Broadway', 'Broadway'},
 'Center': {'Theatre Center'},
 'Chestnut': {'Chestnut'},
 'East': {'

As we can see, this dataset has the same problem we faced in Lesson 6: some street names are abbreviated or spelled inconsistently.
Let's implement the update process on this dataset.
We use an improved version of the string replacement, based on a more complex regex pattern than the implementation in `audit.py`.

In [114]:
import re

# Each entry maps a canonical street word to the variants it should replace.
mapping = [('Street', ['St']),
           ('Road', ['Rd']),
           ('Avenue', ['Avene', 'Ave', 'avenue']),
           ('Boulevard', ['Blvd']),
           ('West', ['W']),
           ('Mount', ['Mt'])]

# Build one whole-word regex per canonical word, e.g. r'\b(?:Avene|Ave|avenue)\b'.
d = {k: r"\b(?:" + "|".join(v) + r")\b" for k, v in mapping}
pprint.pprint(d)

for st_type, ways in st_types.items():
    for name in ways:
        # Each pattern is tried on the original name independently.
        for k, r in d.items():
            better_name = re.sub(r, k, name)

            if name != better_name:
                # Drop a trailing comma or period left over from e.g. 'Ave,' or 'St.'.
                if better_name.endswith(","):
                    better_name = better_name[:-1]
                if better_name.endswith("."):
                    better_name = better_name[:-1]
                print(name, "=>", better_name)

{'Avenue': '\\b(?:Avene|Ave|avenue)\\b',
 'Boulevard': '\\b(?:Blvd)\\b',
 'Mount': '\\b(?:Mt)\\b',
 'Road': '\\b(?:Rd)\\b',
 'Street': '\\b(?:St)\\b',
 'West': '\\b(?:W)\\b'}
11th St. => 11th Street
East 86th St. => East 86th Street
Broad St. => Broad Street
Washington St. => Washington Street
Warren St. => Warren Street
Franklin Ave, => Franklin Avenue
West 32nd St => West 32nd Street
9th St => 9th Street
6th St => 6th Street
Fire Island Ave. => Fire Island Avenue
Franklin Ave. => Franklin Avenue
Washington Ave. => Washington Avenue
Springfield Ave. => Springfield Avenue
Madison Avene => Madison Avenue
W Main => West Main
Schalks Crossing Rd => Schalks Crossing Road
Valley Rd => Valley Road
Hampton House Rd => Hampton House Road
Iroquois Rd => Iroquois Road
Macopin Rd => Macopin Road
Tunxis Hill Rd => Tunxis Hill Road
Bell Blvd => Bell Boulevard
College Point Blvd => College Point Boulevard
Queens Blvd => Queens Boulevard
Manorhaven Blvd => Manorhaven Boulevard
Woodhaven Blvd => Woodhaven 
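
The loop above tries each pattern on the original name separately, so a name that matches several patterns (for example `W Mt Pleasant Ave`) is reported once per pattern, each time with only one of the fixes applied. As a minimal sketch of an alternative, assuming all fixes should be applied in a single pass before the data is stored, the variants can be combined into one regex with a lookup table. The names `REPLACEMENTS`, `VARIANT_RE` and `update_name` are hypothetical and not part of `audit.py`.

```python
import re

# Variant -> canonical word (assumed table, mirroring the mapping idea above).
REPLACEMENTS = {
    "St": "Street",
    "Rd": "Road",
    "Avene": "Avenue",
    "Ave": "Avenue",
    "avenue": "Avenue",
    "Blvd": "Boulevard",
    "W": "West",
    "Mt": "Mount",
}

# One whole-word pattern matching any of the variants.
VARIANT_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(v) for v in REPLACEMENTS) + r")\b"
)


def update_name(name):
    """Replace every known variant in one pass, then strip trailing ',' or '.'."""
    fixed = VARIANT_RE.sub(lambda m: REPLACEMENTS[m.group()], name)
    return fixed.rstrip(",.")


for name in ["W Mt Pleasant Ave", "Franklin Ave,", "11th St.", "Avenue B"]:
    print(name, "=>", update_name(name))
```

Whole-word matching is still a blunt tool: a name such as `St Marks Place`, where `St` stands for "Saint", would be rewritten incorrectly, so the result is worth reviewing before it is written to the database.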

<hr>