# Applied Data Science Capstone - Neighborhoods geospatial data in Toronto

This notebook is used mainly for the capstone project: exploring geospatial data for the neighborhoods of Toronto.

In [89]:
#import numpy and pandas
import numpy as np
import pandas as pd

# install geocoder first if it is missing (uncomment one of the two lines)
#!conda install -c conda-forge geocoder
#!pip install geocoder
import geocoder
import os

print("libs imported!")

libs imported!


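I'll use pandas to read the Wikipedia table with the read_html method, keeping only tables that have the CSS class 'wikitable' (e.g. pandas.read_html(url, attrs={'class': 'wikitable'})). read_html returns a list of every matching table, so the list is concatenated into a single dataframe.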

In [90]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data = pd.read_html(url, attrs={'class': 'wikitable'})   
df = pd.concat(data)

#3.1 The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
df.columns = ['PostalCode', 'Borough', 'Neighborhood']

#3.2 Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
#df = df[df['Borough'] != 'Not assigned']
df.drop(df[df['Borough']=="Not assigned"].index, axis=0, inplace=True)

#3.3  More than one neighborhood can exist in one postal code area. 
#     For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: 
#     Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
df = pd.DataFrame(df.groupby('PostalCode').agg(lambda x:', '.join(x.unique())))
df.reset_index(inplace=True)

#3.4 If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. 
#    So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
# copy the borough into the neighborhood wherever the neighborhood is 'Not assigned'
mask = df['Neighborhood'] == 'Not assigned'
df.loc[mask, 'Neighborhood'] = df.loc[mask, 'Borough']

df.head()

,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [91]:
df.shape

(103, 3)
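
The three cleaning steps above (3.2 to 3.4) are easier to check on a tiny table than on the full Wikipedia page. The following is a minimal sketch, not the notebook's exact code: the rows are toy data (the M5A rows follow the example in the comments, the other two rows and their postal codes are assumed for illustration), and the names `raw` and `clean` are hypothetical.

```python
import pandas as pd

# Toy rows in the same three-column layout; values are illustrative only.
raw = pd.DataFrame(
    [
        ("M1A", "Not assigned", "Not assigned"),
        ("M5A", "Downtown Toronto", "Harbourfront"),
        ("M5A", "Downtown Toronto", "Regent Park"),
        ("M7A", "Queen's Park", "Not assigned"),
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# 3.2: keep only rows whose borough is assigned
clean = raw[raw["Borough"] != "Not assigned"]

# 3.3: one row per postal code, neighborhoods joined with ", "
clean = clean.groupby("PostalCode", as_index=False).agg(
    {"Borough": "first", "Neighborhood": ", ".join}
)

# 3.4: where the neighborhood is still 'Not assigned', reuse the borough name
mask = clean["Neighborhood"] == "Not assigned"
clean.loc[mask, "Neighborhood"] = clean.loc[mask, "Borough"]

print(clean)
```

Aggregating with an explicit dictionary makes it visible that the borough is taken once per postal code while only the neighborhoods are joined; the notebook's lambda-based version joins the unique values of both columns instead.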

Now that we have built a dataframe of the postal code of each neighborhood, along with the borough name and neighborhood name, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

An older version of this course used the Google Maps Geocoding API to get the latitude and longitude coordinates of each neighborhood. However, Google has since started charging for that API (http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/), so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

In [92]:
# The code was removed by Watson Studio for sharing.

In [None]:
def geocoder_google(postal_code, city='Toronto', province='Ontario'):
    """Return [latitude, longitude] for a postal code, or None if every attempt fails."""

    # the address does not change between attempts, so build it once
    address = '{}, {}, {}'.format(postal_code, city, province)
    lat_lng_coords = None
    attempts = 0

    # the service sometimes returns no result, so retry up to 3 times
    while lat_lng_coords is None and attempts < 3:
        g = geocoder.google(address, key=os.environ["GOOGLE_API_KEY"])
        lat_lng_coords = g.latlng
        attempts += 1
    return lat_lng_coords
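
The retry loop above can be tried out without any network access or API key. The sketch below is an illustration under stated assumptions, not the geocoder package's API: `lookup_with_retries` and `flaky_lookup` are hypothetical names, and `flaky_lookup` is a stand-in that fails once before returning coordinates.

```python
from typing import Callable, List, Optional

def lookup_with_retries(
    lookup: Callable[[str], Optional[List[float]]],
    address: str,
    max_attempts: int = 3,
) -> Optional[List[float]]:
    """Call lookup(address) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(address)
        if coords is not None:
            return coords
    return None  # every attempt failed

# Hypothetical stand-in for a geocoding call: no result on the first call.
calls = {"count": 0}

def flaky_lookup(address: str) -> Optional[List[float]]:
    calls["count"] += 1
    return [43.806686, -79.194353] if calls["count"] >= 2 else None

result = lookup_with_retries(flaky_lookup, "M1B, Toronto, Ontario")
print(result, "after", calls["count"], "attempts")
```

Passing the lookup as a parameter keeps the retry logic separate from the provider, so the same helper could wrap a different geocoding backend.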

In [93]:
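# geocode each postal code once, then split the [lat, lng] pair into two columns;
# a postal code that could not be geocoded gets NaN instead of raising a TypeError
df['Coords'] = df['PostalCode'].apply(geocoder_google)
df['Latitude'] = df['Coords'].apply(lambda c: c[0] if c else np.nan)
df['Longitude'] = df['Coords'].apply(lambda c: c[1] if c else np.nan)
df.drop('Coords', axis=1, inplace=True)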

df.head()

,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [94]:
df.shape

(103, 5)
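
Because the geocoding function can return None after its retries, it may be worth checking which postal codes ended up without coordinates before moving on. A minimal sketch on toy data follows; the frame `toy` is hypothetical, and the missing result for M1C is invented purely to show the check.

```python
import pandas as pd

# Toy frame: two coordinate pairs from the output above, one simulated failure.
toy = pd.DataFrame(
    {
        "PostalCode": ["M1B", "M1C", "M1E"],
        "Coords": [[43.806686, -79.194353], None, [43.763573, -79.188711]],
    }
)

# Split the pair into two columns, writing NaN where no result came back.
toy["Latitude"] = toy["Coords"].apply(lambda c: c[0] if c else float("nan"))
toy["Longitude"] = toy["Coords"].apply(lambda c: c[1] if c else float("nan"))
toy = toy.drop(columns="Coords")

# List the postal codes that still need coordinates.
missing = toy.loc[toy["Latitude"].isna(), "PostalCode"].tolist()
print("postal codes without coordinates:", missing)
```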