# Segmenting and Clustering Neighborhoods in Toronto. PART 2

### Now that we have built a DataFrame of postal codes with their borough and neighborhood names, we need the latitude and longitude of each neighborhood in order to use the Foursquare location data.

First, we reuse the code from PART 1 to rebuild the DataFrame.

Let's use the wikipedia library, since it significantly reduces the amount of scraping code.

In [1]:
import pandas as pd 
import wikipedia as wp

# Load the data from a Wikipedia page using wikipedia library
Toronto_wiki = wp.page("List of postal codes of Canada: M").html().encode("UTF-8")

In [2]:
# Create a DataFrame
Toronto_df = pd.read_html(Toronto_wiki, header = 0)[0]

# Cleanup the DataFrame
# Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
Toronto_df = Toronto_df[Toronto_df.Borough != 'Not assigned']

# More than one neighborhood can exist in one postal code area.
# These rows will be combined into one row with the neighborhoods separated with a comma
Toronto_df = Toronto_df.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(lambda x: ', '.join(x)).reset_index()

# If a row has a borough but a 'Not assigned' neighborhood, use the borough name as the neighborhood
not_assigned = Toronto_df['Neighbourhood'] == 'Not assigned'
Toronto_df.loc[not_assigned, 'Neighbourhood'] = Toronto_df.loc[not_assigned, 'Borough']
Toronto_df.head()

|   | Postcode | Borough | Neighbourhood |
|---|----------|---------|---------------|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
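Because the live Wikipedia page can change, it may help to check the three cleanup rules of the cell above on a small offline table. The sketch below wraps them in a hypothetical helper, `clean_postcodes`, and runs it on made-up toy rows shaped like the Wikipedia table (the rows are illustrative, not real data).

```python
import pandas as pd

def clean_postcodes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the three cleanup rules to a raw postcode table."""
    # Rule 1: drop rows whose borough is 'Not assigned'
    df = df[df['Borough'] != 'Not assigned']
    # Rule 2: one row per (Postcode, Borough), neighbourhoods joined with ', '
    df = (df.groupby(['Postcode', 'Borough'])['Neighbourhood']
            .apply(', '.join)
            .reset_index())
    # Rule 3: a 'Not assigned' neighbourhood takes the borough's name
    missing = df['Neighbourhood'] == 'Not assigned'
    df.loc[missing, 'Neighbourhood'] = df.loc[missing, 'Borough']
    return df

# Hypothetical toy rows in the same shape as the Wikipedia table (not real data)
raw = pd.DataFrame({
    'Postcode':      ['M1A', 'M3A', 'M5A', 'M5A', 'M7A'],
    'Borough':       ['Not assigned', 'North York', 'Downtown Toronto',
                      'Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Harbourfront',
                      'Regent Park', 'Not assigned'],
})

cleaned = clean_postcodes(raw)
print(cleaned)
```

The M1A row disappears (rule 1), the two M5A rows collapse into one (rule 2), and M7A inherits its borough name (rule 3).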


Use the geocoder Python package (via its ArcGIS provider) to get the latitude and longitude coordinates of each postal code.

In [3]:
import geocoder

# Geocoding requests occasionally come back empty, so keep querying each
# postal code until the service returns coordinates
def get_geoinfo(postal_code):
    lat_lng_coords = None
    # Caution: this loops forever if a postal code can never be resolved
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng  # [latitude, longitude], or None on failure
    return lat_lng_coords
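The `while` loop above never gives up, so a postal code the service cannot resolve would hang the notebook. A bounded variant with exponential backoff is sketched below. The function name `get_geoinfo_bounded`, its parameters, and the injected `lookup` callable are hypothetical; passing the lookup in lets the retry logic be exercised offline with a fake service instead of the real ArcGIS endpoint.

```python
import time
from typing import Callable, List, Optional

LatLng = Optional[List[float]]

def get_geoinfo_bounded(postal_code: str,
                        lookup: Callable[[str], LatLng],
                        max_attempts: int = 5,
                        base_delay: float = 0.5) -> LatLng:
    """Hypothetical helper: retry `lookup` up to `max_attempts` times with backoff.

    Returns [lat, lng] on success, or None if every attempt failed.
    """
    query = '{}, Toronto, Ontario'.format(postal_code)
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return None

# With the real service, the lookup would wrap geocoder.arcgis, e.g.
#   lookup = lambda q: geocoder.arcgis(q).latlng
# Offline demo: a fake lookup that fails twice before succeeding
calls = []
def flaky_lookup(query):
    calls.append(query)
    return None if len(calls) < 3 else [43.8, -79.2]

print(get_geoinfo_bounded('M1B', flaky_lookup, base_delay=0.0))
```

Returning `None` instead of raising keeps the list comprehension over all postal codes running; the failed codes then show up as missing values in the coordinate DataFrame.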

Retrieve the coordinates for all of our postal codes.

In [4]:
postal_codes = Toronto_df['Postcode']
coords = [get_geoinfo(postal_code) for postal_code in postal_codes.tolist()]
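Since each lookup is a free-text query, a quick sanity check can catch results the service placed outside the city. The sketch below flags postal codes whose coordinates are missing or fall outside a bounding box; the box limits and the helper name `suspicious_coords` are assumptions chosen for illustration, deliberately generous around the City of Toronto.

```python
from typing import List, Optional, Sequence

# Assumed, deliberately generous bounding box around the City of Toronto
LAT_MIN, LAT_MAX = 43.55, 43.90
LNG_MIN, LNG_MAX = -79.70, -79.10

def suspicious_coords(postal_codes: Sequence[str],
                      coords: Sequence[Optional[List[float]]]) -> List[str]:
    """Return the postal codes whose coordinates are missing or fall outside the box."""
    flagged = []
    for code, latlng in zip(postal_codes, coords):
        if latlng is None:
            flagged.append(code)
            continue
        lat, lng = latlng
        if not (LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX):
            flagged.append(code)
    return flagged

# Two points from this section's results plus one made-up, out-of-range point
print(suspicious_coords(['M1B', 'M1C', 'XXX'],
                        [[43.811525, -79.195517], [43.785665, -79.158725], [45.0, -75.0]]))
```

In the notebook this would be called as `suspicious_coords(postal_codes.tolist(), coords)`; an empty list means every postal code landed inside the box.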

Create a DataFrame containing the coordinates of all postal codes and attach it to the neighborhood DataFrame.

In [5]:
# Create a dataframe with extracted geodata
geo_df = pd.DataFrame(coords, columns = ['Latitude', 'Longitude'])

# Concatenate side by side; rows are paired by index, which runs 0..n-1 in both DataFrames
Toronto_geo_df = pd.concat([Toronto_df, geo_df], axis=1)
Toronto_geo_df.head(10)

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|----------|---------|---------------|----------|-----------|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.811525 | -79.195517 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.785665 | -79.158725 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.765815 | -79.175193 |
| 3 | M1G | Scarborough | Woburn | 43.768369 | -79.21759 |
| 4 | M1H | Scarborough | Cedarbrae | 43.769688 | -79.23944 |
| 5 | M1J | Scarborough | Scarborough Village | 43.743125 | -79.23175 |
| 6 | M1K | Scarborough | East Birchmount Park, Ionview, Kennedy Park | 43.726276 | -79.263625 |
| 7 | M1L | Scarborough | Clairlea, Golden Mile, Oakridge | 43.713054 | -79.285055 |
| 8 | M1M | Scarborough | Cliffcrest, Cliffside, Scarborough Village West | 43.724235 | -79.227925 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.69677 | -79.259967 |
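`pd.concat(..., axis=1)` pairs rows by index label, not by position. It lines up correctly here only because `reset_index()` left `Toronto_df` with a 0..n-1 index that matches the default index of `geo_df`. The toy sketch below (made-up frame, two coordinates taken from the table above) shows what happens when the indexes differ, and one way to guard against it by building the coordinate frame on the neighborhood frame's own index.

```python
import pandas as pd

# Toy neighbourhood frame whose index is NOT 0..n-1 (as happens after filtering rows)
df = pd.DataFrame({'Postcode': ['M1B', 'M1C']}, index=[3, 7])
coords = [[43.811525, -79.195517], [43.785665, -79.158725]]

# Default RangeIndex 0..1 does not match [3, 7]: concat yields 4 half-empty rows
geo_default = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
misaligned = pd.concat([df, geo_default], axis=1)

# Reusing df's index pairs each coordinate with the row it was computed for
geo_aligned = pd.DataFrame(coords, columns=['Latitude', 'Longitude'], index=df.index)
aligned = pd.concat([df, geo_aligned], axis=1)

print(len(misaligned), len(aligned))
print(aligned)
```

Passing `index=Toronto_df.index` when building `geo_df` would make the concatenation robust even if rows were filtered after the `reset_index()` call.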
