# Segmenting and Clustering Neighborhoods in Toronto - Part 2

First, import the necessary libraries:

In [2]:
import pandas as pd
import numpy as np
import requests

!pip install geocoder
import geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


Load the previously saved .csv table into a dataframe using the `pandas` library. The table contains the Postal Codes, Boroughs and Neighborhoods of Toronto.

In [4]:
df = pd.read_csv('postalcodes.csv')
print(df.head())
df.shape

  PostalCode           Borough                                 Neighborhood
0        M3A        North York                                    Parkwoods
1        M4A        North York                             Victoria Village
2        M5A  Downtown Toronto                    Regent Park, Harbourfront
3        M6A        North York             Lawrence Manor, Lawrence Heights
4        M7A  Downtown Toronto  Queen's Park, Ontario Provincial Government


(103, 3)

Run a `for` loop over the rows of the dataframe to obtain the Latitude and Longitude values. A `while` loop is needed within the `for` loop because the `geocoder` package does not always return the geographical coordinates of a postal code. The inner loop repeats the request whenever no coordinates come back.

In [None]:
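latitude = []
longitude = []

# Iterate over the postal codes directly instead of hard-coding the row count
for code in df['PostalCode']:
    lat_lng_coords = None

    # geocoder does not always return coordinates, so repeat the request.
    # Note: this loop never exits if the service keeps returning no result.
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(code))
        lat_lng_coords = g.latlng

    latitude.append(lat_lng_coords[0])
    longitude.append(lat_lng_coords[1])

df['Latitude'] = latitude
df['Longitude'] = longitude
df.head()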
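
Because the `while` loop only stops on success, the cell cannot finish when the service returns nothing. A bounded variant could look like the sketch below. The helper name `geocode_with_retries`, its parameters, and the stand-in `fake_lookup` function are assumptions made for illustration and are not part of the original notebook or of the `geocoder` package.

```python
import time


def geocode_with_retries(lookup, query, max_attempts=3, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # optional pause between attempts
    return None  # the caller decides how to handle a missing result


# Stand-in lookup that fails twice and then succeeds (illustrative values only)
calls = {'count': 0}


def fake_lookup(query):
    calls['count'] += 1
    return None if calls['count'] < 3 else [43.0, -79.0]


result = geocode_with_retries(fake_lookup, 'M5A, Toronto, Ontario')
never = geocode_with_retries(lambda q: None, 'M5A, Toronto, Ontario', max_attempts=2)
```

With the real service, the lookup passed in could be `lambda q: geocoder.google(q).latlng`, which is the same call used in the cell above.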

The coordinates could not be obtained with the `geocoder` approach above. The data was therefore loaded from the link provided in the assignment: https://cocl.us/Geospatial_data
This .csv file was loaded and sorted by Postal Code.

In [7]:
df2 = pd.read_csv('https://cocl.us/Geospatial_data')
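df2.sort_values(by=['Postal Code'], inplace=True)  # without inplace=True the sorted copy is discarded
df2.reset_index(inplace=True, drop=True)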
df2.head()

  Postal Code   Latitude   Longitude
0         M1B  43.806686  -79.194353
1         M1C  43.784535  -79.160497
2         M1E  43.763573  -79.188711
3         M1G  43.770992  -79.216917
4         M1H  43.773136  -79.239476


The dataframe containing the Boroughs and Neighborhoods was also sorted by Postal Code so that the rows of both dataframes correspond to each other. The shapes show that both dataframes have the same number of rows.

In [8]:
df.sort_values(by=['PostalCode'], inplace=True)
df.reset_index(inplace=True, drop=True)
df.head()

  PostalCode      Borough                            Neighborhood
0        M1B  Scarborough                          Malvern, Rouge
1        M1C  Scarborough  Rouge Hill, Port Union, Highland Creek
2        M1E  Scarborough       Guildwood, Morningside, West Hill
3        M1G  Scarborough                                  Woburn
4        M1H  Scarborough                               Cedarbrae


In [9]:
print(df.shape)
print(df2.shape)

(103, 3)
(103, 3)
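
Equal shapes do not by themselves show that the rows line up, since two frames can have the same size but a different order. A minimal sketch of a direct row-by-row comparison follows. The frames `left` and `right` are small illustrative stand-ins for `df` and `df2`, and their values are made up for the example.

```python
import pandas as pd

# Small illustrative frames; the values are made up for the example
left = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M1E'],
                     'Borough': ['Scarborough'] * 3})
right = pd.DataFrame({'Postal Code': ['M1B', 'M1C', 'M1E'],
                      'Latitude': [43.80, 43.78, 43.76]})

# True only if every row holds the same postal code in both frames
rows_match = bool((left['PostalCode'].to_numpy()
                   == right['Postal Code'].to_numpy()).all())

# A frame in a different order has the same shape but does not match
shuffled = right.iloc[::-1].reset_index(drop=True)
shuffled_match = bool((left['PostalCode'].to_numpy()
                       == shuffled['Postal Code'].to_numpy()).all())
```

Applied to the notebook's data, the same comparison would use `df['PostalCode']` and `df2['Postal Code']`.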


The two tables were combined column-wise so that each Postal Code has its respective Latitude and Longitude values. This relies on both dataframes being sorted by Postal Code and sharing the same index.

In [10]:
df_latlng = pd.concat([df, df2], axis=1, sort=False)
df_latlng.drop(['Postal Code'], axis=1, inplace=True)
df_latlng.head(10)

  PostalCode      Borough                                     Neighborhood   Latitude   Longitude
0        M1B  Scarborough                                   Malvern, Rouge  43.806686  -79.194353
1        M1C  Scarborough           Rouge Hill, Port Union, Highland Creek  43.784535  -79.160497
2        M1E  Scarborough                Guildwood, Morningside, West Hill  43.763573  -79.188711
3        M1G  Scarborough                                           Woburn  43.770992  -79.216917
4        M1H  Scarborough                                        Cedarbrae  43.773136  -79.239476
5        M1J  Scarborough                              Scarborough Village  43.744734  -79.239476
6        M1K  Scarborough      Kennedy Park, Ionview, East Birchmount Park  43.727929  -79.262029
7        M1L  Scarborough                  Golden Mile, Clairlea, Oakridge  43.711112  -79.284577
8        M1M  Scarborough  Cliffside, Cliffcrest, Scarborough Village West  43.716316  -79.239476
9        M1N  Scarborough                      Birch Cliff, Cliffside West  43.692657  -79.264848
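
Combining by position works only while both frames stay in the same order. An alternative that does not depend on row order is to join on the postal code itself with `pd.merge`. The sketch below uses two small stand-in frames, `boroughs` and `coords` (hypothetical names), deliberately stored in different orders.

```python
import pandas as pd

# Stand-in frames, intentionally in different row orders
boroughs = pd.DataFrame({'PostalCode': ['M1C', 'M1B'],
                         'Borough': ['Scarborough', 'Scarborough']})
coords = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

# Join on the key columns; validate raises if a postal code is duplicated
merged = pd.merge(boroughs, coords,
                  left_on='PostalCode', right_on='Postal Code',
                  how='left', validate='one_to_one')

# Drop the duplicate key column that came from the right-hand frame
merged = merged.drop(columns=['Postal Code'])
```

A left join keeps every row of the first frame, so a postal code without coordinates would show up with missing values rather than being silently paired with the wrong row.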


Save the dataframe to a .csv file for future use.

In [11]:
df_latlng.to_csv('postalcodes_.csv', index=False)