# Get Geographical Coordinates

**This is Part 2 of the Segmenting and Clustering Neighborhoods in Toronto Assignment**

This notebook retrieves the geographical coordinates of each postal code in the Toronto dataset.

# Import necessary libraries and data

In [1]:
import pandas as pd

In [2]:
# Import the data
neighborhoods_in_toronto_dataset = pd.read_csv('neighborhoods_part_1.csv')

# We don't need the first column (titled Unnamed: 0) so we'll just drop it. 
# The Unnamed: 0 column is simply the index of the dataframe prior to saving it
neighborhoods_in_toronto_dataset.drop('Unnamed: 0', axis=1, inplace=True)


# Display the first 5 rows 
neighborhoods_in_toronto_dataset.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
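The `Unnamed: 0` column appears because the Part 1 dataframe was saved together with its index. As a minimal sketch, assuming the file has the same layout as the output above, passing `index_col=0` to `pd.read_csv` reads that column back as the index, so a separate `drop` is not needed. The in-memory CSV text below is only a stand-in for `neighborhoods_part_1.csv`.

```python
import io
import pandas as pd

# Stand-in for neighborhoods_part_1.csv (assumed layout: saved index comes first)
csv_text = (
    ",PostalCode,Borough,Neighborhood\n"
    "0,M3A,North York,Parkwoods\n"
    "1,M4A,North York,Victoria Village\n"
)

# index_col=0 treats the first column as the index instead of as data
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample.columns.tolist())
```

Either approach gives the same three data columns; `index_col=0` just avoids the extra step.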


## Get coordinates using the GeoSpatial Dataset



First, we create an empty dataframe with the columns 'PostalCode', 'Borough', 'Neighborhood', 'Latitude', and 'Longitude'.

In [3]:
# Define column names
column_names = ['PostalCode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns = column_names)

# Check the empty dataframe
neighborhoods

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude


I initially intended to use the geocoder package, but it took too long to execute, so I used the GeoSpatial Dataset instead. If you would rather use the geocoder approach, uncomment the code below.

In [4]:
# import geocoder

# # initialize your variable to None
# lat_lng_coords = None
# postal_code = 'M5G'
# # loop until you get the coordinates
# while(lat_lng_coords is None):
#     g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#     lat_lng_coords = g.latlng

# latitude = lat_lng_coords[0]
# longitude = lat_lng_coords[1]
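The commented loop above retries until a result arrives, which may explain the long run time when the service keeps returning nothing. A minimal sketch of a bounded retry is shown here. The helper `get_coordinates`, its parameters, and the stand-in lookup are hypothetical names introduced for illustration. They are not part of geocoder.

```python
import time

def get_coordinates(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords[0], coords[1]
        time.sleep(delay_seconds)  # pause before the next attempt
    return None  # give up instead of looping forever

# Stand-in lookup that fails twice before answering (illustrative only);
# the coordinates are the M5A values from the merged table in this notebook.
responses = iter([None, None, [43.654260, -79.360636]])
result = get_coordinates(lambda q: next(responses), 'M5A, Toronto, Ontario')
print(result)
```

With geocoder installed, the lookup could presumably be `lambda q: geocoder.google(q).latlng`, matching the calls in the commented cell.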

I downloaded the geospatial data linked on the instruction page and saved it as 'Geospatial_Coordinates.csv'.

In [5]:
# Read geospatial coordinate and assign to geospatialdata dataframe
geospatialdata = pd.read_csv('Geospatial_Coordinates.csv')

# Rename 'Postal Code' to 'PostalCode' to be consistent with neighborhoods_in_toronto_dataset
geospatialdata.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)

# Display the first 5 rows
geospatialdata.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


# Merging data

Merge the dataframes neighborhoods_in_toronto_dataset and geospatialdata on the PostalCode column.

In [6]:
neighborhood = pd.merge(neighborhoods_in_toronto_dataset, geospatialdata, on='PostalCode')
neighborhood

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto Business,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
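`pd.merge` performs an inner join by default, so a postal code missing from either dataframe would be dropped without warning. As a sketch on a small sample (the code 'M0Z' and its borough are invented to force a mismatch), a left join with `indicator=True` shows which rows found no coordinates.

```python
import pandas as pd

left = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M0Z'],  # 'M0Z' is a made-up code
    'Borough': ['North York', 'North York', 'Nowhere'],
})
right = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every row of `left`; indicator=True adds a '_merge' column
checked = pd.merge(left, right, on='PostalCode', how='left', indicator=True)

# Rows marked 'left_only' had no matching postal code in `right`
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would suggest that the inner join above lost no neighborhoods.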


# Saving data 

Save the neighborhood dataframe, since it will be needed in Part 3.

In [7]:
neighborhood.to_csv('neighborhoods_part_2.csv')
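By default, `to_csv` also writes the index, and that is what comes back as an `Unnamed: 0` column when the file is read again. As an optional sketch, assuming Part 3 does not need the index, passing `index=False` leaves it out. An in-memory buffer stands in for the file here.

```python
import io
import pandas as pd

sample = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# Write without the index, then read the same text back
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded.columns.tolist())
```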