# Segmenting and Clustering Neighborhoods in Toronto

# Step 3: Get Latitudes and Longitudes

In [7]:
# Import libraries
import pandas as pd
import numpy as np

We now have a dataframe with the postal code, borough name, and neighborhood name of each neighborhood. To use the Foursquare location data, we also need the latitude and longitude coordinates of each neighborhood.

In [8]:
# To install geocoder and geopy, uncomment the following lines
# !pip install geocoder
# !pip install geopy

The problem with the geocoder package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for a postal code such as M5G can return None, and the very same call made again can return the coordinates. To make sure you get coordinates for all of the neighborhoods, you can run a while loop for each postal code that repeats the call until a result comes back. The cell below takes a simpler route and uses geopy's Nominatim geocoder to look up the coordinates of Toronto itself:

In [9]:
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude values

address = 'Toronto, CA'
geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
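
The retry idea described earlier (repeat the call until it stops returning None) can be sketched roughly as follows. This is only an illustration, not the notebook's actual code: `geocode_with_retry` and `flaky_lookup` are hypothetical names, the lookup function is a stand-in that fails twice before answering, and the attempt cap is an added assumption so the loop cannot run forever.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is any callable that returns coordinates, or None / an empty
    value when the geocoding service gives no answer (hypothetical interface).
    """
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords:  # treat both None and an empty result as a failure
            return coords
        time.sleep(delay)  # brief pause before asking again
    return None  # give up after max_attempts failures


# Stand-in for an unreliable geocoder: fails twice, then succeeds.
responses = iter([None, None, (43.6534817, -79.3839347)])


def flaky_lookup(query):
    return next(responses)


coords = geocode_with_retry(flaky_lookup, 'Toronto, CA', delay=0)
print(coords)
```

In real use, `lookup` would wrap whichever geocoding call is being made, and a non-zero `delay` keeps the retries from hammering the service.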


Because the geocoder package can be very unreliable, a csv file with the geographical coordinates of each postal code is available as a fallback: http://cocl.us/Geospatial_data. The cell below reads a local copy of that file, Geospatial_Coordinates.csv.

In [10]:
lat_long = pd.read_csv('Geospatial_Coordinates.csv')

In [11]:
lat_long.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [12]:
lat_long.shape

(103, 3)

Previously, we saved the neighbourhood data to a csv file. We now read it back in so that it can be merged with the latitude and longitude of each neighbourhood.

In [29]:
# Read the neighbourhoods data
neighbourhoods = pd.read_csv('toronto_neighbourhoods.csv')

Now we merge the two dataframes with the <code>merge</code> method, matching rows where the postal codes are equal.

In [30]:
# Merge lat_long into the neighbourhoods dataframe on the postal code columns.
toronto_neighbourhoods = neighbourhoods.merge(lat_long, left_on='Postcode', right_on='Postal Code')
toronto_neighbourhoods = toronto_neighbourhoods.drop(['Postal Code'], axis=1)  # drop the duplicate key column
toronto_neighbourhoods.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens,Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
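
By default `merge` performs an inner join, so a postal code that has no match in the coordinates file would be dropped without any warning. One way to check for that, sketched here on a tiny sample rather than the real data, is a left join with `indicator=True`. The `M0Z` row is made up purely to show what an unmatched code looks like.

```python
import pandas as pd

# Small sample frames; 'M0Z' is a hypothetical code with no coordinates.
neighbourhoods = pd.DataFrame({
    'Postcode': ['M3A', 'M4A', 'M0Z'],
    'Borough': ['North York', 'North York', 'Hypothetical'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village', 'Nowhere'],
})
lat_long = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# A left join keeps every neighbourhood; the _merge column records
# whether a matching row was found in lat_long.
checked = neighbourhoods.merge(
    lat_long, left_on='Postcode', right_on='Postal Code',
    how='left', indicator=True,
)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postcode'].tolist()
print(unmatched)  # ['M0Z']
```

An empty list means every neighbourhood found its coordinates, in which case the inner join loses nothing.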


Finally, we save the dataframe back to the csv file, overwriting the earlier version, for later use.

In [31]:
# Save the data with to_csv for later use (this overwrites the existing csv file, now including lat and long)
toronto_neighbourhoods.to_csv('toronto_neighbourhoods.csv', index=False)
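
The `index=False` argument matters when the file is read back later. Without it, pandas writes the row index as an extra column with no header, and that column comes back as `Unnamed: 0`. A minimal sketch of the difference, using an in-memory buffer and a two-row sample instead of the real file:

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Postcode': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
})

# Default behaviour: the index is written as an extra, unnamed column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False writes only the data columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'Postcode', 'Latitude']
print(cols_without_index)  # ['Postcode', 'Latitude']
```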