# Applied Data Science Capstone

## Assignment 2: Segmenting and Clustering Neighborhoods in Toronto - Part 3

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:
 - To add enough Markdown cells to explain what you decided to do and to report any observations you make.
 - To generate maps to visualize your neighborhoods and how they cluster together.

Once you are happy with your analysis, submit a link to a new Notebook on your Github repository. (3 marks)

### Import the necessary libraries

In [2]:
# for handling data in a vectorized manner
import numpy as np  

# for data analysis
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# to handle json files
import json  

# to convert address to latitude and longitude
from geopy.geocoders import Nominatim

# to handle requests
import requests

# to transform JSON file to pandas DataFrame
# (pandas.io.json.json_normalize is deprecated since pandas 1.0; import it from the top-level package)
from pandas import json_normalize

# For data visualization
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for clustering
from sklearn.cluster import KMeans

# map rendering library
import folium

print('Libraries are imported! You are Good to Go --> ')

Libraries are imported! You are Good to Go --> 


In [3]:
df = pd.read_csv('final_data.csv', index_col = 0)
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


### Count the number of boroughs and neighborhoods

In [6]:
boroughs_count = len(df['Borough'].unique())
neighborhoods_count = df.shape[0]
print(f'The dataframe has {boroughs_count} boroughs and {neighborhoods_count} neighborhoods!')

The dataframe has 10 boroughs and 103 neighborhoods!


### Use geopy library to get the latitude and longitude values of Toronto, Canada

In [9]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent = 'toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geographical coordinate of Toronto, Canada is {latitude}, {longitude}')

The geographical coordinate of Toronto, Canada is 43.6534817, -79.3839347
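
`geolocator.geocode` returns `None` when it finds no match, so reading `.latitude` directly can fail with an `AttributeError`. The following is a minimal sketch of a guard around that call. The helper name `coordinates_or_none` is hypothetical and not part of the original notebook; it accepts any object with a `geocode` method, such as the `Nominatim` instance above.

```python
def coordinates_or_none(geolocator, address):
    """Return (latitude, longitude) for an address, or None if nothing was found.

    `geolocator` is assumed to expose a geopy-style `geocode(address)` method.
    """
    location = geolocator.geocode(address)
    if location is None:
        # No match for this address; let the caller decide how to react.
        return None
    return location.latitude, location.longitude
```

With the objects defined above, a call could look like `coordinates_or_none(geolocator, 'Toronto, Canada')`.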


### Create a map of Toronto, Canada with neighborhoods superimposed on top

In [12]:
# create map of Toronto, Canada using latitude and longitude values

map_toronto = folium.Map(location = [latitude, longitude], zoom_start = 10)

# add markers to map

for lat, long, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = f'{neighborhood}, {borough}'
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker([lat, long], radius = 5, popup = label, color = 'blue', fill = True, fill_color = '#3186cc', fill_opacity = 0.7,
                       parse_html = False).add_to(map_toronto)

map_toronto

In [15]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [16]:
scarborough_data = df.query('Borough == "Scarborough"')
scarborough_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


### Let's get the Geographical coordinates of Scarborough

In [17]:
address = 'Scarborough, Ontario'

geolocator = Nominatim(user_agent = 'scarborough_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geographical coordinate of Scarborough, Ontario, Toronto is {latitude}, {longitude}')

The geographical coordinate of Scarborough, Ontario, Toronto is 43.773077, -79.257774


In [19]:
# create map of Scarborough using latitude and longitude values
map_scarborough = folium.Map(location = [latitude, longitude], zoom_start = 10)

# add markers to map
for lat, long, label in zip(scarborough_data['Latitude'], scarborough_data['Longitude'], scarborough_data['Neighborhood']):
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker([lat, long], popup = label, color = 'blue', fill = True, fill_color = '#3186cc', fill_opacity = 0.7, parse_html = False).add_to(map_scarborough)

map_scarborough

### Utilizing the Foursquare API to explore the neighborhoods and segment them

In [20]:
CLIENT_ID = 'FCOTPNVLCVK0EFB5WQEW3TE51Y5AG1MHARD2W0OQ4Q2VG4WG'
CLIENT_SECRET = 'KPIH3QP3A4OEDQ05DBH41XG2NNNMTVTBUDRWAFONAQ2H3SEI'
VERSION = '20180605'

print('Credentials:')
print(f'CLIENT_ID: {CLIENT_ID}')
print(f'CLIENT_SECRET: {CLIENT_SECRET}')

Credentials:
CLIENT_ID: FCOTPNVLCVK0EFB5WQEW3TE51Y5AG1MHARD2W0OQ4Q2VG4WG
CLIENT_SECRET: KPIH3QP3A4OEDQ05DBH41XG2NNNMTVTBUDRWAFONAQ2H3SEI
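
Printing the credentials stores them in the notebook output, where anyone with access to the repository can read them. One possible alternative is to read them from environment variables and display only a masked form. This is a sketch under stated assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical and do not appear in the original notebook.

```python
import os

def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever fits your setup.
    """
    env = os.environ if env is None else env
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)
```

A cell could then call `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_SECRET)` to confirm the value was loaded without exposing it.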


### Exploring the First Neighborhood in the Dataframe

In [21]:
scarborough_data.loc[0, 'Neighborhood']

'Malvern, Rouge'

In [23]:
neighborhood_lat = scarborough_data.loc[0, 'Latitude']
neighborhood_long = scarborough_data.loc[0, 'Longitude']
neighborhood_name = scarborough_data.loc[0, 'Neighborhood']

print(f'Latitude and Longitude values of {neighborhood_name} are {neighborhood_lat}, {neighborhood_long}')

Latitude and Longitude values of Malvern, Rouge are 43.8066863, -79.19435340000003


### Getting the top 100 venues that are in Malvern within a radius of 500 metres 

In [26]:
LIMIT = 100
RADIUS = 500

# creating the url for the query
url = f'https://api.foursquare.com/v2/venues/explore?&client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={neighborhood_lat},{neighborhood_long}&radius={RADIUS}&limit={LIMIT}'

url

'https://api.foursquare.com/v2/venues/explore?&client_id=FCOTPNVLCVK0EFB5WQEW3TE51Y5AG1MHARD2W0OQ4Q2VG4WG&client_secret=KPIH3QP3A4OEDQ05DBH41XG2NNNMTVTBUDRWAFONAQ2H3SEI&v=20180605&ll=43.8066863,-79.19435340000003&radius=500&limit=100'
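
The query string above is assembled by hand in an f-string, which leaves a stray `&` after the `?` and does no escaping. As a rough sketch, the same URL can be built from a dictionary with the standard library's `urlencode`. The function name `build_explore_url` is hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a venues/explore URL from its parts (hypothetical helper)."""
    base = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the "ll" value readable instead of encoding it as %2C
    return f'{base}?{urlencode(params, safe=",")}'
```

Alternatively, `requests.get` accepts the same dictionary through its `params` argument and performs the encoding itself.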

In [32]:
results = requests.get(url).json()['response']['groups'][0]['i']
results

KeyError: 'venue'
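
A `KeyError` like the one above appears whenever a chained lookup hits a key that is missing from the response. The sketch below walks the payload with `dict.get` so that missing pieces produce an empty result instead of an exception. It assumes the nested layout `response -> groups -> items -> venue` commonly seen in explore responses; the function name `extract_venues` and the sample payload are illustrative, not real API output.

```python
def extract_venues(payload):
    """Return a list of (name, category, lat, lng) tuples from an explore-style payload.

    Assumes the layout response -> groups[0] -> items -> venue; entries that
    lack a 'venue' key are skipped rather than raising KeyError.
    """
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    venues = []
    for item in groups[0].get('items', []):
        venue = item.get('venue')
        if venue is None:
            continue
        categories = venue.get('categories') or [{}]
        location = venue.get('location', {})
        venues.append((
            venue.get('name'),
            categories[0].get('name'),
            location.get('lat'),
            location.get('lng'),
        ))
    return venues

# Illustrative payload only, made up for this example
sample = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Example Cafe',
                           'categories': [{'name': 'Coffee Shop'}],
                           'location': {'lat': 43.8, 'lng': -79.2}}},
                {'reasons': {}},  # no 'venue' key: skipped
            ]
        }]
    }
}
venues = extract_venues(sample)
```

In the notebook, the parsed JSON from `requests.get(url).json()` could be passed to this function in place of the chained indexing.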