# Data Science Capstone Project
## Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto


## Overall Process - Part 3

The analysis of Toronto neighborhoods will proceed along the following steps:

1. Initialize Libraries
2. Read Toronto neighborhoods data created in Part 2
3. Transform JSON data into a DataFrame
4. Create map of Toronto using latitude and longitude values
5. Explore the Rouge, Malvern neighborhoods
6. Find nearby venues using Foursquare API
7. Find how many venue categories exist in Rouge, Malvern



## 1. Initialize Libraries

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

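# pandas >= 1.0 exposes json_normalize at the top level;
# older versions provide it as pandas.io.json.json_normalize
from pandas import json_normalize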

import json
import numpy as np
import requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

Solving environment: done

# All requested packages already installed.



## 2. Read the JSON file created in Part 2

In [2]:
with open('toronto_neighborhoods.txt') as json_data:
    toronto_data = json.load(json_data)

## 3. Transform JSON data into a DataFrame

In [3]:
neighborhoods = pd.DataFrame(toronto_data)

In [4]:
neighborhoods.head()

Unnamed: 0,Borough,Latitude,Longitude,Neighborhood,Postcode
0,Scarborough,43.653963,-79.387207,"Rouge, Malvern",M1B
1,Scarborough,43.653963,-79.387207,"Highland Creek, Rouge Hill, Port Union",M1C
2,Scarborough,,,"Guildwood, Morningside, West Hill",M1E
3,Scarborough,43.765717,-79.221898,Woburn,M1G
4,Scarborough,,,Cedarbrae,M1H


In [5]:
address = 'Toronto, Canada'

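# Recent geopy versions require a user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent='toronto_explorer')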
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Canada are 43.653963, -79.387207.


## 4. Create map of Toronto using latitude and longitude values above

In [6]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

In [7]:
# To simplify things a bit, create a subset of the neighborhoods dataframe.
# Specifically, keep only the rows whose Latitude is not NaN
# (any comparison with NaN is False, so `> 0.0` drops those rows).

neighborhoods_subset = neighborhoods[neighborhoods['Latitude'] > 0.0].reset_index(drop=True)
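
The filter above works because any comparison against NaN evaluates to False, so rows with a missing latitude fail the `> 0.0` test. The following is a minimal standard-library sketch of that behavior, not the notebook's own code: the sample rows are taken from the table above, and the names `rows`, `kept` and `kept_explicit` are illustrative only. In pandas, `neighborhoods['Latitude'].notna()` states the same intent more explicitly and would also keep negative latitudes.

```python
import math

# Sample rows mirroring entries of the neighborhoods table;
# NaN stands in for a missing latitude.
rows = [
    {"Neighborhood": "Rouge, Malvern", "Latitude": 43.653963},
    {"Neighborhood": "Guildwood, Morningside, West Hill", "Latitude": float("nan")},
    {"Neighborhood": "Woburn", "Latitude": 43.765717},
]

# NaN > 0.0 is False, so the comparison silently drops missing values.
kept = [row for row in rows if row["Latitude"] > 0.0]

# The explicit form of the same filter: keep rows whose latitude is not NaN.
kept_explicit = [row for row in rows if not math.isnan(row["Latitude"])]

print([row["Neighborhood"] for row in kept])
```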

In [8]:
neighborhoods_subset.head()

Unnamed: 0,Borough,Latitude,Longitude,Neighborhood,Postcode
0,Scarborough,43.653963,-79.387207,"Rouge, Malvern",M1B
1,Scarborough,43.653963,-79.387207,"Highland Creek, Rouge Hill, Port Union",M1C
2,Scarborough,43.765717,-79.221898,Woburn,M1G
3,Scarborough,43.81547,-79.327734,"L'Amoreaux West, Steeles West",M1W
4,North York,43.779772,-79.366185,"Fairview, Henry Farm, Oriole",M2J


In [9]:
for lat, lng, borough, neighborhood in zip(
        neighborhoods_subset['Latitude'], 
        neighborhoods_subset['Longitude'], 
        neighborhoods_subset['Borough'], 
        neighborhoods_subset['Neighborhood']):
    
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

In [10]:
map_toronto


## 5. Explore the Rouge, Malvern neighborhoods

In [14]:
# Let's explore the first neighborhood in our dataframe.
# Get the neighborhood's name.
neighborhoods_subset.loc[0,'Neighborhood']

'Rouge, Malvern'

In [15]:
neighborhood_latitude = neighborhoods_subset.loc[0, 'Latitude']
neighborhood_longitude = neighborhoods_subset.loc[0, 'Longitude']
neighborhood_name = neighborhoods_subset.loc[0, 'Neighborhood']
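
The coordinates selected here (43.653963, -79.387207) are identical to the Toronto coordinates printed in step 3, so the venue search in the next step is centered on that point. This may mean that the geocoding in Part 2 returned the city-level result for this neighborhood, although the notebook does not confirm it. The following is a minimal sketch for flagging such rows; the helper name `is_city_fallback` and the tolerance are assumptions made for illustration.

```python
import math

# Toronto coordinates as printed by the geocoding cell in step 3.
CITY_LAT, CITY_LNG = 43.653963, -79.387207


def is_city_fallback(lat, lng, tol=1e-6):
    """Return True when a point coincides with the city-level coordinates."""
    return (math.isclose(lat, CITY_LAT, abs_tol=tol)
            and math.isclose(lng, CITY_LNG, abs_tol=tol))


# Two rows copied from the neighborhoods_subset table.
sample = [
    ("Rouge, Malvern", 43.653963, -79.387207),
    ("Woburn", 43.765717, -79.221898),
]

flagged = [name for name, lat, lng in sample if is_city_fallback(lat, lng)]
print(flagged)
```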

## 6. Find nearby venues using Foursquare API

In [16]:
# Let's get the top 100 venues that are in Rouge, Malvern within a radius of 500 meters.

LIMIT = 100
radius = 500

CLIENT_ID = '1P3ABELERY4BBWQCQBMGCGBZKG5YF1UAN2NTAGWIIGY2AEPA'
CLIENT_SECRET = 'BAC4OWK3U5OUHGHZU4HW0UJ1VZJFTP3Z4ZCOFMZKXTJPQXNO'
VERSION = '20180605'

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
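
Formatting the query string by hand works, but the standard library can assemble and escape the parameters. The sketch below builds the same `venues/explore` URL with `urllib.parse.urlencode`. The function name `build_explore_url` and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions for illustration and do not come from the original notebook.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Assemble the explore URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter unescaped.
    return BASE_URL + "?" + urlencode(params, safe=",")


# Hypothetical environment variables; placeholders are used when they are unset.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")

url = build_explore_url(43.653963, -79.387207, client_id, client_secret)
print(url)
```

Reading the credentials from the environment keeps them out of the notebook itself, which matters once the notebook is shared.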

In [17]:
# Send the GET request and examine the results

results = requests.get(url).json()

In [18]:
# function that extracts the category of the venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows use the 'venue.categories' key instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
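
To see how the function behaves, the sketch below calls it on plain dictionaries standing in for DataFrame rows. The sample rows are made up for illustration, and the function is repeated here only so that the block runs on its own.

```python
def get_category_type(row):
    """Return the name of the first category, or None when there is none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hypothetical rows: one with the raw key, one with the flattened key,
# and one without any category.
raw_row = {'categories': [{'name': 'Art Museum'}]}
flat_row = {'venue.categories': [{'name': 'Sushi Restaurant'},
                                 {'name': 'Japanese Restaurant'}]}
empty_row = {'venue.categories': []}

for row in (raw_row, flat_row, empty_row):
    print(get_category_type(row))
```

Only the first category of each venue is kept, so a venue listed under several categories is counted under the first one alone.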


In [19]:
venues = results['response']['groups'][0]['items']
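
The chained lookup above raises a `KeyError` or `IndexError` if the request failed or returned no groups. The following is a minimal defensive sketch, assuming the response carries a `meta.code` status field alongside `response`; the helper name `extract_items` and the sample payloads are hypothetical.

```python
def extract_items(payload):
    """Return the venue items of the first group, or [] when there are none."""
    meta = payload.get("meta", {})
    if meta.get("code") != 200:
        # Assumption: a successful call reports code 200 in the meta block.
        raise ValueError("Foursquare request failed: {}".format(meta))

    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    return groups[0].get("items", [])


# Hypothetical payloads shaped like the parsed JSON response.
ok_payload = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [{"venue": {"name": "Japango"}}]}]},
}
empty_payload = {"meta": {"code": 200}, "response": {"groups": []}}

print(len(extract_items(ok_payload)), len(extract_items(empty_payload)))
```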

In [20]:
nearby_venues = json_normalize(venues) # flatten JSON

In [21]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
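
`json_normalize` turns each nested venue record into a flat row whose column names join the nested keys with dots, which is why the cell above selects `venue.location.lat` and then keeps only the last segment of each name. The sketch below imitates that flattening for a single record using the standard library. `flatten` is a hypothetical helper, not the pandas implementation, and the sample record is abbreviated from the output shown in this section.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dot-joined keys."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            # Lists (such as categories) are kept as they are.
            flat[name] = value
    return flat


item = {
    "venue": {
        "name": "Japango",
        "categories": [{"name": "Sushi Restaurant"}],
        "location": {"lat": 43.655268, "lng": -79.385165},
    }
}

flat = flatten(item)

# Same renaming as in the cell above: keep the text after the last dot.
short = {key.split(".")[-1]: value for key, value in flat.items()}

print(sorted(flat))
print(sorted(short))
```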

In [22]:
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Textile Museum of Canada,Art Museum,43.654396,-79.3865
2,Japango,Sushi Restaurant,43.655268,-79.385165
3,Sansotei Ramen 三草亭,Ramen Restaurant,43.655157,-79.386501
4,Tsujiri,Tea Room,43.655374,-79.385354


In [23]:
nearby_venues.shape

(74, 4)

## 7. Find how many venue categories exist in Rouge, Malvern

In [24]:

print('There are {} unique categories.'
      .format(len(nearby_venues['categories'].unique())))


There are 53 unique categories.
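
Beyond the number of distinct categories, it can help to see which categories occur most often. The following is a minimal sketch with `collections.Counter`, using the five categories from the `head()` output in step 6 plus one made-up duplicate so that the counts differ. On the full DataFrame, `nearby_venues['categories'].value_counts()` would give the corresponding tally.

```python
from collections import Counter

categories = [
    "Neighborhood",
    "Art Museum",
    "Sushi Restaurant",
    "Ramen Restaurant",
    "Tea Room",
    "Sushi Restaurant",  # hypothetical duplicate, added for illustration
]

counts = Counter(categories)

print(len(counts))            # number of distinct categories
print(counts.most_common(1))  # the most frequent category with its count
```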
