<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:
* to add enough Markdown cells to explain what you decided to do and to report any observations you make.
* to generate maps to visualize your neighborhoods and how they cluster together.
* to submit a link to the new notebook in your GitHub repository once you are happy with your analysis.

Analysis done for New York City data:
1. Download and Explore Dataset
2. Explore Neighborhoods in New York City
3. Analyze Each Neighborhood
4. Cluster Neighborhoods
5. Examine Clusters

Analysis planned for Toronto data:
1. Download and Explore Dataset
2. Explore Neighborhoods in Toronto
3. Analyze Each Neighborhood
4. Cluster Neighborhoods
5. Examine Clusters

Explore the shelter information published for the city of Toronto.
https://www.toronto.ca/ext/open_data/catalog/data_set_files/SMIS_Daily_Occupancy_2018.json

Information about Toronto: City of Toronto has Wards (44), Former Municipalities (6), and Neighbourhoods (140) 

In [3]:
conda install anaconda

Solving environment: done


  current version: 4.5.11
  latest version: 4.8.2

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


In [4]:
# Prepare the libraries and data set
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0 exposes this as pd.json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.8.2

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge

The following packages will be UPDATED:

    certifi: 2019.11.28-py36_0 --> 2019.11.28-py36_0 conda-forge

The following packages will be DOWNGRADED:

    openssl: 1.1.1d-h7b6447c_3 --> 1.1.1d-h516909a_0 conda-forge


Downloading and Extracting Packages
certifi-2019.11.28   | 149 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Libraries imported.


In [None]:
# Toronto Crime OpenData is no longer available in that data set format from arcgis; Use a local copy
#!wget -q -O 'Neighbourhood_Crime_Rates_Boundary_File_.geojson' https://opendata.arcgis.com/datasets
#print('Data downloaded!')

In [5]:
# use the locally downloaded copy to ensure the data is available for the example
with open('Neighbourhood_Crime_Rates_Boundary_File_.geojson') as json_data:
    torontocrime_data = json.load(json_data)

In [8]:
neighborhoods_data = torontocrime_data['features']

In [9]:
neighborhoods_data[0]

{'type': 'Feature',
 'properties': {'OBJECTID': 1,
  'Neighbourhood_Crime_Rates_Neigh': 'Yonge-St.Clair',
  'Neighbourhood_Crime_Rates_Hood_': '097',
  'Hood_ID': 97,
  'Neighbourhood': 'Yonge-St.Clair',
  'Assault_2014': 58,
  'Assault_2015': 38,
  'Assault_2016': 51,
  'Assault_2017': 46,
  'Assault_2018': 61,
  'Assault_AVG': 50.8,
  'Assault_CHG': '33%',
  'Assault_Rate_2018': 1912.8,
  'Auto_Theft_2014': 28,
  'Auto_Theft_2015': 32,
  'Auto_Theft_2016': 22,
  'Auto_Theft_2017': 46,
  'Auto_Theft_2018': 69,
  'AutoTheft_AVG': 39.4,
  'AutoTheft_CHG': '50%',
  'AutoTheft_Rate_2018': 2163.7,
  'BreakandEnter_2014': 29,
  'BreakandEnter_2015': 16,
  'BreakandEnter_2016': 28,
  'BreakandEnter_2017': 32,
  'BreakandEnter_2018': 23,
  'BreakandEnter_AVG': 25.6,
  'BreakandEnter_CHG': '-28%',
  'BreakandEnter_Rate_2018': 721.2,
  'Robbery_2014': 12,
  'Robbery_2015': 25,
  'Robbery_2016': 14,
  'Robbery_2017': 21,
  'Robbery_2018': 19,
  'Robbery_AVG': 18.2,
  'Robbery_CHG': '-10%',
  'Ro

#### Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.
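
As a rough illustration of that transformation, the sketch below flattens a tiny hand-made `FeatureCollection` into a list of flat dictionaries. The sample values and the helper name `features_to_rows` are assumptions made for this example; only the key names (`Hood_ID`, `Neighbourhood`, `geometry`, `coordinates`) follow the record shown above.

```python
# Hand-made sample that mimics the GeoJSON layout; not real Toronto data.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"Hood_ID": 1, "Neighbourhood": "Sample A"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[-79.40, 43.68], [-79.39, 43.68],
                                 [-79.39, 43.69], [-79.40, 43.68]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"Hood_ID": 2, "Neighbourhood": "Sample B"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[-79.50, 43.75], [-79.49, 43.75],
                                 [-79.49, 43.76], [-79.50, 43.75]]],
            },
        },
    ],
}


def features_to_rows(features):
    """Turn GeoJSON features into flat dicts, one per feature (hypothetical helper)."""
    rows = []
    for feature in features:
        props = feature["properties"]
        # First vertex of the outer ring, stored as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"][0][0]
        rows.append({
            "HoodID": props["Hood_ID"],
            "Neighborhood": props["Neighbourhood"],
            "Latitude": lat,
            "Longitude": lon,
        })
    return rows


rows = features_to_rows(sample["features"])
print(rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` would then give one dataframe row per feature.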

In [10]:
# define the dataframe columns
column_names = ['HoodID','Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
#column_names = ['SegmentName','Street', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [11]:
# check the column headers
neighborhoods

Unnamed: 0,HoodID,Borough,Neighborhood,Latitude,Longitude


In [13]:
#properties,Hood_ID=97
#properties,Neighbourhood=Yonge-St.Clair
#geometry,coordinates=[-79.3911544456691, 43.6810810203342]
# York University Heights: 43.762°N 79.500°W
# example output:
# 97
# Yonge-St.Clair
# [-79.3911544456691, 43.6810810203342]
# 43.6810810203342
# -79.3911544456691
# 27
# York University Heights
# [-79.5052468376673, 43.759873985768]
# 43.759873985768
# -79.5052468376673
# 38
# Lansing-Westgate
# [-79.4399431313348, 43.7615578230112]
# 43.7615578230112
# -79.4399431313348

rows = []
for data in neighborhoods_data:
    hood_id = data['properties']['Hood_ID']
    neighborhood_name = data['properties']['Neighbourhood']
    # first vertex of the boundary, stored as [longitude, latitude]; a rough location, not a centroid
    neighborhood_latlon = data['geometry']['coordinates'][0][0]
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    rows.append({'HoodID': hood_id,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# add the rows in one step (DataFrame.append was removed in pandas 2.0);
# re-running this cell without re-creating the empty dataframe adds every neighborhood again
neighborhoods = pd.concat([neighborhoods, pd.DataFrame(rows)], ignore_index=True)

In [14]:
# Toronto City Open Data project identifies 140 "social planning neighborhoods" for the data provided.
neighborhoods.head()

Unnamed: 0,HoodID,Borough,Neighborhood,Latitude,Longitude
0,97,,Yonge-St.Clair,43.681081,-79.391154
1,27,,York University Heights,43.759874,-79.505247
2,38,,Lansing-Westgate,43.761558,-79.439943
3,31,,Yorkdale-Glen Park,43.70561,-79.439647
4,16,,Stonegate-Queensway,43.647437,-79.492583
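
The coordinates in this table come from `coordinates[0][0]`, which (assuming each feature is a single polygon) is the first vertex of the boundary and therefore sits on the neighborhood's edge. One possible alternative, sketched here and not used in this notebook, is the area-weighted centroid of the boundary ring computed with the shoelace formula. The helper name `ring_centroid` is hypothetical, and treating longitude and latitude as planar x and y is an approximation that should be acceptable at city scale.

```python
def ring_centroid(ring):
    """Area-weighted centroid of a closed ring of [x, y] pairs (shoelace formula).

    The ring must be closed, i.e. its last point repeats its first point,
    which is how GeoJSON stores polygon rings.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate ring with zero area")
    # Cx = sum / (6 * A) and A = area2 / 2, hence the factor 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


# Unit square as a sanity check; x plays the role of longitude, y of latitude.
square = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
centroid_x, centroid_y = ring_centroid(square)
print(centroid_x, centroid_y)
```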


In [15]:
#### temporary
# link to a developer's rendering of Toronto's FSA codes to geocodes in geojson format
#https://raw.githubusercontent.com/ag2816/Visualizations/master/data/Toronto2.geojson

# The dataframe has 140 unique hood IDs in 280 rows, which suggests that each neighborhood appears twice
print('The dataframe has {} hoodIDs and {} neighborhoods.'.format(
        len(neighborhoods['HoodID'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 140 hoodIDs and 280 neighborhoods.
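
Since 280 rows for 140 hood IDs points to duplicated rows, one way to tidy the dataframe before further analysis is `drop_duplicates`. This is a minimal sketch on a small sample modeled on rows shown in this notebook; the notebook itself does not perform this step.

```python
import pandas as pd

# Small sample modeled on the notebook's rows; the second Mimico row is a duplicate.
df = pd.DataFrame({
    "HoodID": [17, 17, 97],
    "Neighborhood": ["Mimico", "Mimico", "Yonge-St.Clair"],
})

# Keep the first row for each hood ID and renumber the index.
deduped = df.drop_duplicates(subset="HoodID").reset_index(drop=True)
print(len(df), len(deduped))
```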


In [16]:
# export the data to a file
neighborhoods.to_csv('socialNeighborhoodsToronto_dataframe.csv', index=False, header=True)

In [17]:
# 43.653963, -79.387207
# Get info to create a map
address = 'Toronto, Ontario, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Ontario, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Ontario, Canada are 43.653963, -79.387207.
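
geopy's `geocode` returns `None` when an address cannot be resolved, so reading `.latitude` directly can fail on a bad address. The sketch below shows one defensive pattern. The helper `coordinates_or_none` and the stand-in geocoder are hypothetical and exist only so the example runs offline; a real run would pass `geolocator.geocode` instead.

```python
def coordinates_or_none(geocode, address):
    """Return (latitude, longitude) for address, or None when nothing is found.

    `geocode` is any callable shaped like geopy's `geolocator.geocode`.
    """
    location = geocode(address)
    if location is None:
        return None
    return location.latitude, location.longitude


class FakeLocation:
    """Stand-in for a geopy Location; values copied from the output above."""
    latitude = 43.653963
    longitude = -79.387207


def fake_geocode(address):
    # Pretend that only Toronto addresses can be resolved.
    return FakeLocation() if "Toronto" in address else None


found = coordinates_or_none(fake_geocode, "Toronto, Ontario, Canada")
missing = coordinates_or_none(fake_geocode, "Nowhere")
print(found, missing)
```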


#### Create a map of Toronto with neighborhoods superimposed on top.

In [18]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, hoodID, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['HoodID'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, hoodID)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [19]:
# Let's look more closely at Mimico, Etobicoke, Toronto, Ontario, Canada
mimico_data = neighborhoods[neighborhoods['Neighborhood'] == 'Mimico'].reset_index(drop=True)
mimico_data.head()

Unnamed: 0,HoodID,Borough,Neighborhood,Latitude,Longitude
0,17,,Mimico,43.621073,-79.480357
1,17,,Mimico,43.621073,-79.480357


In [20]:
address = 'Mimico, Etobicoke, Toronto, Ontario'
# An earlier run of the snippet below printed 43.603656, -79.4931782; geocoder results can change between runs.

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mimico are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Mimico are 43.6166773, -79.4968048.


In [21]:
# create map of Mimico using latitude and longitude values
map_mimico = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(mimico_data['Latitude'], mimico_data['Longitude'], mimico_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mimico)  
    
map_mimico

In [22]:
# @hidden_cell
# Read the Foursquare credentials from the environment instead of hard-coding them in the notebook.
# The environment variable names below are an assumption; use whatever names fit your setup.
import os
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret') # your Foursquare Secret
VERSION = '20180604' # Foursquare API version

#### Prepare the connection to the Foursquare API

In [23]:
mimico_data.loc[0, 'Neighborhood']

'Mimico'

In [24]:
neighborhood_latitude = mimico_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = mimico_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = mimico_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Mimico are 43.6210731543947, -79.4803574485564.


In [26]:
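# @hidden_cell
LIMIT = 100 # maximum number of venues to return
radius = 500 # search radius in meters
# note: latitude and longitude here are the geocoded Mimico values from the Nominatim lookup,
# not neighborhood_latitude and neighborhood_longitude taken from the dataframe
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)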
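
Long `format` strings like this one are easy to get wrong when the argument order changes. As a sketch of an alternative, the same query string can be assembled from a dictionary with the standard library's `urlencode`. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in this notebook.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Assemble a venues/explore URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),  # "latitude,longitude"
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes each value, so the comma in "ll" becomes %2C.
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


example_url = build_explore_url("your-client-ID", "your-client-secret",
                                43.6166773, -79.4968048, "20180604", 500, 100)
print(example_url)
```

With `requests`, passing the same dictionary as `requests.get(base_url, params=params)` achieves the same result without building the string by hand.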

In [27]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e3bad9f29ce6a001b9c48a6'},
 'response': {'headerLocation': 'Humber Bay',
  'headerFullLocation': 'Humber Bay, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 6,
  'suggestedBounds': {'ne': {'lat': 43.6211773045, 'lng': -79.49060068153028},
   'sw': {'lat': 43.6121772955, 'lng': -79.50300891846973}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ba3fa87f964a520d17338e3',
       'name': 'SanRemo Bakery',
       'location': {'address': '374 Royal York Rd',
        'crossStreet': 'at Simpson Ave',
        'lat': 43.618542136521064,
        'lng': -79.49948481145465,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.618542136521064,
          'lng': -79.49948481145465}],
        'distance': 

In [28]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [29]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,SanRemo Bakery,Bakery,43.618542,-79.499485
1,The Blue Goose,Bar,43.616789,-79.49587
2,Mimico Arena,Skating Rink,43.612739,-79.498682
3,Boats And Ho's Diner,Diner,43.616466,-79.497033
4,Audley Street Studios,Performing Arts Venue,43.619189,-79.49439
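
The dotted column names such as `venue.categories` come from the flattening step: `json_normalize` joins nested keys with a dot. The sketch below reproduces this on two made-up items (names and coordinates are invented for the example) and assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# Made-up items shaped like the 'items' list in the Foursquare response.
items = [
    {"venue": {"name": "Sample Bakery",
               "categories": [{"name": "Bakery"}],
               "location": {"lat": 43.61, "lng": -79.49}}},
    {"venue": {"name": "Sample Rink",
               "categories": [],
               "location": {"lat": 43.62, "lng": -79.50}}},
]

flat = pd.json_normalize(items)  # nested dicts become dotted column names
print(sorted(flat.columns))

# Lists are kept as cell values, so the category name still has to be picked out.
flat["category"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)
print(flat[["venue.name", "category"]])
```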


In [30]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

6 venues were returned by Foursquare.


In [31]:
mimico_venues = nearby_venues

## 2. Explore Neighborhoods in Mimico

Repeat the query for the remaining neighborhoods by wrapping it in a reusable function.

In [33]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
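
The function above indexes straight into `["response"]['groups'][0]['items']`, which raises an error if the API returns an error payload or no groups. A more forgiving parser might look like the sketch below. The helper name `venue_rows` and the sample payloads are hypothetical; the key names follow the response shown earlier in this notebook.

```python
def venue_rows(payload, name, lat, lng):
    """Extract one tuple per venue from an explore response (hypothetical helper).

    Returns an empty list when the payload has no groups or items.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Venues without a category get None instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Made-up payloads: one with a single uncategorized venue, one without any groups.
ok_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Park", "categories": [],
               "location": {"lat": 43.62, "lng": -79.48}}},
]}]}}
error_payload = {"meta": {"code": 400}, "response": {}}

print(venue_rows(ok_payload, "Mimico", 43.621073, -79.480357))
print(venue_rows(error_payload, "Mimico", 43.621073, -79.480357))
```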

In [34]:
mimico_venues = getNearbyVenues(names=mimico_data['Neighborhood'],
                                   latitudes=mimico_data['Latitude'],
                                   longitudes=mimico_data['Longitude']
                                  )

Mimico
Mimico


In [35]:
print(mimico_venues.shape)
mimico_venues.head()

(36, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mimico,43.621073,-79.480357,Humber Bay Park,43.622396,-79.478389,Park
1,Mimico,43.621073,-79.480357,Metro,43.622937,-79.482695,Supermarket
2,Mimico,43.621073,-79.480357,Beauty Boutique by Shoppers Drug Mart,43.622796,-79.482037,Cosmetics Shop
3,Mimico,43.621073,-79.480357,Avenue Café + Bistro,43.624208,-79.48435,Café
4,Mimico,43.621073,-79.480357,TD Canada Trust,43.62238,-79.482086,Bank


In [36]:
mimico_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Mimico,36,36,36,36,36,36


In [37]:
# Find the unique categories
print('There are {} unique categories.'.format(len(mimico_venues['Venue Category'].unique())))

There are 17 unique categories.
