### APPLIED DATA SCIENCE CAPSTONE ###
#### BY SUDHARSHAN P.R. ####

_WEEK 3 - SEGMENTING TORONTO NEIGHBORHOODS_

Toronto Neighborhoods - Wikipedia page
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

### PART 1 - WEB SCRAPING WIKIPEDIA PAGE

**-------------------------------------------------------------------------------------------------------------------**

_**The cell below imports all required packages**_

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np

_**Getting the URL loaded to prepare the object for BeautifulSoup**_

In [2]:
url ='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
tnto_data = requests.get(url).text

_**Creating the BeautifulSoup object**_

In [3]:
soup = BeautifulSoup(tnto_data,"html.parser")

In [4]:
tnto_tables = soup.find_all('table')

_**Collecting the relevant data from the BeautifulSoup Object**_

In [5]:
tnto_contents = []

In [6]:
tnto_table = soup.find('table')

In [7]:
for td in tnto_table.find_all('td'):
    cell = {}
    # Skip postal codes that have no borough assigned
    if td.span.text == 'Not assigned':
        continue
    text = td.span.text
    cell['PostalCode'] = td.p.text[:3]
    # The borough is everything before the first '('
    cell['Borough'] = text.split('(')[0]
    # The neighborhoods sit inside the parentheses, separated by ' /'
    cell['Neighborhood'] = text.split('(')[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' ')
    tnto_contents.append(cell)
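
The chained string calls in the loop are easier to follow on a single value. The sketch below applies the same steps to one sample string. The helper name `split_cell` and the sample text are assumptions for illustration; the raw format is inferred from the parsing code, not copied from the Wikipedia page.

```python
def split_cell(text):
    """Split 'Borough(Neighborhood / Neighborhood)' into (borough, neighborhoods)."""
    parts = text.split('(')
    # Everything before the first '(' is the borough
    borough = parts[0]
    # Drop the closing parenthesis and turn ' /' separators into commas
    neighborhoods = parts[1].strip(')').replace(' /', ',').strip(' ')
    return borough, neighborhoods


# Hypothetical raw cell text in the assumed format
sample = 'Etobicoke(Alderwood / Long Branch)'
borough, neighborhoods = split_cell(sample)
print(borough)
print(neighborhoods)
```

Keeping the parsing in a small function like this makes it possible to check the string handling without downloading the page.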

In [8]:
tnto_contents[-10]  # 0-10 evaluates to -10, i.e. the tenth entry from the end

{'PostalCode': 'M8W',
 'Borough': 'Etobicoke',
 'Neighborhood': 'Alderwood, Long Branch'}

_**Converting the List of Dictionaries to a Pandas DataFrame**_

In [9]:
tnto_df = pd.DataFrame(tnto_contents)

In [10]:
tnto_df['Borough']=tnto_df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

In [11]:
tnto_df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


_**Getting the number of rows in the DataFrame**_

In [12]:
tnto_df.shape[0]

103

**-------------------------------------------------------------------------------------------------------------------**

### PART 2 - RETRIEVING LATITUDE AND LONGITUDE OF EVERY TORONTO NEIGHBORHOOD 

**-------------------------------------------------------------------------------------------------------------------**

In [13]:
# Importing the csv file containing the Latitudes and Longitudes
# Raw string so that the backslashes in the Windows path are not read as escape sequences
lat_lng_df = pd.read_csv(r'D:\My Documents\IBM Data Science\Applied DS Capstone\Geospatial_Coordinates.csv')

In [14]:
lat_lng_df.columns = ['PostalCode', 'Latitude', 'Longitude']

In [15]:
lat_lng_df.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
# Merging the Latitudes and Longitudes with the Toronto Neighborhood DataFrame
tnto_df = tnto_df.merge(lat_lng_df, on = 'PostalCode')
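
Because `merge` defaults to an inner join, a postal code that is missing from the coordinates file would silently drop out of the result. A minimal sketch of one way to check for that, using three rows copied from the tables in this notebook; the frame names `neighborhoods`, `coordinates` and `checked` are assumptions for illustration.

```python
import pandas as pd

# Three rows taken from the neighborhood table
neighborhoods = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M1B'],
    'Borough': ['North York', 'North York', 'Scarborough'],
})

# The matching rows from the coordinates file, in a different order
coordinates = pd.DataFrame({
    'PostalCode': ['M1B', 'M3A', 'M4A'],
    'Latitude': [43.806686, 43.753259, 43.725882],
    'Longitude': [-79.194353, -79.329656, -79.315572],
})

# how='left' keeps every neighborhood, validate raises on duplicate postal codes,
# and indicator adds a '_merge' column saying where each row was found
checked = neighborhoods.merge(coordinates, on='PostalCode', how='left',
                              validate='one_to_one', indicator=True)

# Postal codes that have no coordinates
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty list here means every postal code found its coordinates, so the inner join used in the notebook would lose nothing.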

In [17]:
tnto_df.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [18]:
tnto_df['Borough'].unique()

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'East York/East Toronto', 'Central Toronto', 'Mississauga',
       'Downtown Toronto Stn A', 'Etobicoke Northwest',
       'East Toronto Business'], dtype=object)

### PART 3 - EXPLORING TORONTO NEIGHBORHOODS

In [19]:
# Importing Matplotlib, KMeans, Folium, geopy and the JSON helpers
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
from geopy.geocoders import Nominatim
import json
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated

In [20]:
address = 'Toronto, TO'

geolocator = Nominatim(user_agent="tnto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.65238435, -79.38356765.


In [21]:
# Creating a Map of Toronto using Latitudes and Longitudes on Folium
map_tnto = folium.Map(location = [latitude, longitude], zoom_start=10)

In [22]:
# Adding Markers to the Map to locate neighborhoods and boroughs
for lat, lng, borough, neighborhood in zip(tnto_df['Latitude'], tnto_df['Longitude'], tnto_df['Borough'], tnto_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_tnto)
    
map_tnto

**Choosing Etobicoke to Explore Neighborhoods in that Borough**

In [23]:
tnto_et_df = tnto_df[tnto_df['Borough'] == 'Etobicoke'].reset_index(drop=True)

In [24]:
tnto_et_df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
1,M9B,Etobicoke,"West Deane Park, Princess Gardens, Martin Grov...",43.650943,-79.554724
2,M9C,Etobicoke,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",43.643515,-79.577201
3,M9P,Etobicoke,Westmount,43.696319,-79.532242
4,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
5,M8V,Etobicoke,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321
6,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
7,M8W,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484
8,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
9,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


In [25]:
# Getting the geographical co-ordinates of Etobicoke
address = 'Etobicoke, TO'

geolocator = Nominatim(user_agent="tnto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Etobicoke are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Etobicoke are 43.6435559, -79.5656326.


_Creating the Map of Etobicoke_

In [26]:
map_eto=folium.Map(location = [latitude, longitude], zoom_start=10)

In [27]:
for lat, lng, borough, neighborhood in zip(tnto_et_df['Latitude'], tnto_et_df['Longitude'], tnto_et_df['Borough'], tnto_et_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_eto)
    
map_eto

***My Foursquare Credentials***

In [28]:
import os

# Credentials are read from environment variables (the variable names are an assumption)
# so that the secret is never stored or printed in the notebook
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


In [29]:
# Getting the first neighborhood in Etobicoke
tnto_et_df.loc[0,'Neighborhood']

'Islington Avenue'

In [30]:
neighborhood_latitude = tnto_et_df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = tnto_et_df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = tnto_et_df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Islington Avenue are 43.6678556, -79.5322424.


### Getting the Top 100 Venues Around Islington Avenue

In [31]:
# Using a radius of 1000 m, since a 500 m radius returned few results
LIMIT = 100
radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
# The URL embeds the client secret, so it is not displayed here
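
The query string can also be assembled from a dictionary, which avoids counting `{}` placeholders by hand. A minimal sketch using only the standard library; the credentials are placeholders, and the name `explore_url` is an assumption for illustration.

```python
from urllib.parse import urlencode

# Placeholder credentials for illustration only
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20180605',
    'll': '{},{}'.format(43.6678556, -79.5322424),
    'radius': 1000,
    'limit': 100,
}

# safe=',' keeps the comma in the ll value instead of encoding it as %2C
explore_url = 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')
print(explore_url)
```

`requests.get` also accepts such a dictionary through its `params` argument, in which case the encoding step is handled by the library.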

In [32]:
# Getting the results for the Top venues near Islington Avenue
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60fc595c44485170286ce7ce'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Edenbridge - Humber Valley',
  'headerFullLocation': 'Edenbridge - Humber Valley, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 12,
  'suggestedBounds': {'ne': {'lat': 43.676855609000015,
    'lng': -79.51982358836783},
   'sw': {'lat': 43.65885559099999, 'lng': -79.54466121163217}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bfd53764cf820a13849ecf4',
       'name': "Java Joe's Village Cafe",
       'location': {'address': '1500 Islington Ave',
        'crossStreet': 'at Rathburn Rd',
        'lat': 43.662460906352436,
        'lng': -7

In [33]:
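def get_category_type(row):
    # The column is called 'categories' or 'venue.categories' depending on the frame
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Return the name of the first category, or None when the venue has none
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']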

In [34]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Java Joe's Village Cafe,Café,43.662461,-79.532054
1,St Georges Golf and Country Club,Golf Course,43.674395,-79.537142
2,TD Canada Trust,Bank,43.662545,-79.531749
3,Shoppers Drug Mart,Pharmacy,43.663067,-79.531753
4,COBS Bread,Bakery,43.66494,-79.520485
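
To see where the dotted column names such as `venue.location.lat` come from, the sketch below flattens a single abbreviated item. The item is a hand-written sample in the shape the code above expects (values taken from the first table row), not a full API response.

```python
import pandas as pd

# One abbreviated item in the shape of the Foursquare response
items = [{
    'venue': {
        'name': "Java Joe's Village Cafe",
        'categories': [{'name': 'Café'}],
        'location': {'lat': 43.662461, 'lng': -79.532054},
    }
}]

# Nested dictionaries become columns whose names are joined with dots
flat = pd.json_normalize(items)
print(list(flat.columns))

# Lists are left as they are, so the category name still has to be pulled out
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)
print(flat.loc[0, 'venue.categories'])
```

This is why the notebook needs the separate `get_category_type` step: flattening alone leaves the category as a list of dictionaries.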


Repeating the above for all neighborhoods in Etobicoke

In [35]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
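            # Some venues may have no category, so guard against an empty list
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])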

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [36]:
tnto_eto_venues = getNearbyVenues(names = tnto_et_df['Neighborhood'],
                                 latitudes = tnto_et_df['Latitude'],
                                 longitudes = tnto_et_df['Longitude'])

Islington Avenue
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Westmount
Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens
New Toronto, Mimico South, Humber Bay Shores
South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens
Alderwood, Long Branch
The Kingsway, Montgomery Road, Old Mill North
Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South East
Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West


In [37]:
tnto_eto_venues.shape

(70, 7)

In [38]:
tnto_eto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"West Deane Park, Princess Gardens, Martin Grov...",43.650943,-79.554724,Shawarma Club,43.64891,-79.549611,Middle Eastern Restaurant
1,"West Deane Park, Princess Gardens, Martin Grov...",43.650943,-79.554724,Marius Bakery,43.648965,-79.549381,Bakery
2,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",43.643515,-79.577201,LCBO,43.642099,-79.576592,Liquor Store
3,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",43.643515,-79.577201,Starbucks,43.641312,-79.576924,Coffee Shop
4,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",43.643515,-79.577201,The Beer Store,43.641313,-79.576925,Beer Store


In [39]:
tnto_eto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Alderwood, Long Branch",8,8,8,8,8,8
"Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood",8,8,8,8,8,8
"Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens",4,4,4,4,4,4
"Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West",13,13,13,13,13,13
"New Toronto, Mimico South, Humber Bay Shores",11,11,11,11,11,11
"Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South East",2,2,2,2,2,2
"South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens",10,10,10,10,10,10
"The Kingsway, Montgomery Road, Old Mill North",3,3,3,3,3,3
"West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale",2,2,2,2,2,2
Westmount,9,9,9,9,9,9
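
The count table lists ten neighborhoods although eleven were queried: Islington Avenue is absent because its search returned no venues. A minimal sketch of how such gaps could be detected; the small frames are abbreviated samples, and the names `queried` and `no_venues` are assumptions for illustration.

```python
import pandas as pd

# Neighborhood names that were queried (a subset, for illustration)
queried = pd.Series(['Islington Avenue', 'Westmount', 'Alderwood, Long Branch'])

# Venue rows returned for those queries (abbreviated sample)
venues = pd.DataFrame({
    'Neighborhood': ['Westmount', 'Westmount', 'Alderwood, Long Branch'],
    'Venue Category': ['Pizza Place', 'Playground', 'Pub'],
})

# Neighborhoods that were queried but contributed no venue rows
no_venues = queried[~queried.isin(venues['Neighborhood'])].tolist()
print(no_venues)
```

Neighborhoods found this way have no frequency profile, so they cannot receive a cluster label later on.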


#### Analyzing Each Neighborhood

In [40]:
# One-hot encoding
tnto_eto_onehot = pd.get_dummies(tnto_eto_venues[['Venue Category']], prefix="", prefix_sep="")

tnto_eto_onehot['Neighborhood'] = tnto_eto_venues['Neighborhood']

fixed_columns = [tnto_eto_onehot.columns[-1]] + list(tnto_eto_onehot.columns[:-1])
tnto_eto_onehot = tnto_eto_onehot[fixed_columns]

tnto_eto_onehot.head()

Unnamed: 0,Neighborhood,Bakery,Baseball Field,Beer Store,Burger Joint,Bus Line,Business Service,Café,Chinese Restaurant,Coffee Shop,Convenience Store,Discount Store,Fast Food Restaurant,Flower Shop,Fried Chicken Joint,Grocery Store,Gym,Hardware Store,Intersection,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Pub,Restaurant,River,Sandwich Place,Supplement Shop,Tanning Salon,Video Store,Wings Joint
0,"West Deane Park, Princess Gardens, Martin Grov...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"West Deane Park, Princess Gardens, Martin Grov...",1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [41]:
tnto_eto_onehot.shape

(70, 37)

In [42]:
tnto_eto_grouped = tnto_eto_onehot.groupby('Neighborhood').mean().reset_index()

In [43]:
tnto_eto_grouped

Unnamed: 0,Neighborhood,Bakery,Baseball Field,Beer Store,Burger Joint,Bus Line,Business Service,Café,Chinese Restaurant,Coffee Shop,Convenience Store,Discount Store,Fast Food Restaurant,Flower Shop,Fried Chicken Joint,Grocery Store,Gym,Hardware Store,Intersection,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Pub,Restaurant,River,Sandwich Place,Supplement Shop,Tanning Salon,Video Store,Wings Joint
0,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.25,0.0,0.125,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0
1,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.125,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Kingsview Village, St. Phillips, Martin Grove ...",0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
3,"Mimico NW, The Queensway West, South of Bloor,...",0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923,0.076923,0.076923,0.0,0.076923,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923,0.076923,0.0,0.076923
4,"New Toronto, Mimico South, Humber Bay Shores",0.090909,0.0,0.0,0.0,0.0,0.090909,0.181818,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0
5,"Old Mill South, King's Mill Park, Sunnylea, Hu...",0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"South Steeles, Silverstone, Humbergate, Jamest...",0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.1,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0
7,"The Kingsway, Montgomery Road, Old Mill North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0
8,"West Deane Park, Princess Gardens, Martin Grov...",0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Westmount,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.222222,0.111111,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0


In [44]:
tnto_eto_grouped.shape

(10, 37)
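
Each value in the grouped table is the share of a neighborhood's venues that fall in a category. The sketch below reproduces two rows by hand: four venues in four different categories give 0.25 each, and two venues in two categories give 0.5 each. The neighborhood names are shortened for readability, and the frame names are assumptions for illustration.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Kingsview Village'] * 4 + ['West Deane Park'] * 2,
    'Venue Category': ['Bus Line', 'Mobile Phone Shop', 'Park', 'Sandwich Place',
                       'Middle Eastern Restaurant', 'Bakery'],
})

# One indicator column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of the indicators is the fraction of venues in each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Because the values are fractions, a neighborhood with only two venues gets very large frequencies, which is worth keeping in mind when reading the clusters.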

In [45]:
num_top_venues = 5

for hood in tnto_eto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = tnto_eto_grouped[tnto_eto_grouped['Neighborhood']==hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq']=temp['freq'].astype(float)
    temp = temp.round({'freq':2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alderwood, Long Branch----
            venue  freq
0     Pizza Place  0.25
1             Pub  0.12
2             Gym  0.12
3  Sandwich Place  0.12
4        Pharmacy  0.12


----Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood----
          venue  freq
0  Liquor Store  0.12
1    Beer Store  0.12
2          Park  0.12
3     Pet Store  0.12
4      Pharmacy  0.12


----Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens----
               venue  freq
0  Mobile Phone Shop  0.25
1               Park  0.25
2           Bus Line  0.25
3     Sandwich Place  0.25
4             Bakery  0.00


----Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West----
             venue  freq
0           Bakery  0.08
1   Discount Store  0.08
2    Tanning Salon  0.08
3  Supplement Shop  0.08
4   Sandwich Place  0.08


----New Toronto, Mimico South, Humber Bay Shores----
                  venue  freq
0                  Café  0.18
1      

#### Converting the above to a DataFrame

In [46]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [47]:
# DataFrame for the Top 10 venues in each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # Positions past the 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
        
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = tnto_eto_grouped['Neighborhood']

for ind in np.arange(tnto_eto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tnto_eto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
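
The column labels rely on a list lookup that falls back to 'th', which is enough for ten columns. A small helper could make the rule explicit and also cover larger numbers; the function name `ordinal` is an assumption for illustration, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11th, 12th and 13th are exceptions to the last-digit rule
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```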

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Alderwood, Long Branch",Pizza Place,Pub,Gym,Sandwich Place,Pharmacy,Coffee Shop,Pool,Playground,Mobile Phone Shop,Park
1,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",Liquor Store,Beer Store,Park,Pet Store,Pharmacy,Café,Pizza Place,Coffee Shop,Pub,Playground
2,"Kingsview Village, St. Phillips, Martin Grove ...",Mobile Phone Shop,Park,Bus Line,Sandwich Place,Bakery,Pool,Pet Store,Pharmacy,Pizza Place,Playground
3,"Mimico NW, The Queensway West, South of Bloor,...",Bakery,Discount Store,Tanning Salon,Supplement Shop,Sandwich Place,Hardware Store,Gym,Grocery Store,Flower Shop,Fast Food Restaurant
4,"New Toronto, Mimico South, Humber Bay Shores",Café,Bakery,Fast Food Restaurant,Restaurant,Pizza Place,Pharmacy,Mexican Restaurant,Gym,Liquor Store,Business Service
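
Many of the listed venues are ties, and some have a frequency of zero, so the lower ranks carry little information. The sketch below ranks one neighborhood's categories with `Series.nlargest`, which is a different call from the `sort_values` approach used above, and then drops the zero-frequency entries. The frequencies are copied from the Alderwood, Long Branch row; the variable names are assumptions for illustration.

```python
import pandas as pd

# Category frequencies for Alderwood, Long Branch (plus one absent category)
freq = pd.Series({
    'Coffee Shop': 0.125,
    'Gym': 0.125,
    'Pharmacy': 0.125,
    'Pizza Place': 0.25,
    'Pool': 0.125,
    'Pub': 0.125,
    'Sandwich Place': 0.125,
    'Bakery': 0.0,
})

# The three most frequent categories
top3 = freq.nlargest(3)
print(top3)

# Categories that actually occur in the neighborhood
present = freq[freq > 0].index.tolist()
print(len(present))
```

Only the first place is unambiguous here; the order among the tied categories depends on how the ranking breaks ties.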


#### Running Clustering on Neighborhoods

In [48]:
k = 5

tnto_eto_grouped_clustering = tnto_eto_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters = k, random_state = 0).fit(tnto_eto_grouped_clustering)

kmeans.labels_[0:10]

array([0, 0, 3, 0, 0, 1, 0, 2, 4, 0])
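
A quick way to judge the clustering is to count how many neighborhoods share each label. The sketch below does that for the label array printed above; the variable names are assumptions for illustration.

```python
import numpy as np

# Labels copied from the KMeans output
labels = np.array([0, 0, 3, 0, 0, 1, 0, 2, 4, 0])

# bincount returns how many neighborhoods carry each label 0..4
sizes = np.bincount(labels, minlength=5)
print(sizes.tolist())
```

One cluster holds six of the ten neighborhoods and the other four hold one each, which suggests that k = 5 is high for a data set this small.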

In [49]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

tnto_eto_merged = tnto_et_df

tnto_eto_merged = tnto_eto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
tnto_eto_merged = tnto_eto_merged[tnto_eto_merged['Cluster Labels'].notna()]
tnto_eto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M9B,Etobicoke,"West Deane Park, Princess Gardens, Martin Grov...",43.650943,-79.554724,4.0,Bakery,Middle Eastern Restaurant,Baseball Field,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool
2,M9C,Etobicoke,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",43.643515,-79.577201,0.0,Liquor Store,Beer Store,Park,Pet Store,Pharmacy,Café,Pizza Place,Coffee Shop,Pub,Playground
3,M9P,Etobicoke,Westmount,43.696319,-79.532242,0.0,Pizza Place,Discount Store,Intersection,Sandwich Place,Chinese Restaurant,Coffee Shop,Middle Eastern Restaurant,Playground,Pool,Park
4,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724,3.0,Mobile Phone Shop,Park,Bus Line,Sandwich Place,Bakery,Pool,Pet Store,Pharmacy,Pizza Place,Playground
5,M8V,Etobicoke,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,0.0,Café,Bakery,Fast Food Restaurant,Restaurant,Pizza Place,Pharmacy,Mexican Restaurant,Gym,Liquor Store,Business Service


***Visualizing Clustering Results***

In [51]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(tnto_eto_merged['Latitude'], tnto_eto_merged['Longitude'], tnto_eto_merged['Neighborhood'], tnto_eto_merged['Cluster Labels']):
    cluster = int(cluster)  # the labels became floats after the join
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Examining Each Cluster

**Cluster 1**

In [52]:
tnto_eto_merged.loc[tnto_eto_merged['Cluster Labels'] == 0, tnto_eto_merged.columns[[1] + list(range(5, tnto_eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Etobicoke,0.0,Liquor Store,Beer Store,Park,Pet Store,Pharmacy,Café,Pizza Place,Coffee Shop,Pub,Playground
3,Etobicoke,0.0,Pizza Place,Discount Store,Intersection,Sandwich Place,Chinese Restaurant,Coffee Shop,Middle Eastern Restaurant,Playground,Pool,Park
5,Etobicoke,0.0,Café,Bakery,Fast Food Restaurant,Restaurant,Pizza Place,Pharmacy,Mexican Restaurant,Gym,Liquor Store,Business Service
6,Etobicoke,0.0,Grocery Store,Fried Chicken Joint,Beer Store,Video Store,Pharmacy,Sandwich Place,Pizza Place,Coffee Shop,Fast Food Restaurant,Bakery
7,Etobicoke,0.0,Pizza Place,Pub,Gym,Sandwich Place,Pharmacy,Coffee Shop,Pool,Playground,Mobile Phone Shop,Park
10,Etobicoke,0.0,Bakery,Discount Store,Tanning Salon,Supplement Shop,Sandwich Place,Hardware Store,Gym,Grocery Store,Flower Shop,Fast Food Restaurant


**Cluster 2**

In [53]:
tnto_eto_merged.loc[tnto_eto_merged['Cluster Labels'] == 1, tnto_eto_merged.columns[[1] + list(range(5, tnto_eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Etobicoke,1.0,Pool,Baseball Field,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Playground,Bakery


**Cluster 3**

In [54]:
tnto_eto_merged.loc[tnto_eto_merged['Cluster Labels'] == 2, tnto_eto_merged.columns[[1] + list(range(5, tnto_eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Etobicoke,2.0,Pool,Park,River,Playground,Middle Eastern Restaurant,Mobile Phone Shop,Pet Store,Pharmacy,Pizza Place,Bakery


**Cluster 4**

In [55]:
tnto_eto_merged.loc[tnto_eto_merged['Cluster Labels'] == 3, tnto_eto_merged.columns[[1] + list(range(5, tnto_eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Etobicoke,3.0,Mobile Phone Shop,Park,Bus Line,Sandwich Place,Bakery,Pool,Pet Store,Pharmacy,Pizza Place,Playground


**Cluster 5**

In [56]:
tnto_eto_merged.loc[tnto_eto_merged['Cluster Labels'] == 4, tnto_eto_merged.columns[[1] + list(range(5, tnto_eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Etobicoke,4.0,Bakery,Middle Eastern Restaurant,Baseball Field,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool


### As the results show, Etobicoke is not a very busy borough of Toronto: the 500 m search around each of its neighborhoods returned only 70 venues in total