# This is an Independent Project

This project explores the venues that people most often visit next after coffee shops in different Toronto neighborhoods.

### Objective
Identifying popular next destinations and clustering neighborhoods by them can help characterize each neighborhood in Toronto.
For example, in some neighborhoods people mostly go shopping after having coffee, while in others they may head to a train station instead. This information can be useful when recommending ads specific to a user's location.

### Data
To solve the problem, location data has to be sourced. The Foursquare API can be used to identify popular next destinations from a given venue. With it, we can get a list of next venues for every coffee shop in each Toronto neighborhood. The API call has to be made for each coffee shop while keeping track of the neighborhood in which that coffee shop is located.
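As a rough sketch of the request described above, the snippet below builds the URL for the Foursquare `nextvenues` endpoint for a single venue. The helper name `build_next_venues_url` and the placeholder values `VENUE_ID`, `YOUR_ID`, and `YOUR_SECRET` are assumptions for illustration; no request is sent.

```python
from urllib.parse import urlencode


def build_next_venues_url(venue_id, client_id, client_secret, version="20180605"):
    """Build the URL that asks which venues people visit after `venue_id`."""
    base = "https://api.foursquare.com/v2/venues/{}/nextvenues".format(venue_id)
    query = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
    })
    return base + "?" + query


# Placeholder values only; substitute a real venue ID and real credentials.
url = build_next_venues_url("VENUE_ID", "YOUR_ID", "YOUR_SECRET")
print(url)
```

Using `urlencode` rather than string formatting keeps the query string correctly escaped if a parameter ever contains special characters.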

### Methodology
First, the neighborhoods of Toronto are obtained via web scraping and matched to coordinates from a geospatial CSV file. Coffee shops are then searched for in each neighborhood and the results are saved. Next, the top 3 popular next venues are identified for every coffee shop in each neighborhood. Clustering is applied to group neighborhoods that have similar results, and the clustered results are displayed on a map.


### Results and Discussions
Clustering identified five groups. In one group, the shopping mall was the most popular next destination. In another, the local park or farm was the first choice. The train station was also a popular destination in some neighborhoods. Based on these results, we can identify which neighborhoods are downtown (e.g., where the popular next venue is a shopping mall), closer to residential areas (e.g., where the popular next venues are parks or farms), or near a transportation hub. If a shopping mall wants to advertise upcoming sales more effectively, we now have a better understanding of which neighborhoods to target: it should focus its advertising on neighborhoods where people are more likely to go shopping after having coffee. Likewise, if a farm market wants to advertise its events, it should do so in the neighborhoods where people are more likely to visit the farm market.


### Conclusion
Toronto neighborhoods were analyzed by looking at popular next venues from coffee shops in the city. Patterns were discovered via clustering, and certain traits were identified for each neighborhood. These traits make it easier to understand what people do in each neighborhood.

In [1]:
import pandas as pd
import numpy as np

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [2]:
#!conda install -c anaconda beautifulsoup4

In [3]:
from bs4 import BeautifulSoup
import requests

## 1. Load Toronto Information

This section scrapes the postal codes, boroughs, and neighbourhoods of Toronto from Wikipedia, then attaches coordinates loaded from a CSV file.

In [4]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source,'lxml')
#print(soup.prettify())

In [5]:
my_table = soup.find(class_='wikitable sortable')
#my_table

In [6]:
headers = my_table.find_all('th')
header_list = []
for header in headers:
    header_list.append(header.text.strip('\n'))
header_list

['Postcode', 'Borough', 'Neighbourhood']

In [7]:
entries = my_table.find_all('td')
entries

entry_list = []
for entry in entries:
    entry_list.append(entry.text.strip('\n'))    
#entry_list

In [8]:
df = pd.DataFrame(data=np.reshape(entry_list,(-1,3)), columns=header_list)
df.head(20)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned
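The `np.reshape(entry_list, (-1, 3))` call above relies on every table row having exactly three cells. A minimal pure-Python sketch of the same grouping, using a few cells copied from the output above, makes that assumption explicit. The helper `chunk_rows` is hypothetical and not part of the notebook.

```python
def chunk_rows(cells, width=3):
    """Group a flat list of table cells into rows of `width` cells each."""
    if len(cells) % width != 0:
        # A ragged table would silently misalign columns, so fail loudly instead.
        raise ValueError("cell count is not a multiple of the row width")
    return [cells[i:i + width] for i in range(0, len(cells), width)]


cells = [
    "M1A", "Not assigned", "Not assigned",
    "M3A", "North York", "Parkwoods",
    "M4A", "North York", "Victoria Village",
]
rows = chunk_rows(cells)
print(rows[1])
```

If the source table ever gains or loses a column, this check fails immediately rather than producing shifted rows.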


In [9]:
# drop rows that do not have an assigned borough
df.drop(index=df[df['Borough'] == 'Not assigned'].index, inplace=True)
df.reset_index(drop=True, inplace=True)

# if a neighbourhood is not assigned, use the borough name instead
not_assigned = df['Neighbourhood'] == 'Not assigned'
df.loc[not_assigned, 'Neighbourhood'] = df.loc[not_assigned, 'Borough']
df.head(20)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


In [10]:
df_new = pd.DataFrame(columns=header_list)
count=0
for postcode in df['Postcode'].unique():
    df_temp = df[df['Postcode']==postcode]
    borough = df_temp['Borough'].iloc[0]
    neighbourhood = ', '.join(df_temp['Neighbourhood'])
    #print(postcode,borough,neighbourhood )
    df_new.loc[count]=[postcode,borough,neighbourhood]
    count += 1
df_new.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"
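The loop above combines neighbourhoods that share a postcode one row at a time. As an alternative sketch (not the notebook's own approach), the same result can be obtained with a single `groupby`. The small `df` below is a stand-in built from rows shown earlier.

```python
import pandas as pd

# Stand-in data copied from the table above.
df = pd.DataFrame({
    "Postcode": ["M3A", "M5A", "M5A", "M6A", "M6A"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto",
                "North York", "North York"],
    "Neighbourhood": ["Parkwoods", "Harbourfront", "Regent Park",
                      "Lawrence Heights", "Lawrence Manor"],
})

# sort=False keeps postcodes in their order of first appearance, like the loop does.
df_new = (
    df.groupby("Postcode", sort=False)
      .agg({"Borough": "first", "Neighbourhood": ", ".join})
      .reset_index()
)
print(df_new)
```

This avoids growing a dataframe row by row, which tends to be slow for larger tables.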


In [11]:
df_new.shape

(103, 3)

In [12]:
# download csv file from http://cocl.us/Geospatial_data
# this file contains coordinates for each postal code
df_coordinates = pd.read_csv('Geospatial_Coordinates.csv')
df_coordinates.rename(columns={'Postal Code':'Postcode'}, inplace=True)
df_coordinates.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the neighbourhood information with the coordinates.


In [13]:
df_toronto_all = df_new.join(df_coordinates.set_index('Postcode'),on='Postcode',how='left')
df_toronto_all.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937


List all the boroughs in Toronto

In [14]:
df_toronto_all.Borough.unique()

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [15]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium

In [16]:
# create map of Toronto using latitude and longitude values
latitude = 43.6532
longitude = -79.3832
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto_all['Latitude'], df_toronto_all['Longitude'], df_toronto_all['Borough'], df_toronto_all['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

#### Let's limit our data to the Toronto boroughs: 'Downtown Toronto', 'East Toronto', 'West Toronto', 'Central Toronto'

In [17]:
df_toronto=df_toronto_all[df_toronto_all['Borough'].str.contains('Toronto')].reset_index(drop=True)
df_toronto

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
1,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568
8,M6H,West Toronto,"Dovercourt Village, Dufferin",43.669005,-79.442259
9,M5J,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",43.640816,-79.381752


## 2. Run Venue Search on Each Neighbourhood via Foursquare API

#### Define Foursquare Credentials and Version

In [18]:
CLIENT_ID = 'FUUIXDDOWXKV1CRKHNVNXQQEVB2QGPM4E01AUOVMOF2YJU1P' # your Foursquare ID
CLIENT_SECRET = 'K1DR1ZVHI52F3YRSCP0BXDWAODLKSAWV3W2CVDNELKWQUBCV' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: FUUIXDDOWXKV1CRKHNVNXQQEVB2QGPM4E01AUOVMOF2YJU1P
CLIENT_SECRET:K1DR1ZVHI52F3YRSCP0BXDWAODLKSAWV3W2CVDNELKWQUBCV
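Hard-coding and printing API secrets makes them easy to leak when a notebook is shared. One possible alternative, sketched below, reads them from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions for illustration.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read credentials from a mapping (the process environment by default)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep only the first few characters of a secret for display."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demonstration with a stand-in mapping instead of real credentials.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-1234",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-5678",
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_SECRET: " + mask(demo_secret))
```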


Get the neighborhood's latitude and longitude values.

In [19]:
neighbourhood_latitude = df_toronto.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = df_toronto.loc[0, 'Longitude'] # neighborhood longitude value
neighbourhood_name = df_toronto.loc[0, 'Neighbourhood'] # neighborhood name
print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Harbourfront, Regent Park are 43.6542599, -79.3606359.


#### Now, let's get the top 100 venues that are in Harbourfront within a radius of 500 meters.

In [20]:
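# build the explore request for the first neighbourhood
LIMIT = 100  # maximum number of venues returned by Foursquare
radius = 500  # search radius in meters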
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT)
url



'https://api.foursquare.com/v2/venues/explore?&client_id=FUUIXDDOWXKV1CRKHNVNXQQEVB2QGPM4E01AUOVMOF2YJU1P&client_secret=K1DR1ZVHI52F3YRSCP0BXDWAODLKSAWV3W2CVDNELKWQUBCV&v=20180605&ll=43.6542599,-79.3606359&radius=500&limit=100'

Send the GET request and examine the results

In [21]:
results = requests.get(url).json()

In [22]:
# function that extracts the category of the venue
def get_category_type(row):
    # the key is 'categories' on raw venue records and 'venue.categories' after flattening
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [23]:
import json
from pandas import json_normalize  # transform JSON into a pandas dataframe (pandas >= 1.0)


venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Toronto Cooper Koo Family Cherry St YMCA Centre,Gym / Fitness Center,43.653191,-79.357947
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Morning Glory Cafe,Breakfast Spot,43.653947,-79.361149


In [24]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

46 venues were returned by Foursquare.
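To sanity-check that returned venues really lie within the requested 500 m radius, the great-circle distance between the neighbourhood centre and a venue can be computed with the haversine formula. This is a minimal sketch; the function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions, and the coordinates are those of the neighbourhood and of Tandem Coffee from the output above.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius (assumed constant)
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


# Neighbourhood centre (Harbourfront, Regent Park) to Tandem Coffee.
distance = haversine_m(43.6542599, -79.3606359, 43.653559, -79.361809)
print("{:.0f} m, within radius: {}".format(distance, distance <= 500))
```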


#### Let's create a function to repeat the same process for all the neighborhoods in Toronto

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&day=any'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
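The function above indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue with an empty category list. A defensive variant of the row-building step might look like the sketch below. The helper `venue_row` and the second, uncategorized item are hypothetical, and the items are hand-written to mimic only the fields this notebook reads.

```python
def venue_row(name, lat, lng, item):
    """Build one output row, tolerating venues without any category."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (
        name,
        lat,
        lng,
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )


# Hand-written stand-ins for API items; the second one is invented for the demo.
items = [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 43.65, "lng": -79.36},
               "categories": []}},
]
rows = [venue_row("Harbourfront, Regent Park", 43.65426, -79.360636, it)
        for it in items]
print(rows[1])
```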

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [26]:
toronto_venues = getNearbyVenues(names=df_toronto['Neighbourhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )


Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
The Danforth West, Riverdale
Design Exchange, Toronto Dominion Centre
Brockton, Exhibition Place, Parkdale Village
The Beaches West, India Bazaar
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North, Forest Hill West
High Park, The Junction South
North Toronto West
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
Harbord, University of Toronto
Runnymede, Swansea
Moore Park, Summerhill East
Chinatown, Grange Park, Kensington Market
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown, St. James Town
Fir

#### Let's check the size of the resulting dataframe

In [27]:
print(toronto_venues.shape)
toronto_venues.head()

(1699, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Harbourfront, Regent Park",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Harbourfront, Regent Park",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Harbourfront, Regent Park",43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,"Harbourfront, Regent Park",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Harbourfront, Regent Park",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


Let's check how many venues were returned for each neighborhood

In [28]:
toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,55,55,55,55,55,55
"Brockton, Exhibition Place, Parkdale Village",22,22,22,22,22,22
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14
"Cabbagetown, St. James Town",46,46,46,46,46,46
Central Bay Street,83,83,83,83,83,83
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,16,16,16,16,16,16
Church and Wellesley,86,86,86,86,86,86


#### Let's find out how many unique categories can be curated from all the returned venues


In [29]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 237 unique categories.


## 3. Analyze Each Neighborhood

In [30]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [31]:
toronto_onehot.shape

(1699, 238)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [32]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.0,0.012048
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.05,0.0,0.04,0.01,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.0,0.011628,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011628,0.011628,0.0,0.011628,0.0,0.023256


In [33]:
toronto_grouped.shape

(38, 238)
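To make the one-hot-then-mean step concrete, here is a small sketch on invented data. The neighbourhood names `A` and `B` and their venues are hypothetical. Each row of the grouped result is the share of each category among that neighbourhood's venues, so every row sums to 1.

```python
import pandas as pd

# Invented toy data: four venues in neighbourhood A, two in B.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Bakery",
                       "Park", "Park"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Averaging the indicators per neighbourhood gives category frequencies.
grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(grouped)
```

These frequency vectors are what the clustering step later compares between neighbourhoods.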

#### Let's print each neighborhood along with the top 5 most common venues

In [34]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
             venue  freq
0      Coffee Shop  0.06
1       Steakhouse  0.04
2              Bar  0.04
3             Café  0.04
4  Thai Restaurant  0.04


----Berczy Park----
                venue  freq
0         Coffee Shop  0.07
1        Cocktail Bar  0.05
2      Farmers Market  0.04
3  Seafood Restaurant  0.04
4         Cheese Shop  0.04


----Brockton, Exhibition Place, Parkdale Village----
            venue  freq
0  Breakfast Spot  0.09
1            Café  0.09
2     Coffee Shop  0.09
3     Yoga Studio  0.05
4             Bar  0.05


----Business Reply Mail Processing Centre 969 Eastern----
              venue  freq
0       Yoga Studio  0.06
1     Auto Workshop  0.06
2              Park  0.06
3        Comic Shop  0.06
4  Recording Studio  0.06


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0    Airport Lounge  0.14
1   Airport Service  0.14
2  Airport Termin

#### Let's put the most common venues of each neighborhood into a dataframe

First, let's write a function to sort the venues in descending order.

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Thai Restaurant,Bar,Steakhouse,Burger Joint,Gym,Asian Restaurant,Hotel,Bakery
1,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Bakery,Restaurant,Steakhouse,Pub,Farmers Market,Café,Cheese Shop
2,"Brockton, Exhibition Place, Parkdale Village",Breakfast Spot,Café,Coffee Shop,Yoga Studio,Bar,Burrito Place,Restaurant,Caribbean Restaurant,Climbing Gym,Pet Store
3,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Auto Workshop,Garden Center,Garden,Light Rail Station,Fast Food Restaurant,Farmers Market,Comic Shop,Park,Recording Studio
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Terminal,Airport Service,Harbor / Marina,Plane,Sculpture Garden,Boutique,Boat or Ferry,Airport Gate,Airport


In [37]:
neighbourhoods_venues_sorted.shape

(38, 11)
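The column-naming loop above is correct for the first ten ranks but would label rank 21 as "21th" if `num_top_venues` were ever raised that far. A small helper, sketched here under the hypothetical name `ordinal`, handles the general case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 10
columns = ["Neighbourhood"] + [
    "{} Most Common Venue".format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```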

## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [38]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5
toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [39]:
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4,
       0, 1, 0, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0])
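Before interpreting the clusters, it can help to tally how many neighbourhoods fall into each one. The sketch below counts the labels printed above, copied here as a plain list so the snippet runs on its own.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4,
          0, 1, 0, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0]

sizes = Counter(labels)
for cluster, count in sorted(sizes.items()):
    print("cluster {}: {} neighbourhoods".format(cluster, count))
```

A tally this uneven suggests that most neighbourhoods look alike under these features, which is worth keeping in mind when reading the per-cluster tables.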

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [40]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_toronto

# join df_toronto with neighbourhoods_venues_sorted to add the cluster label and top venues for each neighbourhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,0,Coffee Shop,Café,Bakery,Pub,Park,Breakfast Spot,Restaurant,Mexican Restaurant,Theater,Bank
1,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Middle Eastern Restaurant,Restaurant,Tea Room,Japanese Restaurant,Italian Restaurant,Bubble Tea Shop
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Restaurant,Café,Hotel,Breakfast Spot,Clothing Store,Gastropub,Cosmetics Shop,Bakery,Italian Restaurant
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Coffee Shop,Health Food Store,Pub,Neighborhood,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Cocktail Bar,Seafood Restaurant,Bakery,Restaurant,Steakhouse,Pub,Farmers Market,Café,Cheese Shop


Finally, let's visualize the resulting clusters

In [41]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster.

#### Cluster 1

In [42]:
toronto_merged.shape

(38, 16)

In [43]:
toronto_merged.columns[[1,2,3,4]]

Index(['Borough', 'Neighbourhood', 'Latitude', 'Longitude'], dtype='object')

In [44]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Café,Bakery,Pub,Park,Breakfast Spot,Restaurant,Mexican Restaurant,Theater,Bank
1,Downtown Toronto,0,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Middle Eastern Restaurant,Restaurant,Tea Room,Japanese Restaurant,Italian Restaurant,Bubble Tea Shop
2,Downtown Toronto,0,Coffee Shop,Restaurant,Café,Hotel,Breakfast Spot,Clothing Store,Gastropub,Cosmetics Shop,Bakery,Italian Restaurant
3,East Toronto,0,Coffee Shop,Health Food Store,Pub,Neighborhood,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
4,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Seafood Restaurant,Bakery,Restaurant,Steakhouse,Pub,Farmers Market,Café,Cheese Shop
5,Downtown Toronto,0,Coffee Shop,Sandwich Place,Italian Restaurant,Bar,Bubble Tea Shop,Café,Burger Joint,Spa,Japanese Restaurant,Chinese Restaurant
6,Downtown Toronto,0,Grocery Store,Café,Park,Convenience Store,Nightclub,Baby Store,Diner,Restaurant,Athletics & Sports,Italian Restaurant
7,Downtown Toronto,0,Coffee Shop,Café,Thai Restaurant,Bar,Steakhouse,Burger Joint,Gym,Asian Restaurant,Hotel,Bakery
8,West Toronto,0,Supermarket,Pharmacy,Discount Store,Bakery,Liquor Store,Gym / Fitness Center,Pool,Music Venue,Middle Eastern Restaurant,Café
9,Downtown Toronto,0,Coffee Shop,Hotel,Aquarium,Café,Italian Restaurant,Fried Chicken Joint,Pizza Place,Scenic Lookout,Brewery,Restaurant


#### Cluster 2

In [45]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,Central Toronto,1,Playground,Gym,Diner,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


#### Cluster 3

In [46]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,2,Garden,Yoga Studio,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


#### Cluster 4

In [47]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Central Toronto,3,Mexican Restaurant,Trail,Sushi Restaurant,Jewelry Store,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


#### Cluster 5

In [48]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Central Toronto,4,Gym / Fitness Center,Park,Swim School,Bus Line,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
32,Downtown Toronto,4,Park,Trail,Playground,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Yoga Studio


## 6. Get Nearby Coffee Shops

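Define a function that calls the Foursquare `explore` endpoint and returns the venues of a given section (for example `coffee`) around each neighbourhood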

In [49]:
def getNearbyCategory(names, latitudes, longitudes, section, radius=500):
    #section = 'coffee'
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&section={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            section,
            LIMIT)

        # make the GET request and keep only the list of recommended venues
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue ID',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
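
The URL above is assembled by string formatting, which leaves the values unescaped. As a minimal sketch, the same query string could be built with the standard library's `urllib.parse.urlencode`. The helper name `build_explore_url`, the placeholder credentials, the version string and the default `limit` are assumptions for illustration; the endpoint and parameter names are the ones used in `getNearbyCategory`.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng,
                      section, radius=500, limit=100):
    """Return the explore URL for one neighbourhood (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),   # latitude,longitude of the neighbourhood
        'radius': radius,                 # search radius in metres
        'section': section,               # e.g. 'coffee' or 'food'
        'limit': limit,
    }
    # urlencode escapes every value, so the comma in 'll' becomes %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# placeholder credentials and version, not real values
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605',
                                43.657162, -79.378937, section='coffee')
print(example_url)
```

Building the parameters as a dictionary also makes it easy to see which values change per neighbourhood (`ll`) and which stay fixed for the whole run.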

Search for coffee shops in each neighborhood in Toronto

In [50]:
toronto_coffee_venues = getNearbyCategory(names=df_toronto['Neighbourhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude'],
                                   section='coffee'
                                  )
toronto_coffee_venues.shape

(773, 8)

Let's also get nearby food venues just for fun

In [51]:
toronto_food_venues = getNearbyCategory(names=df_toronto['Neighbourhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude'],
                                   section='food'
                                  )
toronto_food_venues.shape

(1358, 8)

Now let's put the coffee venues on the map

In [52]:
# create map of Toronto using latitude and longitude values
latitude = 43.6532
longitude = -79.3832
map_toronto_coffee = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, venue in zip(toronto_coffee_venues['Venue Latitude'], toronto_coffee_venues['Venue Longitude'], toronto_coffee_venues['Venue']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(    
       [lat, lng],         
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_coffee)  
    
map_toronto_coffee

Now put the food venues on the map

In [53]:
# create map of Toronto using latitude and longitude values
latitude = 43.6532
longitude = -79.3832
map_toronto_food = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, venue in zip(toronto_food_venues['Venue Latitude'], toronto_food_venues['Venue Longitude'], toronto_food_venues['Venue']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(    
       [lat, lng],         
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_food)  
    
map_toronto_food

## 7. Search Next Venues from Coffee Shops

Next, we write a function to get the next venue locations for a given venue ID

In [54]:
def getNextVenue(names, latitudes, longitudes, venue_IDs):
    
    venues_list=[]
    for name, lat, lng, venue_ID in zip(names, latitudes, longitudes, venue_IDs):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/{}/nextvenues?&client_id={}&client_secret={}&v={}'.format(
            venue_ID,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION)

        
        # make the GET request
        results = requests.get(url).json()["response"]
        #print(results)
        if results:
            results = results['nextVenues']['items']
        
            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['name'], 
                v['location']['lat'], 
                v['location']['lng'],  
                v['categories'][0]['name']) for v in results])
            
            
    #print(venues_list)
    next_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    next_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(next_venues)
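
`getNextVenue` indexes straight into the response, so a venue with an empty `categories` list would raise an `IndexError`. The sketch below shows one defensive way to flatten a response. The helper `extract_next_venues` and the hand-written `sample_payload` are hypothetical; the payload only mimics the keys that the function above reads (`response`, `nextVenues`, `items`, `name`, `location`, `categories`).

```python
def extract_next_venues(payload, neighbourhood):
    """Flatten a nextvenues payload into (neighbourhood, venue, lat, lng, category) tuples."""
    items = payload.get('response', {}).get('nextVenues', {}).get('items', [])
    rows = []
    for v in items:
        categories = v.get('categories') or []
        # fall back to None instead of failing when a venue has no category
        category = categories[0]['name'] if categories else None
        location = v.get('location', {})
        rows.append((neighbourhood, v.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# hand-written payload with the same shape the notebook code indexes into
sample_payload = {
    'response': {'nextVenues': {'items': [
        {'name': 'CF Toronto Eaton Centre',
         'location': {'lat': 43.654877, 'lng': -79.380633},
         'categories': [{'name': 'Shopping Mall'}]},
        {'name': 'Venue without category',
         'location': {'lat': 43.65, 'lng': -79.38},
         'categories': []},
    ]}}
}

rows = extract_next_venues(sample_payload, 'Ryerson, Garden District')
empty = extract_next_venues({'response': {}}, 'Christie')  # empty response yields no rows
```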

Let's see what results we get in the Garden District

In [55]:
Garden_index = toronto_coffee_venues['Neighbourhood'].str.contains('Garden District')
toronto_coffee_venues.loc[Garden_index].shape

(49, 8)

In [56]:
toronto_coffee_next = getNextVenue(names = toronto_coffee_venues.loc[Garden_index ,'Neighbourhood'],
                              latitudes = toronto_coffee_venues.loc[Garden_index ,'Neighbourhood Latitude'],
                              longitudes = toronto_coffee_venues.loc[Garden_index ,'Neighbourhood Longitude'],
                              venue_IDs = toronto_coffee_venues.loc[Garden_index ,'Venue ID'])

In [57]:
toronto_coffee_next.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Ryerson, Garden District",43.657162,-79.378937,CF Toronto Eaton Centre,43.654877,-79.380633,Shopping Mall
1,"Ryerson, Garden District",43.657162,-79.378937,Tacos 101,43.656636,-79.376968,Taco Place
2,"Ryerson, Garden District",43.657162,-79.378937,Metro,43.658404,-79.376748,Supermarket
3,"Ryerson, Garden District",43.657162,-79.378937,CF Toronto Eaton Centre,43.654877,-79.380633,Shopping Mall
4,"Ryerson, Garden District",43.657162,-79.378937,Cineplex Cinemas Yonge-Dundas,43.656126,-79.38039,Movie Theater


In [58]:
toronto_coffee_next.groupby('Venue Category').count().sort_values(by='Neighbourhood',ascending=False).head(5)

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Shopping Mall,22,22,22,22,22,22
Plaza,18,18,18,18,18,18
Movie Theater,10,10,10,10,10,10
Electronics Store,6,6,6,6,6,6
Supermarket,4,4,4,4,4,4


Shopping Mall is the most popular next venue category (22 results), closely followed by Plaza (18). Movie Theater (10) is also popular.
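
The ranking is just a frequency count per category. A minimal sketch with `collections.Counter`, using only the five categories visible in the `head()` output above:

```python
from collections import Counter

# categories of the five rows shown by toronto_coffee_next.head() above
categories = ['Shopping Mall', 'Taco Place', 'Supermarket',
              'Shopping Mall', 'Movie Theater']

counts = Counter(categories)
top_category, top_count = counts.most_common(1)[0]
print(top_category, top_count)
```

In pandas, `toronto_coffee_next['Venue Category'].value_counts()` would give the same ranking as a single column instead of six identical count columns.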

In [59]:
toronto_coffee_next.groupby('Venue').count().sort_values(by='Neighbourhood',ascending=False)

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue Latitude,Venue Longitude,Venue Category
Venue,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
CF Toronto Eaton Centre,22,22,22,22,22,22
Yonge-Dundas Square,15,15,15,15,15,15
Cineplex Cinemas Yonge-Dundas,8,8,8,8,8,8
Nathan Phillips Square,3,3,3,3,3,3
Loblaws,3,3,3,3,3,3
Apple Eaton Centre,3,3,3,3,3,3
Best Buy,3,3,3,3,3,3
Ryerson Theatre,2,2,2,2,2,2
Union Station,2,2,2,2,2,2
Salad King,2,2,2,2,2,2


The result above shows that the shopping mall is CF Toronto Eaton Centre (22 results) and the plaza is mainly Yonge-Dundas Square (15 results)

Let's do the same search for the Victoria Hotel neighborhood

In [60]:
Victoria_index = toronto_coffee_venues['Neighbourhood'].str.contains('Victoria Hotel')
toronto_coffee_venues.loc[Victoria_index].shape

(98, 8)

In [61]:
toronto_coffee_next2 = getNextVenue(names = toronto_coffee_venues.loc[Victoria_index ,'Neighbourhood'],
                              latitudes = toronto_coffee_venues.loc[Victoria_index ,'Neighbourhood Latitude'],
                              longitudes = toronto_coffee_venues.loc[Victoria_index ,'Neighbourhood Longitude'],
                              venue_IDs = toronto_coffee_venues.loc[Victoria_index ,'Venue ID'])

In [62]:
toronto_coffee_next2.groupby('Venue').count().sort_values(by='Neighbourhood',ascending=False).head(10)

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue Latitude,Venue Longitude,Venue Category
Venue,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
CF Toronto Eaton Centre,28,28,28,28,28,28
Union Station,26,26,26,26,26,26
First Canadian Place,10,10,10,10,10,10
Hudson's Bay,9,9,9,9,9,9
Brookfield Place,8,8,8,8,8,8
CN Tower,7,7,7,7,7,7
St. Lawrence Market (South Building),6,6,6,6,6,6
Scotiabank Arena,5,5,5,5,5,5
Nathan Phillips Square,4,4,4,4,4,4
Hockey Hall Of Fame (Hockey Hall of Fame),4,4,4,4,4,4


In [63]:
toronto_coffee_next2.groupby('Venue Category').count().sort_values(by='Neighbourhood',ascending=False).head(5)

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Shopping Mall,37,37,37,37,37,37
Train Station,26,26,26,26,26,26
Building,10,10,10,10,10,10
Department Store,9,9,9,9,9,9
Monument / Landmark,7,7,7,7,7,7


The shopping mall is the most popular next venue category for this neighborhood as well. However, here there are two possible shopping mall destinations: CF Toronto Eaton Centre and First Canadian Place. Train Station is the second most popular next venue category.

#### Let's apply the next venue function to all the Toronto neighborhoods now.

In [64]:
toronto_coffee_next = getNextVenue(names = toronto_coffee_venues['Neighbourhood'],
                              latitudes = toronto_coffee_venues['Neighbourhood Latitude'],
                              longitudes = toronto_coffee_venues['Neighbourhood Longitude'],
                              venue_IDs = toronto_coffee_venues['Venue ID'])

In [65]:
toronto_coffee_next.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",95,95,95,95,95,95
Berczy Park,51,51,51,51,51,51
"Brockton, Exhibition Place, Parkdale Village",3,3,3,3,3,3
"Cabbagetown, St. James Town",12,12,12,12,12,12
Central Bay Street,55,55,55,55,55,55
"Chinatown, Grange Park, Kensington Market",53,53,53,53,53,53
Christie,8,8,8,8,8,8
Church and Wellesley,60,60,60,60,60,60
"Commerce Court, Victoria Hotel",148,148,148,148,148,148
Davisville,5,5,5,5,5,5


Some neighbourhoods have little data available. Drop the neighbourhoods with fewer than 10 search results

In [66]:
drop_list = toronto_coffee_next.groupby('Neighbourhood').count()['Venue Category']

In [67]:
drop_list = drop_list[drop_list<10].index.tolist()
print(drop_list)

['Brockton, Exhibition Place, Parkdale Village', 'Christie', 'Davisville', 'Harbord, University of Toronto', 'High Park, The Junction South', 'North Toronto West', 'Parkdale, Roncesvalles', 'Runnymede, Swansea', 'The Annex, North Midtown, Yorkville']


In [68]:
# keep only the rows whose neighbourhood is not in drop_list, then renumber the index
# (the original reset_index() call inside the loop discarded its result and had no effect)
keep_mask = ~toronto_coffee_next['Neighbourhood'].isin(drop_list)
toronto_coffee_next = toronto_coffee_next[keep_mask].reset_index(drop=True)
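
The filtering step boils down to comparing each neighbourhood's result count with a threshold. A plain-Python sketch, using some of the counts from the `groupby` output above (the names `result_counts` and `MIN_RESULTS` are illustrative, not from the notebook):

```python
# result counts per neighbourhood, copied from the groupby output above
result_counts = {
    'Adelaide, King, Richmond': 95,
    'Berczy Park': 51,
    'Brockton, Exhibition Place, Parkdale Village': 3,
    'Cabbagetown, St. James Town': 12,
    'Central Bay Street': 55,
    'Christie': 8,
    'Davisville': 5,
}
MIN_RESULTS = 10  # same threshold as in the notebook

# neighbourhoods below the threshold are dropped, the rest are kept
to_drop = sorted(n for n, c in result_counts.items() if c < MIN_RESULTS)
kept = {n: c for n, c in result_counts.items() if c >= MIN_RESULTS}
print(to_drop)
```

With so few results, a single coffee shop's next venues could dominate the neighbourhood's profile, which is presumably why these neighbourhoods are excluded before clustering.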

In [69]:
toronto_coffee_next.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",95,95,95,95,95,95
Berczy Park,51,51,51,51,51,51
"Cabbagetown, St. James Town",12,12,12,12,12,12
Central Bay Street,55,55,55,55,55,55
"Chinatown, Grange Park, Kensington Market",53,53,53,53,53,53
Church and Wellesley,60,60,60,60,60,60
"Commerce Court, Victoria Hotel",148,148,148,148,148,148
"Design Exchange, Toronto Dominion Centre",119,119,119,119,119,119
"First Canadian Place, Underground city",149,149,149,149,149,149
"Harbourfront East, Toronto Islands, Union Station",63,63,63,63,63,63


In [70]:
# one hot encoding
toronto_coffee_next_onehot = pd.get_dummies(toronto_coffee_next[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_coffee_next_onehot['Neighbourhood'] = toronto_coffee_next['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_coffee_next_onehot.columns[-1]] + list(toronto_coffee_next_onehot.columns[:-1])
toronto_coffee_next_onehot = toronto_coffee_next_onehot[fixed_columns]

toronto_coffee_next_onehot.head()

Unnamed: 0,Neighbourhood,American Restaurant,Antique Shop,Aquarium,Art Gallery,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,...,Supermarket,Taco Place,Thai Restaurant,Theater,Thrift / Vintage Store,Train Station,University,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [71]:
toronto_coffee_grouped = toronto_coffee_next_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_coffee_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Antique Shop,Aquarium,Art Gallery,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,...,Supermarket,Taco Place,Thai Restaurant,Theater,Thrift / Vintage Store,Train Station,University,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,"Adelaide, King, Richmond",0.010526,0.0,0.0,0.010526,0.0,0.0,0.0,0.010526,0.0,...,0.0,0.0,0.010526,0.021053,0.0,0.115789,0.010526,0.0,0.0,0.0
1,Berczy Park,0.019608,0.019608,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.098039,0.0,0.0,0.0,0.0
2,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central Bay Street,0.0,0.0,0.0,0.072727,0.0,0.0,0.0,0.0,0.018182,...,0.018182,0.0,0.0,0.0,0.0,0.054545,0.018182,0.0,0.0,0.0
4,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.09434,0.0,0.056604,0.169811,0.0,0.075472,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,...,0.133333,0.0,0.0,0.066667,0.0,0.033333,0.0,0.0,0.0,0.0
6,"Commerce Court, Victoria Hotel",0.006757,0.0,0.006757,0.0,0.0,0.0,0.0,0.006757,0.0,...,0.0,0.0,0.0,0.013514,0.0,0.175676,0.0,0.0,0.006757,0.0
7,"Design Exchange, Toronto Dominion Centre",0.016807,0.0,0.008403,0.0,0.0,0.0,0.0,0.008403,0.0,...,0.0,0.0,0.0,0.0,0.0,0.210084,0.0,0.0,0.008403,0.0
8,"First Canadian Place, Underground city",0.013423,0.0,0.006711,0.0,0.0,0.0,0.0,0.006711,0.0,...,0.0,0.0,0.0,0.013423,0.0,0.181208,0.0,0.0,0.006711,0.0
9,"Harbourfront East, Toronto Islands, Union Station",0.0,0.0,0.031746,0.015873,0.0,0.0,0.0,0.0,0.0,...,0.031746,0.0,0.0,0.0,0.0,0.190476,0.0,0.0,0.0,0.0
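
Taking the mean of one-hot columns per neighbourhood gives the share of that neighbourhood's next venues that fall in each category. For example, "Cabbagetown, St. James Town" has 12 rows and a Supermarket value of 0.083333, which corresponds to 1 row out of 12. A small sketch of that arithmetic (the helper `category_frequencies` and the `'Other'` label are placeholders, not part of the notebook):

```python
from collections import Counter

def category_frequencies(categories):
    """Share of rows per category; equals the column means of a one-hot encoding."""
    total = len(categories)
    return {cat: n / total for cat, n in Counter(categories).items()}

# 12 rows as for "Cabbagetown, St. James Town"; only the single Supermarket row
# is taken from the table above, 'Other' stands in for the remaining categories
rows = ['Supermarket'] + ['Other'] * 11
freq = category_frequencies(rows)
print(round(freq['Supermarket'], 6))  # 0.083333
```

Because every row has exactly one category, the shares in each neighbourhood's row sum to 1.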


#### Limit to the top 3 next venue categories for each neighbourhood

In [72]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd fall back to the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_coffee_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_coffee_venues_sorted['Neighbourhood'] = toronto_coffee_grouped['Neighbourhood']

for ind in np.arange(toronto_coffee_grouped.shape[0]):
    neighbourhoods_coffee_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_coffee_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_coffee_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,"Adelaide, King, Richmond",Shopping Mall,Train Station,Building
1,Berczy Park,Farmers Market,Shopping Mall,Train Station
2,"Cabbagetown, St. James Town",Farm,Gastropub,Grocery Store
3,Central Bay Street,Shopping Mall,Plaza,Art Gallery
4,"Chinatown, Grange Park, Kensington Market",Bakery,Furniture / Home Store,Café
5,Church and Wellesley,Shopping Mall,Supermarket,Gay Bar
6,"Commerce Court, Victoria Hotel",Shopping Mall,Train Station,Building
7,"Design Exchange, Toronto Dominion Centre",Shopping Mall,Train Station,Building
8,"First Canadian Place, Underground city",Shopping Mall,Train Station,Monument / Landmark
9,"Harbourfront East, Toronto Islands, Union Station",Train Station,Basketball Stadium,Shopping Mall
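
`return_most_common_venues` is defined outside this section, so the sketch below is only a hypothetical stand-in for the idea: sort the category shares in descending order and keep the first `n` names. The sample values are the few shares visible for "Commerce Court, Victoria Hotel" in the truncated table above, so the result differs from the real top 3, which also draws on the hidden columns.

```python
def ordinal_label(rank):
    """Column name for a 1-based rank, e.g. 1 -> '1st Most Common Venue'."""
    if 10 <= rank % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(rank % 10, 'th')
    return '{}{} Most Common Venue'.format(rank, suffix)

def top_categories(frequencies, n):
    """Return the n category names with the highest share (ties keep dict order)."""
    ranked = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

# shares visible for "Commerce Court, Victoria Hotel" in the truncated table
visible_shares = {
    'American Restaurant': 0.006757,
    'Theater': 0.013514,
    'Train Station': 0.175676,
}
labels = [ordinal_label(r) for r in (1, 2, 3)]
top_two = top_categories(visible_shares, 2)
```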


List every venue category that appears among the top 3 next venue categories of any neighbourhood

In [73]:
coffee_next_venues= neighbourhoods_coffee_venues_sorted.iloc[:,1:].values.tolist()
flatten = lambda l: [item for sublist in l for item in sublist]
coffee_next_venues = set(flatten(coffee_next_venues))
list(coffee_next_venues)

['Bookstore',
 'Gastropub',
 'Café',
 'Event Space',
 'Greek Restaurant',
 'Ice Cream Shop',
 'Farm',
 'Train Station',
 'Art Gallery',
 'Basketball Stadium',
 'Farmers Market',
 'Supermarket',
 'Gay Bar',
 'Shopping Mall',
 'Bakery',
 'Grocery Store',
 'Park',
 'Furniture / Home Store',
 'Movie Theater',
 'Building',
 'Pub',
 'Plaza',
 'Vietnamese Restaurant',
 'Historic Site',
 'Monument / Landmark',
 'Coffee Shop',
 'Pharmacy',
 'Breakfast Spot']
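
A set has no guaranteed iteration order, so the list above can come out in a different order on another run. If a stable order matters (for example when selecting columns), one option is to flatten with `itertools.chain.from_iterable` and sort the result. A sketch using the first three rows of the top-3 table:

```python
from itertools import chain

# first three rows of the top-3 table above
top3_rows = [
    ['Shopping Mall', 'Train Station', 'Building'],
    ['Farmers Market', 'Shopping Mall', 'Train Station'],
    ['Farm', 'Gastropub', 'Grocery Store'],
]

# flatten, de-duplicate, and sort for a reproducible order
unique_categories = sorted(set(chain.from_iterable(top3_rows)))
print(unique_categories)
```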

## 8. Cluster Neighbourhoods based on Next Venue

In [74]:
# set number of clusters
kclusters = 5
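# drop the label column so that only the numeric category shares remain
# (the positional axis argument of drop() is no longer accepted in recent pandas)
toronto_coffee_grouped_clustering = toronto_coffee_grouped.drop(columns='Neighbourhood')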


from sklearn.preprocessing import StandardScaler
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(StandardScaler().fit_transform(toronto_coffee_grouped_clustering))

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([4, 0, 0, 0, 1, 0, 4, 4, 4, 3, 0, 2, 0, 0, 0, 0, 0])

In [75]:
pd.DataFrame(StandardScaler().fit_transform(toronto_coffee_grouped_clustering[list(coffee_next_venues)]))

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,18,19,20,21,22,23,24,25,26,27
0,0.358991,-0.294984,-0.220923,-0.25,-0.25,-0.327276,-0.25,0.521383,-0.031686,-0.259318,...,0.501716,2.251653,-0.432002,0.777881,-0.25,-0.349414,1.339062,-0.063737,-0.346658,-0.293388
1,-0.47703,-0.294984,-0.558534,-0.25,-0.25,-0.327276,-0.25,0.287081,-0.422224,0.86101,...,-0.509766,-0.041355,-0.432002,-0.778916,-0.25,0.296975,0.502448,-0.701387,0.415448,-0.293388
2,-0.47703,3.981304,2.114223,-0.25,-0.25,-0.327276,4.0,-1.007028,-0.422224,-0.670373,...,-0.509766,-0.639805,-0.432002,-0.778916,-0.25,-0.349414,-0.867867,-0.701387,0.909983,-0.293388
3,-0.47703,-0.294984,0.024613,-0.25,-0.25,-0.327276,-0.25,-0.287033,2.276036,-0.670373,...,0.072603,-0.639805,-0.432002,1.910097,-0.25,-0.349414,-0.232539,-0.150689,-0.404582,-0.293388
4,-0.47703,-0.294984,3.072381,-0.25,-0.25,-0.327276,-0.25,-1.007028,3.077877,-0.670373,...,-0.509766,-0.639805,-0.432002,-0.778916,-0.25,-0.349414,-0.867867,-0.701387,-0.771438,0.464727
5,0.84667,-0.294984,-0.023983,-0.25,-0.25,-0.327276,-0.25,-0.567031,-0.422224,-0.670373,...,1.625586,-0.639805,0.859618,-0.231154,-0.25,-0.349414,-0.867867,1.317839,0.573699,-0.293388
6,-0.208712,-0.121621,-0.341824,-0.25,-0.25,-0.327276,-0.25,1.311876,-0.422224,0.648893,...,-0.076924,1.422421,-0.432002,-0.001686,-0.25,-0.349414,0.784844,-0.496736,-0.498775,-0.293388
7,-0.47703,-0.294984,-0.289013,-0.25,-0.25,-0.327276,-0.25,1.766064,-0.422224,0.970395,...,-0.509766,1.668502,-0.432002,-0.08846,-0.25,-0.349414,0.893967,-0.446863,-0.432328,-0.293388
8,-0.210513,-0.294984,-0.343279,-0.25,-0.25,-0.327276,-0.25,1.384903,-0.422224,0.640039,...,0.135139,1.613419,-0.432002,0.103385,-0.25,-0.349414,1.946337,-0.498109,-0.500605,-0.293388
9,-0.47703,-0.294984,-0.558534,-0.25,-0.25,-0.327276,-0.25,1.507242,0.166682,3.048701,...,-0.509766,-0.639805,-0.432002,-0.518077,-0.25,-0.349414,1.90539,-0.220619,-0.771438,-0.293388
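
The repeated values of -0.25 and 4.0 in the table have a simple explanation. `StandardScaler` subtracts the column mean and divides by the population standard deviation, so a category that occurs in only one of the 17 clustered neighbourhoods becomes 4.0 for that neighbourhood and -0.25 for the other 16, whatever its raw share. A standard-library sketch of this calculation (the value 0.05 is arbitrary):

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score with the population standard deviation (ddof=0)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# one neighbourhood out of 17 has a non-zero share; 0.05 is an arbitrary value
column = [0.05] + [0.0] * 16
z = standardize(column)
print(round(z[0], 6), round(z[1], 6))
```

A consequence is that rare categories get as much weight in the distance computation as common ones, which may explain why several clusters contain a single neighbourhood.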


In [76]:
# add clustering labels
neighbourhoods_coffee_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join the cluster labels and top venues onto df_toronto to add latitude/longitude for each neighborhood;
# the inner join keeps only the neighbourhoods that were clustered
toronto_coffee_merged = df_toronto.join(neighbourhoods_coffee_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood', how='inner')

toronto_coffee_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,0,Historic Site,Pub,Coffee Shop
1,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0,Shopping Mall,Plaza,Movie Theater
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Shopping Mall,Farmers Market,Plaza
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Farmers Market,Shopping Mall,Train Station
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Shopping Mall,Plaza,Art Gallery


In [77]:
toronto_coffee_merged.reset_index(drop=True)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,0,Historic Site,Pub,Coffee Shop
1,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0,Shopping Mall,Plaza,Movie Theater
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Shopping Mall,Farmers Market,Plaza
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Farmers Market,Shopping Mall,Train Station
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Shopping Mall,Plaza,Art Gallery
5,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,4,Shopping Mall,Train Station,Building
6,M5J,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",43.640816,-79.381752,3,Train Station,Basketball Stadium,Shopping Mall
7,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,2,Park,Vietnamese Restaurant,Event Space
8,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Park,Bookstore
9,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.647177,-79.381576,4,Shopping Mall,Train Station,Building


In [78]:
# create map
map_coffee_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
# one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_coffee_merged['Latitude'], toronto_coffee_merged['Longitude'], toronto_coffee_merged['Neighbourhood'], toronto_coffee_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_coffee_clusters)
       
map_coffee_clusters
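
K-means labels are 0-based, so a palette with one entry per cluster can be indexed by the label directly. The sketch below builds such a palette with the standard library's `colorsys` instead of matplotlib; it is a swapped-in technique for illustration, and `cluster_palette` is a hypothetical helper.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters evenly spaced hues as hex strings (hypothetical helper)."""
    palette = []
    for k in range(n_clusters):
        # spread the hues evenly around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(5)
colour_for_cluster_3 = palette[3]  # index with the 0-based cluster label
print(palette)
```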

## 9. Analyze Clusters based on Next Venue

#### Cluster 1

In [79]:
toronto_coffee_merged.loc[toronto_coffee_merged['Cluster Labels'] == 0, toronto_coffee_merged.columns[[1] + list(range(3, toronto_coffee_merged.shape[1]))]]

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Downtown Toronto,43.65426,-79.360636,0,Historic Site,Pub,Coffee Shop
1,Downtown Toronto,43.657162,-79.378937,0,Shopping Mall,Plaza,Movie Theater
2,Downtown Toronto,43.651494,-79.375418,0,Shopping Mall,Farmers Market,Plaza
4,Downtown Toronto,43.644771,-79.373306,0,Farmers Market,Shopping Mall,Train Station
5,Downtown Toronto,43.657952,-79.387383,0,Shopping Mall,Plaza,Art Gallery
11,East Toronto,43.679557,-79.352188,0,Greek Restaurant,Park,Bookstore
16,East Toronto,43.659526,-79.340923,0,Pharmacy,Ice Cream Shop,Breakfast Spot
33,Downtown Toronto,43.646435,-79.374846,0,Shopping Mall,Train Station,Farmers Market
34,Downtown Toronto,43.667967,-79.367675,0,Farm,Gastropub,Grocery Store
36,Downtown Toronto,43.66586,-79.38316,0,Shopping Mall,Supermarket,Gay Bar


#### Cluster 2

In [80]:
toronto_coffee_merged.loc[toronto_coffee_merged['Cluster Labels'] == 1, toronto_coffee_merged.columns[[1] + list(range(3, toronto_coffee_merged.shape[1]))]]

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
29,Downtown Toronto,43.653206,-79.400049,1,Bakery,Furniture / Home Store,Café


#### Cluster 3

In [81]:
toronto_coffee_merged.loc[toronto_coffee_merged['Cluster Labels'] == 2, toronto_coffee_merged.columns[[1] + list(range(3, toronto_coffee_merged.shape[1]))]]

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
10,West Toronto,43.647927,-79.41975,2,Park,Vietnamese Restaurant,Event Space


#### Cluster 4

In [82]:
toronto_coffee_merged.loc[toronto_coffee_merged['Cluster Labels'] == 3, toronto_coffee_merged.columns[[1] + list(range(3, toronto_coffee_merged.shape[1]))]]

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
9,Downtown Toronto,43.640816,-79.381752,3,Train Station,Basketball Stadium,Shopping Mall


#### Cluster 5

In [83]:
toronto_coffee_merged.loc[toronto_coffee_merged['Cluster Labels'] == 4, toronto_coffee_merged.columns[[1] + list(range(3, toronto_coffee_merged.shape[1]))]]

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
7,Downtown Toronto,43.650571,-79.384568,4,Shopping Mall,Train Station,Building
12,Downtown Toronto,43.647177,-79.381576,4,Shopping Mall,Train Station,Building
15,Downtown Toronto,43.648198,-79.379817,4,Shopping Mall,Train Station,Building
35,Downtown Toronto,43.648429,-79.38228,4,Shopping Mall,Train Station,Monument / Landmark
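
To compare the clusters at a glance, the most frequent "1st Most Common Venue" in each cluster can be tallied. A plain-Python sketch, using the (cluster label, 1st venue) pairs from the five tables above:

```python
from collections import Counter, defaultdict

# (cluster label, 1st most common venue) pairs copied from the cluster tables above
rows = [
    (0, 'Historic Site'), (0, 'Shopping Mall'), (0, 'Shopping Mall'),
    (0, 'Farmers Market'), (0, 'Shopping Mall'), (0, 'Greek Restaurant'),
    (0, 'Pharmacy'), (0, 'Shopping Mall'), (0, 'Farm'), (0, 'Shopping Mall'),
    (1, 'Bakery'), (2, 'Park'), (3, 'Train Station'),
    (4, 'Shopping Mall'), (4, 'Shopping Mall'), (4, 'Shopping Mall'), (4, 'Shopping Mall'),
]

# tally the first-choice venue category within each cluster
by_cluster = defaultdict(Counter)
for label, venue in rows:
    by_cluster[label][venue] += 1

# most frequent first-choice category and its count, per cluster label
summary = {label: counter.most_common(1)[0]
           for label, counter in sorted(by_cluster.items())}
print(summary)
```

In this tally, label 4 (Cluster 5) is uniformly Shopping Mall followed by Train Station, while label 0 (Cluster 1) is mixed: Shopping Mall leads in 5 of its 10 neighbourhoods. Labels 1 to 3 each hold a single neighbourhood.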
