## Problem Definition
### In this scenario, Person A wants to open an Italian restaurant in Toronto. The objective is to identify suitable neighborhoods by combining Foursquare location data with a study of the restaurants in every neighborhood of interest, using location profiling and machine learning.

## Data Understanding
### We explore the neighborhoods of Toronto and their venues to gain a better understanding of existing places and to narrow down our neighborhood choices.



In [2]:
#Install beautifulsoup
!conda install -c conda-forge beautifulsoup4 --yes

# Import packages
from bs4 import BeautifulSoup
import urllib as ur
import requests as rq

Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs:
    - beautifulsoup4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    beautifulsoup4-4.7.1       |        py36_1001         140 KB  conda-forge
    conda-4.6.7                |           py36_0         869 KB  conda-forge
    openssl-1.1.1b             |       h14c3975_0         4.0 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         5.0 MB

The following packages will be UPDATED:

  beautifulsoup4      anaconda::beautifulsoup4-4.7.1-py36_1 --> conda-forge::beautifulsoup4-4.7.1-py36_1001
  conda                                        4.6.4-py36_0 --> 4.6.7-py36_0
  openssl                              1.1.1a-h14c3975_1000 --> 1.1.1b-h14c3975_0



Dow

In [6]:

# Use Beautiful soup
source = rq.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'lxml')
article = soup.find('table', class_='wikitable sortable')

In [7]:
# Walk the table cells in order; every three cells form one row
# (postcode, borough, neighbourhood)
codes_list=[]
borough_list=[]
neighborhood_list=[]
i=1
for tag in article.find_all('td'):
    if i == 1:
        codes_list.append(tag.text)
    if i == 2:
        borough_list.append(tag.text)
    if i == 3: 
        row = tag.text
        row = row.replace('\n', '')
        neighborhood_list.append(row)
    i = i+1
    if i==4:
        i=1
        
len(neighborhood_list)

289
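
The counter-based loop above relies on every table row having exactly three `td` cells. A minimal sketch of a row-wise alternative under the same three-column assumption is shown here; the `html` string is a made-up stand-in for the Wikipedia page and `parse_rows` is a hypothetical helper, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the Wikipedia table (illustrative data only).
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

def parse_rows(table):
    """Return one (postcode, borough, neighbourhood) tuple per data row."""
    rows = []
    for tr in table.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if len(cells) == 3:  # the header row has only <th> cells and is skipped
            rows.append(tuple(cells))
    return rows

table = BeautifulSoup(html, 'html.parser').find('table', class_='wikitable')
rows = parse_rows(table)
```

Reading one `tr` at a time keeps the three columns aligned even if a malformed row appears, because such a row is skipped instead of shifting every later cell.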

In [8]:
# Convert list to pandas dataframe
import pandas as pd
Canada_Codes = pd.DataFrame(
    {'Postcode': codes_list,
     'Borough': borough_list,
     'Neighbourhood': neighborhood_list
    })

In [9]:
# Work on a filtered copy so the original Canada_Codes frame is left untouched
Canada_Codes2 = Canada_Codes[Canada_Codes['Borough'] != "Not assigned"].copy()
Canada_Codes2=Canada_Codes2.groupby("Postcode").agg(lambda x:','.join(set(x)))
Canada_Codes2.loc[Canada_Codes2['Neighbourhood']=="Not assigned",'Neighbourhood']=Canada_Codes2.loc[Canada_Codes2['Neighbourhood']=="Not assigned",'Borough']
Canada_Codes2.index.name = 'Postcode'
Canada_Codes2.reset_index(inplace=True)
Canada_Codes2.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern,Rouge"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Morningside,Guildwood,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Ionview,Kennedy Park,East Birchmount Park"
7,M1L,Scarborough,"Golden Mile,Clairlea,Oakridge"
8,M1M,Scarborough,"Cliffside,Scarborough Village West,Cliffcrest"
9,M1N,Scarborough,"Cliffside West,Birch Cliff"
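
Joining over `set(x)` gives no guaranteed order, so the neighbourhood names inside a postcode can come out in a different order from run to run. The sketch below shows one way to make the grouping deterministic, assuming each postcode maps to a single borough; the three rows are illustrative, not the scraped data.

```python
import pandas as pd

# Illustrative rows only; the real frame comes from the scraped table.
codes = pd.DataFrame({
    'Postcode': ['M1B', 'M1B', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge', 'Malvern', 'Woburn'],
})

grouped = (
    codes.groupby('Postcode')
         .agg({
             # assumption: one borough per postcode, so the first value is enough
             'Borough': 'first',
             # sorted unique names give a stable, repeatable string
             'Neighbourhood': lambda s: ','.join(sorted(s.unique())),
         })
         .reset_index()
)
```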


In [10]:
# Read Geo Data
geo_data=pd.read_csv("https://cocl.us/Geospatial_data")
geo_data.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [11]:
# Create new columns and add them to the original frame
Canada_Codes2['Latitude']=geo_data['Latitude'].values
Canada_Codes2['Longitude']=geo_data['Longitude'].values
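
Copying `.values` across works only while both frames list the same postcodes in the same order. A hedged alternative is to merge on the postcode key, sketched below with two rows taken from the outputs above; the frame names `codes` and `geo` are placeholders for `Canada_Codes2` and `geo_data`.

```python
import pandas as pd

codes = pd.DataFrame({
    'Postcode': ['M1C', 'M1B'],  # deliberately in a different order than geo
    'Borough': ['Scarborough', 'Scarborough'],
})
geo = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

merged = codes.merge(
    geo.rename(columns={'Postal Code': 'Postcode'}),
    on='Postcode',
    how='left',             # keep every postcode from the scraped table
    validate='one_to_one',  # raise if a postcode appears twice on either side
)
```

With `how='left'`, any postcode missing from the geospatial file shows up as `NaN` coordinates rather than silently taking another row's values.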

In [12]:
## Final Dataframe with the neighborhood along with latitude and longitude values
Canada_Codes2.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern,Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Morningside,Guildwood,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Ionview,Kennedy Park,East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile,Clairlea,Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside,Scarborough Village West,Cliffcrest",43.716316,-79.239476
9,M1N,Scarborough,"Cliffside West,Birch Cliff",43.692657,-79.264848


In [13]:
## Find out how many rows we have.
Canada_Codes2.shape

(103, 5)

In [14]:
import json # library to handle JSON files

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Importing to use the Foursquare API lab
import folium # map rendering library

print('Foursquare and map plotting Libraries imported.')

Foursquare and map plotting Libraries imported.


In [15]:
print('The dataframe has {} boroughs spanning across {} Postcodes and {}  neighborhood groups.'.format(
        len(Canada_Codes2['Borough'].unique()), len(Canada_Codes2['Postcode'].unique()),
        Canada_Codes2.shape[0]
    )
)

The dataframe has 11 boroughs spanning across 103 Postcodes and 103  neighborhood groups.


In [16]:
import time 
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="capstone_agent")

# Get Toronto geo info
address = 'Toronto, Ontario, Canada'
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [17]:
# Get Toronto codes: keep only boroughs whose name contains "Toronto"
Toronto_Codes = Canada_Codes2[Canada_Codes2['Borough'].str.contains("Toronto")]

# Reset index
Toronto_Codes = Toronto_Codes.reset_index(drop=True)

#to view Dataframe
Toronto_Codes.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"Riverdale,The Danforth West",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Summerhill East,Moore Park",43.689574,-79.38316
9,M4V,Central Toronto,"Summerhill West,Forest Hill SE,Deer Park,South...",43.686412,-79.400049


In [18]:
## Get the JSON results based on Toronto latitude and longitude values
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id=W2RZKFIYGSSLPRR3OE14Q2UYYK5MQZYBV42IYCADG5YZ4PZR&client_secret=2YHEQZDDQ2HBTLH3PFONKEW2DAMNXKSW1HZP5VECDLNFRWWC&ll=43.653963,-79.3872076&v=20180604&radius=500&limit=30'
# get results
results = rq.get(url).json()

In [19]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [53]:
# Get nearby venues for Toronto 
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Textile Museum of Canada,Art Museum,43.654396,-79.3865
2,Sansotei Ramen 三草亭,Ramen Restaurant,43.655157,-79.386501
3,Japango,Sushi Restaurant,43.655268,-79.385165
4,Cafe Plenty,Café,43.654571,-79.38945


In [21]:
# Store Credentials
CLIENT_ID = 'W2RZKFIYGSSLPRR3OE14Q2UYYK5MQZYBV42IYCADG5YZ4PZR' # your Foursquare ID
CLIENT_SECRET = '2YHEQZDDQ2HBTLH3PFONKEW2DAMNXKSW1HZP5VECDLNFRWWC' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

30 venues were returned by Foursquare.
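
The credentials above are typed directly into the notebook, which exposes them to anyone who can read it. A minimal sketch of one alternative is to read them from environment variables and assemble the query string from a dictionary. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are assumptions made for illustration; only the endpoint and parameter names come from the cells above.

```python
import os
from urllib.parse import urlencode

# Hypothetical variable names; export them in the shell before starting Jupyter.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

def build_explore_url(lat, lng, radius=500, limit=30, version='20180604'):
    """Build the venues/explore URL used in this notebook from its parts."""
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes the comma in 'll'
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

url = build_explore_url(43.653963, -79.387207)
```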


In [22]:
# Get venue and other info for all neighborhoods of Toronto
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = rq.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
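
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue returned without any category would raise an `IndexError`. The sketch below shows a tolerant version of that flattening step. `venue_rows` is a hypothetical helper, and `sample_items` is hand-written to mimic only the response fields the notebook reads; the second venue is invented to exercise the empty-category case.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'items' into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows

# Hand-written sample shaped like the fields used above (not real API output).
sample_items = [
    {'venue': {'name': 'Terroni',
               'location': {'lat': 43.650927, 'lng': -79.375602},
               'categories': [{'name': 'Italian Restaurant'}]}},
    {'venue': {'name': 'Uncategorized place',
               'location': {'lat': 43.65, 'lng': -79.37},
               'categories': []}},
]
rows = venue_rows('St. James Town', 43.651494, -79.375418, sample_items)
```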

In [55]:
toronto_venues = pd.DataFrame(getNearbyVenues(names=Toronto_Codes['Neighbourhood'],
                                   latitudes=Toronto_Codes['Latitude'],
                                   longitudes=Toronto_Codes['Longitude']
                                  ))
print(toronto_venues.shape)


(826, 7)


In [24]:
## View the top 10 to make sure it is working
toronto_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
1,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
2,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"Riverdale,The Danforth West",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
5,"Riverdale,The Danforth West",43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop
6,"Riverdale,The Danforth West",43.679557,-79.352188,MenEssentials,43.67782,-79.351265,Cosmetics Shop
7,"Riverdale,The Danforth West",43.679557,-79.352188,Messini Authentic Gyros,43.677827,-79.350569,Greek Restaurant
8,"Riverdale,The Danforth West",43.679557,-79.352188,Mezes,43.677962,-79.350196,Greek Restaurant
9,"Riverdale,The Danforth West",43.679557,-79.352188,Cafe Fiorentina,43.677743,-79.350115,Italian Restaurant


In [56]:
# Show the count of venues for each neighborhood; 30 is the maximum number of venues per location request (LIMIT)
toronto_venues.groupby('Neighborhood')['Venue'].count()

Neighborhood
Adelaide,Richmond,King                                                                                  30
Berczy Park                                                                                             30
Business Reply Mail Processing Centre 969 Eastern                                                       14
Central Bay Street                                                                                      30
Christie                                                                                                15
Church and Wellesley                                                                                    30
Commerce Court,Victoria Hotel                                                                           30
Davisville                                                                                              30
Davisville North                                                                                        10
Design Exchange,Toronto 

In [57]:
## Explore all the venues returned for the neighborhood of St. James Town
toronto_venues[toronto_venues.Neighborhood=='St. James Town']


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
289,St. James Town,43.651494,-79.375418,Terroni,43.650927,-79.375602,Italian Restaurant
290,St. James Town,43.651494,-79.375418,Gyu-Kaku Japanese BBQ,43.651422,-79.375047,Japanese Restaurant
291,St. James Town,43.651494,-79.375418,GEORGE Restaurant,43.653346,-79.374445,Restaurant
292,St. James Town,43.651494,-79.375418,Crepe TO,43.650063,-79.374587,Creperie
293,St. James Town,43.651494,-79.375418,Fahrenheit Coffee,43.652384,-79.372719,Coffee Shop
294,St. James Town,43.651494,-79.375418,Triple A Bar (AAA),43.651658,-79.37272,BBQ Joint
295,St. James Town,43.651494,-79.375418,Pearl Diver,43.651481,-79.3736,Gastropub
296,St. James Town,43.651494,-79.375418,Hogtown Smoke,43.649287,-79.374689,Food Truck
297,St. James Town,43.651494,-79.375418,Mystic Muffin,43.652484,-79.372655,Middle Eastern Restaurant
298,St. James Town,43.651494,-79.375418,St James Anglican Cathedral,43.65011,-79.374292,Church


In [58]:
## Explore the dataframe

print('The dataframe has {} neighborhoods with a total of {} unique venue names spanning {} venue categories.'.format(
        len(toronto_venues['Neighborhood'].unique()), len(toronto_venues['Venue'].unique()),
        len(toronto_venues['Venue Category'].unique())
    )
)


The dataframe has 38 neighborhoods with a total of 674 unique venue names spanning 187 venue categories.


## Next Steps
### With the data pulled, the next step is to explore the venues in each neighborhood. We first find the most common restaurant-related venue categories in each neighborhood, then build a frequency distribution of those categories per neighborhood. Clustering the neighborhoods on these mean frequencies groups together neighborhoods with similar restaurant mixes and separates those with different mixes. We can then profile each cluster to understand its character. Finally, for the restaurant type we want to open (Italian, in this case), we look for the cluster of neighborhoods that has the least Italian competition while still being a hot spot for restaurants overall.
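
The plan above can be sketched on toy data as follows. The neighborhoods `A` and `B`, the venue list, and the `summary` frame are illustrative assumptions rather than Foursquare results; the column name `Venue_Category` mirrors the one used later in the notebook.

```python
import pandas as pd

# Toy data, not real Foursquare results.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Venue_Category': ['Italian Restaurant', 'Greek Restaurant', 'Pub',
                       'Sushi Restaurant', 'Sushi Restaurant', 'Coffee Shop'],
})

# 1. Keep only restaurant categories.
restaurants = venues[venues['Venue_Category'].str.contains('Restaurant')]

# 2. One-hot encode the category and average per neighborhood, which gives
#    the share of each restaurant type among a neighborhood's restaurants.
onehot = pd.get_dummies(restaurants['Venue_Category'], dtype=float)
onehot.insert(0, 'Neighborhood', restaurants['Neighborhood'])
freq = onehot.groupby('Neighborhood').mean()

# 3. Put the Italian share next to the overall restaurant count, the two
#    quantities the final choice weighs against each other.
summary = pd.DataFrame({
    'restaurant_count': restaurants.groupby('Neighborhood').size(),
    'italian_share': freq['Italian Restaurant'],
})
```

In this toy case both neighborhoods have two restaurants, but only `A` already has Italian competition, so `B` would be the more attractive of the two under the stated criterion.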

In [28]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Toronto_Codes['Latitude'], Toronto_Codes['Longitude'], Toronto_Codes['Borough'], Toronto_Codes['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [59]:
## Remove space from column names
toronto_venues_restaurants = toronto_venues
toronto_venues_restaurants.columns = [c.replace(' ', '_') for c in toronto_venues_restaurants.columns]
toronto_venues_restaurants.shape

(826, 7)

In [60]:
## Extract only Categories that are Restaurants
toronto_venues_restaurants2 = toronto_venues_restaurants[toronto_venues_restaurants['Venue_Category'].str.contains("Restaurant")]
toronto_venues_restaurants2.head(10)

Unnamed: 0,Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category
4,"Riverdale,The Danforth West",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
7,"Riverdale,The Danforth West",43.679557,-79.352188,Messini Authentic Gyros,43.677827,-79.350569,Greek Restaurant
8,"Riverdale,The Danforth West",43.679557,-79.352188,Mezes,43.677962,-79.350196,Greek Restaurant
9,"Riverdale,The Danforth West",43.679557,-79.352188,Cafe Fiorentina,43.677743,-79.350115,Italian Restaurant
10,"Riverdale,The Danforth West",43.679557,-79.352188,Christina's On The Danforth,43.67824,-79.349185,Greek Restaurant
14,"Riverdale,The Danforth West",43.679557,-79.352188,7 Numbers,43.677062,-79.353934,Italian Restaurant
18,"Riverdale,The Danforth West",43.679557,-79.352188,Rikkochez,43.677267,-79.353274,Restaurant
19,"Riverdale,The Danforth West",43.679557,-79.352188,Pan on the Danforth,43.678263,-79.348648,Greek Restaurant
20,"Riverdale,The Danforth West",43.679557,-79.352188,Astoria Shish Kebob House,43.677689,-79.351892,Greek Restaurant
24,"Riverdale,The Danforth West",43.679557,-79.352188,Ouzeri,43.678193,-79.348908,Greek Restaurant


In [61]:
toronto_venues_restaurants2.shape

(200, 7)

In [62]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues_restaurants2[['Venue_Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues_restaurants2['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()



Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Belgian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,...,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
4,"Riverdale,The Danforth West",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,"Riverdale,The Danforth West",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"Riverdale,The Danforth West",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Riverdale,The Danforth West",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10,"Riverdale,The Danforth West",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [149]:
toronto_onehot.shape


(200, 37)

In [64]:
## Mean frequency of each restaurant category per neighborhood
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Belgian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,...,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,"Adelaide,Richmond,King",0.142857,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.142857,0.0
1,Berczy Park,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,...,0.0,0.0,0.125,0.25,0.0,0.0,0.125,0.0,0.0,0.0
2,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,...,0.1,0.1,0.0,0.1,0.1,0.0,0.1,0.0,0.1,0.0
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.125
6,"Commerce Court,Victoria Hotel",0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.428571,0.142857,0.0,0.0,0.0,0.0,0.0,0.0
7,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.111111,0.111111,0.222222,0.0,0.111111,0.0,0.0,0.0
8,Davisville North,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Design Exchange,Toronto Dominion Centre",0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [65]:

toronto_grouped.shape

(34, 37)

In [66]:

## Top 5 venues for each neighborhood
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,Richmond,King----
                 venue  freq
0     Asian Restaurant  0.29
1     Sushi Restaurant  0.14
2     Greek Restaurant  0.14
3   Seafood Restaurant  0.14
4  American Restaurant  0.14


----Berczy Park----
                venue  freq
0  Seafood Restaurant  0.25
1          Restaurant  0.12
2  Belgian Restaurant  0.12
3  Italian Restaurant  0.12
4     Thai Restaurant  0.12


----Business Reply Mail Processing Centre 969 Eastern----
                       venue  freq
0       Fast Food Restaurant   0.5
1                 Restaurant   0.5
2        American Restaurant   0.0
3      Portuguese Restaurant   0.0
4  Latin American Restaurant   0.0


----Central Bay Street----
                           venue  freq
0             Italian Restaurant   0.2
1     Modern European Restaurant   0.1
2  Vegetarian / Vegan Restaurant   0.1
3                Thai Restaurant   0.1
4             Chinese Restaurant   0.1


----Christie----
                       venue  freq
0         Italian 

In [67]:
## Get most common venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [69]:
import numpy as np
num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,"Adelaide,Richmond,King",Asian Restaurant,American Restaurant,Sushi Restaurant,Greek Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Cajun / Creole Restaurant
1,Berczy Park,Seafood Restaurant,Italian Restaurant,Comfort Food Restaurant,Restaurant,French Restaurant,Belgian Restaurant,Thai Restaurant
2,Business Reply Mail Processing Centre 969 Eastern,Fast Food Restaurant,Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Falafel Restaurant,Ethiopian Restaurant
3,Central Bay Street,Italian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Seafood Restaurant,Ramen Restaurant,Portuguese Restaurant
4,Christie,Italian Restaurant,Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
5,Church and Wellesley,Vietnamese Restaurant,Theme Restaurant,Indian Restaurant,Japanese Restaurant,Ethiopian Restaurant,Restaurant,Ramen Restaurant
6,"Commerce Court,Victoria Hotel",Restaurant,American Restaurant,Seafood Restaurant,Japanese Restaurant,New American Restaurant,Caribbean Restaurant,Cajun / Creole Restaurant
7,Davisville,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Restaurant,Seafood Restaurant,Greek Restaurant,Thai Restaurant
8,Davisville North,Asian Restaurant,Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant
9,"Design Exchange,Toronto Dominion Centre",Restaurant,American Restaurant,Japanese Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant
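
The column names above get their suffix from a three-item list with a fallback to `th`, which is fine for seven columns but would produce labels such as `21th` for larger values. A small sketch of a general suffix helper follows; `ordinal` is a hypothetical function introduced here for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 7
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
```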


In [86]:

# Perform clustering
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(init="k-means++", n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check the cluster label generated for each row in the dataframe
labels = kmeans.labels_
print(labels)

[1 1 0 1 0 1 0 1 0 0 1 0 3 1 1 1 1 2 1 1 0 1 1 1 0 1 1 0 1 1 1 0 1 1]
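
The notebook fixes `kclusters = 4` without showing how that value was chosen. One common way to compare candidate values is the silhouette score, sketched below on synthetic data; the three blobs in `X` are an assumption standing in for `toronto_grouped_clustering`, so the resulting `best_k` says nothing about the Toronto data itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for toronto_grouped_clustering: three tight blobs.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.randn(20, 2) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    candidate_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, candidate_labels)

# The k with the highest score separates the points most cleanly.
best_k = max(scores, key=scores.get)
```

Running the same loop on the real frequency table would give a data-driven check on whether four clusters is a reasonable choice.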


In [125]:
# Rename on a new frame so Toronto_Codes keeps its original column names
toronto_merged = Toronto_Codes.rename(columns={'Neighbourhood': 'Neighborhood'})
print(toronto_merged.shape)
toronto_merged.head(4)

(38, 5)


Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"Riverdale,The Danforth West",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923


In [119]:
# add neighborhood in a new frame (copy, so new columns can be added safely)
toronto_merged1 = toronto_grouped[['Neighborhood']].copy()
toronto_merged1.shape

(34, 1)

In [131]:
# add clustering labels
toronto_merged1['Cluster Labels'] = labels.tolist()
toronto_merged1.shape

(34, 2)

In [133]:
# join the cluster labels with the most common venue info for each neighborhood
toronto_merged2 = toronto_merged1.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged2.head(5)

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,"Adelaide,Richmond,King",1,Asian Restaurant,American Restaurant,Sushi Restaurant,Greek Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Cajun / Creole Restaurant
1,Berczy Park,1,Seafood Restaurant,Italian Restaurant,Comfort Food Restaurant,Restaurant,French Restaurant,Belgian Restaurant,Thai Restaurant
2,Business Reply Mail Processing Centre 969 Eastern,0,Fast Food Restaurant,Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Falafel Restaurant,Ethiopian Restaurant
3,Central Bay Street,1,Italian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Seafood Restaurant,Ramen Restaurant,Portuguese Restaurant
4,Christie,0,Italian Restaurant,Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant


In [134]:
toronto_merged2.shape

(34, 9)

In [135]:
# join the Toronto postcode frame to add borough, latitude and longitude for each neighborhood
toronto_merged3 = toronto_merged2.join(toronto_merged.set_index('Neighborhood'), on='Neighborhood')
toronto_merged3.shape

(34, 13)

In [136]:
toronto_merged3.head(34)

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,Postcode,Borough,Latitude,Longitude
0,"Adelaide,Richmond,King",1,Asian Restaurant,American Restaurant,Sushi Restaurant,Greek Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Cajun / Creole Restaurant,M5H,Downtown Toronto,43.650571,-79.384568
1,Berczy Park,1,Seafood Restaurant,Italian Restaurant,Comfort Food Restaurant,Restaurant,French Restaurant,Belgian Restaurant,Thai Restaurant,M5E,Downtown Toronto,43.644771,-79.373306
2,Business Reply Mail Processing Centre 969 Eastern,0,Fast Food Restaurant,Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Falafel Restaurant,Ethiopian Restaurant,M7Y,East Toronto,43.662744,-79.321558
3,Central Bay Street,1,Italian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Seafood Restaurant,Ramen Restaurant,Portuguese Restaurant,M5G,Downtown Toronto,43.657952,-79.387383
4,Christie,0,Italian Restaurant,Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,M6G,Downtown Toronto,43.669542,-79.422564
5,Church and Wellesley,1,Vietnamese Restaurant,Theme Restaurant,Indian Restaurant,Japanese Restaurant,Ethiopian Restaurant,Restaurant,Ramen Restaurant,M4Y,Downtown Toronto,43.66586,-79.38316
6,"Commerce Court,Victoria Hotel",0,Restaurant,American Restaurant,Seafood Restaurant,Japanese Restaurant,New American Restaurant,Caribbean Restaurant,Cajun / Creole Restaurant,M5L,Downtown Toronto,43.648198,-79.379817
7,Davisville,1,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Restaurant,Seafood Restaurant,Greek Restaurant,Thai Restaurant,M4S,Central Toronto,43.704324,-79.38879
8,Davisville North,0,Asian Restaurant,Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,M4P,Central Toronto,43.712751,-79.390197
9,"Design Exchange,Toronto Dominion Centre",0,Restaurant,American Restaurant,Japanese Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,M5K,Downtown Toronto,43.647177,-79.381576


In [137]:
# Visualize Clusters
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged3['Latitude'], toronto_merged3['Longitude'], toronto_merged3['Neighborhood'], toronto_merged3['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [143]:
## Explore First Cluster
toronto_merged3.loc[toronto_merged3['Cluster Labels'] == 0, toronto_merged3.columns[[1] + list(range(0, toronto_merged3.shape[1]))]]

Unnamed: 0,Cluster Labels,Neighborhood,Cluster Labels.1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,Postcode,Borough,Latitude,Longitude
2,0,Business Reply Mail Processing Centre 969 Eastern,0,Fast Food Restaurant,Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Falafel Restaurant,Ethiopian Restaurant,M7Y,East Toronto,43.662744,-79.321558
4,0,Christie,0,Italian Restaurant,Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,M6G,Downtown Toronto,43.669542,-79.422564
6,0,"Commerce Court,Victoria Hotel",0,Restaurant,American Restaurant,Seafood Restaurant,Japanese Restaurant,New American Restaurant,Caribbean Restaurant,Cajun / Creole Restaurant,M5L,Downtown Toronto,43.648198,-79.379817
8,0,Davisville North,0,Asian Restaurant,Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,M4P,Central Toronto,43.712751,-79.390197
9,0,"Design Exchange,Toronto Dominion Centre",0,Restaurant,American Restaurant,Japanese Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,M5K,Downtown Toronto,43.647177,-79.381576
11,0,"Exhibition Place,Parkdale Village,Brockton",0,Italian Restaurant,Caribbean Restaurant,Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,M6K,West Toronto,43.636847,-79.428191
20,0,"Parkdale,Roncesvalles",0,Italian Restaurant,Restaurant,Eastern European Restaurant,Cuban Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,M6R,West Toronto,43.64896,-79.456325
24,0,"St. James Town,Cabbagetown",0,Restaurant,Italian Restaurant,Japanese Restaurant,Caribbean Restaurant,Indian Restaurant,Taiwanese Restaurant,Thai Restaurant,M4X,Downtown Toronto,43.667967,-79.367675
27,0,"Summerhill East,Moore Park",0,Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,M4T,Central Toronto,43.689574,-79.38316
31,0,"Underground city,First Canadian Place",0,Restaurant,American Restaurant,Japanese Restaurant,Seafood Restaurant,Greek Restaurant,Dim Sum Restaurant,Fast Food Restaurant,M5X,Downtown Toronto,43.648429,-79.38228
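
The column selection `[1] + list(range(0, n))` used in the cell above explains why Cluster Labels shows up twice in the table. The sketch below reproduces the index arithmetic on a plain list of column names. The column order is an assumption inferred from the output header, and the list is shortened to four names for illustration.

```python
# Column order inferred from the output header (assumption); shortened for illustration
columns = ['Neighborhood', 'Cluster Labels',
           '1st Most Common Venue', '2nd Most Common Venue']
n = len(columns)

# Selection used in the notebook: position 1 first, then every position again
positions = [1] + list(range(0, n))
selected = [columns[i] for i in positions]
print(selected)

# Variant that lists position 1 first and every other position exactly once
positions_once = [1] + [i for i in range(n) if i != 1]
selected_once = [columns[i] for i in positions_once]
print(selected_once)
```

Passing `positions_once` to `toronto_merged3.columns[...]` in place of the original list should give the same table without the repeated column.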


In [144]:
## Explore Second Cluster
toronto_merged3.loc[toronto_merged3['Cluster Labels'] == 1, toronto_merged3.columns[[1] + list(range(0, toronto_merged3.shape[1]))]]


Unnamed: 0,Cluster Labels,Neighborhood,Cluster Labels.1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,Postcode,Borough,Latitude,Longitude
0,1,"Adelaide,Richmond,King",1,Asian Restaurant,American Restaurant,Sushi Restaurant,Greek Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Cajun / Creole Restaurant,M5H,Downtown Toronto,43.650571,-79.384568
1,1,Berczy Park,1,Seafood Restaurant,Italian Restaurant,Comfort Food Restaurant,Restaurant,French Restaurant,Belgian Restaurant,Thai Restaurant,M5E,Downtown Toronto,43.644771,-79.373306
3,1,Central Bay Street,1,Italian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Seafood Restaurant,Ramen Restaurant,Portuguese Restaurant,M5G,Downtown Toronto,43.657952,-79.387383
5,1,Church and Wellesley,1,Vietnamese Restaurant,Theme Restaurant,Indian Restaurant,Japanese Restaurant,Ethiopian Restaurant,Restaurant,Ramen Restaurant,M4Y,Downtown Toronto,43.66586,-79.38316
7,1,Davisville,1,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Restaurant,Seafood Restaurant,Greek Restaurant,Thai Restaurant,M4S,Central Toronto,43.704324,-79.38879
10,1,"Dovercourt Village,Dufferin",1,Middle Eastern Restaurant,Vietnamese Restaurant,Indian Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,M6H,West Toronto,43.669005,-79.442259
13,1,"Harbourfront East,Toronto Islands,Union Station",1,Italian Restaurant,New American Restaurant,Japanese Restaurant,Chinese Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,M5J,Downtown Toronto,43.640816,-79.381752
14,1,"Harbourfront,Regent Park",1,Mexican Restaurant,Italian Restaurant,French Restaurant,Restaurant,Dim Sum Restaurant,Fast Food Restaurant,Falafel Restaurant,M5A,Downtown Toronto,43.65426,-79.360636
15,1,"High Park,The Junction South",1,Mexican Restaurant,Italian Restaurant,Thai Restaurant,Fast Food Restaurant,Cajun / Creole Restaurant,Dumpling Restaurant,French Restaurant,M6P,West Toronto,43.661608,-79.464763
16,1,"Kensington Market,Grange Park,Chinatown",1,Vietnamese Restaurant,Caribbean Restaurant,Mexican Restaurant,Dumpling Restaurant,Comfort Food Restaurant,Vegetarian / Vegan Restaurant,Belgian Restaurant,M5T,Downtown Toronto,43.653206,-79.400049


In [146]:
## Explore Third Cluster
toronto_merged3.loc[toronto_merged3['Cluster Labels'] == 2, toronto_merged3.columns[[1] + list(range(0, toronto_merged3.shape[1]))]]


Unnamed: 0,Cluster Labels,Neighborhood,Cluster Labels.1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,Postcode,Borough,Latitude,Longitude
17,2,Lawrence Park,2,Dim Sum Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,M4N,Central Toronto,43.72802,-79.38879


In [147]:
## Explore Fourth Cluster
toronto_merged3.loc[toronto_merged3['Cluster Labels'] == 3, toronto_merged3.columns[[1] + list(range(0, toronto_merged3.shape[1]))]]


Unnamed: 0,Cluster Labels,Neighborhood,Cluster Labels.1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,Postcode,Borough,Latitude,Longitude
12,3,"Forest Hill West,Forest Hill North",3,Sushi Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,M5P,Central Toronto,43.696948,-79.411307
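
Since the goal is to place an Italian restaurant, a per-cluster tally of neighborhoods whose most common venue is already an Italian restaurant can help compare the clusters. The sketch below is one way to do that with the standard library. It runs on six rows copied from the tables above, trimmed to three columns, so its counts describe only this sample and not the full clusters.

```python
import csv
import io
from collections import Counter

# Six rows copied from the cluster tables above (sample only, columns trimmed)
rows_text = '''Cluster Labels,Neighborhood,1st Most Common Venue
0,Christie,Italian Restaurant
0,Davisville North,Asian Restaurant
1,Davisville,Italian Restaurant
1,Berczy Park,Seafood Restaurant
2,Lawrence Park,Dim Sum Restaurant
3,"Forest Hill West,Forest Hill North",Sushi Restaurant
'''

# csv handles the quoted neighborhood names that contain commas
rows = list(csv.DictReader(io.StringIO(rows_text)))

# Neighborhoods per cluster, and how many of them list Italian Restaurant first
per_cluster = Counter(int(r['Cluster Labels']) for r in rows)
italian_first = Counter(
    int(r['Cluster Labels']) for r in rows
    if r['1st Most Common Venue'] == 'Italian Restaurant'
)

for cluster in sorted(per_cluster):
    print(cluster, per_cluster[cluster], italian_first[cluster])
```

On the full `toronto_merged3` DataFrame the same tally could be computed with a `groupby` on `Cluster Labels`.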


In [None]:
## End of Analysis