### Segmenting and Clustering Neighborhoods in Toronto

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

1. Start by creating a new Notebook for this assignment.
2. Use the Notebook to build the code that scrapes the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data in the table of postal codes and to transform it into a pandas dataframe like the one shown below:

3. To create the above dataframe:

    3a. The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
    3b. Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
    3c. More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row, with the neighborhoods separated by a comma, as shown in row 11 in the above table.
    3d. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
    3e. Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
    3f. In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.


4. Submit a link to your Notebook on your GitHub repository.
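
Rules 3b to 3d can be tried out on a few hand-written rows before touching the scraped table. The following is only a minimal sketch under stated assumptions: the four rows and the names `raw`, `assigned`, and `clean` are made up for illustration and are not the code used in the cells below.

```python
import pandas as pd

# Toy rows in the shape of the Wikipedia table (illustrative values only).
raw = pd.DataFrame(
    [
        ["M1A", "Not assigned", "Not assigned"],
        ["M5A", "Downtown Toronto", "Harbourfront"],
        ["M5A", "Downtown Toronto", "Regent Park"],
        ["M7A", "Queen's Park", "Not assigned"],
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# 3b: keep only rows that have an assigned borough.
assigned = raw[raw["Borough"] != "Not assigned"].copy()

# 3d: a "Not assigned" neighborhood falls back to the borough name.
missing = assigned["Neighborhood"] == "Not assigned"
assigned.loc[missing, "Neighborhood"] = assigned.loc[missing, "Borough"]

# 3c: one row per postal code, neighborhoods joined with a comma.
clean = (
    assigned.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

print(clean)
print(clean.shape)
```

Applying 3d before 3c keeps the fallback simple, because each row still holds exactly one neighborhood at that point.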

In [2]:
# import libraries 
import numpy as np 
import pandas as pd
import requests 
from bs4 import BeautifulSoup


In [3]:
# target url 

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
url


'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [4]:
# requests will provide us with our target's HTML, 
# and beautifulsoup4 will parse that data

html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "lxml")

In [5]:
# print soup in order to find table
#print(soup.prettify()) 

In [6]:
# find and print table using soup 
table = soup.find('table',class_='wikitable sortable')
# table 

In [7]:
# find headers 
headers = table.find_all('th')
#headers 

In [8]:
# find table rows 
rows = table.find_all('tr')
#rows 

In [9]:
# clean data and create array 
info = []

for i in rows:
    info.append([ii.text.strip() for ii in i.find_all('td')])
info[1:10]  

[['M1A', 'Not assigned', 'Not assigned'],
 ['M2A', 'Not assigned', 'Not assigned'],
 ['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Harbourfront'],
 ['M6A', 'North York', 'Lawrence Heights'],
 ['M6A', 'North York', 'Lawrence Manor'],
 ['M7A', 'Downtown Toronto', "Queen's Park"],
 ['M8A', 'Not assigned', 'Not assigned']]
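
The slice above starts at index 1 because the first table row is a header row: it contains only `th` cells, so collecting its `td` cells yields an empty list. A small sketch on an inline HTML string shows this. The HTML snippet and header names are hypothetical stand-ins for the real page, and the built-in `html.parser` is used here instead of `lxml` only to keep the example dependency-light.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the Wikipedia table (not the real page content).
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable sortable")

# One list of stripped cell texts per <tr>; the header row has no <td> cells.
rows = [[td.text.strip() for td in tr.find_all("td")] for tr in table.find_all("tr")]

# Dropping empty lists removes the header row before building a dataframe.
data = [row for row in rows if row]

print(rows[0])
print(data)
```

Filtering out the empty list at this stage would make the later null-row cleanup unnecessary, although the cells below keep the original approach.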

In [10]:
df = pd.DataFrame(info, columns=['Postal Code', 'Borough', 'Neighbourhood'])
df 

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,,,
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
...,...,...,...
283,M8Z,Etobicoke,Mimico NW
284,M8Z,Etobicoke,The Queensway West
285,M8Z,Etobicoke,Royal York South West
286,M8Z,Etobicoke,South of Bloor


In [11]:
# drop the empty first row (the header row has no <td> cells, so its values are None)
df = df[~df['Postal Code'].isnull()]  
df 

Unnamed: 0,Postal Code,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
...,...,...,...
283,M8Z,Etobicoke,Mimico NW
284,M8Z,Etobicoke,The Queensway West
285,M8Z,Etobicoke,Royal York South West
286,M8Z,Etobicoke,South of Bloor


In [14]:
# exclude rows whose Borough is "Not assigned"
# (boolean filtering avoids an in-place drop on a filtered dataframe, which can trigger SettingWithCopyWarning)
df = df[df['Borough'] != "Not assigned"]

In [15]:
df1 = df.reset_index()

In [16]:
df1.loc[df1['Postal Code']=="M1B"]  


Unnamed: 0,index,Postal Code,Borough,Neighbourhood
7,11,M1B,Scarborough,Rouge
8,12,M1B,Scarborough,Malvern


In [17]:
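# Group by postal code and combine the 'Borough' and 'Neighbourhood' values, separated by commas
# (the two text columns are selected explicitly so the integer 'index' column is not passed to join)
df2 = df1.groupby('Postal Code')[['Borough', 'Neighbourhood']].agg(','.join)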
df2

Unnamed: 0_level_0,Borough,Neighbourhood
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,"Scarborough,Scarborough","Rouge,Malvern"
M1C,"Scarborough,Scarborough,Scarborough","Highland Creek,Rouge Hill,Port Union"
M1E,"Scarborough,Scarborough,Scarborough","Guildwood,Morningside,West Hill"
M1G,Scarborough,Woburn
M1H,Scarborough,Cedarbrae
...,...,...
M9N,York,Weston
M9P,Etobicoke,Westmount
M9R,"Etobicoke,Etobicoke,Etobicoke,Etobicoke","Kingsview Village,Martin Grove Gardens,Richvie..."
M9V,"Etobicoke,Etobicoke,Etobicoke,Etobicoke,Etobic...","Albion Gardens,Beaumond Heights,Humbergate,Jam..."


In [18]:
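# where the neighbourhood is "Not assigned", use the borough name instead
not_assigned = df2['Neighbourhood'] == "Not assigned"
df2.loc[not_assigned, 'Neighbourhood'] = df2.loc[not_assigned, 'Borough']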
df2

Unnamed: 0_level_0,Borough,Neighbourhood
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,"Scarborough,Scarborough","Rouge,Malvern"
M1C,"Scarborough,Scarborough,Scarborough","Highland Creek,Rouge Hill,Port Union"
M1E,"Scarborough,Scarborough,Scarborough","Guildwood,Morningside,West Hill"
M1G,Scarborough,Woburn
M1H,Scarborough,Cedarbrae
...,...,...
M9N,York,Weston
M9P,Etobicoke,Westmount
M9R,"Etobicoke,Etobicoke,Etobicoke,Etobicoke","Kingsview Village,Martin Grove Gardens,Richvie..."
M9V,"Etobicoke,Etobicoke,Etobicoke,Etobicoke,Etobic...","Albion Gardens,Beaumond Heights,Humbergate,Jam..."


In [19]:
df3 = df2.reset_index()
df3 

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,"Scarborough,Scarborough","Rouge,Malvern"
1,M1C,"Scarborough,Scarborough,Scarborough","Highland Creek,Rouge Hill,Port Union"
2,M1E,"Scarborough,Scarborough,Scarborough","Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,"Etobicoke,Etobicoke,Etobicoke,Etobicoke","Kingsview Village,Martin Grove Gardens,Richvie..."
101,M9V,"Etobicoke,Etobicoke,Etobicoke,Etobicoke,Etobic...","Albion Gardens,Beaumond Heights,Humbergate,Jam..."


In [20]:
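# remove borough names that are repeated within a postal code
# note: the [{}\s] pattern also strips whitespace, so "Downtown Toronto" becomes "DowntownToronto"
# regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default
df3['Borough'] = (df3['Borough'].str.replace(r'nan|[{}\s]', '', regex=True)
                  .str.split(',').apply(set).str.join(',')
                  .str.strip(',').str.replace(r',{2,}', ',', regex=True))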


In [21]:
df3.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [22]:
df3.shape 

(103, 3)

# Get the latitude and the longitude coordinates of each neighborhood. 
# Once you are able to create the above dataframe, submit a link to the new Notebook on your Github repository. (2 marks)

In [23]:
import wget 
url_get = wget.download( 'http://cocl.us/Geospatial_data/toronto_coordinates.csv') 
coordinates = pd.read_csv(url_get)
coordinates.head() 

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [24]:
# merge two data frames 
df4 = df3.set_index('Postal Code') 
coordinates1 = coordinates.set_index('Postal Code')
df5 = pd.concat([df4, coordinates1], axis=1, join='inner')
df5.index.name = 'PostalCode'
df5.reset_index(inplace=True)


In [25]:
df5.shape

(103, 5)
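
The same join can also be written with `DataFrame.merge`, which states the key column explicitly and can check that each postal code appears at most once on both sides. This is a hedged sketch on miniature tables, not a replacement for the cell above. The names `boroughs`, `coords`, and `merged` are hypothetical, and the values are copied from the outputs shown earlier.

```python
import pandas as pd

# Miniature versions of the two tables (a few rows copied from the outputs above).
boroughs = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighbourhood": ["Rouge,Malvern", "Highland Creek,Rouge Hill,Port Union"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

# Inner join on the postal code; validate raises if a key is duplicated on either side.
merged = boroughs.merge(coords, on="Postal Code", how="inner", validate="one_to_one")
merged = merged.rename(columns={"Postal Code": "PostalCode"})

print(merged.shape)
```

With an inner join, postal codes that are missing from either table (here M1E) are dropped, so comparing the shape before and after the merge is a quick way to spot unmatched rows.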

# Part 3: Explore and cluster the neighborhoods in Toronto

In [26]:
!conda install -c conda-forge geopy --yes

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\sboun\Anaconda3

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2019.11.28 |       hecc5488_0         182 KB  conda-forge
    certifi-2019.11.28         |           py37_0         148 KB  conda-forge
    conda-4.8.0                |           py37_1         3.0 MB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.4 MB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy  



  current version: 4.7.12
  latest version: 4.8.1

Please update conda by running

    $ conda update -n base -c defaults conda




In [28]:
 conda update -n base -c defaults conda

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... 
  - defaults::jupyter_console-5.2.0-py37_1, defaults::prompt_toolkit-3.0.2-py_0
  - defaults::jupyter_console-6.0.0-py37_0, defaults::prompt_toolkit-2.0.10-py_0done

## Package Plan ##

  environment location: C:\Users\sboun\Anaconda3

  added / updated specs:
    - conda


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-3.2.0               |           py37_0         885 KB
    anaconda-project-0.8.4     |             py_0         215 KB
    argh-0.26.2                |           py37_0          36 KB
    asn1crypto-1.2.0           |           py37_0         162 KB
    astroid-2.3.3              |           py37_0         292 KB
    astropy-4.0                |   py37he774522_0         7.2 MB
    attrs-19.3.0               |             py_0          39 KB
    autopep8-1.4.4        


In [26]:
from geopy.geocoders import Nominatim
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

In [36]:
# find coordinates of Toronto 
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="torontoN")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [37]:
# map Toronto with its neighbourhoods
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, long, post, borough, neigh in zip(df5['Latitude'], df5['Longitude'], df5['PostalCode'], df5['Borough'], df5['Neighbourhood']):
    label = "{} ({}): {}".format(borough, post, neigh)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(map_toronto) 
    
map_toronto 

In [34]:
# Keep only boroughs whose name contains "Toronto"
# (names have no spaces because whitespace was stripped from Borough earlier)
dff = ['EastToronto', 'CentralToronto', 'DowntownToronto', 'WestToronto'] 
df6 = df5[df5['Borough'].isin(dff)].reset_index(drop=True)
df6

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,EastToronto,The Beaches,43.676357,-79.293031
1,M4K,EastToronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,EastToronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,EastToronto,Studio District,43.659526,-79.340923
4,M4N,CentralToronto,Lawrence Park,43.72802,-79.38879
5,M4P,CentralToronto,Davisville North,43.712751,-79.390197
6,M4R,CentralToronto,North Toronto West,43.715383,-79.405678
7,M4S,CentralToronto,Davisville,43.704324,-79.38879
8,M4T,CentralToronto,"Moore Park,Summerhill East",43.689574,-79.38316
9,M4V,CentralToronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049


In [35]:
# map the boroughs selected above
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, long, post, borough, neigh in zip(df6['Latitude'], df6['Longitude'], df6['PostalCode'], df6['Borough'], df6['Neighbourhood']):
    label = "{} ({}): {}".format(borough, post, neigh)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(map_toronto)
    
map_toronto

In [70]:
# Using the Foursquare API to explore the boroughs
# Credentials are read from environment variables instead of being hard-coded,
# so the secret is not stored in (or printed by) the notebook.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumptions; use your own.
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20190330' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


In [98]:
radius = 500
LIMIT = 100

venues = []

for lat, long, post, borough, neighborhood in zip(df6['Latitude'], df6['Longitude'], df6['PostalCode'],df6['Borough'], df6['Neighbourhood']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            post, 
            borough,
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
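
The loop above indexes straight into the JSON response, so a response with no groups or a venue with an empty category list would raise an error. The sketch below shows one way to pull out the same fields more defensively. `extract_venues` is a hypothetical helper, and the sample payload is hand-written in the shape the loop expects, not real API output.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        # Some venues may have no category; use None instead of failing.
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written payload mirroring the keys used in the loop above.
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Glen Manor Ravine",
                           "location": {"lat": 43.676821, "lng": -79.293942},
                           "categories": [{"name": "Trail"}]}},
                {"venue": {"name": "Example Venue",
                           "location": {"lat": 43.0, "lng": -79.0},
                           "categories": []}},
            ]
        }]
    }
}

print(extract_venues(sample))
```

Keeping the parsing in a separate function also makes it possible to test it without calling the API.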

In [99]:
venues_df = pd.DataFrame(venues)
venues_df.columns = ['PostalCode', 'Borough', 'Neighborhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.shape)
venues_df.head() 

(1709, 9)


Unnamed: 0,PostalCode,Borough,Neighborhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,M4E,EastToronto,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,M4E,EastToronto,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,M4E,EastToronto,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,M4E,EastToronto,The Beaches,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
4,M4E,EastToronto,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood


In [100]:
# group venues 
venues_df.groupby(['PostalCode', 'Borough', 'Neighborhood'])['VenueName'].count() 

PostalCode  Borough          Neighborhood                                                                                        
M4E         EastToronto      The Beaches                                                                                               5
M4K         EastToronto      The Danforth West,Riverdale                                                                              42
M4L         EastToronto      The Beaches West,India Bazaar                                                                            17
M4M         EastToronto      Studio District                                                                                          42
M4N         CentralToronto   Lawrence Park                                                                                             4
M4P         CentralToronto   Davisville North                                                                                          9
M4R         CentralToronto   North Toronto West 

In [101]:
len(venues_df['VenueCategory'].unique()) 

236

In [102]:
# Analyze venues 


toronto_central_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add postal, borough and neighborhood column back to dataframe
toronto_central_onehot['PostalCode'] = venues_df['PostalCode'] 
toronto_central_onehot['Borough'] = venues_df['Borough'] 
toronto_central_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move postal, borough and neighborhood column to the first column
fixed_columns = list(toronto_central_onehot.columns[-3:]) + list(toronto_central_onehot.columns[:-3])
toronto_central_onehot = toronto_central_onehot[fixed_columns]

print(toronto_central_onehot.shape)
toronto_central_onehot.head()

(1709, 239)


Unnamed: 0,PostalCode,Borough,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,M4E,EastToronto,The Beaches,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,M4E,EastToronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M4E,EastToronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4E,EastToronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4E,EastToronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [103]:
# Get frequency 
toronto_central_venues_freq = toronto_central_onehot.groupby(['PostalCode', 'Borough', 'Neighborhoods']).mean().reset_index()
print(toronto_central_venues_freq.shape)
toronto_central_venues_freq.head()

(39, 239)


Unnamed: 0,PostalCode,Borough,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,M4E,EastToronto,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,EastToronto,"The Danforth West,Riverdale",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381
2,M4L,EastToronto,"The Beaches West,India Bazaar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,EastToronto,Studio District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381
4,M4N,CentralToronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
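
The one-hot and frequency steps are easier to follow on a handful of rows. In this small sketch the venue rows are made up for illustration. Each venue becomes a row of 0/1 indicators, and the per-postal-code mean of those indicators is the share of venues in each category.

```python
import pandas as pd

# Made-up venue rows: two venues in M4E, four in M4K.
venues = pd.DataFrame({
    "PostalCode": ["M4E", "M4E", "M4K", "M4K", "M4K", "M4K"],
    "VenueCategory": ["Trail", "Pub", "Coffee Shop", "Coffee Shop", "Pub", "Bakery"],
})

# One indicator column per category (dtype=int gives 0/1 rather than booleans).
onehot = pd.get_dummies(venues["VenueCategory"], dtype=int)
onehot.insert(0, "PostalCode", venues["PostalCode"])

# The mean of the indicators is the frequency of each category per postal code.
freq = onehot.groupby("PostalCode").mean()

# Most frequent category per postal code (ties resolve to the first column).
most_common = freq.idxmax(axis=1)

print(freq)
print(most_common)
```

Each row of `freq` sums to 1, which is a quick sanity check that no venue was lost between the one-hot table and the grouped table.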


In [104]:
# Get Top 10 
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['PostalCode', 'Borough', 'Neighborhoods']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond the 3rd have no entry in indicators and use the 'th' suffix
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['PostalCode'] = toronto_central_venues_freq['PostalCode']
neighborhoods_venues_sorted['Borough'] = toronto_central_venues_freq['Borough']
neighborhoods_venues_sorted['Neighborhoods'] = toronto_central_venues_freq['Neighborhoods']

for ind in np.arange(toronto_central_venues_freq.shape[0]):
    row_categories = toronto_central_venues_freq.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
neighborhoods_venues_sorted

Unnamed: 0,PostalCode,Borough,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,M5V,DowntownToronto,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Lounge,Airport Service,Airport Terminal,Rental Car Location,Airport,Airport Food Court,Airport Gate,Bar,Sculpture Garden,Harbor / Marina
32,M6J,WestToronto,"Little Portugal,Trinity",Bar,Restaurant,Asian Restaurant,Café,Vietnamese Restaurant,Coffee Shop,Men's Store,Pizza Place,Diner,Cocktail Bar
3,M4M,EastToronto,Studio District,Café,Coffee Shop,Bakery,Italian Restaurant,Gastropub,American Restaurant,Brewery,Yoga Studio,Latin American Restaurant,Fish Market
33,M6K,WestToronto,"Brockton,Exhibition Place,Parkdale Village",Café,Coffee Shop,Breakfast Spot,Yoga Studio,Stadium,Burrito Place,Restaurant,Climbing Gym,Gym,Bakery
36,M6S,WestToronto,"Runnymede,Swansea",Café,Coffee Shop,Pizza Place,Italian Restaurant,Sushi Restaurant,Butcher,Bar,Indie Movie Theater,Food,Pub
25,M5S,DowntownToronto,"Harbord,University of Toronto",Café,Restaurant,Bookstore,Japanese Restaurant,Italian Restaurant,Bar,Sandwich Place,Bakery,Noodle House,French Restaurant
24,M5R,CentralToronto,"The Annex,North Midtown,Yorkville",Café,Sandwich Place,Coffee Shop,Liquor Store,History Museum,Indian Restaurant,Metro Station,Pub,BBQ Joint,Middle Eastern Restaurant
26,M5T,DowntownToronto,"Chinatown,Grange Park,Kensington Market",Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Dumpling Restaurant,Chinese Restaurant,Mexican Restaurant,Bakery,Bar,Dessert Shop
6,M4R,CentralToronto,North Toronto West,Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Italian Restaurant,Salon / Barbershop,Restaurant,Rental Car Location,Park,Mexican Restaurant
19,M5J,DowntownToronto,"Harbourfront East,Toronto Islands,Union Station",Coffee Shop,Aquarium,Hotel,Café,Italian Restaurant,Fried Chicken Joint,Brewery,Scenic Lookout,Restaurant,Pizza Place


In [107]:
kclusters = 4

# keep only the numeric frequency columns for clustering
toronto_central_venues_freq_clustering = toronto_central_venues_freq.drop(columns=['PostalCode', 'Borough', 'Neighborhoods'])

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_central_venues_freq_clustering)

# work on a copy so that df6 itself is not modified
# (assigning labels by position assumes df6 and the frequency table list the same postal codes in the same order)
toronto_central_clustered_df = df6.copy()
toronto_central_clustered_df['Cluster'] = kmeans.labels_
toronto_central_clustered_df = toronto_central_clustered_df.join(neighborhoods_venues_sorted.drop(columns=['Borough', 'Neighborhoods']).set_index('PostalCode'), on='PostalCode')
toronto_central_clustered_df.sort_values(['Cluster'] + freqColumns, inplace=True)
toronto_central_clustered_df

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,M5V,DowntownToronto,"CN Tower,Bathurst Quay,Island airport,Harbourf...",43.628947,-79.39442,0,Airport Lounge,Airport Service,Airport Terminal,Rental Car Location,Airport,Airport Food Court,Airport Gate,Bar,Sculpture Garden,Harbor / Marina
32,M6J,WestToronto,"Little Portugal,Trinity",43.647927,-79.41975,0,Bar,Restaurant,Asian Restaurant,Café,Vietnamese Restaurant,Coffee Shop,Men's Store,Pizza Place,Diner,Cocktail Bar
3,M4M,EastToronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Bakery,Italian Restaurant,Gastropub,American Restaurant,Brewery,Yoga Studio,Latin American Restaurant,Fish Market
33,M6K,WestToronto,"Brockton,Exhibition Place,Parkdale Village",43.636847,-79.428191,0,Café,Coffee Shop,Breakfast Spot,Yoga Studio,Stadium,Burrito Place,Restaurant,Climbing Gym,Gym,Bakery
36,M6S,WestToronto,"Runnymede,Swansea",43.651571,-79.48445,0,Café,Coffee Shop,Pizza Place,Italian Restaurant,Sushi Restaurant,Butcher,Bar,Indie Movie Theater,Food,Pub
25,M5S,DowntownToronto,"Harbord,University of Toronto",43.662696,-79.400049,0,Café,Restaurant,Bookstore,Japanese Restaurant,Italian Restaurant,Bar,Sandwich Place,Bakery,Noodle House,French Restaurant
24,M5R,CentralToronto,"The Annex,North Midtown,Yorkville",43.67271,-79.405678,0,Café,Sandwich Place,Coffee Shop,Liquor Store,History Museum,Indian Restaurant,Metro Station,Pub,BBQ Joint,Middle Eastern Restaurant
26,M5T,DowntownToronto,"Chinatown,Grange Park,Kensington Market",43.653206,-79.400049,0,Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Dumpling Restaurant,Chinese Restaurant,Mexican Restaurant,Bakery,Bar,Dessert Shop
6,M4R,CentralToronto,North Toronto West,43.715383,-79.405678,0,Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Italian Restaurant,Salon / Barbershop,Restaurant,Rental Car Location,Park,Mexican Restaurant
19,M5J,DowntownToronto,"Harbourfront East,Toronto Islands,Union Station",43.640816,-79.381752,0,Coffee Shop,Aquarium,Hotel,Café,Italian Restaurant,Fried Chicken Joint,Brewery,Scenic Lookout,Restaurant,Pizza Place
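
Assigning `kmeans.labels_` by position only works when the area table and the frequency table contain the same postal codes in the same order. A possible alternative, sketched here on made-up numbers, is to index the labels by postal code and join on that key. The frequency values, the extra code `M4X`, and the variable names are assumptions for illustration, and `n_init=10` is set explicitly only to keep the result independent of the scikit-learn default.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy frequency table indexed by postal code (made-up numbers).
freq = pd.DataFrame(
    {"Coffee Shop": [0.9, 0.8, 0.1, 0.0], "Park": [0.1, 0.2, 0.9, 1.0]},
    index=pd.Index(["M4E", "M4K", "M4N", "M4P"], name="PostalCode"),
)

kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(freq)

# Attach each label to its postal code instead of relying on row position.
labels = pd.Series(kmeans.labels_, index=freq.index, name="Cluster")

# Area table in a different order, with one postal code (M4X) that had no venues.
areas = pd.DataFrame({"PostalCode": ["M4P", "M4E", "M4X", "M4K", "M4N"]})
areas = areas.join(labels, on="PostalCode")

print(areas)
```

Postal codes without venues end up with a missing cluster value rather than shifting the labels of the rows after them. The cluster numbers themselves are arbitrary, so only the grouping is meaningful.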


In [106]:
# create map of the clusters
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster in zip(toronto_central_clustered_df['Latitude'], toronto_central_clustered_df['Longitude'], toronto_central_clustered_df['PostalCode'], toronto_central_clustered_df['Borough'], toronto_central_clustered_df['Neighbourhood'], toronto_central_clustered_df['Cluster']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters