# Project Object:

Say this is you and you live on the west side of the city of Toronto in Canada. You love your neighborhood, mainly because of all the great amenities and other types of venues that exist in the neighborhood, such as gourmet fast food joints, pharmacies, parks, grad schools and so on. Now say you receive a job offer from a great company on the other side of the city with great career prospects. However given the far distance from your current place you unfortunately must move if you decide to accept the offer. Wouldn't it be great if you're able to determine neighborhoods on the other side of the city that are exactly the same as your current neighborhood, and if not perhaps similar neighborhoods that are at least closer to your new job?

# How:

Given a city like the City of Toronto, you will segment it into different neighborhoods using the geographical coordinates of the center of each neighborhood, and then using a combination of location data and machine learning, you will group the neighbourhoods into clusters like this.

# Packages:

In [179]:
!pip install wikipedia
!pip install folium



In [180]:
import pandas as pd
from pandas.io.json import json_normalize
import numpy as np
import wikipedia
import json
import requests
import folium
from bs4 import BeautifulSoup

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print("Hello Capstone Project Course!")

Hello Capstone Project Course!


## Quest 1: Download and Explore Neighborhoods in Toronto

For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas dataframe so that it is in a structured format like the New York dataset.

Once the data is in a structured format, you can replicate the analysis that we did to the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.

### 1.1 Get Canada FSA Table

In [181]:
import pandas as pd
import wikipedia as wp
html = wp.page("List_of_postal_codes_of_Canada:_M").html().encode("UTF-8")
df = pd.read_html(html)[0]  # Try 2nd table first as most pages contain contents table first
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Not assigned


### 1.2 Clean the dataset by:

1. Aggregate the neighborhoods sections
2. Clean up the "Not Assigned" neighborhoods

In [182]:
postal_borough = {}

# Clean up the df by removing not assigned postal code
df = df[df.Borough != 'Not assigned']

# Assign borough to "Not assigned" Neighborhoods
for index, row in df.iterrows():
    if row['Neighbourhood'] == 'Not assigned':
        row['Neighbourhood'] = row['Borough']
        

# Aggregate boroughs and neighborhoods
df = df.groupby(['Postcode','Borough'], sort = False).agg(lambda x: ', '.join(x))

df = df.reset_index()

df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
5,M9A,Queen's Park,Queen's Park
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


### 1.3 Find the Longitude and Latitude of each Borough(FSA)

### 1.3.1 Load the geospatial data

In [183]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


### 1.3.2 Find the matching postal code

In [184]:
result = pd.merge(df, geospatial, left_on='Postcode', right_on='Postal Code')
result.drop(['Postal Code'], axis=1, inplace=True)
result

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
5,M9A,Queen's Park,Queen's Park,43.667856,-79.532242
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937


In [185]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


## 1.4 Create a map of Toronto with neighborhoods superimposed on top.

In [186]:
toronto_lat = 43.6532
toronto_long = -79.3832

map_toronto = folium.Map(location=[toronto_lat, toronto_long], zoom_start = 10)

for lat, lon, borough, neighborhood in zip(result['Latitude'], result['Longitude'], result['Borough'], result['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
                [lat, lon],
                radius=5,
                popup=label,
                color='red',
                fill=True,
                fill_color='#3186cc',
                fill_opacity=0.7,
                parse_html=False).add_to(map_toronto)

map_toronto

## 1.5 Use Foursquare to explore the borough

In [187]:
CLIENT_ID = 'QY2FDPBTB3F3H30TE1HVOW0RL4Q51WISXS1KZCLQM3ELAKXH' # your Foursquare ID
CLIENT_SECRET = 'QDEPFOUNICMCDWFQCH5JARLYBXROEYR3ADXKYMD3L0G2P33C' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: QY2FDPBTB3F3H30TE1HVOW0RL4Q51WISXS1KZCLQM3ELAKXH
CLIENT_SECRET:QDEPFOUNICMCDWFQCH5JARLYBXROEYR3ADXKYMD3L0G2P33C


### 1.5.1 Start with one borough only

In [188]:
i = 10

borough_name = result.loc[i, 'Borough']
borough_fsa = result.loc[i, 'Postcode']
borough_latitude = result.loc[i, 'Latitude'] # neighborhood latitude value
borough_longitude = result.loc[i, 'Longitude'] # neighborhood longitude value

print('Latitude and longitude values of {} {} are {}, {}.'.format(borough_name, borough_fsa,
                                                               borough_latitude, 
                                                               borough_longitude))

Latitude and longitude values of North York M6B are 43.709577, -79.44507259999999.


In [189]:
radius = 5000
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=QY2FDPBTB3F3H30TE1HVOW0RL4Q51WISXS1KZCLQM3ELAKXH&client_secret=QDEPFOUNICMCDWFQCH5JARLYBXROEYR3ADXKYMD3L0G2P33C&v=20180605&ll=43.7532586,-79.3296565&radius=5000&limit=100'

In [190]:
explores = requests.get(url).json()

with open('explores.json', 'w') as f:
    json.dump(explores, f)
    
    
explores

{'meta': {'code': 200, 'requestId': '5e4cc16a0de0d9001b45b99b'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Toronto',
  'headerFullLocation': 'Toronto',
  'headerLocationGranularity': 'city',
  'totalResults': 222,
  'suggestedBounds': {'ne': {'lat': 43.79825864500005,
    'lng': -79.26747389849278},
   'sw': {'lat': 43.70825855499996, 'lng': -79.39183910150722}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b8991cbf964a520814232e3',
       'name': "Allwyn's Bakery",
       'location': {'address': '81 Underhill drive',
        'lat': 43.75984035203157,
        'lng': -79.32471879917513,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.75984035203157,
   

In [191]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [192]:
venues = explores['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


## Quest 2: Explore Neighborhoods in Toronto

In [193]:
def getNearbyVenues(names, fsas, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, fsa, lat, lng in zip(names, fsas, latitudes, longitudes):
        print(name, fsa)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [194]:
toronto_venues = getNearbyVenues(names=result['Borough'],
                                 fsas=result['Postcode'],
                                   latitudes=result['Latitude'],
                                   longitudes=result['Longitude']
                                  )

North York M3A
North York M4A
Downtown Toronto M5A
North York M6A
Downtown Toronto M7A
Queen's Park M9A
Scarborough M1B
North York M3B
East York M4B
Downtown Toronto M5B
North York M6B
Etobicoke M9B
Scarborough M1C
North York M3C
East York M4C
Downtown Toronto M5C
York M6C
Etobicoke M9C
Scarborough M1E
East Toronto M4E
Downtown Toronto M5E
York M6E
Scarborough M1G
East York M4G
Downtown Toronto M5G
Downtown Toronto M6G
Scarborough M1H
North York M2H
North York M3H
East York M4H
Downtown Toronto M5H
West Toronto M6H
Scarborough M1J
North York M2J
North York M3J
East York M4J
Downtown Toronto M5J
West Toronto M6J
Scarborough M1K
North York M2K
North York M3K
East Toronto M4K
Downtown Toronto M5K
West Toronto M6K
Scarborough M1L
North York M2L
North York M3L
East Toronto M4L
Downtown Toronto M5L
North York M6L
North York M9L
Scarborough M1M
North York M2M
North York M3M
East Toronto M4M
North York M5M
York M6M
North York M9M
Scarborough M1N
North York M2N
North York M3N
Central Toronto M4

In [195]:
print(toronto_venues.shape)
toronto_venues.head(10)

(10269, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,North York,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,North York,43.753259,-79.329656,Donalda Golf & Country Club,43.752816,-79.342741,Golf Course
2,North York,43.753259,-79.329656,Starbucks Reserve Bar,43.735764,-79.344156,Coffee Shop
3,North York,43.753259,-79.329656,Galleria Supermarket,43.75352,-79.349518,Supermarket
4,North York,43.753259,-79.329656,Island Foods,43.745866,-79.346035,Caribbean Restaurant
5,North York,43.753259,-79.329656,Darband Restaurant,43.755194,-79.348498,Middle Eastern Restaurant
6,North York,43.753259,-79.329656,Naan & Kabob Halal,43.742903,-79.305148,Middle Eastern Restaurant
7,North York,43.753259,-79.329656,Me Va Me Kitchen Express,43.754957,-79.351894,Mediterranean Restaurant
8,North York,43.753259,-79.329656,Kostas Meat Market,43.760605,-79.30183,Greek Restaurant
9,North York,43.753259,-79.329656,VIA CIBO | italian streetfood,43.754067,-79.357951,Italian Restaurant


In [196]:
toronto_venues.groupby('Borough').count()

Unnamed: 0_level_0,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Central Toronto,900,900,900,900,900,900
Downtown Toronto,1900,1900,1900,1900,1900,1900
East Toronto,500,500,500,500,500,500
East York,500,500,500,500,500,500
Etobicoke,1083,1083,1083,1083,1083,1083
Mississauga,100,100,100,100,100,100
North York,2397,2397,2397,2397,2397,2397
Queen's Park,100,100,100,100,100,100
Scarborough,1689,1689,1689,1689,1689,1689
West Toronto,600,600,600,600,600,600


In [197]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 246 uniques categories.


## Quest 3: Analyze Each Borough

In [198]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Borough'] = toronto_venues['Borough']

fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])

toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Borough,ATM,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [199]:
toronto_onehot.shape

(10269, 247)

In [200]:
toronto_grouped = toronto_onehot.groupby('Borough').mean().reset_index()

toronto_grouped

Unnamed: 0,Borough,ATM,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,Central Toronto,0.0,0.001111,0.0,0.0,0.0,0.0,0.004444,0.003333,0.002222,...,0.001111,0.003333,0.001111,0.0,0.007778,0.0,0.0,0.01,0.0,0.0
1,Downtown Toronto,0.0,0.0,0.0,0.0,0.011053,0.005263,0.018947,0.005789,0.000526,...,0.000526,0.0,0.000526,0.001053,0.0,0.0,0.0,0.006316,0.0,0.0
2,East Toronto,0.0,0.0,0.0,0.0,0.028,0.0,0.0,0.0,0.012,...,0.006,0.002,0.0,0.002,0.0,0.0,0.0,0.0,0.0,0.0
3,East York,0.0,0.008,0.0,0.0,0.022,0.0,0.0,0.0,0.01,...,0.002,0.01,0.0,0.0,0.0,0.0,0.0,0.004,0.0,0.0
4,Etobicoke,0.000923,0.000923,0.001847,0.006464,0.01662,0.0,0.0,0.004617,0.012927,...,0.006464,0.006464,0.0,0.0,0.007387,0.000923,0.0,0.00831,0.0,0.0
5,Mississauga,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.02,...,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,North York,0.0,0.001252,0.003338,0.0,0.010847,0.0,0.0,0.00292,0.008344,...,0.013767,0.006675,0.0,0.0,0.003338,0.002503,0.0,0.001669,0.0,0.0
7,Queen's Park,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
8,Scarborough,0.0,0.001184,0.0,0.0,0.007697,0.0,0.0,0.004737,0.010657,...,0.008881,0.005329,0.0,0.0,0.002368,0.0,0.005329,0.000592,0.003552,0.023683
9,West Toronto,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.008333,0.008333,...,0.003333,0.0,0.008333,0.006667,0.0,0.0,0.0,0.008333,0.0,0.0


In [201]:
print(toronto_grouped.shape)

(11, 247)


### Print each borough's top 5 most common venues

In [202]:
num_top_venue = 5

for borough in toronto_grouped['Borough']:
    print("----" + borough + "----")
    temp = toronto_grouped[toronto_grouped['Borough'] == borough].T.reset_index()
    temp.columns = ['Venue', 'Frequency']
    temp = temp.iloc[1:]
    temp['Frequency'] = temp['Frequency'].astype(float)
    temp = temp.round({'Frequency': 2})
    print(temp.sort_values('Frequency', ascending=False).reset_index(drop=True).head(num_top_venue))
    print('\n')

----Central Toronto----
                Venue  Frequency
0                Café       0.08
1  Italian Restaurant       0.06
2                Park       0.05
3         Coffee Shop       0.05
4       Grocery Store       0.03


----Downtown Toronto----
                Venue  Frequency
0         Coffee Shop       0.07
1                Café       0.06
2               Hotel       0.04
3  Italian Restaurant       0.04
4                Park       0.04


----East Toronto----
         Venue  Frequency
0  Coffee Shop       0.08
1         Park       0.07
2         Café       0.06
3      Brewery       0.04
4       Bakery       0.03


----East York----
                Venue  Frequency
0                Park       0.08
1         Coffee Shop       0.06
2                Café       0.06
3             Brewery       0.04
4  Italian Restaurant       0.04


----Etobicoke----
                Venue  Frequency
0         Coffee Shop       0.08
1              Bakery       0.04
2                Park       0.03
3   

In [203]:
# Sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venue]

In [204]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Borough']

for ind in np.arange(num_top_venue):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

        
borough_venues_sorted = pd.DataFrame(columns=columns)
borough_venues_sorted['Borough'] = toronto_grouped['Borough']

for ind in np.arange(toronto_grouped.shape[0]):
    borough_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)


borough_venues_sorted.head(10)

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Central Toronto,Café,Italian Restaurant,Coffee Shop,Park,Grocery Store
1,Downtown Toronto,Coffee Shop,Café,Park,Italian Restaurant,Hotel
2,East Toronto,Coffee Shop,Park,Café,Brewery,Beach
3,East York,Park,Café,Coffee Shop,Italian Restaurant,Brewery
4,Etobicoke,Coffee Shop,Bakery,Italian Restaurant,Park,Sandwich Place
5,Mississauga,Hotel,Vietnamese Restaurant,Japanese Restaurant,Grocery Store,Liquor Store
6,North York,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
7,Queen's Park,Coffee Shop,Liquor Store,Sandwich Place,Café,Brewery
8,Scarborough,Coffee Shop,Park,Chinese Restaurant,Burger Joint,Pharmacy
9,West Toronto,Café,Park,Bar,Coffee Shop,Italian Restaurant


## Quest 4: Cluster Neighborhoods

### 4.1: Run k-means to cluster the borough into 5 clusters

In [205]:
kcluster = 5

toronto_grouped_clustering = toronto_grouped.drop('Borough', 1)

kmeans = KMeans(n_clusters=kcluster, random_state=0).fit(toronto_grouped_clustering)

kmeans.labels_[0:10]

array([2, 0, 0, 0, 1, 3, 1, 4, 1, 2], dtype=int32)

### 4.2: Create new dataframe that includes the cluster as well as the top 10 venues for each borough

In [206]:
# Add clustering labels
borough_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = result

toronto_merged = toronto_merged.join(borough_venues_sorted.set_index('Borough'), on='Borough')

In [207]:
toronto_merged.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
1,M4A,North York,Victoria Village,43.725882,-79.315572,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,0,Coffee Shop,Café,Park,Italian Restaurant,Hotel
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,0,Coffee Shop,Café,Park,Italian Restaurant,Hotel
5,M9A,Queen's Park,Queen's Park,43.667856,-79.532242,4,Coffee Shop,Liquor Store,Sandwich Place,Café,Brewery
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,1,Coffee Shop,Park,Chinese Restaurant,Burger Joint,Pharmacy
7,M3B,North York,Don Mills North,43.745906,-79.352188,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937,0,Park,Café,Coffee Shop,Italian Restaurant,Brewery
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0,Coffee Shop,Café,Park,Italian Restaurant,Hotel


### 4.3 Visualize the resulting clusters

In [208]:
map_clusters = folium.Map(location = [toronto_lat, toronto_long], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kcluster)
ys = [i + x + (i*x)**2 for i in range(kcluster)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []

for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Borough'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

## Quest 5: Examine Clusters

### Cluster 1

In [209]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Downtown Toronto,0,Coffee Shop,Café,Park,Italian Restaurant,Hotel
4,Downtown Toronto,0,Coffee Shop,Café,Park,Italian Restaurant,Hotel
8,East York,0,Park,Café,Coffee Shop,Italian Restaurant,Brewery
9,Downtown Toronto,0,Coffee Shop,Café,Park,Italian Restaurant,Hotel
14,East York,0,Park,Café,Coffee Shop,Italian Restaurant,Brewery
15,Downtown Toronto,0,Coffee Shop,Café,Park,Italian Restaurant,Hotel
19,East Toronto,0,Coffee Shop,Park,Café,Brewery,Beach
20,Downtown Toronto,0,Coffee Shop,Café,Park,Italian Restaurant,Hotel
23,East York,0,Park,Café,Coffee Shop,Italian Restaurant,Brewery
24,Downtown Toronto,0,Coffee Shop,Café,Park,Italian Restaurant,Hotel


### Cluster 2

In [210]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North York,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
1,North York,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
3,North York,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
6,Scarborough,1,Coffee Shop,Park,Chinese Restaurant,Burger Joint,Pharmacy
7,North York,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
10,North York,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
11,Etobicoke,1,Coffee Shop,Bakery,Italian Restaurant,Park,Sandwich Place
12,Scarborough,1,Coffee Shop,Park,Chinese Restaurant,Burger Joint,Pharmacy
13,North York,1,Coffee Shop,Bakery,Middle Eastern Restaurant,Café,Grocery Store
17,Etobicoke,1,Coffee Shop,Bakery,Italian Restaurant,Park,Sandwich Place


### Cluster 3

In [211]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
16,York,2,Café,Italian Restaurant,Coffee Shop,Bakery,Bar
21,York,2,Café,Italian Restaurant,Coffee Shop,Bakery,Bar
31,West Toronto,2,Café,Park,Bar,Coffee Shop,Italian Restaurant
37,West Toronto,2,Café,Park,Bar,Coffee Shop,Italian Restaurant
43,West Toronto,2,Café,Park,Bar,Coffee Shop,Italian Restaurant
56,York,2,Café,Italian Restaurant,Coffee Shop,Bakery,Bar
61,Central Toronto,2,Café,Italian Restaurant,Coffee Shop,Park,Grocery Store
62,Central Toronto,2,Café,Italian Restaurant,Coffee Shop,Park,Grocery Store
63,York,2,Café,Italian Restaurant,Coffee Shop,Bakery,Bar
64,York,2,Café,Italian Restaurant,Coffee Shop,Bakery,Bar


### Cluster 4

In [212]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
76,Mississauga,3,Hotel,Vietnamese Restaurant,Japanese Restaurant,Grocery Store,Liquor Store


### Cluster 5

In [213]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Queen's Park,4,Coffee Shop,Liquor Store,Sandwich Place,Café,Brewery
