# Battle of the Neighborhoods

## IBM Coursera Capstone Project

##### In this particular notebook we are going to be scraping data for neighborhoods in Canada.

We will perform the following steps:
    
    1. Get data from wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
    2. Convert the wiki table into a pandas data frame
    3. ignore the postcodes which do not have a borough assigned
    4. for neighborhoods which are not assigned, mark them as same as the borough name
    5. for same postcodes with multiple neighborhoods, we will keep the neighborhoods as a comma separated list instead of multiple rows
    6. view the panda dataframe created
    7. write out the number of rows

In [58]:
import numpy as np
import pandas as panda
import requests
from bs4 import BeautifulSoup
from collections import defaultdict
import pandas as panda
import folium
from matplotlib import pyplot as plot
from itertools import chain
from sklearn.cluster import KMeans
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [2]:
wiki_link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wiki_data_as_string = requests.get(wiki_link).text

In [3]:
soup = BeautifulSoup(wiki_data_as_string,'lxml')

We inspected the wikipedia page detailing Toronto neighborhoods using chrome developer tools. We found out that we are lookign for a table with the class details as 
#### class="wikitable sortable jquery-tablesorter"

In [4]:
toronto_table = soup.find('table',{'class':'wikitable sortable'})

Maintain a running list for the postal codes, boroughs and neighborhoods found.
We will add them to a dictionary , for the simple reason of ease of conversion to panda dataframe

In [5]:
toronto_neighborhood = defaultdict(list)

for row in toronto_table.findAll("tr"):
    cells = row.findAll(["td"])
    if cells:
        cell_text = [i.text.strip() for i in cells]
        postcode = cell_text[0]
        borough = cell_text[1]
        neighborhood = cell_text[2]
        
        if borough and borough.lower()!='not assigned':
            toronto_neighborhood['postcode'].append(postcode)
            toronto_neighborhood['borough'].append(borough)
            toronto_neighborhood['neighborhood'].append(neighborhood if neighborhood.lower()!='not assigned' else borough)

    

In [6]:
toronto_neighborhood= panda.DataFrame(toronto_neighborhood)


In order to fulfill the requirement of having same borough with multiple neighborhoods appearing as a comma separated
list , we will perform the following steps:
    
    1. Groupby borough
    2. aggregate on the column neighborhood
    3. aggregation function returns comma separated values
    4. merge the aggregated table with original table
    5. remove duplicate columns and duplicate rows

In [7]:
def combine_all_neighborhoods(x):
    hoods=[]
    hoods.extend(x)
    return ','.join(hoods)

In [8]:
grouped_by_borough = toronto_neighborhood.groupby(['borough']).agg({'neighborhood':combine_all_neighborhoods}).reset_index()
grouped_by_borough.head()

Unnamed: 0,borough,neighborhood
0,Central Toronto,"Lawrence Park,Roselawn,Davisville North,Forest..."
1,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri..."
2,East Toronto,"The Beaches,The Danforth West,Riverdale,The Be..."
3,East York,"Woodbine Gardens,Parkview Hill,Woodbine Height..."
4,Etobicoke,"Islington Avenue,Cloverdale,Islington,Martin G..."


In [9]:
grouped_by_borough.shape, toronto_neighborhood.shape

((11, 2), (212, 3))

In [10]:
temp = panda.merge(toronto_neighborhood,grouped_by_borough,how='left', on ='borough')
temp.drop('neighborhood_x', axis=1,inplace=True)
temp.rename(columns = {'postcode':'postcode'.title(),'borough':'borough'.title(),'neighborhood_y':'neighborhood'.title()}, inplace = True)
temp

Unnamed: 0,Postcode,Borough,Neighborhood
0,M3A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La..."
1,M4A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La..."
2,M5A,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri..."
3,M5A,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri..."
4,M6A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La..."
5,M6A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La..."
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,"Islington Avenue,Cloverdale,Islington,Martin G..."
8,M1B,Scarborough,"Rouge,Malvern,Highland Creek,Rouge Hill,Port U..."
9,M1B,Scarborough,"Rouge,Malvern,Highland Creek,Rouge Hill,Port U..."


In [11]:
temp.drop_duplicates(inplace=True)
temp.head(20)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M3A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La..."
1,M4A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La..."
2,M5A,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri..."
4,M6A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La..."
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,"Islington Avenue,Cloverdale,Islington,Martin G..."
8,M1B,Scarborough,"Rouge,Malvern,Highland Creek,Rouge Hill,Port U..."
10,M3B,North York,"Parkwoods,Victoria Village,Lawrence Heights,La..."
11,M4B,East York,"Woodbine Gardens,Parkview Hill,Woodbine Height..."
13,M5B,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri..."


In [12]:
temp.isnull().any()

Postcode        False
Borough         False
Neighborhood    False
dtype: bool

In [13]:
temp.shape

(103, 3)

In order to add the latitude and longitude co-ordinates with already webscraped data that we have collected, we are going to use the geospatial co ordinates file provided.

1. Load the excel using pandas
2. Join on postal code with already created table

In [14]:
geospatial_coordinates  = 'Geospatial_Coordinates.csv'
geo_data = panda.read_csv(geospatial_coordinates)
geo_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [15]:
geo_data.rename(columns={'Postal Code':'Postcode'}, inplace=True)
geo_data.head(1)

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353


In [16]:
toronto_neighborhood_with_coordinates= panda.merge(temp,geo_data, how='left', on='Postcode')
toronto_neighborhood_with_coordinates.shape, temp.shape

((103, 5), (103, 3))

In [17]:
toronto_neighborhood_with_coordinates.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La...",43.753259,-79.329656
1,M4A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La...",43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636
3,M6A,North York,"Parkwoods,Victoria Village,Lawrence Heights,La...",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494


Lets focus on Toronto boroughs for now

In [18]:
data = toronto_neighborhood_with_coordinates[toronto_neighborhood_with_coordinates.Borough.str.contains('Toronto')]
data.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636
9,M5B,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.657162,-79.378937
15,M5C,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.651494,-79.375418
19,M4E,East Toronto,"The Beaches,The Danforth West,Riverdale,The Be...",43.676357,-79.293031
20,M5E,Downtown Toronto,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.644771,-79.373306


In [19]:
##lets check for null /empty /NAN values

data.isnull().any()

Postcode        False
Borough         False
Neighborhood    False
Latitude        False
Longitude       False
dtype: bool

Lets plot the locations in a world map for visualization

In [20]:
# create map of Toronto using latitude and longitude values
map_newyork = folium.Map(location=[43.6532, -79.3832], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(data['Latitude'], data['Longitude'], data['Borough'], data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
#         parse_html=False
    ).add_to(map_newyork)  
    
map_newyork

Lets fetch the details of top venues around each location using our foursquare api

In [21]:
foursquare_url = 'https://api.foursquare.com/v2/venues/explore?&client_id={0}&client_secret={1}&v={2}&ll={3},{4}&radius={5}&limit={6}'

CLIENT_ID = '4MXIJYYVXBKI4OR3XFDZH0IP1IWKFDUKEU3YYINP23D410XM' # your Foursquare ID
CLIENT_SECRET = 'OVVCVA5XAKRYE4VQWYZ0MXLRAA1ONJOE4DUBRNGEBALXSOSH' # your Foursquare Secret



In [22]:
def getNearbyVenues(names, latitudes, longitudes,url = foursquare_url, radius=500, limit =100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
#         print(name)
            
        # create the API request URL
        url = foursquare_url.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            '20180605', #version 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

   
    return venues_list

In [23]:
venues_list = getNearbyVenues(
    names = data['Neighborhood'],
    latitudes = data['Latitude'],
    longitudes = data['Longitude']
)
venues_list

[[('Harbourfront,Regent Park,Ryerson,Garden District,St. James Town,Berczy Park,Central Bay Street,Christie,Adelaide,King,Richmond,Harbourfront East,Toronto Islands,Union Station,Design Exchange,Toronto Dominion Centre,Commerce Court,Victoria Hotel,Harbord,University of Toronto,Chinatown,Grange Park,Kensington Market,CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara,Rosedale,Stn A PO Boxes 25 The Esplanade,Cabbagetown,St. James Town,First Canadian Place,Underground city,Church and Wellesley',
   43.6542599,
   -79.3606359,
   'Roselle Desserts',
   43.653446723052674,
   -79.3620167174383,
   'Bakery'),
  ('Harbourfront,Regent Park,Ryerson,Garden District,St. James Town,Berczy Park,Central Bay Street,Christie,Adelaide,King,Richmond,Harbourfront East,Toronto Islands,Union Station,Design Exchange,Toronto Dominion Centre,Commerce Court,Victoria Hotel,Harbord,University of Toronto,Chinatown,Grange Park,Kensington Market,CN Tower,Bathurst Q

In [24]:
nearby_data = panda.DataFrame(list(chain.from_iterable(venues_list)))
nearby_data.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
print('Total venues details downloaded:',nearby_data.shape)
nearby_data.head()

Total venues details downloaded: (1699, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [25]:
nearby_data.groupby(['Neighborhood']).head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
248,"The Beaches,The Danforth West,Riverdale,The Be...",43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
249,"The Beaches,The Danforth West,Riverdale,The Be...",43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
250,"The Beaches,The Danforth West,Riverdale,The Be...",43.676357,-79.293031,Ava's Appletree,43.6801,-79.2907,Baby Store
251,"The Beaches,The Danforth West,Riverdale,The Be...",43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
504,"Dovercourt Village,Dufferin,Little Portugal,Tr...",43.669005,-79.442259,The Greater Good Bar,43.669409,-79.439267,Bar


In [40]:
nearby_data.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Dovercourt Village,Dufferin,Little Portugal,Trinity,Brockton,Exhibition Place,Parkdale Village,High Park,The Junction South,Parkdale,Roncesvalles,Runnymede,Swansea",181,181,181,181,181,181
"Harbourfront,Regent Park,Ryerson,Garden District,St. James Town,Berczy Park,Central Bay Street,Christie,Adelaide,King,Richmond,Harbourfront East,Toronto Islands,Union Station,Design Exchange,Toronto Dominion Centre,Commerce Court,Victoria Hotel,Harbord,University of Toronto,Chinatown,Grange Park,Kensington Market,CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara,Rosedale,Stn A PO Boxes 25 The Esplanade,Cabbagetown,St. James Town,First Canadian Place,Underground city,Church and Wellesley",1281,1281,1281,1281,1281,1281
"Lawrence Park,Roselawn,Davisville North,Forest Hill North,Forest Hill West,North Toronto West,The Annex,North Midtown,Yorkville,Davisville,Moore Park,Summerhill East,Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West",116,116,116,116,116,116
"The Beaches,The Danforth West,Riverdale,The Beaches West,India Bazaar,Studio District,Business Reply Mail Processing Centre 969 Eastern",121,121,121,121,121,121


#### Observations:

#### 1. Venue category is a categorical field. We can categorize the same using panda get dummies method
#### 2. Altogether there are 3 unique groups. During our analysis we can drop the field Venue..it provides no special details specific to our analysis

P.S In case we wanted to drill down further, eg analyze neighborhoods based on quality/reviews of each venue or say cluster neighborhoods based on quality of food, we would require venue to download review set from foursquare api


In [31]:
nearby_encoded = panda.get_dummies(nearby_data[['Venue Category']], prefix="", prefix_sep="")
nearby_encoded.head(2)

Unnamed: 0,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
nearby_encoded['Neighborhood']=nearby_data['Neighborhood']
columns = ['Neighborhood']
columns.extend([i for i in nearby_encoded.columns.tolist() if i not in ['Neighborhood']])


In [34]:

nearby_encoded = nearby_encoded[columns]
nearby_encoded.tail()

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
1694,"The Beaches,The Danforth West,Riverdale,The Be...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1695,"The Beaches,The Danforth West,Riverdale,The Be...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1696,"The Beaches,The Danforth West,Riverdale,The Be...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1697,"The Beaches,The Danforth West,Riverdale,The Be...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1698,"The Beaches,The Danforth West,Riverdale,The Be...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [44]:
data_for_clustering = nearby_encoded.groupby(['Neighborhood']).mean().reset_index()
data_for_clustering.head()

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Dovercourt Village,Dufferin,Little Portugal,Tr...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,...,0.0,0.0,0.01105,0.0,0.0,0.01105,0.005525,0.0,0.0,0.005525
1,"Harbourfront,Regent Park,Ryerson,Garden Distri...",0.000781,0.000781,0.000781,0.000781,0.000781,0.001561,0.001561,0.001561,0.017955,...,0.000781,0.002342,0.007806,0.002342,0.000781,0.005464,0.006245,0.000781,0.001561,0.003123
2,"Lawrence Park,Roselawn,Davisville North,Forest...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,...,0.017241,0.0,0.008621,0.0,0.0,0.008621,0.0,0.0,0.0,0.008621
3,"The Beaches,The Danforth West,Riverdale,The Be...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024793,...,0.008264,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024793


In [45]:
data_for_clustering.drop(['Neighborhood'], axis =1,inplace=True)
data_for_clustering.head()

Unnamed: 0,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.0,0.01105,0.0,0.0,0.01105,0.005525,0.0,0.0,0.005525
1,0.000781,0.000781,0.000781,0.000781,0.000781,0.001561,0.001561,0.001561,0.017955,0.001561,...,0.000781,0.002342,0.007806,0.002342,0.000781,0.005464,0.006245,0.000781,0.001561,0.003123
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,...,0.017241,0.0,0.008621,0.0,0.0,0.008621,0.0,0.0,0.0,0.008621
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024793,0.0,...,0.008264,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024793


In [46]:
data_for_clustering.shape

(4, 238)

In [47]:
## Because all our data is mean,we can simply take the values as is without putting it through scaling/normalization
# set number of clusters
kclusters = 4

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(data_for_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 3, 1, 2])

In [48]:
data_for_clustering['cluster']=kmeans.labels_

In [49]:
data_for_clustering

Unnamed: 0,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio,cluster
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.01105,0.0,0.0,0.01105,0.005525,0.0,0.0,0.005525,0
1,0.000781,0.000781,0.000781,0.000781,0.000781,0.001561,0.001561,0.001561,0.017955,0.001561,...,0.002342,0.007806,0.002342,0.000781,0.005464,0.006245,0.000781,0.001561,0.003123,3
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,...,0.0,0.008621,0.0,0.0,0.008621,0.0,0.0,0.0,0.008621,1
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024793,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024793,2


In [54]:
data_for_clustering['Neighborhood']=nearby_data['Neighborhood'].unique()

In [55]:
data_for_clustering

Unnamed: 0,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio,cluster,Neighborhood
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.01105,0.0,0.0,0.01105,0.005525,0.0,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri..."
1,0.000781,0.000781,0.000781,0.000781,0.000781,0.001561,0.001561,0.001561,0.017955,0.001561,...,0.007806,0.002342,0.000781,0.005464,0.006245,0.000781,0.001561,0.003123,3,"The Beaches,The Danforth West,Riverdale,The Be..."
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,...,0.008621,0.0,0.0,0.008621,0.0,0.0,0.0,0.008621,1,"Dovercourt Village,Dufferin,Little Portugal,Tr..."
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024793,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024793,2,"Lawrence Park,Roselawn,Davisville North,Forest..."


In [56]:
tt = data_for_clustering.join(nearby_data.set_index('Neighborhood'), on='Neighborhood')
tt

Unnamed: 0,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Women's Store,Yoga Studio,cluster,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,Impact Kitchen,43.656369,-79.356980,Restaurant
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,Dominion Pub and Kitchen,43.656919,-79.358967,Pub
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,Corktown Common,43.655618,-79.356211,Park
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005525,0.005525,...,0.0,0.005525,0,"Harbourfront,Regent Park,Ryerson,Garden Distri...",43.654260,-79.360636,The Distillery Historic District,43.650244,-79.359323,Historic Site


In [66]:
tt.cluster.value_counts()

0    1281
1     181
3     121
2     116
Name: cluster, dtype: int64

In [77]:
# create map
map_clusters = folium.Map(location= [43.6532, -79.3832], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(tt['Neighborhood Latitude'], tt['Neighborhood Longitude'], tt['Neighborhood'], tt['cluster']):
    label = folium.Popup(' Cluster ' + str(lat))
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7
    )\
    .add_to(map_clusters)
       
map_clusters