## This notebook creates a DataFrame of Canadian postal codes with their Borough and Neighborhood. The DataFrame is reformatted as described in section 1 of this assignment.

#### Import Required Libraries

In [2]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library
from geopy.geocoders import Nominatim
print('Libraries imported.')


#### Use the Requests package to fetch the required Wikipedia page as HTML. Then parse it with BeautifulSoup (using the lxml parser) to collect the Postal Code, Borough and Neighborhood values into a Python list.

In [3]:
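source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
src_xml = BeautifulSoup(source, 'lxml')  # parse the HTML with the lxml parser
table = src_xml.find('table', class_="wikitable sortable")
tbody = table.find_all('td')  # every table cell, row by row
# Note: the name `list` shadows the built-in; it is kept because later cells use it
list = []
for vals in tbody:
    list.append(vals.text)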

#### Define the column names of the Dataframe

In [4]:
cols=['PostalCode', 'Borough', 'Neighborhood']
canada_df = pd.DataFrame(columns=cols)

#### Since the Postal codes, Boroughs and Neighborhoods are all in the same list, separate them into different Python lists and create the DataFrame

In [5]:
x = 0
cols1=[]
cols2=[]
cols3=[]
for s in range(0,280):
    cols1.append(list[x])
    x+=1
    cols2.append(list[x])
    x+=1
    cols3.append(list[x].replace('\n', ''))
    x+=1

canada_df['PostalCode']=cols1
canada_df['Borough']=cols2
canada_df['Neighborhood']=cols3
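
The loop above assumes exactly 280 table rows. As a minimal sketch, the same split can derive the row count from the data instead. The helper `chunk_cells` and the sample `cells` list are hypothetical names used only for illustration, and the helper strips whitespace from every cell rather than only removing `'\n'` from the third one.

```python
def chunk_cells(cells, width=3):
    """Group a flat list of table cells into rows of `width` stripped values."""
    usable = len(cells) - len(cells) % width  # ignore a trailing partial row
    return [
        tuple(cell.strip() for cell in cells[i:i + width])
        for i in range(0, usable, width)
    ]

# Hypothetical sample in the same order as the scraped <td> cells
cells = ['M1A', 'Not assigned', 'Not assigned\n',
         'M3A', 'North York', 'Parkwoods\n']
rows = chunk_cells(cells)
print(rows)
```

The resulting tuples could then be passed to `pd.DataFrame(rows, columns=cols)`.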

#### Remove the rows where Borough has the value 'Not assigned'

In [6]:
canada_df=canada_df.set_index('Borough')
canada_df=canada_df.drop('Not assigned', axis=0)
canada_df=canada_df.reset_index()
col_lst=['PostalCode', 'Borough', 'Neighborhood']
canada_df=canada_df[col_lst]
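
The same filter can also be written as a boolean mask, which avoids moving `Borough` into the index and back. A small sketch on made-up rows (the `demo_df` frame is illustrative, not the scraped data):

```python
import pandas as pd

demo_df = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows whose Borough is assigned, then renumber the index from 0
assigned_df = demo_df[demo_df['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned_df)
```

The column order is unchanged by the mask, so no re-selection of columns is needed afterwards.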

In [7]:
canada_df.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M5A | Downtown Toronto | Regent Park |
| 4 | M6A | North York | Lawrence Heights |


#### A Postal code can have several Neighborhoods. Group the rows by Postal code/Borough and combine them so that each Postal code has a single row, with its Neighborhoods separated by commas

In [8]:
canada_grp=canada_df.groupby(['PostalCode','Borough'], as_index=False, sort=True).agg(', '.join)
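
To make the effect of the grouping concrete, here is a small sketch on three rows taken from the `head()` output above. The names `demo_df` and `demo_grp` are illustrative, and the aggregation is spelled out per column as one possible way to write it.

```python
import pandas as pd

demo_df = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M6A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights'],
})

# One row per (PostalCode, Borough); neighborhood names joined with ', '
demo_grp = demo_df.groupby(['PostalCode', 'Borough'], as_index=False).agg(
    {'Neighborhood': ', '.join}
)
print(demo_grp)
```

The two `M5A` rows collapse into one row whose Neighborhood is `Harbourfront, Regent Park`.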

In [9]:
canada_grp.tail(20)

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 82 | M6P | West Toronto | High Park, The Junction South |
| 83 | M6R | West Toronto | Parkdale, Roncesvalles |
| 84 | M6S | West Toronto | Runnymede, Swansea |
| 85 | M7A | Queen's Park | Not assigned |
| 86 | M7R | Mississauga | Canada Post Gateway Processing Centre |
| 87 | M7Y | East Toronto | Business Reply Mail Processing Centre 969 Eastern |
| 88 | M8V | Etobicoke | Humber Bay Shores, Mimico South, New Toronto |
| 89 | M8W | Etobicoke | Alderwood, Long Branch |
| 90 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 91 | M8Y | Etobicoke | Humber Bay, King's Mill Park, Kingsway Park So... |


#### Some Neighborhoods have no value ('Not assigned'). Assign the corresponding Borough value to the Neighborhood in those rows

In [10]:
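# Where Neighborhood is 'Not assigned', fall back to the Borough name.
# Using .loc with a mask avoids chained assignment, which may not update the frame.
not_assigned = canada_grp['Neighborhood'] == 'Not assigned'
canada_grp.loc[not_assigned, 'Neighborhood'] = canada_grp.loc[not_assigned, 'Borough']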

canada_grp.tail(20)

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 82 | M6P | West Toronto | High Park, The Junction South |
| 83 | M6R | West Toronto | Parkdale, Roncesvalles |
| 84 | M6S | West Toronto | Runnymede, Swansea |
| 85 | M7A | Queen's Park | Queen's Park |
| 86 | M7R | Mississauga | Canada Post Gateway Processing Centre |
| 87 | M7Y | East Toronto | Business Reply Mail Processing Centre 969 Eastern |
| 88 | M8V | Etobicoke | Humber Bay Shores, Mimico South, New Toronto |
| 89 | M8W | Etobicoke | Alderwood, Long Branch |
| 90 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 91 | M8Y | Etobicoke | Humber Bay, King's Mill Park, Kingsway Park So... |


In [11]:
canada_grp.shape

(102, 3)

#### Import the CSV file with the Latitude and Longitude of each Postal code. Then merge it with the earlier DataFrame so that the Postal code, Borough, Neighborhood and the corresponding Latitude/Longitude values are in a single DataFrame

In [12]:
file_name='https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv'
latlong_df=pd.read_csv(file_name)

In [13]:
latlong_df.columns = ['PostalCode','Latitude','Longitude']
result_df = pd.merge(canada_grp,latlong_df,on='PostalCode',how='left')
result_df.columns = ['PostalCode','Borough', 'Neighborhood','Latitude','Longitude']
result_df.head(10)

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | East Birchmount Park, Ionview, Kennedy Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Clairlea, Golden Mile, Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffcrest, Cliffside, Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.692657 | -79.264848 |
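
A left merge keeps every postal code even when the coordinates file has no match, in which case Latitude and Longitude are filled with NaN. The sketch below shows one way to check for such rows using the `indicator` option of `pd.merge`. The frames `left` and `coords` are illustrative, and the postal code `M9Z` is made up to force a miss.

```python
import pandas as pd

left = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M9Z'],
                     'Borough': ['Scarborough', 'Scarborough', 'Unknown']})
coords = pd.DataFrame({'PostalCode': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

# indicator=True adds a `_merge` column saying where each row was found
checked = pd.merge(left, coords, on='PostalCode', how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

An empty `missing` list would mean every postal code received coordinates.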


#### Now get the Geographical coordinates of Toronto, Canada

In [14]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


#### Display the map of Toronto neighborhoods

In [32]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighbor in zip(result_df['Latitude'], result_df['Longitude'], result_df['Neighborhood']):
    label = folium.Popup(neighbor)
    folium.CircleMarker(
        location=[lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7
        ).add_to(map_Toronto)
print('Toronto Map')
map_Toronto

Toronto Map


#### Define Foursquare credentials

In [15]:
import os

# Read the credentials from environment variables instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are an assumed convention.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20190531' # Foursquare API version

print('Credentials loaded.')

Credentials loaded.


#### Define a function to explore the venues around each Toronto neighborhood

In [20]:
LIMIT=100
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
print('Completed')

Completed
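
The function above indexes straight into the JSON response, so it raises an error if a response has no groups or a venue has no categories. Below is a minimal sketch of a more defensive flattening step. The helper name `extract_venue_rows` is hypothetical, and the `sample` payload is hand-written to mirror only the fields that `getNearbyVenues` reads; it is not real API output.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style response into venue rows for a neighborhood."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue carries no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hypothetical payload shaped like the fields accessed in getNearbyVenues
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Unlabelled Venue',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample, 'Rouge, Malvern', 43.806686, -79.194353)
empty_rows = extract_venue_rows({'response': {}}, 'Empty', 0.0, 0.0)
print(len(rows), len(empty_rows))
```

Rows returned this way have the same seven fields, in the same order, as the tuples built inside `getNearbyVenues`.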


#### Call the above function to get the venues and create a Dataframe

In [21]:
Toronto_venues = getNearbyVenues(names=result_df['Neighborhood'],
                                   latitudes=result_df['Latitude'],
                                   longitudes=result_df['Longitude']
                                  )

In [22]:
Toronto_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Rouge, Malvern | 43.806686 | -79.194353 | Wendy's | 43.807448 | -79.199056 | Fast Food Restaurant |
| 1 | Rouge, Malvern | 43.806686 | -79.194353 | Interprovincial Group | 43.80563 | -79.200378 | Print Shop |
| 2 | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 | Chris Effects Painting | 43.784343 | -79.163742 | Construction & Landscaping |
| 3 | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 | Royal Canadian Legion | 43.782533 | -79.163085 | Bar |
| 4 | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 | Swiss Chalet Rotisserie & Grill | 43.767697 | -79.189914 | Pizza Place |


In [23]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 280 unique categories.


#### Perform one-hot encoding of the venue categories

In [24]:
Toronto_one = pd.DataFrame()

Toronto_one = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")
Toronto_one['Neighborhood'] = Toronto_venues['Neighborhood']
Toronto_one['Latitude'] = Toronto_venues['Venue Latitude']
Toronto_one['Longitude'] = Toronto_venues['Venue Longitude']
Toronto_one.columns

Index(['Accessories Store', 'Adult Boutique', 'Afghan Restaurant', 'Airport',
       'Airport Food Court', 'Airport Gate', 'Airport Lounge',
       'Airport Service', 'Airport Terminal', 'American Restaurant',
       ...
       'Video Game Store', 'Video Store', 'Vietnamese Restaurant',
       'Warehouse Store', 'Wine Bar', 'Wings Joint', 'Women's Store',
       'Yoga Studio', 'Latitude', 'Longitude'],
      dtype='object', length=282)

In [25]:
Toronto_grouped = Toronto_one.groupby('Neighborhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Latitude,Longitude
0,"Adelaide, King, Richmond",0.01,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.000000,0.000000,0.000000,0.010000,0.000000,0.000000,0.000000,43.650015,-79.384113
1,Agincourt,0.00,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,43.793922,-79.260166
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.00,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,43.815199,-79.289821
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.00,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.090909,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,43.741636,-79.586120
4,"Alderwood, Long Branch",0.00,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,43.601698,-79.545303
5,"Bathurst Manor, Downsview North, Wilson Heights",0.00,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.055556,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,43.755542,-79.440516
6,Bayview Village,0.00,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,43.787903,-79.380860
7,"Bedford Park, Lawrence Manor East",0.00,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,43.733672,-79.419400
8,Berczy Park,0.00,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,43.647262,-79.373650
9,"Birch Cliff, Cliffside West",0.00,0.000000,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,43.694203,-79.262554
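
Each category value in the grouped table is the share of a neighborhood's venues that fall in that category. The sketch below reproduces the encode-then-average step on a toy frame, keeping only the category columns so that every averaged value is a frequency. The names `demo_venues`, `onehot` and `freq` are illustrative, and leaving out the coordinate columns is a choice made for this example rather than what the cells above do.

```python
import pandas as pd

demo_venues = pd.DataFrame({
    'Neighborhood': ['Agincourt', 'Agincourt', 'Woburn', 'Woburn'],
    'Venue Category': ['Lounge', 'Breakfast Spot', 'Coffee Shop', 'Coffee Shop'],
})

# One indicator column per category (1.0 where the venue has that category)
onehot = pd.get_dummies(demo_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', demo_venues['Neighborhood'])

# The mean of the indicators is the frequency of each category per neighborhood
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```

Here Agincourt splits evenly between two categories, while both Woburn venues are coffee shops.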


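#### Print the top 5 venues for each Neighborhood. Because Latitude and Longitude were added as columns to the one-hot DataFrame, their means are ranked alongside the category frequencies, so `Latitude` appears as the first row of each listing; it is not a venue category.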

In [26]:
num_top_venues = 5

for hood in Toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
             venue   freq
0         Latitude  43.65
1      Coffee Shop   0.06
2             Café   0.05
3       Steakhouse   0.04
4  Thai Restaurant   0.04


----Agincourt----
                venue   freq
0            Latitude  43.79
1      Breakfast Spot   0.25
2              Lounge   0.25
3  Chinese Restaurant   0.25
4      Sandwich Place   0.25


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                venue   freq
0            Latitude  43.82
1          Playground   0.33
2    Asian Restaurant   0.33
3                Park   0.33
4  Miscellaneous Shop   0.00


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                 venue   freq
0             Latitude  43.74
1        Grocery Store   0.18
2         Liquor Store   0.09
3       Sandwich Place   0.09
4  Fried Chicken Joint   0.09


----Alderwood, Long Branch----
            venue  freq
0        Latitude
...

                        venue   freq
0                    Latitude  43.72
1                      Bakery   0.20
2  Construction & Landscaping   0.20
3                        Park   0.20
4            Basketball Court   0.20


----East Birchmount Park, Ionview, Kennedy Park----
              venue   freq
0          Latitude  43.73
1        Playground   0.25
2  Department Store   0.25
3       Coffee Shop   0.25
4    Discount Store   0.25


----East Toronto----
               venue   freq
0           Latitude  43.69
1  Convenience Store   0.33
2        Coffee Shop   0.33
3               Park   0.33
4  Accessories Store   0.00


----Emery, Humberlea----
                venue   freq
0            Latitude  43.72
1      Baseball Field   1.00
2   Accessories Store   0.00
3  Miscellaneous Shop   0.00
4               Motel   0.00


----Fairview, Henry Farm, Oriole----
                  venue   freq
0              Latitude  43.78
1        Clothing Store   0.12
2  Fast Food Restaurant   0.09
...

         venue   freq
0     Latitude  43.65
1  Coffee Shop   0.11
2         Café   0.04
3   Restaurant   0.04
4        Hotel   0.03


----Studio District----
                 venue   freq
0             Latitude  43.66
1                 Café   0.11
2          Coffee Shop   0.08
3  American Restaurant   0.05
4   Italian Restaurant   0.05


----The Annex, North Midtown, Yorkville----
            venue   freq
0        Latitude  43.67
1            Café   0.13
2     Coffee Shop   0.13
3  Sandwich Place   0.13
4     Pizza Place   0.09


----The Beaches----
                  venue   freq
0              Latitude  43.68
1     Health Food Store   0.20
2  Other Great Outdoors   0.20
3                 Trail   0.20
4                   Pub   0.20


----The Beaches West, India Bazaar----
               venue   freq
0           Latitude  43.67
1               Park   0.11
2        Pizza Place   0.06
3                Pub   0.06
4  Fish & Chips Shop   0.06


----The Danforth West, Riverdale----
...

#### Create a new Dataframe with top 10 venues for each Neighborhood

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the third all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Latitude,Coffee Shop,Café,American Restaurant,Bar,Steakhouse,Thai Restaurant,Cosmetics Shop,Gym,Bakery
1,Agincourt,Latitude,Sandwich Place,Lounge,Breakfast Spot,Chinese Restaurant,Health Food Store,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Latitude,Playground,Park,Asian Restaurant,Health Food Store,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Latitude,Grocery Store,Pizza Place,Fried Chicken Joint,Pharmacy,Coffee Shop,Sandwich Place,Liquor Store,Beer Store,Fast Food Restaurant
4,"Alderwood, Long Branch",Latitude,Pizza Place,Pharmacy,Skating Rink,Dance Studio,Coffee Shop,Pool,Pub,Sandwich Place,Gym
5,"Bathurst Manor, Downsview North, Wilson Heights",Latitude,Coffee Shop,Pharmacy,Bank,Restaurant,Deli / Bodega,Sushi Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Supermarket
6,Bayview Village,Latitude,Japanese Restaurant,Bank,Chinese Restaurant,Café,Drugstore,Discount Store,Dog Run,Doner Restaurant,Donut Shop
7,"Bedford Park, Lawrence Manor East",Latitude,Fast Food Restaurant,Coffee Shop,Italian Restaurant,Sandwich Place,Sushi Restaurant,Japanese Restaurant,Juice Bar,Comfort Food Restaurant,Liquor Store
8,Berczy Park,Latitude,Coffee Shop,Cocktail Bar,Bakery,Beer Bar,Café,Farmers Market,Seafood Restaurant,Steakhouse,Cheese Shop
9,"Birch Cliff, Cliffside West",Latitude,General Entertainment,Skating Rink,Café,College Stadium,Construction & Landscaping,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant
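
In the table above, `Latitude` is ranked as the "1st Most Common Venue" for every neighborhood because the coordinate columns are sorted together with the category frequencies. A minimal sketch of a ranking helper that skips non-category columns, together with an ordinal-suffix helper, follows. The names `top_categories`, `ordinal` and `demo_row` are hypothetical and the row values are made up.

```python
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_categories(row, num_top_venues,
                   exclude=('Neighborhood', 'Latitude', 'Longitude')):
    """Return the highest-frequency category names, skipping non-category columns."""
    categories = row.drop(labels=list(exclude), errors='ignore').astype(float)
    return categories.sort_values(ascending=False).index[:num_top_venues].tolist()

demo_row = pd.Series({'Neighborhood': 'Agincourt', 'Lounge': 0.25, 'Park': 0.5,
                      'Café': 0.0, 'Latitude': 43.79, 'Longitude': -79.26})
top_two = top_categories(demo_row, 2)
labels = [ordinal(n) for n in (1, 2, 3, 4, 11)]
print(top_two, labels)
```

Applied row by row, such a helper would fill the "Most Common Venue" columns with category names only.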


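#### Run k-means to cluster the Neighborhoods into 5 clusters. The feature matrix still contains the mean Latitude and Longitude columns, so the clusters reflect venue location as well as category frequencies.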

In [29]:
kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

kmeans.labels_

array([0, 0, 4, 2, 2, 2, 0, 0, 0, 0, 2, 0, 0, 4, 0, 0, 4, 2, 0, 0, 0, 2, 0,
       0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 4, 3, 0, 0, 0,
       4, 2, 0, 0, 0, 0, 2, 0, 0, 2, 3, 2, 2, 2, 0, 2, 4, 0, 0, 0, 1, 0, 2,
       2, 0, 4, 0, 4, 2, 0, 2, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 2, 4, 0, 0, 2,
       4, 0, 2, 0, 0, 0, 4], dtype=int32)
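
To cluster on venue mix alone, the coordinate columns can be dropped before fitting. The sketch below does this on a toy frame with two obvious groups. The frame `demo_grouped` and its values are made up, and `n_init=10` is set explicitly only to keep the behavior stable across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

demo_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Coffee Shop': [0.9, 0.8, 0.0, 0.1],
    'Park': [0.1, 0.2, 1.0, 0.9],
    'Latitude': [43.65, 43.79, 43.66, 43.80],
    'Longitude': [-79.38, -79.26, -79.39, -79.27],
})

# Fit on category frequencies only, leaving out the name and the coordinates
features = demo_grouped.drop(columns=['Neighborhood', 'Latitude', 'Longitude'])
demo_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)
demo_labels = demo_kmeans.labels_
print(demo_labels)
```

The coffee-heavy neighborhoods A and B land in one cluster and the park-heavy C and D in the other, regardless of where they sit on the map.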

In [30]:
Toronto_merged = Toronto_grouped.copy()  # copy so Toronto_grouped is left unchanged

# add clustering labels
Toronto_merged['Cluster Labels'] = kmeans.labels_

# join the top-10 venue columns onto each neighborhood's row
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Toronto_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Coffee Shop,Café,American Restaurant,Bar,Steakhouse,Thai Restaurant,Cosmetics Shop,Gym,Bakery
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Sandwich Place,Lounge,Breakfast Spot,Chinese Restaurant,Health Food Store,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Playground,Park,Asian Restaurant,Health Food Store,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Grocery Store,Pizza Place,Fried Chicken Joint,Pharmacy,Coffee Shop,Sandwich Place,Liquor Store,Beer Store,Fast Food Restaurant
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Pizza Place,Pharmacy,Skating Rink,Dance Studio,Coffee Shop,Pool,Pub,Sandwich Place,Gym


#### Display the map of Toronto with neighborhood clusters

In [31]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhood'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
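
The cell above looks colors up with `rainbow[cluster-1]`, so label 0 wraps around to the last color. Every label still gets a distinct color, but the mapping is easy to misread. A small sketch of a direct label-to-color mapping, using the same matplotlib calls (`palette` and `color_for_cluster` are illustrative names):

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced rainbow color per cluster label 0..kclusters-1
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]
color_for_cluster = {label: palette[label] for label in range(kclusters)}
print(color_for_cluster)
```

A marker for a given neighborhood could then use `color_for_cluster[cluster]` for both `color` and `fill_color`.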