# Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

# Gokhan Ince

### Note: GitHub does not render the maps.
### To see the project with the maps, please go to https://nbviewer.jupyter.org/github/incegokhan/Coursera_Capstone/blob/master/AssignmentWeek3.ipynb

## PART 1

### For this assignment, we are required to explore and cluster the neighborhoods in Toronto.

### First, we import the pandas library, assign the Wikipedia URL to a variable, and then use the read_html function to load the table into a dataframe.

In [1]:
import pandas as pd
!pip install lxml

# I assigned the link to url variable
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

# I created a dataframe by pulling data from the link
df = pd.read_html(url, header = 0)[0]

Collecting lxml
  Downloading https://files.pythonhosted.org/packages/e7/a8/40115c84414c017e1a293f331709eb7534303d3ccd11ef805ac09b1481e7/lxml-4.4.1-cp37-cp37m-manylinux1_x86_64.whl (5.7MB)
Installing collected packages: lxml
Successfully installed lxml-4.4.1


### After that, we need to ignore the cells that have no borough information.

In [2]:
#Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
df = df[df.Borough != 'Not assigned']

### Sometimes one postal code covers several neighbourhoods. We need to combine them into a single row.

In [3]:
#More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
df = df.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(list).apply(lambda x:', '.join(x)).to_frame().reset_index()
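
A minimal sketch of the same grouping on a toy table, assuming the `Postcode`, `Borough` and `Neighbourhood` column names used above. The rows are made up for illustration, and `agg(', '.join)` is simply a shorter way to build the comma-separated string.

```python
import pandas as pd

# Toy rows (illustrative only): M5A appears twice, as on the Wikipedia page
toy = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M1B'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'Scarborough'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Rouge'],
})

# One row per (Postcode, Borough); neighbourhood names joined with ', '
combined = (toy.groupby(['Postcode', 'Borough'])['Neighbourhood']
               .agg(', '.join)
               .reset_index())
print(combined)
```

The two M5A rows collapse into one row whose neighbourhood is "Harbourfront, Regent Park".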

### If a cell has a borough but no assigned neighborhood, the neighborhood will be the same as the borough.

In [4]:
#If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
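# Assign through .loc with a boolean mask; writing to the row yielded by iterrows() may only change a copy.
not_assigned = df['Neighbourhood'] == 'Not assigned'
df.loc[not_assigned, 'Neighbourhood'] = df.loc[not_assigned, 'Borough']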
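
The pandas documentation warns that the row yielded by `iterrows()` can be a copy, so writing to it is not guaranteed to change the dataframe. A minimal sketch of a mask-based assignment with `.loc`, on made-up rows that follow the Queen's Park example from the assignment text:

```python
import pandas as pd

# Toy rows (illustrative only)
toy = pd.DataFrame({
    'Borough': ["Queen's Park", 'Scarborough'],
    'Neighbourhood': ['Not assigned', 'Woburn'],
})

# Rows whose neighbourhood is missing
mask = toy['Neighbourhood'] == 'Not assigned'

# Copy the borough name into those rows, writing to the dataframe itself
toy.loc[mask, 'Neighbourhood'] = toy.loc[mask, 'Borough']
print(toy)
```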

### Now, our dataframe is ready. Let's see its shape.

In [5]:
#shape of the dataframe
df.shape

(103, 3)

### Its shape is (103, 3), which means our dataframe has 103 rows and 3 columns.
### Now we can look at our data.

In [6]:
# data
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


## Part 2

### Now, I will download the geospatial data of Toronto, create a dataframe and then show some data from the dataframe.

In [7]:
!wget -q -O 'Toronto_long_lat_data.csv'  http://cocl.us/Geospatial_data
df_lon_lat = pd.read_csv('Toronto_long_lat_data.csv')
df_lon_lat.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### I will rename the columns, because the key column must have the same name in both dataframes to merge them properly.

In [8]:
df_lon_lat.columns=['Postcode','Latitude','Longitude']
df_lon_lat.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Now I will merge the dataframes by using Postcode and then I will create a new dataframe.

In [9]:
df_toronto = pd.merge(df,df_lon_lat[['Postcode','Latitude', 'Longitude']], on='Postcode')
df_toronto

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
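
`pd.merge` performs an inner join by default, so a postal code without coordinates would silently disappear from the result. A small sketch of how this could be checked, assuming the same `Postcode` key; the rows are toy data and `M9Z` is a made-up code used only to show an unmatched row.

```python
import pandas as pd

left = pd.DataFrame({'Postcode': ['M1B', 'M1C', 'M9Z'],
                     'Borough': ['Scarborough', 'Scarborough', 'Example']})
right = pd.DataFrame({'Postcode': ['M1B', 'M1C'],
                      'Latitude': [43.806686, 43.784535],
                      'Longitude': [-79.194353, -79.160497]})

# how='left' keeps every row of `left`; indicator=True adds a `_merge` column
checked = pd.merge(left, right, on='Postcode', how='left', indicator=True)

# Postal codes that found no coordinates in `right`
missing = checked.loc[checked['_merge'] == 'left_only', 'Postcode'].tolist()
print(missing)
```

An empty `missing` list would mean every postal code received coordinates.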


## Part 3

### Now, I will import a library to convert an address into latitude and longitude values.
### The Geopy library will help us get this information.

In [10]:
!pip install geopy

Collecting geopy
  Downloading https://files.pythonhosted.org/packages/80/93/d384479da0ead712bdaf697a8399c13a9a89bd856ada5a27d462fb45e47b/geopy-1.20.0-py2.py3-none-any.whl (100kB)
Collecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/5b/ac/4f348828091490d77899bc74e92238e2b55c59392f21948f296e94e50e2b/geographiclib-1.49.tar.gz
Building wheels for collected packages: geographiclib
  Building wheel for geographiclib (setup.py) ... done
  Stored in directory: /home/jovyan/.cache/pip/wheels/99/45/d1/14954797e2a976083182c2e7da9b4e924509e59b6e5c661061
Successfully built geographiclib
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.49 geopy-1.20.0


In [11]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated modules
import matplotlib.cm as cm
import matplotlib.colors as colors

### Now, I will look up the latitude and longitude of Toronto.

In [12]:
# Toronto is assigned to the variable "address"
address = 'Toronto, ON'
geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print('Latitude of Toronto is {} and longitude is {}.'.format(latitude_toronto, longitude_toronto))

Latitude of Toronto is 43.653963 and longitude is -79.387207.
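
`geolocator.geocode` returns `None` when it finds no match, in which case `location.latitude` would raise an error. A minimal sketch of a guard around the lookup; `coordinates_or_fallback` and `fake_geocode` are hypothetical names, and the fake stands in for `geolocator.geocode` so that the sketch runs without network access.

```python
from types import SimpleNamespace

def coordinates_or_fallback(geocode, address, fallback):
    """Return (latitude, longitude) for address, or fallback if nothing is found."""
    location = geocode(address)
    if location is None:
        return fallback
    return (location.latitude, location.longitude)

# Stand-in for geolocator.geocode (assumption: only this one address is known)
def fake_geocode(address):
    known = {'Toronto, ON': SimpleNamespace(latitude=43.653963, longitude=-79.387207)}
    return known.get(address)

found = coordinates_or_fallback(fake_geocode, 'Toronto, ON', (0.0, 0.0))
not_found = coordinates_or_fallback(fake_geocode, 'Nowhere', (0.0, 0.0))
print(found, not_found)
```

With the real geocoder, the first argument would be `geolocator.geocode`.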


### Now, I will import the Folium library to generate maps.

In [13]:
!pip install folium
import pandas as pd
import folium

Collecting folium
  Downloading https://files.pythonhosted.org/packages/72/ff/004bfe344150a064e558cb2aedeaa02ecbf75e60e148a55a9198f0c41765/folium-0.10.0-py2.py3-none-any.whl (91kB)
Installing collected packages: folium
Successfully installed folium-0.10.0


### The library is ready. Now we can generate our map.

In [14]:
map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=10)
# added markers to map
for lat, lng, borough, Neighbourhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighbourhood']):
    label = '{}, {}'.format(Neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    


### At last we are able to see our map.

In [16]:
map_toronto

### Now, we will get the data from Foursquare. First, I will prepare the request parameters.

In [17]:
CLIENT_ID='NJQUG3TFKKQ534FY1LDPXMMQA5PW0T1EDNHXL0IFT20FYDHJ' #Required for query
CLIENT_SECRET='D55WTVEONESDKAJ3JT1CRDU1QCWJRP4DZPUPJMIU0D10VNXH' #Required for query
VERSION='20190913' #Required for query
LIMIT = 100  # Maximum number of venues per request; without a premium account we can pull at most 100
radius = 500 # Radius is 500 meters.

lat_df = df_toronto['Latitude']   # Latitude of each postal code area
long_df = df_toronto['Longitude'] # Longitude of each postal code area
url_df = [] # An empty list for the request URLs

### Now I will create a while loop to build the request URL for each neighborhood.

In [18]:
i = 0
length = df_toronto.shape[0] # I need it for loop. I will get Foursquare URL for each neighborhood.
while i < length:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat_df[i],
        long_df[i],
        radius, 
        LIMIT)
    url_df.append(url)
    i = i + 1
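
A hedged sketch of an alternative way to build the same URLs, using only the standard library. It assumes the credentials are stored in environment variables; the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are hypothetical, and the defaults are placeholders rather than real keys.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; the defaults are placeholders, not real keys
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, radius=500, limit=100, version='20190913'):
    """Build one explore URL for the given coordinates."""
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll parameter unescaped
    return BASE_URL + '?' + urlencode(params, safe=',')

# Two coordinate pairs taken from the table above
coordinates = [(43.806686, -79.194353), (43.784535, -79.160497)]
urls = [build_explore_url(lat, lng) for lat, lng in coordinates]
print(urls[0])
```

Reading the keys from the environment keeps them out of the notebook and out of any printed URL list that gets committed.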

### With these URLs we can get information from Foursquare.

In [113]:
url_df

['https://api.foursquare.com/v2/venues/explore?&client_id=NJQUG3TFKKQ534FY1LDPXMMQA5PW0T1EDNHXL0IFT20FYDHJ&client_secret=D55WTVEONESDKAJ3JT1CRDU1QCWJRP4DZPUPJMIU0D10VNXH&v=20190913&ll=43.806686299999996,-79.19435340000001&radius=500&limit=100',
 'https://api.foursquare.com/v2/venues/explore?&client_id=NJQUG3TFKKQ534FY1LDPXMMQA5PW0T1EDNHXL0IFT20FYDHJ&client_secret=D55WTVEONESDKAJ3JT1CRDU1QCWJRP4DZPUPJMIU0D10VNXH&v=20190913&ll=43.7845351,-79.16049709999999&radius=500&limit=100',
 'https://api.foursquare.com/v2/venues/explore?&client_id=NJQUG3TFKKQ534FY1LDPXMMQA5PW0T1EDNHXL0IFT20FYDHJ&client_secret=D55WTVEONESDKAJ3JT1CRDU1QCWJRP4DZPUPJMIU0D10VNXH&v=20190913&ll=43.7635726,-79.1887115&radius=500&limit=100',
 'https://api.foursquare.com/v2/venues/explore?&client_id=NJQUG3TFKKQ534FY1LDPXMMQA5PW0T1EDNHXL0IFT20FYDHJ&client_secret=D55WTVEONESDKAJ3JT1CRDU1QCWJRP4DZPUPJMIU0D10VNXH&v=20190913&ll=43.7709921,-79.21691740000001&radius=500&limit=100',
 'https://api.foursquare.com/v2/venues/explore?&cli

### This function extracts the category of the venue.

In [20]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### Now, I will pull the data from Foursquare, add the columns from the df_toronto dataframe and export everything to a csv file.

In [22]:
import requests # library to handle requests

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

for j in range(length):
    try:
        results = requests.get(url_df[j]).json()
        venues = results['response']['groups'][0]['items']
        # transform JSON into a pandas dataframe (pd.json_normalize needs pandas >= 1.0;
        # on older versions use: from pandas.io.json import json_normalize)
        nearby_venues = pd.json_normalize(venues)
        nearby_venues = nearby_venues.loc[:, filtered_columns].copy()
        nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
        nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    except (requests.RequestException, KeyError, IndexError, ValueError):
        # Skip this postal code if the request fails or no venues come back,
        # so that venues from the previous iteration are not written again.
        continue
    nearby_venues['zipcode'] = df_toronto['Postcode'][j]
    nearby_venues['brgh'] = df_toronto['Borough'][j].replace(' ','')
    nearby_venues['nghbrhd'] = df_toronto['Neighbourhood'][j].replace(' ','')
    nearby_venues['centerlatitude'] = df_toronto['Latitude'][j]
    nearby_venues['centerlong'] = df_toronto['Longitude'][j]
    # mode='a' appends; header=True repeats the header on every append, and those rows are dropped below.
    nearby_venues.to_csv('export_dataframe.csv', mode='a', encoding='utf-8', index=False, header=True)
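
To make the flattening step easier to follow, here is a small sketch on made-up items shaped like `results['response']['groups'][0]['items']`. It assumes pandas 1.0 or newer for `pd.json_normalize`, and the venue names are invented for illustration.

```python
import pandas as pd

# Made-up items with the same nesting as the Foursquare response used above
items = [
    {'venue': {'name': 'Example Cafe',
               'categories': [{'name': 'Coffee Shop'}],
               'location': {'lat': 43.80, 'lng': -79.19}}},
    {'venue': {'name': 'Example Park',
               'categories': [],
               'location': {'lat': 43.81, 'lng': -79.20}}},
]

# Nested dictionaries become dotted column names; lists stay as lists
flat = pd.json_normalize(items)
flat = flat[['venue.name', 'venue.categories',
             'venue.location.lat', 'venue.location.lng']].copy()

# Keep the first category name, or None when the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)

# Drop the 'venue.' and 'venue.location.' prefixes
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```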

### Now, I will pull the data from the csv file

In [23]:
venue_data = pd.read_csv('export_dataframe.csv')

### I will give new column names

In [24]:
venue_data.columns = ['Name','Categories','Latitude','Longitude','Zip Code','Borough','Neighborhood',
                            'Center Latitude','Center Longitude']

### This is the number of rows in the dataframe

In [25]:
venue_data.shape[0]

2364

### And some sample data...

In [26]:
venue_data.head()

Unnamed: 0,Name,Categories,Latitude,Longitude,Zip Code,Borough,Neighborhood,Center Latitude,Center Longitude
0,Wendy's,Fast Food Restaurant,43.80744841934756,-79.19905558052072,M1B,Scarborough,"Rouge,Malvern",43.806686299999996,-79.19435340000001
1,name,categories,lat,lng,zipcode,brgh,nghbrhd,centerlatitude,centerlong
2,Royal Canadian Legion,Bar,43.78253332838298,-79.16308473261682,M1C,Scarborough,"HighlandCreek,RougeHill,PortUnion",43.7845351,-79.16049709999999
3,name,categories,lat,lng,zipcode,brgh,nghbrhd,centerlatitude,centerlong
4,Swiss Chalet Rotisserie & Grill,Pizza Place,43.76769708292701,-79.1899135003439,M1E,Scarborough,"Guildwood,Morningside,WestHill",43.7635726,-79.1887115


### I will drop the repeated header rows, where the value in the Name column is 'name'

In [27]:
venue_data = venue_data[venue_data.Name != 'name']

In [28]:
venue_data.head()

Unnamed: 0,Name,Categories,Latitude,Longitude,Zip Code,Borough,Neighborhood,Center Latitude,Center Longitude
0,Wendy's,Fast Food Restaurant,43.80744841934756,-79.19905558052072,M1B,Scarborough,"Rouge,Malvern",43.8066863,-79.19435340000001
2,Royal Canadian Legion,Bar,43.78253332838298,-79.16308473261682,M1C,Scarborough,"HighlandCreek,RougeHill,PortUnion",43.7845351,-79.16049709999999
4,Swiss Chalet Rotisserie & Grill,Pizza Place,43.76769708292701,-79.1899135003439,M1E,Scarborough,"Guildwood,Morningside,WestHill",43.7635726,-79.1887115
5,G & G Electronics,Electronics Store,43.765309,-79.191537,M1E,Scarborough,"Guildwood,Morningside,WestHill",43.7635726,-79.1887115
6,Marina Spa,Spa,43.766,-79.191,M1E,Scarborough,"Guildwood,Morningside,WestHill",43.7635726,-79.1887115
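
The repeated header rows come from appending to the csv file with `header=True` on every write. A minimal sketch of one way to avoid them, writing the header only when the file does not exist yet; the file name and the rows are made up for illustration.

```python
import os
import tempfile
import pandas as pd

# Temporary location so the sketch does not touch export_dataframe.csv
path = os.path.join(tempfile.mkdtemp(), 'venues.csv')

batches = [
    pd.DataFrame({'name': ['Example Cafe'], 'zipcode': ['M1B']}),
    pd.DataFrame({'name': ['Example Bar'], 'zipcode': ['M1C']}),
]

for batch in batches:
    # Write the header only for the first batch, when the file is still missing
    batch.to_csv(path, mode='a', index=False, header=not os.path.exists(path))

venues = pd.read_csv(path)
print(venues)
```

Collecting the batches in a list and calling `pd.concat` once would be another option that skips the csv round trip.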


### We can count the venues of each category in each neighbourhood.

In [36]:
venue_data.groupby(['Categories'])['Neighborhood'].value_counts().sort_values(ascending=False)

Categories                     Neighborhood                                
Coffee Shop                    HarbourfrontEast,TorontoIslands,UnionStation    13
                               DesignExchange,TorontoDominionCentre            12
                               CentralBayStreet                                12
                               CommerceCourt,VictoriaHotel                     11
Clothing Store                 Fairview,HenryFarm,Oriole                       10
Coffee Shop                    FirstCanadianPlace,Undergroundcity              10
                               StnAPOBoxes25TheEsplanade                       10
                               Harbourfront,RegentPark                          9
Greek Restaurant               TheDanforthWest,Riverdale                        9
Coffee Shop                    Queen'sPark                                      9
                               Ryerson,GardenDistrict                           8
                      

### Now, I will use one-hot encoding to analyze the neighborhoods.

In [34]:
venue_category_onehot = pd.get_dummies(venue_data[['Categories']])
venue_category_onehot['Neighborhood'] = venue_data['Neighborhood']
# Move the Neighborhood column to the front
fix_columns = [venue_category_onehot.columns[-1]] + list(venue_category_onehot.columns[:-1])
venue_category_onehot = venue_category_onehot[fix_columns]
print(venue_category_onehot.shape)
venue_c = venue_category_onehot.groupby('Neighborhood').sum()

(2266, 277)


### Now, we can group the neighborhoods by frequency.

In [61]:
toronto_grouped = venue_category_onehot.groupby('Neighborhood').mean().reset_index()
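
A toy example of why the mean of the one-hot columns gives a frequency: each column holds 0 or 1, so its mean per neighborhood is the share of that neighborhood's venues in the category. The neighborhoods `A` and `B` and their venues are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Categories': ['Coffee Shop', 'Coffee Shop', 'Coffee Shop', 'Park', 'Park'],
})

# One 0/1 column per category, named 'Categories_<category>'
onehot = pd.get_dummies(toy[['Categories']], dtype=float)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```

Neighborhood A has three coffee shops out of four venues, so its coffee shop frequency is 0.75.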

### I will print the neighborhoods with the 5 most common venues.

In [62]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
                     venue  freq
0   Categories_Coffee Shop  0.08
1          Categories_Café  0.05
2           Categories_Bar  0.04
3    Categories_Steakhouse  0.04
4  Categories_Burger Joint  0.03


----Agincourt----
                          venue  freq
0     Categories_Breakfast Spot  0.25
1             Categories_Lounge  0.25
2       Categories_Skating Rink  0.25
3     Categories_Clothing Store  0.25
4  Categories_Accessories Store  0.00


----AgincourtNorth,L'AmoreauxEast,Milliken,SteelesEast----
                           venue  freq
0                Categories_Park  0.67
1          Categories_Playground  0.33
2  Categories_Miscellaneous Shop  0.00
3       Categories_Movie Theater  0.00
4               Categories_Motel  0.00


----AlbionGardens,BeaumondHeights,Humbergate,Jamestown,MountOlive,Silverstone,SouthSteeles,Thistletown----
                             venue  freq
0         Categories_Grocery Store  0.22
1  Categories_Fast Food Restaurant  0

### This function returns the most common venues of a neighborhood.

In [63]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
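
A short usage sketch of the same idea on one made-up row, where `top_categories` is a hypothetical stand-in for the function above. Note that categories with equal frequency (for example many zeros) have no guaranteed order after sorting, which is one reason the lower ranks in the tables can look arbitrary.

```python
import pandas as pd

# One made-up row shaped like a row of toronto_grouped
row = pd.Series({
    'Neighborhood': 'Example',
    'Categories_Coffee Shop': 0.5,
    'Categories_Park': 0.3,
    'Categories_Bar': 0.2,
})

def top_categories(row, n):
    """Return the n column names with the highest frequency."""
    frequencies = row.iloc[1:].astype(float)  # skip the Neighborhood entry
    return frequencies.sort_values(ascending=False).index[:n].tolist()

top_two = top_categories(row, 2)
print(top_two)
```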

### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [64]:
import numpy as np
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Categories_Coffee Shop,Categories_Café,Categories_Steakhouse,Categories_Bar,Categories_Cosmetics Shop,Categories_Thai Restaurant,Categories_Burger Joint,Categories_Restaurant,Categories_Hotel,Categories_American Restaurant
1,Agincourt,Categories_Lounge,Categories_Breakfast Spot,Categories_Skating Rink,Categories_Clothing Store,Categories_brgh,Categories_Empanada Restaurant,Categories_Donut Shop,Categories_Drugstore,Categories_Dumpling Restaurant,Categories_Eastern European Restaurant
2,"AgincourtNorth,L'AmoreauxEast,Milliken,Steeles...",Categories_Park,Categories_Playground,Categories_brgh,Categories_Electronics Store,Categories_Dog Run,Categories_Doner Restaurant,Categories_Donut Shop,Categories_Drugstore,Categories_Dumpling Restaurant,Categories_Eastern European Restaurant
3,"AlbionGardens,BeaumondHeights,Humbergate,James...",Categories_Grocery Store,Categories_Pizza Place,Categories_Fried Chicken Joint,Categories_Coffee Shop,Categories_Sandwich Place,Categories_Fast Food Restaurant,Categories_Beer Store,Categories_Pharmacy,Categories_Gluten-free Restaurant,Categories_Department Store
4,"Alderwood,LongBranch",Categories_Pizza Place,Categories_Pharmacy,Categories_Gym,Categories_Skating Rink,Categories_Pub,Categories_Coffee Shop,Categories_Athletics & Sports,Categories_Sandwich Place,Categories_Pool,Categories_Doner Restaurant
5,"BathurstManor,DownsviewNorth,WilsonHeights",Categories_Coffee Shop,Categories_Sushi Restaurant,Categories_Frozen Yogurt Shop,Categories_Chinese Restaurant,Categories_Deli / Bodega,Categories_Middle Eastern Restaurant,Categories_Diner,Categories_Sandwich Place,Categories_Restaurant,Categories_Pizza Place
6,BayviewVillage,Categories_Japanese Restaurant,Categories_Bank,Categories_Chinese Restaurant,Categories_Café,Categories_brgh,Categories_Doner Restaurant,Categories_Drugstore,Categories_Dumpling Restaurant,Categories_Eastern European Restaurant,Categories_Electronics Store
7,"BedfordPark,LawrenceManorEast",Categories_Pizza Place,Categories_Coffee Shop,Categories_Italian Restaurant,Categories_Greek Restaurant,Categories_Thai Restaurant,Categories_Grocery Store,Categories_Pharmacy,Categories_Pub,Categories_Restaurant,Categories_Café
8,BerczyPark,Categories_Coffee Shop,Categories_Cocktail Bar,Categories_Steakhouse,Categories_Café,Categories_Cheese Shop,Categories_Seafood Restaurant,Categories_Farmers Market,Categories_Bakery,Categories_Beer Bar,Categories_Belgian Restaurant
9,"BirchCliff,CliffsideWest",Categories_College Stadium,Categories_Café,Categories_Skating Rink,Categories_General Entertainment,Categories_Dumpling Restaurant,Categories_Dog Run,Categories_Doner Restaurant,Categories_Donut Shop,Categories_Drugstore,Categories_Eastern European Restaurant
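
The try/except around the `indicators` list works for ten columns, but it would label a 21st column as "21th". A small sketch of a helper that handles those cases as well; `ordinal` is a hypothetical name, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # covers 11th, 12th and 13th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Same column names as in the table above
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, 11)]
print(columns[:4])
```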


### Now, we can start to cluster the neighborhoods. First let's import the library and then do the clustering

In [65]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
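
The value `kclusters = 5` is chosen by hand. As a hedged sketch, not the method used in this notebook, the inertia of the model can be compared for several values of k (the elbow heuristic). The four rows below are made-up frequency vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency rows: the first two are coffee-heavy, the last two park-heavy
X = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

# Inertia is the within-cluster sum of squared distances; it drops as k grows
for k in (1, 2, 3):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(model.inertia_, 3))

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).labels_
print(labels)
```

The label numbers themselves are arbitrary; only which rows share a label is meaningful.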

### Now, I will create a new dataframe from the df_toronto dataframe and give it new column names. This will be necessary for merging the dataframes.

In [77]:
withoutzipcode = df_toronto.drop(['Postcode'],axis=1)

In [80]:
withoutzipcode.columns =['Borough','Neighborhood','Latitude','Longitude']

In [111]:
withoutzipcode.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,Scarborough,Woburn,43.770992,-79.216917
4,Scarborough,Cedarbrae,43.773136,-79.239476


In [103]:
# Note: left_index/right_index pairs the rows by position, not by neighborhood name.
merged_toronto = pd.merge(neighborhoods_venues_sorted, withoutzipcode, left_index=True, right_index=True)

In [110]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

# add clustering labels
merged_toronto['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
#toronto_merged = merged_toronto.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

merged_toronto.head() # check the last columns!

Unnamed: 0,Neighborhood_x,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Neighborhood_y,Latitude,Longitude,Cluster Labels
0,"Adelaide,King,Richmond",Categories_Coffee Shop,Categories_Café,Categories_Steakhouse,Categories_Bar,Categories_Cosmetics Shop,Categories_Thai Restaurant,Categories_Burger Joint,Categories_Restaurant,Categories_Hotel,Categories_American Restaurant,Scarborough,"Rouge, Malvern",43.806686,-79.194353,1
1,Agincourt,Categories_Lounge,Categories_Breakfast Spot,Categories_Skating Rink,Categories_Clothing Store,Categories_brgh,Categories_Empanada Restaurant,Categories_Donut Shop,Categories_Drugstore,Categories_Dumpling Restaurant,Categories_Eastern European Restaurant,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,1
2,"AgincourtNorth,L'AmoreauxEast,Milliken,Steeles...",Categories_Park,Categories_Playground,Categories_brgh,Categories_Electronics Store,Categories_Dog Run,Categories_Doner Restaurant,Categories_Donut Shop,Categories_Drugstore,Categories_Dumpling Restaurant,Categories_Eastern European Restaurant,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0
3,"AlbionGardens,BeaumondHeights,Humbergate,James...",Categories_Grocery Store,Categories_Pizza Place,Categories_Fried Chicken Joint,Categories_Coffee Shop,Categories_Sandwich Place,Categories_Fast Food Restaurant,Categories_Beer Store,Categories_Pharmacy,Categories_Gluten-free Restaurant,Categories_Department Store,Scarborough,Woburn,43.770992,-79.216917,1
4,"Alderwood,LongBranch",Categories_Pizza Place,Categories_Pharmacy,Categories_Gym,Categories_Skating Rink,Categories_Pub,Categories_Coffee Shop,Categories_Athletics & Sports,Categories_Sandwich Place,Categories_Pool,Categories_Doner Restaurant,Scarborough,Cedarbrae,43.773136,-79.239476,1
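
In the table above, the rows are paired by position, so "Adelaide,King,Richmond" appears next to the borough and coordinates of "Rouge, Malvern". A minimal sketch of joining on the neighborhood name instead, assuming that removing the spaces is enough to make the names match (the venue data was exported with `.replace(' ','')`) and that the names are unique. The venue values are made up; the coordinates are the ones listed earlier for these two areas.

```python
import pandas as pd

# Venue table: names without spaces, sorted alphabetically (venue values are made up)
venues_sorted = pd.DataFrame({
    'Neighborhood': ['Rouge,Malvern', 'Woburn'],
    '1st Most Common Venue': ['Fast Food Restaurant', 'Coffee Shop'],
})

# Coordinate table: names with spaces, in a different row order
coords = pd.DataFrame({
    'Neighborhood': ['Woburn', 'Rouge, Malvern'],
    'Latitude': [43.770992, 43.806686],
    'Longitude': [-79.216917, -79.194353],
})

# Build the same key on both sides by stripping the spaces
coords['key'] = coords['Neighborhood'].str.replace(' ', '', regex=False)

merged = venues_sorted.merge(
    coords[['key', 'Latitude', 'Longitude']],
    left_on='Neighborhood', right_on='key', how='left').drop(columns='key')
print(merged)
```

With `how='left'` the row order of the venue table is preserved, so cluster labels computed in that order would still line up.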


### Now comes the most exciting moment: we will be able to see the clustered neighborhoods on the map.

In [109]:
# create map
map_clusters = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(merged_toronto['Latitude'], merged_toronto['Longitude'], merged_toronto['Neighborhood_x'], merged_toronto['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Please check my project :)

## Thanks for reviewing
## Regards
## Gokhan Ince