# Please note: I have made a single notebook for all three questions, so please scroll down to review all of them.

# Question 1

## Use the BeautifulSoup package or any other way you are comfortable with to transform the data in the table on the Wikipedia page into the above pandas dataframe

#### Install Beautiful Soup and Requests

In [1]:
!pip install beautifulsoup4
!pip install requests



#### Get the table content from the Wikipedia page and store it in soup

In [4]:
# import libraries and get the neighbourhood table from the Wikipedia page
from bs4 import BeautifulSoup
import requests
r = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
# name the parser explicitly so BeautifulSoup does not warn about guessing one
soup = BeautifulSoup(r.text, 'html.parser')
neighborhood_table=soup.table
neighborhood_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
</td></tr>
<tr>
<td>M6A</td>
...

#### Convert table content to Data Frame

In [6]:
import pandas as pd

neighborhood_list=[]
for row in neighborhood_table.find_all('tr'):
    col=row.find_all('td')
    # skip the header of table
    if(len(col)==0):
        continue
    neighborhood_list.append([col[0].text.strip(),col[1].text.strip(),col[2].text.strip()])
print(neighborhood_list[0])

neighborhood_df=pd.DataFrame(neighborhood_list)
neighborhood_df.columns=['Postcode','Borough','Neighborhood']
neighborhood_df.head()

['M1A', 'Not assigned', 'Not assigned']


,Postcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
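
The header-skipping logic above ignores rows that contain no `<td>` cells. The same logic can be sketched with Python's standard-library `html.parser` for readers who do not have BeautifulSoup installed. The class name `TableRowParser` is an illustrative assumption, not part of the original notebook. The HTML snippet is a hand-trimmed copy of the first rows of the table printed earlier.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    Header rows contain only <th> cells, so they produce no entry,
    mirroring the `if len(col) == 0: continue` check in the notebook.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows (lists of cell strings)
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the <td> currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # text inside nested tags such as <a> is collected too
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # skip header rows (no <td> cells)
                self.rows.append(self._row)
            self._row = None


html = """<table><tbody><tr><th>Postcode</th><th>Borough</th><th>Neighbourhood
</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td><a href="/wiki/North_York">North York</a></td>
<td><a href="/wiki/Parkwoods">Parkwoods</a>
</td></tr></tbody></table>"""

parser = TableRowParser()
parser.feed(html)
print(parser.rows)
# [['M1A', 'Not assigned', 'Not assigned'], ['M3A', 'North York', 'Parkwoods']]
```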


Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [7]:
neighborhood_df=neighborhood_df[neighborhood_df['Borough']!='Not assigned']
neighborhood_df=neighborhood_df.reset_index(drop=True)
neighborhood_df.head()

,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
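
The filter is a boolean mask followed by `reset_index(drop=True)`. The `drop=True` argument discards the old row numbers instead of keeping them as an extra column. Here is a toy version with rows copied from the table above:

```python
import pandas as pd

# Toy rows in the same shape as neighborhood_df
df = pd.DataFrame(
    [["M1A", "Not assigned", "Not assigned"],
     ["M3A", "North York", "Parkwoods"],
     ["M5A", "Downtown Toronto", "Harbourfront"]],
    columns=["Postcode", "Borough", "Neighborhood"],
)

# Boolean mask: True for rows whose borough is assigned
assigned = df["Borough"] != "Not assigned"
print(assigned.tolist())  # [False, True, True]

# Keep only those rows and renumber the index from 0
df = df[assigned].reset_index(drop=True)
print(df["Postcode"].tolist())  # ['M3A', 'M5A']
```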


If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.

In [8]:
neighborhood_df.loc[neighborhood_df['Neighborhood']=="Not assigned", "Neighborhood"]=neighborhood_df['Borough']
neighborhood_df.head()

,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
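
The same replacement can be written with `Series.mask`, which substitutes the aligned `Borough` value wherever the condition holds. This is a toy sketch: the two rows are illustrative, and the first mirrors the Queen's Park case described above.

```python
import pandas as pd

df = pd.DataFrame(
    [["M7A", "Queen's Park", "Not assigned"],
     ["M3A", "North York", "Parkwoods"]],
    columns=["Postcode", "Borough", "Neighborhood"],
)

# mask() replaces values where the condition is True with the aligned value from Borough
df["Neighborhood"] = df["Neighborhood"].mask(df["Neighborhood"] == "Not assigned", df["Borough"])
print(df["Neighborhood"].tolist())  # ["Queen's Park", 'Parkwoods']
```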


More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.

In [9]:
neighborhood_df=neighborhood_df.groupby(['Postcode', 'Borough'], sort=False).agg(','.join)
neighborhood_df.reset_index(['Postcode', 'Borough'],inplace=True)
neighborhood_df.head(10)

,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront,Regent Park"
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge,Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens,Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson,Garden District"
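
Grouping on both `Postcode` and `Borough` and aggregating the neighbourhood names with `str.join` collapses duplicate postcodes into one row. Passing `sort=False` keeps the postcodes in their original order. In the toy version below, joining with `", "` instead of the notebook's `","` is only a readability choice.

```python
import pandas as pd

df = pd.DataFrame(
    [["M5A", "Downtown Toronto", "Harbourfront"],
     ["M5A", "Downtown Toronto", "Regent Park"],
     ["M3A", "North York", "Parkwoods"]],
    columns=["Postcode", "Borough", "Neighborhood"],
)

# Each group's Neighborhood values are passed to ", ".join as one Series
combined = (
    df.groupby(["Postcode", "Borough"], sort=False)["Neighborhood"]
      .agg(", ".join)
      .reset_index()
)
print(combined.values.tolist())
# [['M5A', 'Downtown Toronto', 'Harbourfront, Regent Park'], ['M3A', 'North York', 'Parkwoods']]
```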


Use the .shape attribute to print the number of rows of the dataframe.

In [10]:
neighborhood_df.shape

(103, 3)

# Question 2

## Use Geocoder Package to get the latitude and the longitude coordinates of each neighborhood

#### Add latitude and longitude for each postal code

In [11]:
!pip install geocoder
import geocoder

Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [14]:
dummy = [0.000000] * len(neighborhood_df.index)
neighborhood_df["Latitude"]=dummy
neighborhood_df["Longitude"]=dummy
neighborhood_df.head()

,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,0.0,0.0
1,M4A,North York,Victoria Village,0.0,0.0
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",0.0,0.0
3,M6A,North York,"Lawrence Heights,Lawrence Manor",0.0,0.0
4,M7A,Queen's Park,Queen's Park,0.0,0.0


In [15]:
for ind in neighborhood_df.index:
    # initialize the variable to None
    lat_lng_coords = None
    # loop until the geocoder returns coordinates (requires a valid Google API key)
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(neighborhood_df.loc[ind, 'Postcode']), key="")
        lat_lng_coords = g.latlng
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    # write with .loc to avoid chained assignment and its SettingWithCopyWarning
    neighborhood_df.loc[ind, "Latitude"] = latitude
    neighborhood_df.loc[ind, "Longitude"] = longitude

print(neighborhood_df.shape)
neighborhood_df.head()


(103, 5)


,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
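
The `while` loop above retries until the geocoder returns coordinates. As a result, it never terminates if every call fails, for example when the API key is missing or invalid. A bounded retry is sketched below. `geocode_with_retry`, its parameters, and the stub geocoder are hypothetical helpers and are not part of the geocoder package.

```python
import time


def geocode_with_retry(geocode, query, max_attempts=5, delay=0.0):
    """Call `geocode(query)` until it returns a [lat, lng] pair or attempts run out.

    `geocode` is any callable returning a coordinate pair or None, e.g.
    lambda q: geocoder.google(q, key=API_KEY).latlng (API_KEY is hypothetical).
    """
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # optional pause before the next attempt
    raise RuntimeError(f"no coordinates for {query!r} after {max_attempts} attempts")


# Usage with a stub that fails twice before succeeding (M3A coordinates from the table above)
responses = iter([None, None, [43.753259, -79.329656]])
coords = geocode_with_retry(lambda q: next(responses), "M3A, Toronto, Ontario")
print(coords)  # [43.753259, -79.329656]
```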


# Question 3

## Explore and cluster the neighborhoods in Toronto

### Utilizing the Foursquare API to explore the neighborhoods

Define Foursquare Credentials and Version

In [16]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


Define the get_category_type function from the Foursquare lab, which extracts the category of a venue

In [17]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Get the venues of each neighbourhood from the Foursquare API

In [18]:
def getNearbyVenues(postcodes, boroughs, neighbourhoods, latitudes, longitudes, radius=500, LIMIT=100):
    venues_list = []
    for postcode, bor, neigh, lat, lng in zip(postcodes, boroughs, neighbourhoods, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT
        )

        # make the GET request and pull out the list of recommended venues
        response = requests.get(url).json()
        results = response["response"]['groups'][0]['items']
        if len(results) == 0:
            print("No venues for this neighborhood: ", postcode, bor, neigh, lat, lng)

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            postcode,
            bor,
            neigh,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighbourhood lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode',
                             'Borough',
                             'Neighborhood',
                             'Latitude',
                             'Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
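
The request URL and the response parsing in getNearbyVenues can be tried offline. The sketch below builds the same explore query with `urllib.parse.urlencode`, which percent-encodes the comma in `ll` as `%2C`. It then walks a hand-made response that mimics only the nesting the function reads (`response` → `groups[0]` → `items` → `venue`). The placeholder credentials and the second venue are assumptions. The sketch also shows a guard for venues with an empty category list, where a direct `categories[0]` lookup would raise `IndexError`.

```python
from urllib.parse import urlencode

# Placeholder credentials (assumed); real values come from a Foursquare developer account
params = {
    "client_id": "YOUR_ID",
    "client_secret": "YOUR_SECRET",
    "v": "20180605",
    "ll": "43.753259,-79.329656",
    "radius": 500,
    "limit": 100,
}
url = "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Hand-made response with the same nesting the function reads
mock = {"response": {"groups": [{"items": [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed Spot",            # hypothetical venue with no category
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]}]}}

rows = []
for item in mock["response"]["groups"][0]["items"]:
    venue = item["venue"]
    cats = venue["categories"]
    rows.append((venue["name"], venue["location"]["lat"], venue["location"]["lng"],
                 cats[0]["name"] if cats else None))  # None when the category list is empty
print(url)
print(rows)
```

With requests, the same dictionary can be passed as `requests.get(base_url, params=params)` instead of formatting the query string by hand.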
    
    

Run the above function on each neighborhood and create a new dataframe called toronto_venues.

In [21]:
toronto_venues=getNearbyVenues(neighborhood_df['Postcode'], neighborhood_df['Borough'], neighborhood_df['Neighborhood'], neighborhood_df['Latitude'], neighborhood_df['Longitude'],500,100)

No venues for this neighborhood:  M9A Etobicoke Islington Avenue 43.6678556 -79.5322424
No venues for this neighborhood:  M9B Etobicoke Cloverdale,Islington,Martin Grove,Princess Gardens,West Deane Park 43.65094320000001 -79.5547244
No venues for this neighborhood:  M2M North York Newtonbrook,Willowdale 43.789053 -79.40849279999999
No venues for this neighborhood:  M1X Scarborough Upper Rouge 43.8361247 -79.20563609999999


Let's check the size of the resulting dataframe

In [22]:
print(toronto_venues.shape)
toronto_venues.head()

(2314, 9)


,Postcode,Borough,Neighborhood,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M3A,North York,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,M3A,North York,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,M4A,North York,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,M4A,North York,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,M4A,North York,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


Let's check how many venues were returned for each neighborhood

In [23]:
toronto_venues.groupby("Neighborhood").count()

Neighborhood,Postcode,Borough,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",100,100,100,100,100,100,100,100
Agincourt,6,6,6,6,6,6,6,6
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",4,4,4,4,4,4,4,4
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",10,10,10,10,10,10,10,10
"Alderwood,Long Branch",10,10,10,10,10,10,10,10
"Bathurst Manor,Downsview North,Wilson Heights",19,19,19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4,4,4
"Bedford Park,Lawrence Manor East",23,23,23,23,23,23,23,23
Berczy Park,57,57,57,57,57,57,57,57
"Birch Cliff,Cliffside West",4,4,4,4,4,4,4,4


Let's find out how many unique categories can be curated from all the returned venues

In [24]:
print("There are {} unique categories".format(len(toronto_venues["Venue Category"].unique())))

There are 268 unique categories


### Analyze Each Neighborhood

In [29]:
# one hot encoding
toronto_venues_onehot=pd.get_dummies(toronto_venues[["Venue Category"]], prefix="", prefix_sep="")
print(toronto_venues_onehot.shape)
toronto_venues_onehot.head()

(2314, 268)


,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
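
One-hot encoding turns each venue row into a vector with a single 1 in the column of its category. The toy version below uses three rows taken from the toronto_venues output above. `dtype=int` is added so the indicators print as 0/1, because recent pandas versions return booleans by default.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village"],
    "Venue Category": ["Park", "Food & Drink Shop", "Coffee Shop"],
})

# prefix="" and prefix_sep="" make the column names the bare category names
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
print(list(onehot.columns))          # ['Coffee Shop', 'Food & Drink Shop', 'Park']
print(onehot.sum(axis=1).tolist())   # [1, 1, 1]: exactly one 1 per venue row
```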


In [30]:
# add the neighbourhood column to the venue details dataframe
toronto_venues_onehot['Neighborhood']=toronto_venues['Neighborhood']

print(toronto_venues_onehot.shape)
toronto_venues_onehot.head()

(2314, 268)


,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
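
The shape printed after adding the Neighborhood column is still (2314, 268). Assigning to a column name that already exists overwrites that column instead of adding a new one. This suggests that one of the 268 venue categories is itself called Neighborhood, and its indicator values are replaced by neighbourhood names. The toy frame below demonstrates the effect and a hypothetical workaround: renaming the clashing category before inserting the label column. The new column name "Neighborhood (venue)" is an assumption.

```python
import pandas as pd

names = pd.Series(["Parkwoods", "Victoria Village"])

# Toy one-hot frame in which one venue category happens to be called "Neighborhood"
onehot = pd.DataFrame({"Neighborhood": [1, 0], "Park": [0, 1]})
before = onehot.shape
onehot["Neighborhood"] = names   # overwrites that category column instead of adding one
after = onehot.shape
print(before, after)             # (2, 2) (2, 2)

# Hypothetical safer variant: rename the clashing category, then insert the label column first
onehot = pd.DataFrame({"Neighborhood": [1, 0], "Park": [0, 1]})
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (venue)"})
onehot.insert(0, "Neighborhood", names)
print(list(onehot.columns))      # ['Neighborhood', 'Neighborhood (venue)', 'Park']
```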


#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [31]:
toronto_venues_grouped=toronto_venues_onehot.groupby('Neighborhood').mean().reset_index()
toronto_venues_grouped

,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide,King,Richmond",0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.030000,...,0.0,0.020000,0.000000,0.000000,0.000000,0.000000,0.010000,0.000000,0.00,0.000000
1,Agincourt,0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
4,"Alderwood,Long Branch",0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
5,"Bathurst Manor,Downsview North,Wilson Heights",0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.000000,...,0.0,0.000000,0.000000,0.052632,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
6,Bayview Village,0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
7,"Bedford Park,Lawrence Manor East",0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.043478,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
8,Berczy Park,0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.000000,...,0.0,0.017544,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
9,"Birch Cliff,Cliffside West",0.00,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
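
Averaging the 0/1 indicator columns within a neighbourhood gives the fraction of its venues that fall into each category. Each grouped row therefore sums to 1. Here is a toy check with two hypothetical neighbourhoods, A and B:

```python
import pandas as pd

onehot = pd.DataFrame({
    "Coffee Shop": [0, 0, 1, 1],
    "Park":        [1, 0, 0, 0],
    "Pizza Place": [0, 1, 0, 0],
})
onehot.insert(0, "Neighborhood", ["A", "A", "A", "B"])

# Mean of 0/1 indicators per group = share of that group's venues in each category
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped.round(2))

# Every neighbourhood's frequencies add up to 1
print(grouped.drop(columns="Neighborhood").sum(axis=1).round(6).tolist())  # [1.0, 1.0]
```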


#### Next, let's print each neighborhood along with its top 5 most common venues

In [33]:
num_top_venues = 5
for hood in toronto_venues_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_venues_grouped[toronto_venues_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
             venue  freq
0      Coffee Shop  0.08
1             Café  0.05
2              Bar  0.04
3       Steakhouse  0.04
4  Thai Restaurant  0.03


----Agincourt----
                venue  freq
0      Sandwich Place  0.17
1      Breakfast Spot  0.17
2  Chinese Restaurant  0.17
3          Print Shop  0.17
4              Lounge  0.17


----Agincourt North,L'Amoreaux East,Milliken,Steeles East----
                      venue  freq
0                      Park  0.50
1                Playground  0.25
2          Sculpture Garden  0.25
3            Medical Center  0.00
4  Mediterranean Restaurant  0.00


----Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown----
                  venue  freq
0           Pizza Place   0.2
1         Grocery Store   0.1
2  Fast Food Restaurant   0.1
3            Beer Store   0.1
4   Fried Chicken Joint   0.1


----Alderwood,Long Branch----
...

                        venue  freq
0  Construction & Landscaping   0.5
1              Baseball Field   0.5
2           Accessories Store   0.0
3          Miscellaneous Shop   0.0
4                       Motel   0.0


----Fairview,Henry Farm,Oriole----
                  venue  freq
0        Clothing Store  0.15
1  Fast Food Restaurant  0.07
2           Coffee Shop  0.07
3   Japanese Restaurant  0.04
4            Food Court  0.03


----First Canadian Place,Underground city----
         venue  freq
0  Coffee Shop  0.12
1         Café  0.07
2        Hotel  0.04
3   Restaurant  0.04
4   Steakhouse  0.04


----Flemingdon Park,Don Mills South----
                 venue  freq
0  Sporting Goods Shop  0.09
1           Beer Store  0.09
2          Coffee Shop  0.09
3                  Gym  0.09
4         Concert Hall  0.05


----Forest Hill North,Forest Hill West----
               venue  freq
0      Jewelry Store  0.25
1               Park  0.25
2   Sushi Restaurant  0.25
3              Trail  ...
...

               venue  freq
0     Sandwich Place  0.15
1               Café  0.15
2        Coffee Shop  0.10
3     History Museum  0.05
4  Indian Restaurant  0.05


----The Beaches----
                  venue  freq
0                 Trail  0.17
1     Health Food Store  0.17
2                  Park  0.17
3  Other Great Outdoors  0.17
4                   Pub  0.17


----The Beaches West,India Bazaar----
                venue  freq
0                Park  0.11
1       Movie Theater  0.05
2  Italian Restaurant  0.05
3        Burger Joint  0.05
4       Burrito Place  0.05


----The Danforth West,Riverdale----
                    venue  freq
0        Greek Restaurant  0.21
1             Coffee Shop  0.10
2          Ice Cream Shop  0.07
3      Italian Restaurant  0.07
4  Furniture / Home Store  0.05


----The Junction North,Runnymede----
               venue  freq
0  Convenience Store  0.25
1     Breakfast Spot  0.25
2      Grocery Store  0.25
3           Bus Line  0.25
4  Accessories Store  ...
...

#### Let's put sorted venues into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
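
return_most_common_venues sorts one grouped row in descending order. When a neighbourhood has fewer distinct categories than num_top_venues, the remaining ranks are filled with zero-frequency categories in no meaningful order. This is why the sorted table lists entries such as Dog Run, Doner Restaurant and Donut Shop for neighbourhoods with only a handful of venues. The sketch below reproduces the sort on a toy row and adds a hypothetical `drop_zero` option that leaves those filler categories out. With `drop_zero=True` the result can be shorter than `n`, so it would need padding (for example with None) before being written into the fixed ten columns.

```python
import pandas as pd


def top_venues(row, n, drop_zero=False):
    """Return up to n category names with the highest frequencies.

    row: a grouped row whose first entry is the neighbourhood label.
    drop_zero (hypothetical option): ignore categories with frequency 0.
    """
    freqs = row.iloc[1:].astype(float)   # skip the label, keep the frequencies
    if drop_zero:
        freqs = freqs[freqs > 0]
    # ties (equal frequencies) may come out in either order
    return list(freqs.sort_values(ascending=False).index[:n])


row = pd.Series({"Neighborhood": "Parkwoods", "Dog Run": 0.0,
                 "Food & Drink Shop": 0.5, "Park": 0.5})
print(top_venues(row, 3))                  # the two 0.5 categories, then 'Dog Run'
print(top_venues(row, 3, drop_zero=True))  # only the two 0.5 categories
```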

In [35]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_venues_grouped["Neighborhood"].unique()

for ind in np.arange(toronto_venues_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_venues_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Bar,Steakhouse,Cosmetics Shop,Thai Restaurant,Hotel,Restaurant,Bakery,American Restaurant
1,Agincourt,Print Shop,Breakfast Spot,Chinese Restaurant,Latin American Restaurant,Sandwich Place,Lounge,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Discount Store
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Park,Playground,Sculpture Garden,Yoga Studio,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Pizza Place,Beer Store,Japanese Restaurant,Sandwich Place,Discount Store,Fried Chicken Joint,Pharmacy,Grocery Store,Fast Food Restaurant,Electronics Store
4,"Alderwood,Long Branch",Pizza Place,Coffee Shop,Skating Rink,Sandwich Place,Athletics & Sports,Pub,Dance Studio,Gym,Pharmacy,Construction & Landscaping
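
The try/except around `indicators` gives correct labels for ranks 1 to 10. It would, however, produce labels such as "21th", "22th" and "23th" if num_top_venues were raised past 20. Below is a small helper, hypothetically named `ordinal`, that handles every rank, including 11th to 13th:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns[1], columns[4], columns[-1])
# 1st Most Common Venue 4th Most Common Venue 10th Most Common Venue
```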


### Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [36]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # install folium if it is not already available (e.g. if you haven't completed the Foursquare API lab)
import folium # map rendering library


Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    branca-0.3.1               |             py_0          25 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2019.9.11  |       hecc5488_0         144 KB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    altair-3.2.0               |           py36_0         770 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.2 MB

The following NEW packages will be ...

In [38]:
# set number of clusters
kclusters = 5

toronto_venues_grouped_clustering = toronto_venues_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_venues_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 3, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
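
scikit-learn's KMeans uses k-means++ initialisation by default. The sketch below is a simplified Lloyd's algorithm with deterministic farthest-point initialisation, meant only to show what `fit` computes: it alternates between assigning each row to its nearest centre and moving each centre to the mean of its rows. The function name `kmeans` and the toy two-category frequency rows are assumptions for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Minimal Lloyd's k-means with deterministic farthest-point initialisation."""
    X = np.asarray(X, dtype=float)
    # Initialise: first row, then repeatedly the row farthest from the chosen centres
    centres = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    centres = np.array(centres)

    for _ in range(n_iter):
        # Assignment step: index of the nearest centre (squared Euclidean distance)
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centre to the mean of its rows (keep it if empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


# Toy frequency rows: two "park-heavy" and two "coffee-heavy" neighbourhoods
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centres = kmeans(X, k=2)
print(labels)  # [0 0 1 1]
```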

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [39]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [45]:
toronto_venues_merged = neighborhood_df

# join the postcode table (which already holds latitude/longitude) with the cluster labels and top venues of each neighborhood
toronto_venues_merged = toronto_venues_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_venues_merged.head() # check the last columns!

,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3.0,Food & Drink Shop,Park,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Yoga Studio
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Pizza Place,Hockey Arena,Intersection,Portuguese Restaurant,Coffee Shop,Electronics Store,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Event Space
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636,1.0,Coffee Shop,Park,Bakery,Pub,Mexican Restaurant,Restaurant,Café,Breakfast Spot,Theater,Yoga Studio
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763,1.0,Clothing Store,Furniture / Home Store,Accessories Store,Sporting Goods Shop,Miscellaneous Shop,Coffee Shop,Gift Shop,Vietnamese Restaurant,Event Space,Boutique
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494,1.0,Coffee Shop,Diner,Park,Gym,Yoga Studio,Italian Restaurant,Japanese Restaurant,Nightclub,Seafood Restaurant,Sandwich Place


### Some neighbourhoods returned no venues, so we drop those rows

In [46]:
toronto_venues_merged.dropna(inplace=True)
toronto_venues_merged[["Cluster Labels"]]=toronto_venues_merged[["Cluster Labels"]].astype(int)
print(toronto_venues_merged["Cluster Labels"].unique())

[3 1 0 2 4]
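
The join in the merge cell is a left join on the neighbourhood label. Postcodes whose neighbourhood returned no venues (such as M9A, Islington Avenue, listed earlier) therefore get NaN in every venue column. NaN also turns Cluster Labels into a float column, which is why the labels print as 3.0 and 1.0 in the merged table and are cast back to int after `dropna()`. A toy reproduction:

```python
import pandas as pd

# Toy postcode table (with coordinates) and a per-neighbourhood cluster table
postcodes = pd.DataFrame({
    "Postcode": ["M3A", "M9A"],
    "Neighborhood": ["Parkwoods", "Islington Avenue"],
    "Latitude": [43.753259, 43.6678556],
})
sorted_venues = pd.DataFrame({
    "Neighborhood": ["Parkwoods"],
    "Cluster Labels": [3],
})

# Left join on the Neighborhood label: rows without a match get NaN
merged = postcodes.join(sorted_venues.set_index("Neighborhood"), on="Neighborhood")
before = merged["Cluster Labels"].tolist()
print(before)  # [3.0, nan]

# Drop the unmatched rows, then restore the integer dtype
merged = merged.dropna().astype({"Cluster Labels": int})
print(merged["Cluster Labels"].tolist())  # [3]
```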


Finally, let's visualize the resulting clusters

In [47]:
# create map centred on the mean of the postcode coordinates
map_clusters = folium.Map(location=[toronto_venues_merged['Latitude'].mean(), toronto_venues_merged['Longitude'].mean()], zoom_start=11)

# set color scheme for the clusters: one rainbow colour per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_venues_merged['Latitude'], toronto_venues_merged['Longitude'], toronto_venues_merged['Neighborhood'], toronto_venues_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
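
Cluster labels from KMeans run from 0 to kclusters - 1, so a palette can be indexed with the label directly. Indexing with `cluster - 1` still yields distinct colours, but it maps cluster 0 to the last palette entry. The sketch below is a dependency-free alternative to matplotlib's rainbow colormap that spaces hues evenly with the standard-library `colorsys` module. The helper name `cluster_colours` and the saturation/value constants are assumptions.

```python
import colorsys


def cluster_colours(k):
    """Return k evenly spaced hues as '#rrggbb' strings, one per cluster label 0..k-1."""
    colours = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        colours.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return colours


palette = cluster_colours(5)
# Labels run 0..4, so index the palette with the label itself (no "- 1" offset)
for label in [3, 1, 0, 2, 4]:
    print(label, palette[label])
```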

## Examine Clusters

Cluster 0

In [50]:
toronto_venues_merged.loc[toronto_venues_merged['Cluster Labels'] == 0, toronto_venues_merged.columns[[1] + list(range(5, toronto_venues_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,0,Fast Food Restaurant,Yoga Studio,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
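
Each cluster cell selects columns by position (column 1 and everything from column 5 on), which silently breaks if a column is added or moved. The hypothetical helper `cluster_view` below selects the same columns by name. The miniature dataframe stands in for toronto_venues_merged.

```python
import pandas as pd


def cluster_view(df, label):
    """Return Borough plus every column from 'Cluster Labels' onward for one cluster."""
    start = df.columns.get_loc("Cluster Labels")
    cols = ["Borough"] + list(df.columns[start:])
    return df.loc[df["Cluster Labels"] == label, cols]


# Hypothetical miniature of toronto_venues_merged
df = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighborhood": ["Parkwoods", "Victoria Village"],
    "Cluster Labels": [3, 1],
    "1st Most Common Venue": ["Food & Drink Shop", "Pizza Place"],
})
print(cluster_view(df, 3))
```

In the notebook, `for label in range(kclusters): display(cluster_view(toronto_venues_merged, label))` would then show every cluster in one cell.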


Cluster 1

In [49]:
toronto_venues_merged.loc[toronto_venues_merged['Cluster Labels'] == 1, toronto_venues_merged.columns[[1] + list(range(5, toronto_venues_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,1,Pizza Place,Hockey Arena,Intersection,Portuguese Restaurant,Coffee Shop,Electronics Store,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Event Space
2,Downtown Toronto,1,Coffee Shop,Park,Bakery,Pub,Mexican Restaurant,Restaurant,Café,Breakfast Spot,Theater,Yoga Studio
3,North York,1,Clothing Store,Furniture / Home Store,Accessories Store,Sporting Goods Shop,Miscellaneous Shop,Coffee Shop,Gift Shop,Vietnamese Restaurant,Event Space,Boutique
4,Queen's Park,1,Coffee Shop,Diner,Park,Gym,Yoga Studio,Italian Restaurant,Japanese Restaurant,Nightclub,Seafood Restaurant,Sandwich Place
7,North York,1,Caribbean Restaurant,Gym / Fitness Center,Japanese Restaurant,Basketball Court,Café,Yoga Studio,Dumpling Restaurant,Doner Restaurant,Donut Shop,Drugstore
8,East York,1,Pizza Place,Fast Food Restaurant,Gym / Fitness Center,Gastropub,Pharmacy,Café,Bus Line,Breakfast Spot,Bank,Intersection
9,Downtown Toronto,1,Clothing Store,Coffee Shop,Cosmetics Shop,Café,Bakery,Diner,Japanese Restaurant,Italian Restaurant,Restaurant,Bookstore
10,North York,1,Bakery,Pub,Italian Restaurant,Asian Restaurant,Japanese Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Yoga Studio
13,North York,1,Gym,Beer Store,Sporting Goods Shop,Coffee Shop,Chinese Restaurant,Fast Food Restaurant,Bubble Tea Shop,Concert Hall,Bike Shop,Sandwich Place
14,East York,1,Skating Rink,Park,Curling Ice,Pharmacy,Spa,Video Store,Beer Store,Cosmetics Shop,Athletics & Sports,Dog Run


Cluster 2

In [51]:
toronto_venues_merged.loc[toronto_venues_merged['Cluster Labels'] == 2, toronto_venues_merged.columns[[1] + list(range(5, toronto_venues_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Scarborough,2,Construction & Landscaping,Bar,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
57,North York,2,Construction & Landscaping,Baseball Field,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
101,Etobicoke,2,Baseball Field,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Filipino Restaurant


Cluster 3

In [52]:
toronto_venues_merged.loc[toronto_venues_merged['Cluster Labels'] == 3, toronto_venues_merged.columns[[1] + list(range(5, toronto_venues_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,3,Food & Drink Shop,Park,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Yoga Studio
21,York,3,Park,Fast Food Restaurant,Women's Store,Market,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
35,East York,3,Park,Convenience Store,Coffee Shop,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
40,North York,3,Airport,Park,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
64,York,3,Park,Convenience Store,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Empanada Restaurant
66,North York,3,Park,Bank,Convenience Store,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
77,Etobicoke,3,Park,Bus Line,Mobile Phone Shop,Yoga Studio,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
83,Central Toronto,3,Playground,Park,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
85,Scarborough,3,Park,Playground,Sculpture Garden,Yoga Studio,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
91,Downtown Toronto,3,Park,Playground,Trail,Yoga Studio,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore


Cluster 4

In [53]:
toronto_venues_merged.loc[toronto_venues_merged['Cluster Labels'] == 4, toronto_venues_merged.columns[[1] + list(range(5, toronto_venues_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough,4,Playground,Dim Sum Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
