# Applied data science capstone

This notebook is part of the Coursera Applied Data Science Capstone course (week 3).

This notebook is organized in three parts.

The first two were presented in previous notebooks, but they are included here to gather and clean the data.

Part 1 was presented in the previous notebook and consists of the following steps:

1. Import the necessary modules.
2. Get the data provided in the lab section.
3. Check the data.
4. Cleaning the data I (if the value of column "Borough" is "Not assigned", the row is dropped).
5. Checking the dataframe (verifying that all rows without an assigned "Borough" were excluded).
6. Cleaning the data II (if the value of "Neighbourhood" is "Not assigned", it is replaced by the value of "Borough").
7. Checking the dataframe (verifying that postal code "M7A", Queen's Park, was updated).
8. Joining the Neighbourhoods with the same postal code.
9. Checking the shape of the dataframe.

Part 2 was presented in the previous notebook and consists of the following steps:

1. Import the extra modules for Part 2.
2. Getting the coordinates using geocoder (since the package is unreliable, this step was not run; steps 3 to 5 import the coordinates instead).
3. Retrieving the geodata using the url provided in the lab.
4. Using geo dataframe to create two support lists to complete the first dataframe.
5. Creating the new columns (latitude and longitude) and updating the dataframe with the extra information.

Part 3 is the main goal of this notebook and consists of the following steps:

1. Create a dataframe only with the information from Toronto's neighbourhoods.
2. Load the modules to explore the neighbourhoods.
3. Saving the credential for the Foursquare API (they will be hidden for privacy)
4. Exploring the neighbourhoods.
5. Choosing the best restaurant.
6. Clustering the venues from the neighbourhoods.

1 - First, let's import the modules

In [2]:
import pandas as pd
from pandas.io.html import read_html
import numpy as np

2 - Get the data from the URL provided in the lab and assign it to a variable

In [3]:
url_page = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

#Searching in url for wikitables
table = read_html(url_page,  attrs={"class":"wikitable"})

#Check how many tables were imported
print ("Extracted {num} table from url".format(num=len(table)))

Extracted 1 table from url


3 - Since only one table was imported, the data can be accessed through "table[0]"

In [4]:
table[0].head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


4 - Dropping the rows that don't have an assigned Borough

In [5]:
#Identify the indexes where Borough is "Not assigned"
indexNames = table[0][ table[0]['Borough'] == 'Not assigned' ].index

# Delete these row indexes from dataFrame
table[0].drop(indexNames , inplace=True)

5 - Checking the table to verify that all the rows that met the criterion were excluded

In [6]:
indexNames = table[0][ table[0]['Borough'] == 'Not assigned' ].index
len(indexNames)

0

Since no rows labelled "Not assigned" remain, the "Borough" column is clean.
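The same clean-up can also be written as a boolean mask, without collecting the indexes first. A minimal sketch on a small made-up dataframe (the rows below are illustrative, not the full Wikipedia table):

```python
import pandas as pd

# Small illustrative dataframe with the same columns as table[0]
df = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Keep only the rows whose Borough is assigned
cleaned = df[df['Borough'] != 'Not assigned']

# Same check as step 5: no unassigned boroughs should remain
print((cleaned['Borough'] == 'Not assigned').sum())  # 0
print(cleaned['Postcode'].tolist())                  # ['M3A', 'M7A']
```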

6 - If the neighbourhood is not assigned, the neighborhood will be equal to Borough

In [7]:
#First, identify the indexes of all neighbourhoods that don't have an assigned value
indexNames = table[0][ table[0]['Neighbourhood'] == 'Not assigned' ].index

#Assigning the value of "Borough" to "Neighbourhood" when "Neighbourhood" = "Not assigned"
#(.loc writes into table[0] directly; chained indexing such as df['col'][i] = ... can fail to update the original dataframe)
table[0].loc[indexNames, 'Neighbourhood'] = table[0].loc[indexNames, 'Borough']
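Another way to express this fill is `Series.mask`, which replaces values where a condition is true with the corresponding value from another Series. A minimal sketch on two illustrative rows:

```python
import pandas as pd

# Illustrative rows: M7A has a borough but no neighbourhood
df = pd.DataFrame({
    'Postcode': ['M3A', 'M7A'],
    'Borough': ['North York', "Queen's Park"],
    'Neighbourhood': ['Parkwoods', 'Not assigned'],
})

# Where Neighbourhood is 'Not assigned', take the value from Borough on the same row
df['Neighbourhood'] = df['Neighbourhood'].mask(df['Neighbourhood'] == 'Not assigned', df['Borough'])

print(df['Neighbourhood'].tolist())  # ['Parkwoods', "Queen's Park"]
```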

7 - Verifying that M7A "Queen's Park" was updated, since it was the only postal code with an assigned Borough but no Neighbourhood

In [8]:
table[0].loc[table[0]['Postcode'] == 'M7A']

Unnamed: 0,Postcode,Borough,Neighbourhood
8,M7A,Queen's Park,Queen's Park


8 - Joining the neighbourhoods with the same postal code area

In [9]:
#Grouping the information by Postcode and joining the neighbourhoods that share a postal code, separated by a comma
table[0] = table[0].groupby(['Postcode', 'Borough'],as_index=False)['Neighbourhood'].agg(lambda x:', '.join(x))
table[0]

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
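To see what the groupby step does, here it is on three illustrative rows where two neighbourhoods share postcode M1B. The grouping keys and the join lambda are the same as in the cell above:

```python
import pandas as pd

# Illustrative rows: Rouge and Malvern share postcode M1B
df = pd.DataFrame({
    'Postcode': ['M1B', 'M1B', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge', 'Malvern', 'Woburn'],
})

# One row per (Postcode, Borough); the neighbourhoods of each group are joined with ', '
grouped = df.groupby(['Postcode', 'Borough'], as_index=False)['Neighbourhood'].agg(lambda names: ', '.join(names))
print(grouped)
```

The two M1B rows collapse into a single row whose neighbourhood is "Rouge, Malvern", matching the first row of the output above.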


9 - Verifying the frame's dimensions

In [10]:
table[0].shape

(103, 3)

# Part 2 of the lab section

From this step forward, the notebook identifies the geolocation of the neighbourhoods.

1 - Installing geocoder (if not installed) and importing geocoder for Part 2

In [25]:
!conda install -c conda-forge geocoder --yes
import geocoder

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geocoder


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geocoder-1.38.1            |             py_1          53 KB  conda-forge
    ratelim-0.1.6              |             py_2           6 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          59 KB

The following NEW packages will be INSTALLED:

    geocoder: 1.38.1-py_1 conda-forge
    ratelim:  0.1.6-py_2  conda-forge


Downloading and Extracting Packages
geocoder-1.38.1      | 53 KB     | ##################################### | 100% 
ratelim-0.1.6        | 6 KB      | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


2 - Getting the coordinates based on the postal code using geocoder

In [None]:
# initialize a list with one empty slot per postal code
lat_lng_coords = [None]*len(table[0])

# loop until you get the coordinates for every postal code
for i, postcode in enumerate(table[0]['Postcode']):
    while(lat_lng_coords[i] is None):
        print(postcode)
        g = geocoder.google('{}, Toronto, Ontario'.format(postcode))
        lat_lng_coords[i] = g.latlng
        print(g)

# create the new columns from the collected coordinates
table[0]['Latitude'] = [coords[0] for coords in lat_lng_coords]
table[0]['Longitude'] = [coords[1] for coords in lat_lng_coords]

table[0]

As explained in the lab section, geocoder can be unreliable: the command geocoder.google might not return any coordinates, in which case the loop above may never end. For this reason, step 2 was not run, and step 3 replaces it.

3 - Retrieving the geodata from the CSV file provided in the lab.

In [11]:
#Saving the url of the csv file
url = 'http://cocl.us/Geospatial_data'

geo = pd.read_csv(url)
geo

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


The data retrieved from the URL gives the latitude and longitude of each postal code. By matching the "Postcode" column of table[0] with the "Postal Code" column of geo, we can add the latitude and longitude to table[0].

4 - Use the postal code from the geo dataframe to complete table[0].

In [12]:
#Create two empty lists to store the latitude and longitude in the same order as the rows of table[0]
latitude = []
longitude = []

#Index geo by postal code so each lookup matches on the code itself, not on the row position
geo_by_code = geo.set_index('Postal Code')

#Every postcode in table[0] needs coordinates, so loop over all of them
for code in table[0]['Postcode']:
    latitude.append(geo_by_code.loc[code, 'Latitude'])
    longitude.append(geo_by_code.loc[code, 'Longitude'])

5 - Create the new columns in the dataframe table[0].

In [13]:
table[0]['Latitude'] = latitude
table[0]['Longitude'] = longitude
table[0]

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
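Instead of building lists in a loop, the coordinates can be attached with a single join on the postal code. A minimal sketch using `DataFrame.merge`, on a two-row excerpt of the values above placed in reverse order on purpose (the name `table0` is illustrative):

```python
import pandas as pd

# Illustrative excerpts of table[0] and geo, in different row orders
table0 = pd.DataFrame({'Postcode': ['M1C', 'M1B'], 'Borough': ['Scarborough', 'Scarborough']})
geo = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# Match rows on the postal code itself; how='left' keeps every row of table0 in its order
merged = table0.merge(geo, left_on='Postcode', right_on='Postal Code', how='left')
merged = merged.drop(columns='Postal Code')

# A postcode without coordinates would show up as NaN, so this should print False
print(merged[['Latitude', 'Longitude']].isna().any().any())
print(merged)
```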


# Part 3  - Exploring the neighbourhoods

In the third part of the notebook, we will explore the neighbourhoods of Toronto and group nearby venues into clusters using k-means.

1 - First, we will create a dataframe called "toronto" that includes only the neighbourhoods belonging to the city

In [14]:
#Keep only the rows whose borough contains the substring 'Toronto'.
#.copy() makes toronto an independent dataframe, so table[0] is left unchanged
toronto = table[0][table[0]['Borough'].str.contains('Toronto')].copy()

toronto

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
43,M4M,East Toronto,Studio District,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
45,M4P,Central Toronto,Davisville North,43.712751,-79.390197
46,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
47,M4S,Central Toronto,Davisville,43.704324,-79.38879
48,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
49,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049


2 - Load the modules to explore the neighbourhoods.

In [43]:
# libraries for displaying images and rich output
from IPython.display import Image, HTML, display

# library to handle requests
import requests 

# library to flatten JSON into a pandas dataframe
# (available at the top level since pandas 1.0; the pandas.io.json import path is deprecated)
from pandas import json_normalize

#If the folium library is not installed, download it
#!conda install -c conda-forge folium=0.5.0 --yes
!pip install folium
import folium # plotting library



3 - Saving the credential for the Foursquare API (they will be hidden for privacy)

In [17]:
CLIENT_ID = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # your Foursquare ID
CLIENT_SECRET = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # your Foursquare Secret
VERSION = '20190928'
LIMIT = 30

4 - Exploring the neighbourhoods

Let's assume you are going out for dinner in Downtown Toronto and want to find Italian restaurants within 1000 meters. As the reference point, we use the coordinates of the first Downtown Toronto postal code in the toronto dataframe.

In [18]:
#First we define what we are looking for
search_query = 'Italian'

#Then we define the radius (in meters)
radius = 1000

#Get the latitude and longitude of Downtown Toronto 
lat = toronto[toronto.Borough ==  'Downtown Toronto'].iloc[0]['Latitude']
long = toronto[toronto.Borough ==  'Downtown Toronto'].iloc[0]['Longitude']

#Finally we set the url for our search
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, long, VERSION, search_query, radius, LIMIT)
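Building the query string with `str.format` leaves URL encoding to you. An alternative is to keep the parameters in a dictionary: `requests.get(base_url, params=params)` accepts one, and the standard library's `urllib.parse.urlencode` shows the encoded result offline. In this sketch the credentials are placeholders and the coordinates are those of M4E from the toronto table, chosen only for illustration:

```python
from urllib.parse import urlencode

base_url = 'https://api.foursquare.com/v2/venues/search'

# Same parameters as the formatted URL above; credentials are placeholders
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'll': '{},{}'.format(43.676357, -79.293031),
    'v': '20190928',
    'query': 'Italian',
    'radius': 1000,
    'limit': 30,
}

# urlencode escapes special characters, e.g. the comma in ll becomes %2C
url = base_url + '?' + urlencode(params)
print(url)
```

In the notebook, the request itself would then be `requests.get(base_url, params=params).json()`.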

With the URL ready, we use requests to query Foursquare for Italian restaurants near Downtown Toronto.

In [19]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d9546bd2c20170037697631'},
 'response': {'venues': [{'id': '53bed05e498e908d95ce1f4d',
    'name': 'italian vegatarian  magic',
    'location': {'address': '7 Yorkville Ave.',
     'lat': 43.673800977060786,
     'lng': -79.38661007203328,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.673800977060786,
       'lng': -79.38661007203328}],
     'distance': 972,
     'cc': 'CA',
     'city': 'Toronto',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['7 Yorkville Ave.', 'Toronto ON', 'Canada']},
    'categories': [{'id': '4bf58dd8d48988d110941735',
      'name': 'Italian Restaurant',
      'pluralName': 'Italian Restaurants',
      'shortName': 'Italian',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/italian_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1570064061',
    'hasPerk': False},
   {'id': '4b5f24fcf964a520e0a829e3',
    'name': 'Bacaro Italian Eatery',

Foursquare returned a JSON response. To make the data easier to read, let's convert it into a dataframe.

In [20]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
italian = json_normalize(venues)
italian.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.state,name,referralId
0,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",False,53bed05e498e908d95ce1f4d,7 Yorkville Ave.,CA,Toronto,Canada,,972,"[7 Yorkville Ave., Toronto ON, Canada]","[{'label': 'display', 'lat': 43.67380097706078...",43.673801,-79.38661,ON,italian vegatarian magic,v-1570064061
1,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",False,4b5f24fcf964a520e0a829e3,2 Bloor St East,CA,Toronto,Canada,Yonge + Bloor,1130,"[2 Bloor St East (Yonge + Bloor), Toronto ON, ...","[{'label': 'display', 'lat': 43.67082400553553...",43.670824,-79.384675,ON,Bacaro Italian Eatery,v-1570064061
2,"[{'id': '4bf58dd8d48988d1d6941735', 'name': 'S...",False,50f764c9e4b0fb724d4966e1,914 Yonge Street,CA,,Canada,,1180,"[914 Yonge Street, Canada]","[{'label': 'display', 'lat': 43.67371080453014...",43.673711,-79.389755,,La Casa Rumbera Del Italiano,v-1570064061


Only three venues matching "Italian" were returned. Note that the distance column shows two of them slightly beyond the 1000 meter radius (1130 m and 1180 m).

Let's filter the dataframe to keep only the necessary information.

In [21]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in italian.columns if col.startswith('location.')] + ['id']
italian_filtered = italian.loc[:, filtered_columns].copy() # .copy() so later column edits don't touch a view of italian

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
italian_filtered['categories'] = italian_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
italian_filtered.columns = [column.split('.')[-1] for column in italian_filtered.columns]

italian_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,state,id
0,italian vegatarian magic,Italian Restaurant,7 Yorkville Ave.,CA,Toronto,Canada,,972,"[7 Yorkville Ave., Toronto ON, Canada]","[{'label': 'display', 'lat': 43.67380097706078...",43.673801,-79.38661,ON,53bed05e498e908d95ce1f4d
1,Bacaro Italian Eatery,Italian Restaurant,2 Bloor St East,CA,Toronto,Canada,Yonge + Bloor,1130,"[2 Bloor St East (Yonge + Bloor), Toronto ON, ...","[{'label': 'display', 'lat': 43.67082400553553...",43.670824,-79.384675,ON,4b5f24fcf964a520e0a829e3
2,La Casa Rumbera Del Italiano,Strip Club,914 Yonge Street,CA,,Canada,,1180,"[914 Yonge Street, Canada]","[{'label': 'display', 'lat': 43.67371080453014...",43.673711,-79.389755,,50f764c9e4b0fb724d4966e1


The last venue is categorized as a strip club, not a restaurant, so let's drop it.

In [22]:
italian_filtered.drop(2,inplace=True)
italian_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,state,id
0,italian vegatarian magic,Italian Restaurant,7 Yorkville Ave.,CA,Toronto,Canada,,972,"[7 Yorkville Ave., Toronto ON, Canada]","[{'label': 'display', 'lat': 43.67380097706078...",43.673801,-79.38661,ON,53bed05e498e908d95ce1f4d
1,Bacaro Italian Eatery,Italian Restaurant,2 Bloor St East,CA,Toronto,Canada,Yonge + Bloor,1130,"[2 Bloor St East (Yonge + Bloor), Toronto ON, ...","[{'label': 'display', 'lat': 43.67082400553553...",43.670824,-79.384675,ON,4b5f24fcf964a520e0a829e3
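Dropping by row label works here because we inspected the three results by hand. A sketch of a rule-based alternative, assuming we keep any venue whose category contains "Restaurant" (the three rows below copy the name, category and distance values returned above):

```python
import pandas as pd

# Copy of the three venues returned above (only the columns needed here)
venues = pd.DataFrame({
    'name': ['italian vegatarian magic', 'Bacaro Italian Eatery', 'La Casa Rumbera Del Italiano'],
    'categories': ['Italian Restaurant', 'Italian Restaurant', 'Strip Club'],
    'distance': [972, 1130, 1180],
})

# Keep venues whose category mentions "Restaurant"; na=False also drops rows
# where get_category_type returned None
restaurants = venues[venues['categories'].str.contains('Restaurant', na=False)]
print(restaurants['name'].tolist())
```

The same pattern can add a distance condition, for example `venues['distance'] <= 1000`, if the radius should be enforced strictly.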


Let's plot the restaurants on a map to see where they are.

In [24]:
venues_map = folium.Map(location=[lat, long], zoom_start=15) # generate map centred around Downtown Toronto

# add a red circle marker to represent Downtown Toronto
folium.features.CircleMarker(
    location = [lat, long],
    radius=10,
    color='red',
    popup='Downtown Toronto',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Italian restaurants as blue circle markers
for lat, lng, label in zip(italian_filtered.lat, italian_filtered.lng, italian_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

5 - Choosing the best restaurant.

Let's check the ratings of both restaurants to see which one is better.

In [25]:
venue_id = italian_filtered['id'][0] # ID of Italian vegetarian magic
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

result = requests.get(url).json()
try:
    print('The rating of Italian vegetarian magic is: ', result['response']['venue']['rating'])
except:
    print('This venue has not been rated yet.')
    
venue_id = italian_filtered['id'][1] # ID of Bacaro Italian Eatery
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

result = requests.get(url).json()
try:
    print('The rating of Bacaro Italian Eatery is: ', result['response']['venue']['rating'])
except:
    print('This venue has not been rated yet.')

This venue has not been rated yet.
This venue has not been rated yet.
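The bare `except` above prints the same message when the request itself fails (for example, an error response with no `venue` key). A sketch of a small lookup helper, `venue_detail` (a hypothetical name), that returns `None` for missing keys, combined with a check of the `meta` code seen in the responses above. The two responses are made-up examples shaped like the venue-details JSON; the rating and tip values are illustrative:

```python
def venue_detail(result, *keys):
    """Follow keys through a parsed JSON dict; return None if any level is missing."""
    value = result
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Made-up responses shaped like the venue-details JSON used above
unrated = {'meta': {'code': 200}, 'response': {'venue': {'tips': {'count': 0}}}}
rated = {'meta': {'code': 200}, 'response': {'venue': {'rating': 8.1, 'tips': {'count': 12}}}}

for name, result in [('unrated', unrated), ('rated', rated)]:
    if venue_detail(result, 'meta', 'code') != 200:
        print(name, 'request failed')  # an API error, not a missing rating
        continue
    rating = venue_detail(result, 'response', 'venue', 'rating')
    tips = venue_detail(result, 'response', 'venue', 'tips', 'count')
    print(name, 'rating:', rating, 'tips:', tips)
```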


Neither restaurant has been rated yet, so let's check whether the first one is a good option for vegetarians.

First, let's check its number of tips.

In [26]:
venue_id = italian_filtered['id'][0] # ID of Italian vegetarian magic
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

result = requests.get(url).json()
result['response']['venue']['tips']['count']

0

No tips were found, and since the venue has neither tips nor a rating, we can't tell whether it is a good restaurant.

Let's check the second restaurant.

In [27]:
venue_id = italian_filtered['id'][1] # ID of Bacaro Italian Eatery
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

result = requests.get(url).json()
result['response']['venue']['tips']['count']

0

Neither restaurant has tips or ratings, so if you plan to visit one of them, consider leaving a tip on Foursquare.

6 - Clustering the venues from the neighbourhoods.

Loading the modules to perform the clustering.

In [28]:
from sklearn.cluster import KMeans 

Get 5 venues around each Toronto borough (using the coordinates of its first postal code), beginning with Downtown Toronto.

In [29]:
LIMIT = 5
#Then we define the radius (in meters)
radius = 1000

#Get the latitude and longitude of Downtown Toronto 
lat = toronto[toronto.Borough ==  'Downtown Toronto'].iloc[0]['Latitude']
long = toronto[toronto.Borough ==  'Downtown Toronto'].iloc[0]['Longitude']

#Finally we set the url for our search
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, long, VERSION, radius, LIMIT)
result = requests.get(url).json()
result

{'meta': {'code': 200, 'requestId': '5d95470266fc65002ce7c279'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Rosedale',
  'headerFullLocation': 'Rosedale, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 26,
  'suggestedBounds': {'ne': {'lat': 43.68856260900001,
    'lng': -79.36510816548741},
   'sw': {'lat': 43.670562590999985, 'lng': -79.38995063451262}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4adcb343f964a520e32e21e3',
       'name': 'Summerhill Market',
       'location': {'address': '446 Summerhill Ave',
        'crossStreet': 'btwn. MacLennan Ave. and Glen Rd.',
        'lat': 43.68626482142425,
        'lng': -79.37545823237794,
      

Declaring a function to format the data into a dataframe

In [30]:
#defining a function to locate and format the data
def locate_format(results):
    # assign relevant part of JSON to venues
    venues = results['response']['groups'][0]['items']
    formated_data = json_normalize(venues) # flatten JSON

    # filter columns (.copy() so the category edit below doesn't touch a view of formated_data)
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    formated_data_filtered = formated_data.loc[:, filtered_columns].copy()

    # filter the category for each row
    formated_data_filtered['venue.categories'] = formated_data_filtered.apply(get_category_type, axis=1)

    # clean columns
    formated_data_filtered.columns = [col.split('.')[-1] for col in formated_data_filtered.columns]

    return formated_data_filtered

Using locate_format to get the first part of the information.

In [31]:
info1 = locate_format(result)
info1

Unnamed: 0,name,categories,lat,lng
0,Summerhill Market,Grocery Store,43.686265,-79.375458
1,Toronto Lawn Tennis Club,Athletics & Sports,43.680667,-79.388559
2,Black Camel,BBQ Joint,43.677016,-79.389367
3,Tinuno,Filipino Restaurant,43.671281,-79.37492
4,Craigleigh Gardens,Park,43.678099,-79.371586


Repeating the process for the other three boroughs.

For Central Toronto...

In [32]:
#Get the latitude and longitude of Central Toronto
lat = toronto[toronto.Borough ==  'Central Toronto'].iloc[0]['Latitude']
long = toronto[toronto.Borough ==  'Central Toronto'].iloc[0]['Longitude']

#Finally we set the url for our search
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, long, VERSION, radius, LIMIT)
result = requests.get(url).json()
result

info2 = locate_format(result)
info2

Unnamed: 0,name,categories,lat,lng
0,Lawrence Park Ravine,Park,43.726963,-79.394382
1,Granite Club,Gym / Fitness Center,43.733043,-79.381986
2,Tim Hortons,Coffee Shop,43.727324,-79.379563
3,Glendon Bookstore,Bookstore,43.727024,-79.378976
4,Glendon Forest,Trail,43.727226,-79.378413


For East Toronto...

In [33]:
#Get the latitude and longitude of East Toronto
lat = toronto[toronto.Borough ==  'East Toronto'].iloc[0]['Latitude']
long = toronto[toronto.Borough ==  'East Toronto'].iloc[0]['Longitude']

#Finally we set the url for our search
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, long, VERSION, radius, LIMIT)
result = requests.get(url).json()
result

info3 = locate_format(result)
info3

Unnamed: 0,name,categories,lat,lng
0,Tori's Bakeshop,Vegetarian / Vegan Restaurant,43.672114,-79.290331
1,The Beech Tree,Gastropub,43.680493,-79.288846
2,The Fox Theatre,Indie Movie Theater,43.672801,-79.287272
3,Ed's Real Scoop,Ice Cream Shop,43.67263,-79.287993
4,Glen Manor Ravine,Trail,43.676821,-79.293942


For West Toronto...

In [34]:
#Get the latitude and longitude of West Toronto
lat = toronto[toronto.Borough ==  'West Toronto'].iloc[0]['Latitude']
long = toronto[toronto.Borough ==  'West Toronto'].iloc[0]['Longitude']

#Finally we set the url for our search
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, long, VERSION, radius, LIMIT)
result = requests.get(url).json()
result

info4 = locate_format(result)
info4

Unnamed: 0,name,categories,lat,lng
0,The Greater Good Bar,Bar,43.669409,-79.439267
1,Parallel,Middle Eastern Restaurant,43.669516,-79.438728
2,Happy Bakery & Pastries,Bakery,43.66705,-79.441791
3,Blood Brothers Brewing,Brewery,43.669944,-79.436533
4,Planet Fitness Toronto Galleria,Gym / Fitness Center,43.667588,-79.442574
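The four cells above differ only in the borough name. A sketch of how to build one explore URL per borough in a loop, taking the first row of each borough as the cells above do with `.iloc[0]`. The three-row dataframe is an excerpt of the toronto table, the credentials are placeholders, and `explore_url` is a hypothetical helper:

```python
import pandas as pd

# Excerpt of the toronto dataframe (values from the table above)
toronto = pd.DataFrame({
    'Borough': ['East Toronto', 'East Toronto', 'Central Toronto'],
    'Latitude': [43.676357, 43.679557, 43.72802],
    'Longitude': [-79.293031, -79.352188, -79.38879],
})


def explore_url(lat, lng, client_id, client_secret, version, radius=1000, limit=5):
    """Build the venues/explore URL used in the cells above."""
    return ('https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}'
            '&ll={},{}&v={}&radius={}&limit={}').format(
        client_id, client_secret, lat, lng, version, radius, limit)


# drop_duplicates keeps the first row of each borough, like .iloc[0] above
first_rows = toronto.drop_duplicates(subset='Borough')

urls = {
    row.Borough: explore_url(row.Latitude, row.Longitude,
                             'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20190928')
    for row in first_rows.itertuples(index=False)
}

for borough, url in urls.items():
    print(borough, url)
```

With credentials and network access, `pd.concat([locate_format(requests.get(u).json()) for u in urls.values()], ignore_index=True)` would then produce the combined dataframe in one step.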


Combining all the results into a single dataframe.

In [35]:
info_complete = pd.concat([info1, info2, info3, info4], ignore_index=True)
info_complete

Unnamed: 0,name,categories,lat,lng
0,Summerhill Market,Grocery Store,43.686265,-79.375458
1,Toronto Lawn Tennis Club,Athletics & Sports,43.680667,-79.388559
2,Black Camel,BBQ Joint,43.677016,-79.389367
3,Tinuno,Filipino Restaurant,43.671281,-79.37492
4,Craigleigh Gardens,Park,43.678099,-79.371586
5,Lawrence Park Ravine,Park,43.726963,-79.394382
6,Granite Club,Gym / Fitness Center,43.733043,-79.381986
7,Tim Hortons,Coffee Shop,43.727324,-79.379563
8,Glendon Bookstore,Bookstore,43.727024,-79.378976
9,Glendon Forest,Trail,43.727226,-79.378413


Clustering the venues based on latitude and longitude. 

In [36]:
#Initializing k-means with five clusters
k_means = KMeans(init="k-means++", n_clusters=5, n_init=12)

#Grouping the location based on latitude and longitude
k_means.fit(info_complete[['lat','lng']])

#Get the label of the cluster
k_means_labels = k_means.labels_
k_means_labels

array([0, 4, 4, 0, 0, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
      dtype=int32)

The labels range from 0 to 4, i.e. five clusters, as defined.

Let's add the cluster labels to the dataframe.

In [37]:
info_complete['cluster'] = k_means_labels
info_complete

Unnamed: 0,name,categories,lat,lng,cluster
0,Summerhill Market,Grocery Store,43.686265,-79.375458,0
1,Toronto Lawn Tennis Club,Athletics & Sports,43.680667,-79.388559,4
2,Black Camel,BBQ Joint,43.677016,-79.389367,4
3,Tinuno,Filipino Restaurant,43.671281,-79.37492,0
4,Craigleigh Gardens,Park,43.678099,-79.371586,0
5,Lawrence Park Ravine,Park,43.726963,-79.394382,3
6,Granite Club,Gym / Fitness Center,43.733043,-79.381986,3
7,Tim Hortons,Coffee Shop,43.727324,-79.379563,3
8,Glendon Bookstore,Bookstore,43.727024,-79.378976,3
9,Glendon Forest,Trail,43.727226,-79.378413,3


Let's plot the venues on a map to see how they clustered.

In [46]:
#Let's center the map in Downtown Toronto
lat = toronto[toronto.Borough ==  'Downtown Toronto'].iloc[0]['Latitude']
long = toronto[toronto.Borough ==  'Downtown Toronto'].iloc[0]['Longitude']

venues_map = folium.Map(location=[lat, long], zoom_start=10) # generate map centred around Downtown Toronto

# add a red circle marker to represent Downtown Toronto
folium.features.CircleMarker(
    location = [lat, long],
    radius=10,
    color='red',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the venues as colorful circle markers and label each with its name
# each cluster gets its own color so it can be identified (labels 0 to 4)
cluster_colors = ['blue', 'red', 'purple', 'green', 'black']

for lat, lng, label, cluster in zip(info_complete.lat, info_complete.lng, info_complete.name, info_complete.cluster):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color=cluster_colors[cluster],
        popup=label,
        fill = True,
        fill_color=cluster_colors[cluster],
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
display(venues_map)

As the map shows, the venues of each borough form their own cluster, except for Downtown Toronto, whose venues were split into two clusters.
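The same conclusion can be checked numerically by cross-tabulating borough against cluster label. A sketch that uses the labels printed by `k_means.labels_` above, assuming that `info_complete` keeps the row order of `pd.concat([info1, info2, info3, info4])` with five venues per borough:

```python
import pandas as pd

# Cluster labels printed by k_means.labels_ above
labels = [0, 4, 4, 0, 0, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

# Assumption: rows follow the concat order info1..info4, five venues each
boroughs = (['Downtown Toronto'] * 5 + ['Central Toronto'] * 5
            + ['East Toronto'] * 5 + ['West Toronto'] * 5)

counts = pd.crosstab(pd.Series(boroughs, name='borough'), pd.Series(labels, name='cluster'))
print(counts)

# Number of distinct clusters each borough's venues fall into
clusters_per_borough = (counts > 0).sum(axis=1)
print(clusters_per_borough)
```

Adding a borough column to each info dataframe before the concat would avoid relying on row order.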