<a href="https://colab.research.google.com/github/louiswillems/IBM_Capston_Project/blob/master/FoursquarevsGoogle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Comparing Information of Nearby venues from Neighborhoods in Montreal - Foursquare API vs. Google API



For this final assignment, we will  explore and cluster the neighborhoods in Montréal.
We will compare the data from Foursquare API with data of nearby venues from Google Maps API. For that, we will analyse  both datasets.

## Table of Contents

<ol>

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

<li> <a href="#item1">Download Datasets</a></li>
<li> <a href="#item2">Coordinates from Google Maps Geocoding API</a></li>
<li> <a href="#item3">Merging Datasets</a></li> 
 <br>
 <li> <a href="#item4"> Venues for all Neighborhoods (Foursquare API)</a></li> 
<li> <a href="#item5"> Analyze, Cluster and Mapping Neighborhoods (Foursquare API)</a></li>  
<li> <a href="#item6"> Venues for all Neighborhoods (Google Places API)</a></li> 
<li> <a href="#item7"> Analyze, Cluster and Mapping Neighborhoods (Google Places API)</a></li>  

  <li> <a href="#item8"> Conclusion</a></li>  
</font>
</div>
</ol>

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [0]:
!pip install beautifulsoup4
!pip install lxml
!pip install requests
!pip install folium==0.5.0
!pip install geopy

## 1. Download and Clean Montreal postal codes data (Wikipedia)

We will build a code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_H, in order to obtain the data that is in the table of postal codes in Montréal and to transform the data into a pandas dataframe.

In [2]:
from bs4 import BeautifulSoup
import numpy as np
import lxml

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium


source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_H').text
soup = BeautifulSoup(source)


table = soup.find('table')
table_rows = table.find_all('tr')

res = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text.strip() for tr in td if tr.text.strip()]
    if row:
        res.append(row)

## Data Cleaninng: We will clean our data extract from Wikipedia

df = pd.DataFrame(res, columns=["column1", "column2", "column3", "column4", "column5", "column6", "column7", "column8", "column9"])

df = df.replace(r'^H\d+[A-Z](Not) assigned$', np.nan, regex=True)

df1 = df['column1'].str.split('(H\d+[A-Z])(.*)', expand=True)
df1 = df1.loc[:,[1,2]]
df1.rename(columns={1:'Postcode', 2:'Neighborhood'}, inplace=True)


df2 = df['column2'].str.split('(H\d+[A-Z])(.*)('')', expand=True)
df2 = df2.loc[:,[1,2]]
df2.rename(columns={1:'Postcode', 2:'Neighborhood'}, inplace=True)


df3 = df['column3'].str.split('(H\d+[A-Z])(.*)('')', expand=True)
df3 = df3.loc[:,[1,2]]
df3.rename(columns={1:'Postcode', 2:'Neighborhood'}, inplace=True)


df4 = df['column4'].str.split('(H\d+[A-Z])(.*)('')', expand=True)
df4 = df4.loc[:,[1,2]]
df4.rename(columns={1:'Postcode', 2:'Neighborhood'}, inplace=True)
df4

df5 = df['column5'].str.split('(H\d+[A-Z])(.*)('')', expand=True)
df5 = df5.loc[:,[1,2]]
df5.rename(columns={1:'Postcode', 2:'Neighborhood'}, inplace=True)

df6 = df['column6'].str.split('(H\d+[A-Z])(.*)('')', expand=True)
df6 = df6.loc[:,[1,2]]
df6.rename(columns={1:'Postcode', 2:'Neighborhood'}, inplace=True)

df7 = df['column7'].str.split('(H\d+[A-Z])(.*)('')', expand=True)
df7 = df7.loc[:,[1,2]]
df7.rename(columns={1:'Postcode', 2:'Neighborhood'}, inplace=True)


df8 = df['column8'].str.split('(H\d+[A-Z])(.*)('')', expand=True)
df8 = df8.loc[:,[1,2]]
df8.rename(columns={1:'Postcode', 2:'Neighborhood'}, inplace=True)


df9 = df['column9'].str.split('(H\d+[A-Z])(.*)('')', expand=True)
df9 = df9.loc[:,[1,2]]
df9.rename(columns={1:'Postcode', 2:'Neighborhood'}, inplace=True)
df9


frames = [df1, df2, df3, df4, df5, df6, df7, df8, df9]

result = pd.concat(frames)
df_final = result.dropna(axis=0).reset_index(drop = True)

# remove non-informative data
df_final = df_final[df_final.Postcode != 'H0P']
df_final = df_final[df_final.Postcode != 'H0M']
df_final = df_final[df_final.Postcode != 'H0H']
df_final.head(10)

Unnamed: 0,Postcode,Neighborhood
3,H1A,Pointe-aux-Trembles
4,H1B,Montreal East
5,H1C,Rivière-des-PrairiesNortheast
6,H1E,Rivière-des-PrairiesSouthwest
7,H1G,Montréal-NordNorth
8,H1H,Montréal-NordSouth
9,H1J,AnjouWest
10,H1K,AnjouEast
11,H1L,MercierNorth
12,H1M,MercierWest


## 2. Coordinates from Google Maps Geocoding API

Now, we need to get the latitude and the longitude coordinates of each postal code in Montréal from Google Maps Geocoding API

In [0]:
def getGoogleMapCoord(postcode):

    api_key = '<<HERE YOUR API>>'
    key = '&key={}'.format(api_key)

    venues_list=[]
    for postcode in postcode:
#         print(postcode)

        # make the GET request
        results = requests.get('https://maps.googleapis.com/maps/api/geocode/json?address={},Montreal,QC,Canada'.format(postcode)+key).json()
        final = results['results']
        venues_list.append([(
            postcode,
            v['geometry']['location']['lat'],
            v['geometry']['location']['lng']) for v in final])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode','Lat', 'Long']

    return nearby_venues

In [4]:
geo = getGoogleMapCoord(postcode=df_final['Postcode'])
geo.head()

Unnamed: 0,Postcode,Lat,Long
0,H1A,45.676622,-73.509825
1,H1B,45.633696,-73.51504
2,H1C,45.66242,-73.545167
3,H1E,45.639276,-73.585731
4,H1G,45.616126,-73.626227


## 3. Merging Wikipedia data & Coordinates of Montréal

We will now join our data from Wikipedia (df_final ) and our coordinates in Montréal (geo)


In [5]:
montreal_merged = pd.merge(df_final, geo, how='left', left_on='Postcode', right_on='Postcode')
montreal_merged.head()

Unnamed: 0,Postcode,Neighborhood,Lat,Long
0,H1A,Pointe-aux-Trembles,45.676622,-73.509825
1,H1B,Montreal East,45.633696,-73.51504
2,H1C,Rivière-des-PrairiesNortheast,45.66242,-73.545167
3,H1E,Rivière-des-PrairiesSouthwest,45.639276,-73.585731
4,H1G,Montréal-NordNorth,45.616126,-73.626227


Next, we use geopy library to get the latitude and longitude values of Montréal

In [6]:
address = 'Montreal, QC'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Montreal {}, {}.'.format(latitude, longitude))



The geograpical coordinate of Montreal 45.4972159, -73.6103642.


## 4. Venues for all Neighborhoods in Montréal with the Foursquare API

Next, we are going to start utilizing the Foursquare API to get venues for all neighborhoods in Montréal.


In [7]:
CLIENT_ID = 'QLNA20MZ2FEHSIRL5FQVZAJG3OCI4ZVMLKN0UCXQPLXQBK5Y'
CLIENT_SECRET = '1CJQ3PRC1NIODJERJE1P52BYZF0LP0ND2GJXZ1KLJZ00H41E'
VERSION = '20181005' # Foursquare API version

LIMIT = 500 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: QLNA20MZ2FEHSIRL5FQVZAJG3OCI4ZVMLKN0UCXQPLXQBK5Y
CLIENT_SECRET:1CJQ3PRC1NIODJERJE1P52BYZF0LP0ND2GJXZ1KLJZ00H41E


In [0]:
def getNearbyVenuesFoursquare(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [0]:
montreal_foursquare = getNearbyVenuesFoursquare(names=montreal_merged['Neighborhood'],
                                   latitudes=montreal_merged['Lat'],
                                   longitudes=montreal_merged['Long']
                                  )

In [12]:
print(montreal_foursquare.shape)
montreal_foursquare.head()

(1987, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Pointe-aux-Trembles,45.676622,-73.509825,AMT Gare Pointe-aux-Trembles,45.674882,-73.504908,Train Station
1,Montreal East,45.633696,-73.51504,Resto-bar Larry,45.634221,-73.516023,Restaurant
2,Montreal East,45.633696,-73.51504,Service 3r Valorisation,45.633998,-73.518851,Home Service
3,Montreal East,45.633696,-73.51504,Envirotech Services Industriels Inc,45.631157,-73.516678,Home Service
4,Montreal East,45.633696,-73.51504,Sonav Inc,45.632562,-73.510142,Electronics Store


## 5. Analyze, Cluster and Mapping Neighborhoods with Foursquare API

Run *k*-means to cluster the neighborhood into 5 clusters. 


In [13]:
# one hot encoding
montreal_onehot = pd.get_dummies(montreal_foursquare[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
montreal_onehot['Neighborhood'] = montreal_foursquare['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [montreal_onehot.columns[-1]] + list(montreal_onehot.columns[:-1])
montreal_onehot = montreal_onehot[fixed_columns]


# Merging
montreal_grouped = montreal_onehot.groupby('Neighborhood').mean().reset_index()
montreal_final = pd.merge(montreal_merged, montreal_grouped, how='left', left_on='Neighborhood', right_on='Neighborhood')
montreal_final.dropna(axis=0, inplace=True)


def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
  
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = montreal_grouped['Neighborhood']

for ind in np.arange(montreal_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(montreal_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Laval-sur-le-Lac,Steakhouse,Golf Course,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Event Service,English Restaurant,Empanada Restaurant
1,AhuntsicCentral,Bakery,Café,Coffee Shop,Shop & Service,Breakfast Spot,Pharmacy,Bistro,Cheese Shop,Sushi Restaurant,Bar
2,AhuntsicEast,Park,Gym,Playground,Athletics & Sports,Dive Bar,Dog Run,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store
3,AhuntsicNorth,Pharmacy,Pet Store,Bar,Yoga Studio,Discount Store,Falafel Restaurant,Event Space,Event Service,English Restaurant,Empanada Restaurant
4,AhuntsicSoutheast,Italian Restaurant,Park,Train Station,Sandwich Place,Restaurant,Clothing Store,Grocery Store,Bank,Pharmacy,Women's Store


In [0]:
# set number of clusters
kclusters = 3

montreal_grouped_clustering = montreal_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(init = "k-means++",n_clusters=kclusters,n_init = 12, random_state=0).fit(montreal_grouped_clustering)


montreal_final['Cluster Labels'] = kmeans.labels_

montreal_final2 = montreal_final[['Postcode', 'Neighborhood', 'Lat', 'Long', 'Cluster Labels']]

# montreal_merged = montreal_final.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
montreal_merged_f = pd.merge(montreal_final2, neighborhoods_venues_sorted, left_on='Neighborhood', right_on='Neighborhood')




Let's plot our clusters with Folium

In [15]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(montreal_final['Lat'], montreal_final['Long'], montreal_final['Neighborhood'], montreal_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 6. Venues for all Neighborhoods in Montréal with the Google Maps Places API

Next, we are going to start utilizing the Google Maps Place API to get venues for all neighborhoods in Montréal.

In [16]:
montreal_merge = pd.merge(df_final, geo, how='left', left_on='Postcode', right_on='Postcode')
montreal_merge.head()

Unnamed: 0,Postcode,Neighborhood,Lat,Long
0,H1A,Pointe-aux-Trembles,45.676622,-73.509825
1,H1B,Montreal East,45.633696,-73.51504
2,H1C,Rivière-des-PrairiesNortheast,45.66242,-73.545167
3,H1E,Rivière-des-PrairiesSouthwest,45.639276,-73.585731
4,H1G,Montréal-NordNorth,45.616126,-73.626227


In [0]:
def getGoogleMapTypelocation(neigh,lat,lng):

    api_key = '<<>HERE YOUR API>'
    key = '&key={}'.format(api_key)

    venues_list=[]
    for neigh, lat, lng in zip(neigh, lat, lng):
     #print(postcode)

        results = requests.get('https://maps.googleapis.com/maps/api/place/nearbysearch/json?radius=500&location={},{}'.format(lat,lng)+key).json()
        final = results['results']
        venues_list.append([(
            neigh,
            lat,
            lng,
            v['types'][0]) for v in final])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood','Lat', 'Long', 'Types']

    return nearby_venues

In [0]:
montreal_GAPI = getGoogleMapTypelocation(neigh=montreal_merge['Neighborhood'],
                                   lat=montreal_merge['Lat'],
                                   lng=montreal_merge['Long']
                                  )

## 7. Analyze, Cluster and Mapping Neighborhoods with Google Places  API

Run *k*-means to cluster the neighborhood into 5 clusters. 

In [20]:
# one hot encoding
montreal_onehot = pd.get_dummies(montreal_GAPI[['Types']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
montreal_onehot['Neighborhood'] = montreal_GAPI['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [montreal_onehot.columns[-1]] + list(montreal_onehot.columns[:-1])
montreal_onehot = montreal_onehot[fixed_columns]

montreal_grouped = montreal_onehot.groupby('Neighborhood').mean().reset_index()
montreal_final3 = pd.merge(montreal_merge, montreal_grouped, how='left', left_on='Neighborhood', right_on='Neighborhood')
montreal_final3.dropna(axis=0, inplace=True)
print(montreal_final3.shape)
montreal_final3.head()

(121, 94)


Unnamed: 0,Postcode,Neighborhood,Lat,Long,accounting,airport,art_gallery,atm,bakery,bank,bar,beauty_salon,bicycle_store,book_store,bus_station,cafe,car_dealer,car_rental,car_repair,car_wash,cemetery,church,city_hall,clothing_store,convenience_store,courthouse,dentist,department_store,doctor,electrician,electronics_store,finance,fire_station,florist,funeral_home,furniture_store,gas_station,general_contractor,grocery_or_supermarket,gym,hair_care,hardware_store,health,home_goods_store,hospital,insurance_agency,jewelry_store,laundry,lawyer,library,liquor_store,local_government_office,locality,locksmith,lodging,meal_delivery,meal_takeaway,mosque,movie_theater,moving_company,museum,neighborhood,night_club,painter,park,parking,pet_store,pharmacy,physiotherapist,place_of_worship,plumber,point_of_interest,police,post_office,premise,real_estate_agency,restaurant,roofing_contractor,school,shoe_store,shopping_mall,spa,stadium,storage,store,sublocality_level_1,supermarket,synagogue,train_station,transit_station,travel_agency,university,veterinary_care,zoo
0,H1A,Pointe-aux-Trembles,45.676622,-73.509825,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,H1B,Montreal East,45.633696,-73.51504,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,H1C,Rivière-des-PrairiesNortheast,45.66242,-73.545167,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.7,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,H1E,Rivière-des-PrairiesSouthwest,45.639276,-73.585731,0.0,0.0,0.0,0.0,0.05,0.1,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.35,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,H1G,Montréal-NordNorth,45.616126,-73.626227,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.05,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
  
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted1 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted1['Neighborhood'] = montreal_grouped['Neighborhood']

for ind in np.arange(montreal_grouped.shape[0]):
    neighborhoods_venues_sorted1.iloc[ind, 1:] = return_most_common_venues(montreal_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted1.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Laval-sur-le-Lac,point_of_interest,health,doctor,insurance_agency,store,neighborhood,restaurant,spa,locality,pharmacy
1,AhuntsicCentral,restaurant,beauty_salon,florist,physiotherapist,atm,bakery,sublocality_level_1,gym,locality,point_of_interest
2,AhuntsicEast,point_of_interest,gym,school,dentist,health,locality,sublocality_level_1,store,stadium,museum
3,AhuntsicNorth,restaurant,health,point_of_interest,pharmacy,sublocality_level_1,locality,general_contractor,beauty_salon,lodging,hospital
4,AhuntsicSoutheast,point_of_interest,clothing_store,home_goods_store,travel_agency,bank,sublocality_level_1,locality,finance,restaurant,zoo


In [0]:
# set number of clusters
kclusters = 3

montreal_grouped_clustering1 = montreal_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(init = "k-means++",n_clusters=kclusters,n_init = 12, random_state=0).fit(montreal_grouped_clustering1)

montreal_final3['Cluster Labels'] = kmeans.labels_

montreal_final4 = montreal_final3[['Postcode', 'Neighborhood', 'Lat', 'Long', 'Cluster Labels']]

# montreal_merged = montreal_final.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
montreal_merge_g = pd.merge(montreal_final4, neighborhoods_venues_sorted1, left_on='Neighborhood', right_on='Neighborhood')


Let's plot our clusters with Folium

In [23]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(montreal_final['Lat'], montreal_final3['Long'], montreal_final3['Neighborhood'], montreal_final3['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 8. Conclusion

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster from each API.


### Google Maps Places API

Data from Cluster 0

In [24]:
montreal_merge_g.loc[montreal_merge_g['Cluster Labels'] == 0, montreal_merge_g.columns[[1] + list(range(5, montreal_merge_g.shape[1]))]].head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Rivière-des-PrairiesNortheast,point_of_interest,electrician,locality,painter,sublocality_level_1,electronics_store,school,zoo,florist,department_store
4,Montréal-NordNorth,doctor,point_of_interest,health,store,restaurant,pharmacy,physiotherapist,park,dentist,department_store
6,AnjouWest,point_of_interest,store,general_contractor,clothing_store,sublocality_level_1,locality,furniture_store,laundry,bar,art_gallery
7,AnjouEast,point_of_interest,health,accounting,car_repair,electronics_store,plumber,locality,sublocality_level_1,finance,bicycle_store
11,Saint-LéonardNorth,point_of_interest,home_goods_store,bank,furniture_store,mosque,roofing_contractor,locality,sublocality_level_1,electronics_store,general_contractor


 And

### Foursquare API

Data from Cluster 0

In [31]:
montreal_merged_f.loc[montreal_merged_f['Cluster Labels'] == 0, montreal_merged_f.columns[[1] + list(range(5, montreal_merged_f.shape[1]))]].head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Montreal East,Home Service,Rental Car Location,Dim Sum Restaurant,Falafel Restaurant,Factory,Event Space,English Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
1,Rivière-des-PrairiesSouthwest,Discount Store,Italian Restaurant,Bakery,Pharmacy,Dive Bar,Dog Run,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Yoga Studio
3,Montréal-NordSouth,Fast Food Restaurant,Ice Cream Shop,Sandwich Place,Factory,Event Space,English Restaurant,Falafel Restaurant,Empanada Restaurant,Dim Sum Restaurant,Eastern European Restaurant
4,AnjouEast,Soccer Field,Yoga Studio,Diner,Falafel Restaurant,Factory,Event Space,English Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
5,MercierNorth,Pharmacy,Restaurant,Vietnamese Restaurant,Supermarket,Baseball Field,Grocery Store,Yoga Studio,Dumpling Restaurant,Discount Store,Dive Bar


### Comparing Information from ANJOU EAST: Foursquare vs. Google Places API

Foursquare API

In [25]:
montreal_merged_f.loc[montreal_merged_f['Neighborhood'] == 'AnjouEast', montreal_merged_f.columns[[1] + list(range(5, montreal_merged_f.shape[1]))]].head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,AnjouEast,Park,Convenience Store,Soccer Field,American Restaurant,Yoga Studio,Electronics Store,Dog Run,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant


Google Places API:

In [27]:
montreal_merge_g.loc[montreal_merge_g['Neighborhood'] == 'AnjouEast', montreal_merge_g.columns[[1] + list(range(5, montreal_merge_g.shape[1]))]].head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,AnjouEast,point_of_interest,health,accounting,car_repair,electronics_store,plumber,locality,sublocality_level_1,finance,bicycle_store


##Conclusion:

The answer of what API to use is difficult. Both apps have pros and cons. <br>
Google places it is an automatic pop up from Google and Google Maps. It is informational, but not precise. For example, if we are looking for the most relevant pharmacy we can take advantage of Google Places. The branch of Google will list the surrounding offices along with ratings, comments, and contact information but the categories for the APi are not really accurate.

In the other hand, Foursquare has not the reach of Google. It is a tailored app for a more demanding audience. But, the categories for all venues are really clear and can be easelly understood. 

Foursquare and Google Places are two different tools for different people.