# Coursera Capstone Project: Battle of the Neighbourhoods

# Opening a coffee shop in Toronto

# Introduction 

Opening a coffee shop in a large city like Toronto can be challenging, as there are many factors to consider, such as location and target customers. This is especially true if you are not familiar with the area.

We are a large chain of coffee shops looking to break into the Toronto market. We want to start by buying five coffee shops in the downtown area of Toronto. However, to decide where these first five shops should be placed, we need data on where coffee demand is concentrated. Our strategy will be to find "hotspots" for coffee shops in Toronto. Our hypothesis is basic supply and demand, i.e. that areas with a lot of coffee consumers have a lot of coffee shops. By finding the areas with a cluster of coffee shops, we can decide where to open our first five locations in order to reach the maximum number of customers as quickly as possible.

The target audience for this is myself and anyone else who wants to open a coffee shop in an area known for coffee.

# Data

Using Folium and a list of Toronto neighbourhoods available from Wikipedia, I will create a map of the Toronto neighbourhoods considered to be in the downtown area.

Foursquare provides location information on venues, including coffee shops. I will use the venues returned by Foursquare to run a clustering analysis that identifies "hotspots" for coffee shops.

# Downtown Toronto Neighbourhoods

The first step is to import the relevant libraries.

```python
# Install the packages that may be missing (geopy was missing in the original run)
!pip install geopy lxml
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab

import json # library to handle JSON files

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
from pandas import json_normalize # transform a JSON file into a pandas dataframe (top-level in pandas >= 1.0)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests # library to handle requests
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from the clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')
```

```python
# Scrape the table of Toronto postal codes and neighbourhoods from Wikipedia
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")[0]
df.head()
```

```python
# Keep only the downtown borough
df_cleanDT = df[df['Borough'] == 'Downtown Toronto']
df_cleanDT.head()
```

```python
# Import the latitude and longitude of each postal code
LongLat = pd.read_csv("https://cocl.us/Geospatial_data")

# Join the coordinates onto the downtown neighbourhoods by postal code
full_dfDT = pd.merge(df_cleanDT, LongLat, on='Postal Code')
full_dfDT.head()
```
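An inner merge silently drops any postal code that has no match in the coordinates file. A minimal sketch of how to check for this with pandas' `indicator` and `validate` options is shown below; the two small frames are hypothetical stand-ins for `df_cleanDT` and `LongLat`, and their postal codes and coordinates are placeholders, not real data.

```python
import pandas as pd

# Hypothetical stand-ins for df_cleanDT and LongLat; all values are placeholders
neighbourhoods = pd.DataFrame({
    'Postal Code': ['A1', 'B2', 'C3'],
    'Borough': ['Downtown Toronto'] * 3,
    'Neighbourhood': ['Area one', 'Area two', 'Area three'],
})
coords = pd.DataFrame({
    'Postal Code': ['A1', 'B2'],
    'Latitude': [43.0, 43.1],
    'Longitude': [-79.0, -79.1],
})

# validate='one_to_one' raises if a postal code is duplicated on either side;
# indicator=True adds a '_merge' column recording where each row was found
checked = pd.merge(neighbourhoods, coords, on='Postal Code', how='left',
                   validate='one_to_one', indicator=True)

# Postal codes that an inner merge would have dropped
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

If `unmatched` is empty for the real data, the inner merge above kept every downtown neighbourhood.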

```python
# Get the coordinates of Toronto
address = 'Toronto, ON'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))
```

```python
# Create a map of Toronto using its latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# Add a marker for each downtown neighbourhood
for lat, lng, borough, neighbourhood in zip(full_dfDT['Latitude'], full_dfDT['Longitude'], full_dfDT['Borough'], full_dfDT['Neighbourhood']):
    label = '{}, {}'.format(borough, neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto
```

# Foursquare Location Data

```python
# Import the libraries used in this section
import requests # library to handle requests

import random # library for random number generation

!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image
from IPython.display import HTML

# transform a JSON file into a pandas dataframe (top-level in pandas >= 1.0)
from pandas import json_normalize

print('Libraries imported.')
```

```python
# Define Foursquare credentials
CLIENT_ID = '*************************' # your Foursquare ID
CLIENT_SECRET = '*************************' # your Foursquare Secret
ACCESS_TOKEN = '' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 50
```

```python
# Define the area to search in
address = '190 Yonge st, Toronto, Canada'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)
```

```python
# Search for coffee venues within 1 km of the address above
search_query = 'coffee'
radius = 1000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()

# Assign the relevant part of the JSON to venues
venues = results['response']['venues']

# Transform venues into a dataframe
dataframe = pd.json_normalize(venues)

# Keep only the venue name, category, id and anything associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy avoids SettingWithCopyWarning below

# Return the name of a venue's first category, or None if it has none
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# Replace each row's list of categories with its first category name
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# Clean column names by keeping only the last term (e.g. 'location.lat' -> 'lat')
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
```
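Building the query string by hand with `str.format` works, but it does not URL-encode values, so a search term containing spaces or other special characters would produce a malformed request. A minimal sketch using the standard library's `urllib.parse.urlencode` is shown below; it reuses the parameter names of the request above, and the credentials and coordinates are placeholders.

```python
from urllib.parse import urlencode

# Placeholder credentials and coordinates; substitute your own values
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'll': '{},{}'.format(43.65, -79.38),
    'v': '20180604',
    'query': 'coffee shop',  # the space is encoded as '+'
    'radius': 1000,
    'limit': 50,
}

# urlencode percent-encodes each value (the comma in 'll' becomes %2C)
url = 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)
print(url)
```

Alternatively, `requests.get('https://api.foursquare.com/v2/venues/search', params=params)` performs the same encoding without building the string manually.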

```python
dataframe_filtered.head()
```

# Methodology

Now that we have obtained the data we are going to use, the analysis proceeds in three steps. The first step is to find the five downtown neighbourhoods with the most coffee shops, to narrow down the search, and then to pick a central spot and map out the venues around it.

The second step is to investigate where the coffee shops in that area are located and to create a heat map.

Finally, we will use k-means clustering to find five cluster centres as possible locations and see how they overlap with the heat map.

```python
# Get the venues near each neighbourhood using the Foursquare explore endpoint
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        # (the category is None for venues with no category listed)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```

```python
toronto_venues = getNearbyVenues(names=full_dfDT['Neighbourhood'],
                                 latitudes=full_dfDT['Latitude'],
                                 longitudes=full_dfDT['Longitude'])
```

```python
print(toronto_venues.shape)
toronto_venues.head()
```

```python
# One-hot encode the venue categories
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# Add the neighbourhood column back to the dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

# Take out just the venue categories we are interested in
Cafes_Coffee = toronto_onehot[['Neighborhood', 'Coffee Shop', 'Donut Shop']].copy()
# Add up the coffee shop and donut shop columns
Cafes_Coffee['Total'] = Cafes_Coffee[['Coffee Shop', 'Donut Shop']].sum(axis=1)
# Group by neighbourhood (the mean is the share of each neighbourhood's venues) and sort
toronto_grouped = Cafes_Coffee.groupby('Neighborhood').mean().reset_index()
toronto_grouped.sort_values(by=['Total'], ascending=False, inplace=True)
toronto_grouped
```
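Because the cell above groups with `.mean()`, `Total` ranks neighbourhoods by the share of their returned venues that are coffee or donut shops, not by how many such shops there are. The small sketch below uses made-up one-hot rows to show that the two rankings can disagree.

```python
import pandas as pd

# Made-up one-hot rows: one row per venue, 1 if the venue is in that category
onehot = pd.DataFrame({
    'Neighborhood': ['North', 'South', 'South', 'South', 'South'],
    'Coffee Shop':  [1, 1, 0, 0, 0],
    'Donut Shop':   [0, 0, 1, 0, 0],
})
onehot['Total'] = onehot[['Coffee Shop', 'Donut Shop']].sum(axis=1)

by_share = onehot.groupby('Neighborhood')['Total'].mean()  # fraction of venues
by_count = onehot.groupby('Neighborhood')['Total'].sum()   # number of shops
print(by_share.sort_values(ascending=False))
print(by_count.sort_values(ascending=False))
```

Here North ranks first by share (its only venue is a coffee shop) while South ranks first by count. Which measure suits the goal is a judgement call; note that each neighbourhood's venue list is also capped by `LIMIT`, which caps the raw counts.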

```python
# Keep the five neighbourhoods with the highest Total and attach their coordinates
coffeeshops_top5 = toronto_grouped.head().rename(columns={'Neighborhood': 'Neighbourhood'})
top5_loc = pd.merge(coffeeshops_top5, full_dfDT, on='Neighbourhood')
top5_loc.insert(0, 'Rank', list(range(1, len(top5_loc) + 1)))
top5_loc
```

```python
# Create a map centred on the search address (190 Yonge St)
map_toronto2 = folium.Map(location=[latitude, longitude], zoom_start=15)

# Centre point of the top five neighbourhoods (mean of their coordinates)
centreLon = top5_loc['Longitude'].mean()
centreLat = top5_loc['Latitude'].mean()
# Add a red circle marker to represent the centre point
folium.CircleMarker(
    [centreLat, centreLon],
    radius=10,
    color='red',
    popup='Centre point',
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(map_toronto2)

# Add a marker for each of the top five neighbourhoods, labelled with its rank
for lat, lng, rank, neighbourhood in zip(top5_loc['Latitude'], top5_loc['Longitude'], top5_loc['Rank'], top5_loc['Neighbourhood']):
    label = '{}, {}'.format(rank, neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto2)

map_toronto2
```
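The red centre point is the plain arithmetic mean of the five neighbourhoods' coordinates, which is a reasonable approximation over a few kilometres. To check how far each neighbourhood lies from that centre, a haversine distance helper can be used. The sketch below assumes a spherical Earth of radius 6371 km, and its coordinates are hypothetical, not the notebook's results.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres, assuming a spherical Earth (R = 6371 km)."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical neighbourhood coordinates (lat, lng)
points = [(43.66, -79.39), (43.65, -79.38), (43.64, -79.38)]

# Arithmetic-mean centre, as in the cell above
centre_lat = sum(p[0] for p in points) / len(points)
centre_lng = sum(p[1] for p in points) / len(points)

# Distance from the centre to each neighbourhood, in km
distances = [haversine_km(centre_lat, centre_lng, lat, lng) for lat, lng in points]
print([round(d, 2) for d in distances])
```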

```python
# Get the latitude and longitude of each venue found by the search
venues_loc = dataframe_filtered[['lat', 'lng']]

# Map out the venues
venues_map = folium.Map(location=[latitude, longitude], zoom_start=14) # generate map centred around the "Downtown address"

# Add a red circle marker to represent the centre point
folium.CircleMarker(
    [centreLat, centreLon],
    radius=10,
    color='red',
    popup='Centre point',
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(venues_map)

# Add the coffee shops as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# Display the map
venues_map
```

```python
import folium
from folium.plugins import HeatMap

# Generate a map centred around the "Downtown address"
venues_heatmap = folium.Map(location=[latitude, longitude], zoom_start=14)

# HeatMap takes an array of [lat, lng] pairs
locs = venues_loc.to_numpy()
HeatMap(locs).add_to(venues_heatmap)
venues_heatmap
```
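The heat map is read by eye. A rough numeric counterpart is to bin the venue coordinates into a grid with `numpy.histogram2d` and report the busiest cell. The sketch below uses made-up coordinates (for the real data, `venues_loc.to_numpy()` would take their place) and an arbitrary 4 x 4 grid over the venues' bounding box. Both the coordinates and the grid size are assumptions.

```python
import numpy as np

# Made-up venue coordinates as (lat, lng); replace with venues_loc.to_numpy()
locs = np.array([
    [43.6505, -79.3805], [43.6507, -79.3802], [43.6509, -79.3808],
    [43.6552, -79.3851], [43.6601, -79.3902],
])

# Split the bounding box of the venues into a 4 x 4 grid (an arbitrary resolution)
counts, lat_edges, lng_edges = np.histogram2d(locs[:, 0], locs[:, 1], bins=4)

# Busiest cell, reported as the midpoint of that cell
i, j = np.unravel_index(np.argmax(counts), counts.shape)
peak = ((lat_edges[i] + lat_edges[i + 1]) / 2, (lng_edges[j] + lng_edges[j + 1]) / 2)
print(int(counts[i, j]), peak)
```

The answer depends on the grid resolution, so it is worth trying a few bin counts before reading much into a single peak.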

```python
import pandas as pd
from sklearn.cluster import KMeans

# Cluster the venue coordinates into five groups with k-means
k_means = KMeans(init="k-means++", n_clusters=5, n_init=12)
k_means.fit(venues_loc)
k_means_labels = k_means.labels_ # cluster index of each venue
k_means_cluster_centers = k_means.cluster_centers_

# Put the cluster centres into a dataframe with lat/lng columns
cluster_centers3 = pd.DataFrame(k_means_cluster_centers, columns=['lat', 'lng'])
cluster_centers3
```
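`KMeans` measures straight-line distance in whatever units it receives. In raw degrees, one degree of longitude at Toronto's latitude (about 43.65°N) covers only about 0.72 of the ground distance of one degree of latitude, so east-west separations are overstated relative to north-south ones. The sketch below converts coordinates to approximate local metres with an equirectangular projection before clustering, then converts the centres back. The projection, the reference point (the mean of the inputs) and the sample coordinates are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

EARTH_RADIUS_M = 6_371_000.0

def to_local_metres(lat, lng, lat0, lng0):
    """Equirectangular approximation around (lat0, lng0); adequate over a few km."""
    x = np.radians(lng - lng0) * EARTH_RADIUS_M * np.cos(np.radians(lat0))
    y = np.radians(lat - lat0) * EARTH_RADIUS_M
    return np.column_stack([x, y])

def to_lat_lng(xy, lat0, lng0):
    """Inverse of to_local_metres."""
    lat = lat0 + np.degrees(xy[:, 1] / EARTH_RADIUS_M)
    lng = lng0 + np.degrees(xy[:, 0] / (EARTH_RADIUS_M * np.cos(np.radians(lat0))))
    return np.column_stack([lat, lng])

# Made-up venue coordinates: two tight groups several hundred metres apart
lat = np.array([43.6500, 43.6502, 43.6501, 43.6550, 43.6552, 43.6551])
lng = np.array([-79.3800, -79.3802, -79.3801, -79.3850, -79.3852, -79.3851])
lat0, lng0 = lat.mean(), lng.mean()

# Cluster in metres, then express the centres as (lat, lng) again
xy = to_local_metres(lat, lng, lat0, lng0)
k_means = KMeans(init='k-means++', n_clusters=2, n_init=12, random_state=0).fit(xy)
centres = to_lat_lng(k_means.cluster_centers_, lat0, lng0)
print(centres)
```

Over an area the size of downtown Toronto this projection is a close approximation; a library such as `pyproj` would be the more rigorous choice for larger regions.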

```python
# Map the cluster centres, the individual venues and the heat map together
venues_map2 = folium.Map(location=[latitude, longitude], zoom_start=14) # generate map centred around the "Downtown address"

# Cluster centres as large red circles
for lat, lng in zip(cluster_centers3.lat, cluster_centers3.lng):
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        color='red',
        fill=True,
        fill_opacity=0.6
    ).add_to(venues_map2)

# Individual coffee venues as small blue circles
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.3
    ).add_to(venues_map2)

# Heat map of the venue locations on top
HeatMap(locs).add_to(venues_map2)
venues_map2
```

# Results and Conclusions

We first determined that the five neighbourhoods with the most coffee shops were Queen's Park, Central Bay Street, Regent Park, TD Centre and First Canadian Place. We then found the centre point of those areas and used the Foursquare API to pull venues from that area. By plotting the venues on a heat map, we were able to see some potential locations. To approach this more analytically, we used k-means clustering to find five cluster centres to narrow down where to open our coffee shops. We found that they should be located around 110 Lombard St, the Eaton Centre, King St and Bay St, 380 University Ave and 760 Bay St. Unsurprisingly, the cluster centres were close to the heat map hotspots.
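The addresses above describe spots near each cluster centre. One way to make that step reproducible is to snap each centre to its nearest existing venue by great-circle distance. The sketch below uses scikit-learn's `BallTree` with the haversine metric on made-up coordinates; it is an illustration, not necessarily how the addresses above were obtained.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0

# Made-up venue and cluster-centre coordinates as (lat, lng) in degrees
venues = np.array([[43.6500, -79.3800], [43.6550, -79.3850], [43.6600, -79.3900]])
centres = np.array([[43.6505, -79.3805], [43.6595, -79.3895]])

# The haversine metric expects (lat, lng) in radians and returns distances in radians
tree = BallTree(np.radians(venues), metric='haversine')
dist_rad, idx = tree.query(np.radians(centres), k=1)

dist_km = dist_rad[:, 0] * EARTH_RADIUS_KM  # convert angular distance to km
nearest = idx[:, 0]                         # index of the closest venue per centre
print(nearest, dist_km.round(3))
```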

Our goal was to find five potential locations for our new coffee shops. In order to identify these locations, we found areas that already had a large number of coffee shops as these should also be the most popular areas for coffee consumers. By mapping out locations and using clustering analysis we have identified our five potential locations.