**1. Introduction**

Cafes can be found on the street corners of every large city and have become a symbol of urban life. But if someone wants to open a cafe in, for example, Toronto, where should they open it? Basic economic theory holds that supply and demand determine price. Following this logic, at least two factors need to be considered when deciding where to open a cafe: the number of cafes in a neighborhood (supply) and the number of residents in the neighborhood (demand). Ideally, the new cafe should open in a neighborhood that has few cafes but many residents.

**2. Data**

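With Foursquare location data, one can easily obtain the number of cafes in each Toronto neighborhood. Population figures for each neighborhood are available online (here, on Wikipedia). Dividing the number of cafes by the population gives a cafes-per-capita index for each neighborhood, and the neighborhood with the lowest index is a candidate location for a new cafe. The list of neighborhoods and their postal codes also comes from Wikipedia, and the neighborhood coordinates come from a CSV file at http://cocl.us/Geospatial_data.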

**3. Methodology**

First, I will use the BeautifulSoup package to scrape Toronto's postal codes, boroughs and neighborhoods from Wikipedia and clean the data.

In [1]:
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd

source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'lxml')

#search for the first table on the page (postal code, borough, neighborhood)
table = soup.find('table')

#write the text of every <td> cell to a csv file, one row per <tr>
output_rows = []
for table_row in table.find_all('tr'):
    columns = table_row.find_all('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)
with open('toronto.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(output_rows)

#read the csv file back into a pandas dataframe and add headers
toronto_df = pd.read_csv('toronto.csv', header = None)
toronto_df.columns = ['PostalCode', 'Borough', 'Neighborhood']

#drop rows where Borough = 'Not assigned' (copy to avoid SettingWithCopyWarning)
toronto_df = toronto_df[toronto_df.Borough != 'Not assigned'].copy()

#strip the trailing \n in the neighborhood column first, so 'Not assigned' can match
toronto_df['Neighborhood'] = toronto_df['Neighborhood'].str.strip()

#if a row has "Not assigned" for neighborhood, use borough for neighborhood
not_assigned = toronto_df['Neighborhood'] == 'Not assigned'
toronto_df.loc[not_assigned, 'Neighborhood'] = toronto_df.loc[not_assigned, 'Borough']

toronto_df.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
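The cleaning rules in the cell above can also be expressed without pandas. The following plain-Python sketch applies them to tuples of (postal code, borough, neighborhood); the helper name `clean_postal_rows` and the sample rows are illustrative assumptions, not part of the original pipeline. It also shows why the trailing newline has to be stripped before comparing against 'Not assigned': otherwise the substitution never matches.

```python
def clean_postal_rows(rows):
    """Hypothetical helper: drop rows whose borough is 'Not assigned' and,
    when only the neighborhood is 'Not assigned', reuse the borough name."""
    cleaned = []
    for postal_code, borough, neighborhood in rows:
        borough = borough.strip()
        neighborhood = neighborhood.strip()  # scraped cells end with '\n'
        if borough == 'Not assigned':
            continue  # rule 1: unassigned boroughs are dropped
        if neighborhood == 'Not assigned':
            neighborhood = borough  # rule 2: fall back to the borough name
        cleaned.append((postal_code.strip(), borough, neighborhood))
    return cleaned


# Illustrative sample rows in the shape of the scraped table
sample = [
    ('M1A', 'Not assigned', 'Not assigned\n'),
    ('M7A', "Queen's Park", 'Not assigned\n'),
    ('M3A', 'North York', 'Parkwoods\n'),
]
print(clean_postal_rows(sample))
```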


Next, add the latitude and longitude of each neighborhood to this dataset. The coordinates, one pair per postal code, are available as a CSV file at http://cocl.us/Geospatial_data.

In [3]:
#get the coordinates (one row per postal code)
coords = pd.read_csv('http://cocl.us/Geospatial_data')
coords.rename(columns = {'Postal Code': 'PostalCode'}, inplace = True)

#merge the two dataframes on the shared PostalCode column
toronto_new = pd.merge(toronto_df, coords, on = 'PostalCode')
toronto_new

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 |
| ... | ... | ... | ... | ... | ... |
| 205 | M8Z | Etobicoke | Kingsway Park South West | 43.628841 | -79.520999 |
| 206 | M8Z | Etobicoke | Mimico NW | 43.628841 | -79.520999 |
| 207 | M8Z | Etobicoke | The Queensway West | 43.628841 | -79.520999 |
| 208 | M8Z | Etobicoke | Royal York South West | 43.628841 | -79.520999 |
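Several neighborhoods share a postal code (Lawrence Heights and Lawrence Manor are both M6A), so the merge gives them identical coordinates. A plain-Python sketch of the same inner join, assuming a dictionary keyed by postal code; `join_on_postal_code` is a hypothetical helper, and the coordinates are copied from the output above.

```python
# (postal code, borough, neighborhood) rows, copied from the table above
neighborhoods = [
    ('M3A', 'North York', 'Parkwoods'),
    ('M6A', 'North York', 'Lawrence Heights'),
    ('M6A', 'North York', 'Lawrence Manor'),
]

# postal code -> (latitude, longitude)
coords = {
    'M3A': (43.753259, -79.329656),
    'M6A': (43.718518, -79.464763),
}


def join_on_postal_code(rows, coord_lookup):
    """Hypothetical helper: attach coordinates to each row (inner join)."""
    joined = []
    for postal_code, borough, neighborhood in rows:
        if postal_code not in coord_lookup:
            continue  # inner join: rows without coordinates are dropped
        lat, lng = coord_lookup[postal_code]
        joined.append((postal_code, borough, neighborhood, lat, lng))
    return joined


for row in join_on_postal_code(neighborhoods, coords):
    print(row)
```

Because every neighborhood in one postal code ends up with the same point, any later search around that point returns the same venues for each of them.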


Now, I will scrape the population data from the Wikipedia page https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods. Note, however, that this data is from 2013 and might be out of date.

In [19]:
source2 = requests.get('https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods').text
soup2 = BeautifulSoup(source2, 'lxml')
print(soup2.prettify())



In [21]:
#search for the population table (the first tag with cellpadding="5")
table2 = soup2.find(cellpadding = '5')
print(table2)

| Name | FM | Census Tracts | Population | Land area (km²) | Density (people/km²) | % Change in Population since 2001 | Average Income | Transit Commuting % | % Renters | Second most common language (after English) by name | Second most common language (after English) by percentage | Map |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Toronto CMA Average** | | All | **5,113,149** | **5903.63** | **866** | **9.0** | **40,704** | **10.6** | **11.4** | | | |

In [27]:
#create a csv file that contains the table 
output_rows = []
for table_row in table2.find_all('tr'):
    columns = table_row.find_all('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)
    
with open('population.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(output_rows)
    
population = pd.read_csv('population.csv', header = None)
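Both scraping cells repeat the same row-collection loop and round-trip through a CSV file. As a sketch of an alternative, the hypothetical `table_rows` helper below uses the standard library's `html.parser` (swapped in for BeautifulSoup so the example is self-contained) to collect the stripped text of every `<td>` cell, one list per `<tr>`; rows that hold only `<th>` headers are skipped. The sample HTML is a trimmed, illustrative version of the population table.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # text inside <b>, <a>, ... is kept too
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # header rows have no <td> cells
                self.rows.append(self._row)
            self._row = None


def table_rows(table_html):
    """Return the <td> text of an HTML table as a list of rows."""
    parser = CellTextParser()
    parser.feed(table_html)
    parser.close()
    return parser.rows


sample_html = """<table><tbody><tr><th>Name</th><th>Population</th></tr>
<tr><td><b>Toronto <a href="#">CMA</a> Average</b>
</td><td><b>5,113,149</b>
</td></tr>
<tr><td>Agincourt
</td><td>44,577
</td></tr></tbody></table>"""

print(table_rows(sample_html))
```

In the notebook, something like `pd.DataFrame(table_rows(str(table2)))` could then replace the CSV round-trip, assuming the page markup has not changed since it was scraped.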


In [32]:
#add headers (SMCL = second most common language after English)
population_df = population.copy()
population_df.columns = ['Neighborhood', 'FM', 'Census Tracts', 'Population', 'Land Area',
                         'Density', '%Change in Population', 'Average Income',
                         'Transit Commuting%', '%Renters', 'SMCL', 'SMCL%', 'Map']

#keep only the neighborhood name and population columns
population_df = population_df[['Neighborhood', 'Population']].copy()

#strip the \n in both columns
population_df['Neighborhood'] = population_df['Neighborhood'].replace('\n', '', regex = True)
population_df['Population'] = population_df['Population'].replace('\n', '', regex = True)
population_df

|   | Neighborhood | Population |
|---|---|---|
| 0 | Toronto CMA Average | 5113149 |
| 1 | Agincourt | 44577 |
| 2 | Alderwood | 11656 |
| 3 | Alexandra Park | 4355 |
| 4 | Allenby | 2513 |
| ... | ... | ... |
| 170 | Woburn | 48507 |
| 171 | Wychwood | 4182 |
| 172 | York Mills | 17564 |
| 173 | York University Heights | 26140 |


Combine the population data and the geolocation data. The merge matches neighborhood names exactly, so neighborhoods without a corresponding population entry are dropped from the merged dataset (209 rows shrink to 79).

In [33]:
#merge on the neighborhood name; the default inner join keeps only names found in both tables
merged_df = pd.merge(toronto_new, population_df, on = 'Neighborhood')
merged_df

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Population |
|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 26533 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 17047 |
| 2 | M6A | North York | Lawrence Heights | 43.718518 | -79.464763 | 3769 |
| 3 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 | 13750 |
| 4 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 | 22724 |
| ... | ... | ... | ... | ... | ... | ... |
| 75 | M8W | Etobicoke | Long Branch | 43.602414 | -79.543484 | 9625 |
| 76 | M4X | Downtown Toronto | Cabbagetown | 43.667967 | -79.367675 | 11120 |
| 77 | M8X | Etobicoke | The Kingsway | 43.653654 | -79.506944 | 8780 |
| 78 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 | 13397 |
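To see which neighborhoods the inner merge drops, one can compare the name sets before merging. The sketch below uses hypothetical sample lists (Harbourfront is one of the names that disappears between the two outputs above) and a hypothetical helper `unmatched_names`. In pandas, `pd.merge(..., how='left', indicator=True)` exposes the same information through its `_merge` column.

```python
# Hypothetical samples; in the notebook these would be
# toronto_new['Neighborhood'] and population_df['Neighborhood'].
geo_names = ['Parkwoods', 'Victoria Village', 'Harbourfront', 'Lawrence Manor']
population_names = ['Agincourt', 'Parkwoods', 'Victoria Village', 'Lawrence Manor']


def unmatched_names(left, right):
    """Names in `left` with no exact match in `right`; an inner merge drops these rows."""
    right_set = set(right)
    return sorted({name for name in left if name not in right_set})


print(unmatched_names(geo_names, population_names))
```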


In [34]:
merged_df.dtypes

PostalCode       object
Borough          object
Neighborhood     object
Latitude        float64
Longitude       float64
Population       object
dtype: object

In [37]:
#remove the thousands separators and convert population to integer (astype returns a new Series, so assign it back)
merged_df['Population'] = merged_df['Population'].str.replace(',', '', regex = False).str.strip().astype(int)
merged_df

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Population |
|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 26533 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 17047 |
| 2 | M6A | North York | Lawrence Heights | 43.718518 | -79.464763 | 3769 |
| 3 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 | 13750 |
| 4 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 | 22724 |
| ... | ... | ... | ... | ... | ... | ... |
| 75 | M8W | Etobicoke | Long Branch | 43.602414 | -79.543484 | 9625 |
| 76 | M4X | Downtown Toronto | Cabbagetown | 43.667967 | -79.367675 | 11120 |
| 77 | M8X | Etobicoke | The Kingsway | 43.653654 | -79.506944 | 8780 |
| 78 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 | 13397 |
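Note that `astype` returns a new Series rather than converting the column in place, so the converted values only stick if they are assigned back to the column. The conversion itself amounts to removing whitespace and thousands separators before parsing. A minimal sketch with a hypothetical helper `parse_population`:

```python
def parse_population(text):
    """Hypothetical helper: turn a scraped cell such as '5,113,149\\n' into an int."""
    cleaned = text.strip().replace(',', '')  # drop whitespace and thousands separators
    return int(cleaned)


print(parse_population('5,113,149\n'))
```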


Load the remaining libraries and get the coordinates of Toronto.

In [40]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (moved to the top-level pandas namespace in pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [41]:
address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Create a map of Toronto.

In [43]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(merged_df['Latitude'], merged_df['Longitude'], merged_df['Borough'], merged_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Next, I will use the Foursquare API to get venue data for the Toronto neighborhoods, starting with a single neighborhood.

In [46]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

#just one neighborhood
nh_lat = merged_df.loc[0, 'Latitude']
nh_lng = merged_df.loc[0, 'Longitude']

nh_name = merged_df.loc[0, 'Neighborhood']

print('Latitude and longitude values of {} are {}, {}.'.format(nh_name, 
                                                               nh_lat, 
                                                               nh_lng))

#creating the get request 
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    nh_lat, 
    nh_lng, 
    radius, 
    LIMIT)

#send the get request
results = requests.get(url).json()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Brookbanks Park | Park | 43.751976 | -79.33214 |
| 1 | Variety Store | Food & Drink Shop | 43.751974 | -79.333114 |
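The explore response nests each venue under response, then groups, then items, then venue, as the cell above assumes. A sketch of the same extraction with plain dictionaries, using a trimmed, hypothetical response; the second venue is invented to show the empty-category case that `get_category_type` guards against, and `extract_venues` is a hypothetical helper.

```python
# Hypothetical, trimmed-down response with the nesting used in the notebook
sample_response = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Brookbanks Park',
                   'categories': [{'name': 'Park'}],
                   'location': {'lat': 43.751976, 'lng': -79.33214}}},
        {'venue': {'name': 'Example Kiosk',  # hypothetical venue without a category
                   'categories': [],
                   'location': {'lat': 43.752, 'lng': -79.333}}},
    ]}]}
}


def extract_venues(response_json):
    """Return (name, category, lat, lng) tuples; category is None if the venue has none."""
    items = response_json['response']['groups'][0]['items']
    venues = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        location = venue['location']
        venues.append((venue['name'], category, location['lat'], location['lng']))
    return venues


for venue in extract_venues(sample_response):
    print(venue)
```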


Extend the analysis for all neighborhoods in the merged_df dataset.

In [60]:
#create a function that does the same for all neighborhoods in Toronto
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
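            # some venues have no category; record None instead of raising IndexError
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])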
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood_Latitude', 
                  'Neighborhood_Longitude', 
                  'Venue', 
                  'Venue_Latitude', 
                  'Venue_Longitude', 
                  'Venue_Category']
    return(nearby_venues)

#run the code and create a new dataframe 
toronto_venues = getNearbyVenues(names = merged_df['Neighborhood'],
                                   latitudes = merged_df['Latitude'],
                                   longitudes = merged_df['Longitude']
                                  )

Parkwoods
Victoria Village
Lawrence Heights
Lawrence Manor
Rouge
Malvern
Garden District
Princess Gardens
West Deane Park
Highland Creek
Rouge Hill
Port Union
Flemingdon Park
St. James Town
St. James Town
Eringate
Markland Wood
Guildwood
Morningside
West Hill
The Beaches
Woburn
Leaside
Bathurst Manor
Wilson Heights
Thorncliffe Park
Scarborough Village
Henry Farm
Toronto Islands
Little Portugal
Ionview
Bayview Village
Riverdale
Brockton
Clairlea
Oakridge
York Mills
Downsview
Humber Summit
Cliffcrest
Cliffside
Newtonbrook
Willowdale
Bedford Park
Mount Dennis
Silverthorn
Humberlea
Birch Cliff
Lawrence Park
Runnymede
Runnymede
Weston
Dorset Park
Westmount
Maryvale
Wexford
The Annex
Yorkville
Parkdale
Roncesvalles
Kingsview Village
Agincourt
Davisville
Swansea
Moore Park
Grange Park
Kensington Market
Milliken
Deer Park
South Hill
Humber Bay Shores
New Toronto
Thistletown
Rosedale
Alderwood
Long Branch
Cabbagetown
The Kingsway
Church and Wellesley
Sunnylea
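The explore URL in the cells above is assembled with string formatting. A sketch of the same URL built with `urllib.parse.urlencode`, which escapes each parameter; `explore_url` and `EXPLORE_ENDPOINT` are hypothetical names, and the credentials are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: build the explore URL with the notebook's parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'radius': radius,
        'limit': limit,
    }
    return f'{EXPLORE_ENDPOINT}?{urlencode(params)}'


print(explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                  43.753259, -79.329656))
```

`urlencode` encodes the comma in `ll` as `%2C`, which is standard URL encoding. With the requests library, passing an equivalent dictionary through the `params` argument of `requests.get` achieves the same thing.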


In [61]:
#getNearbyVenues already returns a DataFrame; show the first ten rows
toronto_venues.head(10)

|   | Neighborhood | Neighborhood_Latitude | Neighborhood_Longitude | Venue | Venue_Latitude | Venue_Longitude | Venue_Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 2 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 3 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |
| 4 | Victoria Village | 43.725882 | -79.315572 | Portugril | 43.725819 | -79.312785 | Portuguese Restaurant |
| 5 | Victoria Village | 43.725882 | -79.315572 | The Frig | 43.727051 | -79.317418 | French Restaurant |
| 6 | Victoria Village | 43.725882 | -79.315572 | Pizza Nova | 43.725824 | -79.31286 | Pizza Place |
| 7 | Lawrence Heights | 43.718518 | -79.464763 | Roots | 43.718221 | -79.466776 | Boutique |
| 8 | Lawrence Heights | 43.718518 | -79.464763 | Kitchen Stuff Plus (Clearance Outlet) | 43.719096 | -79.462675 | Furniture / Home Store |
| 9 | Lawrence Heights | 43.718518 | -79.464763 | Lac Vien Vietnamese Restaurant | 43.721259 | -79.468472 | Vietnamese Restaurant |


Let's check how many cafes there are in each neighborhood. Foursquare uses two categories for cafes, 'Café' and 'Coffee Shop', so both are counted.

In [84]:
toronto_cafes = toronto_venues.loc[toronto_venues['Venue_Category'].isin(['Coffee Shop', 
                                                          'Café'])]
neighborhood_cafes = toronto_cafes.groupby('Neighborhood').count()
neighborhood_cafes = neighborhood_cafes.filter(['Venue_Category'])
neighborhood_cafes.rename(columns = {'Venue_Category': 'Number of Cafes'}, inplace = True)
neighborhood_cafes 

| Neighborhood | Number of Cafes |
|---|---|
| Alderwood | 1 |
| Bathurst Manor | 2 |
| Bayview Village | 1 |
| Bedford Park | 3 |
| Birch Cliff | 1 |
| Brockton | 4 |
| Cabbagetown | 6 |
| Church and Wellesley | 9 |
| Davisville | 4 |
| Deer Park | 2 |
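Counting cafes amounts to filtering venues by the two cafe categories and tallying by neighborhood. A plain-Python sketch with `collections.Counter` and illustrative rows; the postal-code column is a hypothetical addition, since toronto_venues only records the neighborhood name. It shows why that matters: Runnymede (like St. James Town) appears under two postal codes, M6N and M6S, so counting by name alone pools the cafes found around both points.

```python
from collections import Counter

CAFE_CATEGORIES = {'Coffee Shop', 'Café'}

# (postal code, neighborhood, venue category): illustrative rows only.
# The postal code is a hypothetical extra column that toronto_venues lacks.
venue_rows = [
    ('M4A', 'Victoria Village', 'Coffee Shop'),
    ('M4A', 'Victoria Village', 'Pizza Place'),
    ('M3A', 'Parkwoods', 'Park'),
    ('M6N', 'Runnymede', 'Café'),
    ('M6S', 'Runnymede', 'Coffee Shop'),
]

# Count by name only, as grouping by 'Neighborhood' does: both Runnymede areas are pooled.
by_name = Counter(name for _, name, category in venue_rows
                  if category in CAFE_CATEGORIES)

# Count by (postal code, name): each search area keeps its own total.
by_area = Counter((code, name) for code, name, category in venue_rows
                  if category in CAFE_CATEGORIES)

print(by_name)
print(by_area)
```

A neighborhood with no cafe is simply absent from the counter (looking it up returns 0), which mirrors the NaN that appears after the merge.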


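Combine the number of cafes in each neighborhood with the demographic data. Neighborhoods where Foursquare returned no cafes are missing from neighborhood_cafes, so their count is NaN after the merge.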

In [109]:
merged_final = pd.merge(merged_df,
                       neighborhood_cafes,
                       on = 'Neighborhood',
                       how = 'outer')
merged_final.fillna(0)  #display only; merged_final itself keeps NaN for neighborhoods without cafes


|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Population | Number of Cafes |
|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 26533 | 0.0 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 17047 | 1.0 |
| 2 | M6A | North York | Lawrence Heights | 43.718518 | -79.464763 | 3769 | 1.0 |
| 3 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 | 13750 | 1.0 |
| 4 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 | 22724 | 0.0 |
| 5 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 | 44324 | 0.0 |
| 6 | M5B | Downtown Toronto | Garden District | 43.657162 | -79.378937 | 8240 | 11.0 |
| 7 | M9B | Etobicoke | Princess Gardens | 43.650943 | -79.554724 | 9288 | 0.0 |
| 8 | M9B | Etobicoke | West Deane Park | 43.650943 | -79.554724 | 4395 | 0.0 |
| 9 | M1C | Scarborough | Highland Creek | 43.784535 | -79.160497 | 12853 | 0.0 |


**4. Results**

In this section, I will create an index called the "per capita cafe" index: the number of cafes in a neighborhood divided by the neighborhood's population.
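Written out for a neighborhood $n$ with $C_n$ cafes and population $P_n$, the index is:

$$
\text{Per Capita Cafe}_n = \frac{C_n}{P_n}
$$

For example, Garden District has 11 cafes and 8,240 residents, so its index is $11 / 8240 \approx 0.001335$, or roughly 13 cafes per 10,000 residents.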

In [110]:
#convert population to float so the index below is computed and displayed as a float
merged_final["Population"] = merged_final.Population.astype(float)
merged_final

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Population | Number of Cafes |
|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 26533.0 | NaN |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 17047.0 | 1.0 |
| 2 | M6A | North York | Lawrence Heights | 43.718518 | -79.464763 | 3769.0 | 1.0 |
| 3 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 | 13750.0 | 1.0 |
| 4 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 | 22724.0 | NaN |
| 5 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 | 44324.0 | NaN |
| 6 | M5B | Downtown Toronto | Garden District | 43.657162 | -79.378937 | 8240.0 | 11.0 |
| 7 | M9B | Etobicoke | Princess Gardens | 43.650943 | -79.554724 | 9288.0 | NaN |
| 8 | M9B | Etobicoke | West Deane Park | 43.650943 | -79.554724 | 4395.0 | NaN |
| 9 | M1C | Scarborough | Highland Creek | 43.784535 | -79.160497 | 12853.0 | NaN |


In [112]:
merged_final['Per Capita Cafe'] = merged_final['Number of Cafes']/merged_final['Population']
merged_final.fillna(0)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Population | Number of Cafes | Per Capita Cafe |
|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 26533.0 | 0.0 | 0.0 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 17047.0 | 1.0 | 5.9e-05 |
| 2 | M6A | North York | Lawrence Heights | 43.718518 | -79.464763 | 3769.0 | 1.0 | 0.000265 |
| 3 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 | 13750.0 | 1.0 | 7.3e-05 |
| 4 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 | 22724.0 | 0.0 | 0.0 |
| 5 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 | 44324.0 | 0.0 | 0.0 |
| 6 | M5B | Downtown Toronto | Garden District | 43.657162 | -79.378937 | 8240.0 | 11.0 | 0.001335 |
| 7 | M9B | Etobicoke | Princess Gardens | 43.650943 | -79.554724 | 9288.0 | 0.0 | 0.0 |
| 8 | M9B | Etobicoke | West Deane Park | 43.650943 | -79.554724 | 4395.0 | 0.0 | 0.0 |
| 9 | M1C | Scarborough | Highland Creek | 43.784535 | -79.160497 | 12853.0 | 0.0 | 0.0 |
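A plain-Python sketch of the same computation, assuming a missing cafe count (NaN in the notebook, None here) means no cafe was found and should count as zero; the values are copied from the outputs above, and `per_capita_cafe` is a hypothetical helper. In pandas, `merged_final['Number of Cafes'].fillna(0) / merged_final['Population']` would apply the same rule to the whole column.

```python
# (neighborhood, population, cafe count); None stands for "no cafe found"
rows = [
    ('Parkwoods', 26533, None),
    ('Victoria Village', 17047, 1),
    ('Garden District', 8240, 11),
]


def per_capita_cafe(population, cafes):
    """Cafes per resident, treating a missing cafe count as zero."""
    return (cafes or 0) / population


index = {name: per_capita_cafe(pop, cafes) for name, pop, cafes in rows}
no_cafes = [name for name, _, cafes in rows if not cafes]

for name, value in index.items():
    print(f'{name}: {value:.6f} per resident, {value * 10_000:.1f} per 10,000 residents')
print('No cafes:', no_cafes)
```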


Rank the neighborhoods and list the ten with the highest per capita cafe indexes.

In [122]:
merged_final.sort_values(by = ['Per Capita Cafe'], inplace=True, ascending = False)
merged_final.head(10)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Population | Number of Cafes | Per Capita Cafe |
|---|---|---|---|---|---|---|---|---|
| 28 | M5J | Downtown Toronto | Toronto Islands | 43.640816 | -79.381752 | 627.0 | 16.0 | 0.025518 |
| 66 | M5T | Downtown Toronto | Kensington Market | 43.653206 | -79.400049 | 3740.0 | 10.0 | 0.002674 |
| 27 | M2J | North York | Henry Farm | 43.778517 | -79.346556 | 2790.0 | 5.0 | 0.001792 |
| 49 | M6N | York | Runnymede | 43.673185 | -79.487262 | 4382.0 | 6.0 | 0.001369 |
| 50 | M6S | West Toronto | Runnymede | 43.651571 | -79.48445 | 4382.0 | 6.0 | 0.001369 |
| 6 | M5B | Downtown Toronto | Garden District | 43.657162 | -79.378937 | 8240.0 | 11.0 | 0.001335 |
| 13 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 | 14666.0 | 19.0 | 0.001296 |
| 14 | M4X | Downtown Toronto | St. James Town | 43.667967 | -79.367675 | 14666.0 | 19.0 | 0.001296 |
| 65 | M5T | Downtown Toronto | Grange Park | 43.653206 | -79.400049 | 9007.0 | 10.0 | 0.00111 |
| 29 | M6J | West Toronto | Little Portugal | 43.647927 | -79.41975 | 5013.0 | 5.0 | 0.000997 |
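Because Runnymede and St. James Town each appear under two postal codes, they take up two slots each in the top ten. A sketch of a ranking that lists each neighborhood name once, assuming the higher of its scores should count; `top_neighborhoods` is a hypothetical helper, and the scores are copied from the ranking above.

```python
import heapq

# (postal code, neighborhood, per capita cafe index), copied from the ranking above
scores = [
    ('M5J', 'Toronto Islands', 0.025518),
    ('M5T', 'Kensington Market', 0.002674),
    ('M6N', 'Runnymede', 0.001369),
    ('M6S', 'Runnymede', 0.001369),
    ('M5B', 'Garden District', 0.001335),
]


def top_neighborhoods(rows, k):
    """Return the k highest-scoring neighborhood names, listing each name once."""
    best = {}
    for _, name, score in rows:
        best[name] = max(score, best.get(name, score))  # keep the higher score per name
    return heapq.nlargest(k, best, key=best.get)


print(top_neighborhoods(scores, 3))
```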


Notice that a number of Toronto neighborhoods have no cafes within the 500 m search radius around their coordinates. Here is a list of them.

In [125]:
no_cafes = merged_final[merged_final['Number of Cafes'].fillna(0) == 0]  #a missing (NaN) or zero count means no cafe
no_cafes.fillna(0)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Population | Number of Cafes | Per Capita Cafe |
|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 26533.0 | 0.0 | 0.0 |
| 4 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 | 22724.0 | 0.0 | 0.0 |
| 5 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 | 44324.0 | 0.0 | 0.0 |
| 7 | M9B | Etobicoke | Princess Gardens | 43.650943 | -79.554724 | 9288.0 | 0.0 | 0.0 |
| 8 | M9B | Etobicoke | West Deane Park | 43.650943 | -79.554724 | 4395.0 | 0.0 | 0.0 |
| 9 | M1C | Scarborough | Highland Creek | 43.784535 | -79.160497 | 12853.0 | 0.0 | 0.0 |
| 10 | M1C | Scarborough | Rouge Hill | 43.784535 | -79.160497 | 11167.0 | 0.0 | 0.0 |
| 11 | M1C | Scarborough | Port Union | 43.784535 | -79.160497 | 12450.0 | 0.0 | 0.0 |
| 17 | M1E | Scarborough | Guildwood | 43.763573 | -79.188711 | 12820.0 | 0.0 | 0.0 |
| 18 | M1E | Scarborough | Morningside | 43.763573 | -79.188711 | 11472.0 | 0.0 | 0.0 |


In [127]:
no_cafes.shape

(36, 8)

So there are 36 sad neighborhoods that do not have any cafes.

**5. Discussion**

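Obviously, there are a number of neighborhoods in Toronto that do not have cafes. These might be ideal locations for opening a new cafe.

I also noticed that the neighborhoods with the highest per capita cafe indexes are located in central Toronto, presumably the business district. If one were to open a new cafe there, they might face greater competition. The top-ranked Toronto Islands is likely an artifact of the data: its postal code (M5J) is geolocated at 43.64, -79.38 on the downtown waterfront rather than on the islands, so the 16 cafes counted around that point are divided by the islands' 627 residents.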

**6. Conclusion**

This project aimed to find the best locations in Toronto to open a new cafe. It used data from the Foursquare API and Wikipedia and created a "per capita cafe" index, the number of cafes per resident of a neighborhood. Of course, the project has many limitations. For example, some of the data may be dated; the population figures from Wikipedia are from 2013. Another limitation is that the number of residents in a neighborhood does not necessarily equate to the demand for coffee at cafes. In the business district, for example, there are few residents but large numbers of white-collar workers who are there during the day, so my model understates the demand for coffee in these areas. Nevertheless, the analysis reveals that many areas in Toronto lack coffee shops and that opening one in these areas might be a smart business move.