# Identifying locations in Toronto to open Coffee Shops (Week 2)

#### Applied Data Science Capstone by Parth Thakurdesai

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data Requirement](#data)
* [Methodology](#method)
    * [1) Creating Pandas DataFrame](#dataframe)
    * [2) Data cleaning](#datac)
    * [3) geocoder](#geocoder)
* [Data Gathering using FourSquare API](#API)
* [Analysis](#Analysis)
* [Conclusion](#conclusion)

__NOTE__ : Folium maps are not rendered when this notebook is viewed in the GitHub repository.

## Introduction: Business Problem   <a name="introduction"></a>

Coffee shops are part of every neighbourhood. Let's say you are a Director of Western operations for a leading Coffee Chain. You want to increase your presence and dominance in the Toronto Market.  How would you narrow down your search of communities? 

In order to do that, we will explore:
* Number of coffee shops present in each neighbourhood
* Distance of each neighbourhood to the City Center (location of the CN Tower)
* Average rating of coffee shops in each neighbourhood


# Data Requirement  <a name="data"></a>
A good starting point for this analysis is to isolate Boroughs and Neighbourhoods that are relatively close to the city center (location of the CN Tower), have a lower density of coffee shops and, on average, have lower-rated coffee shops.

We will use Wikipedia to gather data on Toronto neighbourhoods, and we will call the FourSquare API to get the number of coffee shops and their average rating.


__Scraping data on Toronto Neighbourhoods__
In this assignment we scrape data on Toronto neighbourhoods from the Wikipedia page
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

We will fetch the following columns: __Postcode__, __Borough__ and __Neighbourhood__.

# Methodology: <a name="method"></a>
In this section, we run through data gathering, data cleaning and data manipulation to end up with a dataframe that we can use for our analysis.

### 1) Creating Pandas DataFrame<a name="dataframe"></a>
First, we read the table on the webpage into a pandas DataFrame.

In [1]:
import pandas as pd
url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

df=pd.read_html(url, header=0)[0]

df.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


### 2) Data Cleaning<a name="datac"></a>

In [2]:
# Replace every "Not assigned" value in the Borough column with np.nan
# so that those rows can be dropped with .dropna()

# import NumPy for np.nan
import numpy as np

# Assign the result back instead of using inplace=True on a single column
df['Borough'] = df['Borough'].replace('Not assigned', np.nan)
df.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | NaN | Not assigned |
| 1 | M2A | NaN | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


As you can see above, every "Not assigned" value in the Borough column has been converted to __np.nan__.

This makes dropping those rows easier.

The __.dropna()__ function removes every row that contains an __np.nan__ value
from the dataframe.

In [3]:
# Use .dropna() to drop every row that contains a null value.
# inplace=True modifies/updates the original dataframe.

df.dropna(inplace=True)
df.head(10)

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
| 7 | M7A | Queen's Park | Not assigned |
| 9 | M9A | Queen's Park | Queen's Park |
| 10 | M1B | Scarborough | Rouge |
| 11 | M1B | Scarborough | Malvern |
| 13 | M3B | North York | Don Mills North |
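The replace-then-drop pattern above can also be written as a single boolean filter. A minimal sketch, assuming a small hand-built frame (the name `raw` is illustrative only) that copies a few rows from the first table:

```python
import pandas as pd

# Hypothetical sample copied from a few rows of the scraped table
raw = pd.DataFrame({
    "Postcode": ["M1A", "M2A", "M3A", "M7A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Not assigned"],
})

# Keep only the rows whose Borough is assigned; no NaN round-trip is needed
assigned = raw[raw["Borough"] != "Not assigned"]
print(assigned)
```

Unlike `dropna()`, which drops a row when any column holds NaN, this filter only looks at the Borough column.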


Next, we use a __for__ loop to replace each 'Not assigned' value in the 'Neighbourhood' column with the value of the 'Borough' column in the same row.

In [4]:
# Column 1 is Borough and column 2 is Neighbourhood
for i in range(len(df)):
    if df.iloc[i, 2] == 'Not assigned':
        df.iloc[i, 2] = df.iloc[i, 1]

# Take a look at df
df.head(10)

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
| 7 | M7A | Queen's Park | Queen's Park |
| 9 | M9A | Queen's Park | Queen's Park |
| 10 | M1B | Scarborough | Rouge |
| 11 | M1B | Scarborough | Malvern |
| 13 | M3B | North York | Don Mills North |


As you can see, the value of the Neighbourhood column at index 7 has changed to __Queen's Park__.
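The same replacement can be done without a Python loop by selecting the unassigned rows with a boolean mask. A minimal sketch on a hypothetical three-row sample built from rows 4, 7 and 9 shown above:

```python
import pandas as pd

# Hypothetical sample using rows shown above (index 4, 7 and 9)
sample = pd.DataFrame(
    {
        "Postcode": ["M5A", "M7A", "M9A"],
        "Borough": ["Downtown Toronto", "Queen's Park", "Queen's Park"],
        "Neighbourhood": ["Harbourfront", "Not assigned", "Queen's Park"],
    },
    index=[4, 7, 9],
)

# Boolean mask of rows whose Neighbourhood is still unassigned
unassigned = sample["Neighbourhood"] == "Not assigned"
# Copy the Borough name into those rows in one vectorized step
sample.loc[unassigned, "Neighbourhood"] = sample.loc[unassigned, "Borough"]
print(sample)
```

Selecting columns by label with `.loc` also means the code no longer depends on Borough and Neighbourhood sitting at positions 1 and 2.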

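Next, we use the __groupby__ command to group the rows by Postcode. Calling __.head()__ on the grouped object returns up to the first five rows of each group in their original order, so this cell only lets us inspect the rows; the actual merging happens in the aggregation step that follows.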

In [5]:
# groupby() groups the rows by Postcode; .head() shows up to 5 rows per group
df.groupby(['Postcode']).head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
| ... | ... | ... | ... |
| 281 | M8Z | Etobicoke | Kingsway Park South West |
| 282 | M8Z | Etobicoke | Mimico NW |
| 283 | M8Z | Etobicoke | The Queensway West |
| 284 | M8Z | Etobicoke | Royal York South West |


Next, we need to __aggregate__ the values of the __Neighbourhood__ column for each __Postcode__.

Here, we overwrite the original dataframe and use the __.agg(','.join)__ command to aggregate the values of both the __Neighbourhood__ and __Borough__ columns.

__.agg(','.join)__ joins the values within each group into a single string, separated by __","__.

In [6]:
df=df[['Postcode','Borough','Neighbourhood']].groupby('Postcode',as_index=False).agg(','.join)

Notice in the table below that the values of the __Borough__ column were also __aggregated__, resulting in many repeated values.

For example: Scarborough,Scarborough,Scarborough

In [7]:
# Let's take a look at the updated dataframe. 
df.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough,Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough,Scarborough,Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough,Scarborough,Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
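If every postcode maps to exactly one borough (which appears to hold for this table), the repeated Borough strings can be avoided by aggregating each column differently. A sketch on a hypothetical set of pre-aggregation rows taken from the names shown above:

```python
import pandas as pd

# Hypothetical pre-aggregation rows for two postcodes
rows = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M1C", "M1C", "M1C"],
    "Borough": ["Scarborough"] * 5,
    "Neighbourhood": ["Rouge", "Malvern", "Highland Creek", "Rouge Hill", "Port Union"],
})

# Keep the first Borough per postcode, but join all Neighbourhood names
grouped = rows.groupby("Postcode", as_index=False).agg(
    {"Borough": "first", "Neighbourhood": ",".join}
)
print(grouped)
```

The trade-off: `"first"` would silently hide a postcode that spans two boroughs, so the set-based clean-up is the safer choice if that assumption does not hold.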


Next, we need to get rid of the repeated values in the Borough column.

First, we create the object __'col'__: we __split__ each Borough string on __","__, which turns it into a list of borough names.

Next, we convert each resulting list to a __set__. Recall that a __set__ does not contain repeated values. We then __join__ the remaining strings, separated by __","__.

In [8]:
# .str.split(',') turns each string into a list; .apply(set) removes duplicates within each list
col=df['Borough'].str.split(',').apply(set).str.join(',')

In [9]:
# Next we update the df with the col values. 
df.update(col)

In [10]:
# Let's take a look at the final dataframe
df.head(10)

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| 5 | M1J | Scarborough | Scarborough Village |
| 6 | M1K | Scarborough | East Birchmount Park,Ionview,Kennedy Park |
| 7 | M1L | Scarborough | Clairlea,Golden Mile,Oakridge |
| 8 | M1M | Scarborough | Cliffcrest,Cliffside,Scarborough Village West |
| 9 | M1N | Scarborough | Birch Cliff,Cliffside West |
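Python sets do not preserve insertion order, so if a postcode ever listed two different boroughs, the joined result could come back in either order. An order-preserving variant, assuming the same comma-separated strings (the sample values and the helper name `dedupe_keep_order` are hypothetical), uses `dict.fromkeys`:

```python
import pandas as pd

# Hypothetical aggregated Borough strings
boroughs = pd.Series(["Scarborough,Scarborough", "North York,York,North York", "Etobicoke"])

def dedupe_keep_order(text: str, sep: str = ",") -> str:
    """Drop repeated items from a sep-joined string, keeping first occurrences in order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return sep.join(dict.fromkeys(text.split(sep)))

print(boroughs.map(dedupe_keep_order).tolist())
```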


#### Shape of the dataframe

In [11]:
df.shape

(103, 3)

### 3) Geocoder<a name="geocoder"></a>

__Assignment__:
Now that we have built a dataframe containing the postal code, borough name and neighbourhood name of each neighbourhood, we need the __latitude__ and __longitude__ coordinates of each neighbourhood in order to use the Foursquare location data.

The geocoder package kept denying our API requests, so we decided to use __pgeocode__ instead.

In [12]:
# import pgeocode (used instead of the geocoder package)
import pgeocode as pgeo

Let us explore how pgeocode works:
#### 1) First, we create an object
#### 2) Then, we post a query

In [13]:
# create a Nominatim object for Canadian ("ca") postal codes
nomi=pgeo.Nominatim("ca")

# post a query
nomi.query_postal_code("M1B")

postal_code                                       M1B
country code                                       CA
place_name        Scarborough (Malvern / Rouge River)
state_name                                    Ontario
state_code                                         ON
county_name                               Scarborough
county_code                                       NaN
community_name                                    NaN
community_code                                    NaN
latitude                                      43.8113
longitude                                     -79.193
accuracy                                            6
Name: 0, dtype: object

As you can see in the output above, the query returns a pandas Series with the fields:

postal_code, country code, place_name, state_name, state_code, county_name, county_code, community_name, community_code, __latitude__, __longitude__ and accuracy.

We are only interested in the latitude and longitude of the place. This is how we access them:

In [14]:
# save the output of the query in a variable
a=nomi.query_postal_code("M1C")

# convert to pandas dataframe
df_2=pd.DataFrame(a)
df_2.head()

|   | 0 |
|---|---|
| postal_code | M1C |
| country code | CA |
| place_name | Scarborough (Rouge Hill / Port Union / Highlan... |
| state_name | Ontario |
| state_code | ON |


In [15]:
# Access latitude and longitude
print("Latitude  :" + str(a.latitude))
print("Longitude :" + str(a.longitude))

Latitude  :43.7878
Longitude :-79.1564


Now let's turn back to our original dataframe.

First, we add two new columns to the dataframe, titled __Latitude__ and __Longitude__.

In [16]:
df.insert(3, "Latitude","")
df.insert(4, "Longitude","")

df.head()

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern |  |  |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |  |  |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |  |  |
| 3 | M1G | Scarborough | Woburn |  |  |
| 4 | M1H | Scarborough | Cedarbrae |  |  |


Next, we make a query call for each postal code inside a for loop, extract the latitude and longitude from the result and update the dataframe.

In [17]:
# For loop to extract latitude and longitude and update dataframe
for i in range(len(df)):
    A=nomi.query_postal_code(df.iloc[i,0])
    df.iloc[i,3]=A.latitude
    df.iloc[i,4]=A.longitude

In [18]:
df.head(10)

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.8113 | -79.193 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.7878 | -79.1564 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.7678 | -79.1866 |
| 3 | M1G | Scarborough | Woburn | 43.7712 | -79.2144 |
| 4 | M1H | Scarborough | Cedarbrae | 43.7686 | -79.2389 |
| 5 | M1J | Scarborough | Scarborough Village | 43.7464 | -79.2323 |
| 6 | M1K | Scarborough | East Birchmount Park,Ionview,Kennedy Park | 43.7298 | -79.2639 |
| 7 | M1L | Scarborough | Clairlea,Golden Mile,Oakridge | 43.7122 | -79.2843 |
| 8 | M1M | Scarborough | Cliffcrest,Cliffside,Scarborough Village West | 43.7247 | -79.2312 |
| 9 | M1N | Scarborough | Birch Cliff,Cliffside West | 43.6952 | -79.2646 |


The dataframe above is now complete.
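The loop above issues one query per postcode. pgeocode's `query_postal_code` also accepts a list of codes and returns a DataFrame, so the coordinates could instead be attached with a single merge. In the sketch below, a hand-built `lookup` table (coordinates copied from the outputs above, with NaN for an unresolved code such as "M7R") stands in for that DataFrame so it runs offline; the column names `postal_code`, `latitude` and `longitude` are assumed to match pgeocode's output.

```python
import pandas as pd

# Hypothetical neighbourhood frame (only the Postcode column matters here)
neigh = pd.DataFrame({"Postcode": ["M1B", "M1C", "M7R"]})

# Stand-in for nomi.query_postal_code(list_of_postcodes); NaN marks an unresolved code
lookup = pd.DataFrame({
    "postal_code": ["M1B", "M1C", "M7R"],
    "latitude": [43.8113, 43.7878, float("nan")],
    "longitude": [-79.193, -79.1564, float("nan")],
})

# Left-join coordinates onto the neighbourhoods by postcode
geo = (
    neigh.merge(lookup, left_on="Postcode", right_on="postal_code", how="left")
    .drop(columns="postal_code")
    .rename(columns={"latitude": "Latitude", "longitude": "Longitude"})
)
# Rows without coordinates can then be dropped explicitly
geo = geo.dropna(subset=["Latitude", "Longitude"])
print(geo)
```

A side benefit of this approach is that Latitude and Longitude stay numeric (float) columns instead of starting out as empty strings.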

# Data Gathering using FourSquare API  <a name="API"></a>


You are confident that your chain can beat the existing coffee shops on quality.

### 1) Define a Function that returns latitude and longitude from Postal Code:

In [19]:
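# import pgeocode
import pgeocode as pgeo

# Create the Canadian geocoder once and reuse it for every lookup
nomi_ca = pgeo.Nominatim("ca") #ca: Canada

def get_coordinates(postal_code):
    """Return [latitude, longitude] for a Canadian postal code (NaN if not found)."""
    info = nomi_ca.query_postal_code(postal_code)
    return [info.latitude, info.longitude]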

We will assume that the City Center is located at the postal code of the CN Tower, __M5V__. We will use this location to calculate the distance to each neighbourhood.

In [20]:
# Coordinates for City Center (CN Tower: "M5V")
CN_Tower = get_coordinates("M5V")
print("Coordinates of CN_Tower : ",CN_Tower)

Coordinates of CN_Tower :  [43.6404, -79.3995]


### 2) Create Functions to Convert Lat & Long in degrees to Cartesian Coordinates in meters. 
To calculate distances in meters rather than in latitude/longitude degrees, we can work in a Cartesian 2D coordinate system and then project coordinates back to latitude/longitude degrees for display on a Folium map. So let's create functions that convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [21]:
#!conda install -c conda-forge shapely
#!pip install shapely
import shapely.geometry

import math

!pip install pyproj
import pyproj

# WGS84 lat/lon (EPSG:4326) <-> UTM zone 17N (EPSG:32617), the UTM zone that covers Toronto.
# always_xy=True makes every call take and return (longitude, latitude) / (x, y) order.
to_utm = pyproj.Transformer.from_crs("EPSG:4326", "EPSG:32617", always_xy=True)
to_lonlat = pyproj.Transformer.from_crs("EPSG:32617", "EPSG:4326", always_xy=True)

def lonlat_to_xy(lon, lat):
    x, y = to_utm.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    lon, lat = to_lonlat.transform(x, y)
    return lon, lat

def calc_xy_distance(x1, y1, x2, y2):
    # Euclidean distance in meters between two UTM points
    return math.hypot(x2 - x1, y2 - y1)

print('Coordinate transformation check')
print('-------------------------------')
print('CN Tower longitude={}, latitude={}'.format(CN_Tower[1], CN_Tower[0]))
# CN_Tower is [latitude, longitude], so the longitude is passed first
x, y = lonlat_to_xy(CN_Tower[1], CN_Tower[0])
print('CN Tower UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('CN Tower longitude={}, latitude={}'.format(lo, la))
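A UTM projection only yields meaningful meter coordinates when the zone matches the location's longitude. A small sketch of the standard 6-degree zone formula (it ignores the special-case zones around Norway and Svalbard; the helper names are hypothetical) shows that Toronto falls in zone 17, i.e. EPSG:32617 for WGS84 / UTM north:

```python
import math

def utm_zone(lon: float) -> int:
    """Standard 6-degree UTM zone number for a longitude in [-180, 180)."""
    # Zones are 6 degrees wide, numbered 1..60 eastward from 180 degrees W
    return int(math.floor((lon + 180.0) / 6.0)) + 1

def utm_epsg(lat: float, lon: float) -> int:
    """EPSG code of the WGS84 UTM zone: 326xx north of the equator, 327xx south."""
    base = 32600 if lat >= 0 else 32700
    return base + utm_zone(lon)

print(utm_zone(-79.3995), utm_epsg(43.6404, -79.3995))  # prints: 17 32617
```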


### 3) Visualization on Folium map
Let's visualize the data we have so far, starting with the city center location (CN Tower) and then adding the candidate neighbourhood centers:

In [22]:
import folium
map_Toronto = folium.Map(location=CN_Tower, zoom_start=12)

# Add a marker for the CN Tower
folium.Marker(CN_Tower, popup='CN Tower').add_to(map_Toronto)

map_Toronto

Let us now put all the other neighbourhoods on the map of Toronto. 

pgeocode does not return latitude and longitude values for the postal code __"M7R"__, so we drop that row from our dataframe:

In [23]:
df.dropna(inplace=True)
df.shape

(102, 5)

As the shape of the dataframe shows, one row was dropped (the row containing postal code "M7R").

In [24]:
# Create Latitude, Longitude, and Borough arrays:
Latitude = np.array(df['Latitude'])
Longitude = np.array(df['Longitude'])
Borough = np.array(df['Borough'])

# Add neighbourhood markers to the map
for lat, lon, name in zip(Latitude, Longitude, Borough):
    folium.CircleMarker([lat, lon],
                        radius=5, color='red',
                        fill=True, fill_color='red',
                        fill_opacity=0.6,
                        popup=name).add_to(map_Toronto)
map_Toronto

### 4) Calculate the distance between Neighbourhood and CN Tower
Next, let us calculate the distance from each neighbourhood to the CN Tower and add it to the dataframe:

In [25]:
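# for loop to compute each neighbourhood's distance to the CN Tower in km
# (geopy's distance.distance is the geodesic distance on the WGS84 ellipsoid)
from geopy import distance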

dist_km=[]
for i in range(len(df)):
    coordinates=[df.iloc[i,3],df.iloc[i,4]]
    dist=distance.distance(coordinates,CN_Tower).km
    dist_km.append(dist)

# Add Distance (km) column to dataframe
df.insert(5,"Distance (km)",dist_km)

# View the updated dataframe
df.head()

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude | Distance (km) |
|---|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.8113 | -79.193 | 25.24667 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.7878 | -79.1564 | 25.535002 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.7678 | -79.1866 | 22.245142 |
| 3 | M1G | Scarborough | Woburn | 43.7712 | -79.2144 | 20.827534 |
| 4 | M1H | Scarborough | Cedarbrae | 43.7686 | -79.2389 | 19.247252 |
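As a cross-check on these geodesic values, the simpler haversine formula treats the Earth as a sphere (mean radius 6371 km, an approximation) and lands within a fraction of a percent at city scale. A sketch using the CN Tower and M1B coordinates printed above (`haversine_km` is a hypothetical helper, not part of geopy):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

cn_tower = (43.6404, -79.3995)
m1b = (43.8113, -79.193)
print(round(haversine_km(*m1b, *cn_tower), 2))
```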


#### Mean Distance between Neighbourhood and CN Tower (Downtown)

In [26]:
print("Mean distance between Neighbourhood and CN Tower is " + str(df['Distance (km)'].mean()) + "  km")

Mean distance between Neighbourhood and CN Tower is 10.447683555300506  km


### 6) Make API calls to foursquare to get information on Coffee Shops (Tim Hortons vs. Starbucks)

In this section, we make API calls to retrieve the number of coffee shops in each neighbourhood and the average rating of the coffee shops in that neighbourhood.
We place the API calls inside a for loop to collect this information for every neighbourhood.

In [27]:
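import os
import requests
import json

# Foursquare credentials are read from environment variables instead of being hard-coded
# (FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are illustrative names; use the ones you set)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180602'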

In [28]:
# Function to convert lat and lon to a "lat,lon" string
# used in the API calls:
def to_string(lat,lon):
    return str(lat)+','+str(lon)

In [29]:
num_coffee_shops = []
avg_rating = []

search_url = 'https://api.foursquare.com/v2/venues/search'

for i in range(len(df)):
    parameters = {'client_id': CLIENT_ID,
                  'client_secret': CLIENT_SECRET,
                  'v': VERSION,
                  'll': to_string(df.iloc[i, 3], df.iloc[i, 4]),
                  'radius': '1000',   # Search for venues in a 1000 m (1 km) radius
                  'query': 'Coffee',  # Search for coffee shops
                  'limit': '10'       # At most 10 venues are returned per neighbourhood
                  }

    response = requests.get(search_url, params=parameters).json()
    venues = pd.DataFrame(response['response']['venues'])

    ratings = []
    # An empty result has no 'id' column, so fall back to an empty list
    for venue_id in venues.get('id', []):
        # Venue details call; 'rating' is only present for rated venues
        details_url = 'https://api.foursquare.com/v2/venues/{}'.format(venue_id)
        details = requests.get(details_url, params={'client_id': CLIENT_ID,
                                                    'client_secret': CLIENT_SECRET,
                                                    'v': VERSION}).json()
        try:
            ratings.append(details['response']['venue']['rating'])
        except KeyError:
            ratings.append(np.nan)

    rated = np.array(ratings, dtype=float)
    # drop any np.nan values before averaging
    rated = rated[~np.isnan(rated)]

    # Avoid numpy's "Mean of empty slice" warning when no venue is rated
    avg_rating.append(rated.mean() if rated.size else np.nan)
    num_coffee_shops.append(len(venues))

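The averaging step has to cope with venues that have no rating. A small sketch that isolates that logic makes it testable without any API calls; the helper names and the sample dictionaries are hypothetical, modelled on the `['response']['venue']['rating']` path used in the loop above:

```python
import math
from typing import Iterable, Optional

def extract_rating(details: dict) -> Optional[float]:
    """Return the venue rating from a venue-details JSON dict, or None if absent."""
    return details.get("response", {}).get("venue", {}).get("rating")

def average_rating(ratings: Iterable[Optional[float]]) -> float:
    """Mean of the ratings that are present; NaN when no venue is rated."""
    present = [r for r in ratings if r is not None and not math.isnan(r)]
    return sum(present) / len(present) if present else float("nan")

sample = [
    {"response": {"venue": {"rating": 8.0}}},
    {"response": {"venue": {}}},  # unrated venue
    {"response": {"venue": {"rating": 9.0}}},
]
print(average_rating(extract_rating(d) for d in sample))  # prints: 8.5
```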


Next, we add the number of coffee shops and the average rating of coffee shops in each neighbourhood to the original dataframe.

In [30]:
# Add "Number of Coffee Shops" and "Average Rating" to the dataframe. 
df.insert(6,"Number of Coffee Shops",num_coffee_shops)
df.insert(7,"Average Rating",avg_rating)


# View the updated dataframe
df.head()

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude | Distance (km) | Number of Coffee Shops | Average Rating |
|---|---|---|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.8113 | -79.193 | 25.24667 | 0 | NaN |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.7878 | -79.1564 | 25.535002 | 2 | NaN |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.7678 | -79.1866 | 22.245142 | 0 | NaN |
| 3 | M1G | Scarborough | Woburn | 43.7712 | -79.2144 | 20.827534 | 1 | NaN |
| 4 | M1H | Scarborough | Cedarbrae | 43.7686 | -79.2389 | 19.247252 | 1 | NaN |


Unfortunately, as it turns out, most of the coffee shops in these areas are not rated. This makes our analysis difficult, because most of the API calls needed to collect this information are premium calls that require additional payment.

# Analysis <a name="Analysis"></a>

#### 5.a) Neighbourhoods that have rated Coffee shops

In [31]:
# Change NaN values to 'Not Available'
df['Average Rating'] = df['Average Rating'].fillna('Not Available')

In [43]:
df.loc[df['Average Rating'] != 'Not Available']

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude | Distance (km) | Number of Coffee Shops | Average Rating |
|---|---|---|---|---|---|---|---|---|


#### 5.b) Neighbourhoods with more than 5 coffee shops

In [33]:
df.loc[df['Number of Coffee Shops'] > 5]

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude | Distance (km) | Number of Coffee Shops | Average Rating |
|---|---|---|---|---|---|---|---|---|
| 22 | M2N | North York | Willowdale South | 43.7673 | -79.4111 | 14.130379 | 8 | Not Available |
| 36 | M4C | East York | Woodbine Heights | 43.6913 | -79.3116 | 9.068859 | 8 | Not Available |
| 37 | M4E | East Toronto | The Beaches | 43.6784 | -79.2941 | 9.492545 | 7 | Not Available |
| 40 | M4J | East York | East Toronto | 43.6872 | -79.3368 | 7.253483 | 10 | Not Available |
| 41 | M4K | East Toronto | The Danforth West,Riverdale | 43.6803 | -79.3538 | 5.765507 | 7 | Not Available |
| 43 | M4M | East Toronto | Studio District | 43.6561 | -79.3406 | 5.06201 | 10 | Not Available |
| 45 | M4P | Central Toronto | Davisville North | 43.7135 | -79.3887 | 8.168405 | 10 | Not Available |
| 46 | M4R | Central Toronto | North Toronto West | 43.7143 | -79.4065 | 8.23011 | 9 | Not Available |
| 47 | M4S | Central Toronto | Davisville | 43.702 | -79.3853 | 6.939268 | 7 | Not Available |
| 51 | M4X | Downtown Toronto | Cabbagetown,St. James Town | 43.6684 | -79.3689 | 3.971345 | 10 | Not Available |


As you can see from the table, most of the communities that have 10 coffee shops are fairly close to the CN Tower. Because the search used `'limit': '10'`, a count of 10 really means "10 or more". Our job in this assignment is to find anomalies: communities that are relatively close to the CN Tower but have fewer coffee shops. The average rating of coffee shops would have helped narrow down the search further, but because the rating data is unavailable, we cannot use it.

#### 5.c) Mean Distance and Average number of shops per Neighbourhood

In [34]:
# Calculate the mean distance to the CN Tower and the average number of coffee shops per neighbourhood
mean_dist = df['Distance (km)'].mean()
avg_num_shops = df['Number of Coffee Shops'].mean()

print('Average distance to Neighbourhoods : ' + str(mean_dist))
print('Average number of shops : '+ str(avg_num_shops))

Average distance to Neighbourhoods : 10.447683555300506
Average number of shops : 3.950980392156863


#### 5.d) Analysis of Neighbourhoods

In this section, we look at communities that are closer to the Toronto City Center than the average distance but have fewer coffee shops than the average.
These are potential locations for expanding the franchise.

In [35]:
df.loc[(df['Distance (km)'] < mean_dist) & (df['Number of Coffee Shops'] < avg_num_shops)]

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude | Distance (km) | Number of Coffee Shops | Average Rating |
|---|---|---|---|---|---|---|---|---|
| 35 | M4B | East York | Woodbine Gardens,Parkview Hill | 43.7063 | -79.3094 | 10.315336 | 3 | Not Available |
| 38 | M4G | East York | Leaside | 43.7124 | -79.3644 | 8.485618 | 2 | Not Available |
| 39 | M4H | East York | Thorncliffe Park | 43.7059 | -79.3464 | 8.443861 | 0 | Not Available |
| 44 | M4N | Central Toronto | Lawrence Park | 43.7301 | -79.3935 | 9.977952 | 1 | Not Available |
| 48 | M4T | Central Toronto | Moore Park,Summerhill East | 43.6899 | -79.3853 | 5.617722 | 2 | Not Available |
| 50 | M4W | Downtown Toronto | Rosedale | 43.6827 | -79.373 | 5.163019 | 2 | Not Available |
| 59 | M5J | Downtown Toronto | Harbourfront East,Toronto Islands,Union Station | 43.623 | -79.3936 | 1.990998 | 1 | Not Available |
| 62 | M5M | North York | Bedford Park,Lawrence Manor East | 43.7335 | -79.4177 | 10.447546 | 1 | Not Available |
| 63 | M5N | Central Toronto | Roselawn | 43.7113 | -79.4195 | 8.040821 | 2 | Not Available |
| 71 | M6A | North York | Lawrence Heights,Lawrence Manor | 43.7223 | -79.4504 | 9.982361 | 2 | Not Available |
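The selection rule can be sketched on a few rows copied from the tables above, using the thresholds printed in 5.c. Sorting the result by distance, which is an addition for readability rather than part of the original analysis, puts the closest under-served neighbourhoods first:

```python
import pandas as pd

# Hypothetical subset of rows copied from the tables above
rows = pd.DataFrame({
    "Postcode": ["M4J", "M4G", "M5J", "M1B"],
    "Distance (km)": [7.253483, 8.485618, 1.990998, 25.24667],
    "Number of Coffee Shops": [10, 2, 1, 0],
})

# Thresholds printed in section 5.c
MEAN_DIST = 10.447683555300506
AVG_NUM_SHOPS = 3.950980392156863

# Closer than average AND fewer shops than average, nearest first
candidates = rows[
    (rows["Distance (km)"] < MEAN_DIST) & (rows["Number of Coffee Shops"] < AVG_NUM_SHOPS)
].sort_values("Distance (km)")
print(candidates)
```

M4J is excluded for having too many shops and M1B for being too far away, which mirrors how the full filter behaves.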


# Conclusion <a name="conclusion"></a>
The following __Boroughs__ are potential sites to build new coffee shops.

In [36]:
data_frame = df.loc[(df['Distance (km)'] < mean_dist) & (df['Number of Coffee Shops'] < avg_num_shops)]

# Isolate the unique Boroughs
set(data_frame['Borough'])

{'Central Toronto',
 'Downtown Toronto',
 'East York',
 'Etobicoke',
 'North York',
 'York'}

The following __Neighbourhoods__ are potential sites to build new coffee shops

In [37]:
# Each entry is the comma-separated list of neighbourhoods that share one postal code
a = data_frame['Neighbourhood']

set(a)

{'Bedford Park,Lawrence Manor East',
 'Caledonia-Fairbanks',
 'Glencairn',
 'Harbourfront East,Toronto Islands,Union Station',
 'Humber Bay Shores,Mimico South,New Toronto',
 "Humber Bay,King's Mill Park,Kingsway Park South East,Mimico NE,Old Mill South,The Queensway East,Royal York South East,Sunnylea",
 'Kingsway Park South West,Mimico NW,The Queensway West,Royal York South West,South of Bloor',
 'Lawrence Heights,Lawrence Manor',
 'Lawrence Park',
 'Leaside',
 'Moore Park,Summerhill East',
 'Rosedale',
 'Roselawn',
 'The Junction North,Runnymede',
 'The Kingsway,Montgomery Road,Old Mill North',
 'Thorncliffe Park',
 'Woodbine Gardens,Parkview Hill'}
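The set above still contains one comma-separated string per postal code. Splitting those strings gives the individual neighbourhood names; a sketch with pandas `str.split` and `explode` on a few of the entries listed above:

```python
import pandas as pd

# Hypothetical subset of the comma-separated Neighbourhood entries listed above
groups = pd.Series([
    "Moore Park,Summerhill East",
    "Rosedale",
    "Lawrence Heights,Lawrence Manor",
])

# split -> one list per postcode; explode -> one row per neighbourhood name
names = groups.str.split(",").explode().str.strip()
unique_names = sorted(names.unique())
print(unique_names)
```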

__NOTE__ : The scope of the project was severely reduced because much of the required data is only available through premium FourSquare API calls.