# Capstone Final Project - The Battle of the Neighbourhoods (Week 2)

### Applied Data Science Capstone by IBM Coursera

## Table of Contents
* [Introduction: Business Problem](#introduction)
* [Data Description](#data)
* [Data Preparation](#prep)
* [Data Exploration and Clustering](#dataexp)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem  <a name="introduction"></a>

#### Due to the crisis in Syria, there has been a great influx of Syrian refugees all over Canada, especially in Toronto.

#### The business problem I would like to address is the lack of Syrian restaurants in Toronto. This report will show stakeholders the best location to open a Syrian restaurant, both to increase the market for Syrian food and to make the refugees feel more at home.

#### For competition and marketing purposes, it is important to keep the location in a hotspot of restaurants, yet away from other Syrian restaurants. It is also essential for the location to be as close to downtown Toronto as possible, as that is the area most popular with locals and tourists.

## Data Description  <a name="data"></a>

#### First, the neighbourhoods of Toronto, by postal code, will be scraped from a Wikipedia page. Second, the venues in each neighbourhood will be analysed to find the top 5 restaurant hotspots. Refugee housing data will be used to find out a few locations where the Syrian refugees live.

#### It is important for the restaurant to be in an already busy area close to downtown Toronto so that it can become known and be marketed easily, yet as far as possible from other Syrian restaurants.

#### The following factors will influence our decision of optimal location for the Syrian restaurant:
   #### 1. Number of existing restaurants in the vicinity
   #### 2. Number of existing Syrian restaurants in the vicinity or their distance to the optimal location
   #### 3. Distance of optimal location to the downtown Toronto area
#### Therefore we will be using the following data to address these factors:
   #### 1. Toronto postal code data scraped from a Wikipedia page to make a dataframe of boroughs, neighbourhoods, latitude, and longitude
   #### 2. The number of restaurants in every neighbourhood, with their type and location, obtained using the Foursquare API
   #### 3. The coordinates of downtown Toronto, taken from Google's coordinates for the CN Tower, a well-known tourist attraction located in downtown Toronto

## Data Preparation <a name="prep"></a>

### Import Libraries

In [1]:
import pandas as pd
import numpy as np

### Scrape Wikipedia table of Toronto Postal Codes

In [2]:
df=pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")[0]

### Format Table

In [3]:
# The column keeps its scraped name, 'Postcode', which the cells below rely on

# Drop all rows that are not assigned to a borough
indexNames = df[ df['Borough'] == 'Not assigned' ].index
df.drop(indexNames , inplace=True)

# Regroup all neighbourhoods that share a postal code so that they are comma separated and in one row
df1 = df.groupby(by=['Postcode','Borough']).agg(lambda x: ', '.join(x))
df1.reset_index(level=['Postcode','Borough'], inplace=True)
df1.loc[df1['Neighbourhood']=="Not assigned",'Neighbourhood']=df1.loc[df1['Neighbourhood']=="Not assigned",'Borough']
df1

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
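The formatting steps above can be tried on a small hand-made table before running them on the scraped data. This is a minimal sketch, assuming the same three column names as the table shown above; the rows themselves are illustrative and the variable names `raw`, `assigned` and `grouped` are chosen only for this example.

```python
import pandas as pd

# Toy rows in the same shape as the scraped table (values are illustrative)
raw = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M2A", "M7A"],
    "Borough": ["Scarborough", "Scarborough", "Not assigned", "Queen's Park"],
    "Neighbourhood": ["Rouge", "Malvern", "Not assigned", "Not assigned"],
})

# 1. Drop postal codes that have no borough
assigned = raw[raw["Borough"] != "Not assigned"]

# 2. One row per postal code, neighbourhood names joined with commas
grouped = (
    assigned.groupby(["Postcode", "Borough"], as_index=False)["Neighbourhood"]
    .agg(", ".join)
)

# 3. A neighbourhood that is still "Not assigned" takes its borough's name
missing = grouped["Neighbourhood"] == "Not assigned"
grouped.loc[missing, "Neighbourhood"] = grouped.loc[missing, "Borough"]

print(grouped)
```

Selecting the `Neighbourhood` column before aggregating keeps the join from touching any other column that the scraped table might contain.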


### Open link to a csv file that has the geographical coordinates of each postal code

In [4]:
geo_data=pd.read_csv("https://cocl.us/Geospatial_data")
geo_data

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


### Create the Dataframe

In [5]:
df1['Latitude']=geo_data['Latitude'].values
df1['Longitude']=geo_data['Longitude'].values
df1

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
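Copying the coordinate columns by position works only while both tables list the postal codes in exactly the same order. A possible alternative, sketched here on two toy frames, is to match rows on the postal code itself with `merge`. The frame names `codes` and `coords` are assumptions for this example; the coordinates are copied from the rows shown above.

```python
import pandas as pd

# Toy frames, deliberately listed in different orders
codes = pd.DataFrame({
    "Postcode": ["M1C", "M1B"],
    "Borough": ["Scarborough", "Scarborough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Match rows on the postal code rather than on row position
merged = codes.merge(
    coords.rename(columns={"Postal Code": "Postcode"}),
    on="Postcode",
    how="left",
    validate="one_to_one",  # raise if a postal code appears twice on either side
)
print(merged)
```

With `how="left"`, a postal code that is missing from the coordinate file shows up as `NaN` instead of silently shifting every later row.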


## Data Exploration and Clustering of Toronto Neighbourhoods <a name="dataexp"></a>

### Import all required libraries.

In [6]:
import json
import requests
from pandas.io.json import json_normalize 
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium 
print('All libraries imported!')

## Transform to pandas dataframe for neighbourhoods in Toronto

In [7]:
toronto_data= df1[df1['Borough'].str.contains('Toronto', na = False)].reset_index(drop=True)
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


## Enter Toronto's coordinates from Google

In [8]:
latitude = 43.6532
longitude= -79.3832

## Create the map of Toronto using longitude and latitude with markers

In [9]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Define Foursquare Credentials and Version

In [26]:
CLIENT_ID = 'XWCAJYVE5H1ZRLXFO1H5HQRKSYVLOTRW2M1ITBMJXIV0BO5K' # your Foursquare ID
CLIENT_SECRET = 'K0M510H4PGPZMGGBJN0COJAOA4TI3G5HRSLKKRUKSQRCEFTB' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XWCAJYVE5H1ZRLXFO1H5HQRKSYVLOTRW2M1ITBMJXIV0BO5K
CLIENT_SECRET:K0M510H4PGPZMGGBJN0COJAOA4TI3G5HRSLKKRUKSQRCEFTB
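Credentials written into a notebook and printed in its output are visible to anyone who reads the file. One common way to avoid this is to read them from environment variables and to mask them when they must be displayed. The sketch below illustrates that idea; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper functions are hypothetical, and the demo values are placeholders.

```python
import os

def load_foursquare_credentials(env=None):
    """Read the Foursquare credentials from a mapping of environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    env = os.environ if env is None else env
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters when a value must be printed."""
    return secret[:visible] + "*" * (len(secret) - visible)

# Example with a stand-in mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "ABCD1234", "FOURSQUARE_CLIENT_SECRET": "WXYZ5678"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(demo_id))
```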


## Create a function to repeat the same process for all the neighbourhoods in Toronto

In [11]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
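The function above indexes straight into the response, so a neighbourhood with no result groups or a venue without a category would raise an error. The following sketch shows one way the parsing step could be made tolerant of such gaps. It assumes the same response layout as the keys used above; the helper name `extract_venue_rows` and the sample payloads are hypothetical and hand-written, not real API output.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore response into (neighbourhood, ..., category) rows.

    The payload layout mirrors the keys indexed in getNearbyVenues;
    missing groups or categories are tolerated instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written payloads for illustration (not real API output)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 43.65, "lng": -79.38},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 43.66, "lng": -79.39},
               "categories": []}},
]}]}}
empty = {"response": {}}

sample_rows = extract_venue_rows("Demo", 43.65, -79.38, sample)
empty_rows = extract_venue_rows("Demo", 43.65, -79.38, empty)
print(sample_rows)
print(empty_rows)
```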

## Run the above function on each neighbourhood and create a new dataframe called toronto_venues

In [12]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The 

## Check size of resulting dataframe

In [13]:
print(toronto_venues.shape)
toronto_venues.head()

(1690, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
4,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood


## Analyze each neighbourhood

In [14]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
cols=list(toronto_onehot.columns.values)
cols.pop(cols.index('Neighborhood'))
toronto_onehot=toronto_onehot[['Neighborhood']+cols]

# rename Neighborhood for Neighbourhood so that future merge works
toronto_onehot.rename(columns = {'Neighborhood': 'Neighbourhood'}, inplace = True)
toronto_onehot.head()
toronto_onehot.shape

(1690, 232)

## Group rows by neighbourhood, taking the mean frequency of occurrence of each category, and check the new size

In [15]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped
toronto_grouped.shape

(38, 232)
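To see why the mean of the one-hot columns gives a frequency, it may help to run the same two steps on a tiny made-up venue list. This is a sketch only: the neighbourhood names `A` and `B` and the venues are invented for the example.

```python
import pandas as pd

# Toy venue list: two neighbourhoods, made-up venues
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Cafe", "Cafe", "Park", "Pub", "Pub", "Pub"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# The mean of an indicator column is the share of venues in that category
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so the values can be read as the share of a neighbourhood's venues that fall in each category.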

## Print each neighborhood along with the top 5 most common venues

In [16]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
             venue  freq
0      Coffee Shop  0.08
1             Café  0.05
2              Bar  0.04
3  Thai Restaurant  0.04
4       Steakhouse  0.04


----Berczy Park----
            venue  freq
0     Coffee Shop  0.09
1    Cocktail Bar  0.05
2  Farmers Market  0.04
3        Beer Bar  0.04
4      Steakhouse  0.04


----Brockton, Exhibition Place, Parkdale Village----
            venue  freq
0  Breakfast Spot  0.09
1     Coffee Shop  0.09
2            Café  0.09
3             Bar  0.04
4       Pet Store  0.04


----Business Reply Mail Processing Centre 969 Eastern----
           venue  freq
0    Pizza Place  0.06
1     Restaurant  0.06
2        Brewery  0.06
3  Burrito Place  0.06
4     Smoke Shop  0.06


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0   Airport Service  0.19
1    Airport Lounge  0.12
2  Airport Terminal  0.12
3           Airport  0.06
4      

## Write a function that sorts the venues of a neighbourhood in descending order of frequency

In [17]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

## Create the new dataframe and display the top 10 venues for each neighborhood

In [18]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after the 3rd all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Hotel,American Restaurant,Restaurant,Gym,Asian Restaurant
1,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Farmers Market,Bakery,Café,Beer Bar,Seafood Restaurant,Steakhouse,Irish Pub
2,"Brockton, Exhibition Place, Parkdale Village",Breakfast Spot,Café,Coffee Shop,Yoga Studio,Pet Store,Burrito Place,Caribbean Restaurant,Restaurant,Climbing Gym,Performing Arts Venue
3,Business Reply Mail Processing Centre 969 Eastern,Fast Food Restaurant,Auto Workshop,Park,Gym / Fitness Center,Pizza Place,Restaurant,Burrito Place,Skate Park,Smoke Shop,Brewery
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Lounge,Airport Terminal,Boutique,Harbor / Marina,Coffee Shop,Boat or Ferry,Sculpture Garden,Bar,Airport Gate
5,"Cabbagetown, St. James Town",Coffee Shop,Restaurant,Café,Flower Shop,Italian Restaurant,Bakery,Pub,Pizza Place,Pet Store,Breakfast Spot
6,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Ice Cream Shop,Sandwich Place,Burger Joint,Indian Restaurant,Middle Eastern Restaurant,Spa,Chinese Restaurant
7,"Chinatown, Grange Park, Kensington Market",Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Bar,Mexican Restaurant,Coffee Shop,Dumpling Restaurant,Bakery,Vietnamese Restaurant,Dessert Shop
8,Christie,Grocery Store,Café,Park,Diner,Restaurant,Italian Restaurant,Convenience Store,Baby Store,Coffee Shop,Athletics & Sports
9,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Pub,Bubble Tea Shop,Café,Men's Store,Mediterranean Restaurant


# Cluster Neighbourhoods

## Run k-means to group the neighbourhoods into 5 clusters

In [19]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
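The first ten labels printed above are all 0, which suggests, but does not prove, that one cluster holds most neighbourhoods. Counting the members of each cluster is a quick way to check. The sketch below uses a hypothetical label vector for 38 neighbourhoods; in the notebook the same helper could be called as `cluster_sizes(kmeans.labels_, kclusters)`.

```python
from collections import Counter

def cluster_sizes(labels, n_clusters):
    """Return the number of members in each cluster, including empty ones."""
    counts = Counter(int(label) for label in labels)
    return [counts.get(k, 0) for k in range(n_clusters)]

# Hypothetical label vector for 38 neighbourhoods, dominated by cluster 0
example_labels = [0] * 33 + [1, 2, 2, 3, 4]
sizes = cluster_sizes(example_labels, 5)
print(sizes)  # [33, 1, 2, 1, 1]
```

If nearly every neighbourhood lands in one cluster, the cluster labels carry little information for choosing between locations.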

## Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [20]:
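# work on a copy so that toronto_data itself is not modified
toronto_merged = toronto_data.copy()

# kmeans.labels_ follows the row order of toronto_grouped, so pair each label with its neighbourhood name
cluster_labels = pd.DataFrame({'Neighbourhood': toronto_grouped['Neighbourhood'], 'Cluster Labels': kmeans.labels_})

# join the labels and the top 10 venues onto toronto_data, matching on neighbourhood name
toronto_merged = toronto_merged.join(cluster_labels.set_index('Neighbourhood'), on='Neighbourhood')
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!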

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Health Food Store,Other Great Outdoors,Trail,Pub,Dive Bar,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Pizza Place,Bookstore,Brewery,Bubble Tea Shop,Caribbean Restaurant
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Gym,Fish & Chips Shop,Sushi Restaurant,Sandwich Place,Brewery,Steakhouse,Ice Cream Shop,Pub,Movie Theater,Italian Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Gastropub,Bakery,American Restaurant,Italian Restaurant,Chinese Restaurant,Bookstore,Fish Market,Brewery
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Photography Studio,Park,Bus Line,Swim School,Dive Bar,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


## Visualize the resulting clusters

In [21]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Analysis <a name="analysis"></a>

## Let's go back to our original dataframe and view the data

In [22]:
toronto_merged

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Health Food Store,Other Great Outdoors,Trail,Pub,Dive Bar,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Pizza Place,Bookstore,Brewery,Bubble Tea Shop,Caribbean Restaurant
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Gym,Fish & Chips Shop,Sushi Restaurant,Sandwich Place,Brewery,Steakhouse,Ice Cream Shop,Pub,Movie Theater,Italian Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Gastropub,Bakery,American Restaurant,Italian Restaurant,Chinese Restaurant,Bookstore,Fish Market,Brewery
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Photography Studio,Park,Bus Line,Swim School,Dive Bar,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Clothing Store,Gym / Fitness Center,Gym,Grocery Store,Park,Breakfast Spot,Hotel,Sandwich Place,Food & Drink Shop,Asian Restaurant
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Sporting Goods Shop,Coffee Shop,Clothing Store,Yoga Studio,Dessert Shop,Spa,Burger Joint,Metro Station,Mexican Restaurant,Salon / Barbershop
7,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Pizza Place,Sandwich Place,Dessert Shop,Café,Italian Restaurant,Coffee Shop,Sushi Restaurant,Gourmet Shop,Deli / Bodega,Seafood Restaurant
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,0,Playground,Trail,Summer Camp,Yoga Studio,Dive Bar,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049,0,Pub,Coffee Shop,Fried Chicken Joint,Light Rail Station,Sports Bar,Supermarket,Restaurant,Sushi Restaurant,Bagel Shop,Liquor Store


## Keep only food-related neighbourhoods to find a few hotspots. We will assume that a neighbourhood whose 10 most common venue categories are all food-related businesses can be considered a hotspot.

In [23]:
# Determine all venues that are not food-related
non_resto= ['Gym', 'Park', 'Hotel', 'Light Rail Station', 'Studio', 'Spa', 'Garden', 'Garden Center', 'Gift Shop', 'Bookstore', 'Airport', 'Boutique', 'Sporting Goods Shop', 'Summer Camp', 'Jewelry Store', 'Airport Service', 'Comic Shop', 'Pharmacy', 'Supermarket', 'Trail', 'Cosmetics Shop', 'Men\'s Store', 'Stadium', 'Yoga Studio', 'Performing Arts Venue', 'Arts & Crafts Store']

# Build the dataframe: keep only neighbourhoods where none of the top 10 venue categories is in non_resto
top10_cols = [c for c in toronto_merged.columns if c.endswith('Most Common Venue')]
hotspots = toronto_merged[~toronto_merged[top10_cols].isin(non_resto).any(axis=1)]

#Show resulting hotspots
hotspots





Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Pizza Place,Sandwich Place,Dessert Shop,Café,Italian Restaurant,Coffee Shop,Sushi Restaurant,Gourmet Shop,Deli / Bodega,Seafood Restaurant
11,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675,0,Coffee Shop,Restaurant,Café,Flower Shop,Italian Restaurant,Bakery,Pub,Pizza Place,Pet Store,Breakfast Spot
16,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Cocktail Bar,Cheese Shop,Farmers Market,Bakery,Café,Beer Bar,Seafood Restaurant,Steakhouse,Irish Pub
26,M5T,Downtown Toronto,"Chinatown, Grange Park, Kensington Market",43.653206,-79.400049,0,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Bar,Mexican Restaurant,Coffee Shop,Dumpling Restaurant,Bakery,Vietnamese Restaurant,Dessert Shop
32,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,0,Bar,Coffee Shop,Asian Restaurant,Vietnamese Restaurant,Café,Restaurant,French Restaurant,Cocktail Bar,Bakery,New American Restaurant
36,M6S,West Toronto,"Runnymede, Swansea",43.651571,-79.48445,0,Coffee Shop,Pizza Place,Café,Sushi Restaurant,Italian Restaurant,Burrito Place,Food,Fish Market,Fish & Chips Shop,Smoothie Shop


In [24]:
(43.704324, -79.388790)
(43.667967, -79.367675)
(43.644771, -79.373306)
(43.653206, -79.400049)
(43.647927, -79.419750)
(43.651571, -79.484450)

(43.651571, -79.48445)

In [25]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [51]:
hotspots.loc[16, 'Neighbourhood']

'Berczy Park'

In [52]:
neighbourhood_latitude = hotspots.loc[16, 'Latitude'] # neighbourhood latitude value
neighbourhood_longitude = hotspots.loc[16, 'Longitude'] # neighbourhood longitude value

neighbourhood_name = hotspots.loc[16, 'Neighbourhood'] # neighbourhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Berczy Park are 43.644770799999996, -79.3733064.


In [53]:
# create map of the Toronto hotspots using latitude and longitude values
map_hotspots = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(hotspots['Latitude'], hotspots['Longitude'], hotspots['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hotspots)  
    
map_hotspots

In [55]:
import requests
client_id = "XWCAJYVE5H1ZRLXFO1H5HQRKSYVLOTRW2M1ITBMJXIV0BO5K"
client_secret = "K0M510H4PGPZMGGBJN0COJAOA4TI3G5HRSLKKRUKSQRCEFTB"
version = "20180605"
category = "5bae9231bedf3950379f89da" # Foursquare category ID used here to filter for Syrian restaurants

optimalLocations = [
    (43.704324, -79.388790),
    (43.667967, -79.367675),
    (43.644771, -79.373306),
    (43.653206, -79.400049), 
    (43.647927, -79.419750),
    (43.651571, -79.484450)
]


def getRestaurants(lat, lon, radius=500, limit=100):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}".format(client_id, client_secret, version, lat, lon, category, radius, limit)
    response = requests.get(url).json()

    print(response)

def getAllLocations():
    for coords in optimalLocations:
        getRestaurants(coords[0],coords[1])

getAllLocations()
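Printing the raw responses makes it hard to see at a glance whether any matching venue came back. A small helper that counts the returned items per location could make that check explicit. The sketch below assumes the same `response` / `groups` / `items` layout that the earlier venue function indexes; the helper names and the payloads are hypothetical and hand-written, not real API output.

```python
def count_venues(payload):
    """Count the venues listed in one explore response (0 if none)."""
    groups = payload.get("response", {}).get("groups", [])
    return sum(len(group.get("items", [])) for group in groups)

def summarize(responses_by_location):
    """Map each (lat, lon) pair to the number of venues in its response."""
    return {coords: count_venues(p) for coords, p in responses_by_location.items()}

# Hand-written payloads for illustration (not real API output)
demo = {
    (43.644771, -79.373306): {"response": {"groups": [{"items": []}]}},
    (43.653206, -79.400049): {"response": {"groups": [{"items": [
        {"venue": {"name": "Example"}},
    ]}]}},
}
summary = summarize(demo)
print(summary)
```

A count of 0 for every hotspot would correspond to finding no venues of the queried category within the search radius.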




In [56]:
#!conda install -c conda-forge geopy 
        
import geopy.distance
import sys

cn = (43.6426, -79.3871)

def getClosestToCNTower():
    minDistance = sys.maxsize
    minCoords = cn
    for coords in optimalLocations:
        distance = (geopy.distance.distance(coords,cn))
        if distance < minDistance:
            minDistance = distance
            minCoords = coords
    return minCoords
print (sys.maxsize)
print(getClosestToCNTower())


9223372036854775807
(43.644771, -79.373306)
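To see how far each hotspot actually is from the CN Tower, and not only which one is closest, the distances can be ranked. The sketch below swaps in the haversine formula on a spherical Earth (radius 6371 km) in place of `geopy.distance.distance`, so that it runs with the standard library alone; the figures are therefore approximations. The coordinates and names come from the hotspots table above, while `haversine_km` and `hotspot_coords` are names chosen for this example.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

cn_tower = (43.6426, -79.3871)
hotspot_coords = {
    "Davisville": (43.704324, -79.388790),
    "Cabbagetown, St. James Town": (43.667967, -79.367675),
    "Berczy Park": (43.644771, -79.373306),
    "Chinatown, Grange Park, Kensington Market": (43.653206, -79.400049),
    "Little Portugal, Trinity": (43.647927, -79.419750),
    "Runnymede, Swansea": (43.651571, -79.484450),
}

# Sort the hotspot names from nearest to farthest
ranked = sorted(hotspot_coords, key=lambda n: haversine_km(hotspot_coords[n], cn_tower))
for name in ranked:
    print("{:.2f} km  {}".format(haversine_km(hotspot_coords[name], cn_tower), name))
```

Under this approximation the nearest hotspot is still Berczy Park, in agreement with the result printed above.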


## Results and Discussion  <a name="results"></a>

   ### The data was gathered by assembling the neighbourhoods, boroughs and postal codes of Toronto together with their latitude and longitude coordinates. A pandas dataframe was created from this data and, using Toronto's coordinates, a map was generated with Folium. Venues in these neighbourhoods were then explored with the Foursquare API, and the neighbourhoods were clustered, segmented and shown on the map. Restaurants and other food venues were then isolated in the dataframe to find the restaurant hotspots in Toronto. A neighbourhood was considered a hotspot if all 10 of its most common venue categories were food-related. This brought the data down to 6 hotspots, which brings us to the next challenge: Syrian restaurants.

   ### The Foursquare API exploration found no Syrian restaurants in any of the isolated restaurant hotspots. This is excellent news! Any one of these areas could host the Syrian restaurant. However, to optimize the location and support marketing success, we looked for the hotspot closest to the CN Tower. The CN Tower is the main tourist attraction of Toronto: it is a 553.3 m high communications and observation tower located in the heart of Downtown Toronto. After computing the distances, we found that Berczy Park is the hotspot closest to the CN Tower, which makes it the most suitable location for a Syrian restaurant.

## Conclusion  <a name="conclusion"></a>

   ### There is a lack of Syrian restaurants in Toronto compared to other cuisines, e.g. Indian, Chinese and Italian. This report concludes that Berczy Park is the best location to open a Syrian restaurant, to increase the market for Syrian food as well as make the refugees feel more at home. The location is in a hotspot of restaurants, as demonstrated by the Foursquare API, yet away from other Syrian restaurants. It is also in close proximity to the CN Tower in Downtown Toronto, which attracts an abundance of local and tourist attention.