# Capstone Project - The Battle of Neighborhoods

### Business Problem

Toronto is the capital city of the Canadian province of Ontario. 
With a recorded population of 2,731,571 in 2016, it is the most populous city in Canada and the fourth most populous city in North America.
Ontario is one of the thirteen provinces and territories of Canada.
Located in Central Canada, it is Canada's most populous province, with 38.3 percent of the country's population, and is the second-largest province by total area.

The diverse population of Toronto reflects its current and historical role as an important destination for immigrants to Canada.
More than 50 percent of residents belong to a visible minority population group, and over 200 distinct ethnic origins are represented among its inhabitants.
While the majority of Torontonians speak English as their primary language, over 160 languages are spoken in the city.

With its diverse culture comes a wide variety of food and many different types of restaurants.
As a part of this project, I will list and visualize these restaurants for Toronto, focusing on sushi restaurants.

Questions that can be asked regarding this:
- What is the best location in Toronto for a new sushi restaurant?
- Which areas have a large number of sushi restaurants?
- Which areas have few sushi restaurants?
- Which is the best place to stay if I prefer sushi restaurants?

### Data Description

For this project I need the following data:
- Toronto data with the different districts and postal codes 
- Restaurant data for Toronto

Toronto data:
Data source: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
Description: I will scrape the table of Toronto districts from Wikipedia and get the coordinates with the Geopy client.

Toronto restaurant data: needs to contain the locality, the restaurant name, and the latitude and longitude.
Data source: Foursquare API: "https://developer.foursquare.com/"
Description: By using this API I will get all the venues in each neighborhood.

In [7]:
import pandas as pd 
import json 
import requests 
from bs4 import BeautifulSoup
import xml
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

In [8]:
List_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
source = requests.get(List_url).text
soup = BeautifulSoup(source, 'xml')
table=soup.find('table')
column_names = ['Postalcode','Borough','Neighborhood']
df = pd.DataFrame(columns = column_names)

In [9]:
for tr_cell in table.find_all('tr'):
    row_data=[]
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)==3:
        df.loc[len(df)] = row_data
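
Assigning to `df.loc[len(df)]` grows the dataframe one row at a time. As a minimal alternative sketch, the valid rows can be collected first and the dataframe built once. The `sample_rows` list is an assumption: it stands in for the scraped `<td>` values and reuses a few rows from the output of this notebook.

```python
import pandas as pd

column_names = ['Postalcode', 'Borough', 'Neighborhood']

# Stand-in for the scraped <td> values; a header row yields an empty list.
sample_rows = [
    [],
    ['M1A', 'Not assigned', 'Not assigned'],
    ['M3A', 'North York', 'Parkwoods'],
    ['M4A', 'North York', 'Victoria Village'],
]

# Keep only rows with exactly three cells, then build the dataframe in one step.
valid_rows = [row for row in sample_rows if len(row) == len(column_names)]
df_sample = pd.DataFrame(valid_rows, columns=column_names)
print(df_sample)
```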

In [10]:
df.head(10)

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


### Methodology section 

My approach will consist of:
- Data will be collected from "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M" and then cleaned and processed into a dataframe.
- Foursquare will be used to locate all venues, which will then be filtered to sushi restaurants.
- Finally, the data will be assessed visually using Python plotting libraries.
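
The filtering and counting steps in the approach above can be sketched as follows. This is an illustration only: the `venues` table, its column names, and its rows are hypothetical placeholders, not data returned by Foursquare.

```python
import pandas as pd

# Hypothetical venue table: one row per venue, tagged with its borough.
venues = pd.DataFrame({
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York',
                'Scarborough', 'Downtown Toronto'],
    'Name': ['Venue A', 'Venue B', 'Venue C', 'Venue D', 'Venue E'],
    'Category': ['Sushi Restaurant', 'Coffee Shop', 'Sushi Restaurant',
                 'Pizza Place', 'Sushi Restaurant'],
})

# Keep only sushi restaurants, then count them per borough.
sushi = venues[venues['Category'] == 'Sushi Restaurant']
sushi_counts = (
    sushi.groupby('Borough')
         .size()
         # boroughs without any sushi restaurant get an explicit count of 0
         .reindex(venues['Borough'].unique(), fill_value=0)
         .sort_values(ascending=False)
)
print(sushi_counts)
```

A count alone does not say whether a borough with few sushi restaurants has unmet demand or simply little demand, so it is only a starting point for the location question.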

### Problem Statement

What is the best location for a sushi venue in Toronto?

In which neighborhood and/or borough should the investor open a sushi restaurant to have the best chance of being successful?

### Clean data

My variables:

In [11]:
# Define Foursquare Credentials and Version
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep real credentials out of the notebook)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

Define functions: 

In [12]:
from geopy.geocoders import Nominatim # needed by geo_location

def geo_location(address):
    # return the (latitude, longitude) of an address
    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    return latitude,longitude


def get_venues(lat,lng):
    #set variables
    radius=400
    LIMIT=100
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
    # get all the data
    results = requests.get(url).json()
    venue_data=results['response']['groups'][0]['items']
    venue_details=[]
    for row in venue_data:
        try:
            venue_id=row['venue']['id']
            venue_name=row['venue']['name']
            venue_category=row['venue']['categories'][0]['name']
            venue_details.append([venue_id,venue_name,venue_category])
        except KeyError:
            pass
    column_names=['ID','Name','Category']
    df = pd.DataFrame(venue_details,columns=column_names)
    return df


def get_venue_details(venue_id):
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
            venue_id,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION)
    # get all the data
    results = requests.get(url).json()
    print(results)
    venue_data=results['response']['venue']
    venue_details=[]
    try:
        venue_id=venue_data['id']
        venue_name=venue_data['name']
        venue_likes=venue_data['likes']['count']
        venue_rating=venue_data['rating']
        venue_tips=venue_data['tips']['count']
        venue_details.append([venue_id,venue_name,venue_likes,venue_rating,venue_tips])
    except KeyError:
        pass
    column_names=['ID','Name','Likes','Rating','Tips']
    df = pd.DataFrame(venue_details,columns=column_names)
    return df


def get_new_york_data():
    url='https://cocl.us/new_york_dataset'
    resp=requests.get(url).json()
    # all data is present in features label
    features=resp['features']
    # define the dataframe columns
    column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
    # collect one record per neighborhood, then build the dataframe once
    # (DataFrame.append was removed in pandas 2.0)
    records = []
    for data in features:
        borough = data['properties']['borough']
        neighborhood_name = data['properties']['name']
        neighborhood_latlon = data['geometry']['coordinates']
        neighborhood_lat = neighborhood_latlon[1]
        neighborhood_lon = neighborhood_latlon[0]
        records.append({'Borough': borough,
                        'Neighborhood': neighborhood_name,
                        'Latitude': neighborhood_lat,
                        'Longitude': neighborhood_lon})
    new_york_data = pd.DataFrame(records, columns=column_names)
    return new_york_data
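
The request URLs in the functions above are assembled with `str.format`. As a hedged alternative sketch, the same query string can be built with the standard library's `urlencode`, which also escapes special characters. The parameter names are the ones that appear in the `explore` URL above; the credential values and the helper name `build_explore_url` are placeholders introduced for this example.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, client_id, client_secret, version,
                      radius=400, limit=100):
    """Return the venues/explore URL for one coordinate pair."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes the values, e.g. the comma in 'll'
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

example_url = build_explore_url(43.6534817, -79.3839347,
                                'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605')
print(example_url)
```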

Ignore rows whose borough is Not assigned

In [13]:
df.drop(df[df.Borough == 'Not assigned'].index, inplace=True)

With the unassigned boroughs removed, I check the number of remaining rows:

In [14]:
df.shape

(103, 3)

Load Geo data

In [15]:
df_geo = pd.read_csv('Geospatial_Coordinates.csv')
df_geo.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


Start by renaming the postal code column so that it matches the first dataframe

In [16]:
df_geo.rename(columns={'Postal Code':'Postalcode'},inplace=True)

Merge the two dataframes

In [17]:
df_merged = df.join(df_geo.set_index('Postalcode'), on='Postalcode')
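
`DataFrame.join` performs a left join by default, so a postal code with no match in the coordinates file would end up with `NaN` coordinates without any warning. A small sketch of how that could be checked, using `pd.merge` with `indicator=True` on sample rows. The `M0X` row and the names `df_left` and `df_coords` are invented for the demonstration.

```python
import pandas as pd

df_left = pd.DataFrame({
    'Postalcode': ['M3A', 'M4A', 'M0X'],
    'Borough': ['North York', 'North York', 'Example Borough'],
})
df_coords = pd.DataFrame({
    'Postalcode': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# indicator=True adds a '_merge' column telling where each row was found
checked = pd.merge(df_left, df_coords, on='Postalcode', how='left', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['Postalcode', 'Borough']])
```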

In [18]:
df_merged.head(10)

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.753259,-79.329656
3,M4A,North York,Victoria Village,43.725882,-79.315572
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
9,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
11,M3B,North York,Don Mills,43.745906,-79.352188
12,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
13,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


Explore the neighborhoods by first finding the latitude and longitude of Toronto

In [19]:
#!pip install geopy 

In [20]:
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [21]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Borough'], df_merged['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Cluster the Neighborhoods with k-means

In [22]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

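# keep only the coordinates for clustering (the positional axis argument was removed in pandas 2.0)
toronto_grouped_clustering = df_merged.drop(['Neighborhood', 'Borough', 'Postalcode'], axis=1)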

toronto_grouped_clustering.head()



Unnamed: 0,Latitude,Longitude
2,43.753259,-79.329656
3,43.725882,-79.315572
4,43.65426,-79.360636
5,43.718518,-79.464763
6,43.662301,-79.389494


In [23]:
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 2, 0, 2, 1, 3, 4, 4, 2])

Add cluster labels to data frame

In [24]:
df_merged.insert(0, 'Cluster Labels', kmeans.labels_)

df_merged.head() # check the first column!

Unnamed: 0,Cluster Labels,Postalcode,Borough,Neighborhood,Latitude,Longitude
2,4,M3A,North York,Parkwoods,43.753259,-79.329656
3,4,M4A,North York,Victoria Village,43.725882,-79.315572
4,2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
5,0,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
6,2,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
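
To see how evenly the neighborhoods are spread across the clusters, the labels can be tallied. A minimal sketch, using only the ten labels printed by `kmeans.labels_[0:10]` above rather than the full label array:

```python
import numpy as np
import pandas as pd

# First ten labels, copied from the kmeans.labels_[0:10] output.
sample_labels = np.array([4, 4, 2, 0, 2, 1, 3, 4, 4, 2])

# Count how many rows fall into each cluster, ordered by cluster number.
cluster_sizes = pd.Series(sample_labels, name='Cluster Labels').value_counts().sort_index()
print(cluster_sizes)
```

On the full data the same tally would presumably be `df_merged['Cluster Labels'].value_counts().sort_index()`.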


Create map with clusters

In [25]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Neighborhood'], df_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [26]:
CLIENT_ID = 'xx' # your Foursquare ID
CLIENT_SECRET = 'xx' # your Foursquare Secret
ACCESS_TOKEN = 'xx' # your Foursquare Access Token
VERSION = '20200604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: xx
CLIENT_SECRET: xx


In [27]:
#address = '102 North End Ave, New York, NY'

#geolocator = Nominatim(user_agent="foursquare_agent")
#location = geolocator.geocode(address)
#latitude = location.latitude
#longitude = location.longitude
#print(latitude, longitude)

In [28]:
#Find sushi venues
search_query = 'Sushi'
radius = 5000000
print(search_query + ' .... OK!')

#Define the url
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, 
  CLIENT_SECRET, latitude, longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)

#Send the GET Request
results = requests.get(url).json()

# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
# (pd.json_normalize replaces the deprecated pandas.io.json.json_normalize)
dataframe = pd.json_normalize(venues)
dataframe.head()

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

Sushi .... OK!




Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Sushi-Q,Sushi Restaurant,220 Yonge St.,"in Urban Eatery, Toronto Eaton Centre",43.654801,-79.380813,"[{'label': 'display', 'lat': 43.65480108229508...",291,M5B 2H6,CA,Toronto,ON,Canada,"[220 Yonge St. (in Urban Eatery, Toronto Eaton...",,4b464cd6f964a520f11c26e3
1,Kathy's Sushi and Bento,Sushi Restaurant,187 Dundas St W,at Centre,43.655031,-79.386724,"[{'label': 'display', 'lat': 43.65503087178274...",283,M5G 1C7,CA,Toronto,ON,Canada,"[187 Dundas St W (at Centre), Toronto ON M5G 1...",,4b60aa5cf964a520fef229e3
2,Sushi Shop,Restaurant,130 King St W,,43.648528,-79.383335,"[{'label': 'display', 'lat': 43.648528, 'lng':...",553,M5X 2A2,CA,Toronto,ON,Canada,"[130 King St W, Toronto ON M5X 2A2, Canada]",,5b84f9d3340a58002c50fcc8
3,Sushi & B.B.Q,Sushi Restaurant,294 Dundas St West,McCaul St,43.654292,-79.391024,"[{'label': 'display', 'lat': 43.654292, 'lng':...",578,M5T 1G2,CA,Toronto,ON,Canada,"[294 Dundas St West (McCaul St), Toronto ON M5...",,4afc3b84f964a520b92022e3
4,Sushi Shop,Sushi Restaurant,145 King Street West,York St,43.647198,-79.384004,"[{'label': 'display', 'lat': 43.64719838811793...",699,M5H 1J8,CA,,,Canada,"[145 King Street West (York St), M5H 1J8, Canada]",,51cb1865498e6fe61898a169
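
The first three characters of `postalCode` have the same form as the codes in the `Postalcode` column of `df_merged`, so they could be used to tie each venue back to a neighborhood. A rough sketch, assuming every venue has a `postalCode` (rows without one would need separate handling) and using the five rows shown above. The names `venues_sample`, `sushi_only`, and `per_postalcode` are introduced here for illustration.

```python
import pandas as pd

venues_sample = pd.DataFrame({
    'name': ['Sushi-Q', "Kathy's Sushi and Bento", 'Sushi Shop',
             'Sushi & B.B.Q', 'Sushi Shop'],
    'categories': ['Sushi Restaurant', 'Sushi Restaurant', 'Restaurant',
                   'Sushi Restaurant', 'Sushi Restaurant'],
    'postalCode': ['M5B 2H6', 'M5G 1C7', 'M5X 2A2', 'M5T 1G2', 'M5H 1J8'],
})

# Keep venues categorised as sushi restaurants and derive the three-character prefix.
sushi_only = venues_sample[venues_sample['categories'] == 'Sushi Restaurant'].copy()
sushi_only['Postalcode'] = sushi_only['postalCode'].str[:3]

# Number of sushi restaurants per postal code prefix.
per_postalcode = sushi_only.groupby('Postalcode').size()
print(per_postalcode)
```

Note that the venue named 'Sushi Shop' with the category 'Restaurant' is dropped by this filter, so filtering on the category and filtering on the search query do not give the same set.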


In [29]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on Toronto

# add a red circle marker to represent the centre of Toronto
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Toronto',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the sushi venues as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### Results / Conclusion


Toronto has the best-rated sushi restaurants on average. A more comprehensive analysis, left as future work, would need to incorporate data from other external databases.