<H1> Battle of the Neighbourhoods

<H2> New Italian Restaurant in Munich

Author: Flavio Burri

## Table of contents
* [Introduction and description of business problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#conclusion)

### Introduction and description of business problem <a name="introduction"></a>

Opening a new restaurant is challenging for many reasons. One of the hardest tasks is deciding where in the chosen city to open it. This exercise aims to help the hypothetical owner of a new Italian restaurant choose a location within the city of Toronto, Canada. Data science can simplify this task and support the owner in finding the best area of the city for the restaurant.
The work consists of finding where the existing Italian restaurants in Toronto are located, keeping in mind that people usually prefer to go out for a meal in an area that also offers other meeting venues such as coffee shops and bars. We will also investigate whether specific venue types correlate with restaurants in a given neighborhood, that is, whether some meeting places are found more frequently near restaurants. This will help us identify a location for our restaurant with fewer competitors.

### Data <a name="data"></a>

The data we will use are:
-	A list of Toronto neighborhoods, scraped from a Wikipedia page (see the notebook).
-	The latitude and longitude of each neighborhood, taken from a file used in a previous exercise of this course.
-	A list of Italian restaurants and other venues in the city, retrieved through the Foursquare API.

These data support the analysis, which starts by finding where all the Italian restaurants in Toronto are located.
We will then investigate whether other social venues correlate with restaurants in these neighborhoods, that is, whether some venues are found more frequently near restaurants. If such a correlation holds, for example with bars, it will help identify possible locations for the new restaurant: places with a high density of meeting venues (e.g. bars) but only a few restaurants, and no Italian restaurants, nearby.
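
As a rough illustration of the correlation check described above, the sketch below counts venues per neighborhood and computes the Pearson correlation between the number of bars and the number of restaurants. The table and its column names (`neighbourhood`, `category`) are made-up toy data, not the Foursquare results used later.

```python
import pandas as pd

# Toy data (hypothetical): one row per venue
venues = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "category": ["Bar", "Restaurant", "Bar", "Bar", "Restaurant",
                 "Restaurant", "Restaurant", "Bar", "Bar"],
})

# Count venues of each category per neighbourhood
# (rows: neighbourhood, columns: category)
counts = pd.crosstab(venues["neighbourhood"], venues["category"])

# Pearson correlation between the two count columns
corr = counts["Bar"].corr(counts["Restaurant"])
print(counts)
print("correlation:", round(corr, 3))
```

With only a handful of neighborhoods such a coefficient is not very informative; it is meant only to show the shape of the calculation.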

### Methodology <a name="methodology"></a>

First we collect the data needed to set up the model. The table of Toronto locations created in a previous exercise is loaded into this notebook.
We will use these data together with geospatial and Foursquare data to define the area of interest by looking at where most Italian restaurants in Toronto are located. After that, we will determine which venues are most frequent near the existing Italian restaurants and use this to create a simple graphic from which we can choose the best location for the new restaurant.

First, let's import some useful libraries.

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transform a JSON file into a pandas dataframe
# (json_normalize is exposed at the top level since pandas 1.0; the pandas.io.json path was removed in 2.0)
from pandas import json_normalize

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [2]:
#!conda update -n base -c defaults conda --yes

In [3]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Libraries imported.


### Analysis <a name="analysis"></a>

##### Scraping a table from Wikipedia and cleaning it

In [4]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url)
len(df)

3

In [5]:
#Reading the table which is in the first place and set the header
df_toronto = df[0]
new_header = df_toronto.iloc[0] #grab the first row for the header
df_toronto = df_toronto[1:] #take the data less the header row
df_toronto.columns = new_header #set the header row as the df_toronto header
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [6]:
df_toronto.shape

(180, 3)
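
The preview above still contains postal codes whose borough is "Not assigned". They appear to drop out later only because the merge keeps postal codes that have coordinates. A minimal sketch of removing them explicitly is shown below; the small `raw` table is toy data copied from the preview, not the full scraped table.

```python
import pandas as pd

# Toy rows mimicking the scraped table (values copied from the preview above)
raw = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A", "M5A"],
    "Borough": ["Not assigned", "Not assigned", "North York",
                "North York", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods",
                      "Victoria Village", "Regent Park, Harbourfront"],
})

# Keep only postal codes that have a borough assigned
clean = raw[raw["Borough"] != "Not assigned"].reset_index(drop=True)
print(clean.shape)
print(clean)
```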

##### Creating the dataframe containing the geographical coordinates of the neighborhoods

We import the CSV data for Toronto from a file (reference: http://cocl.us/Geospatial_data).

In [7]:
#importing the csv file with coordinates
geo = pd.read_csv('http://cocl.us/Geospatial_data')
geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [8]:
geo.shape

(103, 3)

We can now merge the two dataframes into a new one.

In [9]:
# merging table with neighbourhoods and the one above with coordinates
df_toronto = pd.merge(df_toronto, geo, on='Postal Code')
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
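
`pd.merge` performs an inner join by default, so postal codes without coordinates disappear silently. The sketch below, on toy data, uses the `how` and `indicator` parameters of `pd.merge` to list the codes that found no match; the frames `left` and `right` are illustrative stand-ins for the two tables merged above.

```python
import pandas as pd

left = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
})
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every row of `left`; indicator=True adds a `_merge` column
check = pd.merge(left, right, on="Postal Code", how="left", indicator=True)

# Postal codes present in `left` but missing from `right`
unmatched = check.loc[check["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```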


##### Exploring and clustering the neighborhoods in Toronto and visualising them on maps as clusters

In [10]:
#!wget -q -O 'toronto_data.json' http://adamw523.com/toronto-geojson/
#print('Data downloaded!')

In [11]:
# Define the Foursquare credentials and version---------------
# My personal credentials; they will be removed after use for privacy reasons

CLIENT_ID = 'PUR3VSJFUQSEHZ3MKTY5OLTU5OJG4NMFKP2WBGIGFJXG10RS' # your Foursquare ID
CLIENT_SECRET = 'THK4KZ2T2BQVWMIPVJFWDMOGYR4SO5XZ5UD5FEPZP0P35LDP' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 10000
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: PUR3VSJFUQSEHZ3MKTY5OLTU5OJG4NMFKP2WBGIGFJXG10RS
CLIENT_SECRET:THK4KZ2T2BQVWMIPVJFWDMOGYR4SO5XZ5UD5FEPZP0P35LDP
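
Since the credentials are meant to be removed after use, one alternative is to keep them out of the notebook altogether. The sketch below reads them from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for illustration, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from environment variables (hypothetical names)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * (len(secret) - visible)

# Example with a stand-in mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id-123",
            "FOURSQUARE_CLIENT_SECRET": "demo-secret-456"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(demo_id))
print("CLIENT_SECRET:", mask(demo_secret))
```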


In [12]:
# Use the geopy library to get the latitude and longitude of Toronto-------------

address_TO = 'Toronto, ON'
geolocator = Nominatim(user_agent="toronto_agent")
location = geolocator.geocode(address_TO)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [13]:
#Visualize the Toronto map with Folium------------------------------

toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map

##### We create a function to get names and information on the venues for all the neighbourhoods in Toronto

In [14]:
# Explore neighbourhoods in Toronto-----------------------
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 'Neighbourhood Latitude', 
                             'Neighbourhood Longitude', 
                             'Venue', 
                             'Venue Latitude', 
                             'Venue Longitude', 
                             'Venue Category']
    
    return(nearby_venues)
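
The function above indexes straight into the JSON response, so a neighbourhood with no results or a venue without a category would raise an exception. A more defensive way to parse one response is sketched below. The helper name `extract_venues` is hypothetical, and the `sample` payload is hand-written to mirror the nesting that `getNearbyVenues` reads (`response` → `groups` → `items` → `venue`).

```python
def extract_venues(payload, name, lat, lng):
    """Turn one explore-style response into rows, tolerating missing parts."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            # a venue may come back without any category
            categories[0]["name"] if categories else None,
        ))
    return rows

# Hand-written payload with the same nesting as the real response
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tim Hortons",
               "location": {"lat": 43.725517, "lng": -79.313103},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "No Category Venue",
               "location": {"lat": 43.7, "lng": -79.3},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "M4A", 43.725882, -79.315572)
empty = extract_venues({"response": {}}, "M4A", 43.725882, -79.315572)
print(rows)
print(empty)
```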

In [15]:
toronto_venues = getNearbyVenues(names=df_toronto['Postal Code'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )
toronto_venues.head()

Unnamed: 0,Postal Code,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M3A,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,M3A,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,M4A,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,M4A,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,M4A,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


##### Now we can create a dataframe of venues with category "Italian Restaurant"

In [16]:
toronto_venues_it_rest = toronto_venues[toronto_venues['Venue Category'] == 'Italian Restaurant']
toronto_venues_it_rest.head()

Unnamed: 0,Postal Code,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
66,M7A,43.662301,-79.389494,Mercatto,43.660391,-79.387664,Italian Restaurant
163,M5B,43.657162,-79.378937,Scaddabush Italian Kitchen & Bar,43.65892,-79.382891,Italian Restaurant
181,M5B,43.657162,-79.378937,Trattoria Mercatto,43.654453,-79.380974,Italian Restaurant
222,M3C,43.7259,-79.340923,Sorento Restaurant,43.726575,-79.341989,Italian Restaurant
254,M5C,43.651494,-79.375418,Terroni,43.650927,-79.375602,Italian Restaurant


Next we need to find out which venue types are most common around an Italian restaurant.
This will help us define more precisely the area in which an Italian restaurant is likely to attract customers and succeed.

##### Creating a dataframe with the distances between Italian Restaurants and the other Venues.

In [17]:
from math import radians, cos, sin, asin, sqrt

#define a function to evaluate distance between venues
#ref. https://en.wikipedia.org/wiki/Haversine_formula

def haversine(lon1, lat1, lon2, lat2):
  # convert decimal degrees to radians 
  lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
 
  # haversine formula 
  distance_lon = lon2 - lon1 
  distance_lat = lat2 - lat1
  a = sin(distance_lat/2)**2 + cos(lat1) * cos(lat2) * sin(distance_lon/2)**2
  earth_r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    
  return 2 * asin(sqrt(a)) * earth_r #result in km

# Collect the rows in a list and build the dataframe once at the end:
# DataFrame.append was removed in pandas 2.0 and is slow inside a loop.
rows = []

for rest, lng, lat in zip(toronto_venues_it_rest['Venue Category'], toronto_venues_it_rest['Venue Longitude'], toronto_venues_it_rest['Venue Latitude']):
    for venue, lng2, lat2 in zip(toronto_venues['Venue Category'], toronto_venues['Venue Longitude'], toronto_venues['Venue Latitude']):
        distance = haversine(lng, lat, lng2, lat2)
        rows.append({'Restaurant': rest,
                     'Venue type': venue,
                     'Restaurant longitude': lng,
                     'Restaurant latitude': lat,
                     'Venue longitude': lng2,
                     'Venue latitude': lat2,
                     'Distance': distance})

restaurant_venues = pd.DataFrame(rows, columns = ['Restaurant','Venue type', 'Restaurant longitude', 'Restaurant latitude', 'Venue longitude','Venue latitude', 'Distance'])

print('----------------Database created!----------------')
        

----------------Database created!----------------
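
The nested loop evaluates the haversine formula once per restaurant and venue pair. The same distances can be computed in one step with NumPy broadcasting, as sketched below. The function name `haversine_matrix` is an assumption made for this example, and the coordinates are a few values copied from the tables above.

```python
import numpy as np

def haversine_matrix(lon1, lat1, lon2, lat2, earth_r=6371.0):
    """Distances in km between every point of set 1 and every point of set 2."""
    lon1 = np.radians(np.asarray(lon1, dtype=float))
    lat1 = np.radians(np.asarray(lat1, dtype=float))
    lon2 = np.radians(np.asarray(lon2, dtype=float))
    lat2 = np.radians(np.asarray(lat2, dtype=float))
    # Broadcasting: set 1 as a column vector against set 2 as a row vector
    dlon = lon2[None, :] - lon1[:, None]
    dlat = lat2[None, :] - lat1[:, None]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat1)[:, None] * np.cos(lat2)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * earth_r * np.arcsin(np.sqrt(a))

# Two restaurants against three venues
rest_lon, rest_lat = [-79.387664, -79.382891], [43.660391, 43.65892]
ven_lon = [-79.387664, -79.38583, -79.386391]
ven_lat = [43.660391, 43.66013, 43.661728]

d = haversine_matrix(rest_lon, rest_lat, ven_lon, ven_lat)
print(d.shape)      # one row per restaurant, one column per venue
print(d.round(3))
```

Each entry `d[i, j]` is the distance in kilometers between restaurant `i` and venue `j`, so the same 0 and 0.2 km thresholds can be applied to the whole matrix at once.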


In [18]:
restaurant_venues.shape

(87781, 7)

We then filter the dataframe by removing rows that are not useful, using the following criteria:

1. Rows where the distance is equal to 0 (the distance between a venue and itself)
2. Rows where the distance is 200 meters or more

After that we count the rows per venue type and sort the counts in descending order to find the venue types most often located near Italian restaurants.

In [19]:
restaurant_venues_mod = restaurant_venues[restaurant_venues['Distance'] != 0] #remove 0 distance rows
restaurant_venues_mod200 = restaurant_venues_mod[restaurant_venues_mod['Distance'] < 0.2] #keep only the venues within 200 meters
restaurant_venues_mod200.head()

  


Unnamed: 0,Restaurant,Venue type,Restaurant longitude,Restaurant latitude,Venue longitude,Venue latitude,Distance
67,Italian Restaurant,Coffee Shop,-79.387664,43.660391,-79.38583,43.66013,0.150376
68,Italian Restaurant,Portuguese Restaurant,-79.387664,43.660391,-79.386391,43.661728,0.180543
80,Italian Restaurant,Fried Chicken Joint,-79.387664,43.660391,-79.389475,43.659167,0.19935
93,Italian Restaurant,Coffee Shop,-79.387664,43.660391,-79.388696,43.658906,0.184787
459,Italian Restaurant,Coffee Shop,-79.387664,43.660391,-79.38583,43.66013,0.150376


In [20]:
restaurant_venues_mod200.shape

(1062, 7)

In [21]:
restaurant_venues_mod200_sorted = restaurant_venues_mod200.groupby('Venue type').count().sort_values(['Distance'], ascending=[False])
restaurant_venues_mod200_sorted

Venue type,Restaurant,Restaurant longitude,Restaurant latitude,Venue longitude,Venue latitude,Distance
Coffee Shop,134,134,134,134,134,134
Restaurant,61,61,61,61,61,61
Hotel,54,54,54,54,54,54
Italian Restaurant,46,46,46,46,46,46
Café,42,42,42,42,42,42
Cocktail Bar,39,39,39,39,39,39
Gastropub,38,38,38,38,38,38
Gym,33,33,33,33,33,33
Beer Bar,32,32,32,32,32,32
Japanese Restaurant,32,32,32,32,32,32


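##### We want to select only the venues that are not restaurants.
Excluding restaurants, the three most frequent venue types are:

1. Coffee Shop
2. Bar (counting bar categories such as Cocktail Bar and Beer Bar together)
3. Hotel

We will use these three to better define where to open the restaurant.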
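
The ranking above can be reproduced from the counts in the table, provided the bar categories are merged. The sketch below does this for the ten rows shown; the grouping rule (every category ending in " Bar" counts as "Bar") is an assumption made for illustration.

```python
import pandas as pd

# Counts copied from the top-10 table above
counts = pd.Series({
    "Coffee Shop": 134, "Restaurant": 61, "Hotel": 54, "Italian Restaurant": 46,
    "Café": 42, "Cocktail Bar": 39, "Gastropub": 38, "Gym": 33,
    "Beer Bar": 32, "Japanese Restaurant": 32,
})

# Drop every category whose name contains "Restaurant"
non_restaurants = counts[~counts.index.str.contains("Restaurant")]

# Assumed grouping rule: relabel any category ending in " Bar" as "Bar",
# then add up the counts that share a label
grouped = (non_restaurants
           .rename(lambda name: "Bar" if name.endswith(" Bar") else name)
           .groupby(level=0).sum()
           .sort_values(ascending=False))

print(grouped.head(3))
```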

In [22]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add the postal code and neighbourhood coordinates back to the dataframe
# (only rows for Italian restaurants are filled; the others stay NaN and are filtered out below)
toronto_onehot['Postal Code'] = toronto_venues_it_rest['Postal Code']
toronto_onehot['Neighbourhood Latitude'] = toronto_venues_it_rest['Neighbourhood Latitude'] 
toronto_onehot['Neighbourhood Longitude'] = toronto_venues_it_rest['Neighbourhood Longitude'] 

# move the last column (Neighbourhood Longitude) to the first position
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

# keep only the rows that correspond to Italian restaurants
toronto_onehot_italian = toronto_onehot[toronto_onehot['Italian Restaurant'] != 0]
toronto_onehot_italian.head()

Unnamed: 0,Neighbourhood Longitude,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Business Service,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,College Stadium,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Malay Restaurant,Market,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,River,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Social Club,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Postal Code,Neighbourhood Latitude
66,-79.389494,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M7A,43.662301
163,-79.378937,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M5B,43.657162
181,-79.378937,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M5B,43.657162
222,-79.340923,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M3C,43.7259
254,-79.375418,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M5C,43.651494
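
Because each row of the one-hot table marks a single venue, summing the indicator columns per postal code gives the number of venues of each category in that neighbourhood. The sketch below shows this on toy data; the six rows reuse postal codes and categories from the previews above, and the variable names are illustrative.

```python
import pandas as pd

# Toy venues table with the two columns used for the encoding
venues = pd.DataFrame({
    "Postal Code": ["M7A", "M5B", "M5B", "M3C", "M5C", "M4A"],
    "Venue Category": ["Italian Restaurant", "Italian Restaurant",
                       "Italian Restaurant", "Italian Restaurant",
                       "Italian Restaurant", "Coffee Shop"],
})

# One-hot encode the category, then put the postal code back as first column
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Postal Code", venues["Postal Code"])

# Summing the indicator columns per postal code gives counts per category
per_code = onehot.groupby("Postal Code").sum()
print(per_code["Italian Restaurant"].sort_values(ascending=False))
```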


##### Creating a Cluster Map of Toronto showing the number of Italian Restaurants at each location

In [23]:
from folium import plugins
italian_restaurant_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# instantiate a marker cluster object for the Italian restaurants in the dataframe
italian_restaurant = plugins.MarkerCluster().add_to(italian_restaurant_map)

# add markers to map
for lat, lng, label in zip(toronto_onehot_italian['Neighbourhood Latitude'], toronto_onehot_italian['Neighbourhood Longitude'], toronto_onehot_italian['Postal Code']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(italian_restaurant)
    
italian_restaurant_map

As the map shows, 21 Italian restaurants are located between Toronto Union Station and the University of Toronto. This suggests that the area is one of the most frequented in Toronto and therefore one of the best places to open an Italian restaurant.

We can then use the data about the other venues near Italian restaurants to narrow down the area on which to focus.

##### Querying Foursquare and cleaning the data obtained

In [24]:
search_query = 'Italian Restaurant'
radius = 50000
#print(search_query + ' .... OK!')

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=PUR3VSJFUQSEHZ3MKTY5OLTU5OJG4NMFKP2WBGIGFJXG10RS&client_secret=THK4KZ2T2BQVWMIPVJFWDMOGYR4SO5XZ5UD5FEPZP0P35LDP&ll=43.6534817,-79.3839347&v=20180604&query=Italian Restaurant&radius=50000&limit=10000'

We then send the request and check the resulting dataframe.

In [25]:
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']
# transform venues into a dataframe
dataframe = json_normalize(venues)
#dataframe.drop(dataframe[(dataframe['location.postalCode'] == 'NaN')].index(), inplace=True)

dataframe.shape

(50, 19)
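
To see what `json_normalize` does with the nested venue records, the sketch below flattens two hand-written records. The venue names and values are invented for illustration; only the nesting (`location` as a dictionary, `categories` as a list) mirrors the response used above.

```python
import pandas as pd

# Hand-written records shaped like entries of results['response']['venues']
venues = [
    {"id": "a1", "name": "Example Trattoria",
     "location": {"lat": 43.65, "lng": -79.38, "postalCode": "M5B 1A1"},
     "categories": [{"name": "Italian Restaurant"}]},
    {"id": "b2", "name": "Example Diner",
     "location": {"lat": 43.70, "lng": -79.40},
     "categories": []},
]

# Nested dictionaries become dotted column names such as 'location.lat';
# lists such as 'categories' are kept as they are
flat = pd.json_normalize(venues)
print(sorted(flat.columns))
print(flat[["name", "location.lat", "location.postalCode"]])
```

Keys missing from a record, such as the postal code of the second venue, end up as `NaN`, which matches the empty cells in the real output.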

In [26]:
dataframe

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d10f941735', 'name': 'I...",False,4e74ce151838f918898efe72,6350 Tomken Rd.,CA,Mississauga,Canada,Tristar,22882,"[6350 Tomken Rd. (Tristar), Mississauga ON, Ca...","[{'label': 'display', 'lat': 43.65285913569359...",43.652859,-79.66804,,,ON,Roma Italian Restaurant,v-1607940955,
1,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",False,4de024f0b0fbe2cfa5fee3c4,,CA,Toronto,Canada,,3430,"[Toronto ON, Canada]","[{'label': 'display', 'lat': 43.67656199554484...",43.676562,-79.355699,,,ON,Florentina's Italian Restaurant,v-1607940955,
2,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,4b4f6a73f964a520960527e3,24-40 Bradwick Dr,CA,Vaughan,Canada,btw Keele St. & Dufferin St.,20061,[24-40 Bradwick Dr (btw Keele St. & Dufferin S...,"[{'label': 'display', 'lat': 43.8182382, 'lng'...",43.818238,-79.485024,,L4K 1K9,ON,Junnio's Italian Restaurant,v-1607940955,
3,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",False,4b199b46f964a5205be023e3,2625 Weston,CA,Toronto,Canada,401,13546,"[2625 Weston (401), Toronto ON, Canada]","[{'label': 'display', 'lat': 43.71194605276710...",43.711946,-79.53151,,,ON,Jolly II Italian Restaurant,v-1607940955,
4,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",False,4b107754f964a520147123e3,4505 Sheppard Ave E,CA,Scarborough,Canada,,17771,"[4505 Sheppard Ave E, Scarborough ON M1S 1V3, ...","[{'label': 'display', 'lat': 43.788071, 'lng':...",43.788071,-79.265134,,M1S 1V3,ON,Joey Bravo's Italian Restaurant,v-1607940955,
5,"[{'id': '56aa371be4b08b9a8d573550', 'name': 'F...",False,55e549c8498ea9b7acd67e8b,6720 Davand Drive,CA,Mississauga,Canada,,22923,"[6720 Davand Drive, Mississauga ON L5T 2K7, Ca...","[{'label': 'display', 'lat': 43.66758, 'lng': ...",43.66758,-79.66792,,L5T 2K7,ON,Il Porcellino Italian Restaurant And Catering,v-1607940955,
6,[],False,4d5577519e508cfa9c72009b,Derry Road West,CA,Mississauga,Canada,,21855,"[Derry Road West, Mississauga ON, Canada]","[{'label': 'display', 'lat': 43.703068, 'lng':...",43.703068,-79.646597,,,ON,Buda's Italian Restaurant,v-1607940955,
7,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",False,4c7d330c5af8b60cb8d38e10,,CA,,Canada,,23522,[Canada],"[{'label': 'display', 'lat': 43.68860510830914...",43.688605,-79.672008,,,,Mia Italian Restaurant,v-1607940955,
8,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",False,4e24ce2bd164e1d5a512bfe6,,CA,,Canada,,27669,"[Ontario, Canada]","[{'label': 'display', 'lat': 43.88753489635225...",43.887535,-79.499824,,,Ontario,Marchellos italian restaurant,v-1607940955,
9,[],False,4b53882ef964a5200fa127e3,,CA,,Canada,,32412,[Canada],"[{'label': 'display', 'lat': 43.44640202412109...",43.446402,-79.666352,,,,Roccos italian restaurant,v-1607940955,


As we can see, the column names are not user friendly, so we clean up this dataframe and rename its columns.

In [27]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so later column assignments do not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Roma Italian Restaurant,Indian Restaurant,6350 Tomken Rd.,CA,Mississauga,Canada,Tristar,22882,"[6350 Tomken Rd. (Tristar), Mississauga ON, Ca...","[{'label': 'display', 'lat': 43.65285913569359...",43.652859,-79.66804,,,ON,4e74ce151838f918898efe72
1,Florentina's Italian Restaurant,Italian Restaurant,,CA,Toronto,Canada,,3430,"[Toronto ON, Canada]","[{'label': 'display', 'lat': 43.67656199554484...",43.676562,-79.355699,,,ON,4de024f0b0fbe2cfa5fee3c4
2,Junnio's Italian Restaurant,Restaurant,24-40 Bradwick Dr,CA,Vaughan,Canada,btw Keele St. & Dufferin St.,20061,[24-40 Bradwick Dr (btw Keele St. & Dufferin S...,"[{'label': 'display', 'lat': 43.8182382, 'lng'...",43.818238,-79.485024,,L4K 1K9,ON,4b4f6a73f964a520960527e3
3,Jolly II Italian Restaurant,Italian Restaurant,2625 Weston,CA,Toronto,Canada,401,13546,"[2625 Weston (401), Toronto ON, Canada]","[{'label': 'display', 'lat': 43.71194605276710...",43.711946,-79.53151,,,ON,4b199b46f964a5205be023e3
4,Joey Bravo's Italian Restaurant,American Restaurant,4505 Sheppard Ave E,CA,Scarborough,Canada,,17771,"[4505 Sheppard Ave E, Scarborough ON M1S 1V3, ...","[{'label': 'display', 'lat': 43.788071, 'lng':...",43.788071,-79.265134,,M1S 1V3,ON,4b107754f964a520147123e3


We create a new column named "Postcode" that will be used for the graphics.

In [28]:
#Keep only the first three characters of the postal code
dataframe_filtered['Postcode'] = dataframe_filtered['postalCode'].str[:3]
dataframe_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
0,Roma Italian Restaurant,Indian Restaurant,6350 Tomken Rd.,CA,Mississauga,Canada,Tristar,22882,"[6350 Tomken Rd. (Tristar), Mississauga ON, Ca...","[{'label': 'display', 'lat': 43.65285913569359...",43.652859,-79.66804,,,ON,4e74ce151838f918898efe72,
1,Florentina's Italian Restaurant,Italian Restaurant,,CA,Toronto,Canada,,3430,"[Toronto ON, Canada]","[{'label': 'display', 'lat': 43.67656199554484...",43.676562,-79.355699,,,ON,4de024f0b0fbe2cfa5fee3c4,
2,Junnio's Italian Restaurant,Restaurant,24-40 Bradwick Dr,CA,Vaughan,Canada,btw Keele St. & Dufferin St.,20061,[24-40 Bradwick Dr (btw Keele St. & Dufferin S...,"[{'label': 'display', 'lat': 43.8182382, 'lng'...",43.818238,-79.485024,,L4K 1K9,ON,4b4f6a73f964a520960527e3,L4K
3,Jolly II Italian Restaurant,Italian Restaurant,2625 Weston,CA,Toronto,Canada,401,13546,"[2625 Weston (401), Toronto ON, Canada]","[{'label': 'display', 'lat': 43.71194605276710...",43.711946,-79.53151,,,ON,4b199b46f964a5205be023e3,
4,Joey Bravo's Italian Restaurant,American Restaurant,4505 Sheppard Ave E,CA,Scarborough,Canada,,17771,"[4505 Sheppard Ave E, Scarborough ON M1S 1V3, ...","[{'label': 'display', 'lat': 43.788071, 'lng':...",43.788071,-79.265134,,M1S 1V3,ON,4b107754f964a520147123e3,M1S
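
The first three characters of a Canadian postal code form its forward sortation area, which is what the "Postcode" column holds. As a minimal plain-Python sketch of the same slicing (the helper name `postcode_prefix` is hypothetical and not part of the notebook), missing postal codes are passed through as `None` instead of raising an error:

```python
def postcode_prefix(postal_code):
    """Return the first three characters of a postal code, or None if it is missing."""
    # Hypothetical helper for illustration; the notebook uses .str[:3] on the column instead.
    if not isinstance(postal_code, str) or not postal_code.strip():
        return None
    return postal_code.strip()[:3]


# Postal codes taken from the five rows shown above; None stands for a missing value.
postal_codes = [None, None, "L4K 1K9", None, "M1S 1V3"]
prefixes = [postcode_prefix(code) for code in postal_codes]
print(prefixes)
```

Three of the five rows shown have no postal code at all, so any grouping by "Postcode" would silently leave those venues out.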


We now check the data obtained by counting the venues in each category.

In [29]:
dataframe_filtered.groupby('categories').count()

Unnamed: 0_level_0,name,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
categories,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
American Restaurant,4,3,4,3,4,0,4,4,4,4,4,0,2,3,4,2
Asian Restaurant,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1
Bar,2,2,2,2,2,1,2,2,2,2,2,0,1,2,2,1
Breakfast Spot,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1
Caribbean Restaurant,2,1,2,1,2,1,2,2,2,2,2,0,1,1,2,1
Chinese Restaurant,4,4,4,4,4,2,4,4,4,4,4,0,3,4,4,3
Diner,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1
Embassy / Consulate,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0
Food Service,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1
Food Truck,1,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0


The table above shows that the search query also returned categories we are not interested in. We therefore filter the dataframe again to obtain the final version containing only Italian Restaurants.

In [30]:
italian_restaurant_df = dataframe_filtered.loc[(dataframe_filtered['categories'] == 'Italian Restaurant')].reset_index()
italian_restaurant_df

Unnamed: 0,index,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
0,1,Florentina's Italian Restaurant,Italian Restaurant,,CA,Toronto,Canada,,3430,"[Toronto ON, Canada]","[{'label': 'display', 'lat': 43.67656199554484...",43.676562,-79.355699,,,ON,4de024f0b0fbe2cfa5fee3c4,
1,3,Jolly II Italian Restaurant,Italian Restaurant,2625 Weston,CA,Toronto,Canada,401,13546,"[2625 Weston (401), Toronto ON, Canada]","[{'label': 'display', 'lat': 43.71194605276710...",43.711946,-79.53151,,,ON,4b199b46f964a5205be023e3,
2,7,Mia Italian Restaurant,Italian Restaurant,,CA,,Canada,,23522,[Canada],"[{'label': 'display', 'lat': 43.68860510830914...",43.688605,-79.672008,,,,4c7d330c5af8b60cb8d38e10,
3,8,Marchellos italian restaurant,Italian Restaurant,,CA,,Canada,,27669,"[Ontario, Canada]","[{'label': 'display', 'lat': 43.88753489635225...",43.887535,-79.499824,,,Ontario,4e24ce2bd164e1d5a512bfe6,
4,10,Nino's Authentic Italian Restaurant,Italian Restaurant,438 Kerr St,CA,Oakville,Canada,,33529,"[438 Kerr St, Oakville ON L6K 3C4, Canada]","[{'label': 'display', 'lat': 43.44530127192647...",43.445301,-79.684267,,L6K 3C4,ON,55bcde11498e244a3a859792,L6K
5,11,Focacia's Italian Restaurant,Italian Restaurant,,CA,,Canada,,36962,[Canada],"[{'label': 'display', 'lat': 43.853653, 'lng':...",43.853653,-79.017173,,,,4c9a4464a004a1cd97de466e,
6,12,Fabbrica Rustic Italian,Italian Restaurant,66 Wellington St W,CA,Toronto,Canada,,726,"[66 Wellington St W, Toronto ON M5K 1E7, Canada]","[{'label': 'display', 'lat': 43.647161, 'lng':...",43.647161,-79.381691,,M5K 1E7,ON,5b897e92db1d81002c91df8c,M5K
7,14,Scaddabush Italian Kitchen & Bar,Italian Restaurant,"382 Yonge Street, Unit #7",CA,Toronto,Canada,Gerrard,611,"[382 Yonge Street, Unit #7 (Gerrard), Toronto ...","[{'label': 'display', 'lat': 43.65892029202872...",43.65892,-79.382891,,M5B 1S8,ON,52f6816f11d24a43115dc834,M5B
8,16,Elm Street Italian Deli,Italian Restaurant,15 Elm Street,CA,Toronto,Canada,,482,"[15 Elm Street, Toronto ON M5G 1G7, Canada]","[{'label': 'display', 'lat': 43.65769, 'lng': ...",43.65769,-79.38248,,M5G 1G7,ON,5e594c8a3de308000870c948,M5G
9,17,The Express Italian Restaurant,Italian Restaurant,1477 Lakeshore Rd,CA,Burlington,Canada,,49531,"[1477 Lakeshore Rd, Burlington ON L7S 1B5, Can...","[{'label': 'display', 'lat': 43.32437, 'lng': ...",43.32437,-79.79667,,L7S 1B5,ON,5b394376c66666002c5e54ef,L7S
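
The check-then-filter step can also be sketched without pandas. The rows below are the first five venues of the search result shown earlier; `collections.Counter` tallies the categories and a list comprehension keeps only the exact matches. This is an illustration of the idea, not the notebook's own code.

```python
from collections import Counter

# (name, category) pairs copied from the first five rows of the search result
venues = [
    ("Roma Italian Restaurant", "Indian Restaurant"),
    ("Florentina's Italian Restaurant", "Italian Restaurant"),
    ("Junnio's Italian Restaurant", "Restaurant"),
    ("Jolly II Italian Restaurant", "Italian Restaurant"),
    ("Joey Bravo's Italian Restaurant", "American Restaurant"),
]

# Tally how many venues fall into each category
category_counts = Counter(category for _, category in venues)

# Keep only the venues whose category is exactly "Italian Restaurant"
italian_only = [name for name, category in venues if category == "Italian Restaurant"]

print(category_counts.most_common())
print(italian_only)
```

An exact match on the category is strict: a venue such as "Junnio's Italian Restaurant", which is labelled simply as "Restaurant", is dropped by this filter.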


##### Creating Marker Cluster Maps to focus on analysis
Now let's use the data we have to create a Marker Cluster map to see where to focus our attention in the data analysis.

In [31]:
from folium import plugins
italian_restaurant_map = folium.Map(location=[latitude, longitude], zoom_start=14)

# optional marker cluster (left disabled): the markers below are added directly to the map
#italian_restaurant_df_marker = plugins.MarkerCluster().add_to(italian_restaurant_map)

# add markers to map
for lat, lng, label in zip(italian_restaurant_df['lat'], italian_restaurant_df['lng'], italian_restaurant_df['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(italian_restaurant_map)
    
italian_restaurant_map

From this map we see that the majority of the Italian Restaurants are located between Toronto Union Station and College Street.

Now we check whether the top venues that are commonly found around Italian Restaurants can help us define interesting areas.

In [32]:
#First search: Coffee Shop
search_query = 'Coffee Shop'
radius = 50000

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)

coffee_shop = requests.get(url).json()
# assign relevant part of JSON to venues
venues_coffee_shop = coffee_shop['response']['venues']
# transform venues into a dataframe
coffee_shop = json_normalize(venues_coffee_shop)
#dataframe.drop(dataframe[(dataframe['location.postalCode'] == 'NaN')].index(), inplace=True)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in coffee_shop.columns if col.startswith('location.')] + ['id']
coffee_shop = coffee_shop.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the 'venue.'-prefixed column when 'categories' is absent
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
coffee_shop['categories'] = coffee_shop.apply(get_category_type, axis=1)

# clean column names by keeping only last term
coffee_shop.columns = [column.split('.')[-1] for column in coffee_shop.columns]

#Keep only first 3 of Postcode
coffee_shop['Postcode'] = coffee_shop['postalCode'].str[:3]

coffee_shop.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
0,Super Jet International Coffee Shop,Coffee Shop,267 College St.,CA,Toronto,Canada,Spadina Ave,1371,"[267 College St. (Spadina Ave), Toronto ON M5T...","[{'label': 'display', 'lat': 43.657971, 'lng':...",43.657971,-79.399795,,M5T 1R5,ON,5ae32b6412c8f0002c2b03e7,M5T
1,Bluestone Lane Queen Station Coffee Shop,Café,2 Queen Street East,CA,Toronto,Canada,,406,"[2 Queen Street East, Toronto ON M5C 3G5, Canada]","[{'label': 'display', 'lat': 43.6525684, 'lng'...",43.652568,-79.379047,,M5C 3G5,ON,5d493dc5735c2d0007bc3966,M5C
2,Niagara coffee shop,Coffee Shop,Niagara St.,CA,Toronto,Canada,,1973,"[Niagara St., Toronto ON, Canada]","[{'label': 'display', 'lat': 43.64188299139057...",43.641883,-79.402463,,,ON,4bc796086501c9b6a1ba3e29,
3,GRIP 6th Floor Coffee Shop,Coffee Shop,179 John St.,CA,Toronto,Canada,at Queen St.,649,"[179 John St. (at Queen St.), Toronto ON M5T 1...","[{'label': 'display', 'lat': 43.65103493678703...",43.651035,-79.391256,,M5T 1X4,ON,4c6bd9de69b4ef3b4d51474e,M5T
4,Luis Coffee Shop,Coffee Shop,235 Augusta St,CA,Toronto,Canada,Baldwin & Augusta,1460,"[235 Augusta St (Baldwin & Augusta), Toronto O...","[{'label': 'display', 'lat': 43.654575, 'lng':...",43.654575,-79.402006,,M5T 1M1,ON,4ae92277f964a52072b421e3,M5T
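
The same request URL is assembled again for each of the later searches. A minimal sketch of a reusable builder is shown here, assuming the same endpoint and parameters as the cell above; the function name `build_search_url` and the placeholder credentials, coordinates and version string are hypothetical. `urllib.parse.urlencode` also takes care of encoding characters such as the space in 'Coffee Shop'.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build a venue-search URL with the same parameters used in this notebook."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Placeholder values for illustration only; no request is sent here.
example_url = build_search_url('ID', 'SECRET', 43.65, -79.38, '20180604',
                               'Coffee Shop', 50000, 100)
print(example_url)
```

With such a helper, each search cell would only need to change the `query` argument before calling `requests.get`.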


In [33]:
coffee_shop.groupby('categories').count()

Unnamed: 0_level_0,name,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
categories,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
Bakery,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1
Business Service,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1
Café,8,3,8,6,8,2,8,8,8,8,8,0,2,6,8,2
Coffee Shop,30,25,30,26,30,19,30,30,30,30,30,2,16,26,30,16
College Cafeteria,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1
Pop-Up Shop,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1
Pub,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0
Restaurant,3,3,3,3,3,0,3,3,3,3,3,0,3,3,3,3
Sporting Goods Shop,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1
Sushi Restaurant,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1


In [34]:
#keep only category Coffee Shop
coffee_shop = coffee_shop.loc[(coffee_shop['categories'] == 'Coffee Shop')].reset_index()
coffee_shop.head()

Unnamed: 0,index,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
0,0,Super Jet International Coffee Shop,Coffee Shop,267 College St.,CA,Toronto,Canada,Spadina Ave,1371,"[267 College St. (Spadina Ave), Toronto ON M5T...","[{'label': 'display', 'lat': 43.657971, 'lng':...",43.657971,-79.399795,,M5T 1R5,ON,5ae32b6412c8f0002c2b03e7,M5T
1,2,Niagara coffee shop,Coffee Shop,Niagara St.,CA,Toronto,Canada,,1973,"[Niagara St., Toronto ON, Canada]","[{'label': 'display', 'lat': 43.64188299139057...",43.641883,-79.402463,,,ON,4bc796086501c9b6a1ba3e29,
2,3,GRIP 6th Floor Coffee Shop,Coffee Shop,179 John St.,CA,Toronto,Canada,at Queen St.,649,"[179 John St. (at Queen St.), Toronto ON M5T 1...","[{'label': 'display', 'lat': 43.65103493678703...",43.651035,-79.391256,,M5T 1X4,ON,4c6bd9de69b4ef3b4d51474e,M5T
3,4,Luis Coffee Shop,Coffee Shop,235 Augusta St,CA,Toronto,Canada,Baldwin & Augusta,1460,"[235 Augusta St (Baldwin & Augusta), Toronto O...","[{'label': 'display', 'lat': 43.654575, 'lng':...",43.654575,-79.402006,,M5T 1M1,ON,4ae92277f964a52072b421e3,M5T
4,5,John Ford's Classic Coffee Shop,Coffee Shop,1 Yonge St,CA,Toronto,Canada,Yonge and Queens Quay,1491,"[1 Yonge St (Yonge and Queens Quay), Toronto O...","[{'label': 'display', 'lat': 43.64191404782495...",43.641914,-79.37459,,M5J 2S7,ON,4fe359f6d5fb75c734446273,M5J


In [35]:
italian_restaurant_map = folium.Map(location=[latitude, longitude], zoom_start=15)

# optional marker cluster (left disabled): the markers below are added directly to the map
#italian_restaurant_df_marker = plugins.MarkerCluster().add_to(italian_restaurant_map)

# add markers to map
for lat, lng, label in zip(italian_restaurant_df['lat'], italian_restaurant_df['lng'], italian_restaurant_df['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(italian_restaurant_map)

# add markers of Coffee Shops to map
for lat, lng, label in zip(coffee_shop['lat'], coffee_shop['lng'], coffee_shop['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=folium.Icon(color = 'green'),
        popup=label,
    ).add_to(italian_restaurant_map)
    
italian_restaurant_map

From the above map we can see that a promising area, with several Coffee Shops and not too many Italian Restaurants, is the rectangle bounded by the "Osgoode", "St. Patrick", "Dundas" and "Queen" subway stations.

Let's try to refine our search by looking at other venues.

In [36]:
#Second search: Bar
search_query = 'Bar'
radius = 50000

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)

bar = requests.get(url).json()
# assign relevant part of JSON to venues
venues_bar = bar['response']['venues']
# transform venues into a dataframe
bar = json_normalize(venues_bar)
#dataframe.drop(dataframe[(dataframe['location.postalCode'] == 'NaN')].index(), inplace=True)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in bar.columns if col.startswith('location.')] + ['id']
bar = bar.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the 'venue.'-prefixed column when 'categories' is absent
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
bar['categories'] = bar.apply(get_category_type, axis=1)

# clean column names by keeping only last term
bar.columns = [column.split('.')[-1] for column in bar.columns]

#Keep only first 3 of Postcode
bar['Postcode'] = bar['postalCode'].str[:3]

bar.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
0,St. Louis Bar & Grill,Bar,595 Bay St #A09,CA,Toronto,Canada,Atrium on Bay,356,"[595 Bay St #A09 (Atrium on Bay), Toronto ON M...","[{'label': 'display', 'lat': 43.65656209045694...",43.656562,-79.382737,,M5G 2C2,ON,4af4e6e2f964a52052f721e3,M5G
1,Blue Stone Grill & Bar,Pub,372 Bay St.,CA,Toronto,Canada,at Richmond St.,340,"[372 Bay St. (at Richmond St.), Toronto ON M5H...","[{'label': 'display', 'lat': 43.65118713330148...",43.651187,-79.381139,,M5H 1M7,ON,4af8da92f964a5204a1022e3,M5H
2,Bar Verde,New American Restaurant,260 Yonge Street,CA,Toronto,Canada,,298,"[260 Yonge Street, Toronto ON M5B 2L9, Canada]","[{'label': 'display', 'lat': 43.654837, 'lng':...",43.654837,-79.380742,,M5B 2L9,ON,57dd99cb498ee67580d16390,M5B
3,Quinn's Steakhouse & Bar,Steakhouse,96 Richmond Street West,CA,Toronto,Canada,at Bay,265,"[96 Richmond Street West (at Bay), Toronto ON ...","[{'label': 'display', 'lat': 43.65119745750837...",43.651197,-79.382976,Financial District,M5H 2A3,ON,4b3db5e4f964a520709625e3,M5H
4,Barristers Bar,Restaurant,145 Richmond Street West,CA,Toronto,Canada,Hilton Toronto,437,"[145 Richmond Street West (Hilton Toronto), To...","[{'label': 'display', 'lat': 43.64979669604817...",43.649797,-79.385807,,M5H 2L2,ON,50ca02be245f2d4aa8c2ab5b,M5H
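
The category helper is defined again in each search cell with an identical body. A sketch of a single shared definition follows; it checks for the key explicitly instead of relying on an exception. The sample rows are plain dictionaries made up for illustration, using the list-of-dicts shape the function above expects.

```python
def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    if 'categories' in row:
        categories_list = row['categories']
    else:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Made-up rows for illustration, shaped like the raw venue records
sample_rows = [
    {'name': 'St. Louis Bar & Grill', 'categories': [{'name': 'Bar'}]},
    {'name': 'Unlabelled venue', 'categories': []},
    {'name': 'Nested record', 'venue.categories': [{'name': 'Pub'}]},
]
categories = [get_category_type(row) for row in sample_rows]
print(categories)
```

Defining the function once, in an early cell, would let the Coffee Shop, Bar and Hotel searches all reuse it.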


In [37]:
bar.groupby('categories').count()

Unnamed: 0_level_0,name,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
categories,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
American Restaurant,2,2,2,2,2,2,2,2,2,2,2,0,2,2,2,2
BBQ Joint,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1
Bar,5,5,5,5,5,3,5,5,5,5,5,0,4,5,5,4
Beer Bar,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1
Café,4,4,4,4,4,4,4,4,4,4,4,0,1,4,4,1
Cocktail Bar,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1
Coffee Shop,3,3,3,3,3,1,3,3,3,3,3,0,2,3,3,2
Diner,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0
Health & Beauty Service,1,0,1,1,1,0,1,1,1,1,1,0,1,1,1,1
Hotel Bar,5,5,5,5,5,3,5,5,5,5,5,0,4,5,5,4


In [38]:
italian_restaurant_map = folium.Map(location=[latitude, longitude], zoom_start=14)

# optional marker cluster (left disabled): the markers below are added directly to the map
#italian_restaurant_df_marker = plugins.MarkerCluster().add_to(italian_restaurant_map)

# add markers to map
for lat, lng, label in zip(italian_restaurant_df['lat'], italian_restaurant_df['lng'], italian_restaurant_df['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(italian_restaurant_map)
    
# add markers of Bars to map
for lat, lng, label in zip(bar['lat'], bar['lng'], bar['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=folium.Icon(color = 'red'),
        popup=label,
    ).add_to(italian_restaurant_map)
    
italian_restaurant_map

This result gives us additional input: the bars, another type of venue typically found near Italian Restaurants, extend the area of interest as far as Toronto Union Station.

Let's try the third popular venue type: hotels.

In [39]:
#Third search: hotel
search_query = 'Hotel'
radius = 50000

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)

hotel = requests.get(url).json()
# assign relevant part of JSON to venues
venues_hotel = hotel['response']['venues']
# transform venues into a dataframe
hotel = json_normalize(venues_hotel)
#dataframe.drop(dataframe[(dataframe['location.postalCode'] == 'NaN')].index(), inplace=True)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in hotel.columns if col.startswith('location.')] + ['id']
hotel = hotel.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the 'venue.'-prefixed column when 'categories' is absent
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
hotel['categories'] = hotel.apply(get_category_type, axis=1)

# clean column names by keeping only last term
hotel.columns = [column.split('.')[-1] for column in hotel.columns]

#Keep only first 3 of Postcode
hotel['Postcode'] = hotel['postalCode'].str[:3]

hotel.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
0,Sheraton Centre Toronto Hotel,Hotel,123 Queen Street West,CA,Toronto,Canada,at York St.,324,"[123 Queen Street West (at York St.), Toronto ...","[{'label': 'display', 'lat': 43.6505944, 'lng'...",43.650594,-79.38453,,M5H 2M9,ON,4ab2d511f964a5209b6c20e3,M5H
1,Chelsea Hotel,Hotel,33 Gerrard Street West,CA,Toronto,Canada,at Yonge St,562,"[33 Gerrard Street West (at Yonge St), Toronto...","[{'label': 'display', 'lat': 43.65849759157591...",43.658498,-79.383097,,M5G 1Z4,ON,51d212c3498ebf27dc469bc9,M5G
2,One King West Hotel & Residence,Hotel,1 King St W,CA,Toronto,Canada,at Yonge St.,686,"[1 King St W (at Yonge St.), Toronto ON M5H 1A...","[{'label': 'display', 'lat': 43.6491395, 'lng'...",43.649139,-79.377876,,M5H 1A1,ON,4af96fbbf964a520c01122e3,M5H
3,Bond Place Hotel,Hotel,65 Dundas St E,CA,Toronto,Canada,at Bond St.,534,"[65 Dundas St E (at Bond St.), Toronto ON M5B ...","[{'label': 'display', 'lat': 43.65618805882607...",43.656188,-79.378452,,M5B 2G8,ON,4ad4c05bf964a520a3f520e3,M5B
4,Pantages Hotel & Spa,Hotel,200 Victoria St,CA,Toronto,Canada,at Shuter St,410,"[200 Victoria St (at Shuter St), Toronto ON, C...","[{'label': 'display', 'lat': 43.65449797039222...",43.654498,-79.379035,,,ON,4ae61cf6f964a520caa421e3,


In [40]:
hotel.groupby('categories').count()

Unnamed: 0_level_0,name,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,Postcode
categories,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
Ballroom,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0
College Residence Hall,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1
Convention Center,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1
Gym,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1
Gym / Fitness Center,1,0,1,0,1,0,1,1,1,1,1,1,0,0,1,0
Hotel,36,32,36,34,36,21,36,36,36,36,36,1,26,34,36,26
Jazz Club,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1
Lounge,1,0,1,1,1,0,1,1,1,1,1,0,1,1,1,1
Miscellaneous Shop,1,1,1,0,1,0,1,1,1,1,1,0,0,0,1,0
Pool,1,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0


In [41]:
italian_restaurant_map = folium.Map(location=[latitude, longitude], zoom_start=15)

# optional marker cluster (left disabled): the markers below are added directly to the map
#italian_restaurant_df_marker = plugins.MarkerCluster().add_to(italian_restaurant_map)

# add markers to map
for lat, lng, label in zip(italian_restaurant_df['lat'], italian_restaurant_df['lng'], italian_restaurant_df['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(italian_restaurant_map)
    
# add markers of Hotels to map
for lat, lng, label in zip(hotel['lat'], hotel['lng'], hotel['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=folium.Icon(color = 'orange'),
        popup=label,
    ).add_to(italian_restaurant_map)
    
italian_restaurant_map

This last search confirms what we found above: the area in which the most popular venues commonly found near Italian Restaurants are located extends from Queen's Park to Toronto Union Station.

Let's put everything together.

In [42]:
italian_restaurant_map = folium.Map(location=[latitude, longitude], zoom_start=15)

# optional marker cluster (left disabled): the markers below are added directly to the map
#italian_restaurant_df_marker = plugins.MarkerCluster().add_to(italian_restaurant_map)

# add markers to map
for lat, lng, label in zip(italian_restaurant_df['lat'], italian_restaurant_df['lng'], italian_restaurant_df['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(italian_restaurant_map)
    
# add markers of Hotels to map
for lat, lng, label in zip(hotel['lat'], hotel['lng'], hotel['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=folium.Icon(color = 'orange'),
        popup=label,
    ).add_to(italian_restaurant_map)
    
# add markers of Bars to map
for lat, lng, label in zip(bar['lat'], bar['lng'], bar['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=folium.Icon(color = 'red'),
        popup=label,
    ).add_to(italian_restaurant_map)
    

# add markers of Coffee Shops to map
for lat, lng, label in zip(coffee_shop['lat'], coffee_shop['lng'], coffee_shop['name']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        icon=folium.Icon(color = 'green'),
        popup=label,
    ).add_to(italian_restaurant_map)
    
    
italian_restaurant_map

This map shows the Italian Restaurants and the other venues we analysed together, zoomed to the area of interest. Based on it, we can narrow the suggested area for a new Italian Restaurant to Adelaide Street West or Adelaide Street East.
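
The visual reading of the map could be cross-checked with a simple distance count. The sketch below is not part of the original analysis: it uses the haversine formula with a mean Earth radius of 6,371 km, a hypothetical candidate point, and an arbitrarily chosen 500 m radius, and counts how many of a few venues listed in the tables above fall within that radius.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_within(candidate, venues, radius_m):
    """Count venues whose distance from the candidate point is at most radius_m."""
    return sum(1 for _, lat, lng in venues
               if haversine_m(candidate[0], candidate[1], lat, lng) <= radius_m)


# Coordinates copied from the tables above
italian = [
    ("Fabbrica Rustic Italian", 43.647161, -79.381691),
    ("Scaddabush Italian Kitchen & Bar", 43.65892, -79.382891),
    ("Elm Street Italian Deli", 43.65769, -79.38248),
]
other_venues = [
    ("Blue Stone Grill & Bar", 43.651187, -79.381139),
    ("Sheraton Centre Toronto Hotel", 43.650594, -79.38453),
    ("St. Louis Bar & Grill", 43.656562, -79.382737),
]

candidate = (43.6500, -79.3820)  # hypothetical point, for illustration only
radius_m = 500                   # arbitrary walking-distance threshold

competitors_nearby = count_within(candidate, italian, radius_m)
venues_nearby = count_within(candidate, other_venues, radius_m)
print(competitors_nearby, venues_nearby)
```

Repeating the count for several candidate points would give a rough ranking: more nearby venues and fewer nearby Italian Restaurants would indicate a more attractive spot under the criteria used in this notebook.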

## Results and Discussion <a name="conclusion"></a>

From the above results we can now conclude that:

1. Many popular venues are located in a rectangle defined by Richmond Street West, Church Street, Front Street and Queen Street.
2. In this area there are 3 other Italian Restaurants near University Avenue.
3. University Avenue splits the area of interest into two sub-areas (East and West).
4. Our suggestion for the location of a new Italian Restaurant is therefore near Adelaide Street West or Adelaide Street East, so that the Restaurant is located near interesting venues but not too close to other Italian Restaurants that are already well established in the city.

This analysis gives only a suggestion of a possible location for an Italian Restaurant. We considered only a few of the variables that could have been analysed; others include the availability of parking, the presence of restaurants serving other types of cuisine, and the cost of rent. Further analysis including these variables may help us refine our search and find a more suitable location to suggest to our customer.