# Toronto’s Best and Worst Neighborhoods for Italian Cuisine

## Introduction

Toronto is a major city in the province of Ontario, Canada, with a population of over 2.7 million people. “Toronto is an international centre of business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world”. Immigrants to Canada have often chosen Toronto as home, which is reflected in the more than 200 distinct ethnic origins represented among the city's residents. Some members of this diverse population have settled in enclaves such as Little Italy, Chinatown, Portugal Village and Little India. If a neighborhood is known for a particular ethnic community, is it also the best neighborhood to visit for that community's cuisine?

Source: https://en.wikipedia.org/wiki/Toronto


## Business Problem

The objective of this investigation is to identify the neighborhoods in Toronto that have the most Italian restaurants and to determine whether Little Italy is on that list. This information will be useful to the Toronto tourism board when advising visitors. Second, the neighborhood with the fewest Italian restaurants will be noted as a candidate location for opening a new Italian restaurant, which may be of use to Toronto community agencies that support small business development.


## Data

#### 1.	Use the Wikipedia page for Toronto postal codes and combine with latitude and longitude data from the Week 3 assignment for transformation into the location of Toronto neighborhoods.
#### Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

#### 2.	Use the Foursquare location data for making a list of all the venues in each Toronto neighborhood.
#### Link: Foursquare API

#### 3.	Use the latitude and longitude data to map the neighborhoods with Italian restaurants by Python’s Folium library.



### Data Extraction, Cleaning and Preparation

Import libraries to scrape the web for Wikipedia table of Toronto postal codes.

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

from urllib.request import urlopen

Open the URL with the Wikipedia page.

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html = urlopen(url)

Use Beautiful Soup to parse the HTML.

In [3]:
soup = BeautifulSoup(html, 'lxml')
type(soup)

bs4.BeautifulSoup

Transfer the Toronto postal code table into a Pandas Dataframe.

In [4]:
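from io import StringIO

Web_Page = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(Web_Page.content,'lxml')
table = soup.find_all('table')
# wrap the HTML in StringIO: recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(str(table)), header=0)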

TO_neigh=pd.DataFrame(df[0])
TO_neigh.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


Clean data by dropping rows that don't have an assigned borough and reset the index values.

In [5]:
TO_neigh2 = TO_neigh.dropna(axis=0)
TO_neigh2 = TO_neigh2[~TO_neigh2["Borough"].isin(["Not assigned"])]
TO_neigh2 = TO_neigh2.reset_index(drop=True)
TO_neigh2.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Transfer the file with the geographical coordinates into a Pandas dataframe.

In [6]:
df_geocoords = pd.read_csv('Geospatial_Coordinates.csv')
df_geocoords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the geographical coordinates into the postal code dataframe. Both dataframes share the 'Postal Code' key, so the merged result keeps a single postal code column.

In [7]:
TO_neighgeo = pd.merge(left=TO_neigh2, right=df_geocoords,left_on='Postal Code', right_on='Postal Code')

#TO_neighgeo.drop(['Postal Code'], axis=1, inplace=True)
TO_neighgeo.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
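
The merge above uses pandas' default inner join, so a postal code missing from the coordinates file would drop out without any warning. The following is a minimal sketch of a check, assuming small stand-in frames (`M0X` and its neighbourhood name are made up purely to show a miss). It uses `how='left'` with `indicator=True` to surface unmatched rows.

```python
import pandas as pd

# Stand-in frames; 'M0X' is a made-up postal code used only to show a miss.
neigh = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M0X'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# A left join keeps every neighbourhood row; _merge records where each row matched.
checked = pd.merge(neigh, coords, on='Postal Code', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

An empty list would indicate that every neighbourhood received coordinates.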


### Data Analysis

A map of Toronto's neighbourhoods shows how spread out they are.

In [8]:
import folium #mapping library
# create map of Toronto boroughs using latitude and longitude values
map_toronto= folium.Map(location=[43.6534817, -79.3839347], zoom_start=11)

# add markers to map
for lat, lng, label in zip(TO_neighgeo['Latitude'], TO_neighgeo['Longitude'], TO_neighgeo['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

To determine the number of Italian restaurants in each neighbourhood, the venues in each neighbourhood are needed. Foursquare location data will be used to pull the venues. Define the Foursquare credentials and API version.

In [9]:
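import os
# credentials are read from environment variables (the variable names here are placeholders) so they stay out of the notebook
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret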
VERSION = '20180604' # Foursquare API version


Create a function to explore all the neighborhoods in Toronto.

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    # radius is the search radius around each neighbourhood centre; the default is used unless the caller overrides it
    LIMIT = 50 # limit of number of venues returned by Foursquare API. Max is 50.
   
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
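
The function above indexes straight into the response (`["response"]['groups'][0]['items']` and `['categories'][0]`), so one venue without a category, or one empty response, would stop the whole loop. A more tolerant extraction step might look like the sketch below. The helper name `extract_venue_rows` and the sample payload are hypothetical, and the payload is hand-written to mirror only the fields the function reads.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into rows, tolerating missing fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        # Use None instead of raising IndexError when a venue has no category.
        category = categories[0]['name'] if categories else None
        location = venue.get('location', {})
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hand-written stand-in payload; 'Uncategorised Venue' is made up for illustration.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorised Venue',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Parkwoods', 43.753259, -79.329656, sample)
print(len(rows))
```

Rows with a `None` category can then be dropped or inspected before the category filter is applied.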

Run the function to build the dataframe with the venues in each Toronto neighborhood.

In [11]:
TO_venues = getNearbyVenues(names=TO_neighgeo['Neighbourhood'],
                                   latitudes=TO_neighgeo['Latitude'],
                                   longitudes=TO_neighgeo['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

A detailed view of the dataframe indicates there is a wide variety of venues in Toronto.

In [12]:
TO_venues.head(20)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
2,Parkwoods,43.753259,-79.329656,Tim Hortons,43.760668,-79.326368,Café
3,Parkwoods,43.753259,-79.329656,A&W,43.760643,-79.326865,Fast Food Restaurant
4,Parkwoods,43.753259,-79.329656,Bruno's valu-mart,43.746143,-79.32463,Grocery Store
5,Parkwoods,43.753259,-79.329656,High Street Fish & Chips,43.74526,-79.324949,Fish & Chips Shop
6,Parkwoods,43.753259,-79.329656,Shoppers Drug Mart,43.760857,-79.324961,Pharmacy
7,Parkwoods,43.753259,-79.329656,Food Basics,43.760549,-79.326045,Supermarket
8,Parkwoods,43.753259,-79.329656,Shoppers Drug Mart,43.745315,-79.3258,Pharmacy
9,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop


Extract the Italian restaurants in each neighbourhood by filtering the list on Venue Category.

In [13]:
Italianvenues_df = TO_venues[TO_venues['Venue Category'].str.contains('Italian Restaurant')].copy()
Italianvenues_df.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
64,"Regent Park, Harbourfront",43.65426,-79.360636,Mangia and Bevi Resto-Bar,43.65225,-79.366355,Italian Restaurant
74,"Regent Park, Harbourfront",43.65426,-79.360636,Fusaro's,43.653347,-79.369517,Italian Restaurant
169,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Scaddabush Italian Kitchen & Bar,43.65892,-79.382891,Italian Restaurant
280,"Garden District, Ryerson",43.657162,-79.378937,Scaddabush Italian Kitchen & Bar,43.65892,-79.382891,Italian Restaurant
293,"Garden District, Ryerson",43.657162,-79.378937,Trattoria Mercatto,43.654453,-79.380974,Italian Restaurant
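
Because every neighbourhood is searched with the same radius around its centre, nearby search circles can overlap and return the same venue more than once. In the output above, Scaddabush Italian Kitchen & Bar is listed under two neighbourhoods with identical coordinates. The sketch below shows one way to separate listings from distinct venues. It uses three rows copied from that output and assumes that a matching name and coordinates identify one physical venue.

```python
import pandas as pd

# Three rows copied from the filtered output above.
italian = pd.DataFrame({
    'Neighbourhood': ["Queen's Park, Ontario Provincial Government",
                      'Garden District, Ryerson',
                      'Garden District, Ryerson'],
    'Venue': ['Scaddabush Italian Kitchen & Bar',
              'Scaddabush Italian Kitchen & Bar',
              'Trattoria Mercatto'],
    'Venue Latitude': [43.65892, 43.65892, 43.654453],
    'Venue Longitude': [-79.382891, -79.382891, -79.380974],
})

# Assumption: rows with the same name and coordinates are one physical venue.
venue_key = ['Venue', 'Venue Latitude', 'Venue Longitude']
distinct = italian.drop_duplicates(subset=venue_key)
shared = italian[italian.duplicated(subset=venue_key, keep=False)]
print(len(italian), len(distinct))
```

The per-neighbourhood counts that follow are counts of listings within each search circle, so a venue near a boundary can contribute to more than one neighbourhood.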


Grouping by neighbourhood gives the total number of Italian restaurants in each neighborhood. Removing the unnecessary columns and renaming the "Venue" column to "Number of Italian Restaurants" also makes the dataframe more meaningful.

In [14]:
Italianvenues_df2=Italianvenues_df.groupby('Neighbourhood').count()
Italianvenues_df2.drop(['Neighbourhood Latitude','Neighbourhood Longitude','Venue Latitude','Venue Longitude','Venue Category'],axis=1, inplace=True)
Italianvenues_df2.rename(columns={'Venue':'Number of Italian Restaurants'},inplace=True)
Italianvenues_df2.head()

Neighbourhood,Number of Italian Restaurants
"Bedford Park, Lawrence Manor East",3
Berczy Park,1
"Brockton, Parkdale Village, Exhibition Place",1
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",2
Central Bay Street,1


### Which neighbourhood has the greatest number of Italian restaurants?

If this dataframe is sorted in descending order then the neighborhoods with the greatest and least number of Italian restaurants will become evident.

In [15]:
Sortedvenues_df2 = Italianvenues_df2.sort_values(by='Number of Italian Restaurants', ascending=False)
Sortedvenues_df2

Neighbourhood,Number of Italian Restaurants
Davisville,6
Davisville North,4
"Moore Park, Summerhill East",4
"Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park",4
"North Toronto West, Lawrence Park",3
Roselawn,3
"Runnymede, Swansea",3
"Bedford Park, Lawrence Manor East",3
"The Annex, North Midtown, Yorkville",3
"The Danforth West, Riverdale",3


It is interesting that Davisville has the greatest number of Italian restaurants, because Christie is the neighborhood known as "Little Italy" and it has only one Italian restaurant in this data. Let's visualize the neighborhoods with Italian restaurants on a map and **highlight the locations of Davisville and Christie**.

First the latitudes and longitudes of Davisville and Christie are needed from the Italian restaurants dataframe with the geographical coordinates. This is done by grouping the dataframe by neighbourhood and displaying the first Italian restaurant in each neighbourhood group.

In [16]:
Italianvenues_df3=Italianvenues_df.groupby(['Neighbourhood'])
Italianvenues_df3.first()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Bedford Park, Lawrence Manor East",43.733283,-79.41975,Tutto Pronto,43.728235,-79.418086,Italian Restaurant
Berczy Park,43.644771,-79.373306,The Old Spaghetti Factory,43.646964,-79.374403,Italian Restaurant
"Brockton, Parkdale Village, Exhibition Place",43.636847,-79.428191,Caffino,43.639021,-79.425289,Italian Restaurant
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",43.662744,-79.321558,Gio Rana's Really Really Nice Restaurant,43.663367,-79.330425,Italian Restaurant
Central Bay Street,43.657952,-79.387383,Scaddabush Italian Kitchen & Bar,43.65892,-79.382891,Italian Restaurant
Christie,43.669542,-79.422564,Vinny’s Panini,43.670679,-79.426148,Italian Restaurant
"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,Remezzo Italian Bistro,43.778649,-79.308264,Italian Restaurant
"Commerce Court, Victoria Hotel",43.648198,-79.379817,Terroni,43.650927,-79.375602,Italian Restaurant
Davisville,43.704324,-79.38879,Positano,43.704558,-79.388639,Italian Restaurant
Davisville North,43.712751,-79.390197,Bar Buca,43.706961,-79.394808,Italian Restaurant


Extract the latitudes, longitudes and list of Italian restaurants for both Davisville and Christie so that their locations can be visualized on a map of neighbourhoods with Italian restaurant venues.

In [17]:
Italianvenues_df3.get_group('Davisville')

Unnamed: 0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2547,43.704324,-79.38879,Positano,43.704558,-79.388639,Italian Restaurant
2548,43.704324,-79.38879,Bar Buca,43.706961,-79.394808,Italian Restaurant
2552,43.704324,-79.38879,Balsamico,43.701505,-79.397162,Italian Restaurant
2555,43.704324,-79.38879,Florentia Ristorante,43.703594,-79.387985,Italian Restaurant
2568,43.704324,-79.38879,Five Doors North,43.702236,-79.397526,Italian Restaurant
2575,43.704324,-79.38879,Grazie Ristorante,43.709329,-79.398823,Italian Restaurant


In [18]:
Italianvenues_df3.get_group('Christie')

Unnamed: 0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
783,43.669542,-79.422564,Vinny’s Panini,43.670679,-79.426148,Italian Restaurant


In [19]:
# create map of the Toronto neighbourhoods with Italian restaurants using latitude and longitude values
map_Italianvenues= folium.Map(location=[43.6534817, -79.3839347], zoom_start=11)

# add markers to map
for lat, lng, label in zip(Italianvenues_df['Neighbourhood Latitude'], Italianvenues_df['Neighbourhood Longitude'], Italianvenues_df['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Italianvenues)  

# Create feature groups for Davisville and Christie
Davisville=folium.map.FeatureGroup()
Christie=folium.map.FeatureGroup()

# Davisville first

# Style the feature group
Davisville.add_child(
    folium.features.CircleMarker(
    [43.704324,-79.38879],radius = 12,
    color = 'red', fill_color = 'red'
    )
)

# Add the feature group to the map
map_Italianvenues.add_child(Davisville)

# Label the marker
folium.Marker([43.704324,-79.38879],
             popup='Davisville').add_to(map_Italianvenues)

# Christie second

# Style the feature group
Christie.add_child(
    folium.features.CircleMarker(
    [43.669542, -79.422564],radius = 12,
    color = 'Purple', fill_color = 'Purple'
    )
)

# Add the feature group to the map
map_Italianvenues.add_child(Christie)

# Label the marker
folium.Marker([43.669542, -79.422564],
             popup='Christie').add_to(map_Italianvenues)

# Display map of neighbourhoods with Italian venues
map_Italianvenues

The map shows the location of **Davisville circled in red and Christie circled in purple**. The two neighbourhoods are fairly close together, which suggests one possible explanation: although Christie may be known as "Little Italy", primarily because of the Italian population living there, restaurateurs from Christie may have found that the **Italian restaurant business is more lucrative in Davisville**. A little research into the demographics of Davisville indicates that its residents are wealthier than the average Torontonian. According to the source below, the share of Davisville residents who make more than $60,000 is twice that of Toronto residents overall.
Source: https://medium.com/real-estate-news/toronto-neighbourhood-demographics-architecture-in-davisville-village-310d02bcf144
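
The proximity claim can be checked from the two neighbourhood coordinates used on the map. The sketch below applies the haversine great-circle formula. The helper name `haversine_km` and the mean Earth radius of 6,371 km are assumptions introduced here, not part of the original notebook.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# Neighbourhood centre coordinates taken from the map code above.
davisville = (43.704324, -79.38879)
christie = (43.669542, -79.422564)
distance_km = haversine_km(*davisville, *christie)
print(round(distance_km, 1))
```

With these coordinates the straight-line distance comes out at a little under 5 km, which is close in city terms but well beyond the 1,000 m search radius used for each neighbourhood.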

**To answer the question of which Toronto neighbourhood has the greatest number of Italian restaurants: Davisville is at the top of the list, and with good reason. If tourists ask where to find the widest variety of Italian restaurants, the tourist board should list Davisville as the number one selection.**

### Which neighbourhood(s) have the fewest Italian restaurants and the venues to attract visitors to the area?

Generally, a good location for opening a new restaurant is an area that is highly accessible. Let's identify the Toronto neighbourhood(s) with venues that bring high foot traffic and have the fewest Italian restaurants.
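
One detail matters when looking for the fewest Italian restaurants: a table built by filtering on "Italian Restaurant" and then grouping contains no rows for neighbourhoods that have none, so its smallest possible count is 1. The sketch below restores the zero rows with `reindex`. The stand-in rows are illustrative only and do not reproduce the notebook's counts.

```python
import pandas as pd

# Stand-in venue table; 'Parkwoods' has no Italian restaurant rows here.
venues = pd.DataFrame({
    'Neighbourhood': ['Parkwoods', 'Parkwoods', 'Davisville', 'Davisville', 'Christie'],
    'Venue Category': ['Park', 'Pharmacy', 'Italian Restaurant',
                       'Italian Restaurant', 'Italian Restaurant'],
})

is_italian = venues['Venue Category'] == 'Italian Restaurant'
counts = (
    venues[is_italian]
    .groupby('Neighbourhood')
    .size()
    # Re-insert neighbourhoods that dropped out of the filtered frame.
    .reindex(venues['Neighbourhood'].unique(), fill_value=0)
    .rename('Number of Italian Restaurants')
    .sort_values()
)
print(counts)
```

Sorting in ascending order then puts the neighbourhoods with no Italian restaurants first.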

Use one-hot encoding to put the Toronto venues into a dataframe that can be converted into the frequency of occurrence of each venue category in each neighbourhood.

In [20]:
# one hot encoding
TO_onehot = pd.get_dummies(TO_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
TO_onehot['Neighbourhood'] = TO_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [TO_onehot.columns[-1]] + list(TO_onehot.columns[:-1])
TO_onehot = TO_onehot[fixed_columns]

TO_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Group rows by neighbourhood and take the mean of the frequency of occurrence in each category.

In [21]:
TO_grouped = TO_onehot.groupby('Neighbourhood').mean().reset_index()
TO_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Agincourt,0.0,0.00,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.025000,0.00,0.00,0.000000,0.000000,0.000000,0.00000,0.0
1,"Alderwood, Long Branch",0.0,0.00,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,0.000000,0.00000,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.00,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,0.000000,0.00000,0.0
3,Bayview Village,0.0,0.00,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,0.000000,0.00000,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.00,0.000000,0.000000,0.022222,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.00,0.00,0.022222,0.022222,0.000000,0.00000,0.0
5,Berczy Park,0.0,0.00,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,0.000000,0.00000,0.0
6,"Birch Cliff, Cliffside West",0.0,0.00,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,0.000000,0.00000,0.0
7,"Brockton, Parkdale Village, Exhibition Place",0.0,0.00,0.000000,0.000000,0.020000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,0.000000,0.00000,0.0
8,"Business reply mail Processing Centre, South C...",0.0,0.00,0.000000,0.000000,0.020833,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,0.000000,0.00000,0.0
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.00,0.000000,0.071429,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,0.000000,0.00000,0.0
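
Each value in the grouped table is the share of a neighbourhood's returned venues that fall in a category, because the mean of a 0/1 column equals the fraction of rows set to 1. The sketch below illustrates this with two hypothetical neighbourhoods, `A` and `B`, and cross-checks the result against `pd.crosstab` with `normalize='index'`.

```python
import pandas as pd

# Hypothetical neighbourhoods 'A' and 'B' with a handful of venues each.
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Café', 'Pharmacy', 'Café', 'Café'],
})

# One column per category, 1.0 where the row belongs to that category.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])
grouped = onehot.groupby('Neighbourhood').mean()

# The same shares computed directly, without the one-hot step.
shares = pd.crosstab(venues['Neighbourhood'], venues['Venue Category'],
                     normalize='index')
print(grouped)
```

Since the shares are relative to the venues returned for each neighbourhood, a frequency of 0.07 means something different for a neighbourhood with 50 returned venues than for one with 14.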


Identify the top 5 venues in each neighborhood group.

In [22]:
num_top_venues = 5

for hood in TO_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = TO_grouped[TO_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                  venue  freq
0    Chinese Restaurant  0.15
1         Shopping Mall  0.08
2  Caribbean Restaurant  0.05
3                Bakery  0.05
4            Restaurant  0.05


----Alderwood, Long Branch----
               venue  freq
0     Discount Store  0.12
1  Convenience Store  0.08
2        Pizza Place  0.08
3      Garden Center  0.04
4      Shopping Mall  0.04


----Bathurst Manor, Wilson Heights, Downsview North----
                venue  freq
0         Coffee Shop  0.07
1         Pizza Place  0.07
2                Bank  0.07
3                Park  0.07
4  Chinese Restaurant  0.03


----Bayview Village----
                 venue  freq
0          Gas Station  0.14
1                 Bank  0.14
2        Grocery Store  0.14
3  Japanese Restaurant  0.14
4                 Park  0.07


----Bedford Park, Lawrence Manor East----
                venue  freq
0         Coffee Shop  0.07
1  Italian Restaurant  0.07
2         Pizza Place  0.04
3      Sandwich Place  0.

           venue  freq
0       Pharmacy  0.17
1         Bakery  0.08
2     Playground  0.08
3           Bank  0.08
4  Shopping Mall  0.08


----Kennedy Park, Ionview, East Birchmount Park----
                  venue  freq
0    Chinese Restaurant  0.15
1        Discount Store  0.12
2           Coffee Shop  0.12
3  Fast Food Restaurant  0.08
4         Grocery Store  0.08


----Kensington Market, Chinatown, Grange Park----
                           venue  freq
0  Vegetarian / Vegan Restaurant  0.08
1                           Café  0.06
2             Mexican Restaurant  0.06
3                    Coffee Shop  0.06
4           Caribbean Restaurant  0.04


----Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens----
                venue  freq
0            Pharmacy  0.12
1                Bank  0.06
2         Supermarket  0.06
3  Chinese Restaurant  0.06
4            Bus Line  0.06


----Lawrence Manor, Lawrence Heights----
                  venue  freq
0        Clothing S

                  venue  freq
0                   Pub  0.10
1   Japanese Restaurant  0.06
2                  Park  0.04
3                 Beach  0.04
4  Caribbean Restaurant  0.04


----The Danforth West, Riverdale----
                venue  freq
0    Greek Restaurant  0.16
1  Italian Restaurant  0.06
2                Café  0.06
3          Restaurant  0.04
4      Ice Cream Shop  0.04


----The Kingsway, Montgomery Road, Old Mill North----
          venue  freq
0   Coffee Shop  0.09
1  Burger Joint  0.04
2          Bank  0.04
3           Pub  0.04
4   Pizza Place  0.04


----Thorncliffe Park----
               venue  freq
0        Coffee Shop  0.10
1  Indian Restaurant  0.08
2      Grocery Store  0.06
3        Pizza Place  0.04
4        Supermarket  0.04


----Toronto Dominion Centre, Design Exchange----
                 venue  freq
0                 Café  0.12
1          Coffee Shop  0.10
2                Hotel  0.08
3           Restaurant  0.06
4  American Restaurant  0.04


----Unive

Define a function that sorts the venue categories in a row from highest to lowest frequency and returns the top entries.

In [23]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a dataframe with the top 5 venues in each neighborhood.

In [24]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # positions beyond 'st', 'nd', 'rd' fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = TO_grouped['Neighbourhood']

for ind in np.arange(TO_grouped.shape[0]):
    # pass the full row: return_most_common_venues already skips the leading 'Neighbourhood' entry
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(TO_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted


Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Chinese Restaurant,Shopping Mall,Bakery,Pizza Place,Restaurant
1,"Alderwood, Long Branch",Discount Store,Convenience Store,Pizza Place,Park,Donut Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Pizza Place,Bank,Park,Bridal Shop
3,Bayview Village,Bank,Japanese Restaurant,Gas Station,Grocery Store,Restaurant
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Fast Food Restaurant,Sandwich Place,Bank
5,Berczy Park,Coffee Shop,Café,Japanese Restaurant,Hotel,Farmers Market
6,"Birch Cliff, Cliffside West",Convenience Store,Park,Thai Restaurant,General Entertainment,Diner
7,"Brockton, Parkdale Village, Exhibition Place",Café,Bakery,Restaurant,Gift Shop,Bar
8,"Business reply mail Processing Centre, South C...",Park,Coffee Shop,Brewery,Pizza Place,Italian Restaurant
9,"CN Tower, King and Spadina, Railway Lands, Har...",Harbor / Marina,Café,Coffee Shop,Garden,Track


Run k-means to cluster the neighbourhoods into 6 clusters.

In [25]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 6

TO_grouped_clustering = TO_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(TO_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:5] 

array([0, 5, 5, 5, 2])

Construct a new dataframe that includes the cluster as well as the top 5 venues for each neighborhood.

In [26]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge with Toronto data to add latitude/longitude for each neighborhood
TO_merged = pd.merge(left=TO_neighgeo, right=neighborhoods_venues_sorted,left_on='Neighbourhood', right_on='Neighbourhood')
TO_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1,Park,Convenience Store,Pharmacy,Bus Stop,Shopping Mall
1,M4A,North York,Victoria Village,43.725882,-79.315572,2,Coffee Shop,Hockey Arena,Grocery Store,Café,Sporting Goods Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,2,Coffee Shop,Café,Theater,Pub,Park
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Furniture / Home Store
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Park,Gastropub,Bookstore,Dance Studio
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242,1,Pharmacy,Bakery,Grocery Store,Bank,Shopping Mall
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0,Fast Food Restaurant,Trail,Greek Restaurant,Arts & Crafts Store,Bank
7,M3B,North York,Don Mills,43.745906,-79.352188,2,Coffee Shop,Japanese Restaurant,Restaurant,Gym,Burger Joint
8,M3C,North York,Don Mills,43.725900,-79.340923,2,Coffee Shop,Japanese Restaurant,Restaurant,Gym,Burger Joint
9,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,5,Brewery,Pizza Place,Coffee Shop,Athletics & Sports,Pharmacy


To find a neighbourhood that would be a good fit for a new Italian restaurant, the characteristics of the venues in each cluster will be evaluated. The most desirable cluster is the one with the fewest restaurants, particularly Italian restaurants, and with venues that bring in visitors.

**Typical Neighbourhood:** Cluster number 0 has a variety of services that are common to most neighbourhoods. There are Italian restaurants in this cluster.

In [27]:
TO_merged.loc[TO_merged['Cluster Labels'] == 0, TO_merged.columns[[1] + list(range(5, TO_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,North York,0,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Furniture / Home Store
6,Scarborough,0,Fast Food Restaurant,Trail,Greek Restaurant,Arts & Crafts Store,Bank
38,Scarborough,0,Chinese Restaurant,Discount Store,Coffee Shop,Grocery Store,Pizza Place
57,York,0,Furniture / Home Store,Discount Store,Grocery Store,Gas Station,Playground
58,North York,0,Convenience Store,Bakery,Gas Station,Golf Course,Storage Facility
65,Scarborough,0,Furniture / Home Store,Restaurant,Coffee Shop,Electronics Store,Light Rail Station
78,Scarborough,0,Chinese Restaurant,Shopping Mall,Bakery,Pizza Place,Restaurant
85,Scarborough,0,Chinese Restaurant,Pharmacy,Park,Pizza Place,Bakery
90,Scarborough,0,Chinese Restaurant,Coffee Shop,Pizza Place,Fast Food Restaurant,Bank


**Sporty Neighbourhood:** Cluster number 1 has more sports-type activities, including gym, park and beach activities. An Italian restaurant may not be of interest to people who are "body image conscious".

In [28]:
TO_merged.loc[TO_merged['Cluster Labels'] == 1, TO_merged.columns[[1] + list(range(5, TO_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North York,1,Park,Convenience Store,Pharmacy,Bus Stop,Shopping Mall
5,Etobicoke,1,Pharmacy,Bakery,Grocery Store,Bank,Shopping Mall
13,Scarborough,1,Breakfast Spot,Burger Joint,Playground,Italian Restaurant,Park
21,York,1,Pharmacy,Park,Hostel,Bus Line,Bus Stop
22,Scarborough,1,Park,Coffee Shop,Chinese Restaurant,Mobile Phone Shop,Fast Food Restaurant
52,North York,1,Arts & Crafts Store,Pharmacy,Bakery,Italian Restaurant,Shopping Mall
72,North York,1,Pharmacy,Bakery,Bank,Eastern European Restaurant,Baby Store
88,Etobicoke,1,Park,Fast Food Restaurant,Fried Chicken Joint,Liquor Store,Bakery
100,Etobicoke,1,Italian Restaurant,Park,Shopping Mall,Eastern European Restaurant,Ice Cream Shop


**Day-time Neighbourhoods:** Cluster number 2 has services that would draw foot traffic. However, it could be argued that these services are accessed mainly during business hours, whereas an Italian restaurant would prefer to operate in the evening.

In [29]:
TO_merged.loc[TO_merged['Cluster Labels'] == 2, TO_merged.columns[[1] + list(range(5, TO_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,North York,2,Coffee Shop,Hockey Arena,Grocery Store,Café,Sporting Goods Shop
2,Downtown Toronto,2,Coffee Shop,Café,Theater,Pub,Park
4,Downtown Toronto,2,Coffee Shop,Park,Gastropub,Bookstore,Dance Studio
7,North York,2,Coffee Shop,Japanese Restaurant,Restaurant,Gym,Burger Joint
8,North York,2,Coffee Shop,Japanese Restaurant,Restaurant,Gym,Burger Joint
10,Downtown Toronto,2,Coffee Shop,Clothing Store,Cosmetics Shop,Sushi Restaurant,Japanese Restaurant
15,Downtown Toronto,2,Café,Coffee Shop,Restaurant,Cosmetics Shop,Gastropub
19,East Toronto,2,Pub,Japanese Restaurant,Caribbean Restaurant,Park,Bakery
20,Downtown Toronto,2,Coffee Shop,Café,Japanese Restaurant,Hotel,Farmers Market
24,Downtown Toronto,2,Coffee Shop,Café,Arts & Crafts Store,Park,Plaza


**Mixed Bag Neighbourhoods:** Cluster number 3 has Italian restaurants, which does not make this group a good candidate.

In [30]:
TO_merged.loc[TO_merged['Cluster Labels'] == 3, TO_merged.columns[[1] + list(range(5, TO_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
94,Etobicoke,3,Drugstore,Lounge,Hotel,Coffee Shop,Dry Cleaner


**Foot Traffic Neighbourhoods:** The top three venues for cluster number 4 are a park, a pool and a zoo. These venues, together with the lack of existing restaurants, make this cluster in the borough of North York very attractive for opening an Italian restaurant.

In [31]:
TO_merged.loc[TO_merged['Cluster Labels'] == 4, TO_merged.columns[[1] + list(range(5, TO_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
48,North York,4,Park,Pool,Zoo,Event Space,Drugstore


**Restaurant Overload Neighbourhoods:** There are many established restaurants in cluster number 5, so although the foot traffic would be good, the competition would be challenging.

In [32]:
TO_merged.loc[TO_merged['Cluster Labels'] == 5, TO_merged.columns[[1] + list(range(5, TO_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
9,East York,5,Brewery,Pizza Place,Coffee Shop,Athletics & Sports,Pharmacy
11,North York,5,Grocery Store,Fast Food Restaurant,Pizza Place,Coffee Shop,Gas Station
12,Etobicoke,5,Park,Pizza Place,Convenience Store,Gym,Restaurant
14,East York,5,Park,Coffee Shop,Pizza Place,Thai Restaurant,Sandwich Place
16,York,5,Pizza Place,Convenience Store,Grocery Store,Restaurant,Sushi Restaurant
17,Etobicoke,5,Coffee Shop,Gas Station,Grocery Store,Sandwich Place,Shopping Mall
18,Scarborough,5,Pizza Place,Coffee Shop,Restaurant,Bank,Fast Food Restaurant
23,East York,5,Sporting Goods Shop,Coffee Shop,Furniture / Home Store,Grocery Store,Sandwich Place
26,Scarborough,5,Bakery,Coffee Shop,Bank,Pharmacy,Indian Restaurant
27,North York,5,Pharmacy,Coffee Shop,Park,Bank,Recreation Center


Plot the clusters on a map to visualize their locations.

In [33]:
from matplotlib import cm
from matplotlib import colors

#create map
map_clusters = folium.Map(location=[43.6534817, -79.3839347], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighbourhood, coloured by its cluster label
for lat, lon, poi, cluster in zip(TO_merged['Latitude'], TO_merged['Longitude'], TO_merged['Borough'], TO_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.9).add_to(map_clusters)
       
map_clusters
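
In the cell above, the marker colour is looked up with an index of `cluster - 1`, so label 0 takes the last colour through Python's negative indexing. That works, but it is easy to misread. The sketch below shows a more direct label-to-colour mapping. It is an illustration only: it swaps in the standard-library `colorsys` module in place of matplotlib's `rainbow` colormap, and the helper name `cluster_palette` is hypothetical.

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colours, one per cluster label 0..k-1."""
    palette = []
    for label in range(k):
        # Spread hues evenly around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(label / k, 0.9, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

kclusters = 6  # six clusters, as in this analysis
palette = cluster_palette(kclusters)

# Direct lookup: cluster label -> colour, with no offset
colour_for = {label: palette[label] for label in range(kclusters)}
print(colour_for)
```

A marker for a given neighbourhood would then use `colour_for[int(cluster)]` for both `color` and `fill_color`.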

Examining the clusters in detail, it is evident that most of the six clusters contain various types of restaurants. Cluster number 4 in the borough of North York, however, is made up of other kinds of venues, including a park, a pool and a zoo. This makes it an appealing location for a restaurant, because these venues would generate good foot traffic.

Extract the **North York neighbourhood in cluster number 4** that has these specific types of venues. Refer to the dataframe with the cluster number for each neighbourhood and the top five venues in each neighbourhood.

In [34]:
TO_merged.iloc[48]

Postal Code                                   M2L
Borough                                North York
Neighbourhood            York Mills, Silver Hills
Latitude                                  43.7575
Longitude                                -79.3747
Cluster Labels                                  4
1st Most Common Venue                        Park
2nd Most Common Venue                        Pool
3rd Most Common Venue                         Zoo
4th Most Common Venue                 Event Space
5th Most Common Venue                   Drugstore
Name: 48, dtype: object
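
The lookup above relies on the row position 48, which only stays valid while the dataframe keeps the same row order. A small sketch of a lookup by cluster label follows. The `demo` frame and its placeholder neighbourhood names are hypothetical; only the York Mills row mirrors the output shown above.

```python
import pandas as pd

# Hypothetical stand-in for TO_merged (illustration only)
demo = pd.DataFrame({
    'Neighbourhood': ['Placeholder A', 'York Mills, Silver Hills', 'Placeholder B'],
    'Cluster Labels': [3, 4, 5],
})

target_cluster = 4

# Filter on the cluster label rather than a hard-coded row position
matches = demo.loc[demo['Cluster Labels'] == target_cluster]

# Guard against the cluster containing more than one neighbourhood
assert len(matches) == 1, 'expected a single-neighbourhood cluster'
chosen = matches.iloc[0]
print(chosen['Neighbourhood'])
```

Because k-means labels can change between runs, `target_cluster` should be checked against the cluster tables each time the clustering is repeated.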

In conclusion, the neighbourhoods of **York Mills and Silver Hills are recommended as good neighbourhoods for opening an Italian restaurant**. There are two main reasons: these neighbourhoods are in a pocket of Toronto with very little competition in the restaurant industry, and there are attractions that bring visitors into the neighbourhoods.

The map below highlights the location of these neighbourhoods and makes it easier to visualize the relative distance of these two neighbourhoods from other neighbourhoods in Toronto that have many dining establishments.

In [35]:
# Create a feature group for York Mills and Silver Hills
YorkMills_SilverHills = folium.FeatureGroup()

# Add a yellow circle marker at the neighbourhood coordinates
YorkMills_SilverHills.add_child(
    folium.CircleMarker(
        [43.7575, -79.3747],
        radius=12,
        color='yellow',
        fill_color='yellow'
    )
)

# Add the feature group to the map
map_clusters.add_child(YorkMills_SilverHills)

# Label the marker
folium.Marker([43.7575, -79.3747],
              popup='York Mills, Silver Hills').add_to(map_clusters)
map_clusters
