# Capstone Project - Battle of the Neighborhood

## Introduction/Business Problem

Suppose you are an entrepreneur and you are looking to open a Pub in Toronto, Ontario. But you cannot decide the location of the pub because the location of the pub has a significant impact on the expected returns.
 * You want to open the Pub in a location where the business would be profitable, where there are many customers. So a populated spot would be perfect to open this pub. 
 * Also you would want a place where there few to none competition i.e. you don't want to place the pub in the immediate proximity of existing ones.

In order to answer this question, we would have to build a model to get recommendations on where to start your business.

## Data

**A description of the data**: The data used to solve this problem is "List of postal codes of Canada" data collected from [Wikipedia](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M). Data is a single dataframe, containing the postal codes, boroughs and the neighborhoods in Toronto, Ontario.
 
 |Postal Code             |Borough                   |Neighborhoood                                     |
 |------------------------|--------------------------|--------------------------------------------------|
 |M1B                     |Scarborough              |Malvern, Rouge                                     |
 |M1C                     |Scarborough              |Rouge Hill, Port Union, Highland Creek             |
 |M3A                     |North York                |Parkwoods                                         |
 |M4A                     |North York                |Victoria Village                                  |
 |M5A                     |Downtown Toronto          |Regent Park, Harbourfront                         |
 
The locations of the neighborhoods are collected by using python's `geocoder` package to get location information. Now the data contains `Latitude` and `Longitude` along with `Postal Code`, `Borough`, `Neighborhood`. `Latitude` and `Longitude` is absolutely necessary to get the venues from the Foursquare api. **Example** of the data after using `gecoder`:
 
 |Postal Code        |Borough            |Neighborhood                |Latitude          |Longitude        |
 |-------------------|-------------------|----------------------------|------------------|-----------------|
 |M1B                |Scarborough        |Malvern, Rouge              |43.806686         |-79.194353       |
 |M1C                |Scarborough        |Rouge Hill, Port Union, Highland Creek|43.784535|-79.160497      |
 |M1E                |Scarborough        |Guildwood, Morningside, West Hill|43.763573    |-79.188711       |
 |M1G                |Scarborough        |Woburn                      |43.770992         |-79.216917       |
 |M1H                |Scarborough        |Cedarbrae                   |43.773136         |-79.239476       |

## Let's get to coding 

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [1]:
import numpy as np # Library to handle data in a vectorized manner

import pandas as pd # Library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # Library to handle JSON files

from geopy.geocoders import Nominatim  # Convert an address into latitude and longitude values.

import requests # Library to handle modules
from pandas.io.json import json_normalize # # Tranform JSON file into a pandas dataframe

# Library to scrape Wikipedia page
from bs4 import BeautifulSoup

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import K-Means from clustering stage
from sklearn.cluster import KMeans

import folium # Map rendering library

print('Libraries imported.')

Libraries imported.


First step is to scraped Wikipedia in order to obtain the data that is in the table of postal codes and to transform the data into a pandas  dataframe.

Let's Fetch the url https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and displaying its contents.

In [2]:
fetched_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

print(fetched_url.status_code)
print(fetched_url.headers['content-type'])
print(fetched_url.encoding)
print(fetched_url.text)

200
text/html; charset=UTF-8
UTF-8
<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>List of postal codes of Canada: M - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"X@G@ewpAMM4AA4QbBo8AAAAI","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":995657573,"wgRevisionId":995657573,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Commun

Loading it up in `BeautifulSoup` and displaying it.

In [3]:
soup = BeautifulSoup(fetched_url.text, 'lxml')

print(soup.prettify)

<bound method Tag.prettify of <!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of postal codes of Canada: M - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"X@G@ewpAMM4AA4QbBo8AAAAI","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":995657573,"wgRevisionId":995657573,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Communicati

Let's find the table first.

In [4]:
table = soup.find('table', class_='wikitable sortable')

Find all the rows.

In [5]:
rows = table.find_all('tr')

iteration_dict = { '0':'PostalCode', '1':'Borough', '2':'Neighborhood' }
table_data = { 'PostalCode':[], 'Borough':[], 'Neighborhood':[] }

for row in rows:
    row_data = row.find_all('td')
    
    iterator = 0
    for data in row_data:
        table_data[iteration_dict[str(iterator)]].append(data.text.strip('\n'))
        iterator += 1

print(table_data)

{'PostalCode': ['M1A', 'M2A', 'M3A', 'M4A', 'M5A', 'M6A', 'M7A', 'M8A', 'M9A', 'M1B', 'M2B', 'M3B', 'M4B', 'M5B', 'M6B', 'M7B', 'M8B', 'M9B', 'M1C', 'M2C', 'M3C', 'M4C', 'M5C', 'M6C', 'M7C', 'M8C', 'M9C', 'M1E', 'M2E', 'M3E', 'M4E', 'M5E', 'M6E', 'M7E', 'M8E', 'M9E', 'M1G', 'M2G', 'M3G', 'M4G', 'M5G', 'M6G', 'M7G', 'M8G', 'M9G', 'M1H', 'M2H', 'M3H', 'M4H', 'M5H', 'M6H', 'M7H', 'M8H', 'M9H', 'M1J', 'M2J', 'M3J', 'M4J', 'M5J', 'M6J', 'M7J', 'M8J', 'M9J', 'M1K', 'M2K', 'M3K', 'M4K', 'M5K', 'M6K', 'M7K', 'M8K', 'M9K', 'M1L', 'M2L', 'M3L', 'M4L', 'M5L', 'M6L', 'M7L', 'M8L', 'M9L', 'M1M', 'M2M', 'M3M', 'M4M', 'M5M', 'M6M', 'M7M', 'M8M', 'M9M', 'M1N', 'M2N', 'M3N', 'M4N', 'M5N', 'M6N', 'M7N', 'M8N', 'M9N', 'M1P', 'M2P', 'M3P', 'M4P', 'M5P', 'M6P', 'M7P', 'M8P', 'M9P', 'M1R', 'M2R', 'M3R', 'M4R', 'M5R', 'M6R', 'M7R', 'M8R', 'M9R', 'M1S', 'M2S', 'M3S', 'M4S', 'M5S', 'M6S', 'M7S', 'M8S', 'M9S', 'M1T', 'M2T', 'M3T', 'M4T', 'M5T', 'M6T', 'M7T', 'M8T', 'M9T', 'M1V', 'M2V', 'M3V', 'M4V', 'M5V', 'M6V

Now let's turn the `table_data` dictionary into a pandas dataframe.To create the above dataframe:

In [6]:
toronto = pd.DataFrame.from_dict(table_data)

toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Now we have to wrangle the data. First let's get some info.

In [7]:
toronto.info

<bound method DataFrame.info of     PostalCode           Borough  \
0          M1A      Not assigned   
1          M2A      Not assigned   
2          M3A        North York   
3          M4A        North York   
4          M5A  Downtown Toronto   
5          M6A        North York   
6          M7A  Downtown Toronto   
7          M8A      Not assigned   
8          M9A         Etobicoke   
9          M1B       Scarborough   
10         M2B      Not assigned   
11         M3B        North York   
12         M4B         East York   
13         M5B  Downtown Toronto   
14         M6B        North York   
15         M7B      Not assigned   
16         M8B      Not assigned   
17         M9B         Etobicoke   
18         M1C       Scarborough   
19         M2C      Not assigned   
20         M3C        North York   
21         M4C         East York   
22         M5C  Downtown Toronto   
23         M6C              York   
24         M7C      Not assigned   
25         M8C      Not assigned

First let's drop the rows with a borough that is `Not assigned`.

In [8]:
toronto['Borough'].value_counts()

Not assigned        77
North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
East Toronto         5
East York            5
York                 5
Mississauga          1
Name: Borough, dtype: int64

In [9]:
toronto = toronto[toronto.Borough != 'Not assigned']
toronto.reset_index(drop=True, inplace=True)

toronto.Borough.value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
York                 5
East York            5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

Next let's replace the cells that has a borough but a **Not assigned** neighborhood with the same name as the borough.

In [10]:
toronto['Neighborhood'].replace({'Not assigned':toronto.Borough}, inplace=True)

toronto.Neighborhood.value_counts()

Downsview                                                                                                                                 4
Don Mills                                                                                                                                 2
Bedford Park, Lawrence Manor East                                                                                                         1
Lawrence Park                                                                                                                             1
University of Toronto, Harbord                                                                                                            1
Little Portugal, Trinity                                                                                                                  1
Canada Post Gateway Processing Centre                                                                                                     1
Golden Mile, Clairle

More than one neighborhood can exist in one postal code area. These rows will be combined into one row with the neighborhoods separated with a comma.

In [11]:
toronto = toronto.groupby(['PostalCode','Borough'])['Neighborhood'].apply(', '.join).reset_index()

toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [12]:
toronto.describe()

Unnamed: 0,PostalCode,Borough,Neighborhood
count,103,103,103
unique,103,10,99
top,M4S,North York,Downsview
freq,1,24,4


Let's print the number of rows of the dataframee using the `.shape` method.

In [13]:
toronto.shape

(103, 3)

Using geocoder with google service results OVER_QUERY_LIMIT: Keyless access to Google Maps Platform is deprecated. Please use an API key with all your API calls to avoid service interruption. We are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, so let's use http://cocl.us/Geospatial_data this link to a csv file that has the geographical coordinates of each postal code.

In [14]:
geo_data = pd.read_csv('Geospatial_Coordinates.csv')

geo_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Let's merge *toronto.csv* and *geo_data.csv*.

In [15]:
geo_data.columns = ['PostalCode', 'Latitude', 'Longitude']

toronto_full = pd.merge(toronto, geo_data, on='PostalCode')
toronto_full.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


Let's check if the merging was successful.

In [16]:
toronto_full[toronto_full['PostalCode'] == 'M6A']

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
71,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763


In [17]:
toronto_full.shape

(103, 5)

Let's save the dataframe in a **.csv** file.

In [18]:
toronto_full.to_csv('toronto_full.csv', index=False)

In [19]:
toronto = toronto_full

Let's make sure that the dataset has all the boroughs and neighborhoods.

In [20]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
    len(toronto['Borough'].unique()),
    toronto.shape[0]
))

The dataframe has 10 boroughs and 103 neighborhoods.


Let's use geopy library to get the latitude and longitude values of Toronto Ontario.

In order to define and instance of the geocoder, we need to define a user_agent.

In [21]:
address = 'Toronto Ontario, TO'

geolocator = Nominatim(user_agent='to_explorer')
location = geolocator.geocode(address)

latitude = location.latitude
longitude = location.longitude

print('The geographical coordinate of Toronto Ontario are {}, {}'.format(latitude, longitude))

The geographical coordinate of Toronto Ontario are 43.65238435, -79.38356765


Let's create a map of Toronto Ontario with neighborhoods superimposed on top.

In [22]:
# Create a map of Toronto Ontario using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add marker to map
for lat, lng, borough, neighborhood in zip(
    toronto['Latitude'], toronto['Longitude'],
    toronto['Borough'], toronto['Neighborhood']):
    
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False
    ).add_to(map_toronto)

map_toronto

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

Let's define Foursquare Credentials and Version first.

In [23]:
CLIENT_ID = 'YRRT2YCRSFWA0PMLLE10DSQYHT2OXYFYK4R5K3VIXVVCF2B1' # Foursqare ID
CLIENT_SECRET = 'ZIH2BX40DGBPDLZKWRLTFROYNZL1NWJKLE4YR0IRZ1VDSEPY' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your Credentials: ')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your Credentials: 
CLIENT_ID: YRRT2YCRSFWA0PMLLE10DSQYHT2OXYFYK4R5K3VIXVVCF2B1
CLIENT_SECRET: ZIH2BX40DGBPDLZKWRLTFROYNZL1NWJKLE4YR0IRZ1VDSEPY


Let's explore the first neighborhood in our dataframe.

In [24]:
toronto.loc[0, 'Neighborhood']

'Malvern, Rouge'

Get the neighborhood's latitude and longitude values.

In [25]:
neighborhood_latitude = toronto.loc[0, 'Latitude']
neighborhood_longitude = toronto.loc[0, 'Longitude']

neighborhood_name = toronto.loc[0, 'Neighborhood']

print('Latitude and Longitude values of {} are {}, {}.'.format(
    neighborhood_name, neighborhood_latitude, neighborhood_longitude))

Latitude and Longitude values of Malvern, Rouge are 43.806686299999996, -79.19435340000001.


Now, let's get the top 100 venues that are in Malvern, Rouge within a radius of 500 meters.

First, let's create the GET request URL.

In [26]:
LIMIT = 100 # Limit of number of venues returned by Foursquare API
radius = 500 # Define radius

# Create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    neighborhood_latitude,
    neighborhood_longitude,
    radius,
    LIMIT
)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=YRRT2YCRSFWA0PMLLE10DSQYHT2OXYFYK4R5K3VIXVVCF2B1&client_secret=ZIH2BX40DGBPDLZKWRLTFROYNZL1NWJKLE4YR0IRZ1VDSEPY&v=20180605&ll=43.806686299999996,-79.19435340000001&radius=500&limit=100'

Send the GET request and examine the results.

In [27]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fef4c56dad389062d27b976'},
  'headerLocation': 'Malvern',
  'headerFullLocation': 'Malvern, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 1,
  'suggestedBounds': {'ne': {'lat': 43.8111863045, 'lng': -79.18812958073042},
   'sw': {'lat': 43.80218629549999, 'lng': -79.2005772192696}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bb6b9446edc76b0d771311c',
       'name': 'Wendy’s',
       'location': {'crossStreet': 'Morningside & Sheppard',
        'lat': 43.80744841934756,
        'lng': -79.19905558052072,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.80744841934756,
          'lng': -79.19905558052072}],
        'distance': 387,
        'cc': 'CA',
        'city': 'Toronto',
    

Let's write a function that extracts the category of the venue.

In [28]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe:

In [29]:
# Clean the json and structure it into a pandas dataframe.
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # Flatten JSON

# Filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# Filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# Clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  after removing the cwd from sys.path.


Unnamed: 0,name,categories,lat,lng
0,Wendy’s,Fast Food Restaurant,43.807448,-79.199056


Let's check the number of venues were returned by Foursquare

In [30]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

1 venues were returned by Foursquare.


**Explore Neighborhoods in Toronto**

Let's create a function to repeat the same process to all the neighborhoods in Toronto.

In [31]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [32]:
# Get Toronto venues

toronto_venues = getNearbyVenues(names=toronto['Neighborhood'],
                                 latitudes=toronto['Latitude'],
                                 longitudes=toronto['Longitude']
                                )

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West, 

Let's check the size of the resulting dataframe.

In [33]:
print(toronto_venues.shape)
toronto_venues.head()

(2130, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Chris Effects Painting,43.784343,-79.163742,Construction & Landscaping
2,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


Let's select only those venues related to Pub.

In [34]:
column_list = ['Pub', 'Bar', 'Beer Bar', 'Beer Store', 'Cocktail Bar',
              'Gay Bar', 'Hotel Bar', 'Irish Pub', 'Liquor Store', 'Sake Bar', 'Sports Bar',
              'Wine Bar', 'Wine Shop']

toronto_pubs = toronto_venues.loc[toronto_venues['Venue Category'].isin(column_list)]
toronto_pubs = toronto_pubs.reset_index()
toronto_pubs.drop(['index', 'Neighborhood Latitude', 'Neighborhood Longitude'] , inplace=True, axis=1)

toronto_pubs.head()

Unnamed: 0,Neighborhood,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge Hill, Port Union, Highland Creek",Royal Canadian Legion,43.782533,-79.163085,Bar
1,"Fairview, Henry Farm, Oriole",LCBO,43.778955,-79.345048,Liquor Store
2,"Fairview, Henry Farm, Oriole",St. Louis Bar & Grill,43.777215,-79.345081,Bar
3,Don Mills,The Beer Store,43.726987,-79.341494,Beer Store
4,Don Mills,The Beer Store,43.722704,-79.337508,Beer Store


In [35]:
toronto_pubs.shape

(126, 5)

Let's check how many venues were returned for each neighborhood.

In [36]:
toronto_grouped_venues = toronto_pubs.groupby('Neighborhood')

toronto_grouped_venues.head()

Unnamed: 0,Neighborhood,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge Hill, Port Union, Highland Creek",Royal Canadian Legion,43.782533,-79.163085,Bar
1,"Fairview, Henry Farm, Oriole",LCBO,43.778955,-79.345048,Liquor Store
2,"Fairview, Henry Farm, Oriole",St. Louis Bar & Grill,43.777215,-79.345081,Bar
3,Don Mills,The Beer Store,43.726987,-79.341494,Beer Store
4,Don Mills,The Beer Store,43.722704,-79.337508,Beer Store
5,"Northwood Park, York University",Fox & Fiddle,43.763795,-79.488497,Bar
6,Downsview,LCBO,43.759257,-79.519454,Liquor Store
7,Woodbine Heights,The Beer Store,43.693731,-79.316759,Beer Store
8,The Beaches,Grover Pub and Grub,43.679181,-79.297215,Pub
9,Leaside,Local Leaside,43.710012,-79.363514,Sports Bar


Let's find out how many unique categories can be curated from all the returned venues.

In [37]:
print('There are {} uniques categories.'.format(len(toronto_grouped_venues['Venue Category'].unique())))

There are 41 uniques categories.


Some density based estimator is giving a good tip where to start a new coffee business. There's a `HeatMap` plugin ready in Folium, let's use that, and visualize all the existing Pubs to same map:

In [38]:
from folium import plugins

# Create a map of Toronto Ontario using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add marker to map
for lat, lng, venue, neighborhood in zip(
    toronto_pubs['Venue Latitude'], toronto_pubs['Venue Longitude'],
    toronto_pubs['Venue'], toronto_pubs['Neighborhood']):
    
    label = '{}, {}'.format(neighborhood, venue)
    label = folium.Popup(label, parse_html=True)
    
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False
    ).add_to(map_toronto)

hm_data = toronto_pubs[["Venue Latitude", "Venue Longitude"]].values.tolist()
map_toronto.add_child(plugins.HeatMap(hm_data))

map_toronto

## Results

Based on these results, possibly good locations for new Pub would be in crossroad of Queens west, Fashion district and Entertainment district:

In [39]:
lat = 43.648653
lng = -79.396451

map_toronto = folium.Map(location=[lat, lng], zoom_start=17)

folium.CircleMarker(
    [lat, lng],
    radius=15,
    popup="Our Pub!",
    color='red',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

map_toronto