# The Battle of Neighborhoods Part 3

#### This is the final project notebook for the Applied Data Science Capstone course by IBM. It is used mainly for the capstone project.

## Week 04 Assignment:

## REPORT:

#### Business Problem:
The business problem addressed in this notebook is the following: if a person wants to open a new coffee shop, what does he have to look into before opening it? By analyzing and exploring all of the neighborhoods in the boroughs North York, East York and York in the city of Vaughan, he can get useful insights about the venues present in each neighborhood. If he finds a neighborhood where no coffee shop is currently present, he could try to establish one there. In this case, the stakeholders are the owner himself and the people in the neighborhoods. As the owner of the coffee shop, he wants to make a profit from it and therefore needs to analyze all the neighborhoods near the city, so he is the internal stakeholder. The customers are the consumers: the popularity and prosperity of the business will depend heavily on whether they like the coffee shop and the service provided by its employees. The customers are therefore the external stakeholders of the business.

#### Data:
The dataset that I am working on is the neighborhood data of Canada, organized by postal code. It has been taken from the Wikipedia page [Canada Postal codes](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M). To scrape the webpage, I have used the "beautifulsoup4" library. The dataset consists of three columns: PostalCode (the postal code of each neighborhood), Borough (the borough in which the neighborhood is situated), and Neighborhood (the name of the neighborhood).
To explore the venues in each neighborhood (coffee shops, parks, restaurants and others), the Foursquare API has been used. The Foursquare API needs the latitude and longitude of each neighborhood; these values are collected from this [website](http://cocl.us/Geospatial_data).
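
The three-column layout described above can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the notebook's method: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages, and `SAMPLE_HTML` and `CellCollector` are hypothetical names for a small inline sample shaped like the Wikipedia table, not the live page.

```python
from html.parser import HTMLParser

# Small inline sample shaped like the Wikipedia table (not the live page).
SAMPLE_HTML = """
<table class="wikitable">
<tr><th>Postal code</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>M1A
</td><td>Not assigned
</td><td>
</td></tr>
<tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
<tr><td>M4A
</td><td>North York
</td><td>Victoria Village
</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # the header row holds only <th> cells, so it stays empty
                self.rows.append(tuple(self._row))
            self._row = None


parser = CellCollector()
parser.feed(SAMPLE_HTML)
rows = parser.rows  # one (PostalCode, Borough, Neighborhood) tuple per table row
```

Each tuple in `rows` corresponds to one row of the dataframe built in Part 01; an empty neighborhood cell comes through as an empty string.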

### Part 01: Generating the data

In [1]:
!pip install beautifulsoup4



In [2]:
!pip install lxml



In [3]:
!pip install requests



In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [5]:
from bs4 import BeautifulSoup
import requests

In [6]:
#Getting the source data from wikipedia page
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [7]:
#Using BeautifulSoup4 to read the data
soup = BeautifulSoup(source, 'lxml')

In [8]:
#print(soup.prettify())

In [9]:
#Capturing the data table
table = soup.find("table", attrs={"class":"wikitable"})

In [10]:
table

<table class="wikitable">
<tbody><tr>
<th>Postal code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park / Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor / Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park / Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern / Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3B
</td>
<td>North York
</td>
<td>Don Mills
</td></tr>
<tr>
<td>M4B
</td>
<td>East York
<

In [11]:
#Data generation from the table: collect the text of every cell, row by row
data = []
for i in table.find_all('tr'):
    for j in i.find_all('td'):
        data.append(j.text.rstrip())

#Every row has three cells, so each column is every third element of the flat list
PostalCode = data[0::3]
Borough = data[1::3]
Neighborhood = data[2::3]

In [12]:
#Zipping all those lists to a particular list
List = list(zip(PostalCode, Borough, Neighborhood))

#Creating the dataframe
columns = ['PostalCode', 'Borough', 'Neighborhood']
df = pd.DataFrame(data=List, columns=columns)

In [13]:
#Checking the head of the dataframe
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge


In [14]:
#Discarding the rows where Borough is not assigned
df = df.loc[df['Borough']!='Not assigned',:]

In [15]:
#Checking the head again
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge
11,M3B,North York,Don Mills
12,M4B,East York,Parkview Hill / Woodbine Gardens
13,M5B,Downtown Toronto,"Garden District, Ryerson"


In [16]:
#Assigning Neighborhood as the same as Borough where Neighborhood is not present
df['Neighborhood'] = df['Neighborhood'].mask(df['Neighborhood'] == '', df['Borough'])

In [17]:
#Shape of the dataframe
df.shape

(103, 3)
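
The two cleaning rules applied above (drop rows whose borough is "Not assigned", and fall back to the borough name when the neighborhood is empty) can be sketched in plain Python. This is only an illustration: `clean_rows` is a hypothetical helper, and the last sample row is invented to exercise the fallback rule.

```python
# Rows in (PostalCode, Borough, Neighborhood) form, copied from the table output
# above, plus one hypothetical row that has a borough but no neighborhood.
raw_rows = [
    ("M1A", "Not assigned", ""),
    ("M3A", "North York", "Parkwoods"),
    ("M4A", "North York", "Victoria Village"),
    ("M8A", "Not assigned", ""),
    ("M9A", "Etobicoke", "Islington Avenue"),
    ("M0X", "Example Borough", ""),  # hypothetical row, not in the dataset
]


def clean_rows(rows):
    """Drop unassigned boroughs and fill empty neighborhoods with the borough."""
    cleaned = []
    for postal_code, borough, neighborhood in rows:
        if borough == "Not assigned":
            continue  # rule 1: discard rows without a borough
        # rule 2: an empty neighborhood takes the borough name
        cleaned.append((postal_code, borough, neighborhood or borough))
    return cleaned


cleaned = clean_rows(raw_rows)
```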

### Part 02: Adding Latitude and Longitude of the Neighborhoods to the dataframe

In [18]:
from geopy.geocoders import Nominatim
!pip install geocoder
import geocoder
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium

print('Libraries imported.')


In [19]:
#Reading coordinate data
coord = "http://cocl.us/Geospatial_data"
coord_df = pd.read_csv(coord)

In [20]:
coord_df.head()
#coord_df.shape

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [21]:
coord_df.rename(columns={"Postal Code":"PostalCode"},inplace=True)

#Merging Latitude and Longitude values into the dataframe on PostalCode
df = df.merge(coord_df, on='PostalCode')

In [22]:
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,Parkview Hill / Woodbine Gardens,43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
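
The merge on `PostalCode` is an inner join by default, so a postal code that is missing from the coordinate file would be dropped without any warning. A minimal plain-Python sketch of the same idea follows; `inner_join` is a hypothetical helper, the coordinates are copied from the outputs above, and the `M0X` row is invented to show what an unmatched code looks like.

```python
# (PostalCode, Borough, Neighborhood) rows; the last one is hypothetical.
neighborhoods = [
    ("M3A", "North York", "Parkwoods"),
    ("M4A", "North York", "Victoria Village"),
    ("M1B", "Scarborough", "Malvern / Rouge"),
    ("M0X", "Example Borough", "Example Neighborhood"),  # no coordinates available
]

# PostalCode -> (Latitude, Longitude), values copied from the tables above.
coordinates = {
    "M1B": (43.806686, -79.194353),
    "M3A": (43.753259, -79.329656),
    "M4A": (43.725882, -79.315572),
}


def inner_join(rows, coords):
    """Keep only rows whose postal code has coordinates, and append them."""
    joined = []
    for postal_code, borough, neighborhood in rows:
        if postal_code in coords:
            lat, lng = coords[postal_code]
            joined.append((postal_code, borough, neighborhood, lat, lng))
    return joined


merged = inner_join(neighborhoods, coordinates)

# Postal codes that the join silently drops; worth checking before moving on.
unmatched = [row[0] for row in neighborhoods if row[0] not in coordinates]
```

Comparing the row count before and after the merge is a quick way to confirm that nothing was lost.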


### Part 03: Using Geopy and Folium library to generate and explore the neighborhoods and boroughs of Toronto

In [23]:
#Using Geopy library to get Latitude and Longitude of Toronto
address = 'Toronto'

geolocator = Nominatim(user_agent="tr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [24]:
# Creating a map of Neighborhoods using latitude and longitude values of Boroughs in Toronto
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# Adding markers to the map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [37]:
York_data = df[df['Borough'].str.contains('York')].reset_index(drop=True)
York_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
3,M3B,North York,Don Mills,43.745906,-79.352188
4,M4B,East York,Parkview Hill / Woodbine Gardens,43.706397,-79.309937


In [26]:
address = 'Vaughan'

geolocator = Nominatim(user_agent="yr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Vaughan are 43.7941544, -79.5268023.


In [27]:
# Creating a map of the Neighborhoods in the York boroughs (North York, East York and York), centered on the coordinates obtained above
map_york = folium.Map(location=[latitude, longitude], zoom_start=10)

# Adding markers to the map
for lat, lng, borough, neighborhood in zip(York_data['Latitude'], York_data['Longitude'], York_data['Borough'], York_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_york)  
    
map_york

### Part 04: Using Foursquare API

In [28]:
#Defining Foursquare API Client ID, secret key, and version
#The credentials are read from environment variables (the variable names are this notebook's own choice)
#so that the secret is neither stored in nor printed by the notebook
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


#### Getting nearby venues of the neighborhoods using Foursquare API

In [32]:
import json
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

In [30]:
# Get data of first neighborhood and use Foursquare API to get some insight of the venues of the neighborhood
neighborhood_latitude = York_data['Latitude'][0] 
neighborhood_longitude = York_data['Longitude'][0] 

neighborhood_name = York_data['Neighborhood'][0] 

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


In [35]:
# Set up the API URL to explore venues near Parkwoods
LIMIT = 150
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT)
neighborhood_json = requests.get(url).json()

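# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the dataframe
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # a venue may carry no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']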
    
venues = neighborhood_json['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114
2,TTC stop - 44 Valley Woods,Bus Stop,43.755402,-79.333741


##### By exploring Parkwoods, we can see that it does not have any coffee shop in the neighborhood.
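
To get a feel for what `radius = 500` means, the distance between the Parkwoods coordinates and each returned venue can be estimated with the haversine formula. This is a rough sketch: `haversine_m` is a hypothetical helper, the Earth is treated as a sphere, and how Foursquare itself measures distance is not covered by this notebook, so the numbers are only an approximate sanity check.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Coordinates copied from the outputs above.
parkwoods = (43.7532586, -79.3296565)
venues = {
    "Brookbanks Park": (43.751976, -79.33214),
    "Variety Store": (43.751974, -79.333114),
    "TTC stop - 44 Valley Woods": (43.755402, -79.333741),
}

distances = {name: haversine_m(*parkwoods, lat, lng) for name, (lat, lng) in venues.items()}
within_radius = {name: d <= 500 for name, d in distances.items()}
```

A neighborhood whose center lies far from its commercial streets can return few venues at this radius, which is worth keeping in mind when reading "no coffee shop" results.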

### Part 05: Exploring all of the neighborhoods of Vaughan (Boroughs: North York, East York and York)

In [38]:
#Function to get nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 150
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
          
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [41]:
#Generating the venues of Vaughan and printing the neighborhoods
print("Neighborhoods of Vaughan:")
York_venues = getNearbyVenues(names=York_data['Neighborhood'],
                                   latitudes=York_data['Latitude'],
                                   longitudes=York_data['Longitude']
                                  )

Neighborhoods of Vaughan:
Parkwoods
Victoria Village
Lawrence Manor / Lawrence Heights
Don Mills
Parkview Hill / Woodbine Gardens
Glencairn
Don Mills
Woodbine Heights
Humewood-Cedarvale
Caledonia-Fairbanks
Leaside
Hillcrest Village
Bathurst Manor / Wilson Heights / Downsview North
Thorncliffe Park
Fairview / Henry Farm / Oriole
Northwood Park / York University
East Toronto
Bayview Village
Downsview
York Mills / Silver Hills
Downsview
North Park / Maple Leaf Park / Upwood Park
Humber Summit
Willowdale / Newtonbrook
Downsview
Bedford Park / Lawrence Manor East
Del Ray / Mount Dennis / Keelsdale and Silverthorn
Humberlea / Emery
Willowdale
Downsview
Runnymede / The Junction North
Weston
York Mills West
Willowdale
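
The request URL that `getNearbyVenues` assembles with `str.format` could also be built from a dictionary of parameters. The sketch below uses `urllib.parse.urlencode` from the standard library; `build_explore_url` is a hypothetical helper, the credentials are placeholders, and it is an assumption that the API accepts the percent-encoded comma in `ll` just as it accepts the literal one.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=150):
    """Build the explore URL from named parameters instead of positional format slots."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes reserved characters and joins the pairs with '&'
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials; the coordinates are those of Parkwoods.
url = build_explore_url("DEMO_ID", "DEMO_SECRET", "20180605", 43.7532586, -79.3296565)

# Decoding the query string again shows which value ended up in which parameter.
query = parse_qs(urlsplit(url).query)
```

Naming each parameter makes it harder to swap two positional arguments by accident, which is easy to do with seven `{}` slots.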


#### Checking the size of the resulting dataframe

In [43]:
print(York_venues.shape)
York_venues.head()

(337, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
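
Inside `getNearbyVenues`, the expression `v['venue']['categories'][0]['name']` raises an `IndexError` if a venue comes back with an empty category list. The earlier `get_category_type` function guards against that case, which suggests it can occur, although this notebook does not show such a response. A defensive version could look like the sketch below; `first_category_name` and the sample items are hypothetical and only mimic the shape of the entries the notebook iterates over.

```python
def first_category_name(item, default=None):
    """Return the name of the first category of a venue item, or a default."""
    categories = item.get("venue", {}).get("categories") or []
    if not categories:
        return default  # missing key or empty list
    return categories[0].get("name", default)


# Hypothetical items shaped like the entries in results.
items = [
    {"venue": {"name": "Tim Hortons", "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabelled Venue", "categories": []}},
    {"venue": {"name": "No Category Key"}},
]

categories = [first_category_name(item, default="Unknown") for item in items]
```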


#### There are 337 venues in the neighborhoods of Vaughan.

#### Checking how many venues were returned for each neighborhood

In [80]:
York_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Bathurst Manor / Wilson Heights / Downsview North,20,20,20,20,20,20
Bayview Village,4,4,4,4,4,4
Bedford Park / Lawrence Manor East,23,23,23,23,23,23
Caledonia-Fairbanks,4,4,4,4,4,4
Del Ray / Mount Dennis / Keelsdale and Silverthorn,4,4,4,4,4,4
Don Mills,25,25,25,25,25,25
Downsview,13,13,13,13,13,13
East Toronto,4,4,4,4,4,4
Fairview / Henry Farm / Oriole,66,66,66,66,66,66
Glencairn,4,4,4,4,4,4


#### Finding out how many unique categories can be curated from all the returned venues

In [82]:
print('There are {} unique categories.'.format(len(York_venues['Venue Category'].unique())))

There are 121 unique categories.


#### Checking out the neighborhoods containing coffee shops

In [47]:
coffee_shop_neighborhoods = York_venues[York_venues['Venue Category']=='Coffee Shop'].reset_index(drop=True)
coffee_shop_neighborhoods

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
1,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,Tim Hortons,43.719427,-79.467995,Coffee Shop
2,Don Mills,43.7259,-79.340923,Tim Hortons,43.722897,-79.339117,Coffee Shop
3,Don Mills,43.7259,-79.340923,Delimark Cafe,43.727536,-79.339547,Coffee Shop
4,Leaside,43.70906,-79.363452,Aroma Espresso Bar,43.705611,-79.360775,Coffee Shop
5,Leaside,43.70906,-79.363452,Tim Hortons,43.705629,-79.361028,Coffee Shop
6,Leaside,43.70906,-79.363452,Starbucks,43.706564,-79.359591,Coffee Shop
7,Bathurst Manor / Wilson Heights / Downsview North,43.754328,-79.442259,Starbucks,43.755703,-79.440483,Coffee Shop
8,Bathurst Manor / Wilson Heights / Downsview North,43.754328,-79.442259,Tim Hortons,43.754767,-79.44325,Coffee Shop
9,Thorncliffe Park,43.705369,-79.349372,Tim Hortons,43.70509,-79.350545,Coffee Shop


#### Getting all the neighborhoods that do not have any coffee shops in them

In [76]:
#Getting the neighborhoods that have coffee shops
neigh_coffeeS = list(coffee_shop_neighborhoods.Neighborhood.unique())
neigh_coffeeS

['Victoria Village',
 'Lawrence Manor / Lawrence Heights',
 'Don Mills',
 'Leaside',
 'Bathurst Manor / Wilson Heights / Downsview North',
 'Thorncliffe Park',
 'Fairview / Henry Farm / Oriole',
 'Northwood Park / York University',
 'East Toronto',
 'Bedford Park / Lawrence Manor East',
 'Willowdale']

In [77]:
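#Getting the neighborhoods that have at least one venue that is not a coffee shop
#(neighborhoods that also have a coffee shop are still included here and are removed in the next step)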
temp = York_venues[York_venues['Venue Category']!='Coffee Shop'].reset_index(drop=True)
neigh_not_coffeeS = list(temp.Neighborhood.unique())
neigh_not_coffeeS

['Parkwoods',
 'Victoria Village',
 'Lawrence Manor / Lawrence Heights',
 'Don Mills',
 'Parkview Hill / Woodbine Gardens',
 'Glencairn',
 'Woodbine Heights',
 'Humewood-Cedarvale',
 'Caledonia-Fairbanks',
 'Leaside',
 'Hillcrest Village',
 'Bathurst Manor / Wilson Heights / Downsview North',
 'Thorncliffe Park',
 'Fairview / Henry Farm / Oriole',
 'Northwood Park / York University',
 'East Toronto',
 'Bayview Village',
 'Downsview',
 'North Park / Maple Leaf Park / Upwood Park',
 'Humber Summit',
 'Bedford Park / Lawrence Manor East',
 'Del Ray / Mount Dennis / Keelsdale and Silverthorn',
 'Humberlea / Emery',
 'Willowdale',
 'Runnymede / The Junction North',
 'Weston',
 'York Mills West']

In [73]:
temp_set = set(neigh_not_coffeeS) - set(neigh_coffeeS)

In [79]:
#Neighborhoods that do not have any coffee shops in them
temp_list = list(temp_set)
temp_list

['Bayview Village',
 'Del Ray / Mount Dennis / Keelsdale and Silverthorn',
 'Parkview Hill / Woodbine Gardens',
 'Weston',
 'Humberlea / Emery',
 'Glencairn',
 'Woodbine Heights',
 'Parkwoods',
 'York Mills West',
 'Caledonia-Fairbanks',
 'Downsview',
 'Runnymede / The Junction North',
 'Humber Summit',
 'Hillcrest Village',
 'Humewood-Cedarvale',
 'North Park / Maple Leaf Park / Upwood Park']
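
The selection above is a set difference, and it only sees neighborhoods that appear in `York_venues`. A neighborhood for which the API returned no venues is in neither list; for example, 'York Mills / Silver Hills' is printed during the API loop but shows up in neither list above. The sketch below separates the two cases on a reduced, partly simplified sample (the variable names are hypothetical, and the sample keeps only a few of the venue rows).

```python
# Neighborhoods that were queried (a small subset of the names printed above).
all_neighborhoods = ["Parkwoods", "Victoria Village", "Don Mills", "York Mills / Silver Hills"]

# (Neighborhood, Venue Category) pairs; a reduced sample of the venue table.
venue_rows = [
    ("Parkwoods", "Park"),
    ("Parkwoods", "Bus Stop"),
    ("Victoria Village", "Hockey Arena"),
    ("Victoria Village", "Coffee Shop"),
    ("Don Mills", "Coffee Shop"),
]

with_coffee = {hood for hood, category in venue_rows if category == "Coffee Shop"}
with_any_venue = {hood for hood, _ in venue_rows}

# Venues were returned, but none of them is a coffee shop.
no_coffee = sorted(with_any_venue - with_coffee)

# No venues were returned at all, so the data says nothing about coffee shops here.
no_venues_returned = sorted(set(all_neighborhoods) - with_any_venue)
```

Whether the second group should count as "no coffee shop" or as "no data" is a judgment call; the sketch only makes the distinction visible.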

In [86]:
#Converting the list to a dataframe
temp_neigh = pd.DataFrame(data=temp_list,columns=['Neighborhood'])

In [90]:
#Getting all the venues of the neighborhoods without a coffee shop
neigh_no_coffeeShops = pd.merge(York_venues, temp_neigh, on=['Neighborhood'], how='inner')
neigh_no_coffeeShops.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
3,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,Jawny Bakers,43.705783,-79.312913,Gastropub
4,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,East York Gymnastics,43.710654,-79.309279,Gym / Fitness Center


### Part 06: More analysis on the neighborhoods without any Coffee shops

In [102]:
# one hot encoding
neigh_no_coffeeShops_onehot = pd.get_dummies(neigh_no_coffeeShops[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
neigh_no_coffeeShops_onehot['Neighborhood'] = neigh_no_coffeeShops['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [neigh_no_coffeeShops_onehot.columns[-1]] + list(neigh_no_coffeeShops_onehot.columns[:-1])
neigh_no_coffeeShops_onehot = neigh_no_coffeeShops_onehot[fixed_columns]

neigh_no_coffeeShops_onehot.head()

Unnamed: 0,Neighborhood,Airport,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Beer Store,Brewery,Bus Line,...,Pharmacy,Pizza Place,Pool,Pub,Restaurant,Sandwich Place,Skating Rink,Spa,Trail,Women's Store
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Parkview Hill / Woodbine Gardens,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Parkview Hill / Woodbine Gardens,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Let's see the size of the new dataframe

In [104]:
neigh_no_coffeeShops_onehot.shape

(81, 49)

#### Grouping rows by neighborhood and taking the mean of the one-hot columns, which gives the frequency of occurrence of each category

In [105]:
neigh_no_coffeeShops_grouped = neigh_no_coffeeShops_onehot.groupby('Neighborhood').mean().reset_index()
neigh_no_coffeeShops_grouped

Unnamed: 0,Neighborhood,Airport,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Beer Store,Brewery,Bus Line,...,Pharmacy,Pizza Place,Pool,Pub,Restaurant,Sandwich Place,Skating Rink,Spa,Trail,Women's Store
0,Bayview Village,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Caledonia-Fairbanks,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25
2,Del Ray / Mount Dennis / Keelsdale and Silvert...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.25,0.25,0.0,0.0,0.0
3,Downsview,0.076923,0.076923,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Glencairn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
5,Hillcrest Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Humber Summit,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Humberlea / Emery,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Humewood-Cedarvale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0
9,North Park / Maple Leaf Park / Upwood Park,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [106]:
#Let's see the shape of the new dataframe
neigh_no_coffeeShops_grouped.shape

(16, 49)
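
Taking the mean of one-hot columns per neighborhood is the same as dividing the number of venues in each category by the total number of venues in that neighborhood. The sketch below reproduces that with `collections.Counter`; `category_frequencies` is a hypothetical helper, and the rows are taken from the Parkwoods output and from the four categories that the grouped results report for Bayview Village.

```python
from collections import Counter, defaultdict

# (Neighborhood, Venue Category) pairs taken from the outputs in this notebook.
venue_rows = [
    ("Parkwoods", "Park"),
    ("Parkwoods", "Food & Drink Shop"),
    ("Parkwoods", "Bus Stop"),
    ("Bayview Village", "Bank"),
    ("Bayview Village", "Japanese Restaurant"),
    ("Bayview Village", "Café"),
    ("Bayview Village", "Chinese Restaurant"),
]

# One Counter of categories per neighborhood.
counts = defaultdict(Counter)
for hood, category in venue_rows:
    counts[hood][category] += 1


def category_frequencies(hood):
    """Share of each venue category among all venues of one neighborhood."""
    total = sum(counts[hood].values())
    return {category: n / total for category, n in counts[hood].items()}


bayview = category_frequencies("Bayview Village")
parkwoods = category_frequencies("Parkwoods")
```

With four venues in four different categories, every Bayview Village frequency is 0.25, in line with the value the grouped dataframe shows for Bank.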

#### Finding each neighborhood along with the top 5 most common venues

In [108]:
num_top_venues = 5

for hood in neigh_no_coffeeShops_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = neigh_no_coffeeShops_grouped[neigh_no_coffeeShops_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bayview Village----
                 venue  freq
0                 Bank  0.25
1  Japanese Restaurant  0.25
2                 Café  0.25
3   Chinese Restaurant  0.25
4              Airport  0.00


----Caledonia-Fairbanks----
           venue  freq
0           Park  0.50
1         Market  0.25
2  Women's Store  0.25
3            Spa  0.00
4          Trail  0.00


----Del Ray / Mount Dennis / Keelsdale and Silverthorn----
                  venue  freq
0  Fast Food Restaurant  0.25
1          Skating Rink  0.25
2        Sandwich Place  0.25
3            Restaurant  0.25
4               Airport  0.00


----Downsview----
                venue  freq
0       Grocery Store  0.23
1                Park  0.15
2             Airport  0.08
3    Business Service  0.08
4  Athletics & Sports  0.08


----Glencairn----
                      venue  freq
0                      Park  0.25
1                       Pub  0.25
2               Pizza Place  0.25
3       Japanese Restaurant  0.25
4  Mediterranea