# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

This notebook addresses the following business problem: what should a person look into before opening a new **coffee shop** in a city in **Canada**? By analyzing and exploring all of the neighborhoods in the **boroughs (North York, East York and York)** in the city of **Vaughan**, the prospective owner can gain useful insights about the venues present in each neighborhood. A neighborhood that currently has no coffee shop is a candidate location for a new one. The owner should also explore the adjacent neighborhoods to get a better picture of the local market.

The **stakeholders** are the **owner** and the people in the neighborhoods. The **owner** wants the coffee shop to make a profit and therefore needs to analyze all the neighborhoods near the city, which makes the owner the **internal stakeholder**. The **customers** are the **consumers**: the popularity and prosperity of the business will depend heavily on whether they like the coffee shop and the service provided by its employees. The customers are therefore the **external stakeholders** of the business.

## Data <a name="data"></a>

The dataset is the neighborhood data of Canada, organized by postal code. It was scraped from the Wikipedia page [Canada Postal codes](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) using the "beautifulsoup4" library. The dataset consists of three columns: PostalCode (the postal code of each neighborhood), Borough (the borough in which the neighborhood is situated), and Neighborhood (the name of the neighborhood).
To explore the coffee shops, parks, restaurants and other venues in each neighborhood, the Foursquare API was used. The Foursquare API requires the latitude and longitude of each neighborhood; these values were collected from this [website](http://cocl.us/Geospatial_data).

### Part 01: Generating the data

In [1]:
!pip install beautifulsoup4



In [2]:
!pip install lxml



In [3]:
!pip install requests



In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [5]:
from bs4 import BeautifulSoup
import requests

In [6]:
#Getting the source data from wikipedia page
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [7]:
#Using BeautifulSoup4 to read the data
soup = BeautifulSoup(source, 'lxml')

In [8]:
#print(soup.prettify())

In [9]:
#Capturing the data table
table = soup.find("table", attrs={"class":"wikitable"})

In [10]:
table

<table class="wikitable">
<tbody><tr>
<th>Postal code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park / Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor / Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park / Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern / Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3B
</td>
<td>North York
</td>
<td>Don Mills
</td></tr>
<tr>
<td>M4B
</td>
<td>East York
<

In [11]:
#Getting PostalCode and Borough Columns
PostalCode = []
Borough = []
Neighborhood = []

#Data generation from the table
data = []
for i in table.find_all('tr'):
    for j in i.find_all('td'):
        data.append(j.text.rstrip())

#PostalCode
for i in range(0,len(data),3):
    PostalCode.append(data[i])
#Borough
for i in range(1,len(data),3):
    Borough.append(data[i])

#Neighborhood
for i in range(2,len(data),3):
    Neighborhood.append(data[i])

#PostalCode
#Borough
#Neighborhood
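
The cell above walks the flat list of cell texts with three strided loops. As a minimal sketch of the same idea, assuming every table row contributes exactly three cells, the list can be grouped into rows first and then split into columns. The helper name `chunk_rows` and the sample values are illustrative and not part of the notebook.

```python
# Illustrative sample: a flat list of cell texts, three per table row.
cells = ["M1A", "Not assigned", "",
         "M3A", "North York", "Parkwoods",
         "M4A", "North York", "Victoria Village"]

def chunk_rows(flat, width=3):
    """Group a flat list into consecutive rows of `width` items."""
    if len(flat) % width != 0:
        raise ValueError("cell count is not a multiple of the row width")
    return [tuple(flat[i:i + width]) for i in range(0, len(flat), width)]

rows = chunk_rows(cells)

# Transpose the rows to obtain one list per column.
postal_codes, boroughs, neighborhoods = (list(col) for col in zip(*rows))
```

Grouping by row makes a malformed table fail loudly instead of silently shifting values between columns.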

In [12]:
#Zipping all those lists to a particular list
List = list(zip(PostalCode, Borough, Neighborhood))

#Creating the dataframe
columns = ['PostalCode', 'Borough', 'Neighborhood']
df = pd.DataFrame(data=List, columns=columns)

In [13]:
#Checking the head of the dataframe
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge


In [14]:
#Discarding the rows where Borough is not assigned
df = df.loc[df['Borough'] != 'Not assigned', :].copy()  #copy so later column assignments do not act on a view

In [15]:
#Checking the head again
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge
11,M3B,North York,Don Mills
12,M4B,East York,Parkview Hill / Woodbine Gardens
13,M5B,Downtown Toronto,"Garden District, Ryerson"


In [16]:
#Using the Borough name as the Neighborhood wherever the Neighborhood is empty
df['Neighborhood'] = df['Neighborhood'].mask(df['Neighborhood'] == '', df['Borough'])

In [17]:
#Shape of the dataframe
df.shape

(103, 3)
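
The two cleaning rules used above (drop rows whose borough is "Not assigned", and fall back to the borough name when the neighborhood is empty) can be sketched without pandas on plain records. This is only an illustration: the function name `clean` and the third sample row are hypothetical.

```python
records = [
    {"PostalCode": "M1A", "Borough": "Not assigned", "Neighborhood": ""},
    {"PostalCode": "M3A", "Borough": "North York", "Neighborhood": "Parkwoods"},
    # Hypothetical row with a borough but no neighborhood name.
    {"PostalCode": "M0X", "Borough": "Example Borough", "Neighborhood": ""},
]

def clean(rows):
    """Drop unassigned boroughs and fill empty neighborhoods with the borough."""
    cleaned = []
    for row in rows:
        if row["Borough"] == "Not assigned":
            continue  # rule 1: discard rows without a borough
        # rule 2: an empty neighborhood falls back to the borough name
        neighborhood = row["Neighborhood"] or row["Borough"]
        cleaned.append({**row, "Neighborhood": neighborhood})
    return cleaned

cleaned_records = clean(records)
```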

### Part 02: Adding Latitude and Longitude of the Neighborhoods to the dataframe

In [18]:
from geopy.geocoders import Nominatim
!pip install geocoder
import geocoder
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium

print('Libraries imported.')


In [19]:
#Reading coordinate data
coord = "http://cocl.us/Geospatial_data"
coord_df = pd.read_csv(coord)

In [20]:
coord_df.head()
#coord_df.shape

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [21]:
coord_df.rename(columns={"Postal Code":"PostalCode"},inplace=True)

#Merging the latitude and longitude values into the dataframe on PostalCode
df = df.merge(coord_df, on='PostalCode')
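
The merge on `PostalCode` behaves like an inner join: only postal codes present in both tables survive. A minimal sketch of that behavior with a dictionary lookup follows. The coordinates are taken from the output shown in this notebook, while the `M0X` row is a hypothetical postal code with no coordinates.

```python
neighborhood_rows = [
    ("M3A", "North York", "Parkwoods"),
    ("M4A", "North York", "Victoria Village"),
    ("M0X", "Example Borough", "Example"),  # hypothetical: no coordinates available
]

# Postal code -> (latitude, longitude)
coords = {
    "M3A": (43.753259, -79.329656),
    "M4A": (43.725882, -79.315572),
}

# Inner join: keep a row only when its postal code has coordinates.
merged = [row + coords[row[0]] for row in neighborhood_rows if row[0] in coords]
```

Comparing the row count before and after the join is a quick way to notice postal codes that were dropped.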

In [22]:
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,Parkview Hill / Woodbine Gardens,43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


### Part 03: Using Geopy and Folium library to generate and explore the neighborhoods and boroughs of Toronto

In [23]:
#Using Geopy library to get Latitude and Longitude of Toronto
address = 'Toronto'

geolocator = Nominatim(user_agent="tr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [24]:
# Creating a map of Neighborhoods using latitude and longitude values of Boroughs in Toronto
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# Adding markers to the map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [25]:
York_data = df[df['Borough'].str.contains('York')].reset_index(drop=True)
York_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
3,M3B,North York,Don Mills,43.745906,-79.352188
4,M4B,East York,Parkview Hill / Woodbine Gardens,43.706397,-79.309937


In [26]:
address = 'Vaughan'

geolocator = Nominatim(user_agent="yr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vaughan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vaughan are 43.7941544, -79.5268023.


In [27]:
# Creating a map of the neighborhoods in the North York, East York and York boroughs
map_york = folium.Map(location=[latitude, longitude], zoom_start=10)

# Adding markers to the map
for lat, lng, borough, neighborhood in zip(York_data['Latitude'], York_data['Longitude'], York_data['Borough'], York_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_york)  
    
map_york

## Methodology <a name="methodology"></a>

Because the business problem revolves around opening a coffee shop in a neighborhood of the city of Vaughan in Canada, the first step is to select the relevant **boroughs**: **North York, East York and York**.

In the second step, **all the neighborhoods** in the selected boroughs are identified. The **Foursquare API** is then used to find the **venues** located in those neighborhoods.

In the next step, the neighborhoods are **filtered** on the absence of coffee shops. This yields the neighborhoods in those boroughs that do not have any coffee shops.

Finally, a **clustering technique (k-means clustering)** is used to group similar neighborhoods. The clusters provide the insight needed to find a location where a new coffee shop could bring **higher profit and customer satisfaction** for the owner.
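
The filtering step can be sketched as a set difference: collect every neighborhood that has venues, collect those with at least one venue in the "Coffee Shop" category, and subtract. The sample pairs below come from the venue output shown later in this notebook. The variable names are illustrative.

```python
# (neighborhood, venue category) pairs
venues = [
    ("Parkwoods", "Park"),
    ("Parkwoods", "Food & Drink Shop"),
    ("Victoria Village", "Hockey Arena"),
    ("Victoria Village", "Coffee Shop"),
]

all_neighborhoods = {name for name, _ in venues}
with_coffee = {name for name, category in venues if category == "Coffee Shop"}

# Neighborhoods that returned venues but no coffee shop.
without_coffee = sorted(all_neighborhoods - with_coffee)
```

Note that a neighborhood for which the API returns no venues at all never enters either set, so it is absent from the result.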

### Part 04: Using Foursquare API

In [28]:
#Defining Foursquare API client ID, secret key, and version
#The credentials are redacted here; substitute your own values before running
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'


#### Getting nearby venues of the neighborhoods using Foursquare API

In [29]:
import json
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

In [30]:
# Get data of first neighborhood and use Foursquare API to get some insight of the venues of the neighborhood
neighborhood_latitude = York_data['Latitude'][0] 
neighborhood_longitude = York_data['Longitude'][0] 

neighborhood_name = York_data['Neighborhood'][0] 

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


In [31]:
# Set up the API URL to explore venues near Parkwoods
LIMIT = 150
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT)
neighborhood_json = requests.get(url).json()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # the flattened JSON uses the prefixed column name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']
    
venues = neighborhood_json['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114
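
The request URL above is assembled with string formatting. A hedged alternative is to let `urllib.parse.urlencode` build the query string, which takes care of escaping. The endpoint and parameter names are the ones used in this notebook; the helper name `build_explore_url` and the placeholder credentials are assumptions for illustration, and no request is sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180605", radius=500, limit=150):
    """Return the explore URL for one coordinate pair (no request is made)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"

# Placeholder credentials; the coordinates are those printed for Parkwoods.
url = build_explore_url("ID", "SECRET", 43.7532586, -79.3296565)
query = parse_qs(urlparse(url).query)
```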


##### By exploring Parkwoods, we can see that it does not have any coffee shop in the neighborhood.

### Part 05: Exploring all of the neighborhoods of Vaughan (Boroughs: North York, East York and York)

In [32]:
#Function to get nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 150
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
          
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
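
`getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that comes back without a category. The sketch below shows one defensive way to flatten a response of the same shape (`response` → `groups[0]` → `items` → `venue`). The helper name `venue_rows` and the sample payload are hypothetical; in particular, the uncategorized venue exists only to exercise the fallback.

```python
def venue_rows(neighborhood, payload):
    """Flatten one explore-style payload into (neighborhood, name, lat, lng, category) rows."""
    groups = payload.get("response", {}).get("groups") or []
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of failing when no category is present.
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hypothetical payload mirroring the structure used in this notebook.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorized Venue",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]}]}}

flat_rows = venue_rows("Parkwoods", sample_payload)
```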

In [33]:
#Generating the venues of Vaughan and printing the neighborhoods
print("Neighborhoods of Vaughan:")
York_venues = getNearbyVenues(names=York_data['Neighborhood'],
                                   latitudes=York_data['Latitude'],
                                   longitudes=York_data['Longitude']
                                  )

Neighborhoods of Vaughan:
Parkwoods
Victoria Village
Lawrence Manor / Lawrence Heights
Don Mills
Parkview Hill / Woodbine Gardens
Glencairn
Don Mills
Woodbine Heights
Humewood-Cedarvale
Caledonia-Fairbanks
Leaside
Hillcrest Village
Bathurst Manor / Wilson Heights / Downsview North
Thorncliffe Park
Fairview / Henry Farm / Oriole
Northwood Park / York University
East Toronto
Bayview Village
Downsview
York Mills / Silver Hills
Downsview
North Park / Maple Leaf Park / Upwood Park
Humber Summit
Willowdale / Newtonbrook
Downsview
Bedford Park / Lawrence Manor East
Del Ray / Mount Dennis / Keelsdale and Silverthorn
Humberlea / Emery
Willowdale
Downsview
Runnymede / The Junction North
Weston
York Mills West
Willowdale


#### Checking the size of the resulting dataframe

In [34]:
print(York_venues.shape)
York_venues.head()

(342, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


#### There are 342 venues in the neighborhoods of Vaughan.

#### Checking how many venues were returned for each neighborhood

In [35]:
York_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bathurst Manor / Wilson Heights / Downsview North,19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
Bedford Park / Lawrence Manor East,25,25,25,25,25,25
Caledonia-Fairbanks,4,4,4,4,4,4
Del Ray / Mount Dennis / Keelsdale and Silverthorn,5,5,5,5,5,5
Don Mills,24,24,24,24,24,24
Downsview,14,14,14,14,14,14
East Toronto,4,4,4,4,4,4
Fairview / Henry Farm / Oriole,69,69,69,69,69,69
Glencairn,4,4,4,4,4,4


#### Finding out how many unique categories can be curated from all the returned venues

In [36]:
print('There are {} unique categories.'.format(len(York_venues['Venue Category'].unique())))

There are 120 unique categories.


#### Checking out the neighborhoods containing coffee shops

In [37]:
coffee_shop_neighborhoods = York_venues[York_venues['Venue Category']=='Coffee Shop'].reset_index(drop=True)
coffee_shop_neighborhoods

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
1,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,Tim Hortons,43.719427,-79.467995,Coffee Shop
2,Don Mills,43.7259,-79.340923,Tim Hortons,43.722897,-79.339117,Coffee Shop
3,Don Mills,43.7259,-79.340923,Delimark Cafe,43.727536,-79.339547,Coffee Shop
4,Leaside,43.70906,-79.363452,Aroma Espresso Bar,43.705611,-79.360775,Coffee Shop
5,Leaside,43.70906,-79.363452,Tim Hortons,43.705629,-79.361028,Coffee Shop
6,Leaside,43.70906,-79.363452,Starbucks,43.706564,-79.359591,Coffee Shop
7,Bathurst Manor / Wilson Heights / Downsview North,43.754328,-79.442259,Starbucks,43.755703,-79.440483,Coffee Shop
8,Bathurst Manor / Wilson Heights / Downsview North,43.754328,-79.442259,Tim Hortons,43.754767,-79.44325,Coffee Shop
9,Thorncliffe Park,43.705369,-79.349372,Tim Hortons,43.70509,-79.350545,Coffee Shop


#### Getting all the neighborhoods that do not have any coffee shops in them

In [38]:
#Getting the neighborhoods that have coffee shops
neigh_coffeeS = list(coffee_shop_neighborhoods.Neighborhood.unique())
neigh_coffeeS

['Victoria Village',
 'Lawrence Manor / Lawrence Heights',
 'Don Mills',
 'Leaside',
 'Bathurst Manor / Wilson Heights / Downsview North',
 'Thorncliffe Park',
 'Fairview / Henry Farm / Oriole',
 'Northwood Park / York University',
 'East Toronto',
 'Bedford Park / Lawrence Manor East',
 'Del Ray / Mount Dennis / Keelsdale and Silverthorn',
 'Willowdale']

In [39]:
#Getting the neighborhoods that have at least one venue that is not a coffee shop
temp = York_venues[York_venues['Venue Category']!='Coffee Shop'].reset_index(drop=True)
neigh_not_coffeeS = list(temp.Neighborhood.unique())
neigh_not_coffeeS

['Parkwoods',
 'Victoria Village',
 'Lawrence Manor / Lawrence Heights',
 'Don Mills',
 'Parkview Hill / Woodbine Gardens',
 'Glencairn',
 'Woodbine Heights',
 'Humewood-Cedarvale',
 'Caledonia-Fairbanks',
 'Leaside',
 'Hillcrest Village',
 'Bathurst Manor / Wilson Heights / Downsview North',
 'Thorncliffe Park',
 'Fairview / Henry Farm / Oriole',
 'Northwood Park / York University',
 'East Toronto',
 'Bayview Village',
 'Downsview',
 'North Park / Maple Leaf Park / Upwood Park',
 'Humber Summit',
 'Bedford Park / Lawrence Manor East',
 'Del Ray / Mount Dennis / Keelsdale and Silverthorn',
 'Humberlea / Emery',
 'Willowdale',
 'Runnymede / The Junction North',
 'Weston',
 'York Mills West']

In [40]:
temp_set = set(neigh_not_coffeeS) - set(neigh_coffeeS)

In [41]:
#Neighborhoods that do not have any coffee shops in them
temp_list = list(temp_set)
temp_list

['Woodbine Heights',
 'Parkwoods',
 'Humberlea / Emery',
 'York Mills West',
 'Hillcrest Village',
 'Caledonia-Fairbanks',
 'Bayview Village',
 'North Park / Maple Leaf Park / Upwood Park',
 'Parkview Hill / Woodbine Gardens',
 'Downsview',
 'Glencairn',
 'Humber Summit',
 'Humewood-Cedarvale',
 'Weston',
 'Runnymede / The Junction North']

In [42]:
#Converting the list to a dataframe
temp_neigh = pd.DataFrame(data=temp_list,columns=['Neighborhood'])

In [43]:
#Getting the neighborhoods without a coffee shop
neigh_no_coffeeShops = pd.merge(York_venues, temp_neigh, on=['Neighborhood'], how='inner')
neigh_no_coffeeShops.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,Jawny Bakers,43.705783,-79.312913,Gastropub
3,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,East York Gymnastics,43.710654,-79.309279,Gym / Fitness Center
4,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,Shoppers Drug Mart,43.705933,-79.312825,Pharmacy


## Analysis <a name="analysis"></a>

### Part 06: Analysis on the neighborhoods without any Coffee shops

In [44]:
# one hot encoding
neigh_no_coffeeShops_onehot = pd.get_dummies(neigh_no_coffeeShops[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
neigh_no_coffeeShops_onehot['Neighborhood'] = neigh_no_coffeeShops['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [neigh_no_coffeeShops_onehot.columns[-1]] + list(neigh_no_coffeeShops_onehot.columns[:-1])
neigh_no_coffeeShops_onehot = neigh_no_coffeeShops_onehot[fixed_columns]

neigh_no_coffeeShops_onehot.head()

Unnamed: 0,Neighborhood,Airport,Asian Restaurant,Athletics & Sports,Bakery,Bank,Baseball Field,Basketball Court,Beer Store,Brewery,...,Pet Store,Pharmacy,Pizza Place,Pool,Pub,Shopping Mall,Skating Rink,Trail,Video Store,Women's Store
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkview Hill / Woodbine Gardens,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Parkview Hill / Woodbine Gardens,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Parkview Hill / Woodbine Gardens,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0


Let's see the size of the new dataframe

In [45]:
neigh_no_coffeeShops_onehot.shape

(76, 46)

#### Grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category

In [46]:
neigh_no_coffeeShops_grouped = neigh_no_coffeeShops_onehot.groupby('Neighborhood').mean().reset_index()
neigh_no_coffeeShops_grouped

Unnamed: 0,Neighborhood,Airport,Asian Restaurant,Athletics & Sports,Bakery,Bank,Baseball Field,Basketball Court,Beer Store,Brewery,...,Pet Store,Pharmacy,Pizza Place,Pool,Pub,Shopping Mall,Skating Rink,Trail,Video Store,Women's Store
0,Bayview Village,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Caledonia-Fairbanks,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25
2,Downsview,0.071429,0.0,0.071429,0.0,0.071429,0.071429,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0
3,Glencairn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0
4,Hillcrest Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0
5,Humber Summit,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Humberlea / Emery,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Humewood-Cedarvale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0
8,North Park / Maple Leaf Park / Upwood Park,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Parkview Hill / Woodbine Gardens,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,...,0.090909,0.090909,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [47]:
#Let's see the shape of the new dataframe
neigh_no_coffeeShops_grouped.shape

(15, 46)
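
Taking the mean of one-hot columns per neighborhood is equivalent to computing, for each neighborhood, the share of its venues that fall into each category. A minimal sketch of that computation with `collections.Counter` follows. The helper name `category_frequencies` is illustrative, and the sample counts are chosen to reproduce the Caledonia-Fairbanks frequencies reported in this notebook (Park 0.50, Market 0.25, Women's Store 0.25).

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs
venue_pairs = [
    ("Caledonia-Fairbanks", "Park"),
    ("Caledonia-Fairbanks", "Park"),
    ("Caledonia-Fairbanks", "Market"),
    ("Caledonia-Fairbanks", "Women's Store"),
    ("Parkwoods", "Park"),
    ("Parkwoods", "Food & Drink Shop"),
]

def category_frequencies(pairs):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for neighborhood, counter in counts.items()
    }

freqs = category_frequencies(venue_pairs)
```

Categories that never occur in a neighborhood are simply absent here, whereas the one-hot dataframe stores them as explicit zeros.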

#### Finding each neighborhood along with the top 5 most common venues

In [48]:
num_top_venues = 5

for hood in neigh_no_coffeeShops_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = neigh_no_coffeeShops_grouped[neigh_no_coffeeShops_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bayview Village----
                 venue  freq
0                 Bank  0.25
1  Japanese Restaurant  0.25
2                 Café  0.25
3   Chinese Restaurant  0.25
4              Airport  0.00


----Caledonia-Fairbanks----
           venue  freq
0           Park  0.50
1         Market  0.25
2  Women's Store  0.25
3          Trail  0.00
4    Video Store  0.00


----Downsview----
                  venue  freq
0         Grocery Store  0.21
1                  Park  0.14
2         Shopping Mall  0.07
3            Food Truck  0.07
4  Gym / Fitness Center  0.07


----Glencairn----
                      venue  freq
0                      Park  0.25
1                       Pub  0.25
2               Pizza Place  0.25
3       Japanese Restaurant  0.25
4  Mediterranean Restaurant  0.00


----Hillcrest Village----
                      venue  freq
0               Golf Course   0.2
1  Mediterranean Restaurant   0.2
2      Fast Food Restaurant   0.2
3                   Dog Run   0.2
4           

#### Sorting the venues in descending order

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [50]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# creating columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# creating a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = neigh_no_coffeeShops_grouped['Neighborhood']

for ind in np.arange(neigh_no_coffeeShops_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(neigh_no_coffeeShops_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bayview Village,Café,Chinese Restaurant,Bank,Japanese Restaurant,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop
1,Caledonia-Fairbanks,Park,Women's Store,Market,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store,Construction & Landscaping
2,Downsview,Grocery Store,Park,Airport,Baseball Field,Food Truck,Gym / Fitness Center,Liquor Store,Discount Store,Shopping Mall,Bank
3,Glencairn,Japanese Restaurant,Pub,Pizza Place,Park,Women's Store,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store
4,Hillcrest Village,Dog Run,Golf Course,Pool,Mediterranean Restaurant,Fast Food Restaurant,Women's Store,Chinese Restaurant,Discount Store,Dance Studio,Curling Ice
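
The column labels above are built from a short suffix list with a fallback to "th", which is sufficient for ten columns. As a sketch of a more general approach, the hypothetical helper `ordinal` below also handles ranks such as 11th, 21st and 22nd, should more columns ever be requested.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if n % 100 in (11, 12, 13):
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top = 10
column_names = ["Neighborhood"] + [
    f"{ordinal(rank)} Most Common Venue" for rank in range(1, num_top + 1)
]
```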


### Part 07: Clustering the Neighborhoods

#### Running K-means clustering algorithm to cluster the neighborhoods

In [51]:
# setting number of clusters
kclusters = 5

neigh_no_coffeeShops_grouped_clustering = neigh_no_coffeeShops_grouped.drop(columns='Neighborhood')

# running k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neigh_no_coffeeShops_grouped_clustering)

# checking cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 0, 0, 0, 2, 1, 4, 0, 0], dtype=int32)
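
To make the clustering step less of a black box, here is a small pure-Python sketch of the assign-and-update loop behind k-means (Lloyd's algorithm). It is not scikit-learn's implementation: the initial centroids are supplied by the caller rather than chosen by k-means++, and the two-dimensional frequency vectors are made-up illustrative data. It requires Python 3.8 or later for `math.dist`.

```python
import math

def lloyd_kmeans(points, centroids, iterations=10):
    """Run Lloyd's algorithm from caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Made-up frequency vectors: two park-heavy and two restaurant-heavy neighborhoods.
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels, centers = lloyd_kmeans(points, centroids=[(1.0, 0.0), (0.0, 1.0)])
```

With only 15 neighborhoods and 5 clusters, as in this notebook, some clusters may contain a single neighborhood, so the labels are best read as a rough grouping.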

In [52]:
# adding clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neigh_no_coffeeShops_merged = neigh_no_coffeeShops

# joining the cluster labels and most common venues onto neigh_no_coffeeShops, which holds the latitude/longitude of each neighborhood
neigh_no_coffeeShops_merged = neigh_no_coffeeShops_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

neigh_no_coffeeShops_merged.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park,3,Food & Drink Shop,Park,Café,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop,3,Food & Drink Shop,Park,Café,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store
2,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,Jawny Bakers,43.705783,-79.312913,Gastropub,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
3,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,East York Gymnastics,43.710654,-79.309279,Gym / Fitness Center,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
4,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,Shoppers Drug Mart,43.705933,-79.312825,Pharmacy,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection


#### Creating a map for the clusters

In [53]:
# creating map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# setting color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# adding markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(neigh_no_coffeeShops_merged['Neighborhood Latitude'], neigh_no_coffeeShops_merged['Neighborhood Longitude'], neigh_no_coffeeShops_merged['Neighborhood'], neigh_no_coffeeShops_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Part 08: Examining the clusters

##### Cluster 1 (label 0)

In [54]:
neigh_no_coffeeShops_merged.loc[neigh_no_coffeeShops_merged['Cluster Labels'] == 0, neigh_no_coffeeShops_merged.columns[[1] + list(range(5, neigh_no_coffeeShops_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,43.706397,-79.312913,Gastropub,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
3,43.706397,-79.309279,Gym / Fitness Center,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
4,43.706397,-79.312825,Pharmacy,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
5,43.706397,-79.31227,Bank,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
6,43.706397,-79.31313,Pizza Place,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
7,43.706397,-79.312196,Pet Store,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
8,43.706397,-79.313274,Intersection,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
9,43.706397,-79.313957,Pizza Place,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
10,43.706397,-79.313808,Bus Line,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection
11,43.706397,-79.314105,Fast Food Restaurant,0,Pizza Place,Bus Line,Fast Food Restaurant,Athletics & Sports,Bank,Pharmacy,Pet Store,Gastropub,Gym / Fitness Center,Intersection


##### Cluster 2 (label 1)

In [55]:
neigh_no_coffeeShops_merged.loc[neigh_no_coffeeShops_merged['Cluster Labels'] == 1, neigh_no_coffeeShops_merged.columns[[1] + list(range(5, neigh_no_coffeeShops_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,43.724766,-79.532854,Baseball Field,1,Baseball Field,Women's Store,Chinese Restaurant,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store


##### Cluster 3 (label 2)

In [56]:
neigh_no_coffeeShops_merged.loc[neigh_no_coffeeShops_merged['Cluster Labels'] == 2, neigh_no_coffeeShops_merged.columns[[1] + list(range(5, neigh_no_coffeeShops_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,43.756303,-79.570637,Empanada Restaurant,2,Empanada Restaurant,Women's Store,Field,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store,Construction & Landscaping


##### Cluster 4 (label 3)

In [57]:
neigh_no_coffeeShops_merged.loc[neigh_no_coffeeShops_merged['Cluster Labels'] == 3, neigh_no_coffeeShops_merged.columns[[1] + list(range(5, neigh_no_coffeeShops_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,43.753259,-79.33214,Park,3,Food & Drink Shop,Park,Café,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store
1,43.753259,-79.333114,Food & Drink Shop,3,Food & Drink Shop,Park,Café,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store
31,43.689026,-79.4563,Park,3,Park,Women's Store,Market,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store,Construction & Landscaping
32,43.689026,-79.456333,Women's Store,3,Park,Women's Store,Market,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store,Construction & Landscaping
33,43.689026,-79.456317,Market,3,Park,Women's Store,Market,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store,Construction & Landscaping
34,43.689026,-79.448924,Park,3,Park,Women's Store,Market,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop,Convenience Store,Construction & Landscaping
68,43.706876,-79.515869,Park,3,Park,Convenience Store,Women's Store,Café,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop
69,43.706876,-79.521705,Park,3,Park,Convenience Store,Women's Store,Café,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop
70,43.706876,-79.515789,Convenience Store,3,Park,Convenience Store,Women's Store,Café,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop
71,43.706876,-79.522648,Park,3,Park,Convenience Store,Women's Store,Café,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop


##### Cluster 5 (label 4)

In [58]:
neigh_no_coffeeShops_merged.loc[neigh_no_coffeeShops_merged['Cluster Labels'] == 4, neigh_no_coffeeShops_merged.columns[[1] + list(range(5, neigh_no_coffeeShops_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,43.693781,-79.428705,Field,4,Field,Trail,Hockey Arena,Chinese Restaurant,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop
29,43.693781,-79.426106,Trail,4,Field,Trail,Hockey Arena,Chinese Restaurant,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop
30,43.693781,-79.431761,Hockey Arena,4,Field,Trail,Hockey Arena,Chinese Restaurant,Empanada Restaurant,Dog Run,Discount Store,Dance Studio,Curling Ice,Cosmetics Shop


From our cluster analysis, we can see that the neighborhoods that fall in **cluster 0** and **cluster 3** (cluster labels 0 and 3) have more venues than those in the other clusters. So, those neighborhoods might have more **potential customers** for any business.
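
The venue counts behind this observation can be tabulated rather than read off the tables. The sketch below is illustrative only: `venues_per_cluster` is a hypothetical helper, and `toy` is a made-up frame with the same `Neighborhood` and `Cluster Labels` columns as `neigh_no_coffeeShops_merged`, where each row is one venue.

```python
import pandas as pd


def venues_per_cluster(merged, label_col="Cluster Labels", name_col="Neighborhood"):
    """Summarise venue rows and distinct neighborhoods per cluster label."""
    summary = merged.groupby(label_col).agg(
        venues=(name_col, "size"),            # one row per venue
        neighborhoods=(name_col, "nunique"),  # distinct neighborhoods in the cluster
    )
    return summary.sort_values("venues", ascending=False)


# Made-up data for illustration; in the notebook, pass neigh_no_coffeeShops_merged.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "C", "C"],
    "Cluster Labels": [0, 0, 0, 1, 3, 3],
})

summary = venues_per_cluster(toy)
print(summary)
```

Reporting the number of neighborhoods next to the number of venues shows whether a cluster is large because it has many neighborhoods or because a few neighborhoods have many venues.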

In [62]:
#Finding out the neighborhoods of interest
neighborhoods_of_interest_1 = neigh_no_coffeeShops_merged[neigh_no_coffeeShops_merged['Cluster Labels'] == 0].Neighborhood
neighborhoods_of_interest_2 = neigh_no_coffeeShops_merged[neigh_no_coffeeShops_merged['Cluster Labels'] == 3].Neighborhood

In [65]:
#Neighborhoods of interest: 01
print(neighborhoods_of_interest_1.unique())

['Parkview Hill / Woodbine Gardens' 'Glencairn' 'Woodbine Heights'
 'Hillcrest Village' 'Bayview Village' 'Downsview'
 'North Park / Maple Leaf Park / Upwood Park'
 'Runnymede / The Junction North']


In [66]:
#Neighborhoods of interest: 02
print(neighborhoods_of_interest_2.unique())

['Parkwoods' 'Caledonia-Fairbanks' 'Weston' 'York Mills West']


## Results and Discussion <a name="results"></a>

The cluster analysis yields 5 clusters of neighborhoods in the boroughs of North York, East York and York. Two of them, **cluster 0** and **cluster 3**, have been selected as candidates for opening a coffee shop.

In cluster 0, the neighborhoods present are: 
'Parkview Hill / Woodbine Gardens', 'Glencairn', 'Woodbine Heights', 'Hillcrest Village', 'Bayview Village', 'Downsview', 'North Park / Maple Leaf Park / Upwood Park', 'Runnymede / The Junction North'.

In cluster 3, the neighborhoods present are:
'Parkwoods', 'Caledonia-Fairbanks', 'Weston', 'York Mills West'.

Although they fall in the same cluster, the neighborhoods in cluster 3 are much farther apart from one another than the neighborhoods in cluster 0.
So, from a business perspective, the neighborhoods in cluster 0 would be a better choice for opening a coffee shop. Remember that the data used here consists only of neighborhoods that do not have any coffee shops. From the map of the clusters, the **Downsview** neighborhood appears to be the best choice in cluster 0.
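
The comparison of how spread out each cluster is was made by eye on the map. One way to put a number on it is the mean pairwise great-circle distance between neighborhood centres. The sketch below assumes the haversine formula with a spherical Earth. `haversine_km` and `mean_pairwise_km` are hypothetical helpers, and the coordinates are invented for illustration rather than taken from the dataset.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_pairwise_km(points):
    """Average distance over all unordered pairs of (lat, lon) points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(haversine_km(*p, *q) for p, q in pairs) / len(pairs)


# Invented coordinates: one compact group and one spread-out group.
compact = [(43.70, -79.31), (43.71, -79.32), (43.70, -79.33)]
spread = [(43.75, -79.33), (43.69, -79.45), (43.71, -79.52)]

print(f"compact: {mean_pairwise_km(compact):.2f} km")
print(f"spread:  {mean_pairwise_km(spread):.2f} km")
```

In the notebook, the points for each cluster would come from the unique `Neighborhood Latitude` and `Neighborhood Longitude` pairs in `neigh_no_coffeeShops_merged`, filtered by `Cluster Labels`.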

## Conclusion <a name="conclusion"></a>

Although the dataset covers Canadian neighborhoods by postal code and the Foursquare API has been used to find the venues in those neighborhoods, the lack of population and population density data for the neighborhoods certainly limits how well the business potential of each neighborhood can be analyzed. Still, based on the current data, **Downsview** is a good choice for opening a coffee shop in the city of Vaughan.