<a href="https://colab.research.google.com/github/oscarfdezmora/Coursera_Capstone/blob/master/Capstone_Project_The_Battle_of_Neighborhoods.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction**

---



A firm of medical supplies wants to open a new pharmacy in Toronto, and is interested in looking for new opportunities Scarborough.  

Taking into account the density of this kind of commerces in the area, we will offer a set most likely boroughs to open their store. 

This kind of study would be suitable for profiling any kind of store or venue in a sets of address, by means of geospatial tools such as Foursquare or Google Maps.

# **Data Section**

---



We will focus on those nighborhoods which are addressed in **Scarborough, Toronto**. This locations will be provided by the postal code information settled in Wikipedia (source: [List of postal codes of Canada: M](https://www.wikiwand.com/en/List_of_postal_codes_of_Canada:_M))

**Geocoder** was used to determinate latitude and longitude co-ordinates

The **Foursquare API** will be use to find locations and look for those area which lack of pharmacies in the nearbies.

The data retrieved contains information of venues within a specified distance of the longitude and latitude of the postcodes. The information obtained per venue as follows:

1. Neighborhood
2. Neighborhood Latitude
3. Neighborhood Longitude
4. Venue
5. Name of the venue e.g. the name of a store or restaurant
6. Venue Latitude
7. Venue Longitude
8. Venue Category


# **Methodology**

### **Data Preparation - Wikipedia**


---


The **postal information** from Wikipedia was downloaded and transform into a pandas dataframe, which had three columns:


*   Postal Code

*   Borough

*   Neighborhood



Once done this, common extraction and data cleaning were done:


*   Only the cells that have an assigned borough were processed; cells with a borough that is Not assigned were ignored.

*   More than one neighborhood can exist in one postal code area.

*   If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.


Using **Geocoder**, we retrieve the latitude and longitude for each neighboorhood, and add them as columns. 
Our data frame will have five columns.

*   Postal Code

*   Borough

*   Neighborhood

*   Latitude

*   Longitude




In [204]:
## Libraries and environment setup

!pip install geocoder
!pip install folium

import pandas as pd
import requests
import numpy as np
import geocoder
import folium
import requests 
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
import xml
import matplotlib.pyplot as plt
%matplotlib inline

from pandas.io.json import json_normalize 
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim 
from bs4 import BeautifulSoup

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)



In [205]:
## Get Data from Wikipedia
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [206]:
# Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
df = df[df.Borough != 'Not assigned']
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [207]:
# More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
df = df.groupby(['Postal Code', 'Borough'])['Neighborhood'].apply(list).apply(lambda x:', '.join(x)).to_frame().reset_index()
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [208]:
# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
for index, row in df.iterrows():
    if row['Neighborhood'] == 'Not assigned':
        row['Neighborhood'] = row['Borough']

In [209]:
## Define function to retrieve Latitude & Longitude from Geocoder

def get_latilong(postal_code):
    coords = None
    while(coords is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        coords = g.latlng
    return coords

In [210]:
# Retrieving Postal Code Co-ordinates
postal_codes = df['Postal Code']    
coords = [ get_latilong(postal_code) for postal_code in postal_codes.tolist() ]

# Adding Columns Latitude & Longitude to our DataFrame
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.808626,-79.189913
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.785779,-79.157368
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.765806,-79.185284
3,M1G,Scarborough,Woburn,43.771545,-79.218135
4,M1H,Scarborough,Cedarbrae,43.768791,-79.238813


### **Data Preparation - Foursquare**

---

Using the dataset with the information of co-ordinates of the Neighbours, we will call the **Foursquare API** to get all the venues in a radius of 700m for each neighborhood.

The information is stored in a new dataframe with the columns


*   Neighborhood	

*   Neighborhood	Latitude

*   Neighborhood	Longitude

*   Venue	

*   Venue	Category




In [211]:
##Credentials for Foursquare connection

CLIENT_ID = "UAH2DBVN5TBHSYUMSYKXPGEERYSEXBEKBE5XA2RDDAHYLXHX"
CLIENT_SECRET = "RWH0BUK1B5ZGIFJS0BNCTHESAIXLI0FHY0RXR40ME3RRBNSO"
VERSION = '20180604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: UAH2DBVN5TBHSYUMSYKXPGEERYSEXBEKBE5XA2RDDAHYLXHX
CLIENT_SECRET:RWH0BUK1B5ZGIFJS0BNCTHESAIXLI0FHY0RXR40ME3RRBNSO


In [216]:
## Function that will get all the venues from a given location, in 700 m round

def getNearbyVenues(names, latitudes, longitudes, radius=700):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # making GET request
        venue_results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in venue_results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)

In [227]:
# Get all the venues from the list of neighbourhoods
Scarborough_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Malvern, Rouge


KeyError: ignored

In [220]:
## Get all the neighbours which have a pharmacy 
Scarborough_pharmacy = Scarborough_venues[Scarborough_venues["Venue Category"] == "Pharmacy"]
Scarborough_pharmacy.reset_index(drop = True, inplace = True)

Scarborough_pharmacy

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
0,"Guildwood, Morningside, West Hill",43.765806,-79.185284,Shoppers Drug Mart,Pharmacy
1,"Kennedy Park, Ionview, East Birchmount Park",43.726881,-79.265694,Shoppers Drug Mart,Pharmacy
2,"Cliffside, Cliffcrest, Scarborough Village West",43.723538,-79.228353,Shoppers Drug Mart,Pharmacy
3,Agincourt,43.79393,-79.265694,Shoppers Drug Mart,Pharmacy
4,"Clarks Corners, Tam O'Shanter, Sullivan",43.784902,-79.304725,Shoppers Drug Mart,Pharmacy
5,"Clarks Corners, Tam O'Shanter, Sullivan",43.784902,-79.304725,Rexall,Pharmacy
6,"Clarks Corners, Tam O'Shanter, Sullivan",43.784902,-79.304725,Rexall,Pharmacy
7,"Milliken, Agincourt North, Steeles East, L'Amo...",43.817998,-79.280887,Brimley Road Medical Center I.D.A.,Pharmacy
8,"Milliken, Agincourt North, Steeles East, L'Amo...",43.817998,-79.280887,IDA Pharmacy,Pharmacy
9,"Steeles West, L'Amoreaux West",43.80053,-79.32183,Shoppers Drug Mart,Pharmacy


### **Clustering over number of Pharmacies**


---

With the results of our Foursquare API, we will get the number of Pharmacies that are in each neighborhood.

With this, we will clusterize the neighborhoods depending in the number of Pharmacies they have in their nearbies.

In addition, this information will be join to the main data set in order to make graphical analysis.

In [221]:
## We count the number of Pharmacies that are in each neighborhood

Scarborough_phar_coun = Scarborough_pharmacy['Neighborhood'].value_counts()

Scarborough_phar_coun = Scarborough_phar_coun.to_frame()
Scarborough_phar_coun.rename(columns={'Neighborhood':'Pharmacy'},inplace=True)
Scarborough_phar_coun


Unnamed: 0,Pharmacy
"Clarks Corners, Tam O'Shanter, Sullivan",3
"Milliken, Agincourt North, Steeles East, L'Amoreaux East",2
"Islington Avenue, Humber Valley Village",2
"Kennedy Park, Ionview, East Birchmount Park",1
"Fairview, Henry Farm, Oriole",1
"High Park, The Junction South",1
Hillcrest Village,1
"Cliffside, Cliffcrest, Scarborough Village West",1
"Bedford Park, Lawrence Manor East",1
"Guildwood, Morningside, West Hill",1


In [222]:
## We join the number of Pharmacies to each neighboorhood in our data set with the coordinates.

df_joined = df.set_index("Neighborhood").join(Scarborough_phar_coun,how='outer',lsuffix='main', rsuffix='phar')
df_final = df_joined.fillna(0)
df_final

Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Pharmacy
Agincourt,M1S,Scarborough,43.79393,-79.265694,1.0
"Alderwood, Long Branch",M8W,Etobicoke,43.600895,-79.540387,0.0
"Bathurst Manor, Wilson Heights, Downsview North",M3H,North York,43.757394,-79.442394,1.0
Bayview Village,M2K,North York,43.780607,-79.376921,0.0
"Bedford Park, Lawrence Manor East",M5M,North York,43.735447,-79.417944,1.0
Berczy Park,M5E,Downtown Toronto,43.645196,-79.373855,0.0
"Birch Cliff, Cliffside West",M1N,Scarborough,43.696448,-79.265642,0.0
"Brockton, Parkdale Village, Exhibition Place",M6K,West Toronto,43.639922,-79.43124,0.0
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",M7Y,East Toronto,43.6487,-79.38545,0.0
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",M5V,Downtown Toronto,43.640539,-79.397435,0.0


In [223]:
## A representation of Scarborough map, painting in green those neighborhoods without Pharmacies, yellow those with one, and red those with two or more

map_Scarborough = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, nei, far in zip(df_final['Latitude'], df_final['Longitude'], df_final.index, df_final['Pharmacy']):
    
    label = '{}'.format(nei)
    label = folium.Popup(label, parse_html=True)
    if far>1:
      change='red'
    elif far==1: 
      change="yellow"
    else: 
      change="green"

    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=change,
        fill=True,
        fill_color=change,
        fill_opacity=0.7,
        parse_html=False).add_to(map_Scarborough)  
    
map_Scarborough

In [224]:
## Which areas are grear opportunities
df_final[df_final['Pharmacy'] ==0].index

Index(['Alderwood, Long Branch', 'Bayview Village', 'Berczy Park',
       'Birch Cliff, Cliffside West',
       'Brockton, Parkdale Village, Exhibition Place',
       'Business reply mail Processing Centre, South Central Letter Processing Plant Toronto',
       'CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport',
       'Caledonia-Fairbanks', 'Canada Post Gateway Processing Centre',
       'Cedarbrae', 'Central Bay Street', 'Christie', 'Church and Wellesley',
       'Commerce Court, Victoria Hotel', 'Davisville',
       'Del Ray, Mount Dennis, Keelsdale and Silverthorn',
       'Dorset Park, Wexford Heights, Scarborough Town Centre',
       'Dufferin, Dovercourt Village',
       'East Toronto, Broadview North (Old East York)',
       'First Canadian Place, Underground city', 'Garden District, Ryerson',
       'Glencairn', 'Golden Mile, Clairlea, Oakridge',
       'Harbourfront East, Union Station, Toronto Islands', 'Humber Summit

In [225]:
## Main Pharmacy competitors

Scarborough_pharmacy['Venue'].value_counts()

Shoppers Drug Mart                    23
Rexall                                 3
Cliffwood I.D.A. Pharmacy              1
Thorncrest Drug Store                  1
Brimley Road Medical Center I.D.A.     1
IDA Pharmacy                           1
IDA High Park                          1
Name: Venue, dtype: int64

# **Results**

---

Having analyzed **104 Neighborhoods**, 
**27 of them had at least a pharmacy** identified as venue in Foursquare.

In addition, there are only three locations which have more than one Pharmacy: 

*   Clarks Corners, Tam O'Shanter, Sullivan

*   Milliken, Agincourt North, Steeles East, L'Amoreaux East

*   Islington Avenue, Humber Valley Village



---


Considering this, in order to get a clear view of the opportunities, we define three clusters of neighborhoods:

1.   **Those with more than one pharmacy**
2.   **Those with one pharmacy**
3.   **Those with no pharmacies**


Representing the locations in Toronto map throws this results:

![Toronto Map](https://docs.google.com/drawings/d/e/2PACX-1vSZjDxwQBmWpLded2Y4hZ1If9AW_LlyRdjoBJnmGeFqa_vrwUsJat8dE_axkqbX8gTkbqR5xQJjV5Hw/pub?w=800&h=532)

Having:


*   **in red, those with more than one pharmacy**,
*   **in yellow, those with one pharmacy**, and
*   **in green, those with no pharmacies**.



---


Taking into account possible competitors, these are the number of stores of each firm in Scarborough:

**Shoppers Drug Mart                    23**

Rexall                                **3**

Cliffwood I.D.A. Pharmacy,
Thorncrest Drug Store,
Brimley Road Medical Center I.D.A.,
IDA Pharmacy,
IDA High Park                        **1** 




---






# **Conclusion**


---


According to the information from Foursquare, Scarborough offers a great opportunity to open pharmacies as there are very few of this stores.

Shoppers Drug Mart has over 23 stores, so should be taken into account when deciding the location to open.



