<h1> The Battle of the Neighbourhoods: Retail Therapy </h1>

<h2> Introduction </h2>
<p>Toronto is the most populous city in Canada, with a recorded population of nearly 3 million people. It is the capital city of the province of Ontario and is widely recognised as one of the most multicultural and cosmopolitan cities in the world. As an international centre of business, finance, arts and culture, Toronto is popular with tourists and residents alike. </p>
<h3> Business Problem </h3>
<p>My client is the chief executive officer (CEO) of a large retail clothing business. As a budding data scientist, I have been asked to recommend the most suitable neighbourhood in Toronto in which to open a new store. My client has stressed that retail success on the high street boils down to four factors: great products, attentive customer service, consistently high footfall, and convenient parking. </p>
<p>While the products themselves and customer service are not my responsibility, I can leverage the Foursquare API to ensure that the recommended neighbourhoods have busy streets and nearby parking locations. Additionally, I will be clustering Toronto’s neighbourhoods by popular venues to determine which of them can be considered hot-spots for retail outlets and eateries. </p>

<h2> The Data </h2>
<h3> Required data </h3>
<p>The following data will be required to provide an accurate recommendation to my client:</p>
<ul>
    <li>A list of Toronto’s neighbourhoods, with latitude and longitude coordinates, calculated by geopy’s Nominatim</li>
    <li>A list of the most popular venues for each postal code region retrieved via the Foursquare API</li>
    <li>A list of suitable car parks for each postal code region retrieved via the Foursquare API</li>
    <li>A list of total population for each postal code region retrieved via Statistics Canada [1]</li>
</ul>
<h3> Assumptions </h3>
<p>Some assumptions will be made to keep this project relatively simple. Firstly, I am assuming that all postal codes cover the same land area, so that population density is directly proportional to total population. Additionally, I am assuming that, while total populations may have changed since this data was collected (2016), postal code populations remain largely similar relative to each other. Furthermore, I am assuming that every car park retrieved from the Foursquare API is suitable, and that a single car park is enough to meet the parking criterion outlined in the first section.</p>
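
<p>As a rough illustration of the equal-area assumption, the sketch below shows that dividing every population by the same area leaves the ranking of postal codes unchanged. The populations are copied from the table later in this notebook; the area value is a hypothetical placeholder, not a measured figure.</p>

```python
# Populations for three postal codes, copied from the table later in this notebook.
populations = {"M1B": 66108, "M1C": 35626, "M1E": 46943}

# Hypothetical placeholder: the same land area (in km^2) for every postal code.
ASSUMED_AREA_KM2 = 10.0

# Density is population divided by area, so a shared area only rescales the values.
densities = {code: pop / ASSUMED_AREA_KM2 for code, pop in populations.items()}

# Rank the postal codes from highest to lowest under each measure.
rank_by_population = sorted(populations, key=populations.get, reverse=True)
rank_by_density = sorted(densities, key=densities.get, reverse=True)

print(rank_by_population)
print(rank_by_density)
```

<p>If postal code areas differ substantially in practice, the two rankings could diverge, which is the main limitation of this assumption.</p>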

<h3> Import necessary Python libraries </h3>

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import requests
print('Libraries imported.')

Libraries imported.


<h3> Get the table of neighbourhoods and postal codes and read into a dataframe </h3>

In [2]:
link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
tables = pd.read_html(link, header=0)
df=pd.DataFrame(tables[0])

<h3>Ignore cells with a borough that is Not assigned </h3>

In [3]:
df.drop(df[df['Borough']=="Not assigned"].index,axis=0, inplace=True)

<h3> Combine rows with the same Postal Code </h3>

In [4]:
df_pc=df.groupby("Postal Code", as_index=False).agg(lambda neighbourhood:','.join(set(neighbourhood)))
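
<p>Joining a <code>set</code> gives no guaranteed ordering of the neighbourhood names, so the combined string can differ between runs. The sketch below shows one possible variant that sorts the names first. The toy rows and the helper name <code>join_unique</code> are illustrative assumptions, not part of the original notebook.</p>

```python
import pandas as pd

# Toy rows (hypothetical) in the same shape as the postal code table.
toy = pd.DataFrame({
    "Postal Code": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighbourhood": ["Rouge", "Malvern", "Woburn"],
})

def join_unique(values):
    # Sort the unique values so the joined string is reproducible between runs.
    return ", ".join(sorted(set(values)))

# Keep the first borough per postal code and combine the neighbourhood names.
toy_pc = toy.groupby("Postal Code", as_index=False).agg(
    {"Borough": "first", "Neighbourhood": join_unique}
)
print(toy_pc)
```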

<h3>If a cell has a borough but a Not assigned neighbourhood, then the neighbourhood will be the same as the borough</h3>

In [5]:
df_pc.loc[df_pc['Neighbourhood'] == 'Not assigned', 'Neighbourhood'] = ...
df_pc.loc[df_pc['Neighbourhood'] == 'Not assigned', 'Borough']

Series([], Name: Borough, dtype: object)
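
<p>The following is a minimal sketch of how such a fallback could be written with a boolean mask. It is not necessarily the code used in the cell above, and the toy rows are hypothetical.</p>

```python
import pandas as pd

# Toy rows (hypothetical): one postal code has no neighbourhood assigned.
toy_pc = pd.DataFrame({
    "Postal Code": ["M7A", "M1G"],
    "Borough": ["Queen's Park", "Scarborough"],
    "Neighbourhood": ["Not assigned", "Woburn"],
})

# Boolean mask selecting rows whose neighbourhood is the placeholder string.
unassigned = toy_pc["Neighbourhood"] == "Not assigned"

# Copy the borough name into those rows only; other rows are left untouched.
toy_pc.loc[unassigned, "Neighbourhood"] = toy_pc.loc[unassigned, "Borough"]
print(toy_pc)
```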

<h3> Let's take a look at what we have so far! </h3>

In [6]:
df_pc.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


<h3> Adding Latitude and Longitude </h3>

In [7]:
coordinates_df = pd.read_csv('http://cocl.us/Geospatial_data')
df_pc = pd.merge(df_pc, coordinates_df, on='Postal Code')
df_pc.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


<h3> Adding population statistics </h3>

In [8]:
population_df = pd.read_csv('Data\\toronto_population.csv')

In [9]:
df_pc = pd.merge(df_pc, population_df, on='Postal Code')
df_pc.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Population
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,66108
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,35626
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,46943
3,M1G,Scarborough,Woburn,43.770992,-79.216917,29690
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,24383
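
<p>By default <code>pd.merge</code> performs an inner join, so any postal code missing from either table is dropped silently. The sketch below, on hypothetical two-row frames, shows one way to check which postal codes have no population record by using a left join with <code>indicator=True</code>.</p>

```python
import pandas as pd

# Toy frames (hypothetical): M7R has no matching population record.
left = pd.DataFrame({"Postal Code": ["M1B", "M7R"],
                     "Borough": ["Scarborough", "Mississauga"]})
right = pd.DataFrame({"Postal Code": ["M1B"], "Population": [66108]})

# The default inner join keeps only postal codes present in both frames.
inner = pd.merge(left, right, on="Postal Code")

# A left join with indicator=True keeps every row and records where it matched.
checked = pd.merge(left, right, on="Postal Code", how="left", indicator=True)
missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```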


<h3> Ignore rows with zero population</h3>
<p>I am assuming it is safe to drop postal codes with a population of zero.</p>
<p>This also covers postal codes with no population assigned. M7R is the only postal code in this dataset without a population. This is because it refers to the city of Mississauga, which lies outside Toronto and is therefore not relevant to this project. </p>

In [10]:
df_pc.drop(df_pc[df_pc['Population']==0].index,axis=0, inplace=True)

<h3> Using the Foursquare API to obtain the top 100 venues for each neighbourhood in Toronto </h3>
<p> I am assuming that the top venues are within walking distance (1km) of the postal code coordinates </p> 
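
<p>To make the 1 km walking-distance assumption concrete, the sketch below computes the great-circle (haversine) distance between a postal code centre and a venue. The coordinates are copied from the tables further down in this notebook; the function name <code>haversine_m</code> is an illustrative choice and is not used elsewhere in the notebook.</p>

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# "Malvern, Rouge" postal code centre and one venue returned for it.
distance = haversine_m(43.806686, -79.194353, 43.802283, -79.198565)
within_walking_distance = distance <= 1000
print(round(distance), within_walking_distance)
```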

In [19]:
CLIENT_ID = 'GXVOFCDKFJLQYEHM11FA1MBHGT2MVV0OXNYUB5KMC0QS2XZV'
CLIENT_SECRET = 'private'
VERSION = '20200814' # Foursquare API version date in YYYYMMDD format

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: GXVOFCDKFJLQYEHM11FA1MBHGT2MVV0OXNYUB5KMC0QS2XZV
CLIENT_SECRET: private
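
<p>Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. A possible alternative, sketched below, reads them from environment variables and masks them when printing. The variable name <code>FOURSQUARE_CLIENT_ID</code> and both helper functions are assumptions made for illustration.</p>

```python
import os

def load_credential(name, default="missing"):
    # Read a secret from the environment instead of hard-coding it in the notebook.
    return os.environ.get(name, default)

def mask(secret, visible=4):
    # Show only the first few characters when printing a secret.
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Hypothetical environment variable name; set it outside the notebook.
client_id = load_credential("FOURSQUARE_CLIENT_ID")
print("CLIENT_ID: " + mask(client_id))
```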


In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
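
<p><code>getNearbyVenues</code> indexes <code>categories[0]</code> directly, which raises an <code>IndexError</code> if a venue comes back without a category. The sketch below shows one defensive way to flatten a response of the same shape. The helper name <code>extract_venues</code> and the sample payload are hand-written assumptions, not real API output.</p>

```python
def extract_venues(name, lat, lng, payload):
    """Flatten an explore-style response into rows, skipping venues with no category."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        if not categories:
            continue  # direct indexing with [0] would raise IndexError here
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows

# Hand-written payload mirroring the keys that getNearbyVenues reads.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Cafe",
               "location": {"lat": 43.80, "lng": -79.19},
               "categories": [{"name": "Cafe"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 43.81, "lng": -79.20},
               "categories": []}},
]}]}}

rows = extract_venues("Malvern, Rouge", 43.806686, -79.194353, sample)
print(rows)
```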

<h3> Use the above method to obtain the top 100 venues for each neighbourhood and store them in a dataframe </h3>

In [13]:
toronto_venues = getNearbyVenues(names=df_pc['Neighbourhood'],
                                   latitudes=df_pc['Latitude'],
                                   longitudes=df_pc['Longitude'])
toronto_venues.head()

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West, 

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Images Salon & Spa,43.802283,-79.198565,Spa
1,"Malvern, Rouge",43.806686,-79.194353,Harvey's,43.80002,-79.198307,Restaurant
2,"Malvern, Rouge",43.806686,-79.194353,Staples Morningside,43.800285,-79.196607,Paper / Office Supplies Store
3,"Malvern, Rouge",43.806686,-79.194353,Wendy's,43.802008,-79.19808,Fast Food Restaurant
4,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant


In [14]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 323 unique categories.


In [15]:
def getNearbyParking(names, latitudes, longitudes, radius=1000, limit=100):
    
    parking_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query=Parking'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        parking_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighbourhood lists into one list of rows
    nearby_parking = pd.DataFrame([item for venue_list in parking_list for item in venue_list])
    nearby_parking.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Car Park', 
                  'Car Park Latitude', 
                  'Car Park Longitude', 
                  'Car Park Category']
    
    return nearby_parking

<h3> Use the above method to obtain all parking lots in Toronto and identify the number of unique neighbourhoods that have access to a parking lot </h3>

In [16]:
toronto_parking = getNearbyParking(names=df_pc['Neighbourhood'],
                                   latitudes=df_pc['Latitude'],
                                   longitudes=df_pc['Longitude'])
toronto_parking.head()

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West, 

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Car Park,Car Park Latitude,Car Park Longitude,Car Park Category
0,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,"Re/Max West Realty Inc., Brokerage",43.783623,-79.169489,Office
1,Cedarbrae,43.773136,-79.239476,Green P,43.776199,-79.250497,Parking
2,Scarborough Village,43.744734,-79.239476,Eglinton Go Station Parking Lot,43.739985,-79.231362,Parking
3,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,Kennedy Station - South Parking Lot,43.731911,-79.262755,Parking
4,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,Kennedy Station - North Lot,43.7332,-79.263298,Parking


<h3> Drop any results that aren't in the Parking venue category </h3>

In [17]:
toronto_parking.drop(toronto_parking[toronto_parking['Car Park Category'] != 'Parking'].index,axis=0, inplace=True)

In [18]:
print('There are {} unique parking lots in Toronto.'.format(len(toronto_parking['Car Park'].unique())))
print('There are {} neighbourhoods with access to at least one parking lot.'.format(len(toronto_parking['Neighbourhood'].unique())))

There are 86 unique parking lots in Toronto.
There are 54 neighbourhoods with access to at least one parking lot.


<h2> References </h2>
<ol>
    <li> Statistics Canada. 2017. Population and dwelling counts, for Canada and forward sortation areas© as reported by the respondents, 2016 Census (table). Population and Dwelling Count Highlight Tables. 2016 Census. </li>
</ol>