# Coursera Capstone: Sporting Goods Store

-------------

<h3>Introduction/Business Problem:</h3>

This notebook is for the open-ended capstone project related to neighborhoods and geospacial data.

My chosen project is to take the data of Toronto neighborhoods and determine the best neighborhood in which a new sporting goods store should open for business.

If after analyzing the data, there turns out to be neighborhoods with extensive sporting venues without existing sporting goods stores, business owners interested in starting a new sporting goods store will find significant value in the revealed neighborhood data.

--------------------------------------------------------------------------------------------------

<h3>Data:</h3>

The sporting goods store location will be based on the distance of nearby related venues (gathered from Foursquare) including:

- Baseball fields
- College gyms
- Fields
- Golf courses
- Gyms
- Gyms/Fitness centers
- Lakes
- Parks
- Playgrounds
- Pools
- Rivers
- Sporting Goods shops (to be avoided)
- Tennis Courts

To be realistic, there will most likely be no neighborhoods with all of the listed venues nearby.

Instead, the neighborhoods with the highest number of relevant nearby venues will be considered as potential locations.

Sporting Goods shops were added to the list because neighborhoods with multiple existing sports shops should be avoided.

The highest emphasis will be placed on: **Baseball fields, gyms, parks and tennis courts**. Although if others are present, they will not be disregarded.

The general idea is a sporting goods store located near as many relevant sports venues as possible will result in the highest amount of business.

-------------------

<h3>How data will be used to solve the problem:</h3>

The data on Toronto neighborhoods will be downloaded and paired with relevant Foursquare geospacial data for properly mapping the included neighborhoods.

The resulting dataframe will be cleaned by removing empty and NaN values.

The dataframe will be analyzed, having the neighborhoods be sorted by type and frequency of the different venues.

Neighborhoods will be considered based on both quantity of venues as well as variety.

Variety of venues takes priority, although if the best available neighborhoods share the same nearby venues, the one with a higher quantity is preferred.

Conclusion will include the most promising neighborhood for the sporting goods store to be located, followed by any runner ups if present.

----------

## Methodology

Gathering relevant data:

In [1]:
# Importing pandas, numpy
import pandas as pd
import numpy as np

Gathering Canadian postal codes:

In [2]:
!pip install lxml
#!pip3 install html5lib
import lxml
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")[0]

Collecting lxml
[?25l  Downloading https://files.pythonhosted.org/packages/79/37/d420b7fdc9a550bd29b8cfeacff3b38502d9600b09d7dfae9a69e623b891/lxml-4.5.2-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
[K     |████████████████████████████████| 5.5MB 5.5MB/s eta 0:00:01
[?25hInstalling collected packages: lxml
Successfully installed lxml-4.5.2


Checking resulting dataframe:

In [3]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [4]:
df.rename(columns={"Neighbourhood":"Neighborhood"}, inplace=True)

Remove unassigned boroughs/neighborhoods:

In [5]:
df = df[df.Borough != 'Not assigned']
df = df[df.Neighborhood != 'Not assigned']

In [6]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [7]:
df.shape

(103, 3)

Gathering geospacial data:

In [8]:
df_geo = pd.read_csv("https://cocl.us/Geospatial_data")

Checking resulting dataframe:

In [9]:
df_geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Copying latitude/longitude data to main dataframe:

In [10]:
df['Latitude'] = df_geo['Latitude']
df['Longitude'] = df_geo['Longitude']

In [11]:
df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.763573,-79.188711
3,M4A,North York,Victoria Village,43.770992,-79.216917
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.773136,-79.239476
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.744734,-79.239476
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.727929,-79.262029
...,...,...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",,
165,M4Y,Downtown Toronto,Church and Wellesley,,
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",,
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",,


Removing cells with NaN latitude/longitude values:

In [12]:
df = df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)]

In [13]:
df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.763573,-79.188711
3,M4A,North York,Victoria Village,43.770992,-79.216917
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.773136,-79.239476
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.744734,-79.239476
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.727929,-79.262029
...,...,...,...,...,...
95,M6N,York,"Runnymede, The Junction North",43.643515,-79.577201
98,M9N,York,Weston,43.706876,-79.518188
99,M1P,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.696319,-79.532242
100,M2P,North York,York Mills West,43.688905,-79.554724


----

Gathering Toronto coordinates:

In [14]:
!pip install geopy
from geopy.geocoders import Nominatim
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

Collecting geopy
[?25l  Downloading https://files.pythonhosted.org/packages/07/e1/9c72de674d5c2b8fcb0738a5ceeb5424941fefa080bfe4e240d0bacb5a38/geopy-2.0.0-py3-none-any.whl (111kB)
[K     |████████████████████████████████| 112kB 5.3MB/s eta 0:00:01
[?25hCollecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-2.0.0
The geograpical coordinates of Toronto are 43.6534817, -79.3839347.


Generating Toronto map with neighborhood locations:

In [15]:
import folium # map rendering library
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Foursquare credentials:

In [16]:
CLIENT_ID = 'Client ID'
CLIENT_SECRET = 'Client Secret'
VERSION = '20180605'

First neighborhood's name:

In [17]:
df.loc[2, 'Neighborhood']

'Parkwoods'

Getting neighborhood's latitude/longitude:

In [18]:
neighborhood_latitude = df.loc[2, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[2, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[2, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7635726, -79.1887115.


Getting top 100 venues in Parkwood within 500 meters:

In [19]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=VY2E42OOB33QS4QBYCNPPXSIOPGZEDVSGFOU5Y1XJSNOSRHI&client_secret=24FVYRZLCM35R2IKZBBH0HYTABXA455KVTK0A0SZZ5RRCRJD&ll=43.6534817,-79.3839347&v=20180605&radius=500&limit=100'

Sending get request:

In [20]:
import requests
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f19fa1ae10aee11f216002a'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 82,
  'suggestedBounds': {'ne': {'lat': 43.6579817045, 'lng': -79.37772678059432},
   'sw': {'lat': 43.6489816955, 'lng': -79.39014261940568}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          'lng'

Collecting categories of venues:

In [21]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Converting data to pandas dataframe:

In [22]:
from pandas.io.json import json_normalize
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  after removing the cwd from sys.path.


Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,Poke Guys,Poke Place,43.654895,-79.385052
3,Japango,Sushi Restaurant,43.655268,-79.385165
4,Indigo,Bookstore,43.653515,-79.380696


Total venues returned:

In [23]:
print('{} venues were returned.'.format(nearby_venues.shape[0]))

82 venues were returned.


Function to apply above process to all neighborhoods:

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Running function on all Toronto neighborhoods:

In [25]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

Checking resulting dataframe:

In [26]:
toronto_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.763573,-79.188711,RBC Royal Bank,43.766790,-79.191151,Bank
1,Parkwoods,43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
2,Parkwoods,43.763573,-79.188711,Sail Sushi,43.765951,-79.191275,Restaurant
3,Parkwoods,43.763573,-79.188711,Big Bite Burrito,43.766299,-79.190720,Mexican Restaurant
4,Parkwoods,43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
...,...,...,...,...,...,...,...
1287,York Mills West,43.688905,-79.554724,TTC Bus 45 Kipling,43.690907,-79.557066,Bus Line
1288,York Mills West,43.688905,-79.554724,Ital Pizza,43.690136,-79.559979,Pizza Place
1289,York Mills West,43.688905,-79.554724,Rogers,43.692620,-79.557402,Mobile Phone Shop
1290,Davisville North,43.706748,-79.594054,Economy Rent A Car,43.708471,-79.589943,Rental Car Location


In [27]:
# Creating columns for formatted dataframe:
columnsList = ["Neighborhood", "Baseball Field", "College Gym", "Field", "Golf Course", "Gym", "Gym / Fitness Center", "Lake", "Park", "Playground", "Pool", "River", "Sporting Goods Shop", "Tennis Court"]
formatted_df = pd.DataFrame(columns=columnsList)
# Create function to automate above process for every neighborhood and insert resulting data in new dataframe:
def venue_count_by_neighborhood(toronto_venues, formatted_df):
    for hood in toronto_venues["Neighborhood"]:
        temp_name = toronto_venues[toronto_venues["Neighborhood"] == hood]
        #print("temp name:", temp_name)
        hood_name = hood
        temp = toronto_venues[toronto_venues["Neighborhood"] == hood]
        temp = temp.dropna(axis = 0, how ='any')
        
        temp_neighborhoodName = hood
        temp_baseballField = np.sum(temp["Venue Category"] == "Baseball Field")
        temp_collegeGym = np.sum(temp["Venue Category"] == "College Gym")
        temp_field = np.sum(temp["Venue Category"] == "Field")
        temp_golfCourse = np.sum(temp["Venue Category"] == "Golf Course")
        temp_gym = np.sum(temp["Venue Category"] == "Gym")
        temp_gymFitnessCenter = np.sum(temp["Venue Category"] == "Gym / Fitness Center")
        temp_lake = np.sum(temp["Venue Category"] == "Lake")
        temp_park = np.sum(temp["Venue Category"] == "Park")
        temp_playground = np.sum(temp["Venue Category"] == "Playground")
        temp_pool = np.sum(temp["Venue Category"] == "Pool")
        temp_river = np.sum(temp["Venue Category"] == "River")
        temp_sportingGoodsShop = np.sum(temp["Venue Category"] == "Sporting Goods Shop")
        temp_tennisCourt = np.sum(temp["Venue Category"] == "Tennis Court")
        add_row = {"Neighborhood":temp_neighborhoodName, "Baseball Field":temp_baseballField, "College Gym":temp_collegeGym, "Field":temp_field,
                  "Golf Course":temp_golfCourse, "Gym":temp_gym, "Gym / Fitness Center":temp_gymFitnessCenter, "Lake":temp_lake,
                  "Park":temp_park, "Playground":temp_playground, "Pool":temp_pool, "River":temp_river, "Sporting Goods Shop":temp_sportingGoodsShop,
                  "Tennis Court":temp_tennisCourt}
        formatted_df = formatted_df.append(add_row, ignore_index=True)
    return formatted_df
        #print('\n')
        #print(temp_neighborhoodName)
formatted_df = venue_count_by_neighborhood(toronto_venues, formatted_df)
formatted_df

Unnamed: 0,Neighborhood,Baseball Field,College Gym,Field,Golf Course,Gym,Gym / Fitness Center,Lake,Park,Playground,Pool,River,Sporting Goods Shop,Tennis Court
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1287,York Mills West,0,0,0,0,0,0,0,0,0,0,0,0,0
1288,York Mills West,0,0,0,0,0,0,0,0,0,0,0,0,0
1289,York Mills West,0,0,0,0,0,0,0,0,0,0,0,0,0
1290,Davisville North,0,0,0,0,0,0,0,0,0,0,0,0,0


In [28]:
# Removing duplicate neighborhood rows:
formatted_df_cleaned = formatted_df.drop_duplicates()

In [29]:
# Don't think this cell is neccessary
formatted_df_grouped = formatted_df_cleaned.groupby("Neighborhood", sort=False)["Neighborhood", "Baseball Field", "College Gym", "Field", "Golf Course", "Gym", "Gym / Fitness Center", "Lake", "Park", "Playground", "Pool", "River", "Sporting Goods Shop", "Tennis Court"].agg('sum')

  


In [30]:
formatted_df_grouped.head(10)

Unnamed: 0_level_0,Neighborhood,Baseball Field,College Gym,Field,Golf Course,Gym,Gym / Fitness Center,Lake,Park,Playground,Pool,River,Sporting Goods Shop,Tennis Court
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Parkwoods,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0
Victoria Village,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0
"Regent Park, Harbourfront","Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0
"Lawrence Manor, Lawrence Heights","Lawrence Manor, Lawrence Heights",0,0,0,0,0,0,0,0,1,0,0,0,0
"Queen's Park, Ontario Provincial Government","Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,0,0,0,0
"Islington Avenue, Humber Valley Village","Islington Avenue, Humber Valley Village",0,0,0,0,0,0,0,0,0,0,0,0,0
"Malvern, Rouge","Malvern, Rouge",0,0,0,0,0,0,0,0,0,0,0,0,0
Don Mills,Don Mills,0,0,0,0,0,0,0,0,0,0,0,0,0
"Parkview Hill, Woodbine Gardens","Parkview Hill, Woodbine Gardens",0,0,0,0,0,0,0,0,0,0,0,0,0
"Garden District, Ryerson","Garden District, Ryerson",0,0,0,0,0,0,0,0,0,0,0,0,0


In [31]:
formatted_df_grouped.sort_values(by=["Baseball Field"], inplace=True, ascending=False)

In [32]:
formatted_df_grouped.head()

Unnamed: 0_level_0,Neighborhood,Baseball Field,College Gym,Field,Golf Course,Gym,Gym / Fitness Center,Lake,Park,Playground,Pool,River,Sporting Goods Shop,Tennis Court
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Caledonia-Fairbanks,Caledonia-Fairbanks,1,0,0,0,0,0,0,0,0,0,0,0,0
"Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood","Eringate, Bloordale Gardens, Old Burnhamthorpe...",1,0,0,0,1,0,0,0,0,0,0,0,0
"Rouge Hill, Port Union, Highland Creek","Rouge Hill, Port Union, Highland Creek",1,0,0,0,0,0,0,0,0,0,0,1,0
"Willowdale, Willowdale East","Willowdale, Willowdale East",1,0,0,0,0,0,0,0,0,0,0,0,0
"Toronto Dominion Centre, Design Exchange","Toronto Dominion Centre, Design Exchange",0,0,0,0,0,0,0,2,0,0,0,0,0


In [33]:
# Sorting by single column testing (BASEBALL FIELDS)
formatted_df_grouped.sort_values(by=["Baseball Field"], inplace=True, ascending=False)
formatted_df_grouped.head()

Unnamed: 0_level_0,Neighborhood,Baseball Field,College Gym,Field,Golf Course,Gym,Gym / Fitness Center,Lake,Park,Playground,Pool,River,Sporting Goods Shop,Tennis Court
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Caledonia-Fairbanks,Caledonia-Fairbanks,1,0,0,0,0,0,0,0,0,0,0,0,0
"Rouge Hill, Port Union, Highland Creek","Rouge Hill, Port Union, Highland Creek",1,0,0,0,0,0,0,0,0,0,0,1,0
"Willowdale, Willowdale East","Willowdale, Willowdale East",1,0,0,0,0,0,0,0,0,0,0,0,0
"Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood","Eringate, Bloordale Gardens, Old Burnhamthorpe...",1,0,0,0,1,0,0,0,0,0,0,0,0
"Malvern, Rouge","Malvern, Rouge",0,0,0,0,0,0,0,0,0,0,0,0,0


In [34]:
# Sorting by single column (GYMS)
formatted_df_grouped.sort_values(by=["Tennis Court"], inplace=True, ascending=False)
formatted_df_grouped.head(30)

Unnamed: 0_level_0,Neighborhood,Baseball Field,College Gym,Field,Golf Course,Gym,Gym / Fitness Center,Lake,Park,Playground,Pool,River,Sporting Goods Shop,Tennis Court
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
"Bathurst Manor, Wilson Heights, Downsview North","Bathurst Manor, Wilson Heights, Downsview North",0,0,0,0,2,0,0,1,0,0,0,0,1
Caledonia-Fairbanks,Caledonia-Fairbanks,1,0,0,0,0,0,0,0,0,0,0,0,0
"Little Portugal, Trinity","Little Portugal, Trinity",0,0,0,0,1,0,1,2,0,0,0,3,0
"East Toronto, Broadview North (Old East York)","East Toronto, Broadview North (Old East York)",0,0,0,0,0,1,0,1,0,0,0,0,0
"Dorset Park, Wexford Heights, Scarborough Town Centre","Dorset Park, Wexford Heights, Scarborough Town...",0,0,0,0,0,0,0,0,0,0,0,0,0
"Toronto Dominion Centre, Design Exchange","Toronto Dominion Centre, Design Exchange",0,0,0,0,0,0,0,2,0,0,0,0,0
"India Bazaar, The Beaches West","India Bazaar, The Beaches West",0,0,0,0,0,0,0,2,0,0,0,0,0
"York Mills, Silver Hills","York Mills, Silver Hills",0,0,1,0,0,0,0,0,1,0,0,0,0
"Golden Mile, Clairlea, Oakridge","Golden Mile, Clairlea, Oakridge",0,0,0,0,0,0,0,0,0,0,0,0,0
"Brockton, Parkdale Village, Exhibition Place","Brockton, Parkdale Village, Exhibition Place",0,0,0,0,0,0,0,0,0,0,0,0,0


In [35]:
# Sorting by single column (PARKS)
formatted_df_grouped.sort_values(by=["Park"], inplace=True, ascending=False)
formatted_df_grouped.head()

Unnamed: 0_level_0,Neighborhood,Baseball Field,College Gym,Field,Golf Course,Gym,Gym / Fitness Center,Lake,Park,Playground,Pool,River,Sporting Goods Shop,Tennis Court
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Downsview,Downsview,0,0,0,0,1,0,0,2,0,1,0,0,0
"Little Portugal, Trinity","Little Portugal, Trinity",0,0,0,0,1,0,1,2,0,0,0,3,0
Humewood-Cedarvale,Humewood-Cedarvale,0,0,0,0,0,0,0,2,0,0,0,0,0
"Dufferin, Dovercourt Village","Dufferin, Dovercourt Village",0,0,0,0,0,0,0,2,1,0,0,0,0
"Toronto Dominion Centre, Design Exchange","Toronto Dominion Centre, Design Exchange",0,0,0,0,0,0,0,2,0,0,0,0,0


In [36]:
# Sorting by single column (TENNIS COURTS)
formatted_df_grouped.sort_values(by=["Tennis Court"], inplace=True, ascending=False)
formatted_df_grouped.head()

Unnamed: 0_level_0,Neighborhood,Baseball Field,College Gym,Field,Golf Course,Gym,Gym / Fitness Center,Lake,Park,Playground,Pool,River,Sporting Goods Shop,Tennis Court
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
"Bathurst Manor, Wilson Heights, Downsview North","Bathurst Manor, Wilson Heights, Downsview North",0,0,0,0,2,0,0,1,0,0,0,0,1
Downsview,Downsview,0,0,0,0,1,0,0,2,0,1,0,0,0
Parkwoods,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0
"Lawrence Manor, Lawrence Heights","Lawrence Manor, Lawrence Heights",0,0,0,0,0,0,0,0,1,0,0,0,0
"Islington Avenue, Humber Valley Village","Islington Avenue, Humber Valley Village",0,0,0,0,0,0,0,0,0,0,0,0,0


In [37]:
# Sorting by single column (SPORTING GOODS SHOP) Top neighborhoods to dismiss:
formatted_df_grouped.sort_values(by=["Sporting Goods Shop"], inplace=True, ascending=False)
formatted_df_grouped.head()

Unnamed: 0_level_0,Neighborhood,Baseball Field,College Gym,Field,Golf Course,Gym,Gym / Fitness Center,Lake,Park,Playground,Pool,River,Sporting Goods Shop,Tennis Court
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
"Little Portugal, Trinity","Little Portugal, Trinity",0,0,0,0,1,0,1,2,0,0,0,3,0
Scarborough Village,Scarborough Village,0,0,0,0,1,1,1,1,0,0,0,1,0
"Rouge Hill, Port Union, Highland Creek","Rouge Hill, Port Union, Highland Creek",1,0,0,0,0,0,0,0,0,0,0,1,0
Hillcrest Village,Hillcrest Village,0,0,0,0,0,1,0,1,0,0,0,1,0
"Guildwood, Morningside, West Hill","Guildwood, Morningside, West Hill",0,0,0,0,2,0,0,0,0,0,0,1,0


-----

In [38]:
# Display all rows (still sorted by sporting goods shops)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', -1)
formatted_df_grouped

  """


Unnamed: 0_level_0,Neighborhood,Baseball Field,College Gym,Field,Golf Course,Gym,Gym / Fitness Center,Lake,Park,Playground,Pool,River,Sporting Goods Shop,Tennis Court
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
"Little Portugal, Trinity","Little Portugal, Trinity",0,0,0,0,1,0,1,2,0,0,0,3,0
Scarborough Village,Scarborough Village,0,0,0,0,1,1,1,1,0,0,0,1,0
"Rouge Hill, Port Union, Highland Creek","Rouge Hill, Port Union, Highland Creek",1,0,0,0,0,0,0,0,0,0,0,1,0
Hillcrest Village,Hillcrest Village,0,0,0,0,0,1,0,1,0,0,0,1,0
"Guildwood, Morningside, West Hill","Guildwood, Morningside, West Hill",0,0,0,0,2,0,0,0,0,0,0,1,0
Woburn,Woburn,0,0,0,0,0,0,0,1,0,0,0,0,0
Humewood-Cedarvale,Humewood-Cedarvale,0,0,0,0,0,0,0,2,0,0,0,0,0
"Dufferin, Dovercourt Village","Dufferin, Dovercourt Village",0,0,0,0,0,0,0,2,1,0,0,0,0
"Toronto Dominion Centre, Design Exchange","Toronto Dominion Centre, Design Exchange",0,0,0,0,0,0,0,2,0,0,0,0,0
"India Bazaar, The Beaches West","India Bazaar, The Beaches West",0,0,0,0,0,0,0,2,0,0,0,0,0


# Analysis breakdown:

Out of all included neighborhoods, **Scarborough Village** contains the largest **variety** of sporting related venues (five), although one of the venues is a sporting goods shop.

The existence of the rival sporting goods shop is not surprising, given the large variety of sporting venues within the neighborhood.

As a result, Scarborough Village will not be considered as a potential candidate for the new sporting goods shop.

**Birch Cliff, Cliffside West**: One Park, One Pool, One River. (3 sports related venues, although only 1 is from the list of highly desired/relevant venues)

**Dufferin, Dovercourt Village**: Two Parks, One Playground (3 sports related venues, 2 of which are from the list of highly desired/relevant venues)

**Humewood-Cedarvale**: Three Parks (Although only one category of venues, this neighborhood contains 3 parks, a highly desired/relevant venue)

**Downsview**: One Gym, Three Parks, One Pool (4 highly relevant venues and a pool in addition)

**Harbourfront East, Union Station, Toronto Islands**: Three Gyms, One Gym/Recreation Center (High volume of gyms, yet no parks so limited products to be sold)

**Bathurst Manor, Wilson Heights, Downsview North**: Two Gyms, One Park, One Tennis Court (**4 highly valued venues, with significant variety including the park and tennis court**).

Bathurst Manor, Wilson Heights, Downsview North is the recommended neighborhood location based on the highly valued venues, as well as proximity to nearby potential neighborhoods and large distance from threatening competing sporting goods stores.

In [39]:
df["Neighborhood"]

2      Parkwoods                                                             
3      Victoria Village                                                      
4      Regent Park, Harbourfront                                             
5      Lawrence Manor, Lawrence Heights                                      
6      Queen's Park, Ontario Provincial Government                           
8      Islington Avenue, Humber Valley Village                               
9      Malvern, Rouge                                                        
11     Don Mills                                                             
12     Parkview Hill, Woodbine Gardens                                       
13     Garden District, Ryerson                                              
14     Glencairn                                                             
17     West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
18     Rouge Hill, Port Union, Highland Creek                   

In [40]:
bathurst_lat = 43.704324
bathurst_long = -79.388790
bathurst_neighborhood = "Bathurst Manor, Wilson Heights, Downsview North"
bathurst_borough = "North York"

harbourfront_lat = 43.650571
harbourfront_long = -79.384568
harbourfront_neighborhood = "Harbourfront East, Union Station, Toronto Islands"
harbourfront_borough = "Downtown Toronto"

downsview_lat = 43.672710
downsview_long = -79.405678
downsview_neighborhood = "Downsview"
downsview_borough = "North York"

humewood_lat = 43.752758
humewood_long = -79.400049
humewood_neighborhood = "Humewood-Cedarvale"
humewood_borough = "York"

dufferin_lat = 43.679563
dufferin_long = -79.377529
dufferin_neighborhood = "Dufferin, Dovercourt Village"
dufferin_borough = "West Toronto"

birch_lat = 43.653654
birch_long = -79.506944
birch_neighborhood = "Birch Cliff, Cliffside West"
birch_borough = "Scarborough"

scar_lat = 43.657162
scar_long = -79.378937
scar_neighborhood = "Cliffside, Cliffcrest, Scarborough Village West"
scar_borough = "Scarborough"

littlepor_lat = 43.640816
littlepor_long = -79.381752
littlepor_neighborhood = "Little Portugal, Trinity"
littlepor_borough = "West Toronto"

hillcrest_lat = 43.715383
hillcrest_long = -79.405678
hillcrest_neighborhood = "Hillcrest Village"
hillcrest_borough = "North York"

guildwood_lat = 43.725900
guildwood_long = -79.340923
guildwood_neighborhood = "Guildwood, Morningside, West Hill"
guildwood_borough = "Scarborough"

rouge_lat = 43.778517
rouge_long = -79.346556
rouge_neighborhood = "Rouge Hill, Port Union, Highland Creek"
rouge_borough = "Scarborough"

In [41]:
import folium # map rendering library
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
# Bathurst
label = '{}, {}'.format(bathurst_neighborhood, bathurst_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [bathurst_lat, bathurst_long],
    radius=7,
    popup=label,
    color='green',
    fill=True,
    fill_color='green',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)  

# Harbourfront
label = '{}, {}'.format(harbourfront_neighborhood, harbourfront_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [harbourfront_lat, harbourfront_long],
    radius=7,
    popup=label,
    color='purple',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Downsview
label = '{}, {}'.format(downsview_neighborhood, downsview_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [downsview_lat, downsview_long],
    radius=7,
    popup=label,
    color='purple',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Humewood
label = '{}, {}'.format(humewood_neighborhood, humewood_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [humewood_lat, humewood_long],
    radius=7,
    popup=label,
    color='purple',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Dufferin
label = '{}, {}'.format(dufferin_neighborhood, dufferin_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [dufferin_lat, dufferin_long],
    radius=7,
    popup=label,
    color='purple',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Birch Cliff
label = '{}, {}'.format(birch_neighborhood, birch_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [birch_lat, birch_long],
    radius=7,
    popup=label,
    color='purple',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Scarborough
label = '{}, {}'.format(scar_neighborhood, scar_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [scar_lat, scar_long],
    radius=7,
    popup=label,
    color='orange',
    fill=True,
    fill_color='orange',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Little Portugal
label = '{}, {}'.format(littlepor_neighborhood, littlepor_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [littlepor_lat, littlepor_long],
    radius=7,
    popup=label,
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Hillcrest
label = '{}, {}'.format(hillcrest_neighborhood, hillcrest_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [hillcrest_lat, hillcrest_long],
    radius=7,
    popup=label,
    color='orange',
    fill=True,
    fill_color='orange',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Guildwood, Morningside, West Hill
label = '{}, {}'.format(guildwood_neighborhood, guildwood_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [guildwood_lat, guildwood_long],
    radius=7,
    popup=label,
    color='orange',
    fill=True,
    fill_color='orange',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Rouge Hill, Port Union, Highland Creek
label = '{}, {}'.format(rouge_neighborhood, rouge_borough)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [rouge_lat, rouge_long],
    radius=7,
    popup=label,
    color='orange',
    fill=True,
    fill_color='orange',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

    
map_toronto

# Conclusion:

The above map depicts the designated best neighborhood option in green (Bathurst Manor) and the other potential candidates in purple. As well as

Scarborough, Little Portugal, Hillcrest, (Guildwood, Morningside, West Hill), and (Rouge Hill, Port Union, Highland Creek) which are all of the neighborhoods containing at least one competing sporting goods store and were avoided in the process.