<h1 align=center><font size = 7>New Barbershop in Toronto</font></h1>

## About

A newcomer to Toronto wants to open a new barbershop or salon, preferably in a populous postal code with little or no competition. The investor is therefore interested in a single score per postal code that reflects the ratio of population to competitors, and in the 5 postal codes with the highest scores.

This exercise is for demonstration purposes only. The data may be outdated and reflects the limitations set by its providers. It is not meant to be used for commercial purposes.
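As a rough sketch of the score described above, assuming it is computed the way the notebook does in section 5 (population divided by the number of competitors plus one, then scaled so that the best postal code gets 100), the calculation could look like this. The function name `score_postal_codes` and the `X1X` entry are illustrative assumptions and are not part of the original notebook.

```python
def score_postal_codes(areas):
    """Return {postal_code: score}, where score is 0-100 relative to the best area.

    `areas` maps a postal code to a (population, competitor_count) pair.
    """
    # Population per business; the +1 avoids dividing by zero when there are no competitors.
    metric = {code: round(pop / (count + 1)) for code, (pop, count) in areas.items()}
    best = max(metric.values())
    return {code: round(100 * value / best, 2) for code, value in metric.items()}


# Populations for M1B and M2J come from the census table used in this notebook;
# "X1X" and its figures are made up to show the effect of competitors.
sample = {"M1B": (66108, 0), "M2J": (58293, 0), "X1X": (60000, 2)}
scores = score_postal_codes(sample)
print(scores)
```

With two competitors, the made-up area `X1X` drops well below the less populated `M2J`, which is the trade-off the score is meant to capture.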

## 1. Initialization

In [1]:
import numpy as np # library to handle data in a vectorized manner
import math

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
from sklearn import preprocessing

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

!pip install beautifulsoup4
from bs4 import BeautifulSoup
import requests #HTTP

print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Libraries imported.


<a id='item1'></a>

## 2. Population Data

I use requests to download the 2016 census population table from Statistics Canada and Beautiful Soup to parse it.

In [2]:
url = 'https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Table.cfm?Lang=Eng&T=1201&SR=1&S=22&O=A&RPP=9999&PR=0'

text_result = requests.get(url).text # get the entire HTML of the page as a str
html_parsed_result = BeautifulSoup(text_result, 'html.parser') # parse the HTML text into a BeautifulSoup tree

population_split_table = html_parsed_result.find('table', class_ = 'table-condensed')
population_split_rows = population_split_table.find_all('tr')

# extract the cells (geographic name, population, total dwellings, occupied dwellings) from each table row
population_split_info = []
for row in population_split_rows:
    info = row.text.split('\n')[1:-1] # remove empty str (first and last items)
    population_split_info.append(info)
    
population_split_info[0:10]

[['Geographic name',
  'Population, 2016',
  'Total private dwellings, 2016',
  'Private dwellings occupied by usual residents, 2016'],
 ['', '', '', ''],
 ['CanadaFootnote 1', '35,151,728', '15,412,443', '14,072,079'],
 ['A0A', '46,587', '26,155', '19,426'],
 ['A0B', '19,792', '13,658', '8,792'],
 ['A0C', '12,587', '8,010', '5,606'],
 ['A0E', '22,294', '12,293', '9,603'],
 ['A0G', '35,266', '21,750', '15,200'],
 ['A0H', '17,804', '9,928', '7,651'],
 ['A0J', '7,880', '4,813', '3,426']]

### Casting into a frame

In [3]:
# create the population dataframe from the scraped rows (row 0 is the header)
columns = ['PostCode', 'Population', 'TotalDwellings', 'OccupiedDwellings']
population_df = pd.DataFrame(population_split_info[1:], columns=columns)

# keep only 3-character postal codes that start with 'M' (Toronto)
is_toronto = (population_df['PostCode'].str.len() == 3) & population_df['PostCode'].str.startswith('M', na=False)
population_df = population_df.loc[is_toronto, ['PostCode', 'Population']].copy()

# convert strings such as '66,108' to floats, treating missing values as 0
population_df['Population'] = population_df['Population'].str.replace(',', '').astype(float).fillna(0)

population_df = population_df.reset_index(drop=True)
population_df.columns = ['Postal Code', 'Population']
print(population_df.dtypes)
population_df.head(5)


Postal Code     object
Population     float64
dtype: object


| | Postal Code | Population |
|---|---|---|
| 0 | M1B | 66108.0 |
| 1 | M1C | 35626.0 |
| 2 | M1E | 46943.0 |
| 3 | M1G | 29690.0 |
| 4 | M1H | 24383.0 |


In [4]:
population_df.shape

(102, 2)
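The census figures arrive as strings with thousands separators, so they have to be cleaned before they can be used as numbers. A minimal sketch of that step in plain Python follows; `parse_count` is a hypothetical helper name, and the three sample rows are a trimmed stand-in for the scraped table.

```python
import math


def parse_count(text):
    """Convert a census figure such as '66,108' to a float; blank or invalid text becomes 0.0."""
    cleaned = text.replace(",", "").strip()
    try:
        value = float(cleaned)
    except ValueError:
        return 0.0
    # float('nan') parses without error, so treat it as missing as well.
    return 0.0 if math.isnan(value) else value


rows = [["M1B", "66,108"], ["M1C", "35,626"], ["", ""]]
# Keep only 3-character codes that start with 'M' (Toronto), as the notebook does.
toronto = {code: parse_count(pop) for code, pop in rows if len(code) == 3 and code.startswith("M")}
print(toronto)
```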

## 3. Coordinates of Toronto Postal Codes

In [5]:
!wget -q -O 'Geospatial_Coordinates.csv' https://cocl.us/Geospatial_data
print('Data downloaded!')

coordinates_df = pd.read_csv('Geospatial_Coordinates.csv') # transform the csv file into a dataframe

print('The coordinates dataframe shape is', coordinates_df.shape)
coordinates_df.head()

Data downloaded!
The coordinates dataframe shape is (103, 3)


| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


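Merge the coordinates and population dataframes on postal code. `join` performs a left join by default, so all 103 postal codes from the coordinates table are kept, and a postal code with no entry among the 102 census rows gets a NaN population.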

In [6]:
group_df = coordinates_df.join(population_df.set_index('Postal Code'), on='Postal Code')
group_df.head(5)

| | Postal Code | Latitude | Longitude | Population |
|---|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 | 66108.0 |
| 1 | M1C | 43.784535 | -79.160497 | 35626.0 |
| 2 | M1E | 43.763573 | -79.188711 | 46943.0 |
| 3 | M1G | 43.770992 | -79.216917 | 29690.0 |
| 4 | M1H | 43.773136 | -79.239476 | 24383.0 |


## 4. Get the list of businesses of interest (barbershops or salons) from Foursquare

In [7]:
CLIENT_ID = 'xxxxxxxxxx' # your Foursquare ID
CLIENT_SECRET = 'xxxxxxxxxxx' # your Foursquare Secret
VERSION = 'xxxxxxxxxx'

#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

search_category = '4bf58dd8d48988d110951735' #Salon / Barbershop, categories at https://developer.foursquare.com/docs/resources/categories
LIMIT = 50

nNeighborhoods = group_df.shape[0]
#nNeighborhoods = 2
print('There are ',nNeighborhoods,'postal areas')

frames = []  # one dataframe of venues per postal code

for i in range(nNeighborhoods):
    # initialize data for the Foursquare query
    xPostalCode = group_df['Postal Code'][i]
    xLongitude = group_df['Longitude'][i]
    xLatitude = group_df['Latitude'][i]

    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, xLatitude, xLongitude, VERSION, search_category,LIMIT)
    results = requests.get(url).json()
    # assign relevant part of JSON to venues
    venues = results['response']['venues']

    # transform venues into a dataframe and tag it with the queried postal code
    dataframe = json_normalize(venues)
    dataframe['xPostalCode'] = xPostalCode
    frames.append(dataframe)

# concatenate once at the end instead of appending inside the loop
concatenated_4sq_pd = pd.concat(frames, sort = False)

concatenated_4sq_pd.to_csv('downloaded 4s results.csv')

print('results size: ',concatenated_4sq_pd.shape)

There are  103 postal areas
results size:  (5150, 20)
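The cell above builds the query string by hand with `str.format`. A possible alternative, sketched here with the standard library only, is to let `urlencode` assemble and escape the parameters. The endpoint and parameter names are the ones used in the cell above; the function name `build_search_url` is hypothetical, and the credentials are the same placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"  # same endpoint as in the cell above


def build_search_url(client_id, client_secret, version, latitude, longitude, category_id, limit):
    """Build the venue-search URL with the same query parameters the notebook uses."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(latitude, longitude),
        "v": version,
        "categoryId": category_id,
        "limit": limit,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url("xxxxxxxxxx", "xxxxxxxxxxx", "xxxxxxxxxx",
                       43.806686, -79.194353, "4bf58dd8d48988d110951735", 50)
print(url)
# Round-trip check: the encoded query decodes back to the original coordinates.
print(parse_qs(urlsplit(url).query)["ll"])
```

`requests.get` also accepts such a dictionary directly through its `params` argument, which avoids building the URL string at all.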


In [8]:
cleaned_df = concatenated_4sq_pd.drop_duplicates(subset = 'id', keep = 'first')

In [9]:
print('before cleanup, size was ',concatenated_4sq_pd.shape)
print('after cleanup, size became ',cleaned_df.shape)
cleaned_df.to_csv('Unique_Business_list_raw.csv')

before cleanup, size was  (5150, 20)
after cleanup, size became  (64, 20)
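The 5150 rows before cleanup are exactly 103 queries times the limit of 50, so every query returned the maximum number of venues, and only 64 distinct venue IDs remain afterwards. The keep-first behaviour of `drop_duplicates(subset = 'id', keep = 'first')` can be sketched in plain Python as follows. The function name `drop_duplicate_venues` is hypothetical, the records are trimmed, and the repeated `M1C` record is made up for illustration.

```python
def drop_duplicate_venues(venues):
    """Keep the first record seen for each venue id, preserving order."""
    seen = set()
    unique = []
    for venue in venues:
        if venue["id"] not in seen:
            seen.add(venue["id"])
            unique.append(venue)
    return unique


# The same venue returned by two different postal-code queries (records trimmed for brevity;
# the M1C record is a made-up repeat).
venues = [
    {"id": "5734ba34498e56a45e264614", "name": "Boyd's Barber Shop", "xPostalCode": "M1B"},
    {"id": "4c1d3b10eac020a141e347c2", "name": "SeeFu Hair", "xPostalCode": "M1B"},
    {"id": "5734ba34498e56a45e264614", "name": "Boyd's Barber Shop", "xPostalCode": "M1C"},
]
unique_venues = drop_duplicate_venues(venues)
print(len(unique_venues))
```

Because the first occurrence wins, `xPostalCode` only records which query returned the venue first. It is not the venue's own postal code, which is why a separate `AdjustedPostalCode` column is derived below.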


### Load the saved CSV (downloaded businesses, instead of repeating the previous steps while testing)

In [10]:
businesses_df = pd.read_csv('Unique_Business_list_raw.csv', index_col='Unnamed: 0')
print('Loaded: ', businesses_df.shape)
businesses_df['AdjustedPostalCode'] = None
businesses_df.reset_index(inplace = True)
businesses_df.drop(columns=['index'], inplace = True)
businesses_df.head(5)

Loaded:  (64, 20)


Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.country,location.formattedAddress,location.address,location.crossStreet,location.postalCode,location.city,location.state,venuePage.id,location.neighborhood,xPostalCode,AdjustedPostalCode
0,5734ba34498e56a45e264614,Boyd's Barber Shop,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.661189,-79.382419,"[{'label': 'display', 'lat': 43.66118850597661...",22162,CA,Canada,['Canada'],,,,,,,,M1B,
1,4c1d3b10eac020a141e347c2,SeeFu Hair,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.650925,-79.397314,"[{'label': 'display', 'lat': 43.65092528400248...",23815,CA,Canada,"['222 Spadina Ave (at Sullivan St)', 'Toronto ...",222 Spadina Ave,at Sullivan St,Ontario,Toronto Division,ON,,,M1B,
2,555f750f498ea0e938e6c9f4,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.648182,-79.37387,"[{'label': 'display', 'lat': 43.648182, 'lng':...",22800,CA,Canada,"['63 Front St E', 'Toronto ON M5E 1B3', 'Canada']",63 Front St E,,M5E 1B3,Toronto,ON,,,M1B,
3,4b243812f964a520ff6324e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.665372,-79.38118,"[{'label': 'display', 'lat': 43.665372, 'lng':...",21754,CA,Canada,"['63 Wellesley St E (at Church St.)', 'Toronto...",63 Wellesley St E,at Church St.,M4Y 1G7,Toronto,ON,,,M1B,
4,4b271340f964a520998424e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.660485,-79.384455,"[{'label': 'display', 'lat': 43.660485, 'lng':...",22331,CA,Canada,"['777 Bay St,Unit C 216', 'Toronto ON M5G 2C8'...","777 Bay St,Unit C 216",,M5G 2C8,Toronto,ON,,,M1B,


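Verify that each venue's postal code is present and valid. A valid code is 7 characters long with a digit in second position; its first three characters are stored in `AdjustedPostalCode`, and everything else is marked as `error`.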

In [11]:
for i in range(businesses_df.shape[0]):
    try:        
        temp = businesses_df['location.postalCode'][i]
        mystring = str(temp)
        midchar = mystring[1:2]
        firstchar = mystring[0:1]
        
        #if(len(str(mystring)) == 7 and midchar.isdigit() and firstchar == 'M'):
        # write with .loc so the value is set on the dataframe itself (avoids SettingWithCopyWarning)
        if(len(str(mystring)) == 7 and midchar.isdigit()):
            print(i, "OK,  ", midchar)
            businesses_df.loc[i, 'AdjustedPostalCode'] = mystring[:-3].strip()
        else:
            print(i, " error")
            businesses_df.loc[i, 'AdjustedPostalCode'] = 'error'
    except Exception:
        businesses_df.loc[i, 'AdjustedPostalCode'] = 'error'
        print(i, " error ")
    
businesses_df.head(5)

0  error
1  error
2 OK,   5
3 OK,   4
4 OK,   5
5 OK,   4
6 OK,   4
7 OK,   9
8 OK,   4
9  error
10 OK,   5
11 OK,   5
12 OK,   5
13 OK,   3
14 OK,   5
15 OK,   5
16 OK,   2
17 OK,   4
18 OK,   6
19  error
20 OK,   6
21 OK,   5
22 OK,   5
23  error
24 OK,   5
25  error
26 OK,   4
27  error
28 OK,   2
29 OK,   4
30  error
31  error
32 OK,   1
33 OK,   6
34 OK,   4
35  error
36 OK,   1
37 OK,   2
38  error
39  error
40  error
41 OK,   4
42 OK,   5
43 OK,   6
44 OK,   4
45 OK,   3
46 OK,   6
47 OK,   6
48 OK,   9
49 OK,   3
50 OK,   4
51 OK,   4
52  error
53 OK,   8
54 OK,   5
55 OK,   6
56 OK,   5
57  error
58 OK,   5
59 OK,   5
60 OK,   6
61  error
62 OK,   6
63 OK,   7



Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.country,location.formattedAddress,location.address,location.crossStreet,location.postalCode,location.city,location.state,venuePage.id,location.neighborhood,xPostalCode,AdjustedPostalCode
0,5734ba34498e56a45e264614,Boyd's Barber Shop,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.661189,-79.382419,"[{'label': 'display', 'lat': 43.66118850597661...",22162,CA,Canada,['Canada'],,,,,,,,M1B,error
1,4c1d3b10eac020a141e347c2,SeeFu Hair,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.650925,-79.397314,"[{'label': 'display', 'lat': 43.65092528400248...",23815,CA,Canada,"['222 Spadina Ave (at Sullivan St)', 'Toronto ...",222 Spadina Ave,at Sullivan St,Ontario,Toronto Division,ON,,,M1B,error
2,555f750f498ea0e938e6c9f4,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.648182,-79.37387,"[{'label': 'display', 'lat': 43.648182, 'lng':...",22800,CA,Canada,"['63 Front St E', 'Toronto ON M5E 1B3', 'Canada']",63 Front St E,,M5E 1B3,Toronto,ON,,,M1B,M5E
3,4b243812f964a520ff6324e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.665372,-79.38118,"[{'label': 'display', 'lat': 43.665372, 'lng':...",21754,CA,Canada,"['63 Wellesley St E (at Church St.)', 'Toronto...",63 Wellesley St E,at Church St.,M4Y 1G7,Toronto,ON,,,M1B,M4Y
4,4b271340f964a520998424e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.660485,-79.384455,"[{'label': 'display', 'lat': 43.660485, 'lng':...",22331,CA,Canada,"['777 Bay St,Unit C 216', 'Toronto ON M5G 2C8'...","777 Bay St,Unit C 216",,M5G 2C8,Toronto,ON,,,M1B,M5G
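The check above only tests the length and the second character, and it keeps the original casing. A venue stored as `M4n 2k9` therefore yields `M4n`, which will not match `M4N` when postal codes are compared later. A stricter variant, sketched here as an alternative and not as the notebook's method, validates the general letter-digit-letter shape with a regular expression and upper-cases the result. The function name `forward_sortation_area` is hypothetical.

```python
import re

# Letter-digit-letter, optional space, digit-letter-digit (general Canadian postal code shape).
POSTAL_CODE_RE = re.compile(r"^([A-Z]\d[A-Z])\s?(\d[A-Z]\d)$")


def forward_sortation_area(postal_code):
    """Return the upper-cased first three characters of a valid postal code, else None."""
    match = POSTAL_CODE_RE.match(str(postal_code).strip().upper())
    return match.group(1) if match else None


for raw in ["M5E 1B3", "M4n 2k9", "Ontario", float("nan")]:
    print(repr(raw), "->", forward_sortation_area(raw))
```

Swapping this in would change the counts reported further down, so the outputs shown in this notebook correspond to the simpler check.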


### Attempt to reverse lookup the address/postal code when it's missing

In [12]:
mycounter = 0
geolocator = Nominatim(user_agent="new business in toronto")

for i in range(businesses_df.shape[0]):
    #if str(businesses_df['location.postalCode'][i]).lower() == 'nan':
    if businesses_df['AdjustedPostalCode'][i] == 'error':
        xLatitude = businesses_df['location.lat'][i]
        xLongitude = businesses_df['location.lng'][i]
        geoAddress = geolocator.reverse([xLatitude, xLongitude])
        newAddress = str(geoAddress.address).replace(', Canada','')
        #newAddress.replace(', Canada','')
        print(i," | ", newAddress[-7:-4] )
        businesses_df.loc[i, 'AdjustedPostalCode'] = newAddress[-7:-4].strip()
        mycounter = mycounter + 1

print("Missing Postal Codes ", mycounter)

businesses_df.head(10)

0  |  M4Y
1  |  M5T
9  |  M5V
19  |  L1Z
23  |  M5A
25  |  M5V
27  |  M5R
30  |  M8Z
31  |  M4W
35  |  M8X
38  |  M6H
39  |  M1P
40  |  M5S
52  |  L5B
57  |  L5M
61  |  L7A
Missing Postal Codes  16


Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.country,location.formattedAddress,location.address,location.crossStreet,location.postalCode,location.city,location.state,venuePage.id,location.neighborhood,xPostalCode,AdjustedPostalCode
0,5734ba34498e56a45e264614,Boyd's Barber Shop,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.661189,-79.382419,"[{'label': 'display', 'lat': 43.66118850597661...",22162,CA,Canada,['Canada'],,,,,,,,M1B,M4Y
1,4c1d3b10eac020a141e347c2,SeeFu Hair,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.650925,-79.397314,"[{'label': 'display', 'lat': 43.65092528400248...",23815,CA,Canada,"['222 Spadina Ave (at Sullivan St)', 'Toronto ...",222 Spadina Ave,at Sullivan St,Ontario,Toronto Division,ON,,,M1B,M5T
2,555f750f498ea0e938e6c9f4,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.648182,-79.37387,"[{'label': 'display', 'lat': 43.648182, 'lng':...",22800,CA,Canada,"['63 Front St E', 'Toronto ON M5E 1B3', 'Canada']",63 Front St E,,M5E 1B3,Toronto,ON,,,M1B,M5E
3,4b243812f964a520ff6324e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.665372,-79.38118,"[{'label': 'display', 'lat': 43.665372, 'lng':...",21754,CA,Canada,"['63 Wellesley St E (at Church St.)', 'Toronto...",63 Wellesley St E,at Church St.,M4Y 1G7,Toronto,ON,,,M1B,M4Y
4,4b271340f964a520998424e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.660485,-79.384455,"[{'label': 'display', 'lat': 43.660485, 'lng':...",22331,CA,Canada,"['777 Bay St,Unit C 216', 'Toronto ON M5G 2C8'...","777 Bay St,Unit C 216",,M5G 2C8,Toronto,ON,,,M1B,M5G
5,4c6da723c524370467c928eb,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.67165,-79.378245,"[{'label': 'display', 'lat': 43.67165, 'lng': ...",21088,CA,Canada,"['345 Bloor St,Unit 3', 'Toronto ON M4W 1H7', ...","345 Bloor St,Unit 3",,M4W 1H7,Toronto,ON,,,M1B,M4W
6,4ad9dac1f964a5204d1b21e3,Corallo Men's Salon,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.727663,-79.402857,"[{'label': 'display', 'lat': 43.72766318188245...",18929,CA,Canada,"['3195 Yonge St (at Ranleigh Ave)', 'Toronto O...",3195 Yonge St,at Ranleigh Ave,M4n 2k9,Toronto,ON,,,M1B,M4n
7,4e3b23e0d22d102e85232050,Mauro's Beauty Salon,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.749506,-79.553228,"[{'label': 'display', 'lat': 43.74950597, 'lng...",29538,CA,Canada,"['2523 Finch Ave W', 'Toronto ON M9M 2G1', 'Ca...",2523 Finch Ave W,,M9M 2G1,Toronto,ON,,,M1B,M9M
8,4b21a191f964a5203b3f24e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.696662,-79.369394,"[{'label': 'display', 'lat': 43.696662, 'lng':...",18657,CA,Canada,"['325 Moore Ave', 'Toronto ON M4G 3T6', 'Canada']",325 Moore Ave,,M4G 3T6,Toronto,ON,,,M1B,M4G
9,4b54ea81f964a520fad227e3,Mankind Grooming Studio for Men,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.647545,-79.398179,"[{'label': 'display', 'lat': 43.6475447145094,...",24138,CA,Canada,"['477 Richmond Street West (at Brant Street)',...",477 Richmond Street West,at Brant Street,,Toronto,ON,,,M1B,M5V
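The slice `newAddress[-7:-4]` assumes that the postal code is the last 7 characters of the address once `, Canada` has been removed. A more defensive sketch searches for the postal code pattern anywhere in the address. The function name `find_fsa` is hypothetical, and the sample addresses are made up; the address format returned by the geocoder is an assumption here.

```python
import re

# A postal code anywhere in the text; the group captures its first three characters.
POSTAL_CODE_ANYWHERE = re.compile(r"\b([A-Z]\d[A-Z])\s?\d[A-Z]\d\b")


def find_fsa(address):
    """Return the first three characters of the last postal code found in an address, else None."""
    matches = POSTAL_CODE_ANYWHERE.findall(address.upper())
    return matches[-1] if matches else None


# Made-up addresses: one with a postal code, one without.
print(find_fsa("123 Example Street, Toronto, Ontario, M5V 1A1, Canada"))
print(find_fsa("Toronto, Ontario, Canada"))
```

With the slice, an address without a postal code produces an arbitrary three-letter fragment. The next step marks it as an error only if it does not start with `M`.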


### Mark postal codes that do not start with M as an error

In [13]:
for i in range(businesses_df.shape[0]):    
    temp = businesses_df['AdjustedPostalCode'][i]
    mystring = str(temp)
    firstchar = mystring[0:1]        
    if(firstchar != 'M'):
        print(i," ",mystring,"  ", 'error')
        businesses_df.loc[i, 'AdjustedPostalCode'] = 'error'
    
businesses_df.head(5)

19   L1Z    error
34   L4J    error
36   L1V    error
44   L4C    error
49   L3R    error
50   L4X    error
51   L4X    error
52   L5B    error
54   L5B    error
55   L6Y    error
56   L5V    error
57   L5M    error
58   L5C    error
59   L5N    error
60   L6Z    error
61   L7A    error
62   L6M    error
63   L7P    error



Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.country,location.formattedAddress,location.address,location.crossStreet,location.postalCode,location.city,location.state,venuePage.id,location.neighborhood,xPostalCode,AdjustedPostalCode
0,5734ba34498e56a45e264614,Boyd's Barber Shop,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.661189,-79.382419,"[{'label': 'display', 'lat': 43.66118850597661...",22162,CA,Canada,['Canada'],,,,,,,,M1B,M4Y
1,4c1d3b10eac020a141e347c2,SeeFu Hair,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.650925,-79.397314,"[{'label': 'display', 'lat': 43.65092528400248...",23815,CA,Canada,"['222 Spadina Ave (at Sullivan St)', 'Toronto ...",222 Spadina Ave,at Sullivan St,Ontario,Toronto Division,ON,,,M1B,M5T
2,555f750f498ea0e938e6c9f4,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.648182,-79.37387,"[{'label': 'display', 'lat': 43.648182, 'lng':...",22800,CA,Canada,"['63 Front St E', 'Toronto ON M5E 1B3', 'Canada']",63 Front St E,,M5E 1B3,Toronto,ON,,,M1B,M5E
3,4b243812f964a520ff6324e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.665372,-79.38118,"[{'label': 'display', 'lat': 43.665372, 'lng':...",21754,CA,Canada,"['63 Wellesley St E (at Church St.)', 'Toronto...",63 Wellesley St E,at Church St.,M4Y 1G7,Toronto,ON,,,M1B,M4Y
4,4b271340f964a520998424e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.660485,-79.384455,"[{'label': 'display', 'lat': 43.660485, 'lng':...",22331,CA,Canada,"['777 Bay St,Unit C 216', 'Toronto ON M5G 2C8'...","777 Bay St,Unit C 216",,M5G 2C8,Toronto,ON,,,M1B,M5G


### Drop businesses without a valid Toronto postal code

In [14]:
businesses_to_drop_df = businesses_df.index[businesses_df['AdjustedPostalCode'] == 'error']

print('there are',len(businesses_to_drop_df), 'businesses dropped because they contain no postal codes, or their postal code does not start with M')
businesses_df.drop(businesses_to_drop_df, inplace=True)
businesses_df.reset_index(inplace=True)
businesses_df.head()

there are 18 businesses dropped because they contain no postal codes, or their postal code does not start with M


Unnamed: 0,index,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.country,location.formattedAddress,location.address,location.crossStreet,location.postalCode,location.city,location.state,venuePage.id,location.neighborhood,xPostalCode,AdjustedPostalCode
0,0,5734ba34498e56a45e264614,Boyd's Barber Shop,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.661189,-79.382419,"[{'label': 'display', 'lat': 43.66118850597661...",22162,CA,Canada,['Canada'],,,,,,,,M1B,M4Y
1,1,4c1d3b10eac020a141e347c2,SeeFu Hair,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1569946412,False,43.650925,-79.397314,"[{'label': 'display', 'lat': 43.65092528400248...",23815,CA,Canada,"['222 Spadina Ave (at Sullivan St)', 'Toronto ...",222 Spadina Ave,at Sullivan St,Ontario,Toronto Division,ON,,,M1B,M5T
2,2,555f750f498ea0e938e6c9f4,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.648182,-79.37387,"[{'label': 'display', 'lat': 43.648182, 'lng':...",22800,CA,Canada,"['63 Front St E', 'Toronto ON M5E 1B3', 'Canada']",63 Front St E,,M5E 1B3,Toronto,ON,,,M1B,M5E
3,3,4b243812f964a520ff6324e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.665372,-79.38118,"[{'label': 'display', 'lat': 43.665372, 'lng':...",21754,CA,Canada,"['63 Wellesley St E (at Church St.)', 'Toronto...",63 Wellesley St E,at Church St.,M4Y 1G7,Toronto,ON,,,M1B,M4Y
4,4,4b271340f964a520998424e3,Rexall,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",v-1569946412,False,43.660485,-79.384455,"[{'label': 'display', 'lat': 43.660485, 'lng':...",22331,CA,Canada,"['777 Bay St,Unit C 216', 'Toronto ON M5G 2C8'...","777 Bay St,Unit C 216",,M5G 2C8,Toronto,ON,,,M1B,M5G


## 5. Issue List of Results

In [15]:
grouped_df=businesses_df[['id','AdjustedPostalCode']]
grouped_df = grouped_df.groupby('AdjustedPostalCode').count()
grouped_df.reset_index(inplace = True)
grouped_df.columns = ['Postal Code', 'Count']
final_df = pd.merge(group_df, grouped_df, on = 'Postal Code', how = 'outer')

In [16]:
# postal codes without any business get a count of 0
final_df['Count'] = final_df['Count'].fillna(0)

#print(final_df.dtypes)
final_df.head(5)


| | Postal Code | Latitude | Longitude | Population | Count |
|---|---|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 | 66108.0 | 0.0 |
| 1 | M1C | 43.784535 | -79.160497 | 35626.0 | 0.0 |
| 2 | M1E | 43.763573 | -79.188711 | 46943.0 | 0.0 |
| 3 | M1G | 43.770992 | -79.216917 | 29690.0 | 0.0 |
| 4 | M1H | 43.773136 | -79.239476 | 24383.0 | 0.0 |
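The group-by, merge and zero-fill steps above amount to counting venues per postal code and defaulting to zero where no venue was found. A small sketch of the same idea with `collections.Counter` follows. The venue codes are the `AdjustedPostalCode` values of the first five businesses listed earlier, and the short list of postal codes is an illustrative subset.

```python
from collections import Counter

# AdjustedPostalCode values of the first five venues shown earlier.
venue_codes = ["M4Y", "M5T", "M5E", "M4Y", "M5G"]
competitors = Counter(venue_codes)

# A few Toronto postal codes; a Counter returns 0 for codes with no venue.
postal_codes = ["M1B", "M1C", "M4Y", "M5G"]
counts = {code: competitors[code] for code in postal_codes}
print(counts)
```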


In [17]:
# population per business; the +1 avoids division by zero when there are no competitors
final_df['metric'] = (final_df['Population'] / (final_df['Count'] + 1)).round(0)

# scale to a 0-100 score relative to the best postal code
maxV = final_df['metric'].max()
final_df['score'] = (100 * final_df['metric'] / maxV).round(2)

print(final_df.shape)

final_df.sort_values('score', ascending=False, inplace = True)

# drop rows without coordinates (postal codes that are not in the coordinates table)
final_df.dropna(subset=['Latitude'], inplace = True)
final_df

(104, 7)



| | Postal Code | Latitude | Longitude | Population | Count | metric | score |
|---|---|---|---|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 | 66108.0 | 0.0 | 66108.0 | 100.0 |
| 18 | M2J | 43.778517 | -79.346556 | 58293.0 | 0.0 | 58293.0 | 88.18 |
| 101 | M9V | 43.739416 | -79.588437 | 55959.0 | 0.0 | 55959.0 | 84.65 |
| 14 | M1V | 43.815252 | -79.284577 | 54680.0 | 0.0 | 54680.0 | 82.71 |
| 15 | M1W | 43.799525 | -79.318389 | 48471.0 | 0.0 | 48471.0 | 73.32 |
| 6 | M1K | 43.727929 | -79.262029 | 48434.0 | 0.0 | 48434.0 | 73.26 |
| 2 | M1E | 43.763573 | -79.188711 | 46943.0 | 0.0 | 46943.0 | 71.01 |
| 36 | M4C | 43.695344 | -79.318389 | 46866.0 | 0.0 | 46866.0 | 70.89 |
| 80 | M6M | 43.691116 | -79.476013 | 42434.0 | 0.0 | 42434.0 | 64.19 |
| 33 | M3N | 43.761631 | -79.520999 | 41958.0 | 0.0 | 41958.0 | 63.47 |
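In the rows shown above every postal code has zero competitors, so the ranking is effectively a ranking by population. Picking the top 5 from scored pairs can be sketched with `heapq.nlargest`; the pairs below are copied from the table, and the variable names are illustrative.

```python
import heapq

# (postal code, score) pairs copied from the ranked table above.
scored = [
    ("M1K", 73.26), ("M1B", 100.0), ("M1E", 71.01), ("M9V", 84.65),
    ("M2J", 88.18), ("M1W", 73.32), ("M1V", 82.71),
]

# nlargest returns the 5 highest-scoring pairs in descending order of score.
top_five = heapq.nlargest(5, scored, key=lambda item: item[1])
print([code for code, _ in top_five])
```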


## 6. Display top 5 locations on a map

In [18]:
toronto_map = folium.Map(location=[43.6532,-79.3832], zoom_start=11)

# final_df is sorted by score, so its first 5 rows are the top 5 postal codes
for row in final_df.head(5).itertuples():
    # row[0] is the index; the remaining fields follow the column order of final_df
    label = 'Postal Code: {};  Population: {};  Competitors: {}; Score: {}'.format(row[1], row[4], row[5], row[7])
    label = folium.Popup(label, parse_html=True)
    # small dot at the postal code's coordinates
    folium.CircleMarker(
        [row[2], row[3]],
        radius=1,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(toronto_map)
    # 500 m circle that carries the popup
    folium.Circle(
        radius=500,
        popup=label,
        location=[row[2], row[3]],
        color='#3186cc',
        fill=True,
        fill_color='#3186cc'
    ).add_to(toronto_map)

toronto_map