# <CENTER>CAPSTONE PROJECT | SALES TERRITORY SCORE FOR TORONTO NEIGHBORHOODS</CENTER>

# INTRODUCTION

<b>PROBLEM</b>

In order to properly allocate salespeople into sales territories, characteristics of the territory in question must be evaluated. My Capstone project will identify neighborhoods in Toronto as potential sales territories. The client is an advertising sales company ("Cafn8") that employs independent subcontractors as sales agents. The product is coffee shop display advertising, specifically:

•	Branded packaging (heat sleeves, cups, bags, single-portion sweetener, etc.)<br>
•	In-shop digital kiosks<br>
•	Coupons integrated into transaction receipts (print or digital)

Neighborhood rankings will be based on a proprietary scoring algorithm (“Prosval”) that assigns points to businesses in a neighborhood by venue type, then sums those points into a territory score for each neighborhood ("Terriscore"). Neighborhoods with a large number of businesses are favorable, and certain types of businesses are more favorable compared to others, so a summation index should be robust. Cafn8 has asked for neighborhoods to be ranked based on their Terriscore, and for certain tables of information to be produced. <b>The goal of this project is to produce the information Cafn8 has requested.</b>

<b>BACKGROUND</b>

By definition, anyone in a given coffee shop is in that location. By extension, they are also in proximity to all of the surrounding businesses. Business operators interested in brand marketing to people who are actually near them can derive great value from doing so inside a coffee shop in their neighborhood. Successful coffee shops enjoy relatively high traffic counts, and many of their customers are “regulars” from the neighborhood who visit daily, some of them staying for long periods of time. This kind of repeat exposure to display advertising is a phenomenal brand building opportunity.

Brand building is an essential but hard to attribute component of any marketing program. Resources committed to brand marketing do not directly convert into observable sales the same way coupon redemptions or online click-throughs do, for example. Brand marketing intends to create a familiarity and image advantage with potential customers when they actively seek out your product or service. This typically involves large scale, expensive, mass media type programs that are inappropriate for small, locally focused businesses.

Leveraging the notion of a coffee shop as an essential “third place” in a community (“The Third Place: Our Commitment, Renewed” - https://starbuckschannel.com/thethirdplace/) within which to advertise a local business is not entirely new. The same idea was executed in supermarkets with great success by TV Fanfare Publications starting in the 1950’s (“TV Fanfare Publications : Making Big Business From Small Ads” - http://articles.latimes.com/1985-02-21/business/fi-773_1_guides ). The essential driver of value in such a marketing program is to offer local businesses access to nearby locations that are frequently visited by large proportions of their customer base. This kind of targeted, appropriate-scale brand marketing has a track record of success that can be expected to repeat, because coffee shop owners work hard to establish and maintain a ubiquitous presence within a community.

Cafn8 plans to use the final product from this project to assign its subcontractor sales agents to neighborhoods in Toronto. Those sales agents are responsible for entering an agreement with a coffee shop to allow display advertising inside the establishment. The incentive for the coffee shop is free supplies (heat sleeves, cups, napkins, single-serving sweetener, etc.), Wi-Fi if a digital kiosk is placed, and access to customer data if e-coupons are sent with digital receipts. Coffee shops have some discretion over what venue types they will allow to advertise, but as a rule they agree to accept branded advertising from anyone Cafn8 has approved. After entering an agreement with a coffee shop, the sales agent must find local businesses to advertise there. Once a particular coffee shop has signed up and the advertising capacity reaches a necessary minimum level, the sales agent is eligible to be paid and can move on to the next neighborhood to repeat the process.

# DATA

We will be using the following data:

•	A listing of neighborhoods by Postal Code obtained from Wikipedia (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)<br>
•	Latitude/Longitude data provided via Cognitive Class (https://cocl.us/Geospatial_data)<br>
•	Toronto city data accessible through the FourSquare API<br>
•	Prosval data for each type of business provided by Cafn8<br>

This data will produce the following deliverables.

•	A list of coffee shops in Toronto, by neighborhood (with Terriscore)<br>
•	A list of businesses, by neighborhood (with Prosval and Terriscore)<br>
•	A ranking of all neighborhoods based on their Terriscore<br>


Neighborhoods will become sales territories. Terriscore is calculated from a proprietary algorithm that assigns a Prosval to each type of business. All business types are ranked by Cafn8 as follows:

•	2 - Most desirable prospects offering higher success rates for display advertising sales<br>
•	1 - Neutral prospects that are hit or miss and not priority to pursue<br>
•	0 - Least desirable prospects to call or prohibited businesses<br>

Businesses within a neighborhood are scored via Prosval and summed into that neighborhood’s Terriscore. Once all neighborhoods have been scored, the list will be ranked from highest to lowest based on Terriscore. <b>The ranked list will be the final product.</b>
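The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not Cafn8's Prosval implementation: the `PROSVAL` dictionary holds only a handful of category values copied from the tables later in this report, the helper name `terriscore` is hypothetical, and treating an unlisted category as 0 is an assumption.

```python
from collections import defaultdict

# Prosval per business type: a small subset copied from the tables in this report.
PROSVAL = {
    "Spa": 2,
    "Hobby Shop": 2,
    "Martial Arts Dojo": 2,
    "Caribbean Restaurant": 1,
    "Fast Food Restaurant": 1,
    "Paper / Office Supplies Store": 1,
    "Coffee Shop": 0,
}


def terriscore(businesses, prosval=PROSVAL):
    """Sum the Prosval of every business, grouped by neighborhood.

    `businesses` is an iterable of (neighborhood, category) pairs.
    Categories missing from `prosval` count as 0 (an assumption).
    """
    scores = defaultdict(int)
    for neighborhood, category in businesses:
        scores[neighborhood] += prosval.get(category, 0)
    return dict(scores)


# The first ten businesses listed for M1B in this report (the neighborhood has 16 in total).
sample = [("M1B Rouge, Malvern", category) for category in [
    "Spa", "Caribbean Restaurant", "Fast Food Restaurant", "Fast Food Restaurant",
    "Fast Food Restaurant", "Coffee Shop", "Paper / Office Supplies Store",
    "Hobby Shop", "Coffee Shop", "Martial Arts Dojo",
]]

scores = terriscore(sample)

# Rank neighborhoods from highest to lowest Terriscore.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

Because the sample covers only ten of the sixteen businesses returned for M1B, the partial sum it prints is lower than the full Terriscore reported for that neighborhood.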


# METHODOLOGY

<b>We will begin by importing libraries and resources that we will need for the purposes of this project</b>

In [1]:
import numpy as np

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 1000)

import json 

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests 
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes

from bs4 import BeautifulSoup as BS
import urllib.request

import random

print('!!!!!!! Libraries imported !!!!!!!')


!!!!!!! Libraries imported !!!!!!!


<b>We will import a table of business type Prosvals from a .csv file provided by Cafn8</b>

The first thing we will present is a table that defines business type values ("Prosval") for the <b>355 business types</b> in our dataset. This table was completed by Cafn8 after we identified 355 FourSquare API business types in the Toronto neighborhood data. Cafn8 has explained that venue values have been assigned by senior sales staff based on their collective experience selling advertising to small local businesses.

NOTE: Some business types or specific businesses may be prohibited from buying advertising from Cafn8 (direct coffee shop competitors, certain "sin" businesses, prior bad credits, etc.). This version of the table includes no such business type prohibitions. On a coffee shop by coffee shop basis additional venue types are subject to prohibition as well. Coffee shops cannot accept advertising from any business Cafn8 prohibits even if they are willing to do so.

In [2]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
prosval = pd.read_csv(r'C:\Users\HP\Desktop\COURSERA\IBM Data Science\CAPSTONE\PROSVAL.csv', encoding = "ISO-8859-1")

print(prosval.shape)
prosval

(355, 3)


Unnamed: 0,CATEGORY,PROHIBITED,PROSVAL
0,Accessories Store,No,2
1,Adult Boutique,No,0
2,Afghan Restaurant,No,1
3,African Restaurant,No,1
4,Airport,No,0
5,Airport Food Court,No,0
6,Airport Gate,No,0
7,Airport Lounge,No,0
8,Airport Service,No,0
9,Airport Terminal,No,0


The following steps convert the <b>Toronto Wikipedia postal codes table</b> into a dataframe.

In [3]:
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BS(res.content,'lxml')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))[0]
postcode = df[0].tolist()
borough = df[1].tolist()
neigh = df[2].tolist()

print("Rows, columns in df", df.shape)
df.head()

Rows, columns in df (290, 3)


Unnamed: 0,0,1,2
0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village


Clean up the Toronto postal codes dataframe

In [4]:
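# Drop the header row that was read in as data
df = df.drop(0)

# Drop postal codes that have no borough assigned
df = df.drop(df[df[1] == "Not assigned"].index)

# Where a neighbourhood is not assigned, use the borough name instead
df[2] = df[2].mask(df[2] == "Not assigned", df[1])

# Combine neighbourhoods that share a postal code into one comma-separated row
df = df.groupby([0,1])[2].apply(lambda tags : ', '.join(tags)).reset_index()

df = df.rename(columns = {0:"PostalCode", 1:"Borough", 2:"Neighbourhood"})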

print("Rows, columns in df", df.shape)
df.head()

Rows, columns in df (103, 3)


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


Add latitude and longitude values to the dataframe from data provided by <b>Cognitive Class.</b>

In [5]:
dflatlng = pd.read_csv('https://cocl.us/Geospatial_data')

dftor = df.join(dflatlng)

dftor = dftor.drop(['Postal Code'], axis=1).reset_index(drop=True)

dftor.to_csv(r'C:\Users\HP\Desktop\COURSERA\IBM Data Science\CAPSTONE\dftor.csv', encoding='UTF-8')

print("Rows, columns in dftor", dftor.shape)
dftor

Rows, columns in dftor (103, 5)


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
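Note that `df.join(dflatlng)` aligns the two tables by row position, which works here only because both are sorted by postal code. A more defensive approach is to match rows on the postal code itself. The sketch below shows the idea in plain Python with two rows taken from the table above; the function name `join_on_postal_code` is hypothetical. In pandas, the equivalent would be a `merge` using `left_on='PostalCode'` and `right_on='Postal Code'`.

```python
def join_on_postal_code(neighborhoods, coordinates):
    """Attach latitude/longitude to each neighborhood row by matching postal codes.

    Rows without matching coordinates are reported rather than silently misaligned.
    """
    by_code = {row["Postal Code"]: row for row in coordinates}
    joined, unmatched = [], []
    for row in neighborhoods:
        match = by_code.get(row["PostalCode"])
        if match is None:
            unmatched.append(row["PostalCode"])
            continue
        joined.append({**row,
                       "Latitude": match["Latitude"],
                       "Longitude": match["Longitude"]})
    return joined, unmatched


neighborhoods = [
    {"PostalCode": "M1B", "Borough": "Scarborough", "Neighbourhood": "Rouge, Malvern"},
    {"PostalCode": "M1G", "Borough": "Scarborough", "Neighbourhood": "Woburn"},
]

# Deliberately listed in a different order than the neighborhoods above.
coordinates = [
    {"Postal Code": "M1G", "Latitude": 43.770992, "Longitude": -79.216917},
    {"Postal Code": "M1B", "Latitude": 43.806686, "Longitude": -79.194353},
]

joined, unmatched = join_on_postal_code(neighborhoods, coordinates)
print(joined)
print("Unmatched postal codes:", unmatched)
```

Matching on the key gives the same result regardless of row order, and any postal code missing from the coordinates file shows up in `unmatched` instead of shifting every later row.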


At this point we connect with the <b>FourSquare API.</b>

FourSquare requires a client ID and a client secret to access its data via the API. It also requires that you state which version of the API you are using. We will load these values into variables for ease of use throughout the project.

In [6]:
CLIENT_ID = 'ZGU4DXB3ROK4GNWTUAMJCARJMMC3J120HKWIYMXM52KJPZED' # your Foursquare ID
CLIENT_SECRET = 'RXGSRIHMW1EE1E54XL4NRHFIPJELLQ3SLFFZGW4UEDJ0VZTU' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your FourSquare API credentials\n')
print('    CLIENT_ID:  ' + CLIENT_ID)
print('CLIENT_SECRET:  ' + CLIENT_SECRET)
print('      VERSION:  ' + VERSION)

Your FourSquare API credentials

    CLIENT_ID:  ZGU4DXB3ROK4GNWTUAMJCARJMMC3J120HKWIYMXM52KJPZED
CLIENT_SECRET:  RXGSRIHMW1EE1E54XL4NRHFIPJELLQ3SLFFZGW4UEDJ0VZTU
      VERSION:  20180605
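Since the cell above writes the credentials into the notebook and its output, one common alternative is to read them from environment variables and mask them when printing. The following is only a sketch under stated assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not something FourSquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables (hypothetical names)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first."
        )
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demonstration with placeholder values instead of real credentials.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID1234",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET5678",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("    CLIENT_ID:  " + mask(client_id))
print("CLIENT_SECRET:  " + mask(client_secret))
```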


Now we compile a list of FourSquare URLs for the <b>103 neighborhoods in Toronto.</b> These URL addresses contain neighborhood data we will use to extract a list of all coffee shops in each neighborhood in Toronto, and eventually all other businesses in each neighborhood.

In [7]:
radius = 1000  # search radius in metres around each neighborhood's coordinates
LIMIT = 100    # maximum number of venues requested per search

URLs = []

# Build one FourSquare explore URL per neighborhood row in dftor
for x in range(len(dftor)):

    neigh_lat = dftor.loc[x, 'Latitude']
    neigh_long = dftor.loc[x, 'Longitude']

    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        neigh_lat,
        neigh_long,
        radius,
        LIMIT)

    URLs.append(url)

    print(x, url)

0 https://api.foursquare.com/v2/venues/explore?&client_id=ZGU4DXB3ROK4GNWTUAMJCARJMMC3J120HKWIYMXM52KJPZED&client_secret=RXGSRIHMW1EE1E54XL4NRHFIPJELLQ3SLFFZGW4UEDJ0VZTU&v=20180605&ll=43.806686299999996,-79.19435340000001&radius=1000&limit=100
1 https://api.foursquare.com/v2/venues/explore?&client_id=ZGU4DXB3ROK4GNWTUAMJCARJMMC3J120HKWIYMXM52KJPZED&client_secret=RXGSRIHMW1EE1E54XL4NRHFIPJELLQ3SLFFZGW4UEDJ0VZTU&v=20180605&ll=43.7845351,-79.16049709999999&radius=1000&limit=100
2 https://api.foursquare.com/v2/venues/explore?&client_id=ZGU4DXB3ROK4GNWTUAMJCARJMMC3J120HKWIYMXM52KJPZED&client_secret=RXGSRIHMW1EE1E54XL4NRHFIPJELLQ3SLFFZGW4UEDJ0VZTU&v=20180605&ll=43.7635726,-79.1887115&radius=1000&limit=100
3 https://api.foursquare.com/v2/venues/explore?&client_id=ZGU4DXB3ROK4GNWTUAMJCARJMMC3J120HKWIYMXM52KJPZED&client_secret=RXGSRIHMW1EE1E54XL4NRHFIPJELLQ3SLFFZGW4UEDJ0VZTU&v=20180605&ll=43.7709921,-79.21691740000001&radius=1000&limit=100
4 https://api.foursquare.com/v2/venues/explore?&client_

In [8]:
print("Confirmed:", len(URLs), "URL addresses created.")

Confirmed: 103 URL addresses created.
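As an alternative to formatting the query string by hand, the same URL can be assembled from a dictionary of parameters with the standard library. This is a sketch only: the helper name `build_explore_url` is hypothetical, the credentials are placeholders, and the endpoint and parameter names are simply the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret, version,
                      radius=1000, limit=100):
    """Assemble an explore URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter readable instead of %2C
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


# Coordinates for M1B from the table above; credentials are placeholders.
url = build_explore_url(43.806686, -79.194353, "YOUR_ID", "YOUR_SECRET", "20180605")
print(url)

# Parse the URL back to confirm each parameter landed where expected.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

Keeping the parameters in a dictionary makes it harder to misplace a positional argument, and it avoids the stray `&` that follows the `?` in the hand-built URLs.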


The next step is to retrieve each URL and extract data for Toronto coffee shops.

In [9]:
shops = pd.DataFrame() # Data issues may require me to append to my dataframe in chunks, so I define it outside that code.

In [10]:
def get_category_type(row):
    # Return the name of the venue's first category, or None if it has none
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

filtered_columns = ['venue.name', 'venue.location.address', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

for x in range(len(URLs)):

    if x == 16 : continue # One URL causes FourSquare to reject our data request. We are skipping this URL.

    shopsresults = requests.get(URLs[x]).json()

    shopsvenues = shopsresults['response']['groups'][0]['items']

    nearby_shopsvenues = json_normalize(shopsvenues) # flatten JSON

    # filter columns
    nearby_shopsvenues = nearby_shopsvenues.loc[:, filtered_columns]

    # filter the category for each row
    nearby_shopsvenues['venue.categories'] = nearby_shopsvenues.apply(get_category_type, axis=1)

    # clean columns
    nearby_shopsvenues.columns = [col.split(".")[-1] for col in nearby_shopsvenues.columns]

    # keep only the coffee shops
    dfven = nearby_shopsvenues.loc[nearby_shopsvenues['categories'] == 'Coffee Shop']

    dfven = dfven.reset_index(drop=True)

    if not dfven.empty :

        # pd.concat replaces DataFrame.append, which newer pandas versions no longer provide
        shops = pd.concat([shops, dfven])
        print(x, '!!!', dftor.iloc[x]['PostalCode'], dftor.iloc[x]['Neighbourhood'], dfven.shape)

0 !!! M1B Rouge, Malvern (2, 5)
2 !!! M1E Guildwood, Morningside, West Hill (2, 5)
3 !!! M1G Woburn (2, 5)
4 !!! M1H Cedarbrae (3, 5)
5 !!! M1J Scarborough Village (1, 5)
6 !!! M1K East Birchmount Park, Ionview, Kennedy Park (3, 5)
7 !!! M1L Clairlea, Golden Mile, Oakridge (3, 5)
10 !!! M1P Dorset Park, Scarborough Town Centre, Wexford Heights (3, 5)
11 !!! M1R Maryvale, Wexford (1, 5)
12 !!! M1S Agincourt (1, 5)
13 !!! M1T Clarks Corners, Sullivan, Tam O'Shanter (1, 5)
14 !!! M1V Agincourt North, L'Amoreaux East, Milliken, Steeles East (1, 5)
15 !!! M1W L'Amoreaux West, Steeles West (2, 5)
17 !!! M2H Hillcrest Village (2, 5)
18 !!! M2J Fairview, Henry Farm, Oriole (4, 5)
21 !!! M2M Newtonbrook, Willowdale (2, 5)
22 !!! M2N Willowdale South (6, 5)
23 !!! M2P York Mills West (1, 5)
24 !!! M2R Willowdale West (1, 5)
25 !!! M3A Parkwoods (1, 5)
26 !!! M3B Don Mills North (3, 5)
27 !!! M3C Flemingdon Park, Don Mills South (2, 5)
28 !!! M3H Bathurst Manor, Downsview North, Wilson Heights (2

We have produced a data frame with <b>348 coffee shops in Toronto.</b> One URL (index = 16) causes an issue with FourSquare, because it contains no data. We are simply passing over this URL and now work with <b>102 neighborhoods.</b>
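Rather than hardcoding the skip at index 16, the extraction step could check whether a response actually contains venues before indexing into it. The sketch below assumes the response shape implied by the code above (`response` → `groups[0]` → `items` → `venue`); the function names and both sample payloads are hypothetical, and the exact body FourSquare returns for the failing URL is not known from this notebook.

```python
def extract_venues(payload):
    """Return (name, category) pairs from an explore response, or [] if it has none."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    venues = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        venues.append((venue.get("name"), category))
    return venues


def coffee_shops(payload):
    """Names of the venues whose first category is 'Coffee Shop'."""
    return [name for name, category in extract_venues(payload)
            if category == "Coffee Shop"]


# Hypothetical payloads shaped like the responses handled above.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tim Hortons", "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Images Salon & Spa", "categories": [{"name": "Spa"}]}},
    {"venue": {"name": "Unlabelled Venue", "categories": []}},
]}]}}
empty_payload = {"response": {}}

print(coffee_shops(sample_payload))
print(coffee_shops(empty_payload))
```

With a guard like this, a neighborhood that returns no data contributes an empty list instead of raising an error, so the loop would not depend on knowing in advance which index fails.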

In [11]:
shops.to_csv(r'C:\Users\HP\Desktop\COURSERA\IBM Data Science\CAPSTONE\shops.csv', encoding='UTF-8')
print(shops.shape)
shops

(348, 5)


Unnamed: 0,name,address,categories,lat,lng
0,Tim Hortons,8129 Sheppard Ave,Coffee Shop,43.802,-79.198169
1,Tim Hortons / Esso,Morningside and sheppard,Coffee Shop,43.80166,-79.199133
0,Starbucks,255 Morningside Ave,Coffee Shop,43.770367,-79.186313
1,Tim Hortons,4479 Kingston Rd,Coffee Shop,43.769591,-79.187081
0,Starbucks,300 Borough Dr,Coffee Shop,43.770037,-79.221156
1,Country Style,136 Orton Park Rd,Coffee Shop,43.773716,-79.207027
0,Tim Hortons,1150 Markham Rd,Coffee Shop,43.775992,-79.232135
1,Tim Hortons,60 Tuxedo Court,Coffee Shop,43.779944,-79.232457
2,Coffee Culture,2060 Ellesmere Rd,Coffee Shop,43.776278,-79.234026
0,Tim Hortons,2874 Eglinton Ave East,Coffee Shop,43.738992,-79.238961


Now we retrieve each URL and extract data for all businesses, by neighborhood.

In [12]:
cust = pd.DataFrame() # Data issues may require me to append to my dataframe in chunks. So I define it outside that code.

In [13]:
neighlist = []

def get_category_type(row):
    # Return the name of the venue's first category, or None if it has none
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

filtered_columns = ['venue.name', 'venue.location.address', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

for x in range(len(URLs)):

    if x == 16 : continue # One URL causes FourSquare to reject our data request. We are skipping this URL.

    custresults = requests.get(URLs[x]).json()

    custvenues = custresults['response']['groups'][0]['items']

    nearby_custvenues = json_normalize(custvenues) # flatten JSON

    # filter columns
    nearby_custvenues = nearby_custvenues.loc[:, filtered_columns]

    # filter the category for each row
    nearby_custvenues['venue.categories'] = nearby_custvenues.apply(get_category_type, axis=1)

    # clean columns
    nearby_custvenues.columns = [col.split(".")[-1] for col in nearby_custvenues.columns]

    # keep every business type this time, not only coffee shops
    dfven = nearby_custvenues

    if not dfven.empty :

        # pd.concat replaces DataFrame.append, which newer pandas versions no longer provide
        cust = pd.concat([cust, dfven])
        neighlist.append(dftor.iloc[x]['Neighbourhood'])
        print(x, '!!!', dftor.iloc[x]['PostalCode'], dftor.iloc[x]['Neighbourhood'], dfven.shape)

0 !!! M1B Rouge, Malvern (16, 5)
1 !!! M1C Highland Creek, Rouge Hill, Port Union (4, 5)
2 !!! M1E Guildwood, Morningside, West Hill (24, 5)
3 !!! M1G Woburn (8, 5)
4 !!! M1H Cedarbrae (30, 5)
5 !!! M1J Scarborough Village (11, 5)
6 !!! M1K East Birchmount Park, Ionview, Kennedy Park (23, 5)
7 !!! M1L Clairlea, Golden Mile, Oakridge (29, 5)
8 !!! M1M Cliffcrest, Cliffside, Scarborough Village West (13, 5)
9 !!! M1N Birch Cliff, Cliffside West (14, 5)
10 !!! M1P Dorset Park, Scarborough Town Centre, Wexford Heights (46, 5)
11 !!! M1R Maryvale, Wexford (25, 5)
12 !!! M1S Agincourt (47, 5)
13 !!! M1T Clarks Corners, Sullivan, Tam O'Shanter (30, 5)
14 !!! M1V Agincourt North, L'Amoreaux East, Milliken, Steeles East (30, 5)
15 !!! M1W L'Amoreaux West, Steeles West (24, 5)
17 !!! M2H Hillcrest Village (21, 5)
18 !!! M2J Fairview, Henry Farm, Oriole (43, 5)
19 !!! M2K Bayview Village (14, 5)
20 !!! M2L Silver Hills, York Mills (4, 5)
21 !!! M2M Newtonbrook, Willowdale (30, 5)
22 !!! M2N Willo

We have produced a data frame with <b>4853 businesses in Toronto.</b> One URL (index = 16) causes an issue with FourSquare, because it contains no data. We are simply passing over this URL and now work with <b>102 neighborhoods.</b>

In [14]:
columns = ['lat', 'lng']
cust.drop(columns, inplace=True, axis=1)
cust.to_csv(r'C:\Users\HP\Desktop\COURSERA\IBM Data Science\CAPSTONE\cust.csv', encoding='UTF-8')
print(cust.shape)
cust

(4853, 3)


Unnamed: 0,name,address,categories
0,Images Salon & Spa,8130 Sheppard Ave E,Spa
1,Caribbean Wave,875 Milner Ave,Caribbean Restaurant
2,Wendy's,8129 Sheppard Avenue,Fast Food Restaurant
3,Harvey's,853 Milner Ave,Fast Food Restaurant
4,Wendy's,,Fast Food Restaurant
5,Tim Hortons,8129 Sheppard Ave,Coffee Shop
6,Staples Morningside,850 Milner Avenue,Paper / Office Supplies Store
7,Lee Valley,1275 Morningside,Hobby Shop
8,Tim Hortons / Esso,Morningside and sheppard,Coffee Shop
9,MMA World Academy,"15 Grand Marshall Drive, Suite 2",Martial Arts Dojo


# RESULTS

The three deliverables requested by Cafn8 are presented below. To produce these tables for Cafn8, data was collected from four sources (Wikipedia, Cognitive Class, FourSquare, and Cafn8). We used Python to collect and handle the data and used Microsoft Excel to process some of it.

Preparing a sales prospecting resource for independent contractor sales agents does not demand a very high degree of analytical work. It does offer a basis for the client to perform analysis as a practitioner in their field, however. This project is an example of Data Science as a collection process. Cafn8 knows what they want, and for their purposes this information will provide meaningful value.

Deliverables are large but presented in full for your inspection. In addition to this report, Cafn8 is being provided with Microsoft Excel .xls/.csv files as per their request.

<b>A list of coffee shops in Toronto, by neighborhood (with Terriscore)</b>

In [15]:
final = pd.read_csv(r'C:\Users\HP\Desktop\COURSERA\IBM Data Science\CAPSTONE\COFFEESHOPS.csv', encoding="ISO-8859-1")

final = final.fillna("")

print(final.shape)
final

(348, 4)


Unnamed: 0,COFFEE SHOP,ADDRESS,NEIGHBORHOOD,TERRISCORE
0,Tim Hortons,8129 Sheppard Ave,"M1B Rouge, Malvern",18.0
1,Tim Hortons / Esso,Morningside and sheppard,,
2,Starbucks,255 Morningside Ave,"M1E Guildwood, Morningside, West Hill",14.0
3,Tim Hortons,4479 Kingston Rd,,
4,Starbucks,300 Borough Dr,M1G Woburn,6.0
5,Country Style,136 Orton Park Rd,,
6,Tim Hortons,1150 Markham Rd,M1H Cedarbrae,20.0
7,Tim Hortons,60 Tuxedo Court,,
8,Coffee Culture,2060 Ellesmere Rd,,
9,Tim Hortons,2874 Eglinton Ave East,M1J Scarborough Village,7.0


<b>A list of businesses, by neighborhood (with Prosval and Terriscore)</b>

In [16]:
final2 = pd.read_csv(r'C:\Users\HP\Desktop\COURSERA\IBM Data Science\CAPSTONE\BUSINESSES.csv', encoding="ISO-8859-1")

final2 = final2.fillna("")

print(final2.shape)
final2

(4853, 6)


Unnamed: 0,TERRISCORE,NEIGHBORHOOD,BUSINESS,ADDRESS,CATEGORY,PROSVAL
0,18.0,"M1B Rouge, Malvern",Images Salon & Spa,8130 Sheppard Ave E,Spa,2
1,,,Caribbean Wave,875 Milner Ave,Caribbean Restaurant,1
2,,,Wendy's,8129 Sheppard Avenue,Fast Food Restaurant,1
3,,,Harvey's,853 Milner Ave,Fast Food Restaurant,1
4,,,Wendy's,,Fast Food Restaurant,1
5,,,Tim Hortons,8129 Sheppard Ave,Coffee Shop,0
6,,,Staples Morningside,850 Milner Avenue,Paper / Office Supplies Store,1
7,,,Lee Valley,1275 Morningside,Hobby Shop,2
8,,,Tim Hortons / Esso,Morningside and sheppard,Coffee Shop,0
9,,,MMA World Academy,"15 Grand Marshall Drive, Suite 2",Martial Arts Dojo,2


<b>A ranking of all neighborhoods based on their Terriscore</b>

In [17]:
final3 = pd.read_csv(r'C:\Users\HP\Desktop\COURSERA\IBM Data Science\CAPSTONE\RANKINGS.csv', encoding="ISO-8859-1")

final3 = final3.fillna("")

print(final3.shape)
final3

(102, 3)


Unnamed: 0,NEIGHBORHOOD,TERRISCORE,BUSINESSES
0,"M5T Chinatown, Grange Park, Kensington Market",92,100
1,"M6K Brockton, Exhibition Place, Parkdale Village",90,100
2,"M5S Harbord, University of Toronto",89,100
3,"M6J Little Portugal, Trinity",88,100
4,M4J East Toronto,87,99
5,"M5H Adelaide, King, Richmond",87,100
6,M4S Davisville,84,100
7,"M5R The Annex, North Midtown, Yorkville",84,100
8,M4M Studio District,83,100
9,"M6P High Park, The Junction South",81,100


# DISCUSSION

If you look at the final deliverable, the ranking of neighborhoods by Terriscore, you see that there are a number of neighborhoods with high scores. The median Terriscore across the 102 neighborhoods is 24, and 30 neighborhoods scored 50 or higher. Realistically, a sales agent should be able to sell out the available display advertising space in a coffee shop by presenting to no more than a couple dozen interested businesses. Terriscore attempts to create a quality-weighted metric that should equate to a count of probable coffee shop display advertising buyers. By this standard, at least half of the neighborhoods in Toronto should be viable sales territories. On a simple unweighted counting basis, 67 neighborhoods returned at least 24 businesses to approach.

The final deliverable also offers a potential opportunity to reconsider the additive index approach. Consider the top-ranked neighborhood, M5T Chinatown, Grange Park, Kensington Market. It returns the maximum of 100 businesses, but achieves a Terriscore of 92. That 92% ratio may deserve to be built into a quality metric. The lowest Terriscore among neighborhoods that provided 100 businesses was 61 (M5E Berczy Park), a 61% ratio. The next lower Terriscore on the list is 55 (M4L The Beaches West, India Bazaar), but it is based on only 79 businesses, a 69.6% ratio. Which is the "better" score? It may be the case that above a certain quantity threshold a Terriscore ratio should prevail when allocating sales agents into territories.
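The ratio idea can be made concrete with the three neighborhoods named above. This is a sketch, not a recommendation from Cafn8: the `MIN_BUSINESSES` threshold of 75 is a purely hypothetical value chosen for illustration, and `rank_by_ratio` is a hypothetical helper.

```python
# (neighborhood, Terriscore, businesses returned) as quoted in the discussion above
neighborhoods = [
    ("M5T Chinatown, Grange Park, Kensington Market", 92, 100),
    ("M5E Berczy Park", 61, 100),
    ("M4L The Beaches West, India Bazaar", 55, 79),
]

MIN_BUSINESSES = 75  # hypothetical quantity threshold, for illustration only


def rank_by_ratio(rows, min_businesses=MIN_BUSINESSES):
    """Rank neighborhoods that meet the quantity threshold by Terriscore per business."""
    eligible = [row for row in rows if row[2] >= min_businesses]
    return sorted(eligible, key=lambda row: row[1] / row[2], reverse=True)


ranked = rank_by_ratio(neighborhoods)
for name, score, count in ranked:
    print(f"{name}: {score}/{count} = {score / count:.1%}")
```

Under this ratio-based ordering, M4L (69.6%) ranks ahead of M5E (61.0%) even though its raw Terriscore is lower, which is exactly the trade-off raised in the paragraph above.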

We were limited to returning no more than 100 results for a given search, which covered a radius of 1000m around each neighborhood's lat/lng focal point. Those search criteria lead to roughly 1,500 duplicates (business addresses that appear in more than one neighborhood area). It is not necessarily the case that duplicates are a problem. There is no reason to presume that a given business will not advertise within more than one coffee shop in its business area. A sales agent may well find themselves calling on a business that has already committed to another coffee shop, but this does not mean the call is not worth making. Advertisers advertise, and the worst case from any sales call is a "no."
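One way to see which businesses fall into more than one search area is to group rows by name and address. The sketch below is illustrative only: the rows and the territory labels are invented to show the mechanics, `find_duplicates` is a hypothetical helper, and matching on a lowercased (name, address) pair is an assumption about what counts as the same business.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Map each (name, address) found in more than one neighborhood to those neighborhoods.

    `rows` is an iterable of (neighborhood, name, address) tuples.
    Rows with a blank address are skipped, since name alone cannot identify a location.
    """
    seen = defaultdict(set)
    for neighborhood, name, address in rows:
        if not address:
            continue
        key = (name.strip().lower(), address.strip().lower())
        seen[key].add(neighborhood)
    return {key: sorted(hoods) for key, hoods in seen.items() if len(hoods) > 1}


# Invented rows: two overlapping search areas that both return the same coffee shop.
rows = [
    ("Territory A", "Tim Hortons", "8129 Sheppard Ave"),
    ("Territory A", "Lee Valley", "1275 Morningside"),
    ("Territory B", "Tim Hortons", "8129 Sheppard Ave"),
    ("Territory B", "Wendy's", ""),
]

duplicates = find_duplicates(rows)
print(duplicates)
```

A listing like this would let a sales agent see in advance which prospects are shared with a neighboring territory.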

# CONCLUSION

The primary takeaway from this project is that the quantity of data that was <b>not</b> available is large and potentially more useful than the data collected. For example, FourSquare business types did not include <b>professional people</b> (medical, legal, financial, real estate, etc.) who are typically assigned Prosval scores of 2 and who traditionally purchase local advertising in all forms. There are <b>only 56 banks</b> in a data set that includes nearly 3,500 businesses. Small independent banks have been reliable buyers of local display advertising in the past. Are there really only 56 banks in Toronto, or is the data incomplete? Other examples of enthusiastic buyers of local display advertising that are effectively missing from the data include <b>automotive related businesses</b> (repair, dealerships, rental, etc., which together contribute fewer than 10 data points), <b>end of life services,</b> and <b>building contractors</b> (plumber, electrical, HVAC, carpentry, etc.).

TV Fanfare Publications, the company mentioned at the beginning of this report, allowed its independent subcontractor sales agents great latitude in devising procedures for selling display advertising to appear in supermarkets. The process worked as follows, more or less.

•	Go to the sales territory and collect all forms of print advertising available<br>
•	Contact advertisers, by business type, by size/frequency of advertisement, as you prefer<br>
•	Sell out the display advertising space available in the supermarket and move on<br>

This is the model that Cafn8 will attempt to reproduce, more or less. Understanding where this project falls in the context of that model, it would appear that another iteration of Data Science should probably be pursued to build on our limited results.

Here is the issue Cafn8 must consider: <b>Will additional effort lead to different and better results?</b> In other words, are the current results <b>robust?</b>

"Robust" is the word for that notion. Additional data to be processed into information comes at a cost (acquisition cost, processing cost, intellectual resource commitment). Based on our results, it would appear obvious that large amounts of additional data <b>are worth accumulating and processing.</b> Data on the unrepresented and underrepresented businesses above is a must to collect and assess. Whether adding this data to our project would actually change the rankings is uncertain, but it would definitely add to the number of potential customers and perhaps offer a reason to be more aggressive about pricing, or some other aspect of the structure of the coffee shop display advertising deal being offered. It would also raise the stakes for making optimal assignments of the most effective sales agents to the most lucrative territories.

Wherever possible, making maximum use of low cost proxy data that is robust offers enormous returns on investment in Data Science. In the case of this project, we are confident that valuable sales territory information has been created by producing Terriscores despite having incomplete data. Future efforts to improve on deliverables will be worth pursuing, but are unlikely to contradict the primary result. Adding new business categories or adjusting Prosval for business types may alter the rankings somewhat, but it appears unlikely that the rankings would flip under any circumstances. There should not be any systematic effect that would move a given cluster of neighborhoods very far from its current position on the ranked list in future Data Science projects.

For that reason, Cafn8 should be pleased with this report and find value in making use of it to make decisions about sales territories in Toronto.