### Coursera - IBM Data Science Certification
### Capstone Project - Week 4
------------

# Yarn Stores are the New Black

------------
### Mollie Conrad, MSc. 
#### October 4, 2019
------------

#### Introduction

In this report, the Foursquare API is used to determine which location in Kitchener-Waterloo (KW) is the most viable for a new "Local Yarn Store" (LYS). To the uninitiated, an LYS can seem like a niche establishment, frequented only by grannies and crazy cat ladies. But what many people don't know is that within the knitting and crochet fibre community, MANY young folx are ditching big box stores like "Michaels" for unique and inspiring LYSs. It is here that you can find yarn hand-dyed by your super talented neighbour, or yarn hand-spun from fleece sourced from the next town over. LYSs are seriously underestimated treasure troves.

Currently in KW, there are only 3 LYSs *that I know of* within a 20 to 30 minute drive. We can use geographical data from Foursquare to determine the *best* location for a new LYS; this will likely be a location that isn't too close to the other 3 LYSs, or to any local big box stores that are likely to sell similar products at lower prices.

For the purposes of this project, we will assume we don't already know the quantity and locations of *any* LYS within KW.

This information would be interesting for an individual looking to open a *new* LYS. 

#### Data

To solve this problem, we will need pandas (pd) for data handling, folium for visual mapping, and a set of unique Foursquare API credentials. Foursquare will be used to search for existing LYSs within a set radius of central KW (30 km here), which also shows us the areas where the nearest LYS *may* be too far away. The geographical data returned by Foursquare will help us determine the best location for a new store.



In [1]:
# Import all required libraries
import pandas as pd
import requests # library to handle requests
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library


In [2]:
CLIENT_ID = '5MXDGJBX0OMLHH0OPKIN3W44EBFGV1X0RCVBWAJSGBDPYQMS' # your Foursquare ID
CLIENT_SECRET = '1SWB3DXCV5DJCF3JT0KMSEJDYJD4GHELTHWXRNARV0XRE0CJ' # your Foursquare Secret
VERSION = '20190705' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 5MXDGJBX0OMLHH0OPKIN3W44EBFGV1X0RCVBWAJSGBDPYQMS
CLIENT_SECRET: 1SWB3DXCV5DJCF3JT0KMSEJDYJD4GHELTHWXRNARV0XRE0CJ
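
Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. One possible alternative, sketched below, reads them from environment variables and masks them before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, and the helpers `load_foursquare_credentials` and `mask`, are assumptions made up for this sketch; they are not part of Foursquare or of the original notebook.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The two variable names are hypothetical and chosen only for this sketch.
    """
    try:
        return environ["FOURSQUARE_CLIENT_ID"], environ["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


def mask(secret, visible=4):
    """Keep the first `visible` characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "EXAMPLE-ID-1234", "FOURSQUARE_CLIENT_SECRET": "EXAMPLE-SECRET-5678"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(demo_id))
print("CLIENT_SECRET: " + mask(demo_secret))
```

In the notebook itself, calling `load_foursquare_credentials()` with no argument would read from the real environment, so the secret never has to appear in a cell or in its output.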


In [3]:
# -----------------------------------------------------------------
# Coordinates of central Kitchener - Waterloo (KW)
# -----------------------------------------------------------------

KW_latitude = 43.452969
KW_longitude = -80.495064

In [4]:
search_query = 'yarn'
print(search_query + ' .... OK!')

yarn .... OK!


In [5]:
radius = 30000 #meters
print("Searching a radius of", radius/1000, "km")
LIMIT = 30
print("Limiting the number of returned LYS to", LIMIT)

# -----------------------------------------------------------------
# create URL
# -----------------------------------------------------------------
url_LYS = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, KW_latitude, KW_longitude, VERSION, search_query, radius, LIMIT)
print("URL generated:")
# -----------------------------------------------------------------
# display URL
# -----------------------------------------------------------------
url_LYS 

Searching a radius of 30.0 km
Limiting the number of returned LYS to 30
URL generated:


'https://api.foursquare.com/v2/venues/search?client_id=5MXDGJBX0OMLHH0OPKIN3W44EBFGV1X0RCVBWAJSGBDPYQMS&client_secret=1SWB3DXCV5DJCF3JT0KMSEJDYJD4GHELTHWXRNARV0XRE0CJ&ll=43.452969,-80.495064&v=20190705&query=yarn&radius=30000&limit=30'
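
The URL above is assembled with `str.format`, which works for a one-word query like `yarn` but does not escape spaces or special characters. As a hedged alternative, the sketch below builds the same kind of URL with the standard library's `urlencode`. The function name `build_search_url` and the placeholder credentials are assumptions for illustration only.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://api.foursquare.com/v2/venues/search"  # endpoint used in this notebook


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build a venues/search URL, percent-encoding every parameter value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials; a two-word query shows why encoding matters
url = build_search_url("MY_ID", "MY_SECRET", 43.452969, -80.495064,
                       "20190705", "yarn store", 30000, 30)
print(url)
# Decoding the query string recovers the original search term
print(parse_qs(urlparse(url).query)["query"])
```

Note that `urlencode` also escapes the comma in `ll`, so the text of the URL differs slightly from the one printed above even though it decodes to the same parameters.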

In [6]:
results_LYS = requests.get(url_LYS).json()
results_LYS

{'meta': {'code': 200, 'requestId': '5d9cde3c018cbb002c7f62c9'},
 'response': {'venues': [{'id': '505dea27e4b02c16f354d244',
    'name': 'Yarn Indulgences',
    'location': {'lat': 43.44926,
     'lng': -80.48585,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.44926,
       'lng': -80.48585}],
     'distance': 851,
     'cc': 'CA',
     'country': 'Canada',
     'formattedAddress': ['Canada']},
    'categories': [{'id': '4bf58dd8d48988d127951735',
      'name': 'Arts & Crafts Store',
      'pluralName': 'Arts & Crafts Stores',
      'shortName': 'Arts & Crafts',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/artstore_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1570561596',
    'hasPerk': False},
   {'id': '5d968d699d5a9900086a32f3',
    'name': 'Galt House Of Yarn',
    'location': {'address': '110-7 Grand Ave S',
     'lat': 43.358144399644424,
     'lng': -80.31754130029069,
     'labeledLatLngs': [{'label': 'displ
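
The response above carries a `meta` block with a status `code` next to the `response` payload. Before using the venues, it may be worth checking that code. The sketch below shows one way to do so; `extract_venues` is a hypothetical helper, and the sample payload is a trimmed copy of the output above rather than a fresh API call.

```python
def extract_venues(payload):
    """Return the venue list from a venues/search response dictionary.

    Raises ValueError when the 'meta' code is anything other than 200.
    """
    code = payload.get("meta", {}).get("code")
    if code != 200:
        raise ValueError(f"Foursquare request failed with meta code {code}")
    return payload.get("response", {}).get("venues", [])


# Trimmed sample shaped like the response shown above
sample = {
    "meta": {"code": 200, "requestId": "5d9cde3c018cbb002c7f62c9"},
    "response": {
        "venues": [
            {"id": "505dea27e4b02c16f354d244",
             "name": "Yarn Indulgences",
             "location": {"lat": 43.44926, "lng": -80.48585, "distance": 851}},
        ]
    },
}

for venue in extract_venues(sample):
    print(venue["name"], venue["location"]["distance"])
```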

In [7]:
# assign relevant part of JSON to venues
LYS = results_LYS['response']['venues']

# transform venues into a dataframe
dataframe_LYS = json_normalize(LYS)
dataframe_LYS.head()

|  | id | name | categories | referralId | hasPerk | location.lat | location.lng | location.labeledLatLngs | location.distance | location.cc | location.country | location.formattedAddress | location.address | location.postalCode | location.city | location.state | location.crossStreet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 505dea27e4b02c16f354d244 | Yarn Indulgences | [{'id': '4bf58dd8d48988d127951735', 'name': 'A... | v-1570561596 | False | 43.44926 | -80.48585 | [{'label': 'display', 'lat': 43.44926, 'lng': ... | 851 | CA | Canada | [Canada] |  |  |  |  |  |
| 1 | 5d968d699d5a9900086a32f3 | Galt House Of Yarn | [{'id': '52f2ab2ebcbc57f1066b8b25', 'name': 'K... | v-1570561596 | False | 43.358144 | -80.317541 | [{'label': 'display', 'lat': 43.35814439964442... | 17819 | CA | Canada | [110-7 Grand Ave S, Cambridge ON N1S 2L3, Canada] | 110-7 Grand Ave S | N1S 2L3 | Cambridge | ON |  |
| 2 | 5792be88498e3514faa45957 | Hillside yarn bombed forest | [{'id': '4bf58dd8d48988d1f1931735', 'name': 'G... | v-1570561596 | False | 43.596976 | -80.241305 | [{'label': 'display', 'lat': 43.596976, 'lng':... | 26009 | CA | Canada | [Canada] |  |  |  |  |  |
| 3 | 52f7d2ae498e4919d81c0a8c | Yarnbird | [{'id': '52f2ab2ebcbc57f1066b8b25', 'name': 'K... | v-1570561596 | False | 43.680841 | -80.430093 | [{'label': 'display', 'lat': 43.680841, 'lng':... | 25902 | CA | Canada | [22 Mill St W, Elora ON N0B 1S0, Canada] | 22 Mill St W | N0B 1S0 | Elora | ON |  |
| 4 | 4cb6372d64998cfa4daa13a2 | All Strung Out Fine Yarns | [{'id': '4bf58dd8d48988d127951735', 'name': 'A... | v-1570561596 | False | 43.545631 | -80.250688 | [{'label': 'display', 'lat': 43.54563094931906... | 22266 | CA | Canada | [36 Quebec St (Baker St), Guelph ON N1H 2T4, C... | 36 Quebec St | N1H 2T4 | Guelph | ON | Baker St |
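
The dotted column names such as `location.lat` come from flattening each venue's nested `location` dictionary. To make that step less of a black box, here is a minimal standard-library sketch of the same idea. It only illustrates the naming scheme and is not how pandas implements `json_normalize`; `flatten` is a hypothetical helper.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level, joining keys with `sep`.

    Lists are left untouched, which is why 'categories' stays a list
    of dictionaries in the table above.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# One venue, trimmed from the response shown earlier
venue = {
    "id": "505dea27e4b02c16f354d244",
    "name": "Yarn Indulgences",
    "location": {"lat": 43.44926, "lng": -80.48585, "distance": 851, "cc": "CA"},
    "categories": [{"id": "4bf58dd8d48988d127951735", "name": "Arts & Crafts Store"}],
}

flat = flatten(venue)
for column, value in flat.items():
    print(column, "->", value)
```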


In [8]:
# keeping only columns that include venue name and anything that is associated with location
filtered_columns_LYS = ['name', 'categories'] + [col for col in dataframe_LYS.columns if col.startswith('location.')]
# explicit copy, so the assignments below modify this frame rather than a slice of dataframe_LYS
dataframe_filtered_LYS = dataframe_LYS.loc[:, filtered_columns_LYS].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fallback for results whose columns are prefixed with 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered_LYS['categories'] = dataframe_filtered_LYS.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered_LYS.columns = [column.split('.')[-1] for column in dataframe_filtered_LYS.columns]

dataframe_filtered_LYS

|  | name | categories | lat | lng | labeledLatLngs | distance | cc | country | formattedAddress | address | postalCode | city | state | crossStreet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Yarn Indulgences | Arts & Crafts Store | 43.44926 | -80.48585 | [{'label': 'display', 'lat': 43.44926, 'lng': ... | 851 | CA | Canada | [Canada] |  |  |  |  |  |
| 1 | Galt House Of Yarn | Knitting Store | 43.358144 | -80.317541 | [{'label': 'display', 'lat': 43.35814439964442... | 17819 | CA | Canada | [110-7 Grand Ave S, Cambridge ON N1S 2L3, Canada] | 110-7 Grand Ave S | N1S 2L3 | Cambridge | ON |  |
| 2 | Hillside yarn bombed forest | General Entertainment | 43.596976 | -80.241305 | [{'label': 'display', 'lat': 43.596976, 'lng':... | 26009 | CA | Canada | [Canada] |  |  |  |  |  |
| 3 | Yarnbird | Knitting Store | 43.680841 | -80.430093 | [{'label': 'display', 'lat': 43.680841, 'lng':... | 25902 | CA | Canada | [22 Mill St W, Elora ON N0B 1S0, Canada] | 22 Mill St W | N0B 1S0 | Elora | ON |  |
| 4 | All Strung Out Fine Yarns | Arts & Crafts Store | 43.545631 | -80.250688 | [{'label': 'display', 'lat': 43.54563094931906... | 22266 | CA | Canada | [36 Quebec St (Baker St), Guelph ON N1H 2T4, C... | 36 Quebec St | N1H 2T4 | Guelph | ON | Baker St |


In [9]:
# Dropping columns "labeledLatLngs" and "formattedAddress"
dataframe_filtered_LYS.drop(columns=["labeledLatLngs", "formattedAddress"], inplace=True)
dataframe_filtered_LYS

|  | name | categories | lat | lng | distance | cc | country | address | postalCode | city | state | crossStreet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Yarn Indulgences | Arts & Crafts Store | 43.44926 | -80.48585 | 851 | CA | Canada |  |  |  |  |  |
| 1 | Galt House Of Yarn | Knitting Store | 43.358144 | -80.317541 | 17819 | CA | Canada | 110-7 Grand Ave S | N1S 2L3 | Cambridge | ON |  |
| 2 | Hillside yarn bombed forest | General Entertainment | 43.596976 | -80.241305 | 26009 | CA | Canada |  |  |  |  |  |
| 3 | Yarnbird | Knitting Store | 43.680841 | -80.430093 | 25902 | CA | Canada | 22 Mill St W | N0B 1S0 | Elora | ON |  |
| 4 | All Strung Out Fine Yarns | Arts & Crafts Store | 43.545631 | -80.250688 | 22266 | CA | Canada | 36 Quebec St | N1H 2T4 | Guelph | ON | Baker St |


In [10]:
dataframe_filtered_LYS.name

0               Yarn Indulgences
1             Galt House Of Yarn
2    Hillside yarn bombed forest
3                       Yarnbird
4      All Strung Out Fine Yarns
Name: name, dtype: object
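
A keyword search for "yarn" also returns venues that are not shops: "Hillside yarn bombed forest" is categorised as General Entertainment. One way to handle this, sketched below with plain Python records copied from the table above, is to keep only rows whose category looks like a store. The set `STORE_CATEGORIES` and the helper `keep_stores` are assumptions for this sketch, and which categories count as a yarn store is a judgment call.

```python
# Categories treated as "a place that sells yarn" (an assumption for this sketch)
STORE_CATEGORIES = {"Knitting Store", "Arts & Crafts Store"}

# Name and category of each venue, copied from the filtered table above
lys_rows = [
    {"name": "Yarn Indulgences", "categories": "Arts & Crafts Store"},
    {"name": "Galt House Of Yarn", "categories": "Knitting Store"},
    {"name": "Hillside yarn bombed forest", "categories": "General Entertainment"},
    {"name": "Yarnbird", "categories": "Knitting Store"},
    {"name": "All Strung Out Fine Yarns", "categories": "Arts & Crafts Store"},
]


def keep_stores(rows, allowed=STORE_CATEGORIES):
    """Return only the rows whose category is in the allowed set."""
    return [row for row in rows if row["categories"] in allowed]


stores_only = keep_stores(lys_rows)
for row in stores_only:
    print(row["name"], "|", row["categories"])
```

On the DataFrame itself, the same filter could be written with `isin` on the `categories` column, mirroring the way the Michaels results are filtered by name later in this notebook.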

#### Generating Foursquare data for the local big box store "Michaels"

In [11]:
radius = 30000 #meters
print("Searching a radius of", radius/1000, "km")
LIMIT = 30
print("Limiting the number of returned Michaels to", LIMIT)

New_search_query = 'Michaels'
# -----------------------------------------------------------------
# create URL
# -----------------------------------------------------------------
url_Michaels = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, KW_latitude, KW_longitude, VERSION, New_search_query, radius, LIMIT)
print("URL generated:")
# -----------------------------------------------------------------
# display URL
# -----------------------------------------------------------------
url_Michaels 

Searching a radius of 30.0 km
Limiting the number of returned Michaels to 30
URL generated:


'https://api.foursquare.com/v2/venues/search?client_id=5MXDGJBX0OMLHH0OPKIN3W44EBFGV1X0RCVBWAJSGBDPYQMS&client_secret=1SWB3DXCV5DJCF3JT0KMSEJDYJD4GHELTHWXRNARV0XRE0CJ&ll=43.452969,-80.495064&v=20190705&query=Michaels&radius=30000&limit=30'

In [12]:
results_Michaels = requests.get(url_Michaels).json()
results_Michaels

{'meta': {'code': 200, 'requestId': '5d9cde3dcad1b6002c263aef'},
 'response': {'venues': [{'id': '4b23cb31f964a520de5924e3',
    'name': 'Michaels',
    'location': {'address': '50 Westmount Rd N',
     'crossStreet': 'Erb',
     'lat': 43.460981,
     'lng': -80.536459,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.460981,
       'lng': -80.536459}],
     'distance': 3461,
     'postalCode': 'N2L 2R5',
     'cc': 'CA',
     'city': 'Waterloo',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['50 Westmount Rd N (Erb)',
      'Waterloo ON N2L 2R5',
      'Canada']},
    'categories': [{'id': '4bf58dd8d48988d127951735',
      'name': 'Arts & Crafts Store',
      'pluralName': 'Arts & Crafts Stores',
      'shortName': 'Arts & Crafts',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/artstore_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1570561597',
    'hasPerk': False},
   {'id': '4aff68c5f964a520

In [13]:
# assign relevant part of JSON to venues
Michaels = results_Michaels['response']['venues']

# transform venues into a dataframe
dataframe_Michaels = json_normalize(Michaels)
dataframe_Michaels.head()

|  | id | name | categories | referralId | hasPerk | location.address | location.crossStreet | location.lat | location.lng | location.labeledLatLngs | location.distance | location.postalCode | location.cc | location.city | location.state | location.country | location.formattedAddress |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4b23cb31f964a520de5924e3 | Michaels | [{'id': '4bf58dd8d48988d127951735', 'name': 'A... | v-1570561597 | False | 50 Westmount Rd N | Erb | 43.460981 | -80.536459 | [{'label': 'display', 'lat': 43.460981, 'lng':... | 3461 | N2L 2R5 | CA | Waterloo | ON | Canada | [50 Westmount Rd N (Erb), Waterloo ON N2L 2R5,... |
| 1 | 4aff68c5f964a5204d3822e3 | Michaels | [{'id': '4bf58dd8d48988d127951735', 'name': 'A... | v-1570561597 | False | 18 Pinebush Rd Unit 1 |  | 43.409428 | -80.327303 | [{'label': 'display', 'lat': 43.40942847822192... | 14401 | N1R 8K5 | CA | Cambridge | ON | Canada | [18 Pinebush Rd Unit 1, Cambridge ON N1R 8K5, ... |
| 2 | 4e91ccf129c2117fa453575a | Michaels | [{'id': '4bf58dd8d48988d127951735', 'name': 'A... | v-1570561597 | False | 500 Fairway Rd S Unit 1 |  | 43.420774 | -80.448239 | [{'label': 'display', 'lat': 43.420774, 'lng':... | 5212 | N2C 1X3 | CA | Kitchener | ON | Canada | [500 Fairway Rd S Unit 1, Kitchener ON N2C 1X3... |
| 3 | 4ba37ee6f964a5200a4238e3 | St Michaels Campus | [{'id': '4bf58dd8d48988d199941735', 'name': 'C... | v-1570561597 | False |  |  | 43.475267 | -80.529744 | [{'label': 'display', 'lat': 43.475267, 'lng':... | 3743 |  | CA |  |  | Canada | [Canada] |
| 4 | 4e885b2f61af3ee1a9eb8aab | Michaels | [{'id': '4bf58dd8d48988d127951735', 'name': 'A... | v-1570561597 | False | 15 Woodlawn Rd W Unit 101 | Woolwich St | 43.563974 | -80.283619 | [{'label': 'display', 'lat': 43.563974, 'lng':... | 21074 | N1H 1G8 | CA | Guelph | ON | Canada | [15 Woodlawn Rd W Unit 101 (Woolwich St), Guel... |


In [14]:
# keeping only columns that include venue name and anything that is associated with location
filtered_columns_Michaels = ['name', 'categories'] + [col for col in dataframe_Michaels.columns if col.startswith('location.')]
# explicit copy, so the assignments below modify this frame rather than a slice of dataframe_Michaels
dataframe_filtered_Michaels = dataframe_Michaels.loc[:, filtered_columns_Michaels].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fallback for results whose columns are prefixed with 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered_Michaels['categories'] = dataframe_filtered_Michaels.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered_Michaels.columns = [column.split('.')[-1] for column in dataframe_filtered_Michaels.columns]

# Dropping columns "labeledLatLngs" and "formattedAddress"
dataframe_filtered_Michaels.drop(columns=["labeledLatLngs", "formattedAddress"], inplace=True)

# Dropping rows that are not "Michaels" (e.g. "St Michaels Campus")
big_box_store = ['Michaels']

dataframe_filtered_Michaels = dataframe_filtered_Michaels[dataframe_filtered_Michaels['name'].isin(big_box_store)]

dataframe_filtered_Michaels

|  | name | categories | address | crossStreet | lat | lng | distance | postalCode | cc | city | state | country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Michaels | Arts & Crafts Store | 50 Westmount Rd N | Erb | 43.460981 | -80.536459 | 3461 | N2L 2R5 | CA | Waterloo | ON | Canada |
| 1 | Michaels | Arts & Crafts Store | 18 Pinebush Rd Unit 1 |  | 43.409428 | -80.327303 | 14401 | N1R 8K5 | CA | Cambridge | ON | Canada |
| 2 | Michaels | Arts & Crafts Store | 500 Fairway Rd S Unit 1 |  | 43.420774 | -80.448239 | 5212 | N2C 1X3 | CA | Kitchener | ON | Canada |
| 4 | Michaels | Arts & Crafts Store | 15 Woodlawn Rd W Unit 101 | Woolwich St | 43.563974 | -80.283619 | 21074 | N1H 1G8 | CA | Guelph | ON | Canada |


In [15]:
# Ensuring only "Michaels" stores are included
dataframe_filtered_Michaels.name

0    Michaels
1    Michaels
2    Michaels
4    Michaels
Name: name, dtype: object

#### Mapping out locations of current LYS and Michaels

Folium will use KW's geographical coordinates and the coordinates of the LYS and Michaels returned from Foursquare to generate a custom map. 

In [16]:
yarn_map = folium.Map(location=[KW_latitude, KW_longitude], zoom_start=10) # generate map centred on Kitchener

# add a red circle marker to represent central Kitchener
folium.features.CircleMarker(
    [KW_latitude, KW_longitude],
    radius=10,
    color='red',
    popup='Kitchener',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(yarn_map)

# add LYS as blue circle markers
for lat, lng, label in zip(dataframe_filtered_LYS.lat, dataframe_filtered_LYS.lng, dataframe_filtered_LYS.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(yarn_map)
    
# add Michaels stores as green circle markers
for lat, lng, label in zip(dataframe_filtered_Michaels.lat, dataframe_filtered_Michaels.lng, dataframe_filtered_Michaels.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='green',
        popup=label,
        fill = True,
        fill_color='green',
        fill_opacity=0.6
    ).add_to(yarn_map)

# display map
yarn_map
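
The map gives a visual impression, but the criterion from the introduction (a site that is not too close to an existing LYS or a Michaels) can also be put into numbers. The sketch below is one possible next step, not part of the original analysis: it uses the haversine great-circle formula to measure how far a candidate site is from its nearest competitor. The helper names and the mean Earth radius of 6,371 km are assumptions of this sketch; the coordinates are copied from the tables above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (assumed for this sketch)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearest_competitor_m(candidate, competitors):
    """Distance in metres from a candidate (lat, lng) to the closest competitor."""
    return min(haversine_m(candidate[0], candidate[1], lat, lng) for lat, lng in competitors)


# (lat, lng) of the venues found above; Hillside yarn bombed forest is left out as it is not a store
lys_coords = [(43.44926, -80.48585), (43.358144, -80.317541),
              (43.680841, -80.430093), (43.545631, -80.250688)]
michaels_coords = [(43.460981, -80.536459), (43.409428, -80.327303),
                   (43.420774, -80.448239), (43.563974, -80.283619)]

kw_centre = (43.452969, -80.495064)
gap = nearest_competitor_m(kw_centre, lys_coords + michaels_coords)
print("Nearest competitor to central KW is about", round(gap), "m away")
```

As a sanity check, the distance this gives from central KW to Yarn Indulgences should land close to the 851 m that Foursquare reports in its `distance` field. Evaluating `nearest_competitor_m` over a grid of candidate points and picking the largest value would be one simple way to shortlist locations.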