### Coursera - IBM Data Science Certification
### Capstone Project - Week 4
------------

# Yarn Stores are the New Black

------------
### Mollie Conrad, MSc. 
#### October 4, 2019
------------

### Introduction

In this report, the Foursquare API will be used to determine which location in Kitchener-Waterloo (KW) is the most viable for a "Local Yarn Store" (LYS). To the uninitiated, LYSs may seem like niche establishments frequented only by grannies and crazy cat ladies. What many people don't know is that within the knitting and crochet fibre community, MANY young folx are ditching big box stores like "Michael's" for unique and inspiring LYSs. It is here that you can find yarn hand-dyed by your super talented neighbour, or yarn hand-spun from fleece sourced from the next town over. LYSs are seriously underestimated treasure troves.

Currently in KW, there are only 3 LYSs (*that I know of*) within a 20-30 minute drive. Geographical data from Foursquare can be used to determine the *best* location for a new LYS: most likely one that isn't too close to the 3 existing LYSs, or to any local big box stores that are likely to sell similar products at lower prices.
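
To make "isn't too close" measurable, one option is to compute the great-circle (haversine) distance between a candidate site and each existing store, and require a minimum separation. This is a minimal sketch of that idea, not part of the original analysis: it treats the Earth as a sphere of radius 6371 km, and the `min_km` threshold and the second coordinate pair are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def far_enough(candidate, competitors, min_km):
    """True if the (lat, lon) candidate is at least min_km from every competitor."""
    return all(haversine_km(*candidate, *other) >= min_km for other in competitors)


# Hypothetical example: central KW versus a made-up store 0.05 degrees further north
kw_centre = (43.452969, -80.495064)
made_up_store = (43.502969, -80.495064)
print(round(haversine_km(*kw_centre, *made_up_store), 2))  # about 5.56 km
print(far_enough(kw_centre, [made_up_store], min_km=3.0))  # True
```

The same check could be run against big box store locations as well, possibly with a different threshold.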

For the purposes of this project, we will assume we don't already know the number or locations of *any* LYSs within KW.

This information would be interesting for an individual looking to open a *new* LYS. 

### Data

* Geographical data from Foursquare

The Foursquare API will be used to acquire the locations of LYSs, Michael's stores (a big box crafting retailer), and high-traffic shopping areas.

* Population density data from the Stats Canada 2016 Census

Stats Canada 2016 Census .csv files will be used to determine the population density of each municipality within the Region of Waterloo.

* GeoJSON data from the Region of Waterloo

GeoJSON files from the Region of Waterloo will be used to draw the boundary of each municipality on the Folium maps.
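
Each of the Foursquare searches listed above could reuse the same request parameters with a different query. The sketch below assumes the v2 `venues/search` endpoint and parameter names used in the Foursquare example in this report; it builds the URLs with Python's `urllib.parse.urlencode` so that spaces and characters such as the apostrophe in "Michael's" are percent-encoded. The query strings (in particular `'shopping mall'`) are hypothetical choices, and the credentials are placeholders.

```python
from urllib.parse import urlencode

# Endpoint and parameter names mirror the Foursquare example in this report
SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, lat, lng, query,
                     radius=30000, limit=30, version='20190705'):
    """Return a venues/search URL with every parameter percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'v': version,
        'query': query,
        'radius': radius,  # metres
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Hypothetical queries for the three kinds of venue; credentials are placeholders
for query in ['yarn', "Michael's", 'shopping mall']:
    print(build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                           43.452969, -80.495064, query))
```

`urlencode` also encodes the comma in `ll` as `%2C`, which standard query-string parsing decodes back to a comma.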

#### Foursquare Example: 
(Note: the request URL is generated but not sent, to avoid racking up calls to the API.)

In [1]:
```python
import requests  # library to handle requests (used once the call is actually made)
from pandas import json_normalize  # flattens JSON responses (pandas >= 1.0 location)

# Placeholders: substitute your own Foursquare credentials and keep
# the real values out of any notebook you share
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret
VERSION = '20190705'  # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)  # the secret is deliberately not printed

# -----------------------------------------------------------------
# Coordinates of central Kitchener-Waterloo (KW)
# -----------------------------------------------------------------
KW_latitude = 43.452969
KW_longitude = -80.495064

search_query = 'yarn'
print(search_query + ' .... OK!')

radius = 30000  # metres
print("Searching a radius of", radius / 1000, "km")
LIMIT = 30  # maximum number of venues to return
print("Limiting the number of returned LYS to", LIMIT)

# -----------------------------------------------------------------
# Create the URL
# -----------------------------------------------------------------
url_LYS = ('https://api.foursquare.com/v2/venues/search'
           '?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}').format(
    CLIENT_ID, CLIENT_SECRET, KW_latitude, KW_longitude, VERSION, search_query, radius, LIMIT)
print("URL generated:")

# -----------------------------------------------------------------
# Display the URL (the request itself is not sent)
# -----------------------------------------------------------------
url_LYS
```

```
Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
yarn .... OK!
Searching a radius of 30.0 km
Limiting the number of returned LYS to 30
URL generated:

'https://api.foursquare.com/v2/venues/search?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=43.452969,-80.495064&v=20190705&query=yarn&radius=30000&limit=30'
```
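
Once the request is actually sent (for example with `requests.get(url_LYS).json()`), the venues have to be pulled out of the JSON reply. This sketch assumes the v2 `venues/search` response layout: a list of venues at `payload['response']['venues']`, each with a `name`, a `categories` list and a `location` holding `lat`/`lng`. The sample payload is invented for illustration and is not real Foursquare output.

```python
import json


def venues_from_response(payload):
    """Flatten a venues/search payload into (name, category, lat, lng) tuples."""
    rows = []
    for venue in payload.get('response', {}).get('venues', []):
        categories = venue.get('categories') or []
        # Use the first listed category, if any
        category = categories[0]['name'] if categories else None
        location = venue.get('location', {})
        rows.append((venue.get('name'), category, location.get('lat'), location.get('lng')))
    return rows


# Made-up payload for illustration only; NOT real Foursquare output
sample = json.loads('''
{"meta": {"code": 200},
 "response": {"venues": [
   {"name": "Example Yarn Shop", "categories": [{"name": "Arts & Crafts Store"}],
    "location": {"lat": 43.46, "lng": -80.52}},
   {"name": "Uncategorised Venue", "categories": [],
    "location": {"lat": 43.40, "lng": -80.47}}
 ]}}
''')

for row in venues_from_response(sample):
    print(row)
```

In the notebook itself, the same list of venues could instead be passed to `json_normalize` to get a DataFrame directly.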

#### Stats Canada Census Example: 

Population data for the KW region can be obtained from StatsCanada: https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/page_Hierarchy-Hierarchie.cfm?Lang=E&Tab=1&Geo1=CMACA&Code1=541&Geo2=PR&Code2=35&SearchText=Kitchener%20-%20Cambridge%20-%20Waterloo&SearchType=Begins&SearchPR=01&B1=Population&TABID=1&type=0

From this URL, I obtained 2016 census data for the following regions:
* Kitchener
* Cambridge
* Waterloo
* North Dumfries
* Wilmot
* Woolwich
* Wellesley

These .csv files will be cleaned so that pandas can read them in without confusion.

For example, loading the Waterloo file into a DataFrame:

In [2]:
```python
import pandas as pd

# Read the cleaned census profile (the first line of the file is skipped)
Waterloo_df = pd.read_csv('WaterlooCensus2016.csv', skiprows=1)

# Drop the note and flag columns, which are not needed here
Waterloo_df.drop(columns=['Note', 'Flag_Total', 'Flag_Male', 'Flag_Female',
                          'Flag_Total.1', 'Flag_Male.1', 'Flag_Female.1'], inplace=True)

# The second Total/Male/Female block holds the Ontario figures; rename()
# returns a new DataFrame, so the result has to be assigned back
Waterloo_df = Waterloo_df.rename(columns={'Total.1': 'Total_ON', 'Male.1': 'Male_ON', 'Female.1': 'Female_ON'})

# Keep only the population and age topics
Topics = ['Population and dwellings', 'Age characteristics']
Waterloo_df = Waterloo_df[Waterloo_df['Topic'].isin(Topics)]

Waterloo_df.head(6)
```

|   | Topic | Characteristics | Total | Male | Female | Total_ON | Male_ON | Female_ON |
|---|---|---|---|---|---|---|---|---|
| 1 | Population and dwellings | Population; 2016 | 104986.0 | | | 13448494.0 | | |
| 2 | Population and dwellings | Population; 2011 | 98780.0 | | | 12851821.0 | | |
| 3 | Population and dwellings | Population percentage change; 2011 to 2016 | 6.3 | | | 4.6 | | |
| 4 | Population and dwellings | Total private dwellings | 46096.0 | | | 5598391.0 | | |
| 5 | Population and dwellings | Private dwellings occupied by usual residents | 40381.0 | | | 5169174.0 | | |
| 6 | Population and dwellings | Population density per square kilometre | 1639.8 | | | 14.8 | | |


#### GeoJSON data example:

*Municipality Boundary Data*:

The GeoJSON and .csv files of the Region of Waterloo municipal boundaries are acquired from https://open-kitchenergis.opendata.arcgis.com/datasets/RMW::regional-boundaries/data

In [3]:
```python
# -----------------------------------------------------------------------
# GeoJSON of Region of Waterloo
# -----------------------------------------------------------------------
GeoJSON_WaterlooRegion = r'https://opendata.arcgis.com/datasets/dc4eff944b774abdb6ee0e1931a8663f_17.geojson'
GeoJSON_Municipalities = r'https://opendata.arcgis.com/datasets/2840815b1dff4989b8c8513541a00b49_0.geojson'
```
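
Besides drawing the boundaries on a map, the municipality GeoJSON could be used to tag each venue with the municipality it falls in. The sketch below is a plain ray-casting point-in-polygon test over GeoJSON `Polygon` and `MultiPolygon` geometries, whose coordinates are in `[longitude, latitude]` order. It assumes each feature's properties carry a `MUNICIPALITY` name like the .csv export does; that key and the two toy square features are hypothetical stand-ins for the downloaded file.

```python
def point_in_ring(lng, lat, ring):
    """Ray casting: is (lng, lat) inside the ring of [lng, lat] pairs?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][:2]
        x2, y2 = ring[(i + 1) % n][:2]
        # Only edges that straddle the horizontal line through the point count
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def point_in_geometry(lng, lat, geometry):
    """Polygon / MultiPolygon test; the first ring is the exterior, the rest are holes."""
    if geometry['type'] == 'Polygon':
        polygons = [geometry['coordinates']]
    elif geometry['type'] == 'MultiPolygon':
        polygons = geometry['coordinates']
    else:
        return False
    for rings in polygons:
        exterior, holes = rings[0], rings[1:]
        if point_in_ring(lng, lat, exterior) and not any(
                point_in_ring(lng, lat, hole) for hole in holes):
            return True
    return False


def municipality_of(lng, lat, feature_collection, name_key='MUNICIPALITY'):
    """Name of the first feature containing the point, or None (name_key is assumed)."""
    for feature in feature_collection['features']:
        if point_in_geometry(lng, lat, feature['geometry']):
            return feature['properties'].get(name_key)
    return None


# Hypothetical toy boundaries: two adjacent unit squares
toy_boundaries = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'MUNICIPALITY': 'SQUARE A'},
         'geometry': {'type': 'Polygon',
                      'coordinates': [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
        {'type': 'Feature', 'properties': {'MUNICIPALITY': 'SQUARE B'},
         'geometry': {'type': 'Polygon',
                      'coordinates': [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]}},
    ],
}
print(municipality_of(0.5, 0.5, toy_boundaries))  # SQUARE A
```

For the real file, the GeoJSON would first be loaded (for example with `requests.get(GeoJSON_Municipalities).json()`) and venue longitude and latitude passed in that order.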

In [4]:
```python
Municipality_df = pd.read_csv('Municipal_Boundary.csv', usecols=[0, 1, 2, 3, 4, 5])

Municipality_df
```

|   | OBJECTID | MUNICIPALITY | PERIMETER | SDE_MUNICIPALITY_AREA | MUNICIPALITYID | CATEGORY |
|---|---|---|---|---|---|---|
| 0 | 1 | WILMOT | 69.837948 | 266.183351 | 104 | MUNICIPALITY |
| 1 | 2 | WATERLOO | 41.257436 | 65.236477 | 100 | MUNICIPALITY |
| 2 | 4 | WOOLWICH | 100.704515 | 329.683461 | 103 | MUNICIPALITY |
| 3 | 5 | NORTH DUMFRIES | 85.626001 | 190.228732 | 101 | MUNICIPALITY |
| 4 | 6 | CAMBRIDGE | 60.045538 | 115.362894 | 106 | MUNICIPALITY |
| 5 | 7 | WELLESLEY | 65.45636 | 278.415743 | 102 | MUNICIPALITY |
| 6 | 1608 | KITCHENER | 59.752821 | 138.405656 | 105 | MUNICIPALITY |
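
Population density can then be computed by dividing each municipality's 2016 population by its area. This sketch assumes `SDE_MUNICIPALITY_AREA` is in square kilometres (an assumption; the unit is not stated in the file) and fills in only Waterloo's population, the one figure shown in this report. The other populations would come from their own census files.

```python
# Areas copied from the Municipality_df output; assumed to be in km^2
municipality_area = {
    'WILMOT': 266.183351,
    'WATERLOO': 65.236477,
    'WOOLWICH': 329.683461,
    'NORTH DUMFRIES': 190.228732,
    'CAMBRIDGE': 115.362894,
    'WELLESLEY': 278.415743,
    'KITCHENER': 138.405656,
}

# 2016 populations; only Waterloo's (from WaterlooCensus2016.csv) appears in this
# report, the rest would be read from the other municipalities' census files
population_2016 = {'WATERLOO': 104986}


def density_per_km2(population, area_km2):
    """People per square kilometre."""
    return population / area_km2


for name, population in population_2016.items():
    print(name, round(density_per_km2(population, municipality_area[name]), 1))  # WATERLOO 1609.3
```

For Waterloo this gives roughly 1609 people per km², a little below the census figure of 1639.8 per km² in the census table; the gap presumably reflects a difference between the boundary file's area and the land area used by Statistics Canada.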


### Discussions

The 5 best candidate locations will be determined by inspecting detailed Folium maps against the following criteria:

* Near regions of high population density
* Far enough from other yarn stores that competition isn't a concern
* Near highways for easy access
* Possibly near high-traffic shopping centers

### Conclusions

The final location will be chosen by balancing all of the above criteria: high population density, a central location, proximity to local highways, enough distance from other LYSs, and proximity to high-traffic shopping centers.
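
One simple way to balance these criteria is a weighted score: scale each measurement to the range 0 to 1 (so that 1 is always the most desirable value) and sum them with weights. This is an illustration under stated assumptions, not the report's method; the weights, the candidate sites and all of their measurements are made up.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    density: float            # people per km^2 near the site (higher is better)
    km_to_nearest_lys: float  # distance to the closest existing LYS (higher is better)
    km_to_highway: float      # distance to the nearest highway access (lower is better)
    km_to_shopping: float     # distance to the nearest high-traffic shopping center (lower is better)


# Hypothetical weights expressing priorities; they sum to 1
WEIGHTS = {'density': 0.4, 'km_to_nearest_lys': 0.3, 'km_to_highway': 0.15, 'km_to_shopping': 0.15}
HIGHER_IS_BETTER = {'density': True, 'km_to_nearest_lys': True,
                    'km_to_highway': False, 'km_to_shopping': False}


def normalise(values, higher_is_better):
    """Min-max scale to [0, 1] so that 1 is always the most desirable value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank(candidates):
    """Return (name, score) pairs sorted from best to worst by weighted sum."""
    scores = [0.0] * len(candidates)
    for attr, weight in WEIGHTS.items():
        column = normalise([getattr(c, attr) for c in candidates], HIGHER_IS_BETTER[attr])
        for i, s in enumerate(column):
            scores[i] += weight * s
    names = [c.name for c in candidates]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Made-up candidates for illustration only
candidates = [
    Candidate('Site A', density=1600, km_to_nearest_lys=6.0, km_to_highway=1.0, km_to_shopping=0.5),
    Candidate('Site B', density=900, km_to_nearest_lys=12.0, km_to_highway=3.0, km_to_shopping=2.0),
    Candidate('Site C', density=1200, km_to_nearest_lys=2.0, km_to_highway=0.5, km_to_shopping=1.0),
]
for name, score in rank(candidates):
    print(f'{name}: {score:.2f}')
```

Changing `WEIGHTS` changes the ranking, so the weights are worth justifying explicitly; min-max scaling is used here only because it is simple and puts every criterion on the same 0 to 1 scale.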