# Capstone project: Finding a neighbourhood to call home
This notebook forms the basis for my IBM data science capstone project.

## Introduction / Business Problem
A couple is looking to move to the western edge of Lake Ontario because they have family in the area, and they already know they don't want to live in the heart of Toronto. They're looking for a neighbourhood that will continue to provide local activities for them and for potential future kids. The couple is therefore looking into neighbourhoods within west Hamilton, Burlington, Oakville, and Mississauga. The couple wants daycare and elementary schools to be local, as well as a high diversity of outdoor and athletic facilities.

The couple has many outdoor and athletic facilities in mind, including:
   * Playground			   
   * Bike Trail			   
   * Beach			    	
   * Botanical Garden		
   * Dog Run				 
   * Fishing Spot			
   * Lake			    	 
   * Nature Preserve		  
   * Other Great Outdoors	 
   * Park					 
   * Pool	
   * Gym / Fitness Center	 
   * Boxing Gym			  
   * Climbing Gym			
   * Cycle Studio			 
   * Gym Pool				
   * Gymnastics Gym		
   * Gym					  
   * Martial Arts Dojo	    
   * Outdoor Gym			
   * Pilates Studio		   
   * Track				    
   * Yoga Studio			 
   * Indoor Play Area		
   * Recreation Center	

A search of this magnitude (e.g. four cities with many neighbourhoods each, many different types of amenities, and geolocation to factor in) would take an extraordinary amount of time if each neighbourhood had to be searched individually and the results combed through manually (i.e. via Google searches). The couple is therefore looking for a data science approach that can pinpoint the neighbourhoods offering the amenities they are interested in and rank those neighbourhoods by the diversity and quantity of amenities offered. The couple is also interested in knowing, in general terms, which neighbourhoods are more or less similar to each other based on these amenities.

#### Target audience
This problem is relevant to other couples and young adults in similar stages of life, who for personal reasons are interested in the communities to the west of Toronto, Ontario, Canada. 

## Data
### Neighbourhood names and geolocations 
Unfortunately, the Wikipedia page listing neighbourhood and postal code information for this region, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_L, is incompletely annotated. However, the following database, https://www.postalpinzipcodes.com/Postcode-CAN-Canada-Postal-code-L9K-ZIP-Code, can be used to curate this information for the following cities: Hamilton (west), Burlington, Oakville, and Mississauga. Conveniently, neighbourhoods are grouped by postal code, which gives neighbourhood areas of roughly even size. Two postal code areas (Burlington Central and Maple) were joined into one neighbourhood, since both are relatively small territories. The neighbourhood of Malton was excluded from Mississauga because of its highly urban location and proximity to the airport. After this preprocessing, 35 distinct neighbourhoods were assembled and used for subsequent analysis.

In [4]:
import types
import pandas as pd
import numpy as np
# The next line is hidden as it contains user credentials

In [2]:
# The code was removed by Watson Studio for sharing.

In [3]:
# `body` is not defined in the visible cells; it presumably comes from the hidden cell above.
city_df = pd.read_csv(body)
city_df

| Unnamed: 0 | City | Neighbourhood | Postal_code | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Mississauga | Applewood, Dixie | L4Y | 43.6028 | -79.5929 |
| 1 | Mississauga | East Hurontario, West Rathwood | L4Z | 43.6192 | -79.6538 |
| 2 | Mississauga | Mississauga Valley, East Cooksville | L5A | 43.5883 | -79.6091 |
| 3 | Mississauga | City Centre, Cooksville, Fairview, East Credit... | L5B | 43.5771 | -79.6306 |
| 4 | Mississauga | West Creditview, Mavis, Erindale | L5C | 43.5624 | -79.6504 |
| 5 | Mississauga | Lakeview | L5E | 43.5836 | -79.561 |
| 6 | Mississauga | SW Lakeview, Mineola, East Port Credit | L5G | 43.5647 | -79.5852 |
| 7 | Mississauga | Lorne Park, West Port Credit | L5H | 43.5419 | -79.6164 |
| 8 | Mississauga | Park Royal, Clarkson, Birchwood, Rattray Park ... | L5J | 43.5102 | -79.6296 |
| 9 | Mississauga | Sheridan Homelands, Sherwood Forest | L5K | 43.5272 | -79.6617 |


## Data (continued)
### Foursquare 

Foursquare offers customizable searches based on a particular type of venue, which is codified using a category ID. The category IDs for various establishments are available from Foursquare here: https://developer.foursquare.com/docs/build-with-foursquare/categories/.

#### The couple's neighbourhood criteria are:
   1. Must have local daycare or child care service 
   2. Must have local elementary school 
   3. Local fitness amenities, e.g. gyms, pools, dojos, studios; and outdoor amenities, e.g. trails, parks, dog runs, fishing spots, playgrounds
   
Therefore, sequential calls to the Foursquare API will be used to generate the data for the following:
   1. day care '5744ccdfe4b0c0459246b4c7' 
   2. elementary school '4f4533804b9074f6e4fb0105'
   3. outdoors and recreation '4d4b7105d754a06377d81259'
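
As a rough illustration of what one such call could look like, the sketch below builds a search URL per category for a single neighbourhood. It assumes the legacy Foursquare v2 `venues/search` endpoint and its `ll`, `categoryId`, `radius`, `limit` and `v` query parameters; the helper name `build_search_url`, the version date, and the `CLIENT_ID` / `CLIENT_SECRET` placeholders are hypothetical and not taken from the notebook. No request is actually sent.

```python
from urllib.parse import urlencode

# Category IDs quoted in the list above.
CATEGORY_IDS = {
    "day care": "5744ccdfe4b0c0459246b4c7",
    "elementary school": "4f4533804b9074f6e4fb0105",
    "outdoors and recreation": "4d4b7105d754a06377d81259",
}

# Assumption: the legacy Foursquare v2 venue search endpoint.
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lon, category_id, client_id, client_secret,
                     radius=1000, limit=50, version="20200101"):
    """Return a venue-search URL for one location and one category (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date (YYYYMMDD); the value is a placeholder
        "ll": f"{lat},{lon}",    # latitude,longitude of the neighbourhood
        "categoryId": category_id,
        "radius": radius,        # search radius in metres
        "limit": limit,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


# One URL per category for the first neighbourhood in the table (postal code L4Y).
urls = {
    name: build_search_url(43.6028, -79.5929, cat_id, "CLIENT_ID", "CLIENT_SECRET")
    for name, cat_id in CATEGORY_IDS.items()
}
```

Each URL could then be fetched (for example with `requests.get(url).json()`) once real credentials are substituted, and the loop repeated over every row of `city_df`.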
   
The data will be manipulated using the pandas and numpy packages in Python. Neighbourhoods will be removed if they do not have both a day care and an elementary school within 1000 metres of the neighbourhood's geolocation. Each geolocation will be plotted with a map visualization package (folium) to confirm that it represents the centre of the neighbourhood. The remaining neighbourhoods will then be evaluated for outdoor and recreation amenities. They will be ranked by both quantity and diversity, and clustered (K-means clustering) to identify neighbourhoods that are similar in the types of outdoor and recreation activities they offer. Map visualization packages will be used to display similar neighbourhoods, as well as the most desirable neighbourhoods based on the couple's criteria.
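
The filtering and ranking steps described above might be sketched in pandas as follows. All data here is made up for illustration (neighbourhoods `A`, `B`, `C` and their venues are hypothetical), and the column names `quantity` and `diversity` are assumptions rather than the notebook's own. "Diversity" is taken to mean the number of distinct venue categories, which is one possible reading.

```python
import pandas as pd

# Hypothetical counts of required venues found within 1000 m of each centre.
required = pd.DataFrame(
    {"day_care": [2, 0, 1], "elementary_school": [1, 3, 2]},
    index=pd.Index(["A", "B", "C"], name="Neighbourhood"),
)

# Keep only neighbourhoods that have at least one of each required venue.
eligible = required[(required > 0).all(axis=1)].index

# Hypothetical outdoor and recreation venues, one row per venue.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B", "C"],
    "Category": ["Park", "Park", "Pool", "Park", "Yoga Studio", "Gym"],
})
venues = venues[venues["Neighbourhood"].isin(eligible)]

# Quantity = number of venues; diversity = number of distinct categories.
summary = (
    venues.groupby("Neighbourhood")["Category"]
    .agg(quantity="count", diversity="nunique")
    .sort_values(["diversity", "quantity"], ascending=False)
)

# Row-normalised category frequencies per neighbourhood.
freq = pd.crosstab(venues["Neighbourhood"], venues["Category"], normalize="index")

print(summary)
print(freq)
```

A frequency table like `freq` is one plausible feature matrix to pass to a K-means implementation (for instance scikit-learn's `KMeans`), since each row describes the mix of amenities in a neighbourhood independent of the raw venue count.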