# Introduction / Business Problem

For families and individuals considering relocation to an unfamiliar large city, it can be overwhelming to know where to start looking for a neighborhood that fits their preferences and needs at that stage of their lives. Young, single individuals may prefer areas with more nightlife, while families may gravitate toward areas with less nightlife but more parks and other outdoor recreation.

A recommendation engine that assesses the neighborhoods within a metropolitan area would help prospective movers with little to no familiarity with their future city quickly locate the neighborhoods most likely to suit them. For this application, we will define a set of personas, each with assumed preferences, and generate recommendations for each persona.

Realtors could also use a recommendation engine like this to identify the key preferences and priorities of their clients, and the neighborhoods in which those clients are more likely to buy.

# Data Plan

### Data Background

For this neighborhood recommendation engine, we will use the categories of venues returned by the Foursquare APIs to approximate the overall feel of each neighborhood. However, studies have shown that having too many options can result in analysis paralysis and frustration, and the number of venue categories within Foursquare is not small. To avoid overwhelming potential users, we will keep the number of input options relatively small by leveraging the hierarchical structure of the Venue Category List. Specifically, we will identify the top-level categories in this hierarchy and compare neighborhoods using a combination of those top-level categories and the preferences of our user personas to generate a recommendation.

The top-level categories include:
- Arts & Entertainment
- College & University
- Event
- Food
- Nightlife Spot
- Outdoors & Recreation
- Professional & Other Places
- Residence
- Shop & Service
- Travel & Transport


### Data Needs

The following are our high-level data needs for this analysis:
- A list of neighborhoods with geographical coordinates (latitude and longitude)
- A list of Foursquare venue categories, with the top-level category for each
- A list of venues for each neighborhood, with a category assigned to each venue

### Data Sources

The following data sources will be used to conduct this analysis:
- [List of Toronto Postal Codes](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) - includes Postcode, Borough, Neighborhood
- [Toronto Postal Codes with Latitude and Longitude](https://cocl.us/Geospatial_data)
- [Foursquare Venues / Explore API](https://developer.foursquare.com/docs/api/venues/explore) - to retrieve venues located within a certain distance of the center of the neighborhood
- [Foursquare Venue Category Hierarchy](https://developer.foursquare.com/docs/resources/categories) - the Foursquare category list / hierarchy will be retrieved in JSON format via the corresponding API

### Data Wrangling

The following steps will be conducted to get the data ready for analysis:
- Form a list of neighborhood names with latitude and longitude by joining, on postal code, the list of neighborhoods with the list of postal code coordinates
- Parse the list of categories returned by the Category API using a recursive routine to determine the top-level category for each Foursquare category
- Retrieve all of the venues within a certain radius of each neighborhood's latitude and longitude coordinates
- Iterate through the venues to assign the top-level category to each venue
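
The recursive category step and the venue labelling step above could be sketched roughly as follows. This is a minimal illustration, not the final implementation: it assumes each category in the retrieved JSON is a dictionary with a `name` and a nested `categories` list, and the function name `map_to_top_level`, the sample tree, and the sample venues are all hypothetical.

```python
def map_to_top_level(categories, root_name=None, mapping=None):
    """Recursively map every category name to the name of its top-level ancestor."""
    if mapping is None:
        mapping = {}
    for category in categories:
        # At depth 0 a category is its own top-level ancestor.
        root = root_name if root_name is not None else category["name"]
        mapping[category["name"]] = root
        # Descend into the subcategories, carrying the root name along.
        map_to_top_level(category.get("categories", []), root, mapping)
    return mapping


# Hypothetical, heavily trimmed sample of the category hierarchy.
sample_tree = [
    {"name": "Food", "categories": [
        {"name": "Asian Restaurant", "categories": [
            {"name": "Sushi Restaurant", "categories": []},
        ]},
        {"name": "Bakery", "categories": []},
    ]},
    {"name": "Nightlife Spot", "categories": [
        {"name": "Bar", "categories": [
            {"name": "Wine Bar", "categories": []},
        ]},
    ]},
]

top_level = map_to_top_level(sample_tree)

# Hypothetical venues as (neighborhood, venue name, venue category) tuples.
venues = [
    ("Harbourfront", "Sushi Place", "Sushi Restaurant"),
    ("Harbourfront", "Corner Wine Bar", "Wine Bar"),
]

# Attach the top-level category to each venue; unmapped categories are flagged.
labelled = [
    (hood, name, cat, top_level.get(cat, "Unknown"))
    for hood, name, cat in venues
]
```

Keying the mapping by category name keeps the sketch short. If category names turn out not to be unique across the hierarchy, keying by category identifier instead would be the safer choice.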

### Planned Data Analysis

Once the data has been collected, various assessments will be conducted to generate relative profiles of the neighborhoods, including the following:

- which neighborhoods have the highest / lowest density of venues
- which neighborhoods have the most / least of each high-level category
- which neighborhoods have the most / least variety within each category
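
As a rough sketch of how these three measures might be computed, the example below tallies venues per neighborhood using only the standard library. It assumes venues have already been labelled with a top-level category and that every neighborhood was searched with the same radius, so the raw venue count can stand in for density. The sample data and the helper `rank_by` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical labelled venues: (neighborhood, venue category, top-level category).
venues = [
    ("A", "Sushi Restaurant", "Food"),
    ("A", "Bakery", "Food"),
    ("A", "Wine Bar", "Nightlife Spot"),
    ("B", "Park", "Outdoors & Recreation"),
    ("B", "Bakery", "Food"),
    ("B", "Bakery", "Food"),
    ("B", "Trail", "Outdoors & Recreation"),
]

venue_counts = Counter()                          # total venues per neighborhood
category_counts = defaultdict(Counter)            # venues per top-level category
variety = defaultdict(lambda: defaultdict(set))   # distinct subcategories seen

for hood, category, top in venues:
    venue_counts[hood] += 1
    category_counts[hood][top] += 1
    variety[hood][top].add(category)


def rank_by(top):
    """Order neighborhoods from most to fewest venues in one top-level category."""
    return sorted(category_counts, key=lambda h: category_counts[h][top], reverse=True)


densest = venue_counts.most_common(1)[0][0]
food_variety = {hood: len(variety[hood]["Food"]) for hood in venue_counts}
```

Here `food_variety` counts distinct venue categories under "Food", which is one possible reading of "variety within each category"; other definitions would work equally well.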

These profiles will then be compared to the user's preferences to form a set of recommendations for each persona.
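
One simple way to make this comparison, shown here only as an assumption and not as the chosen method, is a weighted sum: express each neighborhood as the share of its venues in each top-level category, give each persona a weight per category, and rank neighborhoods by the resulting score. The persona names, weights, and venue counts below are invented for illustration.

```python
def normalize(counts):
    """Convert raw venue counts into shares that sum to 1."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def score(profile, weights):
    """Weighted sum of category shares; categories without a weight count as 0."""
    return sum(weights.get(cat, 0.0) * share for cat, share in profile.items())


def recommend(profiles, weights, top_n=3):
    """Return up to top_n neighborhoods, best score first."""
    ranked = sorted(profiles, key=lambda h: score(profiles[h], weights), reverse=True)
    return ranked[:top_n]


# Hypothetical venue counts per top-level category.
neighborhood_counts = {
    "A": {"Food": 10, "Nightlife Spot": 8, "Outdoors & Recreation": 2},
    "B": {"Food": 6, "Nightlife Spot": 1, "Outdoors & Recreation": 13},
}
profiles = {hood: normalize(counts) for hood, counts in neighborhood_counts.items()}

# Hypothetical persona weights; a negative weight penalizes a category.
personas = {
    "young_single": {"Nightlife Spot": 1.0, "Food": 0.5},
    "family": {"Outdoors & Recreation": 1.0, "Nightlife Spot": -0.5},
}

recommendations = {name: recommend(profiles, w) for name, w in personas.items()}
```

Using shares rather than raw counts keeps a venue-dense neighborhood from dominating every persona's list, at the cost of ignoring overall density. Whether that trade-off is appropriate depends on how the personas are ultimately defined.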