# The Big Move: Working out where to live in Toronto - Report

## 1. Introduction
For most of my life, I have lived in Melbourne, most recently in North Melbourne, a lively neighbourhood known for its many brunch spots and cafes and for its proximity to Melbourne's city centre. I am now planning to move to Toronto for work and am looking for similar neighbourhoods to live in.

The problem to be solved is finding hip neighbourhoods in Toronto, similar to North Melbourne, to live in. There is a sizeable community of Australians living in Toronto, so this report may be directly useful to others looking for similar neighbourhoods, or its approach may be repurposed to look for similarities between other neighbourhoods.

This kind of tool is therefore likely to be of value to people moving between the two cities. In my own case, it will directly help me draw up a shortlist of neighbourhoods, which I can then compare on factors other than venues that matter to me.

## 2. Data
The data required for this case is:

* A list of neighbourhoods in Toronto (scraped from Wikipedia)
* The geographical coordinates of these neighbourhoods (using the CSV file given in Module 3)
* Venue data within these neighbourhoods, including venue categories (from the Foursquare API)
* Distance of these neighbourhoods from the city centre of Toronto (using the Haversine formula & geographical coordinates)
* The geographical coordinates of North Melbourne (manually collected)
* Venue data within the neighbourhood of North Melbourne (from the Foursquare API)
* Distance of North Melbourne from the city centre of Melbourne (using the Haversine formula & geographical coordinates)
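The distance items above rely on the Haversine formula, which gives the great-circle distance between two latitude/longitude points on a sphere. A minimal Python sketch follows; the mean Earth radius of 6371 km is a common approximation and an assumption here, not necessarily the value used in the notebook.

```python
import math

# Mean Earth radius in km: a common approximation (assumed, not taken from the notebook)
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111.2 km
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # → 111.2
```

For each neighbourhood, the distance to the city centre would then be `haversine_km(nb_lat, nb_lon, centre_lat, centre_lon)`, with the centre taken as the station defined below.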

From the Foursquare API data, the following variables were used:
* Neighborhood
* Neighborhood Latitude
* Neighborhood Longitude
* Venue (the name of the venue, e.g. the name of a store or restaurant)
* Venue Latitude
* Venue Longitude
* Venue Category
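Each venue returned by the API becomes one row with the columns listed above. The sketch below flattens a single response into such rows. It assumes the response shape of Foursquare's v2 `venues/explore` endpoint (`response` -> `groups[0]` -> `items[]` -> `venue`); the HTTP request itself (credentials, `radius`, `limit`) is omitted, and the helper name `venues_to_rows` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def venues_to_rows(neighbourhood: str, nb_lat: float, nb_lng: float,
                   response_json: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one explore response into one row per venue (hypothetical helper)."""
    rows = []
    # Assumed v2 response shape: response -> groups[0] -> items[] -> venue
    for item in response_json["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use the first listed category; None if the venue has no category
        category: Optional[str] = categories[0]["name"] if categories else None
        rows.append({
            "Neighborhood": neighbourhood,
            "Neighborhood Latitude": nb_lat,
            "Neighborhood Longitude": nb_lng,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": category,
        })
    return rows
```

The resulting list of dicts can be passed directly to `pandas.DataFrame` to build the venue table.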

For the purpose of this exercise, I have defined the city centre as the city's busiest transport station: Union Station in Toronto and Flinders Street Station in Melbourne.

## 3. Methodology
### 3.1 Exploratory Data Analysis
I collected the above data, retrieving 1520 venue records: 1507 from the 32 Toronto neighbourhoods within 7.2 km of the city centre and 13 from North Melbourne. The 7.2 km cut-off is double the distance from North Melbourne to Melbourne's city centre, as I would like to live in a neighbourhood a comparable distance from the city centre. Venues were requested within a 500 metre radius of each neighbourhood's coordinates, up to a limit of 100 venues per neighbourhood, and I then counted how many venues were returned for each neighbourhood.
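The distance cut-off and the per-neighbourhood counts can be expressed as a few small helpers. This is a sketch under assumptions: distances are precomputed (for example with the Haversine formula), each venue is a row dict with a `Neighborhood` key as in the column list above, and the helper names are hypothetical. The last helper flags neighbourhoods whose count reached the 100-venue limit, since their true venue count may be higher.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def within_cutoff(distances_km: Mapping[str, float], reference_km: float,
                  factor: float = 2.0) -> Dict[str, float]:
    """Keep neighbourhoods no farther than factor * reference_km from the city centre."""
    cutoff = factor * reference_km
    return {name: d for name, d in distances_km.items() if d <= cutoff}


def venue_counts(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Number of venue rows returned for each neighbourhood."""
    return Counter(row["Neighborhood"] for row in rows)


def at_limit(counts: Mapping[str, int], limit: int = 100) -> List[str]:
    """Neighbourhoods whose count hit the query limit, so may be truncated."""
    return sorted(name for name, c in counts.items() if c >= limit)
```

Since 7.2 km is double North Melbourne's distance from the city centre, `within_cutoff(distances, 3.6)` reproduces that cut-off.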

I also identified the top 5 and top 10 most common venue categories in each neighbourhood based on the Foursquare API data.
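One way to rank venue categories is by their share of each neighbourhood's venues, which is equivalent to one-hot encoding the categories and averaging them per neighbourhood. A minimal standard-library sketch, assuming row dicts with `Neighborhood` and `Venue Category` keys; `top_categories` is a hypothetical name, and ties are broken by first appearance.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def top_categories(rows: Iterable[Mapping[str, str]],
                   n: int = 10) -> Dict[str, List[Tuple[str, float]]]:
    """Map each neighbourhood to its n most common categories with their frequencies."""
    per_neighbourhood: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        per_neighbourhood[row["Neighborhood"]][row["Venue Category"]] += 1
    result = {}
    for name, counts in per_neighbourhood.items():
        total = sum(counts.values())
        # most_common orders by count; equal counts keep first-seen order
        result[name] = [(cat, c / total) for cat, c in counts.most_common(n)]
    return result
```

Calling `top_categories(rows, n=5)` gives the top 5 lists, and the default gives the top 10.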

### 3.2 Clustering
I performed k-means cluster analysis to segment the neighbourhoods (including North Melbourne) by the similarity of their venues. As there is little prior information on how many clusters to choose, I ran the analysis for 1 to 10 clusters, with 10 initialisations for each, and used a plot of the number of clusters against distortion to locate the elbow point and choose an optimal number of clusters. I then identified the cluster containing North Melbourne; the Toronto neighbourhoods in that cluster are the results of this analysis.
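The elbow procedure can be sketched in plain Python so that it runs without extra dependencies. This is an illustration, not the notebook's code: it implements Lloyd's k-means with `n_init` random restarts, keeps the run with the lowest inertia (the sum of squared distances to the nearest centroid), and treats inertia as the distortion measure, which is an assumption since some tutorials average the distances instead. In practice, scikit-learn's `KMeans(n_clusters=k, n_init=10, random_state=0)` and its `inertia_` attribute do the same job. The points are assumed to be per-neighbourhood feature vectors such as category frequencies.

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Point, centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))


def _lloyd(points: List[Point], k: int, rng: random.Random,
           max_iter: int = 100) -> Tuple[List[int], float]:
    """One k-means run from random initial centroids; returns (labels, inertia)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def kmeans(points: List[Point], k: int, n_init: int = 10,
           seed: int = 0) -> Tuple[List[int], float]:
    """Best of n_init runs (lowest inertia); a fixed seed makes results reproducible."""
    rng = random.Random(seed)
    runs = [_lloyd(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])


def elbow_curve(points: List[Point], k_values: Iterable[int] = range(1, 11),
                n_init: int = 10, seed: int = 0) -> Dict[int, float]:
    """Inertia for each k, to be plotted against k to find the elbow."""
    return {k: kmeans(points, k, n_init, seed)[1] for k in k_values if k <= len(points)}
```

Plotting the values of `elbow_curve(points)` against k gives the distortion curve; the elbow is where adding clusters stops reducing inertia sharply.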

## 4. Results - see week-5-assignment-notebook.ipynb
### 4.1 Exploratory Data Analysis
The number of venues returned within 500 metres of each neighbourhood is shown in the table printed in the notebook. Counts ranged from 2 venues (Moore Park, Summerhill East) to 100 venues, which is the query limit, so the counts for the busiest neighbourhoods are capped.

The top 5 and a sample of the top 10 most common venue categories for each neighbourhood are also printed in the notebook.

### 4.2 Clustering
For reproducibility, the clustering was run with a fixed random state of 0. The elbow plot indicated 4 clusters as optimal. North Melbourne fell into the largest cluster, together with 28 of the 32 Toronto neighbourhoods; the remaining clusters contained 2, 1 and 1 neighbourhoods respectively. The 28 neighbourhoods clustered with North Melbourne are:
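Given the cluster label of every neighbourhood, picking out North Melbourne's cluster and the cluster sizes takes only a few lines. A sketch, assuming `names` and `labels` are parallel lists (one entry per neighbourhood, North Melbourne included) from the final 4-cluster run; the function names are hypothetical. For the results reported here, `cluster_sizes` would return `[28, 2, 1, 1]`.

```python
from collections import Counter
from typing import List, Sequence

TARGET = "North Melbourne"


def similar_to(names: Sequence[str], labels: Sequence[int], target: str = TARGET) -> List[str]:
    """Neighbourhoods sharing the target's cluster label, excluding the target itself."""
    target_label = labels[list(names).index(target)]
    return [n for n, lab in zip(names, labels) if lab == target_label and n != target]


def cluster_sizes(names: Sequence[str], labels: Sequence[int], exclude: str = TARGET) -> List[int]:
    """Number of Toronto neighbourhoods in each cluster, largest first."""
    counts = Counter(lab for n, lab in zip(names, labels) if n != exclude)
    return sorted(counts.values(), reverse=True)
```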

* The Danforth West, Riverdale
* India Bazaar, The Beaches West
* Studio District
* Summerhill West, Rathnelly, South Hill, Forest...
* St. James Town, Cabbagetown
* Church and Wellesley
* Regent Park, Harbourfront
* Garden District, Ryerson
* St. James Town
* Berczy Park
* Central Bay Street
* Richmond, Adelaide, King
* Harbourfront East, Union Station, Toronto Islands
* Toronto Dominion Centre, Design Exchange
* Commerce Court, Victoria Hotel
* The Annex, North Midtown, Yorkville
* University of Toronto, Harbord
* Kensington Market, Chinatown, Grange Park
* CN Tower, King and Spadina, Railway Lands, Har...
* Stn A PO Boxes
* First Canadian Place, Underground city
* Christie
* Dufferin, Dovercourt Village
* Little Portugal, Trinity
* Brockton, Parkdale Village, Exhibition Place
* Parkdale, Roncesvalles
* Queen's Park, Ontario Provincial Government
* Business reply mail Processing Centre, South C...

A map of these neighbourhoods is printed in the notebook.

## 5. Discussion
The key finding is a set of 28 Toronto neighbourhoods, all within 7.2 km of the city centre, that are considered similar to North Melbourne based purely on venue categories. From this shortlist, it should be possible to find potential places to live when moving to Toronto.

However, a key limitation needs to be discussed. That 28 of the 32 neighbourhoods meeting the distance criterion fell into North Melbourne's cluster suggests that a large proportion of central Toronto shares significant similarities with North Melbourne, that the top 10 venue categories are an insensitive criterion for assessing similarity, or both. An alternative explanation is that the low number of venues in some neighbourhoods limited the strength of the clustering analysis. Future analysis could include other variables such as house prices, cost of living, percentage of parkland and ease of public transport.

The key recommendation from this analysis is that I should initially look at properties in these 28 neighbourhoods, but that further analysis using the above variables could help to refine the measure of similarity. Distance from the city centre could also have been included in the cluster analysis, either normalised or as a categorical variable, so that neighbourhoods at a similar distance from the city centre are weighted towards the same cluster.
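Including distance in the clustering could look like the sketch below, under stated assumptions: `add_distance_feature` appends a min-max normalised distance scaled by a hypothetical `weight` that controls how strongly distance pulls neighbourhoods together, and `distance_band` is the categorical alternative, one-hot encoding hypothetical band edges in kilometres. Neither was used in the analysis reported above.

```python
from typing import List, Sequence


def add_distance_feature(features: Sequence[Sequence[float]], distances_km: Sequence[float],
                         weight: float = 1.0) -> List[List[float]]:
    """Append weight * min-max normalised distance to each neighbourhood's feature vector."""
    lo, hi = min(distances_km), max(distances_km)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all distances are equal
    return [list(f) + [weight * (d - lo) / span] for f, d in zip(features, distances_km)]


def distance_band(d_km: float, edges: Sequence[float] = (2.0, 4.0, 6.0)) -> List[float]:
    """One-hot vector of the distance band d_km falls in (hypothetical band edges, km)."""
    band = sum(d_km > e for e in edges)
    return [1.0 if i == band else 0.0 for i in range(len(edges) + 1)]
```

Either output can be concatenated onto the venue-category features before running k-means.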

## 6. Conclusion
The 28 neighbourhoods above should be considered similar to North Melbourne in terms of venue categories, among Toronto neighbourhoods within a comparable distance of the city centre, and initial property searches when moving to Toronto should be based on these findings. However, further analysis is warranted if data on other parameters become available.