<img style="float: right;" src="https://dm2302files.storage.live.com/y4pP_1KkMzOiZ2WHTAowvxnvX_Ik2vc_h0cV4GsdlOKMTa9kvNnD9TA8E-ey1are6dvQ06fA5GH7ImDw1Ke05RT44Z9QhfqPAKk2PSC9NbGVhtgRPdZb2SU-ln8giolTSGAnwnHTyinah5BPkj0vDgqHxo2H6xEpEXwIhltZXQsLD8/Logo.svg.jpg?psid=1&ck=0&ex=720&width=1360&height=2048" width="300">

# The Battle of Neighborhoods

<span style="color:gray">*IBM Applied Data Science Capstone - Week 5*</span>

## 1. Introduction

### 1.1 Background

Evaluating houses is a deeply personal and complex process, influenced by diverse factors ranging from physical characteristics and local amenities to political and economic conditions. It's important to analyze the situation, research the options, and gather all the necessary information before making an important decision about moving.

Besides the intrinsic features of a property, such as the number of bedrooms and bathrooms, square footage, price and age, a factor that can affect how buyers evaluate a house is its proximity to the things that matter most to them. These may include a workplace, views, parks, schools, community services, residences of relatives and so on.

The same rings true for all who seek a new place around the world in pursuit of their dreams or in search of a better life.

### 1.2 Problem

Both Toronto and New York have large and diverse populations. Every year hundreds of thousands of immigrants, businesspeople and professionals visit, migrate to or settle in these cities for work, livelihood and tourism.

One of my clients lives in Yorkville, Manhattan, New York and she loves her neighborhood. She received a great job offer from Toronto and decided to move there in 3 weeks to take up the new opportunity. She wants to find a neighborhood in Toronto that offers amenities similar to those she has nearby in Yorkville.

Therefore, the problem is to find neighborhoods that provide amenities similar to those of her current neighborhood. In this project, Python's data analysis and geospatial analysis packages were used to analyze the whole spectrum of available listings in a market, evaluate and rank properties based on venue category frequency and arrive at a shortlist of neighborhoods in Toronto similar to Yorkville.

### 1.3 Target Audience

The target audience for this project is anyone who is searching for a new property in a neighborhood that provides amenities similar to those of their current neighborhood.

## 2. Data

Housing data for the city of New York was collected from the New York University Libraries <span style="color:blue">[1]</span>. The New York dataset has a total of 5 boroughs and 306 neighborhoods, as well as the latitude and longitude coordinates of each neighborhood.

Housing data for the city of Toronto was scraped from a Wikipedia page <span style="color:blue">[2]</span> that contains a list of postal codes, boroughs and neighborhoods, with a total of 10 boroughs and 217 neighborhoods.

## 3. Methodology

Firstly, the boroughs and neighborhoods of New York were collected from the New York University Libraries <span style="color:blue">[1]</span>, and the postal codes, boroughs and neighborhoods of Toronto were scraped from Wikipedia <span style="color:blue">[2]</span> using the Pandas package.

In order to use the Foursquare API, the geographical coordinates of the Toronto neighborhoods, in the form of latitude and longitude, were needed. To obtain them, the Geocoder package was used to convert each address into latitude and longitude.

Next, the geographical coordinates were passed to the Foursquare API to get the top 100 venues within a radius of 750 meters. Foursquare returns the venue data in JSON format, from which the venue name, category, latitude and longitude were extracted. Then, each neighborhood was analyzed by grouping the rows by neighborhood and calculating the mean of the frequency of each venue category.
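
As a minimal sketch of the frequency step, the toy table below (made-up rows, not Foursquare output) is one-hot encoded and averaged per neighborhood. The names `venues`, `onehot` and `frequencies` are illustrative only.

```python
import pandas as pd

# Toy venue table (made-up rows, not Foursquare output)
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue_Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Gym', 'Park', 'Bank'],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues['Venue_Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of each indicator column is the share of venues in that category
frequencies = onehot.groupby('Neighborhood').mean()
print(frequencies)
```

Because every venue contributes exactly one indicator, the frequencies in each row sum to 1, which makes neighborhoods with different venue counts comparable.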


Lastly, a shortlist of Toronto neighborhoods was defined by filtering the data with the top 10 most frequent venue categories of Yorkville.

#### Import required libraries

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

from tqdm.notebook import tqdm # progress bar library

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

import warnings # warnings library
warnings.filterwarnings("ignore")

print('Libraries imported.')

Libraries imported.


#### Create New York DataFrame

In [3]:
# Get New York Dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

newyork_data = newyork_data['features']

# Define the DataFrame columns
columns_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# Collect one record per neighborhood, then build the DataFrame in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in newyork_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

ny_df = pd.DataFrame(rows, columns=columns_names)

# Drop duplicated neighborhoods
ny_df.drop_duplicates(subset=['Neighborhood'], inplace=True)

ny_df.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
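
As an alternative sketch, `pd.json_normalize` can flatten the nested records directly. The single feature below is a stand-in whose structure is assumed from the keys the notebook reads; its values reuse the Wakefield row shown above, and `ny_example` is a hypothetical name.

```python
import pandas as pd

# Stand-in for newyork_data['features'] (structure assumed from the keys used above)
features = [
    {
        'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
        'geometry': {'coordinates': [-73.847201, 40.894705]},  # [lon, lat]
    },
]

# Nested dicts become dotted column names; the coordinate list stays in one column
flat = pd.json_normalize(features)

# Split the [lon, lat] lists into two numeric columns
coords = pd.DataFrame(flat['geometry.coordinates'].tolist(),
                      columns=['Longitude', 'Latitude'], index=flat.index)

ny_example = pd.concat([flat, coords], axis=1).rename(
    columns={'properties.borough': 'Borough', 'properties.name': 'Neighborhood'}
)[['Borough', 'Neighborhood', 'Latitude', 'Longitude']]
print(ny_example)
```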


#### Create Toronto DataFrame

In [4]:
# Get the tables from the URL and transform the first one into a DataFrame
tr_df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0] # index 0 is the table of interest

# Rename columns
columns_names = ["PostalCode", "Borough", "Neighborhood"]
tr_df.columns = columns_names

# Drop cells with a borough that is "Not assigned"
tr_df = tr_df[tr_df.Borough != "Not assigned"].reset_index(drop=True)

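# Set the neighborhood to the borough name where the neighborhood is "Not assigned"
# (rows yielded by iterrows are copies, so assignments to them do not reach tr_df)
not_assigned = tr_df["Neighborhood"] == "Not assigned"
tr_df.loc[not_assigned, "Neighborhood"] = tr_df.loc[not_assigned, "Borough"]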

# Convert Neighborhood strings to list
tr_df['Neighborhood'] = tr_df.Neighborhood.str.split(',', expand=False)

# Explode Neighborhood list to single rows and trim the whitespace left after each comma
tr_df = tr_df.explode('Neighborhood').reset_index(drop=True)
tr_df['Neighborhood'] = tr_df['Neighborhood'].str.strip()

tr_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park
3,M5A,Downtown Toronto,Harbourfront
4,M6A,North York,Lawrence Manor
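
The effect of the split and explode steps is easier to see on a small table. The sketch below uses two rows: the first mirrors the M5A entry above, the second is a hypothetical placeholder row added only to show the "Not assigned" fallback.

```python
import pandas as pd

postal = pd.DataFrame({
    'PostalCode': ['M5A', 'M0X'],                        # 'M0X' is a made-up code
    'Borough': ['Downtown Toronto', 'Example Borough'],  # second value is made up
    'Neighborhood': ['Regent Park, Harbourfront', 'Not assigned'],
})

# Fall back to the borough name where no neighborhood is assigned
mask = postal['Neighborhood'] == 'Not assigned'
postal.loc[mask, 'Neighborhood'] = postal.loc[mask, 'Borough']

# Split on commas, emit one row per neighborhood, and trim stray whitespace
postal['Neighborhood'] = postal['Neighborhood'].str.split(',')
postal = postal.explode('Neighborhood').reset_index(drop=True)
postal['Neighborhood'] = postal['Neighborhood'].str.strip()
print(postal)
```

Each exploded row keeps the postal code and borough of the row it came from, which is why M5A appears twice in the output above.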


In [5]:
# Define a function to get coordinates
def get_latlng(neighborhood, state='Toronto', country='Canada', max_tries=5):
    # Query the ArcGIS geocoder, retrying a bounded number of times
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, {}, {}'.format(neighborhood, state, country))
        if g.latlng is not None:
            return g.latlng
    # Return missing coordinates instead of looping forever
    return [None, None]

In [6]:
# Call the function to get the coordinates, store in a new list using list comprehension
coords = [get_latlng(neighborhood) for neighborhood in tqdm(tr_df["Neighborhood"].tolist(), 'Getting latitudes and longitudes')]

# Create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

# Merge the coordinates into the original dataframe
tr_df['Latitude'] = df_coords['Latitude']
tr_df['Longitude'] = df_coords['Longitude']

# Check the neighborhoods and the coordinates
print(tr_df.shape)

tr_df.head()


(217, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.686575,-79.409993
1,M4A,North York,Victoria Village,43.73154,-79.31428
2,M5A,Downtown Toronto,Regent Park,43.66069,-79.36031
3,M5A,Downtown Toronto,Harbourfront,43.63951,-79.38316
4,M6A,North York,Lawrence Manor,43.72294,-79.43116


#### Define Foursquare Credentials and Version

In [7]:
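import os # read the credentials from the environment instead of hard-coding them

# The environment variable names below are illustrative; use whatever names you have set
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret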
VERSION = '20180605' # Foursquare API version

#### Get the top 100 venues that are within a radius of 750 meters

In [8]:
def getNearbyVenues(neighborhoods, latitudes, longitudes, radius=750, LIMIT=100):
    
    venues_list=[]
    for neighborhood, lat, lng in tqdm(zip(neighborhoods, latitudes, longitudes), 'Getting data'):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
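        # make the GET request; skip this neighborhood if the request or the parsing fails,
        # so that results from a previous iteration are never reused
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
        except (requests.RequestException, KeyError, IndexError, ValueError):
            continue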
        # return only relevant information for each nearby venue
        venues_list.append([(
            neighborhood,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                            'Neighborhood_Latitude', 
                            'Neighborhood_Longitude', 
                            'Venue', 
                            'Venue_Latitude', 
                            'Venue_Longitude', 
                            'Venue_Category']
                  
    
    return(nearby_venues)
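
As a hedged alternative to formatting the query string by hand, the same parameters can be handed to `requests` as a dictionary, which takes care of URL encoding. The sketch below only prepares the request and does not send it; the credential values are placeholders, and the coordinates are the Yorkville values from `ny_df`.

```python
import requests

# Placeholder credentials (hypothetical values, replace with your own)
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20180605',
    'll': '40.77593,-73.947118',  # Yorkville latitude,longitude
    'radius': 750,
    'limit': 100,
}

# Prepare the request without sending it, to inspect the final URL
prepared = requests.Request(
    'GET', 'https://api.foursquare.com/v2/venues/explore', params=params
).prepare()
print(prepared.url)
```

The same dictionary can be passed to `requests.get(url, params=params)` when the call is actually made.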

In [10]:
neighborhood = 'Yorkville'

# Get Yorkville venues
ny_venues = getNearbyVenues(neighborhoods=ny_df.loc[ny_df['Neighborhood'] == neighborhood, 'Neighborhood'],
                                 latitudes=ny_df.loc[ny_df['Neighborhood'] == neighborhood, 'Latitude'],
                                 longitudes=ny_df.loc[ny_df['Neighborhood'] == neighborhood, 'Longitude'],
                                 LIMIT=100
                                  )
ny_venues.head()


Unnamed: 0,Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category
0,Yorkville,40.77593,-73.947118,Bagel Bob's on York,40.776459,-73.946972,Bagel Shop
1,Yorkville,40.77593,-73.947118,Peng's Noodle Folk,40.777258,-73.94911,Asian Restaurant
2,Yorkville,40.77593,-73.947118,Carl Schurz Park,40.775118,-73.943763,Park
3,Yorkville,40.77593,-73.947118,Park East Wines & Spirits,40.776715,-73.946663,Liquor Store
4,Yorkville,40.77593,-73.947118,Shorty's,40.777957,-73.948561,Sandwich Place
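
To sanity-check that a returned venue lies inside the search radius, the great-circle distance can be computed with the haversine formula. This is a sketch, not part of the original analysis: `haversine_m` is a hypothetical helper, the Earth radius is the usual mean-radius approximation, and the coordinates are the Yorkville center and Carl Schurz Park rows from the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Yorkville center and Carl Schurz Park, taken from the table above
distance = haversine_m(40.77593, -73.947118, 40.775118, -73.943763)
within_radius = distance <= 750
print(round(distance), within_radius)
```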


In [11]:
# Get Toronto venues
toronto_venues = getNearbyVenues(neighborhoods=tr_df['Neighborhood'],
                                 latitudes=tr_df['Latitude'],
                                 longitudes=tr_df['Longitude'],
                                 LIMIT=100
                                  )
toronto_venues.head()


Unnamed: 0,Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category
0,Parkwoods,43.686575,-79.409993,Aroma Espresso Bar,43.68817,-79.412599,Café
1,Parkwoods,43.686575,-79.409993,Sir Winston Churchill Park,43.683732,-79.409881,Park
2,Parkwoods,43.686575,-79.409993,Mashu Mashu Mediterranean Grill,43.688297,-79.412563,Middle Eastern Restaurant
3,Parkwoods,43.686575,-79.409993,What A Bagel,43.688079,-79.414544,Bagel Shop
4,Parkwoods,43.686575,-79.409993,Loblaws,43.684188,-79.415485,Grocery Store


#### One Hot Encoded DataFrame 

In [12]:
# One hot encoding
ny_onehot = pd.get_dummies(ny_venues[['Venue_Category']], prefix="", prefix_sep="")
toronto_onehot = pd.get_dummies(toronto_venues[['Venue_Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
ny_onehot['Neighborhood'] = ny_venues['Neighborhood']
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# Move neighborhood column to the first column
ny_fixed_columns = [ny_onehot.columns[-1]] + list(ny_onehot.columns[:-1])
ny_onehot = ny_onehot[ny_fixed_columns]

tr_fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[tr_fixed_columns]

# Group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
ny_grouped = ny_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()

#### Create a new DataFrame with sorted values

In [41]:
# Function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Create the new DataFrame and display the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

In [42]:
# Create a new DataFrame
ny_neighborhood_venues_sorted = pd.DataFrame(columns=columns)
ny_neighborhood_venues_sorted['Neighborhood'] = ny_grouped['Neighborhood']

for ind in np.arange(ny_grouped.shape[0]):
    ny_neighborhood_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ny_grouped.iloc[ind, :], num_top_venues)

ny_neighborhood_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Yorkville,Coffee Shop,Gym,Ice Cream Shop,Italian Restaurant,Deli / Bodega,Pizza Place,Bagel Shop,Wine Shop,Bar,Mexican Restaurant


In [43]:
# Create a new DataFrame
tr_neighborhood_venues_sorted = pd.DataFrame(columns=columns)
tr_neighborhood_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    tr_neighborhood_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

tr_neighborhood_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Yoga Studio,Spa,Coffee Shop,Pizza Place,French Restaurant,Dessert Shop,Gym,Optical Shop,Restaurant,Park
1,Agincourt North,Bank,Bakery,Pizza Place,Park,Frozen Yogurt Shop,Juice Bar,Sandwich Place,Discount Store,Movie Theater,Restaurant
2,Albion Gardens,Filipino Restaurant,Gym / Fitness Center,Bank,Gas Station,Camera Store,Paper / Office Supplies Store,Ice Cream Shop,Supermarket,Food & Drink Shop,Beer Store
3,Bathurst Quay,Coffee Shop,Gym,Park,Café,Grocery Store,Light Rail Station,Bank,Track,Historic Site,Harbor / Marina
4,Beaumond Heights,Indian Restaurant,Caribbean Restaurant,Supermarket,Pizza Place,Thai Restaurant,Bank,Coffee Shop,Ice Cream Shop,Pharmacy,American Restaurant
5,Bloordale Gardens,Convenience Store,Bank,Discount Store,Fast Food Restaurant,Coffee Shop,Pizza Place,Donut Shop,Sandwich Place,Deli / Bodega,Drugstore
6,Broadview North (Old East York),Pharmacy,Intersection,Greek Restaurant,Theater,Chinese Restaurant,Bank,Sandwich Place,Business Service,Bus Stop,Frame Store
7,Cabbagetown,Coffee Shop,Restaurant,Thai Restaurant,Diner,Café,Park,Grocery Store,Pharmacy,Gastropub,Japanese Restaurant
8,Chinatown,Coffee Shop,Vegetarian / Vegan Restaurant,Mexican Restaurant,Café,Yoga Studio,Gym,Bar,Dessert Shop,Beer Bar,Art Gallery
9,Clairlea,Coffee Shop,Indian Restaurant,Chinese Restaurant,Rental Car Location,Sandwich Place,Discount Store,Shopping Mall,Fast Food Restaurant,Beer Store,Storage Facility
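
For reference, the same top-N selection can be sketched with `Series.nlargest`. The table below contains made-up frequencies, and `top_categories` is a hypothetical helper rather than the function used in this notebook.

```python
import pandas as pd

# Toy frequency table (made-up values) shaped like the grouped tables above
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Coffee Shop': [0.50, 0.10],
    'Park': [0.30, 0.60],
    'Gym': [0.20, 0.30],
})

def top_categories(row, n):
    # Skip the Neighborhood label, then keep the n largest frequencies
    freqs = row.drop('Neighborhood').astype(float)
    return list(freqs.nlargest(n).index)

top2 = {row['Neighborhood']: top_categories(row, 2) for _, row in grouped.iterrows()}
print(top2)
```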


## 4. Results

The ranking results show that Harbourfront, Lawrence Manor, Parkwoods, Regent Park and Victoria Village are the neighborhoods most similar to Yorkville, based on the most frequent venue categories.

The shortlisted neighborhoods are visualized in the map below.

#### Get top 10 Yorkville venues

In [14]:
# Create a ny_grouped copy
ny_grouped_top10 = ny_grouped.copy()

# Drop Neighborhood column
ny_grouped_top10.drop(labels='Neighborhood', axis=1, inplace=True)

# Sort columns by frequency
ny_grouped_top10 = ny_grouped_top10.sort_values(by=0, axis=1, ascending=False).iloc[:,:10]

# Reinsert Neighborhood column from the grouped table, whose row index matches
ny_grouped_top10['Neighborhood'] = ny_grouped['Neighborhood']

# Reorder columns to put Neighborhood in the first column
ny_grouped_top10_columns = [ny_grouped_top10.columns[-1]] + list(ny_grouped_top10.columns[:-1])
ny_grouped_top10_n = ny_grouped_top10[ny_grouped_top10_columns]
ny_grouped_top10_n

Unnamed: 0,Neighborhood,Coffee Shop,Ice Cream Shop,Pizza Place,Gym,Italian Restaurant,Deli / Bodega,Bagel Shop,Wine Shop,Bar,Mexican Restaurant
0,Yorkville,0.07,0.04,0.04,0.04,0.04,0.04,0.03,0.03,0.03,0.03


#### Get top 5 Toronto Neighborhoods

In [15]:
ny_grouped_top10_columns = ny_grouped_top10_n.drop(labels='Neighborhood', axis=1)

# Create a toronto_grouped copy
toronto_grouped_intersection = toronto_grouped.copy()

# Drop Neighborhood column
toronto_grouped_intersection.drop(labels='Neighborhood', axis=1, inplace=True)

# Filter the same columns as the Top 10
toronto_grouped_intersection = toronto_grouped_intersection[ny_grouped_top10_columns.columns]

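# Reinsert Neighborhood column. Assigning a Series aligns on the row index,
# so each row receives the label found at the same index in toronto_venues
toronto_grouped_intersection['Neighborhood'] = toronto_venues['Neighborhood']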

# Reorder columns to put Neighborhood in the first column
toronto_grouped_intersection_columns = [toronto_grouped_intersection.columns[-1]] + list(toronto_grouped_intersection.columns[:-1])
toronto_grouped_intersection = toronto_grouped_intersection[toronto_grouped_intersection_columns]

# Group column by Neighborhood
toronto_grouped_top = toronto_grouped_intersection.groupby('Neighborhood').mean().reset_index()
toronto_grouped_top

Unnamed: 0,Neighborhood,Coffee Shop,Ice Cream Shop,Pizza Place,Gym,Italian Restaurant,Deli / Bodega,Bagel Shop,Wine Shop,Bar,Mexican Restaurant
0,Harbourfront,0.063075,0.005779,0.030619,0.009808,0.013503,0.009807,0.001723,0.000688,0.007991,0.010083
1,Lawrence Manor,0.065583,0.015518,0.031787,0.004697,0.019946,0.002273,0.0,0.0,0.007108,0.009367
2,Parkwoods,0.070725,0.018485,0.030372,0.015229,0.008921,0.006578,0.0,0.0,0.007644,0.005332
3,Regent Park,0.072782,0.007766,0.064346,0.008766,0.016415,0.003538,0.002019,0.0,0.007092,0.004394
4,Victoria Village,0.054502,0.008497,0.06986,0.0,0.008497,0.011364,0.0,0.0,0.012529,0.0
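
One possible way to turn these frequency tables into an explicit ranking is to measure how far each Toronto row is from the Yorkville row. The sketch below uses Euclidean distance, which is an assumption made here for illustration and not necessarily the criterion used in this study; the numbers are copied from the two tables above.

```python
import numpy as np
import pandas as pd

categories = ['Coffee Shop', 'Ice Cream Shop', 'Pizza Place', 'Gym', 'Italian Restaurant',
              'Deli / Bodega', 'Bagel Shop', 'Wine Shop', 'Bar', 'Mexican Restaurant']

# Yorkville frequencies, copied from the top-10 table
yorkville = pd.Series([0.07, 0.04, 0.04, 0.04, 0.04, 0.04, 0.03, 0.03, 0.03, 0.03],
                      index=categories)

# Toronto frequencies, copied from the table above
toronto = pd.DataFrame(
    [[0.063075, 0.005779, 0.030619, 0.009808, 0.013503, 0.009807, 0.001723, 0.000688, 0.007991, 0.010083],
     [0.065583, 0.015518, 0.031787, 0.004697, 0.019946, 0.002273, 0.0, 0.0, 0.007108, 0.009367],
     [0.070725, 0.018485, 0.030372, 0.015229, 0.008921, 0.006578, 0.0, 0.0, 0.007644, 0.005332],
     [0.072782, 0.007766, 0.064346, 0.008766, 0.016415, 0.003538, 0.002019, 0.0, 0.007092, 0.004394],
     [0.054502, 0.008497, 0.06986, 0.0, 0.008497, 0.011364, 0.0, 0.0, 0.012529, 0.0]],
    index=['Harbourfront', 'Lawrence Manor', 'Parkwoods', 'Regent Park', 'Victoria Village'],
    columns=categories,
)

# Euclidean distance between each Toronto row and the Yorkville vector;
# a smaller distance means a more similar venue profile
distance = np.sqrt(((toronto - yorkville) ** 2).sum(axis=1))
ranking = distance.sort_values()
print(ranking)
```

Other measures, such as cosine similarity, would weight the categories differently and may order the neighborhoods differently.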


In [28]:
toronto_merged = toronto_grouped_top.copy()

# join toronto_grouped_top with tr_df to add postal code, borough and latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(tr_df.set_index('Neighborhood'), on='Neighborhood')
toronto_merged_columns = list(toronto_merged.columns[11:13]) + [toronto_merged.columns[0]] + list(toronto_merged.columns[13:15]) +  list(toronto_merged.columns[1:10])
toronto_merged = toronto_merged[toronto_merged_columns]

toronto_merged # check the last columns! 

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Coffee Shop,Ice Cream Shop,Pizza Place,Gym,Italian Restaurant,Deli / Bodega,Bagel Shop,Wine Shop,Bar
0,M5A,Downtown Toronto,Harbourfront,43.63951,-79.38316,0.063075,0.005779,0.030619,0.009808,0.013503,0.009807,0.001723,0.000688,0.007991
1,M6A,North York,Lawrence Manor,43.72294,-79.43116,0.065583,0.015518,0.031787,0.004697,0.019946,0.002273,0.0,0.0,0.007108
2,M3A,North York,Parkwoods,43.686575,-79.409993,0.070725,0.018485,0.030372,0.015229,0.008921,0.006578,0.0,0.0,0.007644
3,M5A,Downtown Toronto,Regent Park,43.66069,-79.36031,0.072782,0.007766,0.064346,0.008766,0.016415,0.003538,0.002019,0.0,0.007092
4,M4A,North York,Victoria Village,43.73154,-79.31428,0.054502,0.008497,0.06986,0.0,0.008497,0.011364,0.0,0.0,0.012529


#### Get coordinates of Toronto

In [16]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### Create a map to visualize the top 5 neighborhoods

In [29]:
# Create map
top_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to the map
markers_colors = []
for lat, lon, poi in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label).add_to(top_map)
       
top_map

## 5. Conclusion

In this case study, the input data set was collected from different sources and spatially enriched with information about access to different venues. It demonstrates how data science can be applied to one aspect of the real estate industry. Buying a home is a personal process, but many decisions are heavily influenced by location. As shown in this study, Python libraries such as Pandas can be used for data preparation and statistical analysis, Folium for map visualization, and services such as the Foursquare API for venue data. The methods adopted in this study can be applied to any other real estate market to build other recommendation engines.

## References

[1] https://geo.nyu.edu/catalog/nyu_2451_34572

[2] https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

---

<h4>Author:  <a href="https://br.linkedin.com/in/henrique-mand">Henrique Mandt</a></h4>
<p><a href="https://br.linkedin.com/in/henrique-mand">Henrique Mandt</a> is a Civil Engineer and Consultant with a track record of developing solutions that substantially increase operational efficiency, mitigate risks and maximize benefits for teams, investors and customers. He is a data science enthusiast with an interest in data mining, machine learning and spatial statistical modelling.</p>

---