<h1>Capstone Project - The Battle of Neighborhoods</h1>

<h2>EASE RELOCATION BETWEEN PLACES WITH ML</h2>

<h3>INTRODUCTION</h3>
Every year thousands of people relocate to new cities, and finding the right place to live is not an easy task.
Each area/zipcode has its own characteristics, and finding the right one can be a challenge.
This project seeks a solution to this common problem: a recommender system that finds the most suitable area/zipcode in a pre-defined city based on user input.
The recommender system can cater to individuals as well as relocation agencies.
For this project I will use Toronto as the relocation city.

<h3>DATA</h3>

To accomplish this goal, multiple data sets are required:
- Toronto areas/zipcodes, which can be scraped from Wikipedia.
- Latitudes and longitudes of the Toronto areas/zipcodes, which can be extracted using a geocoder (pgeocode).
- The number of recommended venues in a specific category for each area/zipcode, which can be fetched from the Foursquare API.
- A user input that includes:
    - Importance/rating for each venue category.
    - Workplace address.
    - Importance/rating for distance from Workplace.


<h3>METHODOLOGY</h3>
After extracting the latitude and longitude of each area/zipcode, I can call the Foursquare API to retrieve the number of recommended venues for each category in each area/zipcode.
To reduce the complexity of the data set I apply the following constraints:
<ul>
<li>limit the venue search to a radius of 1000m, which is a reasonable walking distance.</li>
<li>categorise venues using Foursquare high-level venue categories:
    <ul><li>Arts & Entertainment (4d4b7104d754a06370d81259)</li>
    <li>College & University (4d4b7105d754a06372d81259)</li>
    <li>Event (4d4b7105d754a06373d81259)</li>
    <li>Food (4d4b7105d754a06374d81259)</li>
    <li>Nightlife Spot (4d4b7105d754a06376d81259)</li>
    <li>Outdoors & Recreation (4d4b7105d754a06377d81259)</li>
    <li>Professional & Other Places (4d4b7105d754a06375d81259)</li>
    <li>Residence (4e67e38e036454776db1fb3a)</li>
    <li>Shop & Service (4d4b7105d754a06378d81259)</li>
    <li>Travel & Transport (4d4b7105d754a06379d81259)</li>
</ul></ul>    
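
The venue count for one area and one category comes from the `totalResults` field of the explore response. Below is a minimal sketch of that step, assuming the v2 `venues/explore` endpoint that this notebook calls later. The helper names `build_explore_params` and `venue_count` are illustrative and not part of any library, and no request is actually sent here.

```python
def build_explore_params(client_id, client_secret, lat, lng, category_id,
                         radius=1000, limit=1, version="20180605"):
    """Return the query parameters for one venues/explore call (hypothetical helper)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),   # latitude,longitude of the area
        "v": version,                      # API version date
        "radius": radius,                  # search radius in metres
        "limit": limit,                    # only the count is needed, so keep this small
        "categoryId": category_id,         # high-level category ID from the list above
    }


def venue_count(payload):
    """Extract the number of venues from a decoded JSON response.

    An empty or missing 'response' object is treated as zero venues.
    """
    response = payload.get("response") or {}
    return response.get("totalResults", 0)


params = build_explore_params("ID", "SECRET", 43.6555, -79.3626,
                              "4d4b7105d754a06374d81259")
print(params["ll"])                                      # 43.6555,-79.3626
print(venue_count({"response": {"totalResults": 95}}))   # 95
print(venue_count({"response": {}}))                     # 0
```

A dictionary like this can be passed as the `params=` argument of `requests.get`, which then handles URL encoding instead of manual string formatting.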

Once the data cleaning, transformation and normalisation are done, the result is a dataframe that lists every area/zipcode in Toronto with its relative score for each venue category.
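
Normalisation here means min-max scaling, which maps each column onto the range 0 to 1 (the notebook uses scikit-learn's `MinMaxScaler` for this). The following is a minimal pure-Python sketch of the same formula; the function name `min_max_scale` and the sample counts are illustrative.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


food_counts = [0, 5, 95, 239]          # sample venue counts for one category
scaled = min_max_scale(food_counts)
print([round(s, 6) for s in scaled])   # [0.0, 0.020921, 0.39749, 1.0]
```

After scaling, an area with many venues in a busy category and an area with many venues in a sparse category are compared on the same 0 to 1 range.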

Next, I can process the user input data.

First I extract the latitude and longitude of the workplace address using a geocoder (geopy's Nominatim).
Then I calculate the distance between the workplace and each area/zipcode and append it to the main dataframe.
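
The notebook computes this distance with geopy's geodesic `distance` function. As a rough illustration of what such a calculation involves, here is a sketch that uses the haversine formula on a spherical Earth instead, which is a simpler approximation and not the method geopy uses. The function name `haversine_m` is an assumption made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


workplace = (43.6429936, -79.3829397)   # workplace coordinates used in this notebook
m5b = (43.6572, -79.3783)               # coordinates of postal code M5B
d = haversine_m(*workplace, *m5b)
print(round(d))                         # roughly 1.6 km, expressed in metres
```

For distances of a few kilometres within one city, the spherical approximation differs from the geodesic result by only a few metres.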

Now the dataframe includes all the required data and I can create the desired content-based recommender system.
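
A content-based score of this kind can be sketched as a weighted sum: each normalised feature of an area is multiplied by the user's normalised rating for that feature, and the products are added up. The function name `recommendation_score` is illustrative; the numbers are the normalised ratings and the normalised row for postal code M7A as they appear in the tables further down in this notebook.

```python
def recommendation_score(features, weights):
    """Weighted sum of normalised features; features without a weight count as 0."""
    return sum(value * weights.get(name, 0.0) for name, value in features.items())


# Normalised user ratings (a rating of 9 on the 1-9 range used maps to 1.0)
weights = {
    "Arts & Entertainment": 0.75, "College & University": 0.0, "Event": 0.75,
    "Food": 0.875, "Nightlife Spot": 0.625, "Outdoors & Recreation": 0.625,
    "Professional & Other Places": 0.25, "Residence": 0.5, "Shop & Service": 0.5,
    "Travel & Transport": 0.375, "Distance from Workplace": 1.0,
}

# Normalised feature values for postal code M7A
m7a = {
    "Arts & Entertainment": 0.550725, "College & University": 1.0, "Event": 0.333333,
    "Food": 1.0, "Nightlife Spot": 0.534247, "Outdoors & Recreation": 0.626168,
    "Professional & Other Places": 0.777778, "Residence": 0.854167,
    "Shop & Service": 0.875, "Travel & Transport": 0.806122,
    "Distance from Workplace": 0.081152,
}

score = recommendation_score(m7a, weights)
print(round(score, 4))   # 3.7058
```

Because the lowest user rating is scaled to 0, the category the user cares about least does not contribute to the score at all.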

In [9]:
#Import Libraries
import pandas as pd
import numpy as np
import requests
import json
from pandas import json_normalize #pandas.io.json.json_normalize is deprecated since pandas 1.0

from bs4 import BeautifulSoup

!pip install pgeocode
import pgeocode

!pip install geopy
!pip install geocoder
import geopy
import geocoder
from geopy import Nominatim

import matplotlib.cm as cm
import matplotlib.colors as colors

!pip install folium
import folium

from sklearn import preprocessing



In [10]:
#scrape the page for postal codes in Toronto
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

df=pd.read_html(url)

#extract only the first table on the page
df=df[0]
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [11]:
#Data Cleanup

#remove rows whose Borough is 'Not assigned'
df=df[df.Borough !='Not assigned']

#reindex the dataframe and drop the Neighborhood column, which is not used
df=df.reset_index(drop=True)
df=df.drop(["Neighborhood"], axis=1)

df

Unnamed: 0,Postal Code,Borough
0,M3A,North York
1,M4A,North York
2,M5A,Downtown Toronto
3,M6A,North York
4,M7A,Downtown Toronto
5,M9A,Etobicoke
6,M1B,Scarborough
7,M3B,North York
8,M4B,East York
9,M5B,Downtown Toronto


In [12]:
#Fetch zipcodes coordinates

#extract an array of Zipcodes from the dataframe
postal_code=df["Postal Code"].values

#get the coordinates for all zipcodes
nomi = pgeocode.Nominatim('ca')
geocoord= nomi.query_postal_code(postal_code)
geocoord


Unnamed: 0,postal_code,country code,place_name,state_name,state_code,county_name,county_code,community_name,community_code,latitude,longitude,accuracy
0,M3A,CA,North York (York Heights / Victoria Village / ...,Ontario,ON,North York,,,,43.7545,-79.3300,1.0
1,M4A,CA,North York (Sweeney Park / Wigmore Park),Ontario,ON,,,,,43.7276,-79.3148,6.0
2,M5A,CA,Downtown Toronto (Regent Park / Port of Toronto),Ontario,ON,Toronto,8133394.0,,,43.6555,-79.3626,6.0
3,M6A,CA,North York (Lawrence Manor / Lawrence Heights),Ontario,ON,North York,,,,43.7223,-79.4504,6.0
4,M7A,CA,Queen's Park Ontario Provincial Government,Ontario,ON,,,,,43.6641,-79.3889,
5,M9A,CA,Etobicoke (Islington Avenue),Ontario,ON,Etobicoke,,,,43.6662,-79.5282,6.0
6,M1B,CA,Scarborough (Malvern / Rouge River),Ontario,ON,Scarborough,,,,43.8113,-79.1930,6.0
7,M3B,CA,Don Mills North,Ontario,ON,Don Mills,,,,43.7450,-79.3590,4.0
8,M4B,CA,East York (Parkview Hill / Woodbine Gardens),Ontario,ON,East York,,,,43.7063,-79.3094,6.0
9,M5B,CA,Downtown Toronto (Ryerson),Ontario,ON,Toronto,8133394.0,,,43.6572,-79.3783,6.0


In [13]:
#add coordinates columns to dataframe

df["Latitude"]=geocoord["latitude"]
df["Longitude"]=geocoord["longitude"]
df=df.dropna(subset=['Latitude', 'Longitude'])
df=df.reset_index()
df=df.drop(["index"], axis=1)
df

Unnamed: 0,Postal Code,Borough,Latitude,Longitude
0,M3A,North York,43.7545,-79.3300
1,M4A,North York,43.7276,-79.3148
2,M5A,Downtown Toronto,43.6555,-79.3626
3,M6A,North York,43.7223,-79.4504
4,M7A,Downtown Toronto,43.6641,-79.3889
5,M9A,Etobicoke,43.6662,-79.5282
6,M1B,Scarborough,43.8113,-79.1930
7,M3B,North York,43.7450,-79.3590
8,M4B,East York,43.7063,-79.3094
9,M5B,Downtown Toronto,43.6572,-79.3783


In [14]:
#Creating a backup df (copy() so that later changes to df do not affect the backup)
#df_backup=df.copy()

#Restore Backup

#df=df_backup.copy()



In [33]:
#get coordinates of Toronto
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="CA_explorer")
location = geolocator.geocode(address)
toronto_latitude = location.latitude
toronto_longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(toronto_latitude, toronto_longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [16]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10)

# add zipcodes markers to map
for lat, lng, postal_code, borough in zip(df['Latitude'], df['Longitude'], df['Postal Code'], df['Borough']):
    label = '{}, {}'.format(borough, postal_code)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [17]:
#Define Foursquare credentials
#read them from environment variables instead of hard-coding secrets in the notebook
#(the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumptions; use the names you export)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [18]:
#Create a data frame of Venues High Level Categories

categories= {'Category': ["Arts & Entertainment", "College & University","Event","Food","Nightlife Spot","Outdoors & Recreation","Professional & Other Places","Residence","Shop & Service","Travel & Transport"], 'ID': ["4d4b7104d754a06370d81259", "4d4b7105d754a06372d81259","4d4b7105d754a06373d81259","4d4b7105d754a06374d81259","4d4b7105d754a06376d81259","4d4b7105d754a06377d81259","4d4b7105d754a06375d81259","4e67e38e036454776db1fb3a","4d4b7105d754a06378d81259","4d4b7105d754a06379d81259"]}
categories=pd.DataFrame(data=categories)
categories=categories.astype(str) #astype returns a new dataframe, so assign the result
categories


Unnamed: 0,Category,ID
0,Arts & Entertainment,4d4b7104d754a06370d81259
1,College & University,4d4b7105d754a06372d81259
2,Event,4d4b7105d754a06373d81259
3,Food,4d4b7105d754a06374d81259
4,Nightlife Spot,4d4b7105d754a06376d81259
5,Outdoors & Recreation,4d4b7105d754a06377d81259
6,Professional & Other Places,4d4b7105d754a06375d81259
7,Residence,4e67e38e036454776db1fb3a
8,Shop & Service,4d4b7105d754a06378d81259
9,Travel & Transport,4d4b7105d754a06379d81259


In [19]:
radius= 1000
LIMIT= 1 #added limit to reduce memory usage

for cat, ids in zip(categories["Category"],categories["ID"]):
    #collect the venue counts of this category in a list (avoids shadowing the built-in dict)
    venue_counts=[]
    for lat, lng, postal_code, borough in zip(df['Latitude'], df['Longitude'], df['Postal Code'], df['Borough']):

        url= 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            lat,
            lng,
            VERSION,
            radius,
            LIMIT,
            ids)


        results = requests.get(url).json()
        #an empty response means no venues were found for this category
        if (results['response']) == {}:
            results = 0
        else:
            results = results['response']['totalResults']

        venue_counts.append(results)
        print("SUCCESS! Numnber of venues for ", cat, "in ", postal_code," - ", borough, " is ", results)

    #add the counts as a new column named after the category
    df[cat]=venue_counts
    print("")
    print(cat, " added to df")
    print("")

df

SUCCESS! Numnber of venues for  Arts & Entertainment in  M3A  -  North York  is  1
SUCCESS! Numnber of venues for  Arts & Entertainment in  M4A  -  North York  is  3
SUCCESS! Numnber of venues for  Arts & Entertainment in  M5A  -  Downtown Toronto  is  27
SUCCESS! Numnber of venues for  Arts & Entertainment in  M6A  -  North York  is  10
SUCCESS! Numnber of venues for  Arts & Entertainment in  M7A  -  Downtown Toronto  is  38
SUCCESS! Numnber of venues for  Arts & Entertainment in  M9A  -  Etobicoke  is  2
SUCCESS! Numnber of venues for  Arts & Entertainment in  M1B  -  Scarborough  is  9
SUCCESS! Numnber of venues for  Arts & Entertainment in  M3B  -  North York  is  2
SUCCESS! Numnber of venues for  Arts & Entertainment in  M4B  -  East York  is  1
SUCCESS! Numnber of venues for  Arts & Entertainment in  M5B  -  Downtown Toronto  is  31
SUCCESS! Numnber of venues for  Arts & Entertainment in  M6B  -  North York  is  5
SUCCESS! Numnber of venues for  Arts & Entertainment in  M9B  -  E

SUCCESS! Numnber of venues for  Arts & Entertainment in  M8X  -  Etobicoke  is  7
SUCCESS! Numnber of venues for  Arts & Entertainment in  M4Y  -  Downtown Toronto  is  34
SUCCESS! Numnber of venues for  Arts & Entertainment in  M7Y  -  East Toronto  is  4
SUCCESS! Numnber of venues for  Arts & Entertainment in  M8Y  -  Etobicoke  is  2
SUCCESS! Numnber of venues for  Arts & Entertainment in  M8Z  -  Etobicoke  is  1

Arts & Entertainment  added to df

SUCCESS! Numnber of venues for  College & University in  M3A  -  North York  is  2
SUCCESS! Numnber of venues for  College & University in  M4A  -  North York  is  2
SUCCESS! Numnber of venues for  College & University in  M5A  -  Downtown Toronto  is  22
SUCCESS! Numnber of venues for  College & University in  M6A  -  North York  is  5
SUCCESS! Numnber of venues for  College & University in  M7A  -  Downtown Toronto  is  112
SUCCESS! Numnber of venues for  College & University in  M9A  -  Etobicoke  is  0
SUCCESS! Numnber of venues for 

SUCCESS! Numnber of venues for  College & University in  M8W  -  Etobicoke  is  5
SUCCESS! Numnber of venues for  College & University in  M9W  -  Etobicoke  is  1
SUCCESS! Numnber of venues for  College & University in  M1X  -  Scarborough  is  0
SUCCESS! Numnber of venues for  College & University in  M4X  -  Downtown Toronto  is  11
SUCCESS! Numnber of venues for  College & University in  M5X  -  Downtown Toronto  is  64
SUCCESS! Numnber of venues for  College & University in  M8X  -  Etobicoke  is  0
SUCCESS! Numnber of venues for  College & University in  M4Y  -  Downtown Toronto  is  83
SUCCESS! Numnber of venues for  College & University in  M7Y  -  East Toronto  is  4
SUCCESS! Numnber of venues for  College & University in  M8Y  -  Etobicoke  is  0
SUCCESS! Numnber of venues for  College & University in  M8Z  -  Etobicoke  is  0

College & University  added to df

SUCCESS! Numnber of venues for  Event in  M3A  -  North York  is  0
SUCCESS! Numnber of venues for  Event in  M4A  

SUCCESS! Numnber of venues for  Food in  M6A  -  North York  is  51
SUCCESS! Numnber of venues for  Food in  M7A  -  Downtown Toronto  is  239
SUCCESS! Numnber of venues for  Food in  M9A  -  Etobicoke  is  4
SUCCESS! Numnber of venues for  Food in  M1B  -  Scarborough  is  2
SUCCESS! Numnber of venues for  Food in  M3B  -  North York  is  7
SUCCESS! Numnber of venues for  Food in  M4B  -  East York  is  10
SUCCESS! Numnber of venues for  Food in  M5B  -  Downtown Toronto  is  149
SUCCESS! Numnber of venues for  Food in  M6B  -  North York  is  24
SUCCESS! Numnber of venues for  Food in  M9B  -  Etobicoke  is  6
SUCCESS! Numnber of venues for  Food in  M1C  -  Scarborough  is  2
SUCCESS! Numnber of venues for  Food in  M3C  -  North York  is  19
SUCCESS! Numnber of venues for  Food in  M4C  -  East York  is  26
SUCCESS! Numnber of venues for  Food in  M5C  -  Downtown Toronto  is  213
SUCCESS! Numnber of venues for  Food in  M6C  -  York  is  24
SUCCESS! Numnber of venues for  Food in 

SUCCESS! Numnber of venues for  Nightlife Spot in  M9C  -  Etobicoke  is  2
SUCCESS! Numnber of venues for  Nightlife Spot in  M1E  -  Scarborough  is  4
SUCCESS! Numnber of venues for  Nightlife Spot in  M4E  -  East Toronto  is  14
SUCCESS! Numnber of venues for  Nightlife Spot in  M5E  -  Downtown Toronto  is  102
SUCCESS! Numnber of venues for  Nightlife Spot in  M6E  -  York  is  11
SUCCESS! Numnber of venues for  Nightlife Spot in  M1G  -  Scarborough  is  0
SUCCESS! Numnber of venues for  Nightlife Spot in  M4G  -  East York  is  4
SUCCESS! Numnber of venues for  Nightlife Spot in  M5G  -  Downtown Toronto  is  96
SUCCESS! Numnber of venues for  Nightlife Spot in  M6G  -  Downtown Toronto  is  34
SUCCESS! Numnber of venues for  Nightlife Spot in  M1H  -  Scarborough  is  2
SUCCESS! Numnber of venues for  Nightlife Spot in  M2H  -  North York  is  2
SUCCESS! Numnber of venues for  Nightlife Spot in  M3H  -  North York  is  5
SUCCESS! Numnber of venues for  Nightlife Spot in  M4H 

SUCCESS! Numnber of venues for  Outdoors & Recreation in  M9C  -  Etobicoke  is  3
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M1E  -  Scarborough  is  5
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M4E  -  East Toronto  is  11
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M5E  -  Downtown Toronto  is  79
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M6E  -  York  is  4
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M1G  -  Scarborough  is  4
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M4G  -  East York  is  5
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M5G  -  Downtown Toronto  is  61
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M6G  -  Downtown Toronto  is  22
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M1H  -  Scarborough  is  2
SUCCESS! Numnber of venues for  Outdoors & Recreation in  M2H  -  North York  is  2
SUCCESS! Numnber of venues for  Outdoors & Recreation in 

SUCCESS! Numnber of venues for  Professional & Other Places in  M6B  -  North York  is  15
SUCCESS! Numnber of venues for  Professional & Other Places in  M9B  -  Etobicoke  is  6
SUCCESS! Numnber of venues for  Professional & Other Places in  M1C  -  Scarborough  is  2
SUCCESS! Numnber of venues for  Professional & Other Places in  M3C  -  North York  is  30
SUCCESS! Numnber of venues for  Professional & Other Places in  M4C  -  East York  is  19
SUCCESS! Numnber of venues for  Professional & Other Places in  M5C  -  Downtown Toronto  is  0
SUCCESS! Numnber of venues for  Professional & Other Places in  M6C  -  York  is  12
SUCCESS! Numnber of venues for  Professional & Other Places in  M9C  -  Etobicoke  is  10
SUCCESS! Numnber of venues for  Professional & Other Places in  M1E  -  Scarborough  is  14
SUCCESS! Numnber of venues for  Professional & Other Places in  M4E  -  East Toronto  is  28
SUCCESS! Numnber of venues for  Professional & Other Places in  M5E  -  Downtown Toronto  is

SUCCESS! Numnber of venues for  Professional & Other Places in  M7Y  -  East Toronto  is  42
SUCCESS! Numnber of venues for  Professional & Other Places in  M8Y  -  Etobicoke  is  7
SUCCESS! Numnber of venues for  Professional & Other Places in  M8Z  -  Etobicoke  is  13

Professional & Other Places  added to df

SUCCESS! Numnber of venues for  Residence in  M3A  -  North York  is  5
SUCCESS! Numnber of venues for  Residence in  M4A  -  North York  is  2
SUCCESS! Numnber of venues for  Residence in  M5A  -  Downtown Toronto  is  16
SUCCESS! Numnber of venues for  Residence in  M6A  -  North York  is  0
SUCCESS! Numnber of venues for  Residence in  M7A  -  Downtown Toronto  is  41
SUCCESS! Numnber of venues for  Residence in  M9A  -  Etobicoke  is  0
SUCCESS! Numnber of venues for  Residence in  M1B  -  Scarborough  is  0
SUCCESS! Numnber of venues for  Residence in  M3B  -  North York  is  1
SUCCESS! Numnber of venues for  Residence in  M4B  -  East York  is  0
SUCCESS! Numnber of venu

SUCCESS! Numnber of venues for  Shop & Service in  M9A  -  Etobicoke  is  12
SUCCESS! Numnber of venues for  Shop & Service in  M1B  -  Scarborough  is  3
SUCCESS! Numnber of venues for  Shop & Service in  M3B  -  North York  is  8
SUCCESS! Numnber of venues for  Shop & Service in  M4B  -  East York  is  8
SUCCESS! Numnber of venues for  Shop & Service in  M5B  -  Downtown Toronto  is  102
SUCCESS! Numnber of venues for  Shop & Service in  M6B  -  North York  is  41
SUCCESS! Numnber of venues for  Shop & Service in  M9B  -  Etobicoke  is  4
SUCCESS! Numnber of venues for  Shop & Service in  M1C  -  Scarborough  is  0
SUCCESS! Numnber of venues for  Shop & Service in  M3C  -  North York  is  36
SUCCESS! Numnber of venues for  Shop & Service in  M4C  -  East York  is  36
SUCCESS! Numnber of venues for  Shop & Service in  M5C  -  Downtown Toronto  is  104
SUCCESS! Numnber of venues for  Shop & Service in  M6C  -  York  is  12
SUCCESS! Numnber of venues for  Shop & Service in  M9C  -  Etob

SUCCESS! Numnber of venues for  Travel & Transport in  M1B  -  Scarborough  is  1
SUCCESS! Numnber of venues for  Travel & Transport in  M3B  -  North York  is  1
SUCCESS! Numnber of venues for  Travel & Transport in  M4B  -  East York  is  1
SUCCESS! Numnber of venues for  Travel & Transport in  M5B  -  Downtown Toronto  is  73
SUCCESS! Numnber of venues for  Travel & Transport in  M6B  -  North York  is  4
SUCCESS! Numnber of venues for  Travel & Transport in  M9B  -  Etobicoke  is  3
SUCCESS! Numnber of venues for  Travel & Transport in  M1C  -  Scarborough  is  3
SUCCESS! Numnber of venues for  Travel & Transport in  M3C  -  North York  is  1
SUCCESS! Numnber of venues for  Travel & Transport in  M4C  -  East York  is  4
SUCCESS! Numnber of venues for  Travel & Transport in  M5C  -  Downtown Toronto  is  83
SUCCESS! Numnber of venues for  Travel & Transport in  M6C  -  York  is  3
SUCCESS! Numnber of venues for  Travel & Transport in  M9C  -  Etobicoke  is  6
SUCCESS! Numnber of ve

Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Arts & Entertainment,College & University,Event,Food,Nightlife Spot,Outdoors & Recreation,Professional & Other Places,Residence,Shop & Service,Travel & Transport
0,M3A,North York,43.7545,-79.3300,1,2,0,5,1,3,10,5,7,8
1,M4A,North York,43.7276,-79.3148,3,2,0,5,1,0,12,2,8,1
2,M5A,Downtown Toronto,43.6555,-79.3626,27,22,0,95,28,38,0,16,58,27
3,M6A,North York,43.7223,-79.4504,10,5,0,51,4,8,21,0,115,13
4,M7A,Downtown Toronto,43.6641,-79.3889,38,112,5,239,78,67,84,41,119,79
5,M9A,Etobicoke,43.6662,-79.5282,2,0,0,4,0,3,4,0,12,1
6,M1B,Scarborough,43.8113,-79.1930,9,0,0,2,0,2,2,0,3,1
7,M3B,North York,43.7450,-79.3590,2,5,1,7,3,4,14,1,8,1
8,M4B,East York,43.7063,-79.3094,1,0,0,10,4,3,9,0,8,1
9,M5B,Downtown Toronto,43.6572,-79.3783,31,75,2,149,79,72,96,40,102,73


In [294]:
#restore backup
df=pd.read_csv("api_call_result.csv")
df.drop(["Unnamed: 0"], axis=1, inplace=True)
df

Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Arts & Entertainment,College & University,Event,Food,Nightlife Spot,Outdoors & Recreation,Professional & Other Places,Residence,Shop & Service,Travel & Transport
0,M3A,North York,43.7545,-79.3300,1,2,0,5,1,3,10,5,7,8
1,M4A,North York,43.7276,-79.3148,3,2,0,5,1,0,12,2,8,1
2,M5A,Downtown Toronto,43.6555,-79.3626,27,22,0,95,28,38,0,16,58,27
3,M6A,North York,43.7223,-79.4504,10,5,0,51,4,8,21,0,115,13
4,M7A,Downtown Toronto,43.6641,-79.3889,38,112,5,239,78,67,84,41,119,79
5,M9A,Etobicoke,43.6662,-79.5282,2,0,0,4,0,3,4,0,12,1
6,M1B,Scarborough,43.8113,-79.1930,9,0,0,2,0,2,2,0,3,1
7,M3B,North York,43.7450,-79.3590,2,5,1,7,3,4,14,1,8,1
8,M4B,East York,43.7063,-79.3094,1,0,0,10,4,3,9,0,8,1
9,M5B,Downtown Toronto,43.6572,-79.3783,31,75,2,149,79,72,96,40,102,73


<h3>Information about the user</h3>

<h5>Q1: Provide your workplace address.


Q2: On a scale from 1 to 10, how important is each of the following aspects and venue categories to you when choosing an area/zipcode to relocate to?
<ul><li>Arts & Entertainment</li>
    <li>College & University</li>
    <li>Event Venues</li>
    <li>Food</li>
    <li>Nightlife Spot</li>
    <li>Outdoors & Recreation</li>
    <li>Professional & Other Places</li>
    <li>Residence</li>
    <li>Shop & Service</li>
    <li>Travel & Transport </li>
    <li>Distance from Workplace</li>
</ul></h5>
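
Since the answers to Q2 drive the whole recommendation, it may be worth checking them before use. The following is a minimal sketch of such a check, assuming the ratings arrive as a dictionary keyed by category name; the function name `validate_ratings` is hypothetical and not part of the original notebook.

```python
CATEGORIES = [
    "Arts & Entertainment", "College & University", "Event", "Food",
    "Nightlife Spot", "Outdoors & Recreation", "Professional & Other Places",
    "Residence", "Shop & Service", "Travel & Transport", "Distance from Workplace",
]


def validate_ratings(ratings, categories=CATEGORIES, low=1, high=10):
    """Check that every category has an integer rating between low and high."""
    missing = [c for c in categories if c not in ratings]
    if missing:
        raise ValueError("Missing ratings for: {}".format(", ".join(missing)))
    for category in categories:
        value = ratings[category]
        if not isinstance(value, int) or not low <= value <= high:
            raise ValueError(
                "Rating for {} must be an integer from {} to {}".format(category, low, high))
    # Return the ratings in the canonical category order
    return {c: ratings[c] for c in categories}


answers = dict(zip(CATEGORIES, [7, 1, 7, 8, 6, 6, 3, 5, 5, 4, 9]))
print(validate_ratings(answers)["Food"])   # 8
```

Returning the ratings in a fixed category order keeps them aligned with the feature columns when the scores are computed.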

In [295]:
#Q1
work_address="120 Bremner Blvd #1600, Toronto"

#Q2
user_input= {'Category': ["Arts & Entertainment", "College & University","Event","Food","Nightlife Spot","Outdoors & Recreation","Professional & Other Places","Residence","Shop & Service","Travel & Transport","Distance from Workplace"], 'Rating': [7,1,7,8,6,6,3,5,5,4,9]}

In [296]:
#Fetch workplace coordinates

geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(work_address)
workplace_latitude = location.latitude
workplace_longitude = location.longitude
print('The geographical coordinates of the Workplace are {}, {}.'.format(workplace_latitude, workplace_longitude))

The geographical coordinates of the Workplace are 43.6429936, -79.3829397.


In [297]:
#Calculate distance between Workplace and zipcodes/areas

from geopy.distance import distance

#the workplace coordinates are the same for every area, so build the tuple once
work_cord=(workplace_latitude, workplace_longitude)

#geodesic distance in metres from each area/zipcode to the workplace
#(stored in a new name so the imported distance function is not overwritten)
distances=[distance((lat, lng), work_cord).m for lat, lng in zip(df['Latitude'], df['Longitude'])]

#append the distances to the main Data Frame
df["Distance from Workplace"]=distances
df

Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Arts & Entertainment,College & University,Event,Food,Nightlife Spot,Outdoors & Recreation,Professional & Other Places,Residence,Shop & Service,Travel & Transport,Distance from Workplace
0,M3A,North York,43.7545,-79.3300,1,2,0,5,1,3,10,5,7,8,13103.463106
1,M4A,North York,43.7276,-79.3148,3,2,0,5,1,0,12,2,8,1,10888.055182
2,M5A,Downtown Toronto,43.6555,-79.3626,27,22,0,95,28,38,0,16,58,27,2150.233892
3,M6A,North York,43.7223,-79.4504,10,5,0,51,4,8,21,0,115,13,10355.151986
4,M7A,Downtown Toronto,43.6641,-79.3889,38,112,5,239,78,67,84,41,119,79,2393.823444
5,M9A,Etobicoke,43.6662,-79.5282,2,0,0,4,0,3,4,0,12,1,11998.457763
6,M1B,Scarborough,43.8113,-79.1930,9,0,0,2,0,2,2,0,3,1,24164.050293
7,M3B,North York,43.7450,-79.3590,2,5,1,7,3,4,14,1,8,1,11496.698784
8,M4B,East York,43.7063,-79.3094,1,0,0,10,4,3,9,0,8,1,9200.198753
9,M5B,Downtown Toronto,43.6572,-79.3783,31,75,2,149,79,72,96,40,102,73,1622.184197


In [298]:
#Copy of Dataframe for Normalization

x=df.set_index('Postal Code').drop(["Longitude","Latitude","Borough"], axis=1)
x

Postal Code,Arts & Entertainment,College & University,Event,Food,Nightlife Spot,Outdoors & Recreation,Professional & Other Places,Residence,Shop & Service,Travel & Transport,Distance from Workplace
M3A,1,2,0,5,1,3,10,5,7,8,13103.463106
M4A,3,2,0,5,1,0,12,2,8,1,10888.055182
M5A,27,22,0,95,28,38,0,16,58,27,2150.233892
M6A,10,5,0,51,4,8,21,0,115,13,10355.151986
M7A,38,112,5,239,78,67,84,41,119,79,2393.823444
M9A,2,0,0,4,0,3,4,0,12,1,11998.457763
M1B,9,0,0,2,0,2,2,0,3,1,24164.050293
M3B,2,5,1,7,3,4,14,1,8,1,11496.698784
M4B,1,0,0,10,4,3,9,0,8,1,9200.198753
M5B,31,75,2,149,79,72,96,40,102,73,1622.184197


In [299]:
#Data Normalization

from sklearn import preprocessing

min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
#rebuild the dataframe with the original column names and Postal Code index
x = pd.DataFrame(x_scaled, columns=x.columns, index=x.index)
x



Postal Code,Arts & Entertainment,College & University,Event,Food,Nightlife Spot,Outdoors & Recreation,Professional & Other Places,Residence,Shop & Service,Travel & Transport,Distance from Workplace
M3A,0.014493,0.017857,0.000000,0.020921,0.006849,0.028037,0.092593,0.104167,0.051471,0.081633,0.506587
M4A,0.043478,0.017857,0.000000,0.020921,0.006849,0.000000,0.111111,0.041667,0.058824,0.010204,0.418581
M5A,0.391304,0.196429,0.000000,0.397490,0.191781,0.355140,0.000000,0.333333,0.426471,0.275510,0.071475
M6A,0.144928,0.044643,0.000000,0.213389,0.027397,0.074766,0.194444,0.000000,0.845588,0.132653,0.397412
M7A,0.550725,1.000000,0.333333,1.000000,0.534247,0.626168,0.777778,0.854167,0.875000,0.806122,0.081152
M9A,0.028986,0.000000,0.000000,0.016736,0.000000,0.028037,0.037037,0.000000,0.088235,0.010204,0.462691
M1B,0.130435,0.000000,0.000000,0.008368,0.000000,0.018692,0.018519,0.000000,0.022059,0.010204,0.945964
M3B,0.028986,0.044643,0.066667,0.029289,0.020548,0.037383,0.129630,0.020833,0.058824,0.010204,0.442759
M4B,0.014493,0.000000,0.000000,0.041841,0.027397,0.028037,0.083333,0.000000,0.058824,0.010204,0.351532
M5B,0.449275,0.669643,0.133333,0.623431,0.541096,0.672897,0.888889,0.833333,0.750000,0.744898,0.050498


In [301]:
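#invert the normalised distance (1 - value) so that areas closer to the workplace score higher
#this is the complement, not the reciprocal; run this cell only once, because applying it twice restores the original values

x['Distance from Workplace'] = 1 - x['Distance from Workplace']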
x

Postal Code,Arts & Entertainment,College & University,Event,Food,Nightlife Spot,Outdoors & Recreation,Professional & Other Places,Residence,Shop & Service,Travel & Transport,Distance from Workplace
M3A,0.014493,0.017857,0.000000,0.020921,0.006849,0.028037,0.092593,0.104167,0.051471,0.081633,0.506587
M4A,0.043478,0.017857,0.000000,0.020921,0.006849,0.000000,0.111111,0.041667,0.058824,0.010204,0.418581
M5A,0.391304,0.196429,0.000000,0.397490,0.191781,0.355140,0.000000,0.333333,0.426471,0.275510,0.071475
M6A,0.144928,0.044643,0.000000,0.213389,0.027397,0.074766,0.194444,0.000000,0.845588,0.132653,0.397412
M7A,0.550725,1.000000,0.333333,1.000000,0.534247,0.626168,0.777778,0.854167,0.875000,0.806122,0.081152
M9A,0.028986,0.000000,0.000000,0.016736,0.000000,0.028037,0.037037,0.000000,0.088235,0.010204,0.462691
M1B,0.130435,0.000000,0.000000,0.008368,0.000000,0.018692,0.018519,0.000000,0.022059,0.010204,0.945964
M3B,0.028986,0.044643,0.066667,0.029289,0.020548,0.037383,0.129630,0.020833,0.058824,0.010204,0.442759
M4B,0.014493,0.000000,0.000000,0.041841,0.027397,0.028037,0.083333,0.000000,0.058824,0.010204,0.351532
M5B,0.449275,0.669643,0.133333,0.623431,0.541096,0.672897,0.888889,0.833333,0.750000,0.744898,0.050498


In [302]:
#Create User Profile 
userProfile=pd.DataFrame(user_input)
userProfile.set_index("Category", inplace=True)

#Normalize Rating
userProfile_scaled = min_max_scaler.fit_transform(userProfile.values)
userProfile["Rating"]= userProfile_scaled

userProfile




Category,Rating
Arts & Entertainment,0.75
College & University,0.0
Event,0.75
Food,0.875
Nightlife Spot,0.625
Outdoors & Recreation,0.625
Professional & Other Places,0.25
Residence,0.5
Shop & Service,0.5
Travel & Transport,0.375


In [308]:
#Calculate Recommendation Score
recommendationScore_df = x*userProfile["Rating"]
recommendationScore_df["Recommendation Score"]=recommendationScore_df.sum(axis=1)
recommendationScore_df=recommendationScore_df.reset_index()
recommendationScore_df

Unnamed: 0,Postal Code,Arts & Entertainment,College & University,Event,Food,Nightlife Spot,Outdoors & Recreation,Professional & Other Places,Residence,Shop & Service,Travel & Transport,Distance from Workplace,Recommendation Score
0,M3A,0.010870,0.0,0.00,0.018305,0.004281,0.017523,0.023148,0.052083,0.025735,0.030612,0.506587,0.689146
1,M4A,0.032609,0.0,0.00,0.018305,0.004281,0.000000,0.027778,0.020833,0.029412,0.003827,0.418581,0.555626
2,M5A,0.293478,0.0,0.00,0.347803,0.119863,0.221963,0.000000,0.166667,0.213235,0.103316,0.071475,1.537801
3,M6A,0.108696,0.0,0.00,0.186715,0.017123,0.046729,0.048611,0.000000,0.422794,0.049745,0.397412,1.277825
4,M7A,0.413043,0.0,0.25,0.875000,0.333904,0.391355,0.194444,0.427083,0.437500,0.302296,0.081152,3.705778
5,M9A,0.021739,0.0,0.00,0.014644,0.000000,0.017523,0.009259,0.000000,0.044118,0.003827,0.462691,0.573802
6,M1B,0.097826,0.0,0.00,0.007322,0.000000,0.011682,0.004630,0.000000,0.011029,0.003827,0.945964,1.082280
7,M3B,0.021739,0.0,0.05,0.025628,0.012842,0.023364,0.032407,0.010417,0.029412,0.003827,0.442759,0.652395
8,M4B,0.010870,0.0,0.00,0.036611,0.017123,0.017523,0.020833,0.000000,0.029412,0.003827,0.351532,0.487731
9,M5B,0.336957,0.0,0.10,0.545502,0.338185,0.420561,0.222222,0.416667,0.375000,0.279337,0.050498,3.084928


In [315]:
#Preparing the Recommendation Table
recommendationTable= pd.merge(df[['Postal Code',"Borough","Latitude","Longitude"]],recommendationScore_df[['Postal Code',"Recommendation Score"]], on='Postal Code').sort_values(by="Recommendation Score",ascending=False)
recommendationTable

Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Recommendation Score
30,M5H,Downtown Toronto,43.6496,-79.3833,4.875008
96,M5X,Downtown Toronto,43.6492,-79.3823,4.854897
48,M5L,Downtown Toronto,43.6492,-79.3823,4.854897
42,M5K,Downtown Toronto,43.6469,-79.3823,4.836827
91,M5W,Downtown Toronto,43.6437,-79.3787,4.430335
20,M5E,Downtown Toronto,43.6456,-79.3754,3.784004
4,M7A,Downtown Toronto,43.6641,-79.3889,3.705778
98,M4Y,Downtown Toronto,43.6656,-79.3830,3.478949
24,M5G,Downtown Toronto,43.6564,-79.3860,3.202910
86,M5V,Downtown Toronto,43.6404,-79.3995,3.197743


In [336]:
#Creating Backup

recommendationTable.to_csv("recommendationTable.csv")

In [348]:
#Top 10 recommended areas, each assigned a recommendation label

# .copy() makes top10 an independent DataFrame, so the assignments below
# do not trigger pandas' SettingWithCopyWarning
top10 = recommendationTable.head(10).copy()
top10["Recommendation"] = ''
label_col = top10.columns.get_loc("Recommendation")
for j, i in enumerate(top10["Recommendation Score"]):
    print(i, j)
    # scores of 4 or above are "Highly Recommended", the rest "Recommended"
    if i >= 4:
        top10.iloc[j, label_col] = 'Highly Recommended'
    else:
        top10.iloc[j, label_col] = 'Recommended'

top10

4.875007673125491 0
4.8548974107054566 1
4.8548974107054566 2
4.836827144017554 3
4.430335021046143 4
3.7840035703043755 5
3.705777944165767 6
3.4789491886476074 7
3.2029101520538505 8
3.197743115551762 9


Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Recommendation Score,Recommendation
30,M5H,Downtown Toronto,43.6496,-79.3833,4.875008,Highly Recommended
96,M5X,Downtown Toronto,43.6492,-79.3823,4.854897,Highly Recommended
48,M5L,Downtown Toronto,43.6492,-79.3823,4.854897,Highly Recommended
42,M5K,Downtown Toronto,43.6469,-79.3823,4.836827,Highly Recommended
91,M5W,Downtown Toronto,43.6437,-79.3787,4.430335,Highly Recommended
20,M5E,Downtown Toronto,43.6456,-79.3754,3.784004,Recommended
4,M7A,Downtown Toronto,43.6641,-79.3889,3.705778,Recommended
98,M4Y,Downtown Toronto,43.6656,-79.383,3.478949,Recommended
24,M5G,Downtown Toronto,43.6564,-79.386,3.20291,Recommended
86,M5V,Downtown Toronto,43.6404,-79.3995,3.197743,Recommended
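
The labelling loop above can also be written without iteration. The sketch below uses `numpy.where` on a three-row sample copied from the table. The frame name `sample` is hypothetical, and in the notebook the same expression would be applied to `top10`.

```python
import numpy as np
import pandas as pd

# Three rows copied from the table above, standing in for the full top-10 frame.
sample = pd.DataFrame({
    "Postal Code": ["M5H", "M5W", "M5E"],
    "Recommendation Score": [4.875008, 4.430335, 3.784004],
})

# Same rule as the loop: scores of 4 or above are "Highly Recommended".
sample["Recommendation"] = np.where(
    sample["Recommendation Score"] >= 4,
    "Highly Recommended",
    "Recommended",
)
print(sample)
```

Assigning a whole column at once avoids chained indexing, which is what raises pandas' SettingWithCopyWarning.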


In [372]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=13)

# add zipcode markers to map:
# red markers for "Recommended" areas, green markers for "Highly Recommended" areas
for lat, lng, postal_code, borough, recommendation in zip(top10['Latitude'], top10['Longitude'], top10['Postal Code'], top10['Borough'], top10['Recommendation']):
    if recommendation == "Recommended":
        color, fill_color = 'red', '#f55742'
    else:
        color, fill_color = 'green', '#7bf542'
    label = '{}, {}, {}'.format(recommendation, borough, postal_code)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color=fill_color,
        fill_opacity=0.7).add_to(map_toronto)
map_toronto

<h3>RESULTS</h3>
Based on the user profile preferences and workplace location, Downtown Toronto is the most suitable area to relocate to: all ten of the highest-scoring zipcodes are in that borough.
Zipcodes in this area have different degrees of recommendation and are marked on the map in different colors:
<ul>
    <li>Red = "Recommended" zipcodes (score below 4)</li>
    <li>Green = "Highly Recommended" zipcodes (score of 4 or above)</li></ul>

<h3>DISCUSSION AND POSSIBLE IMPROVEMENTS</h3>
Foursquare data is not the ideal dataset for this task, but it provides enough to build a very basic recommendation system.
The generated recommendations could be improved by:
<ul>
    <li>Using a wider array of data points to generate the user profile (e.g. lower-level venue category preferences, housing accommodation preferences, etc.).</li>
    <li>Adding a weighting system based on ratings, tips and likes for the scores of each category.</li>
    <li>Combining other sources (e.g. city data on number of residents).</li></ul>
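
As a rough illustration of the rating-based weighting idea, the sketch below scales each area's venue count by its mean venue rating before normalising. Everything in it is hypothetical: the area labels, counts and ratings are made-up placeholders, the column names are invented for this example, and dividing by the maximum is an assumed normalisation rather than the one used in this notebook.

```python
import pandas as pd

# Made-up inputs for a single venue category (not real Foursquare data).
food = pd.DataFrame({
    "Postal Code": ["A", "B", "C"],   # placeholder area labels
    "venue_count": [10, 40, 40],      # number of venues found in the area
    "mean_rating": [9.0, 6.0, 8.0],   # assumed mean rating on a 0-10 scale
})

# Scale each count by rating quality, so well-rated venues count for more.
food["weighted"] = food["venue_count"] * food["mean_rating"] / 10

# Assumed normalisation: divide by the best area so scores fall in [0, 1].
food["score"] = food["weighted"] / food["weighted"].max()
print(food)
```

With this weighting, areas B and C have the same venue count but different scores, which a count-only score cannot express.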
    