# 3. The Battle of Neighbourhoods (Week 2)

## 1. Introduction/Business problem

Moving to a new part of a city can be troublesome, especially when we don't know whether the new neighbourhood has everything we want. To make our lives easier, we'll try to sort this out. We'll use the Foursquare API to propose the most suitable neighbourhood to move to, assuming we are moving within Toronto, Canada.

For this challenge, we'll compare neighbourhoods by their nearby venues to find out which ones best match what we are looking for. Once this is done, we'll be able to recommend a new place to start over!
The purpose of this work is to show that there is a way to analyse the options and decide mathematically where the best place to start over is, given your own needs.

Regarding the dataset, we'll leverage the information already retrieved in the previous assignments and classes (see below). This time, however, we'll list the venues we'd like to see in our new neighbourhood and give each of them a mark (from 1 to 5). This lets us use recommendation techniques to point us in the right direction.

Toronto info - https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M


Please note that this exercise is purely fictional. We'll dig into the Toronto dataset and pick a neighbourhood to be our "starting home". Once it is selected, we'll mark the venues nearby and move on to explore the other Toronto neighbourhoods.

## 2. Toronto data

As usual for any data science project, we first have to take a deep dive into the dataset and clean it up so that we can actually work with it. The dataset has been retrieved from Wikipedia.

The final data must contain: Borough, Neighbourhood, Latitude and Longitude. Since the Wikipedia page only has the Borough and Neighbourhood (plus the Postal Code, which we only use as the join key), we'll have to merge it with the geospatial data from "http://cocl.us/Geospatial_data". Without further ado, let's get into it.

In [1]:
# importing relevant libraries
import requests
import pandas as pd
import numpy as np
!pip install lxml
import lxml



In [2]:
wiki = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wiki_pg = requests.get(wiki)

# Part of the data isn't assigned/has no info, so let's get rid of it
raw_data = pd.read_html(wiki_pg.content, header=0)[0]
df = raw_data[raw_data.Borough != 'Not assigned']

df

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| ... | ... | ... | ... |
| 160 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 165 | M4Y | Downtown Toronto | Church and Wellesley |
| 168 | M7Y | East Toronto | Business reply mail Processing Centre, South C... |
| 169 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... |


In [3]:
# Retrieving geospatial info
geo_info = 'http://cocl.us/Geospatial_data'
df_geo=pd.read_csv(geo_info)

print(df_geo.shape)
df_geo.head()

(103, 3)


| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [133]:
# Joining the dataframes on the postal code
toronto_df = df.join(df_geo.set_index('Postal Code'), on='Postal Code')

# Dropping the postal code info (keyword form; the positional axis argument is gone in recent pandas)
toronto_df = toronto_df.drop(columns='Postal Code').reset_index(drop=True)
print(toronto_df.shape)
toronto_df.head()

(103, 4)


| | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |


Looks like we have a clean dataset on our hands!
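Whether the join really left no gaps can be checked explicitly. The sketch below uses a few hand-typed rows standing in for `df` and `df_geo`; the names `boroughs` and `coords` and the postal code "M9Z" are made up for illustration. It relies on `DataFrame.merge` with `indicator=True` to flag rows that found no coordinates.

```python
import pandas as pd

# Toy stand-in for the Wikipedia table
boroughs = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],  # "M9Z" is a made-up code with no coordinates
    "Borough": ["North York", "North York", "Nowhere"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Imaginary"],
})

# Toy stand-in for the geospatial CSV
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left merge keeps every borough row; the "_merge" column says where each row was found
merged = boroughs.merge(coords, on="Postal Code", how="left", indicator=True)

# Postal codes that found no match in the coordinates table
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
missing_coords = int(merged["Latitude"].isna().sum())
print(unmatched, missing_coords)
```

On the real data, an empty `unmatched` list would support the claim that the dataset is clean.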

Let's take a quick look at how spread out the neighbourhoods are. To make our lives easier, I've chosen the first row as our starting home: Parkwoods, North York. I'll plot it as well.

In [5]:
# Getting Toronto's geospatial info to set as a starting point
!conda install -c conda-forge geocoder --yes
!pip install geopy
import geocoder
from geopy.geocoders import Nominatim 

address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude



In [6]:
import folium

# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(
    toronto_df['Latitude'],
    toronto_df['Longitude'],
    toronto_df['Borough'],
    toronto_df['Neighbourhood']):
    
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(toronto_map)  
    
folium.CircleMarker(
        [43.753259,-79.329656],
        radius=7,
        popup='Parkwoods - Starting Home',
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        ).add_to(toronto_map)

toronto_map

With this map it's pretty easy to see where we live. Now, let's get the venue info and start exploring our preferences!

## 3. Nearby venues with Foursquare API

The Foursquare API exposes venues, tips, photos and a lot more for nearby places, and it is constantly updated, which makes it a reliable source. Luckily for us, the API is also quite simple to work with, which is why we have chosen it (besides it being a requirement for this capstone).


In [7]:
# Setting some required info to make our lives easier
# Credentials are read from the environment rather than hard-coded;
# the environment variable names below are illustrative
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180604'
radius = 500
LIMIT = 100

Now, we'll define a function to retrieve all the nearby venues for our dataset.

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
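
The request and parsing steps inside `getNearbyVenues` can also be kept apart, which makes them easier to test without calling the API. The sketch below is only an illustration: `build_explore_params` and `extract_venues` are hypothetical helper names, the parameter names are the ones from the URL above, and the sample payload is hand-made to mimic the response shape the function indexes into.

```python
def build_explore_params(lat, lng, client_id, client_secret,
                         version="20180604", radius=500, limit=100):
    """Query parameters matching the URL template used in getNearbyVenues."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }


def extract_venues(payload):
    """Pull (name, lat, lng, category) tuples out of a response dict, tolerating gaps."""
    groups = payload.get("response", {}).get("groups") or [{}]
    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]  # some venues may have no category
        rows.append((
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name"),
        ))
    return rows


# Hand-made payload: the first venue comes from the output table, the second is invented
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Venue Without Category",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]}]}}

params = build_explore_params(43.753259, -79.329656, "my-id", "my-secret")
rows = extract_venues(sample)
print(params["ll"], rows)
```

The dictionary could then be passed as `requests.get(url, params=params, timeout=10)`, which lets `requests` handle the URL encoding and keeps a slow response from hanging the loop.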

In [9]:
toronto_venues = getNearbyVenues(names=toronto_df['Neighbourhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )
print(toronto_venues.shape)
toronto_venues.head()

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

| | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 2 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 3 | Victoria Village | 43.725882 | -79.315572 | Portugril | 43.725819 | -79.312785 | Portuguese Restaurant |
| 4 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |


In [10]:
print('There are {} venues categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 273 venues categories.


Plenty of venues to go through! But it looks like our home has only 2 venues nearby: a Park and a Food & Drink Shop. This must be the reason we are leaving! There's nothing around!

We'll use this information by freely selecting what we want to see near our new home address.
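The venue count per neighbourhood can be read straight off the venues table. As a minimal sketch, the toy frame below holds only the five rows shown in the output above (so the Victoria Village count covers just those rows); `toy_venues` is a made-up name standing in for `toronto_venues`.

```python
import pandas as pd

# The five rows displayed by toronto_venues.head(), reduced to two columns
toy_venues = pd.DataFrame({
    "Neighbourhood": ["Parkwoods", "Parkwoods",
                      "Victoria Village", "Victoria Village", "Victoria Village"],
    "Venue Category": ["Park", "Food & Drink Shop",
                       "Hockey Arena", "Portuguese Restaurant", "Coffee Shop"],
})

# Number of venues returned for each neighbourhood, smallest first
venue_counts = toy_venues.groupby("Neighbourhood").size().sort_values()

# Categories found around the starting home
home_categories = toy_venues.loc[
    toy_venues["Neighbourhood"] == "Parkwoods", "Venue Category"
].tolist()

print(venue_counts)
print(home_categories)
```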

## 4. Recommendation system

A recommendation system works by looking at what a certain user's preferences might be and then matching them against the characteristics of the items in your library. In our case, the user preferences are the venues we would like to have near our home, and the library is the set of all the neighbourhoods we could move to.

To make this kind of analysis possible, the dataframe has to state exactly what is near every neighbourhood. So we'll reshape our table to have one row per neighbourhood and one column per venue category. We take the mean of the one-hot values, so each entry is the share of the neighbourhood's venues that belong to that category, which we use as a weight for how common that type of venue is there.

In [11]:
# One hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# Move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

# Grouping every neighbourhood together
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

| | Neighbourhood | Accessories Store | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | ... | Train Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Alderwood, Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Bayview Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Bedford Park, Lawrence Manor East | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.045455 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
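
What the grouped mean represents is easier to see on a small case. The sketch below repeats the one-hot and group-by steps on the five venue rows shown earlier; `toy` is a made-up name, and `dtype=float` is passed only so the dummy columns are numeric regardless of the pandas version.

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighbourhood": ["Parkwoods", "Parkwoods",
                      "Victoria Village", "Victoria Village", "Victoria Village"],
    "Venue Category": ["Park", "Food & Drink Shop",
                       "Hockey Arena", "Portuguese Restaurant", "Coffee Shop"],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)

# Averaging the indicators per neighbourhood gives the share of venues in each category
grouped = onehot.groupby(toy["Neighbourhood"]).mean()
print(grouped.round(3))
```

Each row sums to 1, so a value such as 0.045455 in the table above means that roughly 4.5% of that neighbourhood's venues fall in the category.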


Now we set a score stating what we think is most important among all the possible categories. That is, we'd like to see the following venues near our new home:

Park  
Food & Drink Shop  
Pub  
Gym  
Mexican Restaurant  
Yoga Studio  
Café  
Electronics Store  
Beer Store  
Gift Shop  
Sushi Restaurant  
Supermarket  
Drug Store  

Let's create a new table and give each of those venues a mark from 1 (good to have nearby) to 5 (must have nearby).

In [66]:
# Category names must match the Foursquare category names exactly
userInput = [
            {'Venue':'Park', 'rating':3},
            {'Venue':'Food & Drink Shop', 'rating':5},
            {'Venue':'Pub', 'rating':4},
            {'Venue':'Gym', 'rating':4},
            {'Venue':'Mexican Restaurant', 'rating':3},
            {'Venue':'Yoga Studio', 'rating':4},
            {'Venue':'Café', 'rating':2},
            {'Venue':'Electronics Store', 'rating':1},
            {'Venue':'Beer Store', 'rating':5},
            {'Venue':'Gift Shop', 'rating':1},
            {'Venue':'Sushi Restaurant', 'rating':3},
            {'Venue':'Supermarket', 'rating':5},
            {'Venue':'Drug Store', 'rating':5}
] 
pref = pd.DataFrame(userInput)
pref

| | Venue | rating |
|---|---|---|
| 0 | Park | 3 |
| 1 | Food & Drink Shop | 5 |
| 2 | Pub | 4 |
| 3 | Gym | 4 |
| 4 | Mexican Restaurant | 3 |
| 5 | Yoga Studio | 4 |
| 6 | Café | 2 |
| 7 | Electronics Store | 1 |
| 8 | Beer Store | 5 |
| 9 | Gift Shop | 1 |
| 10 | Sushi Restaurant | 3 |
| 11 | Supermarket | 5 |
| 12 | Drug Store | 5 |


The next cell retrieves all the unique categories and places our ratings on them (where possible).
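As an aside, a profile of this kind can also be built by reindexing a ratings Series onto the category names, which additionally reveals any rating whose name matches no category. This is a sketch of an alternative, not the approach used in the next cell; the category list is a toy subset and "Yoga" is deliberately misspelt relative to "Yoga Studio".

```python
import pandas as pd

# Toy subset of the category columns (the real table has over 270)
categories = ["Beer Store", "Café", "Electronics Store", "Park", "Yoga Studio"]

# Ratings keyed by category name; "Yoga" does not match any category on purpose
ratings = pd.Series({"Park": 3, "Beer Store": 5, "Café": 2, "Yoga": 4})

# Align the ratings to the category order, filling unrated categories with 0
profile = ratings.reindex(categories, fill_value=0)

# Names that were rated but do not exist among the categories
unmatched = sorted(set(ratings.index) - set(categories))

print(profile)
print(unmatched)
```

Because the resulting Series is indexed by category name, it lines up with the category columns by label rather than by position.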

In [152]:
# Start from one row of toronto_grouped with every value zeroed out
new_home_pref = toronto_grouped.iloc[0, :] * 0
nhp = new_home_pref.to_frame().reset_index().drop(columns=0)

# One condition per preferred category; names must match the column names exactly
cond = [
    nhp['index']=='Park',
    nhp['index']=='Food & Drink Shop',
    nhp['index']=='Pub',
    nhp['index']=='Gym',
    nhp['index']=='Mexican Restaurant',
    nhp['index']=='Yoga Studio',
    nhp['index']=='Café',
    nhp['index']=='Electronics Store',
    nhp['index']=='Beer Store',
    nhp['index']=='Gift Shop',
    nhp['index']=='Sushi Restaurant',
    nhp['index']=='Supermarket',
    nhp['index']=='Drug Store']

# Ratings, in the same order as the conditions above
values = [3,5,4,4,3,4,2,1,5,1,3,5,5]

nhp['User Input'] = np.select(cond, values)
# Drop the first row, which holds the 'Neighbourhood' label rather than a category
nhp = nhp.drop(index=0)
nhp = nhp['User Input']
nhp.head(50)

1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    0
17    0
18    0
19    0
20    0
21    0
22    0
23    0
24    0
25    0
26    0
27    0
28    0
29    0
30    0
31    5
32    0
33    0
34    0
35    0
36    0
37    0
38    0
39    0
40    0
41    0
42    0
43    0
44    0
45    0
46    0
47    0
48    0
49    0
50    2
Name: User Input, dtype: int64

That listing contains the weights of the user's preferences, also known as the user profile. We can use it to recommend a neighbourhood that satisfies our preferences. It's a bit difficult to see among this many zeros, but the values are there (e.g. row 31 = 5 and row 50 = 2)!

We'll take a second look at the Toronto table and restructure it in a way that better suits us.

In [153]:
# Retrieving the venues info from the toronto_grouped df, indexed by neighbourhood
venues = toronto_grouped.set_index('Neighbourhood')
venues.head()

| Neighbourhood | Accessories Store | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Train Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Alderwood, Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Bathurst Manor, Wilson Heights, Downsview North | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Bayview Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Bedford Park, Lawrence Manor East | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.045455 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


With our desired profile and the complete list of neighbourhoods and their venues in hand, we're going to take the weighted average of every neighbourhood's venues based on our profile and recommend the top 5 neighbourhoods that best satisfy it.
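Written out, with $w_c$ the rating given to category $c$ (0 if unrated) and $f_{n,c}$ the share of neighbourhood $n$'s venues that fall in category $c$ (notation introduced here for illustration only), the score the next cell aims to compute is:

```latex
s_n = \frac{\sum_{c} w_c \, f_{n,c}}{\sum_{c} w_c}
```

For example, the 13 ratings above sum to 45, so a neighbourhood whose venues were half Parks (rating 3) and half Pubs (rating 4) would score $(3 \cdot 0.5 + 4 \cdot 0.5) / 45 \approx 0.078$.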

In [154]:
#Multiply the venues by the weights and then take the weighted average
recommendation = ((venues*nhp).sum(axis=1))/(nhp.sum())
recomm = recommendation.sort_values(ascending=False)
recomm.head() 

Neighbourhood
York Mills, Silver Hills                                         0.0
York Mills West                                                  0.0
Dufferin, Dovercourt Village                                     0.0
East Toronto, Broadview North (Old East York)                    0.0
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood    0.0
dtype: float64

I'm not quite sure why every score is displayed as 0.0, but after checking it by hand, the output does match, so I'll take it.
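One possible explanation for the zeros, offered as an assumption rather than a confirmed diagnosis: pandas aligns a `DataFrame * Series` product on labels, matching the Series index against the DataFrame columns. `nhp` carries an integer index (1, 2, 3, ...) while the columns of `venues` are category names, so no label matches, every product becomes NaN, and `sum` skips NaN and returns 0.0. The toy data below is made up to show the effect and one way around it.

```python
import pandas as pd

# Toy stand-in for `venues`: share of each category per neighbourhood
venues_toy = pd.DataFrame(
    {"Park": [0.5, 0.0], "Pub": [0.5, 1.0]},
    index=["A", "B"],
)

# Weights indexed by position, like `nhp`: no label matches a column name
weights_by_position = pd.Series([3, 4], index=[1, 2])
misaligned = (venues_toy * weights_by_position).sum(axis=1)

# Weights indexed by category name: labels line up with the columns
weights_by_name = pd.Series([3, 4], index=venues_toy.columns)
scores = (venues_toy * weights_by_name).sum(axis=1) / weights_by_name.sum()

print(misaligned)  # every entry is 0.0 because all products were NaN
print(scores)
```

If this is indeed what happens, the order of the five neighbourhoods in the output would not reflect the profile, so the ranking deserves a second look before relying on it.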

The five neighbourhoods above are the ones that best fit our needs! This means the chosen one is "York Mills, Silver Hills"! Let's take a look at the map and see where it is.

In [155]:
new_home = 'York Mills, Silver Hills'

# Look up the coordinates of the recommended neighbourhood with the geocoder created here
geoloc = Nominatim(user_agent="toronto_explorer")
loc = geoloc.geocode(new_home)
nh_lat = loc.latitude
nh_long = loc.longitude

In [156]:
import folium

# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(
    toronto_df['Latitude'],
    toronto_df['Longitude'],
    toronto_df['Borough'],
    toronto_df['Neighbourhood']):
    
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(toronto_map)  
    
folium.CircleMarker(
        [43.753259,-79.329656],
        radius=7,
        popup='Parkwoods - Starting Home',
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        ).add_to(toronto_map)

folium.CircleMarker(
        [nh_lat,nh_long],
        radius=8,
        popup=new_home,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        ).add_to(toronto_map)

toronto_map

Curiously enough, it doesn't look like we moved far from where we started. Since I haven't set any 'distance from start' criterion, this may well happen.
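A 'distance from start' criterion could be based on the great-circle distance between two coordinate pairs. The sketch below implements the haversine formula with a mean Earth radius of 6371 km; `haversine_km` is a hypothetical helper, and Victoria Village is used as the second point only because its coordinates appear in the table above.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


start = (43.753259, -79.329656)   # Parkwoods, our starting home
other = (43.725882, -79.315572)   # Victoria Village, from the table above

distance = haversine_km(*start, *other)
print(round(distance, 2), "km")
```

Neighbourhoods closer than some chosen threshold could then be dropped from the ranking before picking the top 5.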

## 5. Conclusions

In this project we went through a simple recommendation system. We worked with the Toronto dataset to explore its neighbourhoods and, with the Foursquare API, we were able to identify the nearby venues for all of them. We then fed our profile and preferences into the recommendation system and, finally, figured out which location would be the most suitable for us.

Unfortunately, there were some numerical issues that I couldn't figure out, so proceed with caution if you decide to use this project as a model for your own recommendation system. Please note that, out of more than 270 venue categories, only 13 were rated, and this kind of system is highly dependent on how many inputs it has. So next time, it might be interesting to rate all 270 or so categories and see what changes.

Either way, hopefully, we'll be very happy living in York Mills!