## Recommender System 

The stakeholder wants us to build a recommendation engine that suggests places for travel, food, adventure, parks and similar activities, based on an analysis of neighbourhoods and on the ratings a user has given to previously visited spots.


### Table of contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#ref1">Introduction </a></li>
        <li><a href="#ref2">Data</a></li>
        <ul>
         <li><a href="#ref3">Acquiring the Data</a></li>
        <li><a href="#ref4">Preprocessing</a></li>
        </ul>
        <li><a href="#ref5">Methodology</a></li>
            <ul>
          <li><a href="#ref6">Content-Based Filtering</a></li>
        </ul>
        <li><a href="#ref7">Results</a></li>
        <li><a href="#ref8">Discussion section</a></li>
        <li><a href="#ref9">Conclusion section</a></li>
    </ol>
</div>
<br>

# 1. Introduction 

We have to build a recommender system that recommends travel locations to a tourist based on their previous ratings.
A tourist who travels to a new location usually looks for the best spots to explore nearby. With this in mind, we recommend a neighbourhood together with venues the tourist can visit there.

# 2. Data

### 2.1  Acquiring the data
1. We will be working with the neighbourhoods of Toronto, so we first need this data. We will get it from a Wikipedia page.  
       1.1 Based on these neighbourhoods, we will get all the nearby venues using the Foursquare API.
       1.2 We will also get the ratings of these venues using the Foursquare API.
2. We will need the user's previous ratings, for which we will take some demo values from the dataframe built above.

### 2.2  Preprocessing

#### Importing libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
from bs4 import BeautifulSoup # library to parse HTML
import requests # library to handle requests
import json # library to handle JSON files
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import since pandas 1.0)

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# !conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
import geopy.geocoders # convert an address into latitude and longitude values

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

print('Libraries are imported.')

Libraries are imported.


#### We are building the recommender system for Toronto for now; it can later be scaled up to other places

In [2]:
# getting all the neighbourhood data of Toronto
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url,'lxml')
#print(soup.prettify())
My_table = soup.find('table',{'class':'wikitable sortable'})
table1=""
for tr in My_table.find_all('tr'):
    row1=""
    for tds in tr.find_all('td'):
        row1=row1+","+tds.text
    table1=table1+row1[1:]
print(table1[:300]) #printing first 300 characters

M1A,Not assigned,Not assigned
M2A,Not assigned,Not assigned
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,Harbourfront
M5A,Downtown Toronto,Regent Park
M6A,North York,Lawrence Heights
M6A,North York,Lawrence Manor
M7A,Queen's Park,Not assigned
M8A,Not assigned,Not ass


#### Writing our data to a .csv file for further use

In [8]:
# the with-block closes the file, so all bytes are flushed before it is read back
with open("toronto.csv","wb") as file:
    n_bytes = file.write(bytes(table1,encoding="ascii",errors="ignore"))
n_bytes # number of bytes written

8773

#### Converting into a dataframe and assigning column names

In [9]:
import pandas as pd
df = pd.read_csv('toronto.csv',header=None)
df.columns=["Postalcode","Borough","Neighbourhood"]
df.head(10)

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned
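
The round trip through `toronto.csv` is convenient for later reuse, but it is not strictly required. As a minimal sketch, assuming the scraped text is a comma-separated string shaped like `table1`, pandas can read it straight from memory through `io.StringIO`. The `sample_text` string and the `df_sample` name are illustrative only, and the sample is a three-row excerpt rather than the full table.

```python
import io

import pandas as pd

# Hypothetical excerpt in the same shape as table1 (one row per line).
sample_text = (
    "M1A,Not assigned,Not assigned\n"
    "M3A,North York,Parkwoods\n"
    "M5A,Downtown Toronto,Harbourfront\n"
)

# Read the in-memory text and name the columns in a single call.
df_sample = pd.read_csv(
    io.StringIO(sample_text),
    header=None,
    names=["Postalcode", "Borough", "Neighbourhood"],
)
print(df_sample)
```

Passing `names=` together with `header=None` replaces the separate `df.columns = [...]` assignment.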


### Data Cleaning


#### Only process the cells that have an assigned borough: drop every row where the borough is "Not assigned"

In [11]:
# Get names of indexes for which column Borough has value "Not assigned"
indexNames = df[ df['Borough'] =='Not assigned'].index
# Delete these row indexes from dataFrame
df.drop(indexNames , inplace=True)

#### If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough


In [12]:
df.loc[df['Neighbourhood'] =='Not assigned' , 'Neighbourhood'] = df['Borough']

#### Rows with the same postal code are combined into one row, with the neighbourhoods separated by a comma

In [13]:
result = df.groupby(['Postalcode','Borough'], sort=False).agg( ', '.join)
df_new=result.reset_index()
df_new.head(15)

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


In [14]:
df_new.shape

(103, 3)

#### We will be using a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data


In [15]:
!wget -q -O 'Toronto_long_lat_data.csv'  http://cocl.us/Geospatial_data
df_lon_lat = pd.read_csv('Toronto_long_lat_data.csv')
df_lon_lat.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
df_lon_lat.columns=['Postalcode','Latitude','Longitude']

In [17]:
df_toronto = pd.merge(df_new,
                 df_lon_lat[['Postalcode','Latitude', 'Longitude']],
                 on='Postalcode')
df_toronto.head(10)

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
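
`pd.merge` defaults to an inner join, so a postal code without coordinates would be dropped silently. As a hedged sketch, a left join with `indicator=True` makes such rows visible. The frames below are toy data, and `M0X` is a made-up postal code used only to trigger the unmatched case.

```python
import pandas as pd

# Toy inputs; "M0X" is a made-up code that has no coordinates.
left = pd.DataFrame({
    "Postalcode": ["M3A", "M4A", "M0X"],
    "Borough": ["North York", "North York", "Nowhere"],
})
right = pd.DataFrame({
    "Postalcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every row of `left`; `indicator=True` adds a `_merge`
# column that records whether a match was found in `right`.
checked = pd.merge(left, right, on="Postalcode", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postalcode"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that every postal code received coordinates.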


### Create a Map of Toronto City (with its Postal Codes' Regions)

In [18]:
from geopy.geocoders import Nominatim
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude_toronto, longitude_toronto))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [19]:
# hard-coded map centre for Toronto (the geocoded values from the previous cell are not used here)
toronto_latitude = 43.6932
toronto_longitude = -79.3832
map_toronto = folium.Map(location = [toronto_latitude, toronto_longitude], zoom_start = 11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    

map_toronto

# 3. Methodology

### Since our API calls are limited, we will focus on a particular borough for now

In [20]:
North_York_data = df_toronto[df_toronto['Borough'] == 'North York']
North_York_data = North_York_data.reset_index()
North_York_data.drop('index', axis=1, inplace=True)
North_York_data

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
3,M3B,North York,Don Mills North,43.745906,-79.352188
4,M6B,North York,Glencairn,43.709577,-79.445073
5,M3C,North York,"Flemingdon Park, Don Mills South",43.7259,-79.340923
6,M2H,North York,Hillcrest Village,43.803762,-79.363452
7,M3H,North York,"Bathurst Manor, Downsview North, Wilson Heights",43.754328,-79.442259
8,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556
9,M3J,North York,"Northwood Park, York University",43.76798,-79.487262


#### Define Foursquare Credentials and Version


In [21]:
# @hidden_cell
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '' # Foursquare API version
# defining radius and limit of venues to get
radius=500
LIMIT=100
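
To keep the credentials out of the notebook, they can be read from environment variables instead of being typed into the cell. This is a minimal sketch only: the function name `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_VERSION` are assumptions, not names defined by Foursquare or by this notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret, version) read from a mapping.

    The variable names are hypothetical; use whatever names you export.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID", "")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET", "")
    version = env.get("FOURSQUARE_VERSION", "")
    if not client_id or not client_secret:
        raise ValueError("Foursquare credentials are not set")
    return client_id, client_secret, version


# Demo with a plain dict standing in for the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "my-id",          # placeholder value
    "FOURSQUARE_CLIENT_SECRET": "my-secret",  # placeholder value
    "FOURSQUARE_VERSION": "20180605",         # placeholder date string
}
CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials(demo_env)
```

Calling the function without arguments reads from the real process environment and fails early if the values are missing.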

## Getting nearby venues 

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'],
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'id',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
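
The function above indexes `categories[0]` and `groups[0]` directly, which raises an error if a venue has no category or the response has no groups. The sketch below shows one defensive way to flatten a payload with the same nesting. The helper `flatten_explore_items`, the `"Uncategorised"` label and the sample payload are all illustrative assumptions; the second venue is made up.

```python
def flatten_explore_items(name, lat, lng, payload):
    """Turn one explore-style payload into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a label instead of raising IndexError.
        category = categories[0]["name"] if categories else "Uncategorised"
        rows.append((
            name, lat, lng,
            venue["name"], venue["id"],
            venue["location"]["lat"], venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical payload with the nesting that getNearbyVenues relies on.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Brookbanks Park",
                   "id": "4e8d9dcdd5fbbbb6b3003c7b",
                   "location": {"lat": 43.751976, "lng": -79.33214},
                   "categories": [{"name": "Park"}]}},
        {"venue": {"name": "Example Venue",   # made-up venue
                   "id": "example-id",
                   "location": {"lat": 43.75, "lng": -79.33},
                   "categories": []}},
    ]}]}
}

rows = flatten_explore_items("Parkwoods", 43.753259, -79.329656, sample_payload)
for row in rows:
    print(row)
```

Separating the parsing from the HTTP request also makes it possible to test the parsing without spending API calls.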

In [23]:
toronto_venues = getNearbyVenues(names=North_York_data['Neighbourhood'],
                                   latitudes=North_York_data['Latitude'],
                                   longitudes=North_York_data['Longitude']
                                  )


In [24]:
toronto_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,id,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,4e8d9dcdd5fbbbb6b3003c7b,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,KFC,4e6696b6d16433b9ffff47c3,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,4cb11e2075ebb60cd1c4caad,43.751974,-79.333114,Food & Drink Shop
3,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,53622a89498ed84d6853265e,43.755402,-79.333741,Bus Stop
4,Victoria Village,43.725882,-79.315572,Victoria Village Arena,4c633acb86b6be9a61268e34,43.723481,-79.315635,Hockey Arena


### Getting ratings of these venues

In [25]:
rating_list=[]
result_list=[]
# request the details of every venue id (one API call per venue)
for i in toronto_venues['id']:
    url1='https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(i, CLIENT_ID, CLIENT_SECRET, VERSION)
    results1 = requests.get(url1).json()

    result_list.append(results1)
    try:
        rating_list.append(results1['response']['venue']['rating'])
    except KeyError:
        # the venue has no rating, or the response carried no venue details
        rating_list.append("not rated")
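
Storing the string `"not rated"` next to numeric ratings turns the column into a mixed-type column. As a sketch of an alternative, the rating can be extracted with chained `dict.get` calls and returned as `None` when it is absent. The helper `extract_rating` and the three sample payloads are hypothetical; they only mimic the `response` → `venue` → `rating` nesting used above.

```python
def extract_rating(payload):
    """Return the venue rating as a float, or None when it is absent."""
    rating = payload.get("response", {}).get("venue", {}).get("rating")
    return float(rating) if rating is not None else None


# Hypothetical payloads: a rated venue, an unrated venue, a failed call.
rated = {"response": {"venue": {"rating": 7.4}}}
unrated = {"response": {"venue": {}}}
failed = {"meta": {"code": 429}, "response": {}}

ratings = [extract_rating(p) for p in (rated, unrated, failed)]
print(ratings)
```

A list of floats and `None` values converts to a numeric pandas column with `NaN` for the missing entries.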

In [26]:
ratings_df = pd.DataFrame(rating_list)

In [27]:
ratings_df.columns=['ratings']

### Saving to CSV, just in case, as the number of API calls is limited

In [28]:
ratings_df.to_csv('ratings.csv')

### Add the ratings column to our toronto_venues dataframe

In [29]:
df_ra = pd.read_csv('ratings.csv')

df_ra.head(10)

Unnamed: 0.1,Unnamed: 0,ratings
0,0,7.0
1,1,6.0
2,2,not rated
3,3,not rated
4,4,7.4
5,5,6.3
6,6,6.2
7,7,not rated
8,8,not rated
9,9,7.8


In [30]:
#toronto_venues['rating']=rating_list
toronto_venues['rating']=df_ra['ratings']


In [31]:
toronto_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,id,Venue Latitude,Venue Longitude,Venue Category,rating
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,4e8d9dcdd5fbbbb6b3003c7b,43.751976,-79.33214,Park,7.0
1,Parkwoods,43.753259,-79.329656,KFC,4e6696b6d16433b9ffff47c3,43.754387,-79.333021,Fast Food Restaurant,6.0
2,Parkwoods,43.753259,-79.329656,Variety Store,4cb11e2075ebb60cd1c4caad,43.751974,-79.333114,Food & Drink Shop,not rated
3,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,53622a89498ed84d6853265e,43.755402,-79.333741,Bus Stop,not rated
4,Victoria Village,43.725882,-79.315572,Victoria Village Arena,4c633acb86b6be9a61268e34,43.723481,-79.315635,Hockey Arena,7.4


#### Removing rows where the venue is not rated

In [32]:
# filter and renumber in one step; this avoids modifying a slice in place
final_data = toronto_venues[toronto_venues['rating'] != 'not rated'].reset_index(drop=True)
final_data

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,id,Venue Latitude,Venue Longitude,Venue Category,rating
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,4e8d9dcdd5fbbbb6b3003c7b,43.751976,-79.33214,Park,7.0
1,Parkwoods,43.753259,-79.329656,KFC,4e6696b6d16433b9ffff47c3,43.754387,-79.333021,Fast Food Restaurant,6.0
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,4c633acb86b6be9a61268e34,43.723481,-79.315635,Hockey Arena,7.4
3,Victoria Village,43.725882,-79.315572,Portugril,4f3ecce6e4b0587016b6f30d,43.725819,-79.312785,Portuguese Restaurant,6.3
4,Victoria Village,43.725882,-79.315572,Tim Hortons,4bbe904a85fbb713420d7167,43.725517,-79.313103,Coffee Shop,6.2
5,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,Roots,4b16e8b6f964a52051bf23e3,43.718476,-79.466869,Boutique,7.8
6,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,Lac Vien Vietnamese Restaurant,4ccc5aebee23a14370591ea8,43.721259,-79.468472,Vietnamese Restaurant,7.9
7,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,Kitchen Stuff Plus (Clearance Outlet),4b12e300f964a520299023e3,43.719096,-79.462675,Furniture / Home Store,7.0
8,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,Orfus Road Shopping Outlets,4bf5b47594b2a593c623acee,43.719045,-79.460849,Clothing Store,7.0
9,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,Tim Hortons,4fa4ae40e4b02443361f27c4,43.719427,-79.467995,Coffee Shop,6.7
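
After the round trip through `ratings.csv`, the ratings are stored as text, so they cannot be averaged or used as weights directly. A minimal sketch of one way to get numeric ratings is `pd.to_numeric` with `errors="coerce"`, which turns `"not rated"` into `NaN`. The frame `toy_venues` is toy data.

```python
import pandas as pd

# Toy frame with ratings stored as text, as read back from a CSV file.
toy_venues = pd.DataFrame({
    "Venue": ["Brookbanks Park", "KFC", "Variety Store"],
    "rating": ["7.0", "6.0", "not rated"],
})

# Non-numeric entries such as "not rated" become NaN.
toy_venues["rating"] = pd.to_numeric(toy_venues["rating"], errors="coerce")

# Drop the unrated venues and renumber the rows.
rated_only = toy_venues.dropna(subset=["rating"]).reset_index(drop=True)
print(rated_only)
```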


##  Analysing Each Neighborhood

In [33]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# column order with the neighbourhood first (computed but not applied, so 'Neighbourhood' stays the last column below)
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot.head()

Unnamed: 0,Accessories Store,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Bike Shop,Boutique,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Stop,Butcher,Cafeteria,Café,Candy Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Electronics Store,Empanada Restaurant,Event Space,Fast Food Restaurant,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hockey Arena,Home Service,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Luggage Store,Massage Studio,Mediterranean Restaurant,Metro Station,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Moving Target,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Shoe Store,Shopping Mall,Smoothie Shop,Spa,Sporting Goods Shop,Steakhouse,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store,Neighbourhood
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Victoria Village


In [34]:
Neig_grouped=toronto_onehot.groupby(['Neighbourhood'], sort=False).sum()
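
The one-hot encoding followed by `groupby(...).sum()` counts how many venues of each category a neighbourhood has. The sketch below shows this on four toy rows (the counts are made up) and compares it with `pd.crosstab`, which produces the same counts in one call.

```python
import pandas as pd

# Toy venue list; the categories and counts are illustrative.
toy = pd.DataFrame({
    "Neighbourhood": ["Parkwoods", "Parkwoods",
                      "Victoria Village", "Victoria Village"],
    "Venue Category": ["Park", "Bus Stop", "Coffee Shop", "Coffee Shop"],
})

# One 0/1 column per category, then sum the rows of each neighbourhood.
onehot = pd.get_dummies(toy["Venue Category"], dtype=int)
onehot["Neighbourhood"] = toy["Neighbourhood"]
counts = onehot.groupby("Neighbourhood", sort=False).sum()

# pd.crosstab gives the same counts (rows sorted by neighbourhood name).
counts_ct = pd.crosstab(toy["Neighbourhood"], toy["Venue Category"])
print(counts)
```

Note that `crosstab` sorts its rows, whereas `groupby(..., sort=False)` keeps the order of first appearance.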

## This table shows which venue categories each neighbourhood has; it will be used for recommendation later

In [35]:
Neig_grouped

Neighbourhood,Accessories Store,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Bike Shop,Boutique,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Stop,Butcher,Cafeteria,Café,Candy Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Electronics Store,Empanada Restaurant,Event Space,Fast Food Restaurant,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hockey Arena,Home Service,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Luggage Store,Massage Studio,Mediterranean Restaurant,Metro Station,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Moving Target,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Shoe Store,Shopping Mall,Smoothie Shop,Spa,Sporting Goods Shop,Steakhouse,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"Lawrence Heights, Lawrence Manor",1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0
Don Mills North,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Glencairn,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"Flemingdon Park, Don Mills South",0,0,0,0,2,0,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,1,0,0,1,1,2,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,2,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0
Hillcrest Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"Bathurst Manor, Downsview North, Wilson Heights",0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0
"Fairview, Henry Farm, Oriole",0,0,1,0,1,0,2,1,0,1,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,11,4,0,0,1,1,2,0,0,0,0,1,0,0,6,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,3,1,0,1,1,1,1,1,0,0,1,0,1,1,1,0,0,1,1
"Northwood Park, York University",0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [36]:
Neig_grouped.shape

(24, 104)

<a id="ref6"></a>
# Content-Based recommendation system

Now, let's take a look at how to implement __Content-Based__ or __Item-Item recommendation systems__. This technique attempts to figure out which aspects of an item a user likes most, and then recommends items that share those aspects. In our case, we will try to infer the input user's favourite venue categories from the places they have visited and the ratings they gave.
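
The idea can be illustrated on toy data. The sketch below is one common way to do content-based scoring and is not necessarily the exact method used later in this notebook: it builds a user profile from rating-weighted categories and scores each neighbourhood by how well its venue mix matches that profile. All names and numbers (`user_venues`, `neighbourhoods`, the ratings, the neighbourhoods `A` and `B`) are made up.

```python
import pandas as pd

# Toy user history: one row per rated venue (values are made up).
user_venues = pd.DataFrame({
    "Venue Category": ["Park", "Park", "Coffee Shop"],
    "rating": [9.0, 8.0, 6.0],
})

# Toy candidate matrix: venue counts per neighbourhood and category.
neighbourhoods = pd.DataFrame(
    {"Park": [2, 0], "Coffee Shop": [1, 3], "Bank": [0, 1]},
    index=["A", "B"],
)

# 1. User profile: total rating per category, normalised to sum to 1.
profile = user_venues.groupby("Venue Category")["rating"].sum()
profile = profile / profile.sum()

# 2. Align the profile with the candidate columns (unseen categories -> 0).
profile = profile.reindex(neighbourhoods.columns, fill_value=0.0)

# 3. Score each neighbourhood by the profile-weighted share of its venues.
shares = neighbourhoods.div(neighbourhoods.sum(axis=1), axis=0)
scores = shares.dot(profile).sort_values(ascending=False)
print(scores)
```

Dividing each row by its total means a neighbourhood is not favoured merely for having more venues; whether that normalisation is desirable is a design choice.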

Let's begin by creating an input user to recommend places to:

## Getting user data (creating dummy data for the user)

In [37]:
Central_toronto_data = df_toronto[df_toronto['Borough'] == 'Central Toronto']
Central_toronto_data = Central_toronto_data.reset_index()
Central_toronto_data.drop('index', axis=1, inplace=True)
Central_toronto_data

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M5N,Central Toronto,Roselawn,43.711695,-79.416936
2,M4P,Central Toronto,Davisville North,43.712751,-79.390197
3,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.696948,-79.411307
4,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
5,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678
6,M4S,Central Toronto,Davisville,43.704324,-79.38879
7,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
8,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049


#### Get user-rated venues

In [38]:
user_rated_venues = getNearbyVenues(names=Central_toronto_data['Neighbourhood'],
                                   latitudes=Central_toronto_data['Latitude'],
                                   longitudes=Central_toronto_data['Longitude']
                                  )

In [39]:
user_rated_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,id,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,50e6da19e4b0d8a78a0e9794,43.726963,-79.394382,Park
1,Lawrence Park,43.72802,-79.38879,Dim Sum Deluxe,57813d57498e0991ffe4b720,43.726953,-79.39426,Dim Sum Restaurant
2,Lawrence Park,43.72802,-79.38879,Zodiac Swim School,5082ef77e4b0a7491cf7b022,43.728532,-79.38286,Swim School
3,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,50ed9da8e4b081eabee12672,43.728026,-79.382805,Bus Line
4,Roselawn,43.711695,-79.416936,Rosalind's Garden Oasis,4e6e176c45dd293273b74e3c,43.712189,-79.411978,Garden


In [40]:
rating_list=[]
result_list=[]
# request the details of every venue id (one API call per venue)
for i in user_rated_venues['id']:
    url1='https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(i, CLIENT_ID, CLIENT_SECRET, VERSION)
    results1 = requests.get(url1).json()

    result_list.append(results1)
    try:
        rating_list.append(results1['response']['venue']['rating'])
    except KeyError:
        # the venue has no rating, or the response carried no venue details
        rating_list.append("not rated")

In [41]:
user_rated_venues['rating']=rating_list

In [42]:
# filter and renumber in one step; this avoids modifying a slice in place
final_user_rated = user_rated_venues[user_rated_venues['rating'] != 'not rated'].reset_index(drop=True)

In [43]:
final_user_rated.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,id,Venue Latitude,Venue Longitude,Venue Category,rating
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,50e6da19e4b0d8a78a0e9794,43.726963,-79.394382,Park,8.5
1,Davisville North,43.712751,-79.390197,Sherwood Park,4ba011c2f964a5204a5737e3,43.716551,-79.387776,Park,9.2
2,Davisville North,43.712751,-79.390197,Summerhill Market North,4e8e73c30cd6209590ae7be4,43.715499,-79.392881,Food & Drink Shop,8.0
3,Davisville North,43.712751,-79.390197,Homeway Restaurant & Brunch,4adb2fd3f964a520c42421e3,43.712641,-79.391557,Breakfast Spot,6.8
4,Davisville North,43.712751,-79.390197,Best Western Roehampton Hotel & Suites,4b7810c3f964a52030b42ee3,43.708878,-79.39088,Hotel,6.4


In [44]:
# one hot encoding
final_user_rated_onehot = pd.get_dummies(final_user_rated[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
final_user_rated_onehot['Neighbourhood'] = final_user_rated['Neighbourhood'] 

# column order with the neighbourhood first (computed but not applied, so 'Neighbourhood' stays the last column below)
fixed_columns = [final_user_rated_onehot.columns[-1]] + list(final_user_rated_onehot.columns[:-1])
final_user_rated_onehot.head()

Unnamed: 0,American Restaurant,BBQ Joint,Bagel Shop,Breakfast Spot,Brewery,Burger Joint,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dessert Shop,Diner,Farmers Market,Fast Food Restaurant,Food & Drink Shop,Fried Chicken Joint,Gourmet Shop,Greek Restaurant,Gym,History Museum,Hotel,Indian Restaurant,Italian Restaurant,Jewish Restaurant,Liquor Store,Mexican Restaurant,Park,Pharmacy,Pizza Place,Pub,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Spa,Sporting Goods Shop,Sports Bar,Supermarket,Sushi Restaurant,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio,Neighbourhood
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Lawrence Park
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Davisville North
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Davisville North
3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Davisville North
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Davisville North


In [45]:
final_grouped=final_user_rated_onehot.groupby(['Neighbourhood'], sort=False).sum()
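
The grouped user matrix and `Neig_grouped` do not share the same columns: the user side has 47 categories, the North York side has 104, and some categories such as `Trail` appear on one side only. Before the two can be compared, they need a common set of columns. A minimal sketch with toy frames, assuming missing categories should count as zero:

```python
import pandas as pd

# Toy stand-ins for the user matrix and the neighbourhood matrix.
user_side = pd.DataFrame({"Park": [1], "Trail": [1]}, index=["user"])
venue_side = pd.DataFrame(
    {"Park": [1, 0], "Bank": [0, 1]}, index=["Parkwoods", "Glencairn"]
)

# Union of all category names, in a stable sorted order.
all_categories = sorted(set(user_side.columns) | set(venue_side.columns))

# Reindex both frames onto the shared columns; missing categories become 0.
user_aligned = user_side.reindex(columns=all_categories, fill_value=0)
venue_aligned = venue_side.reindex(columns=all_categories, fill_value=0)
print(user_aligned)
print(venue_aligned)
```

Using the intersection of the columns instead of the union would also work, at the cost of discarding categories that only one side knows about.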

In [46]:
final_grouped.head(10)

Neighbourhood,American Restaurant,BBQ Joint,Bagel Shop,Breakfast Spot,Brewery,Burger Joint,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dessert Shop,Diner,Farmers Market,Fast Food Restaurant,Food & Drink Shop,Fried Chicken Joint,Gourmet Shop,Greek Restaurant,Gym,History Museum,Hotel,Indian Restaurant,Italian Restaurant,Jewish Restaurant,Liquor Store,Mexican Restaurant,Park,Pharmacy,Pizza Place,Pub,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Spa,Sporting Goods Shop,Sports Bar,Supermarket,Sushi Restaurant,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
Lawrence Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Davisville North,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
"Forest Hill North, Forest Hill West",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
North Toronto West,0,0,0,0,0,0,0,1,1,2,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,1,2,0,0,0,0,0,0,0,0,1
"The Annex, North Midtown, Yorkville",1,1,0,0,0,1,3,0,0,3,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,1,1,0,1,1,2,1,0,0,3,0,0,0,0,0,0,0,0,0,1,0,0
Davisville,0,0,0,0,1,1,2,0,0,2,0,0,3,1,1,0,0,1,1,1,1,0,0,1,2,0,0,0,1,1,2,0,1,0,3,2,0,0,0,0,2,1,1,0,0,0,0
"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",1,0,1,0,0,0,0,0,0,2,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,0,1,1,1,0,0,0,0,1,0


### User matrix used for recommendation

In [47]:
#Resetting the index to avoid future issues
usermatrix = final_grouped.reset_index(drop=True)
usermatrix

Unnamed: 0,American Restaurant,BBQ Joint,Bagel Shop,Breakfast Spot,Brewery,Burger Joint,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dessert Shop,Diner,Farmers Market,Fast Food Restaurant,Food & Drink Shop,Fried Chicken Joint,Gourmet Shop,Greek Restaurant,Gym,History Museum,Hotel,Indian Restaurant,Italian Restaurant,Jewish Restaurant,Liquor Store,Mexican Restaurant,Park,Pharmacy,Pizza Place,Pub,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Spa,Sporting Goods Shop,Sports Bar,Supermarket,Sushi Restaurant,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
3,0,0,0,0,0,0,0,1,1,2,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,1,2,0,0,0,0,0,0,0,0,1
4,1,1,0,0,0,1,3,0,0,3,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,1,1,0,1,1,2,1,0,0,3,0,0,0,0,0,0,0,0,0,1,0,0
5,0,0,0,0,1,1,2,0,0,2,0,0,3,1,1,0,0,1,1,1,1,0,0,1,2,0,0,0,1,1,2,0,1,0,3,2,0,0,0,0,2,1,1,0,0,0,0
6,1,0,1,0,0,0,0,0,0,2,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,0,1,1,1,0,0,0,0,1,0
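
As a rough illustration of how a matrix like the one above comes together, the sketch below one-hot encodes a few venue categories, sums them per neighbourhood, and drops the neighbourhood index. The toy rows, the `toy_*` names, and the use of `pd.get_dummies` are assumptions for illustration; the notebook's own encoding step may differ.

```python
import pandas as pd

# Hypothetical toy data: one row per rated venue (not the notebook's real data).
toy_venues = pd.DataFrame({
    "Neighbourhood": ["Davisville North", "Lawrence Park",
                      "Davisville North", "Davisville North"],
    "Venue Category": ["Park", "Park", "Hotel", "Park"],
})

# One-hot encode the category, then attach the neighbourhood label again.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
toy_onehot["Neighbourhood"] = toy_venues["Neighbourhood"]

# sort=False keeps neighbourhoods in order of first appearance.
toy_grouped = toy_onehot.groupby("Neighbourhood", sort=False).sum()

# Drop the neighbourhood labels so rows are addressed by position only.
toy_usermatrix = toy_grouped.reset_index(drop=True)
print(toy_usermatrix)
```

Because `sort=False` preserves first-appearance order, row `i` of this matrix and row `i` of any ratings table grouped the same way refer to the same neighbourhood.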


### Getting user ratings

In [48]:
rating_df=final_user_rated[['Neighbourhood','rating']]

In [49]:
rating_df.head(15)

Unnamed: 0,Neighbourhood,rating
0,Lawrence Park,8.5
1,Davisville North,9.2
2,Davisville North,8.0
3,Davisville North,6.8
4,Davisville North,6.4
5,Davisville North,6.0
6,Davisville North,5.8
7,"Forest Hill North, Forest Hill West",8.2
8,North Toronto West,7.9
9,North Toronto West,7.7


In [50]:
rating_df.dtypes

Neighbourhood    object
rating           object
dtype: object

In [51]:
# Take an explicit copy first so the assignment does not raise SettingWithCopyWarning
rating_df = rating_df.copy()
rating_df['rating'] = rating_df['rating'].astype('float')


In [52]:
rating_df.dtypes

Neighbourhood     object
rating           float64
dtype: object

In [53]:
rating_grouped=rating_df.groupby('Neighbourhood', sort=False)['rating'].mean()

In [54]:
rating_df_new=pd.DataFrame(rating_grouped)

In [55]:
rating_df_new=rating_df_new.reset_index()

In [56]:
rating_df_new['rating']

0    8.500000
1    7.033333
2    8.200000
3    6.900000
4    7.326087
5    7.031250
6    6.561538
Name: rating, dtype: float64
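
The aggregation above can be checked by hand on the first few rows of `rating_df`. The sketch below is a minimal reproduction, assuming the ratings arrive as strings (matching the `object` dtype shown earlier); `sample_ratings` and `mean_rating` are hypothetical names, and `pd.to_numeric` is used here as one alternative to `astype('float')`.

```python
import pandas as pd

# First rows of the ratings shown above, stored as strings to mimic the object dtype.
sample_ratings = pd.DataFrame({
    "Neighbourhood": ["Lawrence Park"]
                     + ["Davisville North"] * 6
                     + ["Forest Hill North, Forest Hill West"],
    "rating": ["8.5", "9.2", "8.0", "6.8", "6.4", "6.0", "5.8", "8.2"],
})

# Convert the string ratings to floats before averaging.
sample_ratings["rating"] = pd.to_numeric(sample_ratings["rating"])

# sort=False keeps the neighbourhoods in order of first appearance.
mean_rating = (
    sample_ratings.groupby("Neighbourhood", sort=False)["rating"]
    .mean()
    .reset_index()
)
print(mean_rating)
```

Both this grouping and the one that builds the user matrix use `sort=False`, which is what keeps the two results aligned row by row.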

In [57]:
l1 = pd.DataFrame(rating_df_new['rating'])
l1.to_csv('Userratings_agg.csv')

## Building the user profile from usermatrix and the user's ratings

In [58]:
# Dot product to get the category weights
userProfile = usermatrix.transpose().dot(rating_df_new['rating'])
#The user profile
userProfile

American Restaurant              13.887625
BBQ Joint                         7.326087
Bagel Shop                        6.561538
Breakfast Spot                    7.033333
Brewery                           7.031250
Burger Joint                     21.390670
Café                             36.040761
Chinese Restaurant                6.900000
Clothing Store                    6.900000
Coffee Shop                      62.963838
Convenience Store                 6.561538
Cosmetics Shop                    7.326087
Dessert Shop                     27.993750
Diner                            13.931250
Farmers Market                    7.031250
Fast Food Restaurant              6.900000
Food & Drink Shop                 7.033333
Fried Chicken Joint              13.592788
Gourmet Shop                      7.031250
Greek Restaurant                  7.031250
Gym                               7.031250
History Museum                    7.326087
Hotel                             7.033333
Indian Rest

#### The profile above shows that the user leans most strongly toward coffee shops and cafés, along with parks and similar venues
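
To make the dot product concrete, here is a small worked sketch. Each category weight is the sum, over the rated neighbourhoods, of the venue count in that category multiplied by the neighbourhood's mean rating. All numbers and the `toy_*` names are hypothetical.

```python
import pandas as pd

# Hypothetical venue counts: rows are rated neighbourhoods, columns are categories.
toy_matrix = pd.DataFrame(
    {"Coffee Shop": [2, 3, 2], "Park": [1, 1, 0], "Trail": [0, 3, 2]}
)

# Hypothetical mean rating per neighbourhood, in the same row order as toy_matrix.
toy_mean_ratings = pd.Series([8.0, 7.0, 6.0])

# Transposing puts categories on the rows, so the dot product yields one weight per category.
toy_profile = toy_matrix.transpose().dot(toy_mean_ratings)
print(toy_profile)
# Coffee Shop: 2*8 + 3*7 + 2*6 = 49
# Park:        1*8 + 1*7 + 0*6 = 15
# Trail:       0*8 + 3*7 + 2*6 = 33
```

Categories that appear often in highly rated neighbourhoods end up with the largest weights, which is why frequent categories such as coffee shops dominate the profile.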

## Creating the recommendation table from the user profile and the full neighbourhood data

In [59]:
recommendationTable_df = ((Neig_grouped*userProfile).sum(axis=1))/(userProfile.sum())
recommendationTable_df.head()

Neighbourhood
Parkwoods                           0.069090
Victoria Village                    0.154879
Lawrence Heights, Lawrence Manor    0.164000
Don Mills North                     0.056820
Glencairn                           0.079363
dtype: float64
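
The score computed above is a weighted average: each neighbourhood's category values are weighted by the user profile and then normalised by the total profile weight. The sketch below illustrates this, assuming `Neig_grouped` holds per-neighbourhood category frequencies; the `toy_*` names, neighbourhood labels, and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical user profile (one weight per category).
toy_profile = pd.Series({"Coffee Shop": 6.0, "Park": 3.0, "Trail": 1.0})

# Hypothetical category frequencies per candidate neighbourhood (each row sums to 1).
toy_neigh = pd.DataFrame(
    {"Coffee Shop": [0.5, 0.0], "Park": [0.5, 0.0], "Trail": [0.0, 1.0]},
    index=["A", "B"],
)

# Multiplying aligns the Series with the DataFrame columns by category name.
toy_scores = (toy_neigh * toy_profile).sum(axis=1) / toy_profile.sum()

# Highest score first, as in the recommendation table.
toy_scores = toy_scores.sort_values(ascending=False)
print(toy_scores)
```

With frequencies that sum to 1 per row, the score stays between 0 and 1, and a neighbourhood made up entirely of the user's top category would score highest.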

In [60]:
#Sort our recommendations in descending order
recommendationTable_df = recommendationTable_df.sort_values(ascending=False)
#Just a peek at the values
top3=pd.DataFrame(recommendationTable_df)
recommendationTable_df.head()

Neighbourhood
Fairview, Henry Farm, Oriole                       0.771194
Willowdale South                                   0.734735
Bedford Park, Lawrence Manor East                  0.643655
Flemingdon Park, Don Mills South                   0.476793
Bathurst Manor, Downsview North, Wilson Heights    0.464521
dtype: float64

### We select the top three recommended neighbourhoods and then recommend specific venues to visit in each

In [61]:
# Select the top 3 neighbourhoods
top3=top3.reset_index()
top3.columns=['Neighbourhood','recommendation']
top3=top3.head(3)
top3

Unnamed: 0,Neighbourhood,recommendation
0,"Fairview, Henry Farm, Oriole",0.771194
1,Willowdale South,0.734735
2,"Bedford Park, Lawrence Manor East",0.643655


Now here's the recommendation table!

#### Recommendation for Fairview, Henry Farm, Oriole

In [62]:
f=final_data[final_data['Neighbourhood']=='Fairview, Henry Farm, Oriole']
f.sort_values(by=['rating'],ascending=False).head(3)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,id,Venue Latitude,Venue Longitude,Venue Category,rating
56,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,The LEGO Store,4e848fbb5c5c9240de8e6a80,43.778207,-79.343483,Toy / Game Store,8.0
57,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,CF Fairview Mall,4ada3af3f964a520482021e3,43.777674,-79.344402,Shopping Mall,7.7
58,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,Michel's Baguette,4bbaa0f17421a5937311c440,43.777082,-79.344557,Bakery,7.5


#### Recommendation for Willowdale South

In [63]:
f1=final_data[final_data['Neighbourhood']=='Willowdale South']
f1.sort_values(by=['rating'],ascending=False).head(3)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,id,Venue Latitude,Venue Longitude,Venue Category,rating
129,Willowdale South,43.77012,-79.408493,The Keg,5a35b4443abcaf37eb1a0d88,43.766579,-79.412131,Steakhouse,8.3
130,Willowdale South,43.77012,-79.408493,Starbucks,4aedfeadf964a52005d121e3,43.768192,-79.413021,Coffee Shop,7.9
132,Willowdale South,43.77012,-79.408493,Konjiki Ramen,5a02789d0a464d3112a58785,43.766998,-79.412222,Ramen Restaurant,7.9


#### Recommendation for Bedford Park, Lawrence Manor East

In [64]:
f2=final_data[final_data['Neighbourhood']=='Bedford Park, Lawrence Manor East']
f2.sort_values(by=['rating'],ascending=False).head(3)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,id,Venue Latitude,Venue Longitude,Venue Category,rating
105,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,Aroma Espresso Bar,502bb730e4b01590f997803d,43.735975,-79.420391,Café,8.4
106,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,Dickson Home Hardware,4c27a22a905a0f473e3b6560,43.735593,-79.420089,Hardware Store,8.1
108,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,The Copper Chimney,4d796616542ab1f75eb87c41,43.736195,-79.420271,Indian Restaurant,7.6


# 4. Results section

#### Based on the user's rating profile, the system recommends the best-matching neighbourhoods and the top-rated venues within each of them

# 5. Discussion section

### Advantages and Disadvantages of Content-Based Filtering

##### Advantages
* Learns the user's preferences
* Highly personalized for the user

##### Disadvantages
* Doesn't take into account what others think of the item, so low-quality recommendations might happen
* Extracting data is not always intuitive
* Determining what characteristics of the item the user dislikes or likes is not always obvious

# 6. Conclusion section

#### We conclude that this system has a lot of scope and can be applied in many different fields. We concentrated on particular neighbourhoods of Toronto because of the limited number of Foursquare API calls; the project can be scaled by purchasing more API calls from Foursquare.
#### Finally, I want to thank the IBM Applied Data Science Capstone instructors for making this possible.