# Destinations Recommender based on Content-Based Filtering

Collaborative filtering relies solely on user-item interactions within the utility matrix. The drawback of this approach is that brand-new users or items with no interactions are excluded from the recommendation system. This is called the "cold start" problem. Content-based filtering addresses it by generating recommendations from user and item features instead.

### Step 1: Import Dependencies

In [20]:
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt

### Step 2: Load the Data

In [21]:
destinations = pd.read_csv("sample_destinations.csv")
destinations.head()

Unnamed: 0,destination_id,title,genre,history,art_and_architecture,nature,adventure,entertainment,health_and_lifestyle,food,industries,religious
0,0,Boudhanath Stupa,['Religious Sites'],True,True,False,False,False,False,False,False,True
1,1,Phewa Tal (Fewa Lake),['Bodies of Water'],False,False,True,False,False,False,False,False,False
2,2,Sarangkot,['Mountains'],False,False,True,False,False,False,False,False,False
3,3,Swayambhunath Temple,['Religious Sites'],True,True,False,False,False,False,False,False,True
4,4,Poon Hill,['Mountains'],False,False,True,False,False,False,False,False,False


### Step 3: Data Cleaning and Exploration


In [22]:
# Convert the True/False genre flags (history through religious) to 1/0
converted = destinations.iloc[:, 3:12].astype(int)

In [23]:
# Drop 'genre' and the original boolean flag columns, keeping destination_id and title
destinations.drop(destinations.columns[2:12], axis=1, inplace=True)

In [24]:
destinations.head()

Unnamed: 0,destination_id,title
0,0,Boudhanath Stupa
1,1,Phewa Tal (Fewa Lake)
2,2,Sarangkot
3,3,Swayambhunath Temple
4,4,Poon Hill


In [25]:
#concat the converted and original dataframe 
new_df=pd.concat([destinations,converted],axis=1)
new_df

Unnamed: 0,destination_id,title,history,art_and_architecture,nature,adventure,entertainment,health_and_lifestyle,food,industries,religious
0,0,Boudhanath Stupa,1,1,0,0,0,0,0,0,1
1,1,Phewa Tal (Fewa Lake),0,0,1,0,0,0,0,0,0
2,2,Sarangkot,0,0,1,0,0,0,0,0,0
3,3,Swayambhunath Temple,1,1,0,0,0,0,0,0,1
4,4,Poon Hill,0,0,1,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
195,195,Aarya Ghat,1,1,1,0,0,0,0,0,0
196,196,Eternal Peace Flame,1,1,1,0,0,0,0,0,0
197,197,Tengboche Gompa,1,1,0,0,0,0,0,0,1
198,198,Bhat-Bhateni Supermarket and Departmental Store,0,0,0,0,1,0,0,0,0
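
The cells above pick columns by position (`iloc[:, 3:12]`, `columns[2:12]`), which depends on the CSV keeping its exact column order. As a minimal sketch, assuming the genre flags are the only boolean columns, the same kind of cleaned frame can be built by selecting on dtype and name instead. The two-row `raw` frame below is a hypothetical stand-in for `sample_destinations.csv`, trimmed to three flags for brevity.

```python
import pandas as pd

# Hypothetical two-row stand-in for sample_destinations.csv (only three genre flags shown)
raw = pd.DataFrame({
    "destination_id": [0, 1],
    "title": ["Boudhanath Stupa", "Phewa Tal (Fewa Lake)"],
    "genre": ["['Religious Sites']", "['Bodies of Water']"],
    "history": [True, False],
    "nature": [False, True],
    "religious": [True, False],
})

# Select the flag columns by dtype instead of by position
flag_cols = raw.select_dtypes(include="bool").columns.tolist()

# Drop the free-text genre column and cast every flag to 0/1 in one step
cleaned = raw.drop(columns=["genre"]).astype({col: int for col in flag_cols})
print(cleaned)
```

Because nothing is modified in place, `raw` stays intact and no separate `pd.concat` step is needed.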


#### How many destinations fall under each tourism genre?

In [26]:
new_df.iloc[:,2:12].sum()

history                 108
art_and_architecture    109
nature                   90
adventure                16
entertainment            13
health_and_lifestyle      4
food                      0
industries                2
religious                58
dtype: int64

These counts show that `art_and_architecture` (109) and `history` (108) are the most common tourism genres among the 200 destinations in Nepal, while `food` is the least common, with no destinations tagged at all.

#### Make a `destination_features` dataframe

In [27]:
#set 'destination_id' as index
new_df.set_index('destination_id',inplace=True)

In [28]:
new_df.head()

destination_id,title,history,art_and_architecture,nature,adventure,entertainment,health_and_lifestyle,food,industries,religious
0,Boudhanath Stupa,1,1,0,0,0,0,0,0,1
1,Phewa Tal (Fewa Lake),0,0,1,0,0,0,0,0,0
2,Sarangkot,0,0,1,0,0,0,0,0,0
3,Swayambhunath Temple,1,1,0,0,0,0,0,0,1
4,Poon Hill,0,0,1,0,0,0,0,0,0


In [29]:
# drop the 'title' column
new_df.drop(['title'],axis=1,inplace=True)

In [30]:
destination_features=new_df
destination_features.head()

destination_id,history,art_and_architecture,nature,adventure,entertainment,health_and_lifestyle,food,industries,religious
0,1,1,0,0,0,0,0,0,1
1,0,0,1,0,0,0,0,0,0
2,0,0,1,0,0,0,0,0,0
3,1,1,0,0,0,0,0,0,1
4,0,0,1,0,0,0,0,0,0


### Step 4: Building a "Similar Destinations" Recommender Using Cosine Similarity

We're going to build our item-item recommender using a similarity metric called [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity). 

Cosine similarity measures the cosine of the angle $\theta$ between two vectors (e.g., $A$ and $B$). The smaller the angle, the closer the cosine is to 1 and the higher the degree of similarity between $A$ and $B$. You can calculate the similarity between $A$ and $B$ with this equation:

$$\cos(\theta) = \frac{A\cdot B}{||A|| ||B||}$$
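
As a quick worked check of the formula, the sketch below computes it by hand with NumPy for three feature vectors copied from the table above (Boudhanath Stupa, Phewa Tal, and Aarya Ghat). The helper name `cosine` is made up for illustration and is not part of any library.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors (illustrative helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Feature order: history, art_and_architecture, nature, adventure, entertainment,
# health_and_lifestyle, food, industries, religious
boudhanath = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1])
phewa_tal = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0])
aarya_ghat = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0])

print(cosine(boudhanath, phewa_tal))   # no shared genres, so the dot product is 0
print(cosine(boudhanath, aarya_ghat))  # 2 shared genres: 2 / (sqrt(3) * sqrt(3))
```

Boudhanath Stupa and Aarya Ghat share two of their three genres, so their similarity works out to 2/3, while Boudhanath Stupa and Phewa Tal share none and score 0.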

In this tutorial, we're going to use scikit-learn's cosine similarity [function](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html) to generate a cosine similarity matrix of shape $(n_{\text{destinations}}, n_{\text{destinations}})$. With this cosine similarity matrix, we'll be able to extract the destinations that are most similar to the destination of interest.

In [31]:
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim = cosine_similarity(destination_features, destination_features)
print(f"Dimensions of our destination features cosine similarity matrix: {cosine_sim.shape}")

Dimensions of our destination features cosine similarity matrix: (200, 200)


As expected, after passing the `destination_features` dataframe into the `cosine_similarity()` function, we get a cosine similarity matrix of shape $(n_{\text{destinations}}, n_{\text{destinations}})$.

Because all of our features are non-negative, this matrix is populated with values between 0 and 1, which represent the degree of similarity between the destinations indexed by its rows and columns.
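
To see where these properties come from, here is a minimal NumPy sketch that builds the same kind of matrix by scaling each row to unit length and multiplying the result by its transpose. It is an illustration, not scikit-learn's actual implementation; the function name `cosine_matrix` is hypothetical, and mapping an all-zero row (a destination with no genre flags) to similarity 0 is a choice made here.

```python
import numpy as np

def cosine_matrix(features):
    """Pairwise cosine similarity between the rows of a 2-D array (illustrative)."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # assumption: all-zero rows get similarity 0 instead of NaN
    unit_rows = features / norms
    return unit_rows @ unit_rows.T

toy = np.array([
    [1, 1, 0, 1],  # three genres
    [0, 0, 1, 0],  # a single, different genre
    [1, 1, 0, 1],  # same genres as row 0
    [0, 0, 0, 0],  # no genres flagged
])
sim = cosine_matrix(toy)
print(sim.shape)
print(np.round(sim, 2))
```

The result is square and symmetric, rows with identical features score 1 against each other, and rows with no genre in common score 0.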

### Let's create a destination finder function

Use [fuzzywuzzy](https://pypi.org/project/fuzzywuzzy/) so the exact names of the destinations do not have to be typed.

In [32]:
from fuzzywuzzy import process

def destination_finder(title):
    all_titles = destinations['title'].tolist()
    closest_match = process.extractOne(title,all_titles)
    return closest_match[0]

Let's test this out with a misspelled query, 'bouddhanath'.

In [33]:
title = destination_finder('bouddhanath')
title

'Boudhanath Stupa'
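
If fuzzywuzzy is not available, a similar lookup can be approximated with the standard library's `difflib.get_close_matches`. This is a swapped-in technique with a different scoring rule, so it may rank close candidates differently from fuzzywuzzy. The function name `destination_finder_stdlib`, the five-title list, and the `cutoff` default are assumptions made for this sketch.

```python
from difflib import get_close_matches

def destination_finder_stdlib(title, all_titles, cutoff=0.6):
    """Return the closest title by difflib ratio, or None if nothing clears the cutoff."""
    # Compare case-insensitively, then map the match back to its original spelling
    lowered = {t.lower(): t for t in all_titles}
    matches = get_close_matches(title.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

# Hypothetical stand-in for destinations['title'].tolist()
titles = ["Boudhanath Stupa", "Phewa Tal (Fewa Lake)", "Sarangkot",
          "Swayambhunath Temple", "Poon Hill"]
print(destination_finder_stdlib("bouddhanath", titles))
```

Unlike `process.extractOne`, this version returns `None` when no title is similar enough, so callers should handle that case.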

To get relevant recommendations for Boudhanath Stupa, we need to find its index in the cosine similarity matrix. To identify which row we should be looking at, we can create a destination index mapper which maps a destination title to the index that it represents in our matrix.

Let's create a destination index dictionary called `destination_idx` where the keys are destination titles and values are destination indices:

In [34]:
destination_idx = dict(zip(destinations['title'], list(destinations.index)))
idx = destination_idx[title]
idx

0

Using this handy `destination_idx` dictionary, we know that Boudhanath Stupa is represented by index 0 in our matrix. Let's get the top 10 most similar sites to Boudhanath Stupa.

In [35]:
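n_recommendations = 10
sim_scores = list(enumerate(cosine_sim[idx]))
# Exclude the query destination by index; slicing off the first sorted entry
# fails when an earlier destination has an identical feature vector (a tie at 1.0)
sim_scores = [s for s in sim_scores if s[0] != idx]
sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
sim_scores = sim_scores[:n_recommendations]
similar_destinations = [i[0] for i in sim_scores]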

`similar_destinations` is a list of indices that represents Boudhanath Stupa's top 10 recommendations. We can get the corresponding destination titles by either creating an inverse `destination_idx` mapper or using `iloc` on the title column of the `destinations` dataframe.
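
The inverse-mapper option can be sketched in plain Python as follows. The five-title list is a hypothetical stand-in for `destinations['title']`, and `idx_to_destination` is a name chosen here for illustration.

```python
# Hypothetical five-title stand-in for destinations['title']
titles = ["Boudhanath Stupa", "Phewa Tal (Fewa Lake)", "Sarangkot",
          "Swayambhunath Temple", "Poon Hill"]

# Forward mapper: title -> row index
destination_idx = {title: i for i, title in enumerate(titles)}

# Inverse mapper: row index -> title
idx_to_destination = {i: title for title, i in destination_idx.items()}

similar = [3, 2]
print([idx_to_destination[i] for i in similar])
```

Both mappers assume titles are unique; if two destinations shared a title, the forward dictionary would keep only the last index for it.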

In [36]:
print(f"Because you visited {title}:")
destinations['title'].iloc[similar_destinations]

Because you visited Boudhanath Stupa:


3                        Swayambhunath Temple
5                                Peace Temple
6                        Pashupatinath Temple
11    Golden Temple (Hiranya Varna Mahavihar)
16                            Kopan Monastery
30                           Muktinath Temple
31                              Barahi temple
32                           Maya Devi Temple
35                      Bindhya Basini Temple
41                          Mahaboudha Temple
Name: title, dtype: object

Now, let's wrap all of these steps in a single function.

In [37]:
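def get_content_based_recommendations(title_string, n_recommendations=10):
    """Return the row indices of the destinations most similar to the closest title match."""
    title = destination_finder(title_string)
    idx = destination_idx[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Exclude the query destination itself by index rather than by sorted position
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[:n_recommendations]
    similar_destinations = [i[0] for i in sim_scores]
    # The print statements that followed `return` were unreachable and have been removed;
    # look up titles with destinations['title'].iloc[...] on the returned indices
    return similar_destinations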

In [38]:
recommendation_ids=get_content_based_recommendations('fewa lake', 10)

In [39]:
print(recommendation_ids)

[2, 4, 8, 9, 10, 15, 19, 25, 26, 27]
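
One subtlety in the ranking step: many destinations share identical feature vectors, so several entries can tie at a similarity of 1.0. Python's sort is stable, so among tied scores the lowest index comes first, and that is not necessarily the query destination. The sketch below, on a hypothetical 4x4 similarity matrix, uses a `top_n_similar` helper (a name invented here) that removes the query by index rather than by sorted position.

```python
import numpy as np

def top_n_similar(sim_matrix, idx, n=10):
    """Indices of the n rows most similar to row idx, never including idx itself."""
    scores = [(i, s) for i, s in enumerate(sim_matrix[idx]) if i != idx]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep index order
    return [i for i, _ in scores[:n]]

# Hypothetical similarities: rows 0 and 2 have identical features (similarity 1.0)
toy_sim = np.array([
    [1.0, 0.0, 1.0, 0.5],
    [0.0, 1.0, 0.0, 0.5],
    [1.0, 0.0, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
])

print(top_n_similar(toy_sim, idx=2, n=2))

# Dropping the first sorted entry instead keeps the query (index 2) in its own results
naive = sorted(enumerate(toy_sim[2]), key=lambda pair: pair[1], reverse=True)[1:3]
print([i for i, _ in naive])
```

Filtering by index gives the same answer as slicing whenever the query happens to sort first, and stays correct when it does not.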
