# Introduction to Recommender Systems



## Introduction

Recommender systems are a foundational part of the digital world today. **They are used to personalize the experience a user has with a website or application, thus making it more useful to the user.** Some of today's most prominent companies have recommender systems at the core of the customer experience.

* Amazon recommends products to you based on your past purchases and ratings, and on the purchases and ratings of other customers similar to you.
* Netflix recommends movies to you based on the movies you've watched, your ratings of those movies, and the behavior of other users similar to you.
* Facebook, Twitter, and LinkedIn recommend people you might know based on your connections and the connections of those you are connected to. They also use this information to personalize your newsfeed.

In this lesson, we will learn the fundamentals of recommender systems, how they work, and how to create a basic user-based collaborative filtering system using Python.

## Collaborative Filtering

Before recommender systems existed, the primary way to get recommendations about things like movies or products was to ask your friends. As you followed their recommendations, you would get a sense of which friends had tastes most similar to yours and rely on their recommendations more frequently. However, as the number of options available to us has increased drastically over the last few decades, it has become increasingly difficult to rely on recommendations from a small group of friends, because it is increasingly likely that they are not aware of all the available options. The solution to this problem was a method called collaborative filtering.

Collaborative filtering provides us with a way of making automatic predictions (filtering) about the interests of a user by collecting preferences from many users (collaborating). **The underlying assumption is that if two people have the same opinion on one issue, they are likely to have a similar opinion on other issues as well.** There are a few different ways to approach collaborative filtering, but generally speaking, they involve finding a group of people similar to a user, analyzing the things they like, and coming up with a ranked list of recommendations for the user.

### User Similarity

**User similarity** is at the heart of collaborative filtering. In order to make good recommendations, *we need to know how alike two users are*. We do this by comparing the ratings they have given to the same products. For example, let's import the movie_ratings.csv file, which has one row per movie and one column of ratings per user.

In [None]:
import pandas as pd

In [None]:
ratings = pd.read_csv('movie_ratings.csv').set_index('Movie')

In [None]:
ratings

If we wanted to take a look at the users' ratings across the preference space of two movies, we could define a function that accepts two movie titles and scatter plots the user ratings as follows.

In [None]:
import matplotlib.pyplot as plt

In [None]:
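def ratings_scatter(movie1, movie2):
    """Plot each user's ratings for two movies, labelling every point with the user's name."""
    # ratings has movies as rows and users as columns, so transpose to index by user
    x = ratings.T[movie1]
    y = ratings.T[movie2]
    names = list(ratings.T.index)

    fig, ax = plt.subplots()
    # s=0 hides the markers; the user-name annotations below mark each position
    ax.scatter(x, y, s=0)
    fig.set_figwidth(12)
    fig.set_figheight(8)
    ax.set_title("Preference Space for " + movie1 + " vs. " + movie2, fontsize=20)
    ax.set_xlabel(movie1, fontsize=16)
    ax.set_ylabel(movie2, fontsize=16)

    # Use .iloc for positional access; x[i] on a label-indexed Series is deprecated in pandas
    for i, txt in enumerate(names):
        ax.annotate(txt, (x.iloc[i], y.iloc[i]), fontsize=12)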

Then, we would just need to choose two titles and generate the plot to visualize the preference space for users across those two movies. For example, below is the preference space across the movies Venom and Incredibles 2.


In [None]:
ratings_scatter('Venom', 'Incredibles 2')

We can see that across this preference space, Rusty is more similar to Brandon than he is to Emily. However, this can vary across different pairs of movies. If we instead scatter plot the ratings for Bohemian Rhapsody and Jurassic World: Fallen Kingdom, Rusty is more similar to Emily than to Brandon.

In [None]:
ratings_scatter('Bohemian Rhapsody', 'Jurassic World: Fallen Kingdom')

Here we see that the preferences are aligned differently. To get the aggregate distance across all the preference spaces, we can use squareform and pdist from the SciPy library to create a matrix containing the Euclidean distances between all our users. **This gives us a distance between every pair of users in the higher-dimensional space spanned by all the movies. Keep in mind, though, that it is simply a generalization of the two-movie plots above.**

In [None]:
# pdist: pairwise distances between observations in n-dimensional space
# squareform: converts the condensed pdist output into a square matrix
from scipy.spatial.distance import pdist, squareform

In [None]:
squareform(pdist(ratings.T, 'euclidean'))

In [None]:
pdist(ratings.T, 'euclidean')
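
To make the distance computation concrete, here is a minimal sketch on a small, made-up ratings table. The users `A`, `B`, and `C` and their ratings are hypothetical and are not taken from movie_ratings.csv. It checks that the value `pdist` reports for a pair of users matches the square root of the summed squared rating differences, and it shows how the condensed output of `pdist` relates to the square matrix returned by `squareform`.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: rows are movies, columns are users (same layout as `ratings`).
toy = pd.DataFrame(
    {"A": [5, 3, 1], "B": [4, 3, 2], "C": [1, 2, 5]},
    index=["Movie X", "Movie Y", "Movie Z"],
)

# pdist compares rows, so transpose to get one row per user.
condensed = pdist(toy.T, "euclidean")  # one value per user pair: (A,B), (A,C), (B,C)
square = squareform(condensed)         # symmetric matrix with zeros on the diagonal

# Manual Euclidean distance between users A and B.
manual_ab = np.sqrt(((toy["A"] - toy["B"]) ** 2).sum())

print(condensed)
print(square)
print(manual_ab)
```

With three users, the condensed form holds three pairwise distances, while the square form repeats each of them symmetrically around a zero diagonal.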

![alt text](https://bigsnarf.files.wordpress.com/2012/03/distance.jpg?w=584)

These distances are smaller for users that are more similar, *but for our purposes it would be preferable to have a higher score for users that are more similar and a lower score for users that are less similar.* We can easily achieve this by adding 1 to each distance and taking the reciprocal, so that a distance of 0 becomes a similarity of 1 and larger distances shrink toward 0. Let's also load the result into a data frame so that we can more easily view and analyze it.

In [None]:
1/(1 + squareform(pdist(ratings.T, 'euclidean')))

In [None]:
distances = pd.DataFrame(1/(1 + squareform(pdist(ratings.T, 'euclidean'))), 
                         index=ratings.columns, columns=ratings.columns)

In [None]:
distances
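
As a quick check of the inversion used above, the following sketch applies the same `1 / (1 + d)` transformation to a few hand-picked distances. The helper name `distance_to_similarity` is made up for this illustration and does not appear elsewhere in the lesson.

```python
import numpy as np

def distance_to_similarity(d):
    """Map non-negative distances to scores in (0, 1]; a smaller distance gives a higher score."""
    return 1 / (1 + np.asarray(d, dtype=float))

# Distances of 0, 1 and 3 become similarities of 1.0, 0.5 and 0.25.
scores = distance_to_similarity([0.0, 1.0, 3.0])
print(scores)
```

Adding 1 before inverting avoids a division by zero when two users have identical ratings.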

### Generating Recommendations for a User

Suppose that we had a new user named Tom who came to us with the following ratings for each movie.


In [None]:
tom = {'Aquaman': 2,
 'Avengers: Infinity War': 1,
 'Black Panther': 5,
 'Bohemian Rhapsody': 5,
 'Deadpool 2': 2,
 'Fantastic Beasts: The Crimes of Grindelwald': 3,
 'Incredibles 2': 3,
 'Jurassic World: Fallen Kingdom': 4,
 'Mission: Impossible Fallout': 3,
 'Venom': 3}

We could factor Tom into our similarity matrix and then determine which other users are most similar to him.

In [None]:
ratings['Tom'] = pd.Series(tom)

In [None]:
ratings

In [None]:
distances = pd.DataFrame(1/(1 + squareform(pdist(ratings.T, 'euclidean'))), 
                         index=ratings.columns, columns=ratings.columns)

In [None]:
distances

In [None]:
similarities = distances['Tom'].sort_values(ascending=False)[1:]
similarities

It looks like Cleo is the user that is most similar to Tom, followed by Brandon, Emily, Rusty, and then Samantha. **These similarity scores should be taken into consideration when recommending movies to Tom that he has not yet seen.**
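
One detail worth noting: the `[1:]` slice used above relies on Tom's own score (always 1) sorting to the top. If another user happened to have ratings identical to Tom's, the two would tie and the slice could drop the wrong row. A minimal sketch of a label-based alternative, using hypothetical user names and scores, might look like this:

```python
import pandas as pd

# Hypothetical similarity scores, for illustration only.
sims = pd.Series({"Tom": 1.0, "UserA": 0.40, "UserB": 1.0, "UserC": 0.25})

# Drop Tom by label, then sort; this does not depend on where Tom lands in the ordering.
others = sims.drop("Tom").sort_values(ascending=False)
print(others)
```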

Let's say that *everyone but Tom* also rated the movies in the movie_ratings2.csv file. Let's import those ratings and then combine them with the similarity scores to generate movie recommendations for Tom.

In [None]:
new_ratings = pd.read_csv('movie_ratings2.csv').set_index('Movie')

In [None]:
new_ratings

Once we have our new_ratings data frame, we are going to copy it to a new recommendations data frame which we will perform our calculations on.

In [None]:
recommendations = new_ratings.copy()

In [None]:
recommendations

We are going to iterate through each of the other users' similarity scores with Tom and weight their ratings of these new movies by that score. We are then going to create a new Total column that sums the weighted ratings for each movie and sort on it, so that the movies appear in the order in which they should be recommended to Tom.

In [None]:
recommendations['Brandon'] * similarities['Brandon']

In [None]:
# Scale each user's ratings by that user's similarity to Tom
for name, score in similarities.items():
    recommendations[name] = recommendations[name] * score

In [None]:
recommendations['Total'] = recommendations.sum(axis=1)
recommendations.sort_values('Total', ascending=False)

From these results, it looks like Tom should like Despicable Me 3, Wonder Woman, and Thor but perhaps not Star Wars or The Fate of the Furious.
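
The weighting step can also be written without an explicit loop. The sketch below uses hypothetical users, movies, and similarity scores rather than the lesson's data. It also shows one possible variation that the steps above do not use: dividing each total by the sum of the similarity scores, which rescales the result to the original rating range.

```python
import pandas as pd

# Hypothetical data: rows are unseen movies, columns are the other users.
new = pd.DataFrame(
    {"UserA": [5, 2], "UserB": [1, 4]},
    index=["Movie P", "Movie Q"],
)
# Hypothetical similarity of each user to the target user.
sims = pd.Series({"UserA": 0.5, "UserB": 0.25})

# Weight each user's column by that user's similarity, then sum across users.
weighted = new.mul(sims, axis="columns")
totals = weighted.sum(axis=1)

# Optional variation: divide by the total similarity to get a weighted average
# on the original rating scale. The ranking is unchanged because every user
# rated every movie, so all totals are divided by the same constant.
predicted = totals / sims.sum()

print(totals.sort_values(ascending=False))
print(predicted.sort_values(ascending=False))
```

Here Movie P scores 5 × 0.5 + 1 × 0.25 = 2.75 and Movie Q scores 2 × 0.5 + 4 × 0.25 = 2.0, so Movie P would be recommended first.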

## Using a Different Similarity Metric

Now that we have generated an initial set of recommendations, there is an important topic we would like to circle back to. When we computed our user similarity scores, we used Euclidean distance as our distance metric. It is important to note that there are a number of other distance metrics in SciPy that we could use instead, potentially with different results. A full list of the available metrics can be found in the documentation for scipy.spatial.distance.

To see what the recommendations look like using a different distance metric, all we need to do is swap out the name of the metric in the first line of code below. For example, if we wanted to use cosine distance instead, the results would look like this.

In [None]:
distances = pd.DataFrame(1/(1 + squareform(pdist(ratings.T, 'cosine'))), 
                         index=ratings.columns, columns=ratings.columns)

![alt text](https://www.researchgate.net/publication/320914786/figure/fig2/AS:558221849841664@1510101868614/The-difference-between-Euclidean-distance-and-cosine-similarity.png)

In [None]:
similarities = distances['Tom'].sort_values(ascending=False)[1:]

In [None]:
similarities

In [None]:
recommendations = new_ratings.copy()

In [None]:
# Scale each user's ratings by that user's similarity to Tom
for name, score in similarities.items():
    recommendations[name] = recommendations[name] * score

In [None]:
recommendations['Total'] = recommendations.sum(axis=1)
recommendations.sort_values('Total', ascending=False)

We can see that Wonder Woman is now at the top of the list, Guardians of the Galaxy dropped a few places, and Wolf Warrior crept up to the number 4 spot.

If we try cityblock (Manhattan) distance instead, Guardians of the Galaxy shoots up to the number 3 spot and Spider-Man creeps into the top 5.
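
To compare several metrics side by side, the similarity computation can be wrapped in a small function and called once per metric. The sketch below is illustrative only: the function name `rank_similar_users`, the users, and the ratings are all hypothetical. `UserC` is deliberately given ratings that are exactly half of `Target`'s, so the two point in the same direction but sit far apart.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: rows are movies, columns are users.
toy = pd.DataFrame(
    {
        "Target": [4, 2, 2],
        "UserA": [4, 3, 2],
        "UserB": [1, 2, 5],
        "UserC": [2, 1, 1],  # same proportions as Target, but half the magnitude
    },
    index=["Movie X", "Movie Y", "Movie Z"],
)

def rank_similar_users(ratings_df, target, metric):
    """Return the other users sorted by 1 / (1 + distance) to `target`."""
    dist = squareform(pdist(ratings_df.T, metric))
    sim = pd.DataFrame(1 / (1 + dist), index=ratings_df.columns, columns=ratings_df.columns)
    return sim[target].drop(target).sort_values(ascending=False)

rankings = {
    metric: rank_similar_users(toy, "Target", metric)
    for metric in ["euclidean", "cosine", "cityblock"]
}
for metric, ranking in rankings.items():
    print(metric, list(ranking.index))
```

In this toy data, Euclidean and cityblock distance both rank `UserA` as closest to `Target`, while cosine distance ranks `UserC` first because it compares only the direction of the rating vectors and ignores their magnitude.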

## Summary 

In this lesson, we provided an introduction to recommender systems and took a look at how we could create a user-based collaborative filtering recommender. Along the way, we visualized user preference spaces for pairs of movies, calculated similarity scores between users, and combined those similarity scores with ratings to rank movies so that we could recommend them to a particular user. This is just the tip of the iceberg when it comes to recommender systems, but hopefully this lesson gave you a sense of how these types of systems work and how you might put together one of your own.