# Recommender Systems

![Image](./img/goal_recommender_systems.jpg)

# Data and methods

- __Explicit Feedback:__ direct and quantitative data collected from users.

- __Implicit Feedback:__ data collected indirectly from user interactions, which acts as a proxy for user preference.
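To make the distinction concrete, here is a minimal sketch. The user ids, item ids, the `minutes_watched` column and the 30-minute threshold are all hypothetical and only serve as an illustration.

```python
import pandas as pd

# Explicit feedback: users state their preference directly (hypothetical 1-5 ratings)
explicit = pd.DataFrame({'user': ['u1', 'u2'],
                         'item': ['m1', 'm1'],
                         'rating': [5, 2]})

# Implicit feedback: a hypothetical interaction log with minutes watched
log = pd.DataFrame({'user': ['u1', 'u1', 'u2'],
                    'item': ['m1', 'm2', 'm1'],
                    'minutes_watched': [110, 5, 95]})

# Proxy for preference: treat 30+ minutes as a positive signal (assumed threshold)
log['liked_proxy'] = (log['minutes_watched'] >= 30).astype(int)
print(log)
```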

![Image](./img/recommender_systems_methods.png)

---

## Collaborative filtering methods

- Based on past user-item interactions.

- Detect similar users or similar items.

- Memory-based (nearest neighbours) and model-based (underlying generative model) approaches.

- Require no info about the users or items.

- More interactions generally mean more accurate recommendations.
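As a rough sketch of what "based on past user-item interactions" means in practice, the interactions can be arranged into a user-item matrix with no other information about users or items. The ids and ratings below are hypothetical.

```python
import pandas as pd

# Hypothetical interaction records: one row per (user, item, rating)
interactions = pd.DataFrame({'user': ['u1', 'u1', 'u2', 'u3'],
                             'item': ['m1', 'm2', 'm1', 'm2'],
                             'rating': [5, 3, 4, 1]})

# User-item matrix: rows are users, columns are items, NaN marks "no interaction"
user_item = interactions.pivot(index='user', columns='item', values='rating')
print(user_item)
```

The missing entries (NaN) are exactly what a recommender tries to estimate.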

In [None]:
# imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cufflinks as cf
cf.go_offline()

# Scipy
from scipy.spatial.distance import pdist, squareform   # conda install -c anaconda scipy

[SciPy](https://scipy.github.io/devdocs/reference/index.html) contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

### Dataset EDA

In [None]:
movie_ratings = pd.read_csv('./datasets/movie_ratings.csv').set_index('Movie')
movie_ratings

In [None]:
users_ratings = movie_ratings.T
users_ratings

In [None]:
movies = list(users_ratings.columns)
users_ratings.iplot(y=movies,
                    kind='line',
                    width=10.0,
                    subplots=True,
                    fill=True,
                    title='Ratings per movie');

---

## User-user memory based method

This method represents users by their interactions with items and evaluates the distances between users.
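A minimal sketch of the idea, assuming three hypothetical users who each rated the same three items (the values are illustrative and not taken from the dataset):

```python
import numpy as np

# Hypothetical rating vectors: one entry per item
alice = np.array([5.0, 3.0, 1.0])
bob = np.array([4.0, 3.0, 2.0])
carol = np.array([1.0, 2.0, 5.0])

def euclidean(u, v):
    """Straight-line distance between two rating vectors."""
    return float(np.sqrt(np.sum((u - v) ** 2)))

d_ab = euclidean(alice, bob)
d_ac = euclidean(alice, carol)
print(d_ab, d_ac)
```

A smaller distance means more similar tastes, so in this toy case Bob is a closer neighbour to Alice than Carol is.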

In [None]:
# Plotting function

def ratings_scatter(movie1, movie2):
    x = users_ratings[movie1]
    y = users_ratings[movie2]
    n = list(users_ratings.index)

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=100)
    fig.set_figwidth(12)
    fig.set_figheight(8)
    plt.title("Preference Space for "+ movie1 + " vs. " + movie2, fontsize=20)
    ax.set_xlabel(movie1, fontsize=16)
    ax.set_ylabel(movie2, fontsize=16) 

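    # Label each point with the user name; .iloc gives positional access,
    # since the Series are indexed by user name rather than by integer
    for i, txt in enumerate(n):
        ax.annotate(txt, (x.iloc[i], y.iloc[i]), fontsize=12)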

In [None]:
# 2 dimensional space

ratings_scatter('Venom', 'Incredibles 2')
ratings_scatter('Bohemian Rhapsody', 'Jurassic World: Fallen Kingdom')
ratings_scatter('Fantastic Beasts: The Crimes of Grindelwald', 'Mission: Impossible – Fallout')
ratings_scatter('Black Panther', 'Deadpool 2')

---

### scipy.spatial.distance.pdist

Pairwise distances between observations in n-dimensional space.

https://scipy.github.io/devdocs/reference/generated/scipy.spatial.distance.pdist.html
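A small sketch of what `pdist` returns, using three hypothetical points in 2-dimensional space. The result is a condensed vector with one entry per pair of rows, which `squareform` expands into a symmetric matrix.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three hypothetical observations (rows) in 2-dimensional space
points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])

# Condensed form: distances for the pairs (0, 1), (0, 2), (1, 2), in that order
condensed = pdist(points, metric='euclidean')

# Square form: symmetric 3x3 matrix with zeros on the diagonal
square = squareform(condensed)
print(condensed)
print(square)
```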

In [None]:
dist_calculation = pdist(X=users_ratings, metric='euclidean')
dist_calculation

In [None]:
# Squareform method: https://scipy.github.io/devdocs/reference/generated/scipy.spatial.distance.squareform.html

dist_distribution = squareform(dist_calculation)
dist_distribution

In [None]:
euclid_dist = pd.DataFrame(dist_distribution,
                           index=movie_ratings.columns, 
                           columns=movie_ratings.columns)

euclid_dist

In [None]:
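# Frame of reference: convert distances into similarity scores in (0, 1],
# where 1 means identical ratings and values near 0 mean very different ratings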

euclid_dist_norm = pd.DataFrame(1/(1 + squareform(pdist(users_ratings, 'euclidean'))),
                                index=movie_ratings.columns,
                                columns=movie_ratings.columns)

euclid_dist_norm
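The `1 / (1 + d)` transform in the cell above turns a distance into a score that grows as users get closer. A quick sketch with hypothetical distances shows the mapping:

```python
import numpy as np

# Hypothetical distances between pairs of users
distances = np.array([0.0, 1.0, 3.0, 9.0])

# A distance of 0 maps to 1; larger distances shrink towards 0
similarities = 1 / (1 + distances)
print(similarities)
```

Adding 1 to the denominator avoids division by zero when two users have identical ratings.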

---

### Full Pipeline

In [None]:
# New User into the system

Tom = {'Aquaman': 2,
       'Avengers: Infinity War': 1,
       'Black Panther': 5,
       'Bohemian Rhapsody': 5,
       'Deadpool 2': 2,
       'Fantastic Beasts: The Crimes of Grindelwald': 3,
       'Incredibles 2': 3,
       'Jurassic World: Fallen Kingdom': 4,
       'Mission: Impossible – Fallout': 3,
       'Venom': 3}

In [None]:
movie_ratings['Tom'] = pd.Series(Tom)
movie_ratings

In [None]:
# Distance calculation in the n-dimensional space, converted to similarity with 1 / (1 + d)

euclid_dist_norm = pd.DataFrame(1/(1 + squareform(pdist(movie_ratings.T, 'euclidean'))),
                                index=movie_ratings.columns,
                                columns=movie_ratings.columns)

euclid_dist_norm

In [None]:
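# Similarities to the new user (Tom's own entry is dropped by label,
# which is safer than slicing off the first row after sorting)

euclid_simil_norm = euclid_dist_norm['Tom'].drop('Tom').sort_values(ascending=False)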
euclid_simil_norm

In [None]:
# Movies that the new user hasn't watched

movie_ratings_test = pd.read_csv('./datasets/movie_ratings_test.csv').set_index('Movie')
recommend_euclid = movie_ratings_test.copy()
recommend_euclid

In [None]:
# Similarities to the new User (we need to reassemble the data)

euclid_simil_items = dict(euclid_simil_norm).items()
euclid_simil_items

In [None]:
# Weight each user's ratings by that user's similarity to the new user

for name, score in euclid_simil_items:
    recommend_euclid[name] = recommend_euclid[name] * score
recommend_euclid

In [None]:
# Movie weights considering all users (i.e., total weight per movie)

recommend_euclid['Total Weight per Movie'] = recommend_euclid.sum(axis=1)
recommend_euclid = recommend_euclid.sort_values('Total Weight per Movie', ascending=False)
recommend_euclid

In [None]:
# Our recommendation for Tom!!!

top5_movies_euclid = recommend_euclid[['Total Weight per Movie']].head()
top5_movies_euclid
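The scoring used above is a similarity-weighted sum of ratings. The sketch below reproduces it on hypothetical toy data and also shows one possible variation, dividing by the total similarity so the score lands back on the rating scale. That variation is an assumption for illustration and is not part of the pipeline above.

```python
import pandas as pd

# Hypothetical ratings of unseen movies (rows) by existing users (columns)
unseen = pd.DataFrame({'Ann': [5, 1], 'Ben': [1, 5]},
                      index=['Movie A', 'Movie B'])

# Hypothetical similarity of each existing user to the new user
similarity = pd.Series({'Ann': 0.5, 'Ben': 0.25})

# Weighted sum: scale each user's column by their similarity, then add across users
weighted_sum = unseen.mul(similarity, axis=1).sum(axis=1)

# Variation (assumption): normalise by total similarity to get a weighted average
weighted_avg = weighted_sum / similarity.sum()
print(weighted_sum)
print(weighted_avg)
```

Both versions rank the movies identically here; the weighted average is just easier to read as a predicted rating.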

---

### Cosine similarity

Cosine similarity compares the angle between two vectors rather than the distance between their endpoints. It is generally used when the magnitude of the vectors does not matter and a computationally cheap metric is required.

![Image](./img/cosine_similarity.jpg)
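A short sketch of why magnitude does not matter, using two hypothetical users whose ratings point in the same direction but differ in scale:

```python
import numpy as np

# Hypothetical users: the second rates everything exactly twice as high
light_rater = np.array([1.0, 2.0, 3.0])
heavy_rater = 2 * light_rater

# Cosine similarity: dot product divided by the product of the vector norms
cos_sim = np.dot(light_rater, heavy_rater) / (
    np.linalg.norm(light_rater) * np.linalg.norm(heavy_rater))

# Euclidean distance for comparison
euc_dist = np.linalg.norm(light_rater - heavy_rater)
print(cos_sim, euc_dist)
```

The cosine similarity is 1 (same direction) even though the Euclidean distance between the two users is clearly non-zero.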

In [None]:
# 2 dimensional space

ratings_scatter('Venom', 'Incredibles 2')

In [None]:
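# Cosine-based similarity in the n-dimensional space
# Note: pdist(..., 'cosine') returns the cosine distance (1 - cosine similarity),
# which is then converted to a similarity score with 1 / (1 + d)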

movie_ratings = pd.read_csv('./datasets/movie_ratings.csv').set_index('Movie')
movie_ratings['Tom'] = pd.Series(Tom)
cosine_dist_norm = pd.DataFrame(1/(1 + squareform(pdist(movie_ratings.T, 'cosine'))),
                                index=movie_ratings.columns,
                                columns=movie_ratings.columns)

cosine_dist_norm

In [None]:
# Our recommendation for Tom using Cosine Similarity

movie_ratings_test = pd.read_csv('./datasets/movie_ratings_test.csv').set_index('Movie')
recommend_cosine = movie_ratings_test.copy()
cosine_simil_norm = cosine_dist_norm['Tom'].drop('Tom').sort_values(ascending=False)
cosine_simil_items = dict(cosine_simil_norm).items()
for name, score in cosine_simil_items:
    recommend_cosine[name] = recommend_cosine[name] * score
recommend_cosine['Total Weight per Movie'] = recommend_cosine.sum(axis=1)
recommend_cosine = recommend_cosine.sort_values('Total Weight per Movie', ascending=False)
top5_movies_cosine = recommend_cosine[['Total Weight per Movie']].head()
top5_movies_cosine
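The Euclidean and cosine pipelines differ only in the metric passed to `pdist`, so the steps could be wrapped in one helper. The function below is a sketch under that assumption: the name `recommend`, its parameters and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def recommend(ratings, unseen, user, metric='euclidean', top_n=5):
    """Rank unseen movies for `user` by similarity-weighted ratings.

    ratings: movies x users DataFrame that includes `user`
    unseen:  movies x users DataFrame with movies `user` has not rated
    """
    # User-user similarity matrix, using the same 1 / (1 + d) transform as above
    sim = pd.DataFrame(1 / (1 + squareform(pdist(ratings.T, metric))),
                       index=ratings.columns, columns=ratings.columns)
    weights = sim[user].drop(user)  # similarity to every other user
    common = weights.index.intersection(unseen.columns)
    # Scale each user's ratings by similarity and add up per movie
    scores = unseen[common].mul(weights[common], axis=1).sum(axis=1)
    return scores.sort_values(ascending=False).head(top_n)

# Hypothetical toy data
ratings = pd.DataFrame({'Ann': [5, 1], 'Ben': [1, 5], 'Tom': [5, 2]},
                       index=['M1', 'M2'])
unseen = pd.DataFrame({'Ann': [4, 2], 'Ben': [1, 5]}, index=['M3', 'M4'])

top = recommend(ratings, unseen, 'Tom')
print(top)
```

Calling `recommend(ratings, unseen, 'Tom', metric='cosine')` would switch the metric without duplicating the rest of the steps.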

---