### Recommender Systems

A recommender system is a simple algorithm whose aim is to provide the most relevant information to a user by discovering patterns in a dataset. The algorithm rates the items and shows the user the items that they would rate highly. An example of recommendation in action is when you visit Amazon and you notice that some items are being recommended to you or when Netflix recommends certain movies to you. They are also used by Music streaming applications such as Spotify and Deezer to recommend music that you might like. 

***

#### Collaborative filtering recommender systems

In collaborative filtering the behavior of a group of users is used to make recommendations to other users. Recommendation is based on the preference of other users. A simple example would be recommending a movie to a user based on the fact that their friend liked the movie.

***

<img src="images\recom.png" width="450">

In [483]:
#importing packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import keras
import warnings
warnings.filterwarnings('ignore')

#### MovieLens 1M Dataset

The dataset contains 1,000,209 anonymous ratings of approximately 3,900 movies 
made by 6,040 MovieLens users who joined MovieLens in 2000.

In [484]:
# Importing the dataset
movies = pd.read_csv('ml-1m/movies.dat', sep='::', header=None, engine='python', encoding='latin-1',
                     names=['movie_id', 'title', 'category'])
users = pd.read_csv('ml-1m/users.dat', sep='::', header=None, engine='python', encoding='latin-1',
                    names=['user_id', 'gender', 'age', 'user_job_id', 'zip_code'])
ratings = pd.read_csv('ml-1m/ratings.dat', sep='::', header=None, engine='python', encoding='latin-1',
                      names=['user_id', 'movie_id', 'rating', 'timestamp'])

In [485]:
print('Movies')
print(movies.head(5))
print('\nUsers')
print(users.head(5))
print('\nRatings')
print(ratings.head(5))

Movies
   movie_id                               title                      category
0         1                    Toy Story (1995)   Animation|Children's|Comedy
1         2                      Jumanji (1995)  Adventure|Children's|Fantasy
2         3             Grumpier Old Men (1995)                Comedy|Romance
3         4            Waiting to Exhale (1995)                  Comedy|Drama
4         5  Father of the Bride Part II (1995)                        Comedy

Users
   user_id gender  age  user_job_id zip_code
0        1      F    1           10    48067
1        2      M   56           16    70072
2        3      M   25           15    55117
3        4      M   45            7    02460
4        5      M   25           20    55455

Ratings
   user_id  movie_id  rating  timestamp
0        1      1193       5  978300760
1        1       661       3  978302109
2        1       914       3  978301968
3        1      3408       4  978300275
4        1      2355       5  978824291

We need to pivot the ratings dataframe such that the rows are users, <br>
columns are movies and the values are the ratings

In [486]:
ratings_pivot=pd.pivot_table(ratings.iloc[:,[0,1,2]], index='user_id', columns='movie_id',
                             values='rating', fill_value=0)

In [487]:
ratings_pivot.head(5)

movie_id,1,2,3,4,5,6,7,8,9,10,...,3943,3944,3945,3946,3947,3948,3949,3950,3951,3952
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,2,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [488]:
#splitting the data into training set and test set
from sklearn.model_selection import train_test_split
X_train, X_test = train_test_split(ratings_pivot, train_size=0.8)

In [489]:
print(f'X_train shape is {X_train.shape}\nX_test shape is {X_test.shape}')

X_train shape is (4832, 3706)
X_test shape is (1208, 3706)


### Auto Encoder

Autoencoders (AE) are neural networks that aim to copy their inputs to their outputs. They work by compressing the input into a latent-space representation, and then reconstructing the output from this representation. This kind of network is composed of two parts :

Encoder: This is the part of the network that compresses the input into a latent-space representation.

Decoder: This part aims to reconstruct the input from the latent space representation. 

***

<img src="images\architecture.png">

***

#### What are autoencoders used for ?

Today data denoising and dimensionality reduction for data visualization are considered as two main interesting practical applications of autoencoders. 
In our current use-case we are using an Autoencoder to learn complex underlying patterns in the data.

***

<img src="images\example.png">

***
#### Structure of Auto Encoder

Architecturally, the Autoencoder is a neural network having an input layer, hidden layers and an output layer. The output layer has the same number of neurons as the input layer for the purpose of reconstructing it’s own inputs. A compressed representation of the data is formed in the hidden layers by learning correlations in the data.
An Autoencoder is a form of unsupervised learning, which means no labelled data are necessary — only a set of input data instead of input-output pairs.
***
<img src="images\autoencoder.png" width="650">

In [502]:
#Auto Encoder
from keras.layers import Input, Dense
from keras.models import Model, Sequential

input_data= Input(shape=(3706,))
encoded = Dense(units=256, activation='relu')(input_data)
encoded = Dense(units=128, activation='relu')(encoded)
encoded = Dense(units=64, activation='relu')(encoded)
encoded = Dense(units=32, activation='relu')(encoded)
decoded = Dense(units=64, activation='relu')(encoded)
decoded = Dense(units=128, activation='relu')(decoded)
decoded = Dense(units=256, activation='relu')(decoded)
decoded = Dense(units=3706, activation='sigmoid')(decoded)

autoencoder=Model(input_data, decoded)
autoencoder.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_12 (InputLayer)        (None, 3706)              0         
_________________________________________________________________
dense_73 (Dense)             (None, 256)               948992    
_________________________________________________________________
dense_74 (Dense)             (None, 128)               32896     
_________________________________________________________________
dense_75 (Dense)             (None, 64)                8256      
_________________________________________________________________
dense_76 (Dense)             (None, 32)                2080      
_________________________________________________________________
dense_77 (Dense)             (None, 64)                2112      
_________________________________________________________________
dense_78 (Dense)             (None, 128)               8320      
__________

In [503]:
autoencoder.compile(optimizer='adam', loss='mse')
model=autoencoder.fit(X_train, X_train, epochs=10, batch_size=100, shuffle=True, 
                validation_data=(X_test, X_test))

Train on 4832 samples, validate on 1208 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [505]:
sample=X_test
sample.head()

movie_id,1,2,3,4,5,6,7,8,9,10,...,3943,3944,3945,3946,3947,3948,3949,3950,3951,3952
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2657,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1539,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5800,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3446,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4712,0,0,0,0,0,0,0,3,0,0,...,0,0,0,0,0,0,0,0,0,0


In [506]:
#make predictions
pred=autoencoder.predict(sample)

In [507]:
#converting predictions to ratings
fin_pred=(pred*5).round(2)
fin_pred

array([[4.67, 0.02, 0.01, ..., 0.  , 0.  , 0.22],
       [0.  , 0.  , 0.04, ..., 0.01, 0.04, 0.14],
       [1.22, 0.37, 0.16, ..., 0.01, 0.11, 0.3 ],
       ...,
       [0.01, 0.  , 0.02, ..., 0.  , 0.01, 0.78],
       [0.  , 0.  , 0.01, ..., 0.02, 0.45, 4.65],
       [4.63, 0.  , 0.  , ..., 0.  , 0.  , 0.76]], dtype=float32)

In [508]:
#choosing a random user out of the test set
i=3
fin_pred[i]

array([0.72, 0.03, 0.05, ..., 0.  , 0.06, 0.05], dtype=float32)

In [509]:
#creating a list of movie_ids with predicted rating equal to 5
recom=[]
for index,value in enumerate(fin_pred[i]):
    if value==5:
        recom.append(index+1)
print(recom)

[1125, 2456, 2652]


In [510]:
#mapping movie_ids to movie names
recommended_movies=movies[movies.movie_id.isin(recom)]
print('Recommended Movies')
recommended_movies.head(15)

Recommended Movies


Unnamed: 0,movie_id,title,category
1109,1125,"Return of the Pink Panther, The (1974)",Comedy
2387,2456,"Fly II, The (1989)",Horror|Sci-Fi
2583,2652,"Curse of Frankenstein, The (1957)",Horror


In [511]:
#movies the user gave 5 star rating originally
original=(X_test.iloc[i][X_test.iloc[i]==5].index)
original=list(original)
originally_liked_movies=movies[movies.movie_id.isin(original)]
print('Movies the user gave 5 star rating originally')
originally_liked_movies

Movies the user gave 5 star rating originally


Unnamed: 0,movie_id,title,category
257,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi
589,593,"Silence of the Lambs, The (1991)",Drama|Thriller
789,799,"Frighteners, The (1996)",Comedy|Horror
1196,1214,Alien (1979),Action|Horror|Sci-Fi|Thriller
1201,1219,Psycho (1960),Horror|Thriller
1235,1255,Bad Taste (1987),Comedy|Horror
1274,1294,M*A*S*H (1970),Comedy|War
1319,1340,Bride of Frankenstein (1935),Horror
1326,1347,"Nightmare on Elm Street, A (1984)",Horror
1366,1387,Jaws (1975),Action|Horror


Sources:
1. https://towardsdatascience.com/how-to-build-a-simple-recommender-system-in-python-375093c3fb7d
    
2. https://towardsdatascience.com/deep-autoencoders-for-collaborative-filtering-6cf8d25bbf1d
    
3. https://medium.com/@connectwithghosh/recommender-system-on-the-movielens-using-an-autoencoder-using-tensorflow-in-python-f13d3e8d600d