### What is a recommender system?

A recommender system is an algorithm that surfaces the items most relevant to a user by discovering patterns in a dataset. It predicts how the user would rate each item and shows the items with the highest predicted ratings. Amazon's product suggestions and Netflix's movie recommendations are familiar examples, and music streaming applications such as Spotify and Deezer use the same idea to recommend music that you might like.

***

#### Collaborative filtering recommender systems

In collaborative filtering, the behavior of a group of users is used to make recommendations to other users: a recommendation is based on the preferences of other users rather than on the properties of the item itself. A simple example is recommending a movie to a user because their friend liked it.
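
To make the idea concrete, here is a minimal sketch of user-based collaborative filtering on a toy matrix. It is not the method used later in this notebook (which relies on an autoencoder); the matrix `toy_ratings`, the helper names, and the choice of cosine similarity are assumptions made for illustration. Unrated movies are stored as 0, matching the convention used for the pivot table further down.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are movies, 0 means "not rated".
toy_ratings = np.array([
    [5, 4, 0, 0],   # user 0: the user we want a recommendation for
    [5, 5, 4, 0],   # user 1: tastes close to user 0
    [1, 0, 2, 5],   # user 2: tastes far from user 0
], dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def recommend_for(user, ratings):
    """Score every movie by the similarity-weighted ratings of the other users."""
    others = [u for u in range(len(ratings)) if u != user]
    sims = np.array([cosine_similarity(ratings[user], ratings[u]) for u in others])
    total = sims.sum()
    scores = sims @ ratings[others] / total if total else np.zeros(ratings.shape[1])
    # Never recommend something the user has already rated.
    scores[ratings[user] > 0] = -np.inf
    return int(np.argmax(scores)), scores

best_movie, scores = recommend_for(0, toy_ratings)
print(f"Recommend movie column {best_movie} to user 0")
```

User 1 is much more similar to user 0 than user 2 is, so the movie that user 1 liked outranks the one that only user 2 liked.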

***

![alt text](recom.png "Recommender Systems")

In [328]:
#importing packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import keras
import warnings
warnings.filterwarnings('ignore')

#### MovieLens 1M Dataset

The dataset contains 1,000,209 anonymous ratings of approximately 3,900 movies 
made by 6,040 MovieLens users who joined MovieLens in 2000.

In [329]:
# Importing the dataset
movies = pd.read_csv('ml-1m/movies.dat', sep='::', header=None, engine='python', encoding='latin-1',
                     names=['movie_id', 'title', 'category'])
users = pd.read_csv('ml-1m/users.dat', sep='::', header=None, engine='python', encoding='latin-1',
                    names=['user_id', 'gender', 'age', 'user_job_id', 'zip_code'])
ratings = pd.read_csv('ml-1m/ratings.dat', sep='::', header=None, engine='python', encoding='latin-1',
                      names=['user_id', 'movie_id', 'rating', 'timestamp'])

In [330]:
print('Movies')
print(movies.head(5))
print('\nUsers')
print(users.head(5))
print('\nRatings')
print(ratings.head(5))

Movies
   movie_id                               title                      category
0         1                    Toy Story (1995)   Animation|Children's|Comedy
1         2                      Jumanji (1995)  Adventure|Children's|Fantasy
2         3             Grumpier Old Men (1995)                Comedy|Romance
3         4            Waiting to Exhale (1995)                  Comedy|Drama
4         5  Father of the Bride Part II (1995)                        Comedy

Users
   user_id gender  age  user_job_id zip_code
0        1      F    1           10    48067
1        2      M   56           16    70072
2        3      M   25           15    55117
3        4      M   45            7    02460
4        5      M   25           20    55455

Ratings
   user_id  movie_id  rating  timestamp
0        1      1193       5  978300760
1        1       661       3  978302109
2        1       914       3  978301968
3        1      3408       4  978300275
4        1      2355       5  978824291

We need to pivot the ratings dataframe so that the rows are users, the columns are movies, and the values are the ratings. Movies that a user has not rated are filled with 0.

In [331]:
ratings_pivot=pd.pivot_table(ratings.iloc[:,[0,1,2]], index='user_id', columns='movie_id',
                             values='rating', fill_value=0)

In [332]:
ratings_pivot.head(5)

movie_id  1  2  3  4  5  6  7  8  9  10  ...  3943  3944  3945  3946  3947  3948  3949  3950  3951  3952
user_id
1         5  0  0  0  0  0  0  0  0   0  ...     0     0     0     0     0     0     0     0     0     0
2         0  0  0  0  0  0  0  0  0   0  ...     0     0     0     0     0     0     0     0     0     0
3         0  0  0  0  0  0  0  0  0   0  ...     0     0     0     0     0     0     0     0     0     0
4         0  0  0  0  0  0  0  0  0   0  ...     0     0     0     0     0     0     0     0     0     0
5         0  0  0  0  0  2  0  0  0   0  ...     0     0     0     0     0     0     0     0     0     0
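
Almost every cell in this preview is 0, which is typical for a ratings matrix. The short calculation below estimates how much of the matrix is actually filled, using only the figures quoted in this notebook (1,000,209 ratings, 6,040 users, 3,706 movie columns). It assumes each user rated a given movie at most once, so that every rating occupies its own cell.

```python
# Figures taken from the dataset description and the pivot table shape.
n_ratings = 1_000_209
n_users, n_movies = 6040, 3706

# Total number of (user, movie) cells in the pivoted matrix.
n_cells = n_users * n_movies

# Fraction of cells holding a real rating (assumes one rating per user/movie pair).
density = n_ratings / n_cells

print(f"{n_cells:,} cells, {density:.2%} filled")
```

Under that assumption fewer than one cell in twenty holds a real rating; the rest are the zeros introduced by `fill_value=0`.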


In [333]:
#splitting the data into training set and test set
from sklearn.model_selection import train_test_split
X_train, X_test = train_test_split(ratings_pivot, train_size=0.8)

In [334]:
print(f'X_train shape is {X_train.shape}\nX_test shape is {X_test.shape}')

X_train shape is (4832, 3706)
X_test shape is (1208, 3706)


### Auto Encoder

Autoencoders (AE) are neural networks that aim to copy their inputs to their outputs. They work by compressing the input into a latent-space representation and then reconstructing the output from this representation. This kind of network is composed of two parts:

Encoder: This is the part of the network that compresses the input into a latent-space representation.

Decoder: This part aims to reconstruct the input from the latent space representation. 
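
The two parts can be sketched in a few lines of NumPy. This is an untrained, single-layer toy version meant only to show the shapes involved; the sizes (`n_features = 8`, `n_latent = 3`), the random weights, and the helper names are assumptions, not the architecture trained in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, weights, bias, activation):
    """A fully connected layer: affine transform followed by an activation."""
    return activation(x @ weights + bias)

# Hypothetical sizes: 8 input features squeezed into a 3-dimensional code.
n_features, n_latent = 8, 3
w_enc = rng.normal(scale=0.1, size=(n_features, n_latent))
b_enc = np.zeros(n_latent)
w_dec = rng.normal(scale=0.1, size=(n_latent, n_features))
b_dec = np.zeros(n_features)

x = rng.random((4, n_features))                       # 4 samples with values in [0, 1)

code = dense(x, w_enc, b_enc, relu)                   # encoder: input -> latent code
reconstruction = dense(code, w_dec, b_dec, sigmoid)   # decoder: latent code -> input space

# Training would adjust the weights to make this reconstruction error small.
mse = np.mean((x - reconstruction) ** 2)
print(code.shape, reconstruction.shape, round(float(mse), 4))
```

The code has fewer dimensions than the input, so the network cannot simply memorize each value and has to keep whatever structure helps it rebuild the input.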

***

![alt text](architecture.png "Auto Encoder Architecture")

***

#### What are autoencoders used for?

Today, data denoising and dimensionality reduction for data visualization are considered the two main practical applications of autoencoders.

***

![alt text](example.png "Auto Encoder Application")

***
#### Structure of Auto Encoder

***

![alt text](autoencoder.png "Auto Encoder")

In [335]:
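# Autoencoder: 3706 -> 256 -> 64 -> 16 -> 64 -> 256 -> 3706
from keras.layers import Input, Dense
from keras.models import Model

# One input per movie column of the pivoted ratings matrix
input_data = Input(shape=(3706,))
# Encoder: compress each user's ratings vector down to a 16-dimensional code
encoded = Dense(units=256, activation='relu')(input_data)
encoded = Dense(units=64, activation='relu')(encoded)
encoded = Dense(units=16, activation='relu')(encoded)
# Decoder: expand the code back to one output per movie
decoded = Dense(units=64, activation='relu')(encoded)
decoded = Dense(units=256, activation='relu')(decoded)
# The sigmoid keeps every output in (0, 1)
decoded = Dense(units=3706, activation='sigmoid')(decoded)

autoencoder = Model(input_data, decoded)
autoencoder.summary()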

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_7 (InputLayer)         (None, 3706)              0         
_________________________________________________________________
dense_39 (Dense)             (None, 256)               948992    
_________________________________________________________________
dense_40 (Dense)             (None, 64)                16448     
_________________________________________________________________
dense_41 (Dense)             (None, 16)                1040      
_________________________________________________________________
dense_42 (Dense)             (None, 64)                1088      
_________________________________________________________________
dense_43 (Dense)             (None, 256)               16640     
_________________________________________________________________
dense_44 (Dense)             (None, 3706)              952442    
Total params: 1,936,650

In [336]:
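# Train the network to reconstruct its own input; X_test is used only for validation.
# Note: the targets are raw 0-5 ratings, while the sigmoid output layer is bounded to (0, 1).
autoencoder.compile(optimizer='adam', loss='mse')
history = autoencoder.fit(X_train, X_train, epochs=15, batch_size=100, shuffle=True,
                          validation_data=(X_test, X_test))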

Train on 4832 samples, validate on 1208 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
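
The network is trained on raw ratings between 0 and 5, yet its last layer is a sigmoid, so it may help to see what that bound implies for the MSE. The sketch below is illustrative arithmetic, not part of the original run: the three example ratings and the idea of dividing by the maximum rating of 5 before training are assumptions.

```python
import numpy as np

# Three example targets on the raw rating scale: unrated, average, top rating.
raw_targets = np.array([0.0, 3.0, 5.0])

# A sigmoid output can never exceed 1, so the closest reachable value is the
# target clipped to [0, 1].
closest_raw = np.clip(raw_targets, 0.0, 1.0)
mse_floor_raw = np.mean((raw_targets - closest_raw) ** 2)

# Assumption: rescale the ratings to [0, 1] by dividing by the maximum rating.
scaled_targets = raw_targets / 5.0
closest_scaled = np.clip(scaled_targets, 0.0, 1.0)
mse_floor_scaled = np.mean((scaled_targets - closest_scaled) ** 2)

print(f"lowest reachable MSE on raw targets:    {mse_floor_raw:.3f}")
print(f"lowest reachable MSE on scaled targets: {mse_floor_scaled:.3f}")
```

With raw targets, every nonzero rating pulls its output toward 1, which would be consistent with the many predictions that round to exactly 5 after multiplying by 5. With targets scaled to [0, 1], the outputs could in principle separate a 3 from a 5, and multiplying the predictions by 5 would map them back to the rating scale.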


In [337]:
sample=X_test

In [338]:
sample.shape

(1208, 3706)

In [339]:
pred=autoencoder.predict(sample)

In [345]:
# Scale the sigmoid outputs from (0, 1) up to the 0-5 rating range
fin_pred = (pred * 5).round(2)
fin_pred

array([[5.  , 0.16, 0.01, ..., 0.  , 0.  , 0.01],
       [5.  , 0.09, 0.01, ..., 0.  , 0.  , 0.02],
       [5.  , 4.98, 4.07, ..., 4.98, 1.43, 5.  ],
       ...,
       [5.  , 5.  , 5.  , ..., 0.68, 0.02, 4.99],
       [5.  , 5.  , 4.99, ..., 2.94, 0.  , 4.59],
       [0.41, 0.  , 0.02, ..., 0.  , 0.  , 0.  ]], dtype=float32)

In [358]:
i=20
fin_pred[i]

array([5.  , 0.29, 0.06, ..., 0.  , 0.  , 0.02], dtype=float32)

In [359]:
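# Collect the 1-based column positions whose predicted rating rounds to 5.
# Note: position + 1 equals the movie_id only if the ids are contiguous;
# the pivot has 3706 columns while the ids run up to 3952, so
# ratings_pivot.columns[index] gives the exact id for a given position.
recom=[]
for index,value in enumerate(fin_pred[i]):
    if value==5:
        recom.append(index+1)
len(recom)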

64
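
Turning a row of predictions into recommendations involves two details that are easy to miss: a column position has to be translated back to a movie id, and movies the user has already rated are usually excluded. The sketch below shows one way to do both on toy data. The arrays and the helper `top_n_unseen` are hypothetical; in this notebook the ids would presumably come from `ratings_pivot.columns` and the user's own ratings from the matching row of `X_test`.

```python
import numpy as np

# Hypothetical column labels of a pivot table: movie ids with gaps.
movie_ids = np.array([1, 2, 4, 7, 9])
# What the user actually rated (0 means "not rated").
user_ratings = np.array([5, 0, 0, 3, 0])
# Predicted ratings for the same columns.
predicted = np.array([4.9, 4.2, 0.3, 4.8, 4.6])

def top_n_unseen(predicted, user_ratings, movie_ids, n=2):
    """Return the ids of the n highest-scoring movies the user has not rated."""
    # Mask out already-rated movies so they can never be selected.
    scores = np.where(user_ratings > 0, -np.inf, predicted)
    # Column positions sorted by descending score.
    order = np.argsort(scores)[::-1][:n]
    # Translate positions to ids via the column labels, not position + 1.
    return [int(movie_ids[k]) for k in order]

recommended_ids = top_n_unseen(predicted, user_ratings, movie_ids)
print(recommended_ids)
```

Here the best unseen column is position 4, whose id is 9; position + 1 would have produced 5, an id that does not exist in this toy table.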

In [360]:
recommended_movies=movies[movies.movie_id.isin(recom)]

In [362]:
recommended_movies.head(10)

     movie_id                          title                             category
0           1               Toy Story (1995)          Animation|Children's|Comedy
5           6                    Heat (1995)                Action|Crime|Thriller
33         34                    Babe (1995)              Children's|Comedy|Drama
49         50     Usual Suspects, The (1995)                       Crime|Thriller
60         61          Eye for an Eye (1996)                       Drama|Thriller
105       107  Muppet Treasure Island (1996)  Adventure|Children's|Comedy|Musical
143       145                Bad Boys (1995)                               Action
154       156        Blue in the Face (1995)                               Comedy
251       254      Jefferson in Paris (1995)                                Drama
285       288     Natural Born Killers (1994)                      Action|Thriller


Sources:
1. https://towardsdatascience.com/how-to-build-a-simple-recommender-system-in-python-375093c3fb7d
2. https://towardsdatascience.com/deep-autoencoders-for-collaborative-filtering-6cf8d25bbf1d
3. https://medium.com/@connectwithghosh/recommender-system-on-the-movielens-using-an-autoencoder-using-tensorflow-in-python-f13d3e8d600d