# Introduction to Recommender Systems

<p align="center">
    <img width="721" alt="cover-image" src="https://user-images.githubusercontent.com/49638680/204351915-373011d3-75ac-4e21-a6df-99cd1c552f2c.png">
</p>

---

## Non-Personalised Recommendations

This section introduces non-personalised recommendation systems. Before proceeding, here are the main reasons to use them:

1. New users: we know little about them (the _cold-start_ problem).
2. They are simple to implement yet still useful.
3. Members of a community tend to share the same behaviour, so the same recommendations suit them.
4. Some applications make personalised recommendations impossible.

As an example, think of the weekly book or restaurant reviews published in newspapers.
Such systems are also called __stereotyped recommender systems__.

## A code implementation of a non-personalised recommendation system

The aim of this notebook is to provide an implementation of a non-personalised recommendation system.


In [1]:
# Import libraries
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

# set plot size
plt.rcParams['figure.figsize'] = (20, 13)
%matplotlib inline
%config InlineBackend.figure_format = "retina"

We are going to use only pandas and NumPy to build a recommender for users who have not rated any movie yet, by suggesting the most popular movie in the catalogue.

### Import data

The data comes from the well-known [MovieLens dataset](https://grouplens.org/datasets/movielens/). Here we import the data into dataframes so that we can explore it before digging into the model.

In [3]:
df_rating = pd.read_csv('http://files.grouplens.org/datasets/movielens/ml-100k/u1.base', sep = '\t', engine='python', header=None)
df_rating.columns = ['UserId', 'MovieId', 'Rating', 'Timestamp']
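df_rating_test = pd.read_csv('http://files.grouplens.org/datasets/movielens/ml-100k/u1.test', sep = '\t', engine='python', header=None)
# The test split has the same four columns as the training split
df_rating_test.columns = ['UserId', 'MovieId', 'Rating', 'Timestamp']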

df_users = pd.read_csv('http://files.grouplens.org/datasets/movielens/ml-100k/u.user', sep = '|', engine='python', header=None)
df_users.columns = ['UserId', 'Age', 'Gender', 'Occupation', 'ZipCode']
df_users.set_index('UserId', inplace = True)
df_items = pd.read_csv('http://files.grouplens.org/datasets/movielens/ml-100k/u.item', sep = '|', engine='python', encoding ='ISO-8859-1', header=None)
df_items.columns = ['MovieId', 'Title', 'Date', 'VideoReleaseDate', 'Url', 'unknown', 'Action','Adventure', 'Animation', 
                    'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy','Film-Noir', 'Horror','Musical', 'Mystery', 'Romance', 
                    'Sci-Fi', 'Thriller', 'War', 'Western']
df_items.set_index('MovieId', inplace = True)

In [4]:
df_rating

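| | UserId | MovieId | Rating | Timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 5 | 874965758 |
| 1 | 1 | 2 | 3 | 876893171 |
| 2 | 1 | 3 | 4 | 878542960 |
| 3 | 1 | 4 | 3 | 876893119 |
| 4 | 1 | 5 | 3 | 889751712 |
| ... | ... | ... | ... | ... |
| 79995 | 943 | 1067 | 2 | 875501756 |
| 79996 | 943 | 1074 | 4 | 888640250 |
| 79997 | 943 | 1188 | 3 | 888640250 |
| 79998 | 943 | 1228 | 3 | 888640275 |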


To build the training data as seen in the lectures, we need a matrix with users in rows and movies in columns (or, equivalently, its transpose).

This is easily done with the pandas `pivot` method.

In [6]:
df_matrix = df_rating.pivot(index='UserId', columns='MovieId', values='Rating')

n_users = len(df_users)
n_items = len(df_items)
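
To make the reshaping concrete, here is a minimal sketch of the same `pivot` call on a tiny hand-made frame. The name `toy_ratings` and its values are invented for illustration and are not part of the MovieLens data.

```python
import pandas as pd

# Toy ratings in the same long format as df_rating (values are made up).
toy_ratings = pd.DataFrame({
    "UserId":  [1, 1, 2, 3],
    "MovieId": [10, 20, 10, 30],
    "Rating":  [5, 3, 4, 2],
})

# One row per user, one column per movie; unrated pairs become NaN.
toy_matrix = toy_ratings.pivot(index="UserId", columns="MovieId", values="Rating")
print(toy_matrix)

# pivot raises an error on repeated (UserId, MovieId) pairs,
# so checking for duplicates first is a cheap sanity test.
has_duplicates = toy_ratings.duplicated(subset=["UserId", "MovieId"]).any()
print("duplicates:", has_duplicates)
```

If a dataset did contain repeated user/movie pairs, `pivot_table` with an aggregation function would be one possible alternative.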

In [7]:
df_matrix

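| UserId / MovieId | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 1673 | 1674 | 1675 | 1676 | 1677 | 1678 | 1679 | 1680 | 1681 | 1682 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5.0 | 3.0 | 4.0 | 3.0 | 3.0 | NaN | 4.0 | 1.0 | 5.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 939 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 940 | NaN | NaN | NaN | 2.0 | NaN | NaN | 4.0 | 5.0 | 3.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 941 | 5.0 | NaN | NaN | NaN | NaN | NaN | 4.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 942 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |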


As one can see, the matrix is full of NaN values: each user has rated only a small fraction of the movies.
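
One way to quantify how empty such a matrix is would be to compute its density, i.e. the share of cells that hold a rating. The sketch below does this on a made-up 3×3 matrix; `toy_matrix` and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy user-by-movie matrix with missing ratings (values are made up).
toy_matrix = pd.DataFrame(
    [[5.0, np.nan, np.nan],
     [np.nan, 3.0, np.nan],
     [4.0, np.nan, 1.0]],
    index=pd.Index([1, 2, 3], name="UserId"),
    columns=pd.Index([10, 20, 30], name="MovieId"),
)

n_rated = int(toy_matrix.notna().sum().sum())  # number of observed ratings
n_cells = toy_matrix.size                      # users x movies
density = n_rated / n_cells
print(f"{n_rated} of {n_cells} cells are filled ({density:.1%})")
```

The same two lines should apply unchanged to `df_matrix`.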

### Calculate the most popular movie

Let's look at the movie columns. Summing over each column gives a vector of _total scores_: the sum of all ratings a movie has received, which grows with both the number of ratings and their values.

Thanks to pandas, this takes one line of code.

In [16]:
movie_scores = df_matrix.sum(axis=0, skipna=True)
movie_scores

MovieId
1       1491.0
2        334.0
3        225.0
4        603.0
5        228.0
         ...  
1678       1.0
1679       3.0
1680       2.0
1681       3.0
1682       3.0
Length: 1650, dtype: float64
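
Because the total score mixes how often a movie is rated with how well it is rated, it can help to look at the sum next to the count and the mean. The following is a small sketch on invented data (`toy_matrix` is hypothetical), not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Movie 10: four mediocre ratings. Movie 20: two excellent ratings.
toy_matrix = pd.DataFrame(
    {10: [3.0, 3.0, 3.0, 3.0], 20: [5.0, 5.0, np.nan, np.nan]},
    index=pd.Index([1, 2, 3, 4], name="UserId"),
)
toy_matrix.columns.name = "MovieId"

summary = pd.DataFrame({
    "total_score": toy_matrix.sum(axis=0),   # NaN values are skipped by default
    "n_ratings": toy_matrix.count(axis=0),   # number of non-NaN ratings
    "mean_rating": toy_matrix.mean(axis=0),  # average over the observed ratings
})
print(summary)
```

In this toy case movie 10 has the higher total score although movie 20 has the higher mean, which is exactly the popularity bias the total score is meant to capture.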

Now we store the movie scores in the movie dataframe.
Since `movie_scores` is a pandas Series indexed by `MovieId`, like `df_items`, we can simply add it as a column: pandas aligns the values on the index, and any movie without a rating in the training split gets `NaN`.

In [17]:
df_items["score"] = movie_scores
df_items

| MovieId | Title | Date | VideoReleaseDate | Url | unknown | Action | Adventure | Animation | Children | Comedy | ... | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Toy Story (1995) | 01-Jan-1995 | NaN | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1491.0 |
| 2 | GoldenEye (1995) | 01-Jan-1995 | NaN | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 334.0 |
| 3 | Four Rooms (1995) | 01-Jan-1995 | NaN | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 225.0 |
| 4 | Get Shorty (1995) | 01-Jan-1995 | NaN | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 603.0 |
| 5 | Copycat (1995) | 01-Jan-1995 | NaN | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 228.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1678 | Mat' i syn (1997) | 06-Feb-1998 | NaN | http://us.imdb.com/M/title-exact?Mat%27+i+syn+... | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
| 1679 | B. Monkey (1998) | 06-Feb-1998 | NaN | http://us.imdb.com/M/title-exact?B%2E+Monkey+(... | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 3.0 |
| 1680 | Sliding Doors (1998) | 01-Jan-1998 | NaN | http://us.imdb.com/Title?Sliding+Doors+(1998) | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2.0 |
| 1681 | You So Crazy (1994) | 01-Jan-1994 | NaN | http://us.imdb.com/M/title-exact?You%20So%20Cr... | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.0 |
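
Assigning a Series as a new column relies on index alignment rather than on row order. The sketch below illustrates this on invented data; `toy_items` and `toy_scores` are hypothetical names, and filling missing scores with zero is only one possible choice.

```python
import pandas as pd

toy_items = pd.DataFrame(
    {"Title": ["Movie A", "Movie B", "Movie C"]},
    index=pd.Index([1, 2, 3], name="MovieId"),
)
# Scores exist only for movies 1 and 3, and are listed in a different order.
toy_scores = pd.Series([7.0, 12.0], index=pd.Index([3, 1], name="MovieId"))

toy_items["score"] = toy_scores  # aligned on MovieId, not on position
print(toy_items)

# Optional: treat "never rated" as a score of zero.
toy_items["score_filled"] = toy_items["score"].fillna(0.0)
```

Movie 2 has no entry in `toy_scores`, so its score is `NaN` until it is filled.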


#### Giving recommendations

We are now ready to give recommendations.

We split the problem into four steps of increasing difficulty.

1. We always recommend the most popular movie.
2. For each user, we recommend the most popular movie in the same genre as that user's best-rated movie.
3. For each user, we look at the last movie they rated above 4 and recommend the most popular movie in the same genre.
4. We proceed as in step 3 but add randomness: we pick uniformly among the $10$ most popular movies.
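
The genre-based steps share two building blocks: ranking the movies of one genre by score, and optionally drawing at random from the top of that ranking. The following is a rough sketch of those blocks on invented data. The helper `top_in_genre`, the frame `toy_items`, and the reading of step 4 as "top movies within the genre" are assumptions, not the notebook's own implementation.

```python
import numpy as np
import pandas as pd

# Toy catalogue with 0/1 genre flags and a popularity score (values are made up).
toy_items = pd.DataFrame(
    {
        "Title": ["A", "B", "C", "D"],
        "Comedy": [1, 1, 0, 1],
        "Drama": [0, 1, 1, 0],
        "score": [50.0, 120.0, 300.0, 80.0],
    },
    index=pd.Index([1, 2, 3, 4], name="MovieId"),
)

def top_in_genre(items, genre, n=10):
    """Return the n highest-scoring movies flagged with the given genre."""
    in_genre = items[items[genre] == 1]
    return in_genre.nlargest(n, "score")

# Steps 2 and 3: recommend the single most popular movie of a genre.
best_comedy = top_in_genre(toy_items, "Comedy", n=1).index[0]

# Step 4: pick uniformly at random among the top-n movies of the genre.
rng = np.random.default_rng(seed=0)
candidates = top_in_genre(toy_items, "Comedy", n=2)
random_pick = rng.choice(candidates.index.to_numpy())

print(best_comedy, list(candidates.index), random_pick)
```

What differs between steps 2 and 3 is only how the genre is chosen for each user, which this sketch leaves out.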

##### Most popular movie recommendation

The first step is the easiest: we only need to find the movie with the highest score.

In [57]:
# Use the label of the maximum (idxmax) rather than its position (argmax),
# so the lookup does not depend on MovieId being a contiguous range starting at 1.
df_items.loc[[df_items["score"].idxmax()]]

| MovieId | Title | Date | VideoReleaseDate | Url | unknown | Action | Adventure | Animation | Children | Comedy | ... | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | Star Wars (1977) | 01-Jan-1977 | NaN | http://us.imdb.com/M/title-exact?Star%20Wars%2... | 0 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 2110.0 |
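
Looking up the maximum can be done by position or by label, and the two only coincide when the index happens to be a contiguous range. The sketch below contrasts them on a made-up frame whose `MovieId` values are deliberately not contiguous; `toy_items` is hypothetical.

```python
import pandas as pd

toy_items = pd.DataFrame(
    {"Title": ["A", "B", "C"], "score": [10.0, 99.0, 42.0]},
    index=pd.Index([101, 205, 307], name="MovieId"),
)

position = toy_items["score"].argmax()  # integer position of the maximum
label = toy_items["score"].idxmax()     # index label (MovieId) of the maximum

print(toy_items.iloc[[position]])  # position-based lookup
print(toy_items.loc[[label]])      # label-based lookup, same row
```

Here `position + 1` would not be a valid `MovieId`, which is why the label-based form is the more robust of the two.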


| | UserId | MovieId | Rating | Timestamp |
|---|---|---|---|---|
| 4 | 1 | 5 | 3 | 889751712 |
| 127 | 1 | 256 | 4 | 889751712 |
