In [1]:
# Move to the directory where the project is located
import os
os.chdir('/work/notebooks/enstit/RecommenderSystem')

<div align="center">
  <img src="./assets/steflix.png" width="225px">
</div>

# 🍿 STEFLIX: A Movie-based Recommender System

The aim of this repository is to build a **Recommender System** that uses Matrix Factorization to learn user and item embeddings from a (sparse) review matrix, and then uses those embeddings to make user-specific suggestions.
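
To make the idea concrete, here is a minimal sketch of how learned embeddings can be turned into suggestions: the predicted score of a movie for a user is the dot product of their embeddings, and movies the user already rated are excluded. The toy matrices `U`, `V`, `C` and the helper `recommend` are hypothetical and only serve as an illustration; they are not the implementation used by this repository.

```python
import numpy as np

# Hypothetical toy embeddings: 3 users and 4 movies in a 2-dimensional latent space.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # user embeddings, shape (3, 2)
V = np.array([[2.0, 0.0],
              [0.0, 3.0],
              [1.0, 1.0],
              [4.0, 0.5]])   # movie embeddings, shape (4, 2)

# Observed ratings; NaN marks a movie the user has not reviewed yet.
C = np.array([[4.0, np.nan, np.nan, np.nan],
              [np.nan, 5.0, 1.0, np.nan],
              [np.nan, np.nan, np.nan, np.nan]])

def recommend(user_index, top_k=2):
    """Return the indexes of the top_k unrated movies with the highest predicted score."""
    scores = U[user_index] @ V.T                                 # one score per movie
    scores = np.where(np.isnan(C[user_index]), scores, -np.inf)  # hide rated movies
    return np.argsort(-scores)[:top_k].tolist()

print(recommend(0))  # [3, 2]
```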

## Reading data

The data we use to populate our Recommender System is located in the `data` folder. In particular, the `movies.csv` file lists all the movies in the collection, and the `ratings.csv` file contains all the reviews that users left for those movies. In our system, a *user* can give any *movie* a score from 0 to 5, and half points are allowed (e.g. 4.5 for an almost-perfect movie).

In [2]:
import pandas as pd

# Read movies and ratings from the related CSV files
movies = pd.read_csv('./data/movies.csv', usecols=['movieId', 'title'])
ratings = pd.read_csv('./data/ratings.csv', usecols=['userId', 'movieId', 'rating'])

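# Unique user IDs (as strings) and unique movie titles
users_list = ratings.userId.unique().astype(str).tolist()
movies_list = movies.title.unique().astype(str).tolist()

# User x movie rating matrix; cells without a review are NaN
C = pd.pivot_table(ratings, values='rating', index=['userId'], columns=['movieId']).values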

Now we have:
* the `users_list`, which contains the unique IDs (as strings) of all the users who reviewed a movie,
* the `movies_list`, with the titles of all the movies in the collection, and
* the matrix `C`, which holds the rating that each user gave to each movie. If a user left no review for a specific movie, the related cell contains a `nan` value.
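
As a quick illustration of how the pivot produces `nan` cells, the sketch below applies the same `pd.pivot_table` call to a tiny, made-up ratings table (`toy_ratings` and `toy_C` are hypothetical names, and the values are not taken from `ratings.csv`).

```python
import numpy as np
import pandas as pd

# Made-up ratings: three users, three movies, four reviews in total.
toy_ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 3],
    'movieId': [10, 20, 20, 30],
    'rating':  [4.0, 3.5, 5.0, 2.0],
})

# One row per user, one column per rated movie; missing reviews become NaN.
toy_C = pd.pivot_table(toy_ratings, values='rating',
                       index=['userId'], columns=['movieId']).values

print(toy_C.shape)                   # (3, 3)
print(int(np.isnan(toy_C).sum()))    # 5 of the 9 cells have no review
```

Note that the columns come from the `movieId` values that appear in the ratings, in sorted order, so a movie without any review gets no column. A title list built from `movies.csv` may therefore need to be aligned with those columns.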

## Building the Recommender System

The `utils` folder contains all the class definitions that our system needs to work. In particular, the `recsys` module contains the definition of the `RecommenderSystem` class.

When we initialize the object, it automatically uses Weighted Matrix Factorization to compute the *user* and *item* embeddings.
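
For readers unfamiliar with the technique, the following is a minimal sketch of one common form of Weighted Matrix Factorization, fitted with plain gradient descent. It is an assumption about the general approach, not the code inside `RecommenderSystem`: the function `weighted_mf`, its hyperparameters, and the choice to treat missing reviews as zeros with a small weight `w_unobserved` are all hypothetical.

```python
import numpy as np

def weighted_mf(C, n_factors=2, w_unobserved=0.05, reg=0.01,
                lr=0.01, n_iters=2000, seed=0):
    """Factorize C (NaN = no review) into user and item embeddings U and V.

    Minimizes sum_ij W_ij * (U_i . V_j - T_ij)^2 + reg * (|U|^2 + |V|^2),
    where T is C with NaN replaced by 0, and W is 1 on observed cells
    and w_unobserved elsewhere.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(C)
    target = np.where(observed, C, 0.0)
    W = np.where(observed, 1.0, w_unobserved)

    U = 0.1 * rng.standard_normal((C.shape[0], n_factors))
    V = 0.1 * rng.standard_normal((C.shape[1], n_factors))

    losses = []
    for _ in range(n_iters):
        residual = U @ V.T - target
        weighted = W * residual
        losses.append(float((weighted * residual).sum()
                            + reg * ((U ** 2).sum() + (V ** 2).sum())))
        # Gradients of the loss above (the constant factor 2 is folded into lr).
        grad_U = weighted @ V + reg * U
        grad_V = weighted.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V, losses

# Made-up review matrix: 3 users, 3 movies, NaN where no review exists.
toy_C = np.array([[4.0, 3.5, np.nan],
                  [np.nan, 5.0, np.nan],
                  [np.nan, np.nan, 2.0]])

U, V, losses = weighted_mf(toy_C)
print(losses[0], losses[-1])   # the loss at the start and at the end of training
print(np.round(U @ V.T, 2))    # predicted scores for every user/movie pair
```

The small weight on unobserved cells keeps the predictions for unseen movies from growing without bound, while letting the observed ratings dominate the fit. Alternating least squares is another common way to minimize the same kind of objective.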

In [3]:
from utils.recsys import RecommenderSystem

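# First run: build the model from the review matrix and save it to disk
#rs = RecommenderSystem(name="Steflix", reviews=C, users=users_list, items=movies_list)
#rs.save()

# Later runs: load the previously saved model from the filesystem
rs = RecommenderSystem(name="Steflix")
rs = rs.load()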

2024-01-18 18:29:37,335 - DEBUG - Loading Steflix from filesystem...
2024-01-18 18:29:37,336 - DEBUG - No filename provided, using default (object_name.pkl)...


In [64]:
rs.print_user_chart(user="26", first_n=100)

  Position  Movie Name                           Rating
         1  Seven (a.k.a. Se7en) (1995)               4
         2  Die Hard: With a Vengeance (1995)         4
         3  Pulp Fiction (1994)                       4
         4  Firm, The (1993)                          4
         5  Fugitive, The (1993)                      4
         6  Batman (1989)                             4
         7  Silence of the Lambs, The (1991)          4
         8  GoldenEye (1995)                          3
         9  Babe (1995)                               3
        10  Apollo 13 (1995)                          3
        11  Batman Forever (1995)                     3
        12  Net, The (1995)                           3
        13  Disclosure (1994)                         3
        14  Natural Born Killers (1994)               3
        15  Quiz Show (1994)                          3
        16  Ace Ventura: Pet Detective (1994)         3
        17  Clear and Present Danger (1994)     

In [65]:
indexes = rs.contentbased_filtering(user_index=26, top_k=10)
[rs.items[index] for index in indexes]

['Saving Private Ryan (1998)',
 'Sommersby (1993)',
 'Alaska (1996)',
 'Guinevere (1999)',
 'Gods Must Be Crazy, The (1980)',
 'Spawn (1997)',
 'Blood Feast (1963)',
 'Candidate, The (1972)',
 'Rules of Engagement (2000)',
 'Far From Home: The Adventures of Yellow Dog (1995)']