In [1]:
# Move to the directory where the project is located
import os
os.chdir('/work/notebooks/enstit/RecommenderSystem')

# 💬 Recommender System

The aim of this repository is to build a **Recommender System** that uses Matrix Factorization to learn user and item embeddings from a (sparse) review matrix and uses them to make user-specific suggestions.

## Reading data

The data used to populate our Recommender System is located in the `data` folder. In particular, the `movies.csv` file lists all the movies in the collection, and the `ratings.csv` file contains all the reviews that users left for those movies. In our system, a *user* can give any *movie* a score from 0 to 5, in half-point steps (e.g. 4.5 for an almost-perfect movie).

In [2]:
import pandas as pd

# Read movies and ratings from the related CSV files
movies = pd.read_csv('./data/movies.csv', usecols=['movieId', 'title'])
ratings = pd.read_csv('./data/ratings.csv', usecols=['userId', 'movieId', 'rating'])

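# Unique user IDs, converted to strings
users_list = ratings.userId.unique().astype(str).tolist()
# Unique movie titles, truncated to the first 1000
movies_list = movies.title.unique().astype(str).tolist()[:1000]

# User x movie review matrix (NaN where no review exists), truncated to the first 300 movie columns
C = pd.pivot_table(ratings, values='rating', index=['userId'], columns=['movieId']).values[:, :300]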

Now we have:
* the `users_list`, which contains the unique IDs (as strings) of all the users who reviewed a movie,
* the `movies_list`, which contains the titles of the movies in the collection (truncated here to the first 1000), and
* the matrix `C`, which holds the rating each user gave to each movie (truncated here to the first 300 movie columns). If a user has not reviewed a specific movie, the corresponding cell contains a `nan` value.
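
To make the shape of `C` concrete, here is a minimal sketch of the same `pivot_table` call on a tiny hand-made ratings table. The values and the names `toy_ratings`, `toy_C` and `observed` are made up for illustration and do not come from `ratings.csv`.

```python
import numpy as np
import pandas as pd

# Toy ratings table (made-up values, NOT taken from ratings.csv)
toy_ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 3],
    'movieId': [10, 20, 10, 30],
    'rating':  [4.0, 3.5, 5.0, 2.0],
})

# Same call as above: one row per user, one column per movie
toy_C = pd.pivot_table(toy_ratings, values='rating',
                       index=['userId'], columns=['movieId']).values

# Boolean mask of the cells that actually hold a review
observed = ~np.isnan(toy_C)

print(toy_C)
print("Observed reviews:", int(observed.sum()), "out of", toy_C.size)
```

Each (user, movie) pair that has no review ends up as `nan`, which is why the matrix is sparse in practice.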

## Building the Recommender System

The `utils` folder contains all the class definitions our system needs to work. In particular, the `recsys` package contains the definition of the `RecommenderSystem` class.

When we initialize the object, it automatically uses Weighted Matrix Factorization to compute the *user* and *item* embeddings.
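
For intuition, here is a rough sketch of one common way to do Weighted Matrix Factorization: alternating least squares in which observed cells get weight 1 and missing cells are treated as zeros with a small weight. This is not the code in `utils.recsys`, which may work differently. The function name `wmf_sketch` and all of its parameters (`k`, `w_unobs`, `reg`, `n_iter`, `seed`) are assumptions made for this example.

```python
import numpy as np

def wmf_sketch(C, k=2, w_unobs=0.1, reg=0.1, n_iter=10, seed=0):
    """Toy weighted ALS (hypothetical helper, not the repository's implementation).

    Observed cells have weight 1; missing (NaN) cells are treated as 0
    with weight w_unobs. Returns user embeddings, item embeddings and
    the regularized weighted loss after each iteration.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(C)
    R = np.where(observed, C, 0.0)        # targets (0 where unobserved)
    W = np.where(observed, 1.0, w_unobs)  # per-cell weights
    n_users, n_items = C.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    ridge = reg * np.eye(k)
    losses = []
    for _ in range(n_iter):
        # Fix V, solve a small weighted ridge regression for each user
        for i in range(n_users):
            VW = V * W[i][:, None]
            U[i] = np.linalg.solve(VW.T @ V + ridge, VW.T @ R[i])
        # Fix U, solve the same kind of problem for each item
        for j in range(n_items):
            UW = U * W[:, j][:, None]
            V[j] = np.linalg.solve(UW.T @ U + ridge, UW.T @ R[:, j])
        loss = np.sum(W * (R - U @ V.T) ** 2) + reg * (np.sum(U ** 2) + np.sum(V ** 2))
        losses.append(float(loss))
    return U, V, losses

# Made-up 3x3 review matrix with missing entries
C_toy = np.array([[5.0, np.nan, 1.0],
                  [4.0, 2.0, np.nan],
                  [np.nan, 1.0, 5.0]])
U, V, losses = wmf_sketch(C_toy)
print(U @ V.T)  # predicted scores for every (user, item) pair
```

Because each alternating step exactly minimizes the objective for one set of embeddings, the loss should not increase between iterations, which matches the decreasing trend in the log below.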

In [3]:
from utils.recsys import RecommenderSystem

rs = RecommenderSystem(reviews=C, users=users_list, items=movies_list)

2024-01-18 15:29:52,884 - DEBUG - Building embeddings...
2024-01-18 15:29:55,677 - DEBUG - Iteration: 1 -> Loss: 591.8055561791718
2024-01-18 15:29:58,291 - DEBUG - Iteration: 2 -> Loss: 146.84881658382682
2024-01-18 15:30:00,917 - DEBUG - Iteration: 3 -> Loss: 113.28918339441566


In [17]:
rs.print_user_chart(user="20")

  Position  Movie Name                       Rating
         1  Pocahontas (1995)                   5
         2  Goofy Movie, A (1995)               4.5
         3  Little Princess, A (1995)           4.5
         4  Balto (1995)                        4
         5  Babe (1995)                         4
         6  Muppet Treasure Island (1996)       3.5
         7  Santa Clause, The (1994)            3.5
         8  Jumanji (1995)                      3
         9  Casper (1995)                       3
        10  Tom and Huck (1995)                 1
