# Recommender Systems Project

## 0. Quick Start
To run this notebook, you only need [pipenv](https://github.com/pypa/pipenv) installed.
Then run these three commands:
- first install the dependencies with: `pipenv install`
- launch the virtual env: `pipenv shell`
- finally start jupyter and open the notebook: `jupyter-lab`

In [16]:
import sys
sys.path.append("../src")
from tqdm import tqdm
import numpy as np
import pandas as pd

from surprise import NormalPredictor
from surprise import Dataset
from surprise import Reader
from surprise.model_selection import cross_validate

## 1. Introduction
The goal of a recommender system is to push *relevant* items to a given user. Reaching this goal requires understanding and modelling the user's preferences. In this project you will learn how to model those preferences with the [Surprise library](http://surpriselib.com/) to build two different recommender systems. The first will be a pure *collaborative filtering* approach, and the second will rely on item attributes in a *content-based* way.

## 2. Loading Data
Here we use the [MovieLens dataset](https://grouplens.org/datasets/movielens/). It contains 25 million user ratings. The data is in the `./data/raw` folder. We could load the .csv file directly with [a built-in Surprise function](https://github.com/NicolasHug/Surprise/blob/ef3ed6e98304dbf8d033c8eee741294b05b5ba07/surprise/dataset.py#L105), but loading it through a Pandas dataframe gives us more flexibility later.

In [2]:
RATINGS_DATA_FILE = './data/raw/ratings.csv'

In [9]:
# load the raw CSV into a DataFrame
df_ratings = pd.read_csv(RATINGS_DATA_FILE)

# drop the timestamp column since we don't need it now
df_ratings = df_ratings.drop(columns="timestamp")

In [17]:
# check we have 25M ratings (count() returns the number of rows, not the number of distinct users)
df_ratings.userId.count()

25000095
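
Note that `count()` returns the number of non-null entries in the column, so the figure above is the number of ratings rather than the number of distinct users. A minimal sketch of the difference, using a hypothetical toy dataframe (`toy` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: three users who gave six ratings in total
toy = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3, 3],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})

# count() counts non-null rows, i.e. one per rating
n_ratings = toy.userId.count()

# nunique() counts distinct values, i.e. one per user
n_users = toy.userId.nunique()

print(n_ratings, n_users)
```

Running `df_ratings.userId.nunique()` on the real data would give the number of distinct users in the same way.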

## 3. Collaborative Filtering