Make your own recommender system
--------------------------------

In the following, you will implement the formulas from section 3 of the handouts, and use them to create a simple recommender system.

### Load the data
The file `u.data` contains 100,000 movie ratings given by many users. It has four fields: `user_id`, `movie_id`, `rating`, and `timestamp`. Load the data file into a pandas DataFrame:

In [None]:
import pandas as pd
ratings = pd.read_csv('u.data')

# We don't need the timestamp column, so we drop it
ratings.drop('timestamp', axis=1, inplace=True)
print('Ratings data loaded, size =', ratings.shape)

ratings.head()

### Enter the profile

We need a user's previous ratings to predict a new rating for that user. Using IDs from this [collection](https://people.cs.kuleuven.be/~albrecht.zimmermann/movie-names-well-formatted.html), enter ratings for 20 movies into a list named `profile`. You will later use this list to predict the rating for a new movie. Remember that these ratings should all come from the same individual.

Each item in `profile` is a `(movie_id, rating)` tuple.
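
As an illustration of the expected shape only, here is a minimal sketch with three entries. The movie IDs and ratings are made up, and the names `example_profile` and `example_df` are hypothetical; the column names mirror the conversion cell that follows.

```python
import pandas as pd

# Made-up (movie_id, rating) pairs, for illustration only.
example_profile = [
    (1, 5),
    (50, 4),
    (100, 2),
]

# Same conversion as used for the real profile.
example_df = pd.DataFrame(example_profile, columns=['item_id', 'rating'])
print(example_df)
```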

In [None]:
profile = [
#   (movie1, rating1),
#   (movie2, rating2),
#   (movie3, rating3),
#   (movie4, rating4),
#   ...    
]

# convert to Pandas DataFrame
profile = pd.DataFrame(profile, columns=['item_id', 'rating'])
profile

### Breaking down the job

Our goal in the following sections is to implement a function that predicts the rating for an unrated movie using collaborative filtering. According to the formulas in section 3, we can decompose this into three tasks:

- To find all users that have rated the queried movie. This is the set $\{v : i \in I_v \}$ in equation (1) of exercise (3). We will call these users *co-raters*.
- To compute the similarity of two users, that is, to compute $w(u, v)$ in equation (2) of exercise (3).
- To predict the rating for an unrated movie.

You will implement three functions corresponding to these three tasks. 

### Find co-raters

Implement a function that, given an item and a database of ratings, finds all users that have rated that item. Test your function on the `ratings` data.
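
One possible approach, sketched on a toy table rather than on `ratings`: filter the rows by movie and keep the distinct user IDs. The names `toy_ratings` and `co_raters_sketch` are hypothetical, and the column names are an assumption that may not match the header of your `u.data` file.

```python
import pandas as pd

# Toy ratings table; column names are assumed, not taken from u.data.
toy_ratings = pd.DataFrame({
    'user_id':  [1, 1, 2, 3, 3],
    'movie_id': [10, 20, 10, 20, 30],
    'rating':   [4, 3, 5, 2, 1],
})

def co_raters_sketch(movie_id, rating_data):
    # Keep the rows for this movie, then collect the distinct user IDs.
    mask = rating_data['movie_id'] == movie_id
    return rating_data.loc[mask, 'user_id'].drop_duplicates()

print(co_raters_sketch(10, toy_ratings).tolist())
```

The sketch returns a pandas Series; calling `.tolist()` on it gives a plain list if that is more convenient.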

In [None]:
def co_raters(movie_id, rating_data):
    # return the co-raters as a list or Pandas Series
    raise NotImplementedError  # replace this line with your implementation

### Compute the similarity

In general, we need a function that, given two users $u, v$ and a database of ratings, computes the Pearson correlation between them. However, the goal of this session is to give recommendations based on the ratings that you entered in `profile`, so the profile takes the place of user $u$.

Implement a function that computes the similarity between the given `profile` and a user $v$. 

**Hint for Pandas users:** *The weights are computed based on ratings of movies rated by both $u$ and $v$. An inner join can be useful to conveniently isolate the relevant entries. See [Merging](http://pandas.pydata.org/pandas-docs/stable/merging.html).*
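
A minimal sketch of the inner-join idea on toy data follows. It assumes the plain Pearson correlation with means taken over the co-rated movies only, which may differ from equation (2) in the handout. It also assumes that the profile uses the column `item_id` and the ratings table uses `movie_id`, and that too little overlap should count as zero similarity. The names `toy_profile`, `toy_ratings`, and `similarity_sketch` are hypothetical.

```python
import numpy as np
import pandas as pd

toy_profile = pd.DataFrame({'item_id': [10, 20, 30], 'rating': [5, 3, 1]})
toy_ratings = pd.DataFrame({
    'user_id':  [2, 2, 2, 3, 3, 3, 4],
    'movie_id': [10, 20, 30, 10, 20, 30, 10],
    'rating':   [4, 3, 2, 1, 3, 5, 5],
})

def similarity_sketch(profile, v, rating_data):
    # Ratings given by user v.
    v_ratings = rating_data[rating_data['user_id'] == v]
    # The inner join keeps only the movies rated by both the profile and v.
    common = profile.merge(v_ratings, how='inner',
                           left_on='item_id', right_on='movie_id',
                           suffixes=('_u', '_v'))
    if len(common) < 2:
        return 0.0  # assumption: too little overlap means no similarity
    # Deviations from each user's mean over the co-rated movies.
    du = common['rating_u'] - common['rating_u'].mean()
    dv = common['rating_v'] - common['rating_v'].mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    if denom == 0:
        return 0.0  # assumption: constant ratings give zero similarity
    return float((du * dv).sum() / denom)

for v in [2, 3, 4]:
    print(v, similarity_sketch(toy_profile, v, toy_ratings))
```

In the toy data, user 2 rates in the same order as the profile, user 3 in the opposite order, and user 4 shares only one movie with it.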

In [None]:
def similarity(profile, v, rating_data):
    # return the similarity between the profile (user u) and user v
    raise NotImplementedError  # replace this line with your implementation

### Predict the rating for a movie

Implement a function that, given a database of ratings, a user profile, and an unrated movie, predicts a rating for that movie. This function computes $\hat{R}_{(u,i)}$ in equation (1) of exercise (3).
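
The sketch below shows how the pieces could fit together on toy data. The handout's equation (1) is not reproduced here, so the code assumes the common mean-centred form $\hat{R}_{(u,i)} = \bar{R}_u + \frac{\sum_v w(u,v)\,(R_{(v,i)} - \bar{R}_v)}{\sum_v |w(u,v)|}$, with $\bar{R}_v$ taken over all of $v$'s ratings; adapt it if your formula differs. All names (`pearson_sketch`, `predict_sketch`, `toy_profile`, `toy_ratings`) and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

toy_profile = pd.DataFrame({'item_id': [10, 20, 30], 'rating': [5, 3, 1]})
toy_ratings = pd.DataFrame({
    'user_id':  [2, 2, 2, 2, 3, 3, 3, 3],
    'movie_id': [10, 20, 30, 40, 10, 20, 30, 40],
    'rating':   [4, 3, 2, 5, 1, 3, 5, 1],
})

def pearson_sketch(profile, v_ratings):
    # Pearson correlation over the movies rated by both users.
    common = profile.merge(v_ratings, how='inner',
                           left_on='item_id', right_on='movie_id',
                           suffixes=('_u', '_v'))
    if len(common) < 2:
        return 0.0
    du = common['rating_u'] - common['rating_u'].mean()
    dv = common['rating_v'] - common['rating_v'].mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom > 0 else 0.0

def predict_sketch(movie_id, rating_data, profile):
    mean_u = profile['rating'].mean()
    # Rows of the users who rated the queried movie (the co-raters).
    rows = rating_data[rating_data['movie_id'] == movie_id]
    num, den = 0.0, 0.0
    for v, r_vi in zip(rows['user_id'], rows['rating']):
        v_ratings = rating_data[rating_data['user_id'] == v]
        w = pearson_sketch(profile, v_ratings)
        # Weighted deviation of v's rating from v's own mean.
        num += w * (r_vi - v_ratings['rating'].mean())
        den += abs(w)
    # Assumption: fall back to the profile mean when no co-rater is similar.
    return float(mean_u if den == 0 else mean_u + num / den)

print(predict_sketch(40, toy_ratings, toy_profile))
```

Looping over co-raters is easy to read but slow on the full data set; a vectorised version using `groupby` would be a natural refinement.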

In [None]:
def predict(movie_id, rating_data, profile):
    # return the predicted rating for movie_id
    raise NotImplementedError  # replace this line with your implementation

Use this function and the profile that you created earlier to predict the rating for an unrated movie.

Then try your program on the movies that you entered earlier in the `profile` DataFrame.