# Item-Based Collaborative Filtering

The objective of this notebook is to predict which movies a user will like, based on the ratings that user has given to other movies.

The dataset consists of 4 files:
* [Ratings](assets/movie_ratings/dataset/ratings.csv)
* [Movies](assets/movie_ratings/dataset/movies.csv)
* [Links](assets/movie_ratings/dataset/links.csv)
* [Tags](assets/movie_ratings/dataset/tags.csv)

More information about this dataset:

* Dataset: https://grouplens.org/datasets/movielens/latest/
* Additional Information: https://en.wikipedia.org/wiki/Item-item_collaborative_filtering

# Imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Get Data

##### Ratings CSV

In [2]:
ratings = pd.read_csv('assets/movie_ratings/dataset/ratings.csv')

In [3]:
ratings.head()

Unnamed: 0,user_id,movie_id,rating,timestamp
0,0,5952,5.0,964982703
1,0,4993,5.0,964982703
2,0,2959,5.0,964982703
3,0,588,1.0,964982703
4,1,1,4.0,964982703


##### Links CSV

In [4]:
links = pd.read_csv('assets/movie_ratings/dataset/links.csv')

In [5]:
links.head()

Unnamed: 0,movie_id,imdb_id,tmdb_id
0,1,114709,862.0
1,2,113497,8844.0
2,3,113228,15602.0
3,4,114885,31357.0
4,5,113041,11862.0


##### Movies CSV

In [6]:
movies = pd.read_csv('assets/movie_ratings/dataset/movies.csv')

In [7]:
movies.head()

Unnamed: 0,movie_id,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


##### Tags CSV

In [8]:
tags = pd.read_csv('assets/movie_ratings/dataset/tags.csv')

In [9]:
tags.head()

Unnamed: 0,user_id,movie_id,tag,timestamp
0,2,60756,funny,1445714994
1,2,60756,Highly quotable,1445714996
2,2,60756,will ferrell,1445714992
3,2,89774,Boxing story,1445715207
4,2,89774,MMA,1445715200


# Merge datasets

In [10]:
# Join each rating to its movie's title and genres via the shared movie_id column
ratings = pd.merge(movies, ratings, on='movie_id')
ratings.head()

Unnamed: 0,movie_id,title,genres,user_id,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0,964982703
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5,4.0,847434962
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,4.5,1106635946
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.5,1510577970
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17,4.5,1305696483
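
When no key is given, `pd.merge` joins on every column name the two frames share, which here is `movie_id`. As a minimal sketch on made-up rows (the `movies_demo` and `ratings_demo` frames are illustrative stand-ins, not the real files), the key can be named explicitly and checked for uniqueness on the movies side:

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (values are illustrative only).
movies_demo = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Toy Story (1995)", "Jumanji (1995)"],
})
ratings_demo = pd.DataFrame({
    "user_id": [1, 5, 7],
    "movie_id": [1, 1, 2],
    "rating": [4.0, 4.0, 3.5],
})

# Name the join key explicitly; validate raises MergeError if movie_id
# is duplicated in the movies table (the "one" side of the join).
merged_demo = pd.merge(movies_demo, ratings_demo, on="movie_id", validate="one_to_many")
print(merged_demo)
```

The result has one row per rating, with the movie columns repeated for every user who rated that movie.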


# Movies with the most reviews

In [12]:
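# Number each rating within its movie; cumcount() starts at 0,
# so the largest value per movie is (number of ratings - 1)
ratings['number_reviews'] = ratings.groupby('title')['title'].cumcount()
ratings.sort_values(by=['number_reviews', 'title'], inplace=True, ascending=False)

# Keep the largest running count per title and rank the movies by it
top_ratings = ratings.groupby('title')[['movie_id','title','number_reviews']].max()
top_ratings.sort_values(by=['number_reviews'], inplace=True, ascending=False)
# Drop the helper column so ratings keeps its original shape
ratings.drop('number_reviews', axis=1, inplace=True)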

top_ratings.head(30)

Unnamed: 0_level_0,movie_id,title,number_reviews
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Forrest Gump (1994),356,Forrest Gump (1994),328
"Shawshank Redemption, The (1994)",318,"Shawshank Redemption, The (1994)",316
Pulp Fiction (1994),296,Pulp Fiction (1994),306
"Silence of the Lambs, The (1991)",593,"Silence of the Lambs, The (1991)",278
"Matrix, The (1999)",2571,"Matrix, The (1999)",277
Star Wars: Episode IV - A New Hope (1977),260,Star Wars: Episode IV - A New Hope (1977),250
Jurassic Park (1993),480,Jurassic Park (1993),237
Braveheart (1995),110,Braveheart (1995),236
Terminator 2: Judgment Day (1991),589,Terminator 2: Judgment Day (1991),223
Schindler's List (1993),527,Schindler's List (1993),219
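
Because `cumcount()` numbers the rows of each group from 0, the `number_reviews` values above are probably one lower than the true number of ratings per movie. A minimal sketch on made-up data (the `demo` frame is illustrative, not taken from the dataset) compares it with `groupby(...).size()`, which counts rows directly:

```python
import pandas as pd

# Five made-up ratings: three for movie "A", two for movie "B".
demo = pd.DataFrame({
    "title": ["A", "A", "A", "B", "B"],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.5],
})

# cumcount numbers rows within each group starting at 0, so its max is count - 1.
max_cumcount = demo.groupby("title").cumcount().groupby(demo["title"]).max()

# size() returns the number of rows per group directly.
review_counts = demo.groupby("title").size().sort_values(ascending=False)

print(max_cumcount)
print(review_counts)
```

The ranking is the same either way, so the list of most-reviewed movies is unaffected; only the absolute counts differ by one.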


# Movie Selection

###### Let's select 3 movies as our favorites.

<img style="float: left;margin: auto;" src="assets/movie_ratings/images/club.jpg" alt="fight_club" width="200"/> 
<img style="float: left;margin: auto;" src="assets/movie_ratings/images/lotr1.jpg" alt="lotr1" width="200"/>
<img style="float: left;margin: auto;" src="assets/movie_ratings/images/lotr2.jpg" alt="lotr2" width="200"/>

###### And select one movie as least favorite.

<img style="float: center;margin: auto;" src="assets/movie_ratings/images/aladdin.jpg" alt="aladdin" width="200" />

Note that both **Aladdin** and **Lord of The Rings** have various genres in common.

# My ratings

In [13]:
userRatings = ratings.pivot_table(index=['user_id'],columns=['title'],values='rating')
userRatings.head()

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...,Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,4.0,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,


In [14]:
corrMatrix = userRatings.corr(method='pearson', min_periods=100)
corrMatrix.head()

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...,Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
'71 (2014),,,,,,,,,,,...,,,,,,,,,,
'Hellboy': The Seeds of Creation (2004),,,,,,,,,,,...,,,,,,,,,,
'Round Midnight (1986),,,,,,,,,,,...,,,,,,,,,,
'Salem's Lot (2004),,,,,,,,,,,...,,,,,,,,,,
'Til There Was You (1997),,,,,,,,,,,...,,,,,,,,,,
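
The rows shown above are all `NaN` because `min_periods=100` only keeps a correlation when at least 100 users rated both movies, and these titles have too few ratings. The sketch below reproduces the effect on a tiny made-up table (the `demo` ratings and the threshold of 3 are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Made-up ratings: movies A and B share three raters, A and C only two.
demo = pd.DataFrame({
    "user_id": [0, 0, 1, 1, 2, 2, 2, 3, 3],
    "title":   ["A", "B", "A", "B", "A", "B", "C", "A", "C"],
    "rating":  [5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 5.0, 4.0, 2.0],
})

# One row per user, one column per movie, NaN where a user gave no rating.
user_by_movie = demo.pivot_table(index="user_id", columns="title", values="rating")

# Pearson correlation between movie columns; pairs with fewer than
# 3 users in common are left as NaN.
corr = user_by_movie.corr(method="pearson", min_periods=3)
print(corr)
```

Here the A/B pair gets a correlation because three users rated both, while the A/C pair stays `NaN` with only two raters in common. A higher threshold trades coverage for more reliable similarities.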


These are the movies I have rated (I am user ID 0):

In [15]:
myRatings = userRatings.loc[0].dropna()
myRatings

title
Aladdin (1992)                                               1.0
Fight Club (1999)                                            5.0
Lord of the Rings: The Fellowship of the Ring, The (2001)    5.0
Lord of the Rings: The Two Towers, The (2002)                5.0
Name: 0, dtype: float64

# Evaluation

##### Similarity candidates

In [16]:
# Collect one Series of weighted similarities per rated movie
candidates = []
for title, rating in myRatings.items():
    print("Adding sims for " + title + "...")
    # Retrieve the movies similar to this one, which user ID 0 rated
    sims = corrMatrix[title].dropna()
    # Now scale each similarity by how well user ID 0 rated this movie
    candidates.append(sims * rating)

# Combine the per-movie scores into a single Series of similarity candidates
# (pd.concat replaces Series.append, which was removed in pandas 2.0)
simCandidates = pd.concat(candidates)

# Glance at our results so far:
print("sorting...")
simCandidates.sort_values(inplace=True, ascending=False)
print(simCandidates.head(10))

Adding sims for Aladdin (1992)...
Adding sims for Fight Club (1999)...
Adding sims for Lord of the Rings: The Fellowship of the Ring, The (2001)...
Adding sims for Lord of the Rings: The Two Towers, The (2002)...
sorting...
Fight Club (1999)                                            5.000000
Lord of the Rings: The Fellowship of the Ring, The (2001)    5.000000
Lord of the Rings: The Two Towers, The (2002)                5.000000
Lord of the Rings: The Fellowship of the Ring, The (2001)    4.440004
Lord of the Rings: The Two Towers, The (2002)                4.440004
Lord of the Rings: The Return of the King, The (2003)        4.249695
Lord of the Rings: The Return of the King, The (2003)        4.107515
Pulp Fiction (1994)                                          2.717325
Seven (a.k.a. Se7en) (1995)                                  2.553510
Memento (2000)                                               2.406009
dtype: float64


##### Sum the scores per movie

In [17]:
simCandidates = simCandidates.groupby(simCandidates.index).sum()

In [18]:
simCandidates.sort_values(inplace = True, ascending = False)
simCandidates.head(10)

Lord of the Rings: The Fellowship of the Ring, The (2001)                         10.789803
Lord of the Rings: The Two Towers, The (2002)                                     10.278109
Lord of the Rings: The Return of the King, The (2003)                              9.535907
Fight Club (1999)                                                                  7.187905
Memento (2000)                                                                     5.331686
Sixth Sense, The (1999)                                                            4.666846
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)     4.653554
Matrix, The (1999)                                                                 4.564628
Star Wars: Episode V - The Empire Strikes Back (1980)                              4.502416
Silence of the Lambs, The (1991)                                                   4.134995
dtype: float64
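
The two steps so far (weight each similarity by the user's rating, then sum the scores of movies that appear more than once) can be sketched as a single function. The name `score_candidates` and the small `corr_demo` matrix are hypothetical, introduced only for illustration:

```python
import pandas as pd

def score_candidates(corr_matrix, my_ratings):
    """Sum rating-weighted similarities for every movie correlated with a rated one."""
    weighted = [
        corr_matrix[title].dropna() * rating
        for title, rating in my_ratings.items()
    ]
    combined = pd.concat(weighted)
    # A movie similar to several rated movies appears several times; add its scores up.
    return combined.groupby(combined.index).sum().sort_values(ascending=False)

# Made-up symmetric similarity matrix for three movies.
corr_demo = pd.DataFrame(
    [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.3],
     [0.1, 0.3, 1.0]],
    index=["A", "B", "C"], columns=["A", "B", "C"],
)
my_demo = pd.Series({"A": 5.0, "C": 1.0})  # user loved A, disliked C

scores = score_candidates(corr_demo, my_demo)
print(scores)
```

In this toy case movie B scores 5.0 × 0.8 + 1.0 × 0.3 = 4.3. Note that with this weighting a low rating still adds a small positive amount to similar movies; it only contributes less than a high rating would.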

##### Remove movies already rated

In [19]:
filteredSims = simCandidates.drop(myRatings.index)
filteredSims.head(10)

Lord of the Rings: The Return of the King, The (2003)                             9.535907
Memento (2000)                                                                    5.331686
Sixth Sense, The (1999)                                                           4.666846
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)    4.653554
Matrix, The (1999)                                                                4.564628
Star Wars: Episode V - The Empire Strikes Back (1980)                             4.502416
Silence of the Lambs, The (1991)                                                  4.134995
Pulp Fiction (1994)                                                               3.934003
Saving Private Ryan (1998)                                                        3.525138
Shawshank Redemption, The (1994)                                                  3.504978
dtype: float64
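
`Series.drop` raises a `KeyError` if any label is missing, which could happen if a rated movie had too few ratings to receive any similarity score. A small sketch with made-up values (`scores_demo` and `rated_demo` are illustrative) shows a more tolerant variant:

```python
import pandas as pd

scores_demo = pd.Series({"A": 5.1, "B": 4.3, "C": 1.5})
rated_demo = pd.Index(["A", "Z"])  # "Z" was rated but has no similarity scores

# errors="ignore" skips labels that are absent instead of raising KeyError.
unseen = scores_demo.drop(rated_demo, errors="ignore")
print(unseen)
```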

<img src="assets/movie_ratings/images/lotr3.jpg" alt="lotr3" width="200"/>

##### Results

The top 10 movies are all good suggestions for the user in question (me).

With just 4 ratings from the user, it was possible to get recommendations that match this user's tastes well.

The results would likely improve further if the user had rated a larger set of movies.