# Movie Recommendation System

In this project, we build a Movie Recommendation System based on **movie ratings**. It correlates each movie's user ratings with those of other movies and produces sensible recommendations in very few lines of code.

([source link](https://machinelearningprojects.net/movie-recommendation-system/))

## Importing libraries

In [1]:
import pandas as pd

## Loading the datasets

### Reading input data

In [5]:
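# Note: read_csv treats the first line of the file as a header by default.
# If u.data has no header row, pass header=None and names=[...] so that line is kept as data.
df1 = pd.read_csv("data/u.data", sep="\t")
df1.columns = ["user_id", "item_id", "rating", "timestamp"]
df1.head()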

,user_id,item_id,rating,timestamp
0,0,172,5,881250949
1,0,133,1,881250949
2,196,242,3,881250949
3,186,302,3,891717742
4,22,377,1,878887116


In [6]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100002 entries, 0 to 100001
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype
---  ------     --------------   -----
 0   user_id    100002 non-null  int64
 1   item_id    100002 non-null  int64
 2   rating     100002 non-null  int64
 3   timestamp  100002 non-null  int64
dtypes: int64(4)
memory usage: 3.1 MB
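The loading cell above assigns column names after reading, which relies on `read_csv`'s default header handling. The sketch below illustrates the difference on a hypothetical three-row sample (the rows and the `sample`, `lossy`, and `full` names are made up for illustration, not taken from `u.data`); it assumes the real file has no header line, which is worth checking before changing the notebook.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same tab-separated layout as u.data.
sample = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n196\t242\t3\t881250949\n"
cols = ["user_id", "item_id", "rating", "timestamp"]

# Default behaviour: the first line is consumed as the header, so it is lost as data.
lossy = pd.read_csv(io.StringIO(sample), sep="\t")
lossy.columns = cols

# Passing header=None with names= keeps every line as a data row.
full = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=cols)

print(len(lossy), len(full))  # 2 3
```

If the file does turn out to be headerless, the second form avoids silently dropping one rating.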


### Reading movie titles

In [7]:
df2 = pd.read_csv("data/Movie_Id_Titles")
df2.head()

,item_id,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


In [8]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1682 entries, 0 to 1681
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   item_id  1682 non-null   int64 
 1   title    1682 non-null   object
dtypes: int64(1), object(1)
memory usage: 26.4+ KB


## Merging rating data and movie titles

In [10]:
df = pd.merge(df1, df2, on="item_id")
df.head(10)

,user_id,item_id,rating,timestamp,title
0,0,172,5,881250949,"Empire Strikes Back, The (1980)"
1,213,172,5,878955442,"Empire Strikes Back, The (1980)"
2,92,172,4,875653271,"Empire Strikes Back, The (1980)"
3,77,172,3,884752562,"Empire Strikes Back, The (1980)"
4,194,172,3,879521474,"Empire Strikes Back, The (1980)"
5,230,172,4,880484523,"Empire Strikes Back, The (1980)"
6,244,172,4,880605665,"Empire Strikes Back, The (1980)"
7,295,172,4,879516986,"Empire Strikes Back, The (1980)"
8,56,172,5,892737191,"Empire Strikes Back, The (1980)"
9,95,172,4,879196847,"Empire Strikes Back, The (1980)"
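`pd.merge` defaults to an inner join, so any rating whose `item_id` has no matching title would be dropped without notice. As a minimal sketch on made-up data (the `ratings`, `titles`, `checked`, and `unmatched` names are illustrative only), a left join with `indicator=True` and `validate="many_to_one"` can be used to check for that:

```python
import pandas as pd

# Toy stand-ins for df1 (ratings) and df2 (titles); the values are invented.
ratings = pd.DataFrame({"user_id": [1, 2, 3], "item_id": [10, 10, 99], "rating": [5, 4, 3]})
titles = pd.DataFrame({"item_id": [10, 20], "title": ["Movie A", "Movie B"]})

# Default inner join: the rating for item_id 99 disappears because it has no title.
inner = pd.merge(ratings, titles, on="item_id")

# Left join keeps every rating; validate raises if an item_id maps to several titles,
# and indicator adds a "_merge" column showing where each row came from.
checked = pd.merge(ratings, titles, on="item_id", how="left",
                   validate="many_to_one", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(len(inner), len(checked), unmatched["item_id"].tolist())
```

An empty `unmatched` frame would suggest the inner join used above loses nothing.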


## Grouping entries by movie and averaging ratings

In [12]:
rating_and_no_of_rating = pd.DataFrame(df.groupby("title")["rating"].mean().sort_values(ascending=False))
rating_and_no_of_rating

title,rating
They Made Me a Criminal (1939),5.0
Marlene Dietrich: Shadow and Light (1996),5.0
"Saint of Fort Washington, The (1993)",5.0
Someone Else's America (1995),5.0
Star Kid (1997),5.0
...,...
"Eye of Vichy, The (Oeil de Vichy, L') (1993)",1.0
King of New York (1990),1.0
Touki Bouki (Journey of the Hyena) (1973),1.0
"Bloody Child, The (1996)",1.0


## Adding a column of no. of ratings

In [13]:
rating_and_no_of_rating["no_of_ratings"] = df.groupby("title")["rating"].count()
rating_and_no_of_rating

title,rating,no_of_ratings
They Made Me a Criminal (1939),5.0,1
Marlene Dietrich: Shadow and Light (1996),5.0,1
"Saint of Fort Washington, The (1993)",5.0,2
Someone Else's America (1995),5.0,1
Star Kid (1997),5.0,3
...,...,...
"Eye of Vichy, The (Oeil de Vichy, L') (1993)",1.0,1
King of New York (1990),1.0,1
Touki Bouki (Journey of the Hyena) (1973),1.0,1
"Bloody Child, The (1996)",1.0,1


## Sorting on no. of ratings

In [14]:
rating_and_no_of_rating = rating_and_no_of_rating.sort_values("no_of_ratings", ascending=False)
rating_and_no_of_rating.head()

title,rating,no_of_ratings
Star Wars (1977),4.358491,583
Contact (1997),3.803536,509
Fargo (1996),4.155512,508
Return of the Jedi (1983),4.00789,507
Liar Liar (1997),3.156701,485
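The mean, count, and sort steps above can also be written as a single grouped aggregation. This is only a sketch on a small invented frame (`toy` and `summary` are illustrative names), using pandas named aggregation:

```python
import pandas as pd

# Invented ratings: movie "B" has three ratings, "A" two, "C" one.
toy = pd.DataFrame({
    "title": ["A", "A", "B", "B", "B", "C"],
    "rating": [5, 3, 4, 4, 1, 2],
})

# One groupby computes both the mean rating and the number of ratings,
# then sorts by popularity as in the cells above.
summary = (
    toy.groupby("title")
       .agg(rating=("rating", "mean"), no_of_ratings=("rating", "count"))
       .sort_values("no_of_ratings", ascending=False)
)

print(summary)
```

Computing both columns in one pass avoids the index alignment that happens when a second grouped Series is assigned as a new column.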


## Creating a pivot table

In [15]:
pt = df.pivot_table(index="user_id", columns="title", values="rating")
pt.head()

user_id,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
0,,,,,,,,,,,...,,,,,,,,,,
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
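Most cells in the pivot table are `NaN` because each user rates only a few movies. The toy example below (invented data; `toy` and `toy_pt` are illustrative names) shows the same user-by-title layout at a size that is easy to inspect:

```python
import pandas as pd

# Four invented ratings from three users on two movies.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "title": ["A", "B", "A", "B"],
    "rating": [5, 3, 4, 2],
})

# Rows are users, columns are titles; a missing (user, title) pair becomes NaN.
# pivot_table averages duplicates by default (aggfunc="mean").
toy_pt = toy.pivot_table(index="user_id", columns="title", values="rating")

# Fraction of empty cells, a rough measure of how sparse the table is.
sparsity = toy_pt.isna().to_numpy().mean()

print(toy_pt)
print(sparsity)
```

Each column of such a table is the rating vector for one movie, which is what the prediction step correlates.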


## Checking movie names

In [9]:
rating_and_no_of_rating.index

Index(['Star Wars (1977)', 'Contact (1997)', 'Fargo (1996)',
       'Return of the Jedi (1983)', 'Liar Liar (1997)',
       'English Patient, The (1996)', 'Scream (1996)', 'Toy Story (1995)',
       'Air Force One (1997)', 'Independence Day (ID4) (1996)',
       ...
       'Angela (1995)', 'Century (1993)', 'Johns (1996)',
       'Love Is All There Is (1996)', 'B. Monkey (1998)',
       'Land and Freedom (Tierra y libertad) (1995)', 'Big One, The (1997)',
       'Cyclo (1995)', 'Mirage (1995)', 'Crude Oasis, The (1995)'],
      dtype='object', name='title', length=1664)

## Live Prediction

In [55]:
def live_prediction(movie_name, rating_threshold=100):
    """Print the 10 movies whose user ratings correlate most with `movie_name`."""
    movie_vector = pt[movie_name].dropna()           # ratings for the chosen movie, taken from the pivot table
    similar_movies = pt.corrwith(movie_vector)       # correlate that vector with every movie's ratings
    corr_df = pd.DataFrame(similar_movies, columns=["Correlation"])
    corr_df = corr_df.join(rating_and_no_of_rating["no_of_ratings"])

    # Keep only movies with more than `rating_threshold` ratings, so that
    # correlations based on a handful of users do not dominate the list.
    corr_df = corr_df[corr_df["no_of_ratings"] > rating_threshold].sort_values("Correlation", ascending=False).dropna()
    print(corr_df.head(10))

In [58]:
import warnings
warnings.filterwarnings('ignore')  # Suppress all warnings, including any raised while computing correlations
movie_name = "Scream (1996)"
live_prediction(movie_name)

                                   Correlation  no_of_ratings
title                                                        
Scream (1996)                         1.000000            478
Scream 2 (1997)                       0.706028            106
Seven (Se7en) (1995)                  0.435188            236
Starship Troopers (1997)              0.419322            211
Nightmare on Elm Street, A (1984)     0.410796            111
Cape Fear (1991)                      0.397245            171
Interview with the Vampire (1994)     0.386182            137
Natural Born Killers (1994)           0.383332            128
Young Guns (1988)                     0.381230            101
Happy Gilmore (1996)                  0.375235            149
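The effect of `rating_threshold` is easier to see on a tiny example. The sketch below is a variant of `live_prediction`, not the notebook's own function: it takes the pivot table and rating counts as arguments and returns the result instead of printing it. The `similar_movies` helper, its `top_n` parameter, and the toy data are all assumptions made for illustration.

```python
import pandas as pd

def similar_movies(pt, no_of_ratings, movie_name, rating_threshold=100, top_n=10):
    """Return the movies most correlated with `movie_name` (hypothetical helper)."""
    movie_vector = pt[movie_name].dropna()
    corr = pt.corrwith(movie_vector).rename("Correlation")
    corr_df = corr.to_frame().join(no_of_ratings.rename("no_of_ratings"))
    # Drop movies with too few ratings before ranking by correlation.
    corr_df = corr_df[corr_df["no_of_ratings"] > rating_threshold]
    return corr_df.dropna().sort_values("Correlation", ascending=False).head(top_n)

nan = float("nan")
# Invented pivot table: five users, four movies. "D" has only two ratings.
toy_pt = pd.DataFrame(
    {
        "A": [5.0, 4.0, 3.0, 2.0, 1.0],
        "B": [5.0, 4.0, 3.0, 2.0, 1.0],   # rated exactly like "A"
        "C": [1.0, 2.0, 3.0, 4.0, 5.0],   # rated opposite to "A"
        "D": [nan, 4.0, 3.0, nan, nan],   # agrees with "A", but on two users only
    },
    index=pd.Index([1, 2, 3, 4, 5], name="user_id"),
)
toy_counts = toy_pt.count()  # number of non-missing ratings per movie

result = similar_movies(toy_pt, toy_counts, "A", rating_threshold=2)
print(result)
```

Here "D" would look like a strong match on just two overlapping ratings, and the threshold removes it before ranking. Returning the frame also makes the function easier to reuse than printing inside it.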
