# Movie Recommendation System

## Objective

To use the MovieLens dataset (https://grouplens.org/datasets/movielens/latest/) to perform analytics and machine learning for the following tasks:

-    **Task 1: To Recommend Movies based on Ratings Provided by the users**
-    **Task 2: To Recommend Movies based on similar users' viewing experience**

We will break these tasks into further sub-tasks.

### Importing Necessary Libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import re


%matplotlib inline

### Reading CSV files into the project

In [6]:
ratingData = pd.read_csv('Dataset/ratings.csv')

moviesData = pd.read_csv('Dataset/movies.csv')

linksData = pd.read_csv('Dataset/links.csv')

In [3]:
ratingData.head()

,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [4]:
moviesData.head()

,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [8]:
linksData.head()

,movieId,imdbId,tmdbId
0,1,114709,862.0
1,2,113497,8844.0
2,3,113228,15602.0
3,4,114885,31357.0
4,5,113041,11862.0
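The `tmdbId` values above are displayed as floats (`862.0`). In pandas this commonly happens when an otherwise integer column contains missing entries, because `NaN` forces `float64`. The sketch below illustrates the effect on a hypothetical three-row stand-in for `links.csv`; whether the real file has such gaps is an assumption, and `csv_text`, `links_default` and `links_nullable` are names invented for this example.

```python
import io
import pandas as pd

# Hypothetical stand-in for links.csv; the last row has no tmdbId (assumption).
csv_text = "movieId,imdbId,tmdbId\n1,114709,862\n2,113497,8844\n3,113228,\n"

# Default parsing: the missing value becomes NaN, which forces float64.
links_default = pd.read_csv(io.StringIO(csv_text))
print(links_default["tmdbId"].dtype)

# The nullable integer dtype keeps the ids as integers and marks the gap as <NA>.
links_nullable = pd.read_csv(io.StringIO(csv_text), dtype={"tmdbId": "Int64"})
print(links_nullable["tmdbId"].dtype)

# Count missing values per column to see where the gaps are.
print(links_default.isna().sum())
```

Checking `isna().sum()` right after loading is a cheap way to spot columns that need this kind of handling before any merge.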


## Task 1: To Recommend Movies based on Ratings Provided by the users

To achieve this task we will perform the following sub-tasks:
-    **Merging Both Ratings and Movies Dataset**
-    **Grouping Merged Dataset Based on movieId**
-    **Getting Rating count and mean Ratings**
-    **Calculating Recommendation Score**
-    **Top 10 Recommended Movies**
-    **Top 10 Movies based on Ratings** 

### Merging Both Ratings and Movies Datasets
To achieve this we will use the **merge** function of the pandas library.

In [53]:
wholeData = pd.merge(ratingData,moviesData,on="movieId")


wholeData.head()

,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,5,1,4.0,847434962,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,7,1,4.5,1106635946,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,15,1,2.5,1510577970,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
4,17,1,4.5,1305696483,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
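As a rough illustration of what the merge does, the sketch below joins two toy frames on `movieId`. The frames `ratings` and `movies` are small stand-ins invented for this example, and passing `how` and `validate` explicitly is an optional safeguard, not something the notebook itself does.

```python
import pandas as pd

# Toy stand-ins for ratings.csv and movies.csv (illustrative values only).
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 3, 1],
    "rating": [4.0, 4.0, 4.5],
})
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
})

# An inner join (the default) keeps only movieIds present in both frames.
# validate="many_to_one" raises an error if movieId is duplicated in `movies`.
merged = pd.merge(ratings, movies, on="movieId", how="inner", validate="many_to_one")
print(merged)
```

Because the join is an inner join, a movie that nobody rated (Jumanji in this toy data) does not appear in the result.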


In [30]:
wholeData.dtypes

userId         int64
movieId        int64
rating       float64
timestamp      int64
title         object
genres        object
dtype: object

### Grouping Merged Dataset Based on movieId

We build a new DataFrame with one row per movie by grouping the merged data on movieId.

In [54]:
newData = pd.DataFrame()

# One row per movieId; unique() returns a one-element array, stored here in its string form
newData['title'] = wholeData.groupby('movieId')['title'].unique().astype(str)
# Copy the movieId index into a regular column
newData['movieId'] = newData.index

newData.head(10)

movieId (index),title,movieId
1,['Toy Story (1995)'],1
2,['Jumanji (1995)'],2
3,['Grumpier Old Men (1995)'],3
4,['Waiting to Exhale (1995)'],4
5,['Father of the Bride Part II (1995)'],5
6,['Heat (1995)'],6
7,['Sabrina (1995)'],7
8,['Tom and Huck (1995)'],8
9,['Sudden Death (1995)'],9
10,['GoldenEye (1995)'],10
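The titles in the output above carry brackets and quotes because `unique()` returns an array per group, and `astype(str)` stores that array's string form. A possible alternative, shown here only as a sketch on toy data, is `first()`, which returns the title string itself. The frame `merged` and the variable names are assumptions made for this example.

```python
import pandas as pd

# Toy stand-in for the merged ratings/movies frame (illustrative values only).
merged = pd.DataFrame({
    "movieId": [1, 1, 2],
    "title": ["Toy Story (1995)", "Toy Story (1995)", "Jumanji (1995)"],
})

# unique() yields one array per movieId; astype(str) stores its string form.
as_array_str = merged.groupby("movieId")["title"].unique().astype(str)

# first() yields the first title in each group as a plain string.
as_plain_str = merged.groupby("movieId")["title"].first()

print(as_array_str)
print(as_plain_str)
```

This relies on each `movieId` mapping to exactly one title, which holds when titles come from a single movies table joined on `movieId`.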


### Getting Rating count and mean Ratings
To achieve this we will use the **count** and **mean** functions on the grouped ratings.

In [55]:
newData['count'] = wholeData.groupby('movieId')['rating'].count() 
newData['avg.rating'] = wholeData.groupby('movieId')['rating'].mean()

newData.head()

movieId (index),title,movieId,count,avg.rating
1,['Toy Story (1995)'],1,215,3.92093
2,['Jumanji (1995)'],2,110,3.431818
3,['Grumpier Old Men (1995)'],3,52,3.259615
4,['Waiting to Exhale (1995)'],4,7,2.357143
5,['Father of the Bride Part II (1995)'],5,49,3.071429
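The title, count and mean can also be computed in a single pass with pandas named aggregation. The sketch below uses a toy frame; `merged`, `stats` and the column name `avg_rating` are names chosen for this example, not the notebook's own.

```python
import pandas as pd

# Toy stand-in for the merged ratings/movies frame (illustrative values only).
merged = pd.DataFrame({
    "movieId": [1, 1, 2],
    "title": ["Toy Story (1995)", "Toy Story (1995)", "Jumanji (1995)"],
    "rating": [4.0, 4.5, 3.0],
})

# Named aggregation: each keyword becomes an output column,
# defined as a (source column, aggregation function) pair.
stats = merged.groupby("movieId").agg(
    title=("title", "first"),
    count=("rating", "count"),
    avg_rating=("rating", "mean"),
)
print(stats)
```

Doing the grouping once avoids repeating `groupby('movieId')` for every derived column.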


### Calculating Recommendation Score
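To achieve this we multiply **count** by **avg.rating**. The product equals the sum of all ratings a movie received, so the score rewards movies that are both widely and highly rated.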

In [56]:
newData['recommendScore'] = newData['count'] * newData['avg.rating']

newData.head()

movieId (index),title,movieId,count,avg.rating,recommendScore
1,['Toy Story (1995)'],1,215,3.92093,843.0
2,['Jumanji (1995)'],2,110,3.431818,377.5
3,['Grumpier Old Men (1995)'],3,52,3.259615,169.5
4,['Waiting to Exhale (1995)'],4,7,2.357143,16.5
5,['Father of the Bride Part II (1995)'],5,49,3.071429,150.5
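Since the mean is the sum divided by the count, count times mean is simply the sum of a movie's ratings. This is why the scores above are all multiples of 0.5, the rating step. A small check on toy data, with names invented for this example:

```python
import pandas as pd

# Toy ratings (illustrative values only).
ratings = pd.DataFrame({
    "movieId": [1, 1, 1, 2],
    "rating": [4.0, 4.5, 2.5, 3.0],
})

grouped = ratings.groupby("movieId")["rating"]

# Score as computed in the notebook: number of ratings times their mean.
score = grouped.count() * grouped.mean()

# The same quantity computed directly as the sum of ratings.
total = grouped.sum()

# The two agree up to floating-point rounding.
print((score - total).abs().max())
```

One consequence is that a movie with many mediocre ratings can outscore one with few excellent ratings, which is the behaviour the rating-based ranking later in this task addresses.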


### Top 10 Recommended Movies
To achieve this we sort the movies by recommendation score in descending order using the **sort_values** function.

In [58]:
top = newData.sort_values('recommendScore',ascending=False)
print("Below are the Top 10 Recommended Movies:")
top.head(10)['title']


Below are the Top 10 Recommended Movies:


movieId
318              ['Shawshank Redemption, The (1994)']
356                           ['Forrest Gump (1994)']
296                           ['Pulp Fiction (1994)']
2571                           ['Matrix, The (1999)']
593              ['Silence of the Lambs, The (1991)']
260     ['Star Wars: Episode IV - A New Hope (1977)']
110                             ['Braveheart (1995)']
2959                            ['Fight Club (1999)']
527                       ["Schindler's List (1993)"]
480                          ['Jurassic Park (1993)']
Name: title, dtype: object
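When only the top few rows are needed, `DataFrame.nlargest` is an alternative to sorting the whole frame and taking the head. The sketch below reuses the five scores shown earlier in a small frame named `scores`, a name invented for this example.

```python
import pandas as pd

# The five example rows shown earlier, rebuilt by hand for illustration.
scores = pd.DataFrame(
    {
        "title": [
            "Toy Story (1995)",
            "Jumanji (1995)",
            "Grumpier Old Men (1995)",
            "Waiting to Exhale (1995)",
            "Father of the Bride Part II (1995)",
        ],
        "recommendScore": [843.0, 377.5, 169.5, 16.5, 150.5],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="movieId"),
)

# Full sort followed by head, as in the notebook.
top_sorted = scores.sort_values("recommendScore", ascending=False).head(3)

# nlargest selects the same rows without spelling out the sort direction.
top_n = scores.nlargest(3, "recommendScore")

print(top_n["title"])
```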

### Top 10 Movies based on Ratings
To achieve this we first drop all movies with a rating count of 100 or fewer, then sort the rest by average rating.

In [59]:
newData = newData[newData['count']>100]
top2 = newData.sort_values('avg.rating',ascending=False)

top2.head(10)['title']

movieId
318               ['Shawshank Redemption, The (1994)']
858                          ['Godfather, The (1972)']
2959                             ['Fight Club (1999)']
1221                ['Godfather: Part II, The (1974)']
48516                         ['Departed, The (2006)']
1213                             ['Goodfellas (1990)']
58559                      ['Dark Knight, The (2008)']
50                      ['Usual Suspects, The (1995)']
1197                    ['Princess Bride, The (1987)']
260      ['Star Wars: Episode IV - A New Hope (1977)']
Name: title, dtype: object
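The cell above overwrites `newData` with the filtered rows, so movies with fewer ratings are no longer available afterwards. A possible variation, sketched below, wraps the filter and sort in a function that leaves its input untouched. The function name `top_by_avg_rating` and its parameters are hypothetical, and the sample rows are the five shown earlier.

```python
import pandas as pd

def top_by_avg_rating(stats, min_count=100, n=10):
    """Return the n best-rated movies that have more than min_count ratings."""
    # Boolean indexing returns a new frame, so `stats` itself is not modified.
    popular = stats[stats["count"] > min_count]
    return popular.sort_values("avg.rating", ascending=False).head(n)

# The five example rows shown earlier, rebuilt by hand for illustration.
stats = pd.DataFrame(
    {
        "title": [
            "Toy Story (1995)",
            "Jumanji (1995)",
            "Grumpier Old Men (1995)",
            "Waiting to Exhale (1995)",
            "Father of the Bride Part II (1995)",
        ],
        "count": [215, 110, 52, 7, 49],
        "avg.rating": [3.92093, 3.431818, 3.259615, 2.357143, 3.071429],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="movieId"),
)

best = top_by_avg_rating(stats, min_count=100, n=10)
print(best["title"])
```

Keeping the threshold as a parameter also makes it easy to see how the ranking changes with a stricter or looser cutoff.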