# Movie Recommendation System
Recommendation systems are used by OTT platforms such as Netflix and Amazon Prime.
## Types of Recommendation Systems:
### 1) Content Based Recommendation System
Recommends movies based on the content of the movie.
(This project builds a content-based movie recommendation system.)

### 2) Popularity Based Recommendation System
Recommends movies based on popularity signals such as the popularity of the movie script, watch time, and reviews.
For example, the top 10 movies of the week.
### 3) Collaborative Recommendation System
Groups people by their watching patterns and recommends movies according to the tastes shared by similar viewers.
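
As a minimal sketch of the popularity-based approach, the snippet below ranks a handful of movies by a single popularity score. The titles and scores are copied from the first rows of the dataset printed later in this notebook; the variable names and the choice of `popularity` as the only signal are assumptions made for illustration.

```python
# Toy popularity ranking; a real system would likely combine several signals.
toy_movies = [
    {"title": "Avatar", "popularity": 150.437577},
    {"title": "Pirates of the Caribbean: At World's End", "popularity": 139.082615},
    {"title": "Spectre", "popularity": 107.376788},
    {"title": "The Dark Knight Rises", "popularity": 112.31295},
    {"title": "John Carter", "popularity": 43.926995},
]

# Sort from most to least popular and keep the top three.
top_movies = sorted(toy_movies, key=lambda m: m["popularity"], reverse=True)[:3]
top_titles = [m["title"] for m in top_movies]
print(top_titles)
```

Unlike the content-based approach used in this project, this ranking is the same for every user and ignores what the movies are about.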

## Overview of the Steps Involved:

#### 1) Data Collection
#### 2) Data Pre-Processing
#### 3) Feature Extraction
Convert the textual data into numerical feature vectors.
#### 4) Similarity Score
Compute a similarity (confidence) score between movies. For this we use cosine similarity: each movie is treated as a vector, and the vectors are compared to find similar movies.
#### 5) User Input
The user enters a movie name, and the recommendations are based on this input.
#### 6) List of Movies
The list of similar movies is returned as output.
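
The cosine similarity named in the steps above can be sketched by hand on small vectors. This is an illustrative version only: the helper name `cosine` and the choice to return `0.0` for a zero vector are assumptions, and the notebook itself relies on scikit-learn's `cosine_similarity` further down.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


print(cosine([1, 0], [1, 0]))  # identical direction
print(cosine([1, 0], [0, 1]))  # no shared components
print(cosine([1, 1], [1, 0]))  # partial overlap
```

A score close to 1 means two movies share most of their weighted terms, while a score close to 0 means they share almost none.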

##### Importing the dependencies

In [1]:
## Importing the Dependencies

import numpy as np
import pandas as pd
import difflib ## used to compare titles and get the closest match to the input value
from sklearn.feature_extraction.text import TfidfVectorizer ## transforms the textual data into numerical feature vectors, which makes cosine similarity easy to compute
from sklearn.metrics.pairwise import cosine_similarity ## computes the cosine similarity values

##### Data collection and Preprocessing

In [2]:
## Loading the data from csv file to a pandas dataframe
movies_data=pd.read_csv("movies.csv")

In [3]:
## Printing the first 5 rows of the dataframe
movies_data.head()

Unnamed: 0,index,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew,director
0,0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Sam Worthington Zoe Saldana Sigourney Weaver S...,"[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...",James Cameron
1,1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Johnny Depp Orlando Bloom Keira Knightley Stel...,"[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...",Gore Verbinski
2,2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Daniel Craig Christoph Waltz L\u00e9a Seydoux ...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes
3,3,250000000,Action Crime Drama Thriller,http://www.thedarkknightrises.com/,49026,dc comics crime fighter terrorist secret ident...,en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,Christian Bale Michael Caine Gary Oldman Anne ...,"[{'name': 'Hans Zimmer', 'gender': 2, 'departm...",Christopher Nolan
4,4,260000000,Action Adventure Science Fiction,http://movies.disney.com/john-carter,49529,based on novel mars medallion space travel pri...,en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,Taylor Kitsch Lynn Collins Samantha Morton Wil...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton


In [4]:
## We need to choose the columns that are relevant for our recommendations

## Finding the number of rows and columns in the dataset
movies_data.shape

(4803, 24)

In [5]:
## Selecting the relevant features for recommendation
selected_features=["genres","keywords","tagline","cast","director","overview"]
print(selected_features)

['genres', 'keywords', 'tagline', 'cast', 'director', 'overview']


In [6]:
## Replacing the missing/null values with an empty string

for feature in selected_features:
    movies_data[feature]=movies_data[feature].fillna("")
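
The reason for this step can be seen on a toy frame: concatenating a string column that contains a missing value yields a missing result for that row, so the whole combined text would be lost. The frame and its values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; the second movie has no genre recorded.
toy = pd.DataFrame({
    "genres": ["Action", np.nan],
    "tagline": ["Enter the world", "A quiet story"],
})

# Without filling, the missing genre makes the combined text missing too.
without_fill = toy["genres"] + " " + toy["tagline"]

# After filling with an empty string, the remaining text survives.
filled = toy.fillna("")
with_fill = filled["genres"] + " " + filled["tagline"]

print(without_fill.isna().tolist())
print(with_fill.tolist())
```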

In [7]:
## Combining all the selected_features together

combined_features=movies_data["genres"]+" "+ movies_data["keywords"]+" "+ movies_data["tagline"]+" "+ movies_data["cast"]+" "+movies_data["director"]+" "+movies_data["overview"] 

In [8]:
print(combined_features)

0       Action Adventure Fantasy Science Fiction cultu...
1       Adventure Fantasy Action ocean drug abuse exot...
2       Action Adventure Crime spy based on novel secr...
3       Action Crime Drama Thriller dc comics crime fi...
4       Action Adventure Science Fiction based on nove...
                              ...                        
4798    Action Crime Thriller united states\u2013mexic...
4799    Comedy Romance  A newlywed couple's honeymoon ...
4800    Comedy Drama Romance TV Movie date love at fir...
4801      A New Yorker in Shanghai Daniel Henney Eliza...
4802    Documentary obsession camcorder crush dream gi...
Length: 4803, dtype: object


In [9]:
## Converting Textual data into feature vectors
vectorizer=TfidfVectorizer()

In [10]:
## Fitting and transforming the data

feature_vectors=vectorizer.fit_transform(combined_features)
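
To get a feel for what `fit_transform` returns, here is a small sketch on three made-up documents using the default `TfidfVectorizer` settings. The documents and variable names are assumptions for illustration. The result is a sparse matrix with one row per document and one column per vocabulary term, and a term that appears in several documents ("action") receives a lower weight than a term unique to one document ("space").

```python
from sklearn.feature_extraction.text import TfidfVectorizer

toy_docs = [
    "action space war",
    "action spy sequel",
    "romance comedy",
]

toy_vectorizer = TfidfVectorizer()
toy_vectors = toy_vectorizer.fit_transform(toy_docs)

# One row per document, one column per distinct term.
print(toy_vectors.shape)

# vocabulary_ maps each term to its column index.
vocab = toy_vectorizer.vocabulary_
first_row = toy_vectors[0].toarray()[0]
print("action:", first_row[vocab["action"]])
print("space: ", first_row[vocab["space"]])
```

When the notebook prints `feature_vectors`, each entry has the form `(row, column)  weight`, where the column index refers to this kind of vocabulary mapping.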

In [11]:
print(feature_vectors)

  (0, 5245)	0.16506865163441017
  (0, 1014)	0.12285843797047788
  (0, 1243)	0.05270709034262742
  (0, 21523)	0.16650106431669984
  (0, 1281)	0.0319519027005067
  (0, 19650)	0.16120557402371172
  (0, 10368)	0.138482776543016
  (0, 2974)	0.09974129089792808
  (0, 27515)	0.1456922041909068
  (0, 2685)	0.09805341831299118
  (0, 4129)	0.06195738171764886
  (0, 18025)	0.1093940665047336
  (0, 28597)	0.15490753406550084
  (0, 19541)	0.04908322232312577
  (0, 18249)	0.14769810641761005
  (0, 27405)	0.031015019219557106
  (0, 7827)	0.17979882024342952
  (0, 14023)	0.04170092858483918
  (0, 17021)	0.14324534514274756
  (0, 20104)	0.19260031442011957
  (0, 4768)	0.12501550204808318
  (0, 239)	0.19716936546022965
  (0, 13474)	0.036478409857958735
  (0, 4288)	0.13213124585063998
  (0, 14180)	0.08418056281586365
  :	:
  (4802, 27153)	0.05940079357098276
  (4802, 19243)	0.061349419053434766
  (4802, 11941)	0.07829183421073846
  (4802, 29680)	0.07735547381237222
  (4802, 9346)	0.07104321122746965
  (4

#### Cosine Similarity

In [12]:
## Getting the similarity scores using Cosine Similarity

similarity= cosine_similarity(feature_vectors)

In [13]:
print(similarity) ## each row holds one movie's similarity to ALL other movies

[[1.         0.05083168 0.0332947  ... 0.02749812 0.0304889  0.0072518 ]
 [0.05083168 1.         0.04356836 ... 0.05077045 0.03100979 0.01521198]
 [0.0332947  0.04356836 1.         ... 0.02646984 0.04751623 0.01372603]
 ...
 [0.02749812 0.05077045 0.02646984 ... 1.         0.03481447 0.03546821]
 [0.0304889  0.03100979 0.04751623 ... 0.03481447 1.         0.03098945]
 [0.0072518  0.01521198 0.01372603 ... 0.03546821 0.03098945 1.        ]]


In [14]:
print(similarity.shape)

(4803, 4803)
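
The structure of this matrix is easier to see on a tiny example. The sketch below applies `cosine_similarity` to three made-up vectors (the values are assumptions for illustration) and checks the two properties visible in the output above: the matrix is symmetric, and every movie has a similarity of 1 with itself.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Three hypothetical feature vectors, one per row.
toy_vectors = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
])

toy_similarity = cosine_similarity(toy_vectors)

print(toy_similarity.shape)                            # one row and one column per vector
print(np.allclose(toy_similarity, toy_similarity.T))   # symmetric
print(np.allclose(np.diag(toy_similarity), 1.0))       # self-similarity is 1
```

Row `i` of the matrix is the list of scores between item `i` and every item, which is what the later cells read for the chosen movie.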


In [15]:
## getting the movie name from the user
movie_name=input("Enter your favourite movie name:")

Enter your favourite movie name: Hero


In [16]:
## Creating a list with all the movie titles in the dataset

list_of_all_titles=movies_data["title"].tolist()
print(list_of_all_titles)

['Avatar', "Pirates of the Caribbean: At World's End", 'Spectre', 'The Dark Knight Rises', 'John Carter', 'Spider-Man 3', 'Tangled', 'Avengers: Age of Ultron', 'Harry Potter and the Half-Blood Prince', 'Batman v Superman: Dawn of Justice', 'Superman Returns', 'Quantum of Solace', "Pirates of the Caribbean: Dead Man's Chest", 'The Lone Ranger', 'Man of Steel', 'The Chronicles of Narnia: Prince Caspian', 'The Avengers', 'Pirates of the Caribbean: On Stranger Tides', 'Men in Black 3', 'The Hobbit: The Battle of the Five Armies', 'The Amazing Spider-Man', 'Robin Hood', 'The Hobbit: The Desolation of Smaug', 'The Golden Compass', 'King Kong', 'Titanic', 'Captain America: Civil War', 'Battleship', 'Jurassic World', 'Skyfall', 'Spider-Man 2', 'Iron Man 3', 'Alice in Wonderland', 'X-Men: The Last Stand', 'Monsters University', 'Transformers: Revenge of the Fallen', 'Transformers: Age of Extinction', 'Oz: The Great and Powerful', 'The Amazing Spider-Man 2', 'TRON: Legacy', 'Cars 2', 'Green Lant

In [17]:
## Finding the closest matches for the movie name given by the user

find_close_match=difflib.get_close_matches(movie_name,list_of_all_titles)
print(find_close_match)

['Hero', 'Her', 'Homefront']
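
`difflib.get_close_matches` also accepts `n` (maximum number of matches, default 3) and `cutoff` (minimum similarity ratio, default 0.6), and it returns an empty list when no title is close enough. The sketch below wraps it in a hypothetical helper, `closest_title`, that returns `None` in that case; the short title list is made up for illustration.

```python
import difflib


def closest_title(query, titles, cutoff=0.6):
    """Return the best-matching title, or None when nothing is close enough."""
    matches = difflib.get_close_matches(query, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None


toy_titles = ["Hero", "Her", "Homefront", "Avatar"]

print(closest_title("Hero", toy_titles))     # exact title present in the list
print(closest_title("zzzzqqq", toy_titles))  # shares no characters with any title
```

Raising `cutoff` makes the matching stricter, which may be preferable when a wrong guess is worse than asking the user to retype the name.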


In [18]:
## Taking the best match; get_close_matches returns an empty list when no title is close enough
close_match=find_close_match[0]
print(close_match)

Hero


In [19]:
## Finding the index of the movie with title

index_of_the_movie=movies_data[movies_data.title==close_match]["index"].values[0]
print(index_of_the_movie)

1136


In [20]:
## Getting a list of similar movies
similarity_score = list(enumerate(similarity[index_of_the_movie])) 
## enumerate pairs each similarity score with its movie index
print(similarity_score)

[(0, 0.017319334745642515), (1, 0.03796711849162292), (2, 0.021169547347084917), (3, 0.015331240430517389), (4, 0.023707552360197514), (5, 0.0433053371002934), (6, 0.030260521535577477), (7, 0.03825103449492968), (8, 0.012185000227941261), (9, 0.03846175811336424), (10, 0.025808079836984223), (11, 0.02884506822528948), (12, 0.01122142943418707), (13, 0.03422123544737952), (14, 0.04501015907941584), (15, 0.022438403217924258), (16, 0.0187402696070213), (17, 0.02410363426775501), (18, 0.04070729737875524), (19, 0.03771135523058826), (20, 0.04252655840633091), (21, 0.026333685097964174), (22, 0.01792559026907726), (23, 0.02092727870092044), (24, 0.02150726584018391), (25, 0.011177156130501562), (26, 0.0279494243559782), (27, 0.03483841115384523), (28, 0.009128336175290682), (29, 0.022153095401964813), (30, 0.03230379008772611), (31, 0.030906819537473963), (32, 0.02193596070578629), (33, 0.027545179759580134), (34, 0.0038483858982404687), (35, 0.011573945183241934), (36, 0.0405277805601405

In [21]:
len(similarity_score)

4803

In [22]:
## Sorting the movies based on their similarity score

sorted_similar_movies = sorted(similarity_score, key=lambda x:x[1], reverse =True)
print(sorted_similar_movies)

[(1136, 1.0000000000000002), (2896, 0.2091899659920342), (1304, 0.20687622049601648), (2884, 0.19155416785810636), (2863, 0.18669273617273818), (1357, 0.15917350019214827), (317, 0.15394126802833277), (3300, 0.14646035078950984), (3892, 0.14607449343377948), (2013, 0.14465124759281636), (2592, 0.14370136521788107), (1095, 0.13946619303195565), (1298, 0.132455499924517), (2515, 0.11733125915315012), (404, 0.10990066686742411), (2644, 0.10366520206061974), (1868, 0.09424015539187687), (320, 0.09268121763683652), (836, 0.09017404920353568), (1002, 0.0895332211190987), (627, 0.08947880063414848), (698, 0.08933907806127073), (71, 0.08696609636454138), (681, 0.08157121062078328), (2910, 0.07835644325163364), (2717, 0.07725774080504612), (2066, 0.0764416623969134), (345, 0.07626099948788087), (4441, 0.07561745747204336), (448, 0.07505904089701237), (1837, 0.07370579246835013), (1284, 0.07320063233755421), (274, 0.07249878613656571), (4009, 0.07123040842888018), (2300, 0.06933888669872723), (4
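
Sorting all 4803 pairs works, but only the top few are ever shown. As an alternative sketch, `heapq.nlargest` from the standard library picks the best `k` pairs directly. The scores below are made up, and skipping the chosen movie itself (which always scores about 1.0 and therefore comes first, as in the output above) is an optional design choice, not something the notebook does.

```python
import heapq

# Hypothetical similarity row for the chosen movie at index 1.
toy_scores = [0.10, 1.00, 0.35, 0.05, 0.20, 0.80]
query_index = 1

# Pair each score with its index and leave out the chosen movie itself.
candidates = [(i, s) for i, s in enumerate(toy_scores) if i != query_index]

# Keep the three highest-scoring pairs, best first.
top_three = heapq.nlargest(3, candidates, key=lambda pair: pair[1])
print(top_three)
```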

In [23]:
## Print the names of the 29 most similar movies based on the index
## (the first entry is the chosen movie itself)

print("Movies suggested for you: \n")

for i, movie in enumerate(sorted_similar_movies[:29], start=1):
    index=movie[0]
    title_from_index= movies_data[movies_data.index==index]["title"].values[0]
    print(i,'.',title_from_index)

Movies suggested for you: 

1 . Hero
2 . A Woman, a Gun and a Noodle Shop
3 . The Grandmaster
4 . 2046
5 . House of Flying Daggers
6 . Ip Man 3
7 . The Flowers of War
8 . My Lucky Star
9 . Coming Home
10 . Bodyguards and Assassins
11 . Highlander: Endgame
12 . Curse of the Golden Flower
13 . Red Cliff
14 . Crouching Tiger, Hidden Dragon
15 . Memoirs of a Geisha
16 . Ong Bak 2
17 . Cradle 2 the Grave
18 . Black Hawk Down
19 . The Forbidden Kingdom
20 . The One
21 . The Last Legion
22 . The Prince of Egypt
23 . The Mummy: Tomb of the Dragon Emperor
24 . The American President
25 . A Tale of Three Cities
26 . Brokeback Mountain
27 . Firefox
28 . Rush Hour 2
29 . Bambi


#### MOVIE RECOMMENDATION SYSTEM:
Combining all of the code above into a single cell

In [25]:
movie_name=input("Enter your favourite movie name:")

list_of_all_titles=movies_data["title"].tolist()

find_close_match=difflib.get_close_matches(movie_name,list_of_all_titles)

close_match=find_close_match[0]

index_of_the_movie=movies_data[movies_data.title==close_match]["index"].values[0]

similarity_score = list(enumerate(similarity[index_of_the_movie])) 


sorted_similar_movies = sorted(similarity_score, key=lambda x:x[1], reverse =True)

print("Movies suggested for you: \n")

## Print the 20 most similar movies (the first entry is the matched movie itself)
for i, movie in enumerate(sorted_similar_movies[:20], start=1):
    index=movie[0]
    title_from_index= movies_data[movies_data.index==index]["title"].values[0]
    print(i,'.',title_from_index)



Enter your favourite movie name: The Package


Movies suggested for you: 

1 . The Jacket
2 . The Dead Zone
3 . Synecdoche, New York
4 . The Last Time I Committed Suicide
5 . Limbo
6 . Anna Karenina
7 . Tom Jones
8 . Payback
9 . Friday the 13th: A New Beginning
10 . Seeking a Friend for the End of the World
11 . Last Orders
12 . The Battle of Shaker Heights
13 . For Love of the Game
14 . Blade: Trinity
15 . Bringing Out the Dead
16 . Dreamer: Inspired By a True Story
17 . The R.M.
18 . Enter the Void
19 . Road House
20 . Winter's Tale
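
The same steps could also be wrapped in a reusable function. The sketch below is one possible shape, not the notebook's own code: the name `recommend`, the `top_n` parameter, and the empty-list return for an unmatched name are assumptions, and it assumes the rows of `movies_data` are in the same order as the rows of `similarity`. It is exercised here on a made-up three-movie frame.

```python
import difflib

import numpy as np
import pandas as pd


def recommend(movie_name, movies_data, similarity, top_n=20):
    """Return up to top_n titles most similar to the closest title match."""
    titles = movies_data["title"].tolist()
    matches = difflib.get_close_matches(movie_name, titles)
    if not matches:
        return []  # nothing close enough to the input
    # Row position of the first movie carrying the matched title.
    position = titles.index(matches[0])
    scores = np.asarray(similarity[position])
    # A stable sort on the negated scores gives descending order.
    order = np.argsort(-scores, kind="stable")[:top_n]
    return [titles[i] for i in order]


# Hypothetical data for a quick check.
toy_movies = pd.DataFrame({"title": ["Hero", "Her", "Avatar"]})
toy_similarity = np.array([
    [1.0, 0.2, 0.6],
    [0.2, 1.0, 0.1],
    [0.6, 0.1, 1.0],
])

print(recommend("Hero", toy_movies, toy_similarity, top_n=2))
print(recommend("zzzzqqq", toy_movies, toy_similarity))
```

As in the cell above, the first returned title is the matched movie itself, which also shows the user which title their input was matched to.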
