---
<center><h1>Movie Recommendation System</h1></center>
<center><h3>Part of 30 Days 30 ML Projects Challenge</h3></center>

---

## 1) Understanding Problem Statement
---

In today's era of digital entertainment, the vast array of available movies and TV shows can overwhelm viewers when choosing what to watch. This project aims to tackle this issue through the development of a movie recommendation system, leveraging the power of data science and machine learning.

The problem can be classified as a **recommendation system problem**. The primary goal is **to build a model that suggests movies similar to one the user already likes**. Recommenders are commonly built with **Collaborative Filtering** (learning from user ratings and viewing history), **Content-Based Filtering** (comparing the attributes of the items themselves), or a hybrid of the two. The dataset used here has no per-user ratings or viewing history, so this notebook takes the content-based route: each movie is described by its genres, keywords, tagline, cast, and director, and the movies whose descriptions are most similar are recommended.

By employing advanced recommendation algorithms and data analysis, this project seeks to simplify the decision-making process for viewers, enriching their entertainment experience while simultaneously demonstrating the practical use of machine learning in content recommendation systems.

## 2) Understanding Data
---

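The project uses **Movies Data**, a table of 4,803 movies described by 24 columns of metadata such as genres, keywords, tagline, cast, and director. A content-based recommender compares movies with one another, so there is no outcome (dependent) variable to predict.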

## 3) Getting System Ready
---
Importing required libraries


In [34]:
import numpy as np
import pandas as pd

# for text data preprocessing
import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import difflib

# for model building
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

### Downloading stop words for text preprocessing

In [2]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\manis\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [3]:
# printing the stopwords in English
print(stopwords.words('english'))

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

## 4) Data Eyeballing
---

### Loading Data

In [4]:
movies_data = pd.read_csv('Datasets/Day18_Movies_Data.csv') 

In [5]:
movies_data

Unnamed: 0,index,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew,director
0,0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Sam Worthington Zoe Saldana Sigourney Weaver S...,"[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...",James Cameron
1,1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Johnny Depp Orlando Bloom Keira Knightley Stel...,"[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...",Gore Verbinski
2,2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Daniel Craig Christoph Waltz L\u00e9a Seydoux ...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes
3,3,250000000,Action Crime Drama Thriller,http://www.thedarkknightrises.com/,49026,dc comics crime fighter terrorist secret ident...,en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.312950,...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,Christian Bale Michael Caine Gary Oldman Anne ...,"[{'name': 'Hans Zimmer', 'gender': 2, 'departm...",Christopher Nolan
4,4,260000000,Action Adventure Science Fiction,http://movies.disney.com/john-carter,49529,based on novel mars medallion space travel pri...,en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,Taylor Kitsch Lynn Collins Samantha Morton Wil...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4798,4798,220000,Action Crime Thriller,,9367,united states\u2013mexico barrier legs arms pa...,es,El Mariachi,El Mariachi just wants to play his guitar and ...,14.269792,...,81.0,"[{""iso_639_1"": ""es"", ""name"": ""Espa\u00f1ol""}]",Released,"He didn't come looking for trouble, but troubl...",El Mariachi,6.6,238,Carlos Gallardo Jaime de Hoyos Peter Marquardt...,"[{'name': 'Robert Rodriguez', 'gender': 0, 'de...",Robert Rodriguez
4799,4799,9000,Comedy Romance,,72766,,en,Newlyweds,A newlywed couple's honeymoon is upended by th...,0.642552,...,85.0,[],Released,A newlywed couple's honeymoon is upended by th...,Newlyweds,5.9,5,Edward Burns Kerry Bish\u00e9 Marsha Dietlein ...,"[{'name': 'Edward Burns', 'gender': 2, 'depart...",Edward Burns
4800,4800,0,Comedy Drama Romance TV Movie,http://www.hallmarkchannel.com/signedsealeddel...,231617,date love at first sight narration investigati...,en,"Signed, Sealed, Delivered","""Signed, Sealed, Delivered"" introduces a dedic...",1.444476,...,120.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,"Signed, Sealed, Delivered",7.0,6,Eric Mabius Kristin Booth Crystal Lowe Geoff G...,"[{'name': 'Carla Hetland', 'gender': 0, 'depar...",Scott Smith
4801,4801,0,,http://shanghaicalling.com/,126186,,en,Shanghai Calling,When ambitious New York attorney Sam is sent t...,0.857008,...,98.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,A New Yorker in Shanghai,Shanghai Calling,5.7,7,Daniel Henney Eliza Coupe Bill Paxton Alan Ruc...,"[{'name': 'Daniel Hsia', 'gender': 2, 'departm...",Daniel Hsia


In [6]:
print('The size of Dataframe is: ', movies_data.shape)
print('-'*100)
print('The Column Name, Record Count and Data Types are as follows: ')
movies_data.info()
print('-'*100)

The size of Dataframe is:  (4803, 24)
----------------------------------------------------------------------------------------------------
The Column Name, Record Count and Data Types are as follows: 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 24 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   index                 4803 non-null   int64  
 1   budget                4803 non-null   int64  
 2   genres                4775 non-null   object 
 3   homepage              1712 non-null   object 
 4   id                    4803 non-null   int64  
 5   keywords              4391 non-null   object 
 6   original_language     4803 non-null   object 
 7   original_title        4803 non-null   object 
 8   overview              4800 non-null   object 
 9   popularity            4803 non-null   float64
 10  production_companies  4803 non-null   object 
 11  production_countries  48

In [7]:
# Defining numerical & categorical columns
numeric_features = [feature for feature in movies_data.columns if movies_data[feature].dtype != 'O']
categorical_features = [feature for feature in movies_data.columns if movies_data[feature].dtype == 'O']

# print columns
print('We have {} numerical features : {}'.format(len(numeric_features), numeric_features))
print('\nWe have {} categorical features : {}'.format(len(categorical_features), categorical_features))

We have 8 numerical features : ['index', 'budget', 'id', 'popularity', 'revenue', 'runtime', 'vote_average', 'vote_count']

We have 16 categorical features : ['genres', 'homepage', 'keywords', 'original_language', 'original_title', 'overview', 'production_companies', 'production_countries', 'release_date', 'spoken_languages', 'status', 'tagline', 'title', 'cast', 'crew', 'director']


In [8]:
print('Missing Value Presence in different columns of DataFrame are as follows : ')
print('-'*100)
total=movies_data.isnull().sum().sort_values(ascending=False)
percent=(movies_data.isnull().sum()/movies_data.isnull().count()*100).sort_values(ascending=False)
pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])

Missing Value Presence in different columns of DataFrame are as follows : 
----------------------------------------------------------------------------------------------------


| | Total | Percent |
|---|---|---|
| homepage | 3091 | 64.355611 |
| tagline | 844 | 17.572351 |
| keywords | 412 | 8.577972 |
| cast | 43 | 0.895274 |
| director | 30 | 0.62461 |
| genres | 28 | 0.582969 |
| overview | 3 | 0.062461 |
| runtime | 2 | 0.041641 |
| release_date | 1 | 0.02082 |
| popularity | 0 | 0.0 |


In [10]:
print('Summary Statistics of numerical features for DataFrame are as follows:')
print('-'*100)
movies_data.describe()

Summary Statistics of numerical features for DataFrame are as follows:
----------------------------------------------------------------------------------------------------


| | index | budget | id | popularity | revenue | runtime | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|
| count | 4803.0 | 4803.0 | 4803.0 | 4803.0 | 4803.0 | 4801.0 | 4803.0 | 4803.0 |
| mean | 2401.0 | 29045040.0 | 57165.484281 | 21.492301 | 82260640.0 | 106.875859 | 6.092172 | 690.217989 |
| std | 1386.651002 | 40722390.0 | 88694.614033 | 31.81665 | 162857100.0 | 22.611935 | 1.194612 | 1234.585891 |
| min | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 1200.5 | 790000.0 | 9014.5 | 4.66807 | 0.0 | 94.0 | 5.6 | 54.0 |
| 50% | 2401.0 | 15000000.0 | 14629.0 | 12.921594 | 19170000.0 | 103.0 | 6.2 | 235.0 |
| 75% | 3601.5 | 40000000.0 | 58610.5 | 28.313505 | 92917190.0 | 118.0 | 6.8 | 737.0 |
| max | 4802.0 | 380000000.0 | 459488.0 | 875.581305 | 2787965000.0 | 338.0 | 10.0 | 13752.0 |


In [9]:
print('Summary Statistics of categorical features for DataFrame are as follows:')
print('-'*100)
movies_data.describe(include='object')

Summary Statistics of categorical features for DataFrame are as follows:
----------------------------------------------------------------------------------------------------


| | genres | homepage | keywords | original_language | original_title | overview | production_companies | production_countries | release_date | spoken_languages | status | tagline | title | cast | crew | director |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4775 | 1712 | 4391 | 4803 | 4803 | 4800 | 4803 | 4803 | 4802 | 4803 | 4803 | 3959 | 4803 | 4760 | 4803 | 4773 |
| unique | 1168 | 1691 | 4219 | 37 | 4801 | 4800 | 3697 | 469 | 3280 | 544 | 3 | 3944 | 4800 | 4741 | 4776 | 2349 |
| top | Drama | http://www.missionimpossible.com/ | independent film | en | Out of the Blue | In the 22nd century, a paraplegic Marine is di... | [] | [{"iso_3166_1": "US", "name": "United States o... | 2006-01-01 | [{"iso_639_1": "en", "name": "English"}] | Released | Based on a true story. | The Host | William Shatner Leonard Nimoy DeForest Kelley ... | [] | Steven Spielberg |
| freq | 370 | 4 | 55 | 4505 | 2 | 1 | 351 | 2977 | 10 | 3171 | 4795 | 3 | 2 | 6 | 28 | 27 |


## 5) Data Cleaning and Preprocessing
---

### Selecting the relevant features for recommendation

In [11]:
selected_features = ['genres','keywords','tagline','cast','director']
selected_features

['genres', 'keywords', 'tagline', 'cast', 'director']

### Replace the null values with an empty string in selected features

In [14]:
for feature in selected_features:
    movies_data[feature] = movies_data[feature].fillna('')
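
The empty-string fill matters because the next step concatenates the five text columns, and in pandas a missing value on either side of `+` makes the whole result missing. A small sketch on toy data (the frame `toy` and its values are made up for illustration, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# toy frame, not the real dataset: the second movie has no tagline
toy = pd.DataFrame({
    'genres': ['Action', 'Comedy'],
    'tagline': ['The Legend Ends', np.nan],
})

# without filling, the row with the missing tagline becomes NaN as a whole
unfilled = toy['genres'] + ' ' + toy['tagline']

# after filling, every row stays a usable string
filled = toy['genres'] + ' ' + toy['tagline'].fillna('')

print(unfilled.isna().tolist())
print(filled.tolist())
```

Without the fill, any movie missing even one of the five fields would lose its entire combined description.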

In [15]:
print('Missing Value Presence in different columns of DataFrame are as follows : ')
print('-'*100)
total=movies_data.isnull().sum().sort_values(ascending=False)
percent=(movies_data.isnull().sum()/movies_data.isnull().count()*100).sort_values(ascending=False)
pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])

Missing Value Presence in different columns of DataFrame are as follows : 
----------------------------------------------------------------------------------------------------


Unnamed: 0,Total,Percent
homepage,3091,64.355611
overview,3,0.062461
runtime,2,0.041641
release_date,1,0.02082
revenue,0,0.0
crew,0,0.0
cast,0,0.0
vote_count,0,0.0
vote_average,0,0.0
title,0,0.0


### Combining all the 5 selected features

In [16]:
combined_features = movies_data['genres']+' '+movies_data['keywords']+' '+movies_data['tagline']+' '+movies_data['cast']+' '+movies_data['director']

In [17]:
combined_features

0       Action Adventure Fantasy Science Fiction cultu...
1       Adventure Fantasy Action ocean drug abuse exot...
2       Action Adventure Crime spy based on novel secr...
3       Action Crime Drama Thriller dc comics crime fi...
4       Action Adventure Science Fiction based on nove...
                              ...                        
4798    Action Crime Thriller united states\u2013mexic...
4799    Comedy Romance  A newlywed couple's honeymoon ...
4800    Comedy Drama Romance TV Movie date love at fir...
4801      A New Yorker in Shanghai Daniel Henney Eliza...
4802    Documentary obsession camcorder crush dream gi...
Length: 4803, dtype: object

### Stemming

In [18]:
porter_stemmer = PorterStemmer()

In [19]:
# build the stop-word set once; set lookups avoid re-reading the list for every word
english_stopwords = set(stopwords.words('english'))

def stemming(content):
    # keep letters only, lowercase the text, and split it into tokens
    tokens = re.sub('[^a-zA-Z]', ' ', content).lower().split()
    # drop stop words and reduce the remaining tokens to their stems
    stemmed_tokens = [porter_stemmer.stem(word) for word in tokens if word not in english_stopwords]
    return ' '.join(stemmed_tokens)

In [20]:
combined_features = combined_features.apply(stemming)

In [21]:
combined_features

0       action adventur fantasi scienc fiction cultur ...
1       adventur fantasi action ocean drug abus exot i...
2       action adventur crime spi base novel secret ag...
3       action crime drama thriller dc comic crime fig...
4       action adventur scienc fiction base novel mar ...
                              ...                        
4798    action crime thriller unit state u mexico barr...
4799    comedi romanc newlyw coupl honeymoon upend arr...
4800    comedi drama romanc tv movi date love first si...
4801    new yorker shanghai daniel henney eliza coup b...
4802    documentari obsess camcord crush dream girl dr...
Length: 4803, dtype: object
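
The stray `u` in row 4798 (`unit state u mexico`) appears to come from the cleaning regex rather than from the stemmer. The keywords column seems to hold the literal six-character text `\u2013` (an escaped en dash), and `[^a-zA-Z]` replaces the backslash and the digits with spaces while keeping the letter `u`. A standard-library sketch, assuming that literal text is what the column contains:

```python
import re

# raw string: the backslash sequence stays literal text, as in the data preview
raw_keywords = r'united states\u2013mexico barrier'

# same cleaning steps as in stemming(), without the stop-word filter and the stemmer
tokens = re.sub('[^a-zA-Z]', ' ', raw_keywords).lower().split()
print(tokens)  # ['united', 'states', 'u', 'mexico', 'barrier']
```

If such single-letter tokens are unwanted, decoding the escapes before cleaning or dropping one-character tokens would be options to consider.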

## 6) Model Building
---

### Feature Extraction

#### Transform the text data to feature vectors that can be used as input to cosine similarity

In [22]:
vectorizer = TfidfVectorizer()

In [23]:
vectorizer.fit(combined_features)

combined_features = vectorizer.transform(combined_features)

In [24]:
combined_features

<4803x15165 sparse matrix of type '<class 'numpy.float64'>'
	with 110399 stored elements in Compressed Sparse Row format>
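
For reference, with its default settings (`smooth_idf=True`, `norm='l2'`, raw term counts), scikit-learn's `TfidfVectorizer` weights term $t$ in document $d$ as shown below, where $n$ is the number of documents (4803 here) and $\mathrm{df}(t)$ is the number of documents that contain $t$:

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1, \qquad
w_{t,d} = \mathrm{tf}(t, d)\,\mathrm{idf}(t), \qquad
\hat{w}_{t,d} = \frac{w_{t,d}}{\sqrt{\sum_{t'} w_{t',d}^{2}}}
$$

Each row therefore has unit length. The matrix is also very sparse: 110399 stored values in a 4803 x 15165 grid is about 0.15% of the cells, or roughly 23 distinct terms per movie.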

In [25]:
print(combined_features)

  (0, 15139)	0.2052522310553095
  (0, 14871)	0.24026470377934847
  (0, 14863)	0.12875501676155404
  (0, 14587)	0.20164827377007877
  (0, 14523)	0.12740774864377094
  (0, 13033)	0.15213516971481178
  (0, 12829)	0.34513124291779446
  (0, 12728)	0.21738843062624963
  (0, 12549)	0.20929853505545373
  (0, 12131)	0.1053209249350964
  (0, 11900)	0.15264686191548815
  (0, 11874)	0.22127568137478468
  (0, 11596)	0.19738383622085057
  (0, 10240)	0.27652273888326423
  (0, 9066)	0.15239002415366618
  (0, 7728)	0.2052522310553095
  (0, 6793)	0.11463157223101918
  (0, 5083)	0.16734367774450565
  (0, 4735)	0.10471198725789553
  (0, 4580)	0.11273156644207886
  (0, 4305)	0.24415195452788352
  (0, 3161)	0.21560402385508065
  (0, 2782)	0.24855895595249378
  (0, 2641)	0.22568268279939493
  (0, 2071)	0.17552313506858697
  :	:
  (4801, 10378)	0.22124713594289835
  (4801, 9715)	0.1385948353997998
  (4801, 6372)	0.31053778940920307
  (4801, 6077)	0.2962036112137938
  (4801, 4209)	0.2536402429374028
  (4801, 3

### Cosine Similarity

#### Getting the similarity scores using cosine similarity

In [28]:
similarity = cosine_similarity(combined_features)

In [29]:
print(similarity)

[[1.         0.04757067 0.04209981 ... 0.         0.         0.        ]
 [0.04757067 1.         0.03791066 ... 0.01223813 0.         0.        ]
 [0.04209981 0.03791066 1.         ... 0.         0.0520925  0.        ]
 ...
 [0.         0.01223813 0.         ... 1.         0.         0.02762031]
 [0.         0.         0.0520925  ... 0.         1.         0.        ]
 [0.         0.         0.         ... 0.02762031 0.         1.        ]]


In [30]:
similarity.shape

(4803, 4803)
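
Cosine similarity between two vectors is their dot product divided by the product of their lengths. A minimal NumPy sketch on made-up vectors (the helper name `cosine_sim_matrix` is hypothetical and not part of scikit-learn):

```python
import numpy as np

def cosine_sim_matrix(X):
    """Pairwise cosine similarity between the rows of a dense 2-D array."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms          # assumes there are no all-zero rows
    return unit @ unit.T

# three toy "documents" in a three-term vocabulary
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])
S = cosine_sim_matrix(X)
print(np.round(S, 3))
```

The result is symmetric with ones on the diagonal, which matches the printed matrix above. Floating-point rounding explains values such as the 1.0000000000000002 that shows up in Step 6. The dense 4803 x 4803 `float64` result occupies roughly 185 MB, which is worth keeping in mind for larger catalogues.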

### Movie Recommendation Systems Sub-Steps

#### `Step 1` Getting Movie name from the User

In [31]:
movie_name = input(' Enter your favourite movie name : ')

 Enter your favourite movie name : batman


#### `Step 2` Creating a list with all the movie names given in the dataset

In [32]:
list_of_all_titles = movies_data['title'].tolist()
print(list_of_all_titles)

['Avatar', "Pirates of the Caribbean: At World's End", 'Spectre', 'The Dark Knight Rises', 'John Carter', 'Spider-Man 3', 'Tangled', 'Avengers: Age of Ultron', 'Harry Potter and the Half-Blood Prince', 'Batman v Superman: Dawn of Justice', 'Superman Returns', 'Quantum of Solace', "Pirates of the Caribbean: Dead Man's Chest", 'The Lone Ranger', 'Man of Steel', 'The Chronicles of Narnia: Prince Caspian', 'The Avengers', 'Pirates of the Caribbean: On Stranger Tides', 'Men in Black 3', 'The Hobbit: The Battle of the Five Armies', 'The Amazing Spider-Man', 'Robin Hood', 'The Hobbit: The Desolation of Smaug', 'The Golden Compass', 'King Kong', 'Titanic', 'Captain America: Civil War', 'Battleship', 'Jurassic World', 'Skyfall', 'Spider-Man 2', 'Iron Man 3', 'Alice in Wonderland', 'X-Men: The Last Stand', 'Monsters University', 'Transformers: Revenge of the Fallen', 'Transformers: Age of Extinction', 'Oz: The Great and Powerful', 'The Amazing Spider-Man 2', 'TRON: Legacy', 'Cars 2', 'Green Lant

#### `Step 3` Finding the close match for the movie name given by the user

In [35]:
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)
print(find_close_match)

['Batman', 'Batman', 'Catwoman']


In [36]:
close_match = find_close_match[0]
print(close_match)

Batman
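
`difflib.get_close_matches` is case-sensitive, returns at most `n=3` candidates by default, and drops anything scoring below `cutoff=0.6`. An unusual query can therefore produce an empty list, in which case `find_close_match[0]` raises an `IndexError`. One possible guard, sketched with a hypothetical helper `resolve_title` that compares lowercased titles:

```python
import difflib

def resolve_title(query, titles, cutoff=0.6):
    """Return the best fuzzy match for query, or None when nothing is close enough."""
    # map lowercased titles back to their original spelling (first occurrence wins)
    lowered = {}
    for title in titles:
        lowered.setdefault(title.lower(), title)
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

# toy title list for illustration
titles = ['Batman', 'Batman Returns', 'Catwoman', 'Avatar']
print(resolve_title('batmn', titles))   # Batman
print(resolve_title('zzzz', titles))    # None
```

Returning `None` lets the caller ask the user for another title instead of crashing.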


#### `Step 4` Finding the index of the movie with title

In [37]:
index_of_the_movie = movies_data[movies_data.title == close_match]['index'].values[0]
print(index_of_the_movie)

1359
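
The close-match list in Step 3 contains 'Batman' twice, which suggests that two movies share that title; `.values[0]` then picks whichever row comes first. A small pandas sketch on a toy frame (values invented for illustration) for spotting such collisions:

```python
import pandas as pd

# toy frame, not the real dataset: two rows share the same title
movies = pd.DataFrame({
    'index': [0, 1, 2],
    'title': ['Batman', 'Avatar', 'Batman'],
})

# every row index that carries the requested title
matching_indices = movies.loc[movies['title'] == 'Batman', 'index'].tolist()
print(matching_indices)  # [0, 2]

# all rows whose title occurs more than once
duplicates = movies[movies['title'].duplicated(keep=False)]
print(duplicates)
```

When several rows match, showing a distinguishing column such as `release_date` and letting the user choose is one way to resolve the ambiguity.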


#### `Step 5` Getting a list of similar movies

In [38]:
similarity_score = list(enumerate(similarity[index_of_the_movie]))
print(similarity_score)

[(0, 0.01776730333487835), (1, 0.02983612020964815), (2, 0.015861499653293514), (3, 0.19936242756789457), (4, 0.006005551433645722), (5, 0.11428458617173043), (6, 0.0), (7, 0.06320396527748612), (8, 0.03385015088184331), (9, 0.11139421613005376), (10, 0.0972421271936699), (11, 0.014218909277605905), (12, 0.04117439643481078), (13, 0.00560458446020678), (14, 0.13831562847592846), (15, 0.011949220902370307), (16, 0.0604308118487812), (17, 0.017661597930726327), (18, 0.026917804038365113), (19, 0.015358236300210345), (20, 0.045420770955566035), (21, 0.005287601681307618), (22, 0.010864775236470952), (23, 0.011107703594273457), (24, 0.027694690404716796), (25, 0.0), (26, 0.03569774834190745), (27, 0.00550492163484813), (28, 0.005935451231148693), (29, 0.006480313974018936), (30, 0.14641371764120226), (31, 0.03410054712821779), (32, 0.08311590658982682), (33, 0.06228171273932271), (34, 0.0), (35, 0.028052331744911542), (36, 0.04921248397028206), (37, 0.012773263326354003), (38, 0.0780311209

In [39]:
len(similarity_score)

4803

#### `Step 6` Sorting the movies based on their similarity score

In [40]:
sorted_similar_movies = sorted(similarity_score, key = lambda x:x[1], reverse = True) 
print(sorted_similar_movies)

[(1359, 1.0000000000000002), (428, 0.4336702001378905), (210, 0.27144255418299584), (3, 0.19936242756789457), (119, 0.1910946925930589), (65, 0.1874264446719564), (1512, 0.1564422050047655), (30, 0.14641371764120226), (2530, 0.14299844640540912), (473, 0.14044540091812419), (14, 0.13831562847592846), (813, 0.13106465838406184), (2313, 0.13035765836260912), (753, 0.12985239112430344), (278, 0.1270836210943836), (2655, 0.12316939595149462), (163, 0.12172176891669018), (72, 0.12084840021058728), (438, 0.11645088380845323), (870, 0.11481456763319714), (1740, 0.11454985125597252), (5, 0.11428458617173043), (9, 0.11139421613005376), (1035, 0.11104004728933728), (1720, 0.11088959105313176), (1017, 0.10977773006970801), (2805, 0.10620074093664161), (1474, 0.10592336182832664), (1803, 0.10548693203812876), (2858, 0.10361292404647203), (1076, 0.1015052276330557), (2753, 0.1007058355405392), (41, 0.09973462960973993), (174, 0.09969517081232442), (1241, 0.09945081977236112), (2381, 0.0990916465202
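
Only the top few entries are needed, and the first one is always the queried movie itself (similarity 1). As an alternative to sorting all 4803 pairs, `heapq.nlargest` can pull the best `n` directly while skipping the query row. A sketch with made-up scores; `top_n_similar` is a hypothetical helper:

```python
import heapq

def top_n_similar(scores, query_index, n=5):
    """Return the n highest (index, score) pairs, excluding the query itself."""
    candidates = ((i, s) for i, s in enumerate(scores) if i != query_index)
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])

# toy similarity row for the movie at position 1
scores = [0.02, 1.0, 0.43, 0.0, 0.27]
print(top_n_similar(scores, query_index=1, n=2))  # [(2, 0.43), (4, 0.27)]
```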

#### `Step 7` Print the name of similar movies based on the index

In [41]:
print('Movies suggested for you : \n')

# the first entry is the queried movie itself, so it appears as item 1
for i, (index, _) in enumerate(sorted_similar_movies[:29], start=1):
    title_from_index = movies_data.iloc[index]['title']
    print(i, '.', title_from_index)

Movies suggested for you : 

1 . Batman
2 . Batman Returns
3 . Batman & Robin
4 . The Dark Knight Rises
5 . Batman Begins
6 . The Dark Knight
7 . A History of Violence
8 . Spider-Man 2
9 . Beetlejuice
10 . Mars Attacks!
11 . Man of Steel
12 . Superman
13 . The Mask
14 . The Sentinel
15 . Planet of the Apes
16 . Dungeons & Dragons: Wrath of the Dragon God
17 . Watchmen
18 . Suicide Squad
19 . Something's Gotta Give
20 . Superman II
21 . Kick-Ass 2
22 . Spider-Man 3
23 . Batman v Superman: Dawn of Justice
24 . Jonah Hex
25 . Kick-Ass
26 . Bedazzled
27 . The Land Before Time
28 . I Dreamed of Africa
29 . Blood and Wine


## 7) Movie Recommendation System Demonstration
---

In [42]:
movie_name = input(' Enter your favourite movie name : ')

list_of_all_titles = movies_data['title'].tolist()

find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)

close_match = find_close_match[0]

index_of_the_movie = movies_data[movies_data.title == close_match]['index'].values[0]

similarity_score = list(enumerate(similarity[index_of_the_movie]))

sorted_similar_movies = sorted(similarity_score, key = lambda x:x[1], reverse = True) 

print('Movies suggested for you : \n')

# the first entry is the queried movie itself, so it appears as item 1
for i, (index, _) in enumerate(sorted_similar_movies[:29], start=1):
    title_from_index = movies_data.iloc[index]['title']
    print(i, '.', title_from_index)

 Enter your favourite movie name : Iron man
Movies suggested for you : 

1 . Iron Man
2 . Iron Man 2
3 . Iron Man 3
4 . Avengers: Age of Ultron
5 . The Avengers
6 . Captain America: Civil War
7 . Captain America: The Winter Soldier
8 . Ant-Man
9 . X-Men
10 . Made
11 . X2
12 . X-Men: Apocalypse
13 . Deadpool
14 . The Incredible Hulk
15 . X-Men: First Class
16 . The Helix... Loaded
17 . Kick-Ass 2
18 . Thor: The Dark World
19 . X-Men: The Last Stand
20 . X-Men: Days of Future Past
21 . Man of Steel
22 . Duets
23 . Mortdecai
24 . Captain America: The First Avenger
25 . Superman II
26 . The Last Airbender
27 . Southland Tales
28 . Guardians of the Galaxy
29 . The Good Night
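
The demonstration cell can also be wrapped in a function so that it is reusable and does not list the queried movie among its own recommendations. A minimal sketch on a toy similarity matrix; the function name `recommend`, its parameters, and the numbers are assumptions made for illustration:

```python
import difflib
import numpy as np

def recommend(movie_name, titles, similarity, top_n=5):
    """Return up to top_n titles most similar to the closest match of movie_name."""
    matches = difflib.get_close_matches(movie_name, titles, n=1)
    if not matches:
        return []                      # nothing close enough to the query
    idx = titles.index(matches[0])     # first row carrying that title
    order = np.argsort(similarity[idx])[::-1]   # highest score first
    return [titles[i] for i in order if i != idx][:top_n]

# toy data: four titles and a made-up symmetric similarity matrix
titles = ['Iron Man', 'Iron Man 2', 'Duets', 'Avatar']
similarity = np.array([[1.0, 0.6, 0.1, 0.2],
                       [0.6, 1.0, 0.0, 0.1],
                       [0.1, 0.0, 1.0, 0.3],
                       [0.2, 0.1, 0.3, 1.0]])
print(recommend('iron man', titles, similarity, top_n=2))  # ['Iron Man 2', 'Avatar']
```

With the real data, `titles` would be `movies_data['title'].tolist()` and `similarity` the matrix computed in section 6.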
