# **Movie Recommendation System**

A **recommender system** seeks to predict or filter items according to a user's preferences. Recommender systems are used in a variety of areas, including movies, music, news, books, research articles, search queries, social tags, and products in general. They typically produce a list of recommendations in one of two ways:

**Collaborative filtering:** Collaborative filtering approaches build a model from the user's past behavior (i.e., items purchased or searched for by the user) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may be interested in.

**Content-based filtering:** Content-based filtering approaches use a set of discrete characteristics of an item to recommend additional items with similar properties. These methods rely on a description of the item and a profile of the user's preferences, and they recommend items based on what the user liked in the past.
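For contrast with the content-based approach used in the rest of this notebook, here is a minimal sketch of user-based collaborative filtering. The rating matrix, the convention that `0` means "not rated", and the function name `predict_rating` are all assumptions made for illustration; they are not part of the dataset used below.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],  # user 0 has not rated item 2
    [4.0, 5.0, 5.0, 1.0],  # user 1
    [1.0, 1.0, 2.0, 5.0],  # user 2
])


def predict_rating(ratings, user, item):
    """Predict a rating as the similarity-weighted mean of other users' ratings."""
    norms = np.linalg.norm(ratings, axis=1)
    # Cosine similarity between the target user and every user.
    sims = ratings @ ratings[user] / (norms * norms[user])
    sims[user] = 0.0  # do not let the user vote for themselves
    rated = ratings[:, item] > 0  # only users who actually rated the item
    weights = sims * rated
    if weights.sum() == 0:
        return 0.0  # nobody similar has rated this item
    return float(weights @ ratings[:, item] / weights.sum())


pred = predict_rating(ratings, user=0, item=2)
print(round(pred, 2))
```

User 0 rates items much like user 1 does, so the prediction is pulled toward user 1's rating of 5 rather than user 2's rating of 2. Treating unrated items as zeros inside the similarity is a simplification kept here for brevity.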

Let's develop a basic recommendation system using Python and pandas that suggests the items most similar to a particular item, in this case movies. It simply reports which movies are most similar to the user's movie choice.

# **Import Library**

In [1]:
import pandas as pd

In [2]:
import numpy as np

# **Import Dataset**

In [None]:
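# `?raw=true` requests the raw CSV; the plain GitHub "blob" URL returns an HTML page that read_csv cannot parse
ms=pd.read_csv('https://github.com/parinjasani/Movie-recommendation-system/blob/main/IMDB-Movie-Dataset(2024-1951).csv?raw=true')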

In [5]:
ms.head()

Unnamed: 0,id,movie_id,movie_name,year,genre,overview,director,cast
0,0,15354916,Jawan,2023,"Action, Thriller",A high-octane action thriller which outlines t...,Atlee,"Shah Rukh Khan, Nayanthara, Vijay Sethupathi, ..."
1,1,15748830,Jaane Jaan,2023,"Crime, Drama, Mystery",A single mother and her daughter who commit a ...,Sujoy Ghosh,"Kareena Kapoor, Jaideep Ahlawat, Vijay Varma, ..."
2,2,11663228,Jailer,2023,"Action, Comedy, Crime",A retired jailer goes on a manhunt to find his...,Nelson Dilipkumar,"Rajinikanth, Mohanlal, Shivarajkumar, Jackie S..."
3,3,14993250,Rocky Aur Rani Kii Prem Kahaani,2023,"Comedy, Drama, Family",Flamboyant Punjabi Rocky and intellectual Beng...,Karan Johar,"Ranveer Singh, Alia Bhatt, Dharmendra, Shabana ..."
4,4,15732324,OMG 2,2023,"Comedy, Drama",An unhappy civilian asks the court to mandate ...,Amit Rai,"Pankaj Tripathi, Akshay Kumar, Yami Gautam, Pa..."


In [6]:
ms.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2199 entries, 0 to 2198
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          2199 non-null   int64 
 1   movie_id    2199 non-null   int64 
 2   movie_name  2199 non-null   object
 3   year        2134 non-null   object
 4   genre       2199 non-null   object
 5   overview    2199 non-null   object
 6   director    2199 non-null   object
 7   cast        2199 non-null   object
dtypes: int64(2), object(6)
memory usage: 137.6+ KB
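The `info()` output shows that `year` is stored as `object` and has 65 missing values (2199 - 2134). The source does not show why the column is non-numeric, so the sketch below only assumes that it holds year strings mixed with missing or non-numeric entries. The toy frame is hypothetical; the same two calls could be applied to `ms['year']` if a numeric year is needed later.

```python
import pandas as pd

# Hypothetical stand-in for the `year` column of `ms`.
toy = pd.DataFrame({
    "movie_name": ["A", "B", "C", "D"],
    "year": ["2023", None, "1999", "unknown"],
})

# Unparseable entries become NaN; the nullable Int64 dtype keeps years as integers.
toy["year"] = pd.to_numeric(toy["year"], errors="coerce").astype("Int64")

missing = int(toy["year"].isna().sum())
print(toy["year"].dtype, missing)
```

The recommender below does not use `year`, so this cleanup is optional for the rest of the notebook.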


In [7]:
ms.shape

(2199, 8)

In [8]:
ms.columns

Index(['id', 'movie_id', 'movie_name', 'year', 'genre', 'overview', 'director',
       'cast'],
      dtype='object')

# **Select Features**

In [None]:
movie_features = ms[[ 'genre', 'overview','director', 'cast']].fillna('')

Four existing features (genre, overview, director, and cast) are selected to recommend movies. The choice may vary from one project to another; for example, one could add vote counts, budget, or language.
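The `fillna('')` call matters because the selected columns are later joined into one string per movie, and a missing value in any column would make the whole joined string missing. A small sketch on a hypothetical two-row frame (the names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical rows; the second one has no director.
toy = pd.DataFrame({
    "genre": ["Action", "Drama"],
    "director": ["Some Director", None],
})

# Without fillna, the missing director propagates to the combined text.
raw = toy["genre"] + " " + toy["director"]

# With fillna(''), the row keeps the text from its other columns.
filled = toy.fillna("")
combined = filled["genre"] + " " + filled["director"]

print(raw.isna().tolist())
print(combined.tolist())
```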

In [None]:
movie_features.shape

(2199, 4)

In [None]:
movie_features

Unnamed: 0,genre,overview,director,cast
0,"Action, Thriller",A high-octane action thriller which outlines t...,Atlee,"Shah Rukh Khan, Nayanthara, Vijay Sethupathi, ..."
1,"Crime, Drama, Mystery",A single mother and her daughter who commit a ...,Sujoy Ghosh,"Kareena Kapoor, Jaideep Ahlawat, Vijay Varma, ..."
2,"Action, Comedy, Crime",A retired jailer goes on a manhunt to find his...,Nelson Dilipkumar,"Rajinikanth, Mohanlal, Shivarajkumar, Jackie S..."
3,"Comedy, Drama, Family",Flamboyant Punjabi Rocky and intellectual Beng...,Karan Johar,"Ranveer Singh, Alia Bhatt, Dharmendra, Shabana ..."
4,"Comedy, Drama",An unhappy civilian asks the court to mandate ...,Amit Rai,"Pankaj Tripathi, Akshay Kumar, Yami Gautam, Pa..."
...,...,...,...,...
2194,Thriller,Add a Plot,Subhash Ghai,"Shatrughan Sinha, Reena Roy, Ajit Khan, Premna..."
2195,"Drama, Musical, Romance",A renowned music teacher mentors a promising y...,Tanuja Chandra,"Lucky Ali, Simone Singh, Achint Kaur, Ehsan Khan"
2196,"Musical, Romance",When a ballroom dancer's shot at a crucial tou...,Stanley D'Costa,"Sooraj Pancholi, Isabelle Kaif, Waluscha D'Sou..."
2197,"Drama, Family, Fantasy",After the tragic deaths of his son Ajit and da...,Harmesh Malhotra,"Sunny Deol, Sridevi, Anupam Kher, Gulshan Grover"


In [None]:
x=movie_features['genre'] + ' ' + movie_features['overview'] + ' ' + movie_features['director'] + ' ' + movie_features['cast']

In [None]:
x

0       Action, Thriller A high-octane action thriller...
1       Crime, Drama, Mystery A single mother and her ...
2       Action, Comedy, Crime A retired jailer goes on...
3       Comedy, Drama, Family Flamboyant Punjabi Rocky...
4       Comedy, Drama An unhappy civilian asks the cou...
                              ...                        
2194    Thriller Add a Plot Subhash Ghai Shatrughan Si...
2195    Drama, Musical, Romance A renowned music teach...
2196    Musical, Romance When a ballroom dancer's shot...
2197    Drama, Family, Fantasy After the tragic deaths...
2198    Action, Comedy, Drama Raj is a successful lawy...
Length: 2199, dtype: object

In [None]:
x.shape

(2199,)

# **Convert Feature Text to TF-IDF Vectors**

In [None]:
!pip install numpy==1.21.2



In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer



In [None]:
tfidf = TfidfVectorizer()

In [None]:
X = tfidf.fit_transform(x)

In [None]:
X.shape

(2199, 11441)

In [None]:
print(X)

  (0, 7222)	0.18732057124632678
  (0, 2670)	0.18524953584832649
  (0, 9108)	0.244585433104368
  (0, 11002)	0.14681069131570232
  (0, 6842)	0.23547949327798678
  (0, 5452)	0.09405767855896913
  (0, 8640)	0.16167663050581524
  (0, 9142)	0.12673362239821723
  (0, 946)	0.2899771135019883
  (0, 9564)	0.19994085660291214
  (0, 4720)	0.06557063346769715
  (0, 11340)	0.2752857927474026
  (0, 8247)	0.2752857927474026
  (0, 10449)	0.05703667866328602
  (0, 9103)	0.16274239131621018
  (0, 4934)	0.07772512631378324
  (0, 11232)	0.09357348531476826
  (0, 6122)	0.1153226211878715
  (0, 7057)	0.06059844123803038
  (0, 5191)	0.15817552570488105
  (0, 3278)	0.22820854074933225
  (0, 10329)	0.16251068864342313
  (0, 7169)	0.2899771135019883
  (0, 11225)	0.1520493398826944
  (0, 7051)	0.2899771135019883
  :	:
  (2198, 9059)	0.13024414506375684
  (2198, 1228)	0.14243780146433394
  (2198, 5817)	0.20550079070454394
  (2198, 5543)	0.16124853363237318
  (2198, 5741)	0.16253753641693003
  (2198, 5467)	0.127362

# **Get Similarity Score using Cosine Similarity**

`cosine_similarity` computes the L2-normalized dot product of vectors. Euclidean (L2) normalization projects the vectors onto the unit sphere, and their dot product is then the cosine of the angle between the points denoted by the vectors.
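The description above can be reproduced by hand with NumPy: divide each row by its length, then take all pairwise dot products. This is a minimal sketch for dense input, with a hypothetical function name; it is not scikit-learn's implementation, which also accepts sparse matrices.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of a dense 2-D array."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by zero
    unit = vectors / norms   # project every row onto the unit sphere
    return unit @ unit.T     # dot products of unit vectors are cosines


vecs = [[1, 0], [0, 1], [1, 1]]
sim = cosine_similarity_matrix(vecs)
print(np.round(sim, 4))
```

The first two vectors are perpendicular, so their similarity is 0, while each makes a 45-degree angle with the third, giving roughly 0.7071. The diagonal is 1 because every vector points in the same direction as itself.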

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
Similarity_Score = cosine_similarity(X)

In [None]:
Similarity_Score

array([[1.        , 0.03873457, 0.02088166, ..., 0.00456613, 0.02713392,
        0.02675493],
       [0.03873457, 1.        , 0.01286975, ..., 0.02609631, 0.04579643,
        0.00698856],
       [0.02088166, 0.01286975, 1.        , ..., 0.00344653, 0.02984156,
        0.072506  ],
       ...,
       [0.00456613, 0.02609631, 0.00344653, ..., 1.        , 0.02268653,
        0.02220094],
       [0.02713392, 0.04579643, 0.02984156, ..., 0.02268653, 1.        ,
        0.08833593],
       [0.02675493, 0.00698856, 0.072506  , ..., 0.02220094, 0.08833593,
        1.        ]])

In [None]:
Similarity_Score.shape

(2199, 2199)

# **Get Movie Name as Input from User and Validate for Closest Spelling**

In [None]:
Favorite_Movie_Name = input(' Enter your favorite movie name : ')

 Enter your favorite movie name : ram setu


In [None]:
All_Movies_Title_List = ms['movie_name'].tolist()

In [None]:
import difflib

In [None]:
Movie_Recommendation = difflib.get_close_matches(Favorite_Movie_Name, All_Movies_Title_List)
print(Movie_Recommendation)

['Ram Setu']
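`difflib.get_close_matches` returns an empty list when nothing scores at or above its `cutoff` (0.6 by default), and its comparison is case-sensitive. The helper below is a hedged sketch of a more defensive lookup: it compares lowercased strings and returns `None` instead of failing when there is no match. The name `find_title` and the sample titles are hypothetical.

```python
import difflib


def find_title(query, titles, cutoff=0.6):
    """Return the title closest to `query`, ignoring case, or None if nothing is close."""
    # Map lowercased titles back to their original spelling (first occurrence wins).
    lowered = {}
    for title in titles:
        lowered.setdefault(title.lower(), title)
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


sample_titles = ["Ram Setu", "Dishoom", "Liger"]
print(find_title("ram setu", sample_titles))
print(find_title("tiger", sample_titles))
print(find_title("zzzzqqq", sample_titles))
```

Checking for `None` before indexing avoids the `IndexError` that `Movie_Recommendation[0]` would raise on an unrecognized title.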


In [None]:
Close_Match = Movie_Recommendation[0]
print(Close_Match)

Ram Setu


In [None]:
# Row position in `ms`, which matches the row/column position in Similarity_Score
Index_of_Close_Match_Movie = ms[ms.movie_name == Close_Match].index[0]
print(Index_of_Close_Match_Movie)

374
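The similarity matrix is indexed by row position, so the number looked up here has to be the movie's position in `ms`. In this dataset the `id` values shown in `ms.head()` coincide with the row positions, but that is a property of this particular file rather than a guarantee. The sketch below uses a hypothetical frame where `id` and position differ, and builds a title-to-position lookup that does not depend on `id`.

```python
import pandas as pd

# Hypothetical frame whose `id` column does not start at 0.
toy = pd.DataFrame({
    "id": [10, 11, 12],
    "movie_name": ["Alpha", "Beta", "Gamma"],
}).reset_index(drop=True)  # make sure the index is 0..n-1

# Map each title to its row position; keep the first row if a title repeats.
title_to_position = pd.Series(toy.index, index=toy["movie_name"])
title_to_position = title_to_position[~title_to_position.index.duplicated(keep="first")]

position = int(title_to_position["Beta"])
print(position, toy.loc[position, "id"])
```

Here "Beta" sits at position 1 even though its `id` is 11; using the `id` to index a 3x3 similarity matrix would fail.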


In [None]:
# getting a list of similar movies
Recommendation_Score = list(enumerate(Similarity_Score[Index_of_Close_Match_Movie]))
print(Recommendation_Score)

[(0, 0.040190435166782446), (1, 0.0), (2, 0.013970773221096593), (3, 0.02746597398821934), (4, 0.039541376686281465), (5, 0.015211594719745349), (6, 0.013614627636999364), (7, 0.03038557522489396), (8, 0.06976726291748218), (9, 0.01883649650694396), (10, 0.028760560859570523), (11, 0.005383226286674362), (12, 0.012512265751948363), (13, 0.007424201798612333), (14, 0.021616602815139656), (15, 0.045788338728601545), (16, 0.04485811141669578), (17, 0.05280430034032697), (18, 0.014345406784050305), (19, 0.051378590710753985), (20, 0.02048578817625222), (21, 0.03598817238731216), (22, 0.01984912277146899), (23, 0.01651494817118996), (24, 0.02240935021315525), (25, 0.01651806467412219), (26, 0.018451962466923405), (27, 0.02571441485864527), (28, 0.034574514036101574), (29, 0.025536214642831092), (30, 0.013157611109620021), (31, 0.060383000792029885), (32, 0.03014930631392418), (33, 0.060035557357533284), (34, 0.017636149194255015), (35, 0.057980978062726896), (36, 0.05962742423680381), (37, 

In [None]:
len(Recommendation_Score)

2199

# **Sort All Movies by Recommendation Score for the Favourite Movie**

In [None]:
# sorting the movies based on their similarity score

Sorted_Similar_Movies = sorted(Recommendation_Score, key = lambda x:x[1], reverse = True)
print(Sorted_Similar_Movies)

[(374, 0.9999999999999999), (700, 0.14832366129032962), (1493, 0.13624905113589372), (479, 0.13100922092610776), (1417, 0.13066847217982858), (1228, 0.12689297460605986), (1908, 0.12622034705182886), (1581, 0.12594771482953293), (573, 0.1244574103173888), (563, 0.11441078639986221), (1409, 0.10830472396068083), (775, 0.1077498301505456), (166, 0.10722655639692862), (962, 0.1061125034481359), (692, 0.10461119579890912), (435, 0.10372850124775751), (1091, 0.10355925129491102), (437, 0.10198427047427817), (622, 0.10146138301726727), (413, 0.09994129417935084), (556, 0.09945896746121294), (1642, 0.09918903316729652), (558, 0.09913270339331301), (2058, 0.09825736230436959), (1314, 0.0963997846500445), (732, 0.09367149226925575), (1492, 0.09317295982285707), (684, 0.0930352167176002), (1926, 0.09294038107595652), (561, 0.09254822208593322), (62, 0.09215751141140288), (1620, 0.09209136601076513), (2107, 0.09152666591358413), (144, 0.09130562757657314), (68, 0.09082828338756778), (1373, 0.0902

In [None]:
# print the name of similar movies based on the index

print('Top 30 Movies Suggested for you: \n')

# Each entry is a (row index, score) pair; the first entry is the chosen movie itself
for i, (index, score) in enumerate(Sorted_Similar_Movies[:30], start=1):
    title_from_index = ms.loc[index, 'movie_name']
    print(i, '.', title_from_index)

Top 30 Movies Suggested for you: 

1 . Ram Setu
2 . Dishoom
3 . Aayirathil Oruvan
4 . A Flying Jatt
5 . Fateh
6 . Chhorii
7 . Akaash Vani
8 . Bhuj: The Pride of India
9 . Rowdy Rathore
10 . Untitled Blumhouse Productions Film
11 . Dus
12 . Hurdang
13 . Selfiee
14 . Baaghi 2
15 . Bachchhan Paandey
16 . Housefull 2
17 . Kick 2
18 . Code Name: Tiranga
19 . Kesari
20 . Mrs. Serial Killer
21 . Janhit Mein Jaari
22 . Suraj Pe Mangal Bhari
23 . Vinaya Vidheya Rama
24 . D Company
25 . Phoonk
26 . Shool
27 . Total Dhamaal
28 . 800
29 . Zameen
30 . Dhamaal
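The first entry in the list is always the chosen movie itself, because every movie has a similarity of 1 with itself. When only the best few matches are needed, the row can be ranked with NumPy and the query excluded. This is a sketch on a hypothetical 4x4 similarity matrix; the function name `top_k_similar` is an assumption.

```python
import numpy as np


def top_k_similar(similarity, query_index, k=3):
    """Return (index, score) pairs for the k rows most similar to query_index."""
    scores = similarity[query_index].astype(float).copy()
    scores[query_index] = -np.inf  # exclude the query item itself
    order = np.argsort(-scores, kind="stable")[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical similarity matrix standing in for Similarity_Score.
toy = np.array([
    [1.0, 0.2, 0.7, 0.1],
    [0.2, 1.0, 0.3, 0.9],
    [0.7, 0.3, 1.0, 0.4],
    [0.1, 0.9, 0.4, 1.0],
])
result = top_k_similar(toy, query_index=0, k=2)
print(result)  # [(2, 0.7), (1, 0.2)]
```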


# **Top 10 Movie Recommendation System**

In [None]:
Movie_Name = input('Enter your favorite movie name: ')

list_of_all_titles = ms['movie_name'].tolist()

# difflib returns the closest titles first; take the best match
Find_Close_Match = difflib.get_close_matches(Movie_Name, list_of_all_titles)

Close_Match = Find_Close_Match[0]

# Row position of the matched movie, which is also its row in Similarity_Score
Index_of_Movie = ms[ms.movie_name == Close_Match].index[0]

Recommendation_Score = list(enumerate(Similarity_Score[Index_of_Movie]))

sorted_similar_movies = sorted(Recommendation_Score, key = lambda x:x[1], reverse = True)

print('Top 10 Movies suggested for you : \n')

# The first entry is the matched movie itself (similarity 1.0)
for i, (index, score) in enumerate(sorted_similar_movies[:10], start=1):
    title_from_index = ms.loc[index, 'movie_name']
    print(i, '.', title_from_index)

Enter your favorite movie name: tiger
Top 10 Movies suggested for you : 

1 . Liger
2 . JGM (JanaGanaMana)
3 . Kabzaa
4 . Purab Aur Pachhim
5 . Tadap
6 . Khaali Peeli
7 . Aadhi Bhagavan
8 . Don
9 . Pudhu Pettai
10 . Hum Paanch
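The steps in this notebook can also be collected into one function that takes the query as an argument instead of calling `input()`. The sketch below is one possible arrangement under stated assumptions: the function name `recommend`, its parameters, and the toy titles and texts are hypothetical, and it recomputes the similarity matrix on every call for the sake of being self-contained.

```python
import difflib

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend(query, titles, texts, k=10):
    """Return up to k titles most similar to the closest match for `query`."""
    lowered = [t.lower() for t in titles]
    match = difflib.get_close_matches(query.lower(), lowered, n=1)
    if not match:
        return []  # nothing close enough to the query
    position = lowered.index(match[0])

    similarity = cosine_similarity(TfidfVectorizer().fit_transform(texts))
    ranked = sorted(enumerate(similarity[position]), key=lambda pair: pair[1], reverse=True)
    # Skip the matched movie itself, then keep the k best.
    return [titles[i] for i, _ in ranked if i != position][:k]


# Hypothetical toy data standing in for ms['movie_name'] and the combined text `x`.
toy_titles = ["Space Quest", "Star Voyage", "Love in Paris", "Paris Romance"]
toy_texts = [
    "space adventure aliens rocket",
    "space rocket voyage stars",
    "romance love paris",
    "paris romance love story",
]
print(recommend("space quest", toy_titles, toy_texts, k=1))
```

With the real data, the precomputed `Similarity_Score` could be passed in rather than rebuilt, since the TF-IDF step only needs to run once.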
