**Objective:**

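The objective of this movie recommendation system is to suggest movies that are similar to a favourite title entered by the user. We'll build a similarity-based recommender in Google Colab using TF-IDF features and cosine similarity on a Kaggle movie dataset. Note that this is not collaborative filtering: the dataset holds aggregate vote statistics per movie, not ratings from individual users.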

**Data Source:**

We'll use the Top_10000_Movies.csv dataset from Kaggle. It contains one row per movie, with metadata such as title, language, popularity, release date, average vote, vote count, genre, overview, revenue, runtime, and tagline. The notebook reads the data from a CSV file, so the file must be downloaded from Kaggle and uploaded to the Colab session first.

In [47]:
import pandas as pd
import numpy as np


In [48]:
path = "/Top_10000_Movies.csv"

In [49]:
df = pd.read_csv(path,lineterminator='\n')

In [None]:
df.head()

Unnamed: 0.1,Unnamed: 0,id,original_language,original_title,popularity,release_date,vote_average,vote_count,genre,overview,revenue,runtime,tagline
0,0,580489,en,Venom: Let There Be Carnage,5401.308,2021-09-30,6.8,1736,"['Science Fiction', 'Action', 'Adventure']",After finding a host body in investigative rep...,424000000,97.0,
1,1,524434,en,Eternals,3365.535,2021-11-03,7.1,622,"['Action', 'Adventure', 'Science Fiction', 'Fa...",The Eternals are a team of ancient aliens who ...,165000000,157.0,In the beginning...
2,2,438631,en,Dune,2911.423,2021-09-15,8.0,3632,"['Action', 'Adventure', 'Science Fiction']","Paul Atreides, a brilliant and gifted young ma...",331116356,155.0,"Beyond fear, destiny awaits."
3,3,796499,en,Army of Thieves,2552.437,2021-10-27,6.9,555,"['Action', 'Crime', 'Thriller']",A mysterious woman recruits bank teller Ludwig...,0,127.0,"Before Vegas, one locksmith became a legend."
4,4,550988,en,Free Guy,1850.47,2021-08-11,7.8,3493,"['Comedy', 'Action', 'Adventure', 'Science Fic...",A bank teller called Guy realizes he is a back...,331096766,115.0,Life's too short to be a background character.


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Unnamed: 0         10000 non-null  int64  
 1   id                 10000 non-null  int64  
 2   original_language  10000 non-null  object 
 3   original_title     10000 non-null  object 
 4   popularity         10000 non-null  float64
 5   release_date       9962 non-null   object 
 6   vote_average       10000 non-null  float64
 7   vote_count         10000 non-null  int64  
 8   genre              10000 non-null  object 
 9   overview           9900 non-null   object 
 10  revenue            10000 non-null  int64  
 11  runtime            9991 non-null   float64
 12  tagline            7080 non-null   object 
dtypes: float64(3), int64(4), object(6)
memory usage: 1015.8+ KB


In [12]:
df.shape

(10000, 13)

In [232]:
df.columns

Index(['Unnamed: 0', 'id', 'original_language', 'original_title', 'popularity',
       'release_date', 'vote_average', 'vote_count', 'genre', 'overview',
       'revenue', 'runtime', 'tagline'],
      dtype='object')

**GET FEATURE SELECTION**

In [50]:
df_features = df[['original_title', 'popularity','release_date', 'vote_average', 'vote_count']].fillna('')

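Five existing columns are selected as features for the recommender: original_title, popularity, release_date, vote_average, and vote_count. Missing values are replaced with empty strings.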

In [None]:
df_features.shape

(10000, 5)

In [None]:
df_features

Unnamed: 0,original_title,popularity,release_date,vote_average,vote_count
0,Venom: Let There Be Carnage,5401.308,2021-09-30,6.8,1736
1,Eternals,3365.535,2021-11-03,7.1,622
2,Dune,2911.423,2021-09-15,8.0,3632
3,Army of Thieves,2552.437,2021-10-27,6.9,555
4,Free Guy,1850.470,2021-08-11,7.8,3493
...,...,...,...,...,...
9995,A Grand Day Out,9.266,1990-05-18,7.5,594
9996,El cantante,10.417,2006-09-12,7.0,80
9997,How I Live Now,9.520,2013-09-10,6.6,705
9998,Once,9.267,2007-03-23,7.4,990


In [51]:
# Combine the five selected features into one text string per movie
x = (
    df_features['original_title'].astype(str) + ' '
    + df_features['popularity'].astype(str) + ' '
    + df_features['release_date'].astype(str) + ' '
    + df_features['vote_average'].astype(str) + ' '
    + df_features['vote_count'].astype(str)
)

In [None]:
x

0       Venom: Let There Be Carnage 5401.308 2021-09-3...
1                    Eternals 3365.535 2021-11-03 7.1 622
2                       Dune 2911.423 2021-09-15 8.0 3632
3             Army of Thieves 2552.437 2021-10-27 6.9 555
4                    Free Guy 1850.47 2021-08-11 7.8 3493
                              ...                        
9995             A Grand Day Out 9.266 1990-05-18 7.5 594
9996                 El cantante 10.417 2006-09-12 7.0 80
9997               How I Live Now 9.52 2013-09-10 6.6 705
9998                        Once 9.267 2007-03-23 7.4 990
9999             Manhattan Night 9.273 2016-05-20 6.0 304
Length: 10000, dtype: object

In [None]:
x.shape

(10000,)

**GET FEATURE TEXT CONVERSION TO TOKENS**

In [52]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [53]:
tfidf = TfidfVectorizer()

In [54]:
x = tfidf.fit_transform(x)

In [None]:
x.shape

(10000, 12617)
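
The 12617 columns are the distinct tokens found across all combined strings. As a rough illustration of where they come from, the sketch below imitates the default preprocessing of TfidfVectorizer (lowercasing, then the default token pattern of two or more word characters) using only the standard library. The helper name `default_tokens` is hypothetical and the sketch ignores the weighting step.

```python
import re

# Default token pattern of scikit-learn's TfidfVectorizer:
# a token is a run of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def default_tokens(text):
    """Hypothetical helper: lowercase the text, then split it into tokens."""
    return TOKEN_PATTERN.findall(text.lower())

# One of the combined feature strings shown earlier in the notebook.
sample = "Dune 2911.423 2021-09-15 8.0 3632"
tokens = default_tokens(sample)
print(tokens)
```

Under these defaults, decimal numbers and dates are split at the punctuation, and single-character pieces such as the digits of a vote_average like 8.0 are dropped, so numeric features only contribute to similarity when their digit groups match exactly.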

In [None]:
print(x)

  (0, 921)	0.3937438563524359
  (0, 1758)	0.17282303888879205
  (0, 91)	0.12443607089052096
  (0, 1196)	0.14302107214551013
  (0, 1797)	0.3132388061037037
  (0, 2571)	0.3937438563524359
  (0, 4540)	0.36506737885534507
  (0, 4007)	0.2876282800458216
  (0, 10322)	0.3315180508782006
  (0, 7330)	0.31630475754291243
  (0, 10773)	0.31630475754291243
  (1, 2775)	0.4490473332219666
  (1, 31)	0.18285804780544523
  (1, 239)	0.15199898157579186
  (1, 2558)	0.40968699060355435
  (1, 1923)	0.5174148941247425
  (1, 5717)	0.5174148941247425
  (1, 1196)	0.19630578413513364
  (2, 2027)	0.5029345734092411
  (2, 702)	0.19544511576649742
  (2, 2228)	0.4010527061050116
  (2, 1709)	0.5253145414492625
  (2, 5490)	0.4561672963818289
  (2, 91)	0.16601675547438044
  (2, 1196)	0.19081199038303606
  :	:
  (9996, 5580)	0.34698098822556095
  (9996, 373)	0.16410039156442607
  (9996, 101)	0.15559316604456505
  (9996, 91)	0.18057269242973017
  (9997, 7390)	0.42991297412617124
  (9997, 8216)	0.46697565653542095
  (9997

**GET SIMILARITY SCORE USING COSINE SIMILARITY**

In [55]:
from sklearn.metrics.pairwise import cosine_similarity

In [56]:
Similarity_Score = cosine_similarity(x)

In [57]:
Similarity_Score

array([[1.        , 0.02807586, 0.04794861, ..., 0.0218112 , 0.        ,
        0.        ],
       [0.02807586, 1.        , 0.0374575 , ..., 0.        , 0.03801994,
        0.        ],
       [0.04794861, 0.0374575 , 1.        , ..., 0.02909948, 0.        ,
        0.        ],
       ...,
       [0.0218112 , 0.        , 0.02909948, ..., 1.        , 0.        ,
        0.        ],
       [0.        , 0.03801994, 0.        , ..., 0.        , 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        1.        ]])

In [293]:
Similarity_Score.shape

(10000, 10000)
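
To make the similarity matrix easier to interpret, here is a minimal sketch on a hypothetical three-document corpus (the strings are made up, not taken from the dataset). It assumes the default TfidfVectorizer settings, under which each row is L2-normalized, so cosine similarity reduces to a dot product between rows.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-corpus standing in for the combined feature strings.
toy_docs = ["dune 2021", "dune part two 2024", "free guy 2021"]

toy_x = TfidfVectorizer().fit_transform(toy_docs)  # sparse matrix, unit-length rows
toy_scores = cosine_similarity(toy_x)              # dense (3, 3) array

# With unit-length rows, cosine similarity equals the plain dot product.
dot_products = (toy_x @ toy_x.T).toarray()

print(toy_scores.shape)
print(np.allclose(toy_scores, dot_products))
```

Each diagonal entry is 1 because every document is identical to itself, and documents that share no tokens score 0. For the full dataset the dense result holds 10000 x 10000 float64 values, which is about 800 MB of memory.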

**GET MOVIE NAME AS INPUT FROM USER AND VALIDATE FOR CLOSEST SPELLING**

In [98]:
Favourite_Movie_Name = input('Enter your favourite movie name : ')

Enter your favourite movie name : Venom


In [99]:
All_Movies_Title_List = df['original_title'].tolist()

In [100]:
import difflib

In [101]:
Movie_Recommendation = difflib.get_close_matches(Favourite_Movie_Name,All_Movies_Title_List)
print(Movie_Recommendation)

['Venom', 'Venom', 'Venom']


In [102]:
Close_Match = Movie_Recommendation[0]
print(Close_Match)

Venom
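
Taking the first element of the match list fails with an IndexError when difflib finds nothing close enough. The sketch below shows one possible guard on a hypothetical short title list. The helper name `closest_title` is an assumption for illustration; `n` and `cutoff` are the standard parameters of difflib.get_close_matches.

```python
import difflib

def closest_title(query, titles, cutoff=0.6):
    """Hypothetical helper: return the best-matching title, or None if nothing is close."""
    matches = difflib.get_close_matches(query, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Made-up list standing in for All_Movies_Title_List.
sample_titles = ["Venom", "Dune", "Free Guy", "Eternals"]

print(closest_title("Venom", sample_titles))    # exact title
print(closest_title("Dunne", sample_titles))    # small typo
print(closest_title("zzzzqqq", sample_titles))  # nothing similar
```

The comparison is case-sensitive, so lowercasing both the query and the titles before matching may be worth considering if users type titles freely.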


In [103]:
# Look up the row index of the matched movie (not its popularity value)
Index_of_Close_Match_Movie = int(df[df.original_title==Close_Match].index[0])
print(Index_of_Close_Match_Movie)


In [106]:
if 0 <= Index_of_Close_Match_Movie < len(Similarity_Score):
    # Pair every movie's row index with its similarity to the chosen movie
    Recommendation_Score = list(enumerate(Similarity_Score[Index_of_Close_Match_Movie]))
else:
    print("Invalid index")


In [105]:
len(Recommendation_Score)

10000

**GET ALL MOVIES SORTED BY RECOMMENDATION SCORE WRT FAVOURITE MOVIE**

In [96]:
# Sort the (index, score) pairs from most to least similar
Sorted_Similar_Movies = sorted(Recommendation_Score, key=lambda x: x[1], reverse=True)

In [97]:
print('Movies Suggested for You: \n')
# Print the 29 highest-scoring titles; the favourite movie itself typically ranks first
for i, (index, score) in enumerate(Sorted_Similar_Movies[:29], start=1):
    title_from_index = df.loc[index, 'original_title']
    print(i, '.', title_from_index)

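
The steps above can be gathered into a single function. The following is a minimal sketch on a hypothetical four-row DataFrame, not the real dataset. The function name `recommend`, its parameters, and the toy rows are assumptions for illustration. Unlike the notebook cells, it leaves the matched movie out of its own recommendations.

```python
import difflib

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the real dataset (hypothetical rows, not taken from the CSV).
toy_df = pd.DataFrame({
    "original_title": ["Space War", "Space War Returns", "Quiet Garden", "Garden Party"],
    "release_date": ["2001-05-01", "2003-06-01", "2010-07-09", "2012-08-10"],
})

def recommend(title, frame, top_n=2):
    """Return up to top_n titles most similar to the closest match of `title`."""
    titles = frame["original_title"].tolist()
    matches = difflib.get_close_matches(title, titles, n=1)
    if not matches:
        return []  # nothing close enough to the requested title
    # Position of the first row carrying the matched title.
    position = titles.index(matches[0])
    # Combine the features into one string per movie and score all pairs.
    text = frame["original_title"] + " " + frame["release_date"]
    similarity = cosine_similarity(TfidfVectorizer().fit_transform(text))
    ranked = sorted(enumerate(similarity[position]), key=lambda pair: pair[1], reverse=True)
    # Skip the matched movie itself, then keep the best top_n titles.
    return [titles[i] for i, _ in ranked if i != position][:top_n]

print(recommend("Space Wars", toy_df))
```

For brevity the sketch recomputes the TF-IDF matrix on every call. With the 10000-movie dataset it would be more sensible to compute the similarity matrix once, as the notebook does, and pass it into the function.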