# **Data Loading**

Downloading the TMDB movie dataset from Kaggle.

In [None]:
!mkdir -p ~/.kaggle
!kaggle datasets download -d tmdb/tmdb-movie-metadata

Dataset URL: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata
License(s): other
tmdb-movie-metadata.zip: Skipping, found more recently modified local copy (use --force to force download)


Extracting files

In [None]:
import zipfile
# The context manager closes the archive even if extraction fails
with zipfile.ZipFile('/content/tmdb-movie-metadata.zip', 'r') as zip_ref:
    zip_ref.extractall('/content')

Importing the required libraries and reading the CSV files into pandas dataframes.

In [None]:
import pandas as pd
import numpy as np
import ast
from typing import List, Any, Optional, Tuple, Union
movies = pd.read_csv("/content/tmdb_5000_movies.csv")
credits = pd.read_csv("/content/tmdb_5000_credits.csv")
credits.shape


(4803, 4)

In [None]:
movies.shape

(4803, 20)

# **Data Pre-processing**

Merging the credits dataframe into the movies dataframe on the shared 'title' column.

In [None]:
# merge the two dataframes on 'title'
movies = movies.merge(credits,on='title')
movies.shape

(4809, 23)
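The row count grows from 4803 to 4809 because 'title' is not a unique key: when the same title occurs more than once on both sides, an inner join pairs every matching row with every other. The sketch below illustrates this with a hand-written join over made-up rows (all titles and ids are hypothetical); it stands in for what the join does, not for how pandas implements it.

```python
# Toy rows (hypothetical data): two different films share the title "Remake".
movies_rows = [
    {"id": 1, "title": "Remake"},
    {"id": 2, "title": "Remake"},
    {"id": 3, "title": "Original"},
]
credits_rows = [
    {"movie_id": 1, "title": "Remake"},
    {"movie_id": 2, "title": "Remake"},
    {"movie_id": 3, "title": "Original"},
]

def inner_join(left, right, left_key, right_key):
    """Pair every left row with every right row whose key matches."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[left_key] == r[right_key]
    ]

by_title = inner_join(movies_rows, credits_rows, "title", "title")
by_id = inner_join(movies_rows, credits_rows, "id", "movie_id")

print(len(by_title))  # 5: the two "Remake" rows pair up 2 x 2 = 4 times, plus 1
print(len(by_id))     # 3: one row per film
```

If one row per film is wanted, joining on the id columns (assuming 'id' in the movies file and 'movie_id' in the credits file hold the same identifiers) would avoid the extra rows. The rest of this notebook keeps the title-based merge.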

Keeping only the required features in our dataframe (namely: movie_id, title, overview, genres, keywords, cast and crew).

In [None]:
# Keep only the 7 columns used to build the tags:
# movie_id (used later to fetch the poster), title, overview,
# genres, keywords, cast, crew
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]
movies.head(2)


Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


Find out if our dataframe has any missing values.

In [None]:
movies.isnull().sum()

movie_id    0
title       0
overview    3
genres      0
keywords    0
cast        0
crew        0
dtype: int64

Drop the rows with missing values (the 3 rows without an overview).

In [None]:
# remove null rows
movies.dropna(inplace=True)
movies.shape

(4806, 7)

Find out if there are any duplicate rows.

In [None]:
movies.duplicated().sum()

0

movies.iloc[0].genres retrieves the value in the 'genres' column of the first row of the movies dataframe. It is a string, not a list, so it has to be parsed before use.

In [None]:
a=movies.iloc[0].genres
a

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

In [None]:
ast.literal_eval(a)

[{'id': 28, 'name': 'Action'},
 {'id': 12, 'name': 'Adventure'},
 {'id': 14, 'name': 'Fantasy'},
 {'id': 878, 'name': 'Science Fiction'}]

Function to convert the string into a list of names.

In [None]:
# e.g. '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]' -> ['Action', 'Adventure']
def convert(text: str) -> List[str]:
  """
  Convert a string representation of a list of dictionaries to a list of names.

  Parameters:
  text (str): String representation of a list of dictionaries.

  Returns:
  List[str]: List of names.
  """
  L: List[str] = []
  for i in ast.literal_eval(text): # convert str to list
      L.append(i['name'])
  return L
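As a quick check, the same parsing can be tried on a short string shaped like the 'genres' value shown earlier. The sketch uses a compact equivalent named `names_from` (a hypothetical name, chosen so the block runs on its own).

```python
import ast
from typing import List

def names_from(text: str) -> List[str]:
    """Parse a stringified list of dicts and keep each 'name' value."""
    return [item["name"] for item in ast.literal_eval(text)]

sample = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
print(names_from(sample))  # ['Action', 'Adventure']
print(names_from("[]"))    # [] (a film with no entries yields an empty list)
```

`ast.literal_eval` works here because the values are plain strings and numbers. A string containing JSON literals such as `null` or `true` would need `json.loads` instead.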


### **Converting string text of columns to list**

The **convert()** function is applied to the 'genres' column of the movies dataframe.

In [None]:
movies['genres']=movies['genres'].apply(convert)

In [None]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


The **convert()** function is then applied to the 'keywords' column in the dataframe.

In [None]:
movies['keywords']=movies['keywords'].apply(convert)
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


Converting the string values of the cast members to a list.

In [None]:
movies['cast']=movies['cast'].apply(convert)

In [None]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weave...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley, ...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux, R...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman, A...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton,...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [None]:
# Keep only the first three cast members listed for each movie
movies['cast'] = movies['cast'].apply(lambda x:x[0:3])

In [None]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley]","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux]","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman]","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton]","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


Function to parse the 'crew' string and return a list holding the name of the first crew member whose 'job' is 'Director'.

In [None]:
def fetch_director(text: str) -> List[str]:
    """
    Convert a string representation of a list of dictionaries to a list containing the director's name.

    Parameters:
    text (str): String representation of a list of dictionaries.

    Returns:
    List[str]: List containing the director's name.
    """
    L: List[str] = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
            break

    return L
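The behaviour for a crew list with several directors, or with none, can be seen on a small made-up example. `first_director` is a hypothetical, compact equivalent of the function above, and the crew names are invented.

```python
import ast
from typing import List

def first_director(text: str) -> List[str]:
    """Return a one-element list with the first 'Director' credit, or []."""
    for member in ast.literal_eval(text):
        if member.get("job") == "Director":
            return [member["name"]]
    return []

# Hypothetical crew strings with the same 'job' / 'name' keys as the dataset.
crew = (
    '[{"job": "Editor", "name": "A. Cutter"}, '
    '{"job": "Director", "name": "B. Helm"}, '
    '{"job": "Director", "name": "C. Second"}]'
)
no_director = '[{"job": "Editor", "name": "A. Cutter"}]'

print(first_director(crew))         # ['B. Helm']
print(first_director(no_director))  # []
```

Returning a list (even an empty one) keeps the 'crew' column in the same shape as the other list columns, which matters later when the lists are concatenated.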

In [None]:
movies['crew'] = movies['crew'].apply(fetch_director)
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski]
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes]
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan]
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton]


### **Convert 'Overview' column to list format**

A lambda function is applied to the 'overview' column of the dataframe. lambda x: x.split() takes a string x as input and returns the list of words obtained by splitting x at whitespace.

In [None]:
type(movies['overview'][0])
# All 5 tag columns must be lists before they can be combined; overview is still a string, so split it into a list of words
movies['overview'] = movies['overview'].apply(lambda x:x.split())

In [None]:
movies['overview'][0]

['In',
 'the',
 '22nd',
 'century,',
 'a',
 'paraplegic',
 'Marine',
 'is',
 'dispatched',
 'to',
 'the',
 'moon',
 'Pandora',
 'on',
 'a',
 'unique',
 'mission,',
 'but',
 'becomes',
 'torn',
 'between',
 'following',
 'orders',
 'and',
 'protecting',
 'an',
 'alien',
 'civilization.']

### **Remove spaces within each name to make it a single token**

This function removes spaces from each string in a list of strings.

In [None]:
# Remove the spaces inside each name so that it becomes one token, e.g. 'Sam Worthington' (cast) -> 'SamWorthington'
# and 'Sam Mendes' (crew) -> 'SamMendes'. Otherwise both would contribute the token 'sam' and the two people would be confused.
def remove_space(word: Optional[List[str]]) -> Optional[List[str]]:
    """
    Remove spaces from each string in a list of strings.

    Parameters:
    word (Optional[List[str]]): List of strings to process.

    Returns:
    Optional[List[str]]: List of strings with spaces removed. Returns None if input is None.
    """
    if word is None:
        return None

    l: List[str] = []
    for i in word:
        l.append(i.replace(" ", ""))
    return l
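The effect on token overlap can be checked directly. The sketch below uses a hypothetical helper `tokens` and the two names from the comment above to compare the shared tokens with and without space removal.

```python
def tokens(names, strip_spaces):
    """Split names into lower-case whitespace tokens, optionally fusing each name first."""
    out = []
    for name in names:
        if strip_spaces:
            name = name.replace(" ", "")
        out.extend(name.lower().split())
    return set(out)

film_a = ["Sam Worthington"]  # a cast member of one film
film_b = ["Sam Mendes"]       # the director of another film

shared_raw = tokens(film_a, False) & tokens(film_b, False)
shared_fused = tokens(film_a, True) & tokens(film_b, True)

print(shared_raw)    # {'sam'}
print(shared_fused)  # set()
```

Without the fix, the two unrelated films would share the token 'sam' and look slightly more similar than they should.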


Applying the function to the 'cast', 'crew', 'genres' and 'keywords' columns of the dataframe. The 'overview' column is left as is, since it already holds single words.

In [None]:
movies['cast'] = movies['cast'].apply(remove_space)
movies['crew'] = movies['crew'].apply(remove_space)
movies['genres'] = movies['genres'].apply(remove_space)
movies['keywords'] = movies['keywords'].apply(remove_space)

In [None]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski]
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan]
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton]


### **Combining all columns into one named as 'tags'**

Creating a new column/feature named 'tags' in our dataframe.

In [None]:
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

In [None]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan],"[Following, the, death, of, District, Attorney..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[John, Carter, is, a, war-weary,, former, mili..."


### **Creating a new dataframe that keeps only 'movie_id', 'title' and 'tags'**

In [None]:
new = movies.drop(columns=['overview','genres','keywords','cast','crew'])

In [None]:
new['tags'][0]

['In',
 'the',
 '22nd',
 'century,',
 'a',
 'paraplegic',
 'Marine',
 'is',
 'dispatched',
 'to',
 'the',
 'moon',
 'Pandora',
 'on',
 'a',
 'unique',
 'mission,',
 'but',
 'becomes',
 'torn',
 'between',
 'following',
 'orders',
 'and',
 'protecting',
 'an',
 'alien',
 'civilization.',
 'Action',
 'Adventure',
 'Fantasy',
 'ScienceFiction',
 'cultureclash',
 'future',
 'spacewar',
 'spacecolony',
 'society',
 'spacetravel',
 'futuristic',
 'romance',
 'space',
 'alien',
 'tribe',
 'alienplanet',
 'cgi',
 'marine',
 'soldier',
 'battle',
 'loveaffair',
 'antiwar',
 'powerrelations',
 'mindandsoul',
 '3d',
 'SamWorthington',
 'ZoeSaldana',
 'SigourneyWeaver',
 'JamesCameron']

### **Converting tags into string**

In [None]:

new['tags'] = new['tags'].apply(lambda x: " ".join(x))
new.head()


Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."


In [None]:
new['tags'][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver JamesCameron'

### **Converting 'tags' value to lower case**

In [None]:
new['tags']=new['tags'].apply(lambda x:x.lower())
new.head()
new.iloc[0]['tags']

'in the 22nd century, a paraplegic marine is dispatched to the moon pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. action adventure fantasy sciencefiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d samworthington zoesaldana sigourneyweaver jamescameron'
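The three steps above (concatenating the lists, joining them into one string, lower-casing) can be reproduced on a single toy record. All values below are hypothetical.

```python
# One toy record (hypothetical values) shaped like a row of `movies`.
record = {
    "overview": "A spy chases a rival".split(),
    "genres": ["Action", "ScienceFiction"],
    "keywords": ["secretagent"],
    "cast": ["JaneDoe"],
    "crew": ["JohnRoe"],
}

# List concatenation with + keeps the order overview, genres, keywords, cast, crew.
tag_list = (record["overview"] + record["genres"] + record["keywords"]
            + record["cast"] + record["crew"])

# Join into one string, then lower-case so 'Action' and 'action' count as the same term.
tags = " ".join(tag_list).lower()
print(tags)
# a spy chases a rival action sciencefiction secretagent janedoe johnroe
```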

### **Importing NLTK and Porter Stemmer**

NLTK (Natural Language Toolkit) is imported along with the PorterStemmer class, which is a stemming algorithm that reduces words to their root or base form.

In [None]:
import nltk
from nltk.stem import PorterStemmer
ps = PorterStemmer()

**Steps:**
1. stems() function takes a string text as input and performs stemming on each word using the PorterStemmer (ps).
2. It splits the input text into words, applies stemming to each word, and collects the stemmed words in list T.
3. Finally, it joins the stemmed words back into a single string separated by spaces and returns this string.

In [None]:
def stems(text: str) -> str:
    """
    Perform stemming on a text by applying Porter stemming algorithm.

    Parameters:
    text (str): Input text to be stemmed.

    Returns:
    str: Stemmed text where each word has been reduced to its root form.
    """
    T = []
    for i in text.split():
        T.append(ps.stem(i))
    return " ".join(T)
new['tags'] = new['tags'].apply(stems)
new.iloc[0]['tags']
# stemming maps e.g. [dancing, dance, danced] -> [danc, danc, danc]

'in the 22nd century, a parapleg marin is dispatch to the moon pandora on a uniqu mission, but becom torn between follow order and protect an alien civilization. action adventur fantasi sciencefict cultureclash futur spacewar spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi marin soldier battl loveaffair antiwar powerrel mindandsoul 3d samworthington zoesaldana sigourneyweav jamescameron'

# **Converting a collection of text documents into a matrix of token counts**

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
def vectorize_tags()-> np.ndarray:

  """
  Vectorizes tags data using CountVectorizer.

  Returns:
  np.ndarray: A numpy array containing the vectorized representation of tags.

  """
  # Drop common English stop words (and, are, in, to, from, ...) and
  # keep only the 5000 most frequent remaining terms
  cv = CountVectorizer(max_features=5000, stop_words='english')

  # Fit and transform the tags data using CountVectorizer, and convert to dense array
  vector = cv.fit_transform(new['tags']).toarray()

  # Print the vocabulary (the 5000 retained terms, in alphabetical order)
  print(cv.get_feature_names_out())

  return vector

vector = vectorize_tags()

# Print the shape of the resulting vector
print("Shape of vector:", vector.shape)

['000' '007' '10' ... 'zone' 'zoo' 'zooeydeschanel']
Shape of vector: (4806, 5000)
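To make the shape (4806, 5000) concrete: each row is one movie, each column is one vocabulary term, and each cell counts how often that term occurs in the movie's tags. The pure-Python sketch below is a simplified stand-in for that idea on two toy documents; it is not how CountVectorizer is implemented, and the stop-word list is a tiny assumption.

```python
from collections import Counter

docs = ["action spy spy", "action romance"]  # toy tag strings
stop_words = {"the", "a", "and"}             # tiny stand-in stop-word list
max_features = 2                             # keep only the 2 most frequent terms

# Count term frequency over the whole corpus, ignoring stop words.
corpus_counts = Counter(
    word for doc in docs for word in doc.split() if word not in stop_words
)
# Keep the most frequent terms, then order the vocabulary alphabetically.
vocab = sorted(term for term, _ in corpus_counts.most_common(max_features))

# One row per document, one column per vocabulary term.
matrix = [[doc.split().count(term) for term in vocab] for doc in docs]

print(vocab)   # ['action', 'spy']
print(matrix)  # [[1, 2], [1, 0]]
```

The term 'romance' is dropped because it falls outside the top `max_features` terms, which is the same trade-off the 5000-term limit makes on the real data.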


# **Calculate Similarity**

Calculating the cosine similarity matrix with scikit-learn's cosine_similarity function. Entry (i, j) is the similarity between the tag vectors of movie i and movie j.


In [None]:
from sklearn.metrics.pairwise import cosine_similarity
def calculate_similarity() -> np.ndarray:
  """
  Calculate cosine similarity matrix for a given vectorized data.

  Returns:
  np.ndarray: Cosine similarity matrix where each element (i, j) represents the cosine similarity
  between vector i and vector j in the input matrix.
  """
  # Calculate cosine similarity
  similarity = cosine_similarity(vector)

  return similarity

similarity=calculate_similarity()
# Positional row of 'Spectre' in 'new'; the index labels have gaps after dropna, so use the position, not the label
index = int(np.flatnonzero((new['title'] == 'Spectre').to_numpy())[0])
similarity_scores = similarity[index]

print("Similarity scores for 'Spectre':", similarity_scores)

Similarity scores for 'Spectre': [0.0860309  0.06063391 1.         ... 0.02451452 0.         0.        ]
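For reference, the cosine similarity of two vectors is their dot product divided by the product of their lengths. A minimal pure-Python sketch on toy count vectors (the helper name `cosine` is hypothetical):

```python
import math

def cosine(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

a = [1, 2, 0]  # toy count vectors
b = [2, 4, 0]  # same direction as a, only longer
c = [0, 0, 3]  # shares no terms with a

print(round(cosine(a, b), 6))  # 1.0
print(cosine(a, c))            # 0.0
```

This matches the output above: the score of 'Spectre' with itself is 1, and movies that share no vocabulary terms score 0. Because count vectors are non-negative, every score lies between 0 and 1, and the length of a tag string does not matter, only the mix of terms.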


In [None]:
similarity.shape

(4806, 4806)

In [None]:
new

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a parapleg marin is dispa..."
1,285,Pirates of the Caribbean: At World's End,"captain barbossa, long believ to be dead, ha c..."
2,206647,Spectre,a cryptic messag from bond’ past send him on a...
3,49026,The Dark Knight Rises,follow the death of district attorney harvey d...
4,49529,John Carter,"john carter is a war-weary, former militari ca..."
...,...,...,...
4804,9367,El Mariachi,el mariachi just want to play hi guitar and ca...
4805,72766,Newlyweds,a newlyw couple' honeymoon is upend by the arr...
4806,231617,"Signed, Sealed, Delivered","""signed, sealed, delivered"" introduc a dedic q..."
4807,126186,Shanghai Calling,when ambiti new york attorney sam is sent to s...


In [None]:
similarity[0].shape

(4806,)

In [None]:
#sorted(list(similarity[0]),reverse=True)
sorted(list(enumerate(similarity[0])),reverse=True,key = lambda x: x[1])[1:6]

[(1216, 0.28676966733820225),
 (2409, 0.26901379342448517),
 (3730, 0.2605130246476754),
 (507, 0.255608593705383),
 (539, 0.25038669783359574)]
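The ranking idiom above pairs each score with its position via enumerate, so the position survives the sort. The sketch below repeats it on a toy row and shows `heapq.nlargest` as one possible alternative that avoids sorting the whole row.

```python
import heapq

scores = [1.0, 0.2, 0.7, 0.0, 0.5]  # toy similarity row; position 0 is the film itself

# Same approach as above: pair each score with its position, sort by score.
ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
top3 = ranked[1:4]  # skip the first entry (the film itself)

# Alternative: take the 4 largest pairs without sorting everything, then drop the first.
top3_heap = heapq.nlargest(4, enumerate(scores), key=lambda pair: pair[1])[1:]

print(top3)       # [(2, 0.7), (4, 0.5), (1, 0.2)]
print(top3_heap)  # [(2, 0.7), (4, 0.5), (1, 0.2)]
```

Skipping the first entry assumes the film itself ranks first. If another film had an identical tag vector (score 1.0), it could take that slot instead.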

# **Recommend movie functionality**

**Steps required for recommend function**

Input: Takes a movie title (movie) as input.

Step 1: Finds the index of the input movie in the dataset.

Step 2: Looks up the precomputed similarity scores between the input movie and all other movies.

Step 3: Sorts these movies based on similarity scores in descending order.

Step 4: Prints the titles of the 5 most similar movies, skipping the input movie itself.

In [None]:
# e.g. 'Avatar' is the first row, so it maps to row 0 of the similarity matrix
def recommend(movie: str) -> None:
  """
    Recommend similar movies based on a given movie title.

    Parameters:
    movie (str): Title of the movie to find recommendations for.

    Returns:
    None
  """
  # Positional row of the first matching title; the index labels have gaps after dropna,
  # so a label lookup would not line up with the rows of the similarity matrix
  index: int = int(np.flatnonzero((new['title'] == movie).to_numpy())[0])
  # enumerate keeps each movie's position next to its score while sorting by score (descending)
  movies_list : List[tuple]= sorted(list(enumerate(similarity[index])),reverse=True,key = lambda x: x[1])
  # skip the first entry (the movie itself) and print the next 5 most similar titles
  for i in movies_list[1:6]:
      print(new.iloc[i[0]].title)

The recommend function takes a single string argument: the movie title.

In [None]:
recommend('Spider-Man 2')

Spider-Man 3
Spider-Man
The Amazing Spider-Man
Iron Man 2
Superman
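The function above prints its results and raises an error for a title that is not in the dataframe. For a front end it may be more convenient to return a list and handle unknown titles. The sketch below shows one way to do that on plain Python lists; `top_similar` and the toy data are hypothetical.

```python
from typing import List, Sequence

def top_similar(titles: Sequence[str], sim: Sequence[Sequence[float]],
                movie: str, k: int = 5) -> List[str]:
    """Return up to k titles most similar to `movie`, or [] if it is unknown."""
    if movie not in titles:
        return []
    pos = list(titles).index(movie)  # position, matching the matrix row
    ranked = sorted(enumerate(sim[pos]), key=lambda pair: pair[1], reverse=True)
    # Exclude the movie itself by position instead of assuming it ranks first.
    return [titles[i] for i, _ in ranked if i != pos][:k]

# Toy data (hypothetical titles and scores).
titles = ["A", "B", "C"]
sim = [[1.0, 0.1, 0.8],
       [0.1, 1.0, 0.3],
       [0.8, 0.3, 1.0]]

print(top_similar(titles, sim, "A", k=2))  # ['C', 'B']
print(top_similar(titles, sim, "Z"))       # []
```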


Finally, the dataframe and the similarity matrix are saved as pickle files, which are used in the front-end implementation.

In [None]:
import pickle
# Context managers make sure each file is flushed and closed
with open('movie_list.pkl','wb') as f:
    pickle.dump(new,f)
with open('similarity.pkl','wb') as f:
    pickle.dump(similarity,f)
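On the front-end side the files would be read back with `pickle.load`. The round trip can be sketched with a small stand-in object and a temporary file; the payload and the file name `demo.pkl` are hypothetical.

```python
import os
import pickle
import tempfile

# Toy stand-in for the dataframe and similarity matrix.
payload = {"titles": ["A", "B"], "similarity": [[1.0, 0.4], [0.4, 1.0]]}

path = os.path.join(tempfile.mkdtemp(), "demo.pkl")  # hypothetical file name
with open(path, "wb") as f:  # the context manager closes the file after writing
    pickle.dump(payload, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == payload)  # True
```

Pickle files should only be loaded from a trusted source, since unpickling can execute arbitrary code.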