# 🎬 Movie Recommender System
This is a content-based movie recommender system built on the TMDB 5000 movie dataset. It recommends movies whose plot overview, genres, keywords, and tagline are similar to those of a given title.


## Step 1: Importing Libraries and Loading Dataset
We load the TMDB 5000 dataset to begin our recommendation engine.


In [1]:
import pandas as pd 

In [2]:
df = pd.read_csv("tmdb_5000_movies.csv")

## Step 2: Handling Missing Data
We replace null values in `overview` and `tagline` with empty strings, and fill the missing `release_date` with the placeholder `"Unknown"`.


In [3]:
df.isnull().sum()

id                        0
genres                    0
keywords                  0
original_language         0
original_title            0
overview                  3
popularity                0
production_companies      0
release_date              1
revenue                   0
tagline                 844
vote_average              0
dtype: int64

In [4]:
df['overview'] = df['overview'].fillna("")
df['tagline'] = df['tagline'].fillna("")
df['release_date'] = df['release_date'].fillna("Unknown")

## Step 3: Cleaning JSON Columns
The `genres`, `keywords`, and `production_companies` columns contain stringified lists of dictionaries. We extract only the `name` values and join them with spaces.


In [5]:
import ast

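def extract_names(text):
    """Return the 'name' values from a stringified list of dicts, space-separated."""
    try:
        return " ".join([item['name'] for item in ast.literal_eval(text)])
    except (ValueError, SyntaxError, TypeError, KeyError):
        # Malformed or missing input: fall back to an empty string.
        return ""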

In [6]:
df['genres'] = df['genres'].apply(extract_names)
df['keywords'] = df['keywords'].apply(extract_names)
df['production_companies'] = df['production_companies'].apply(extract_names)
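
To make the transformation concrete, here is a small sketch that runs the same kind of helper on a hand-written string. The sample value is made up for illustration and only mimics the shape of the `genres` column. Note that a multi-word name such as "Science Fiction" becomes two separate tokens once everything is joined with spaces.

```python
import ast

def extract_names(text):
    """Return the 'name' values from a stringified list of dicts, space-separated."""
    try:
        return " ".join(item["name"] for item in ast.literal_eval(text))
    except (ValueError, SyntaxError, TypeError, KeyError):
        return ""

# Hypothetical sample in the same shape as the dataset's `genres` column.
sample = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'

print(extract_names(sample))              # Action Science Fiction
print(repr(extract_names("[]")))          # ''
print(repr(extract_names("not a list")))  # ''
```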

## Step 4: Creating the Tags Column
We combine `overview`, `genres`, `keywords`, and `tagline` into one `tags` column that represents each movie's content.


In [7]:
df['tags'] = df['overview'] + " " + df['genres'] + " " + df['keywords'] + " " + df['tagline']
df['tags'] = df['tags'].str.lower()
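
The concatenation above is also why the null values were filled earlier: in pandas, adding a string to a missing value yields a missing value, so a single empty `overview` would wipe out the whole tag. A minimal sketch on an invented two-row frame (the `toy` data is an assumption, not taken from the dataset):

```python
import pandas as pd

# Invented rows; the second one has no overview.
toy = pd.DataFrame({
    "overview": ["A heist goes wrong.", None],
    "genres": ["Crime Thriller", "Drama"],
})

# Without filling, a missing overview makes the whole concatenation missing.
unfilled = toy["overview"] + " " + toy["genres"]
print(unfilled.isna().tolist())  # [False, True]

# After fillna(""), every row yields a usable lowercase string.
filled = (toy["overview"].fillna("") + " " + toy["genres"]).str.lower()
print(filled.tolist())  # ['a heist goes wrong. crime thriller', ' drama']
```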

## Step 5: Vectorizing the Tags
We use `CountVectorizer` to convert each movie's tags into a vector of word counts, keeping at most the 5,000 most frequent words and dropping English stop words.


In [8]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000, stop_words='english')
vectors = cv.fit_transform(df['tags']).toarray()
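
As a rough illustration of what a bag-of-words vectorizer produces, the sketch below counts words by hand using only the standard library. It is not how `CountVectorizer` is implemented: the whitespace tokenizer, the tiny stop-word set, and the helper name `bag_of_words` are simplifying assumptions, and the two documents are invented.

```python
from collections import Counter

# Tiny stand-in stop-word set (assumption); scikit-learn's 'english' list is much longer.
STOP_WORDS = {"a", "the", "in", "of", "and"}

def bag_of_words(docs, max_features=None):
    """Hypothetical helper: return (vocabulary, count vectors) for a list of strings."""
    tokenized = [[w for w in doc.lower().split() if w not in STOP_WORDS] for doc in docs]
    totals = Counter(w for words in tokenized for w in words)
    # Keep the most frequent words, then sort alphabetically for a stable column order.
    vocab = sorted(w for w, _ in totals.most_common(max_features))
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

docs = ["a space adventure in space", "the adventure of a detective"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['adventure', 'detective', 'space']
print(vectors)  # [[1, 0, 2], [1, 1, 0]]
```

Each row is one document and each column is one vocabulary word, which is the same layout as the `vectors` array built from the real tags.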

## Step 6: Calculating Cosine Similarity
We compute the cosine similarity between every pair of tag vectors. Entry `similarity[i][j]` of the resulting matrix measures how closely movies `i` and `j` match in content.


In [9]:
from sklearn.metrics.pairwise import cosine_similarity

In [10]:
similarity = cosine_similarity(vectors)
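
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends on the direction of the count vectors rather than on how long the text is. The sketch below works this out by hand for two toy vectors. The helper name `cosine` is hypothetical, and returning 0.0 for an all-zero vector is a choice made here for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

a = [1, 0, 2]  # toy word counts for one document
b = [1, 1, 0]  # toy word counts for another

print(round(cosine(a, b), 3))  # 0.316
print(round(cosine(a, a), 3))  # 1.0
```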

## Step 7: Recommendation Function
This function takes a movie title, matches it case-insensitively against `original_title`, and prints the 5 movies with the highest cosine similarity to it.


In [11]:
def recommend(movie):
    titles = df['original_title'].str.lower()
    matches = df.index[titles == movie.lower()]
    if len(matches) == 0:
        print("Movie not found.")
        return
    index = matches[0]
    # Rank all movies by similarity; the top entry is normally the movie itself, so skip it.
    distances = sorted(enumerate(similarity[index]), key=lambda x: x[1], reverse=True)[1:6]
    for i, _ in distances:
        print(df.iloc[i].original_title)
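
The ranking step can also be tried in isolation on a tiny hand-made similarity matrix. In the sketch below, the titles and scores are invented and `top_similar` is a hypothetical helper. It returns a list instead of printing, and it excludes the query movie by position rather than relying on it sorting first.

```python
def top_similar(title, titles, similarity, k=5):
    """Hypothetical helper: return up to k titles most similar to `title`, or [] if unknown."""
    lowered = [t.lower() for t in titles]
    if title.lower() not in lowered:
        return []
    index = lowered.index(title.lower())
    # Pair each other movie's position with its score, skipping the query itself.
    scored = [(i, s) for i, s in enumerate(similarity[index]) if i != index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in scored[:k]]

# Invented titles and similarity scores for illustration only.
titles = ["Alpha", "Beta", "Gamma", "Delta"]
similarity = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.4],
    [0.9, 0.1, 1.0, 0.3],
    [0.5, 0.4, 0.3, 1.0],
]

print(top_similar("alpha", titles, similarity, k=2))  # ['Gamma', 'Delta']
print(top_similar("missing", titles, similarity))     # []
```

Returning a list makes the result easier to reuse, for example in a web front end, than printed output.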

In [12]:
# Save the cleaned dataset, including the new `tags` column, for later reuse.
df.to_csv("tmdb_cleaned.csv", index=False)

## 🔚 Summary
This content-based recommender suggests movies similar to the one entered by the user. It uses cosine similarity on feature-engineered text data combining plot, genres, keywords, and taglines.
