# Bollywood Movie Recommendation System

### Selected Project Track: Content Recommendation System

### Problem Statement
With thousands of movies available on streaming platforms, users often find it difficult to discover movies aligned with their interests.
This project aims to build a content-based movie recommendation system that suggests movies similar to a selected movie based on its 
content features.

### Objective
The objective of this project is to build a content-based recommendation system that suggests Bollywood movies similar to a selected movie using metadata such as genre, overview, director, and cast.

### Real-World Relevance
Content-based recommendation systems are widely used by platforms like Netflix and Amazon Prime to enhance user experience by providing personalized 
movie suggestions.

## Import Required Libraries

The following libraries are used for data handling, text processing, and similarity computation.


In [12]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from ipywidgets import interact, widgets

## Data Loading

The Bollywood movie dataset is loaded from a CSV file.


In [13]:
# Load the dataset and preview the first five rows.
movies = pd.read_csv("movie_dataset.csv")
movies.head()

,Unnamed: 0,movie_id,movie_name,year,genre,overview,director,cast
0,0,tt15354916,Jawan,2023,"Action, Thriller",A high-octane action thriller which outlines t...,Atlee,"Shah Rukh Khan, Nayanthara, Vijay Sethupathi, ..."
1,1,tt15748830,Jaane Jaan,2023,"Crime, Drama, Mystery",A single mother and her daughter who commit a ...,Sujoy Ghosh,"Kareena Kapoor, Jaideep Ahlawat, Vijay Varma, ..."
2,2,tt11663228,Jailer,2023,"Action, Comedy, Crime",A retired jailer goes on a manhunt to find his...,Nelson Dilipkumar,"Rajinikanth, Mohanlal, Shivarajkumar, Jackie S..."
3,3,tt14993250,Rocky Aur Rani Kii Prem Kahaani,2023,"Comedy, Drama, Family",Flamboyant Punjabi Rocky and intellectual Beng...,Karan Johar,"Ranveer Singh, Alia Bhatt, Dharmendra, Shabana..."
4,4,tt15732324,OMG 2,2023,"Comedy, Drama",An unhappy civilian asks the court to mandate ...,Amit Rai,"Pankaj Tripathi, Akshay Kumar, Yami Gautam, Pa..."


## Dataset Overview

Basic information about the dataset including shape, column names, and data types.


In [14]:
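movies.shape           # not echoed: a cell displays only its last expression
movies.info()          # prints column names, non-null counts, and dtypes
movies.isnull().sum()  # missing values per column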

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2199 entries, 0 to 2198
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  2199 non-null   int64 
 1   movie_id    2199 non-null   object
 2   movie_name  2199 non-null   object
 3   year        2134 non-null   object
 4   genre       2199 non-null   object
 5   overview    2199 non-null   object
 6   director    2199 non-null   object
 7   cast        2199 non-null   object
dtypes: int64(1), object(7)
memory usage: 137.6+ KB


Unnamed: 0     0
movie_id       0
movie_name     0
year          65
genre          0
overview       0
director       0
cast           0
dtype: int64

## Data Cleaning & Preprocessing

- Removed unnecessary index column
- Handled missing values by replacing them with empty strings
- Combined relevant text features into a single column

An extra column (`Unnamed: 0`) was found in the dataset due to CSV indexing and was removed as it does not contribute to the recommendation process.
The `year` column, the only column with missing values (65 of 2,199 rows), is left out of the similarity computation because release year says little about a movie's content.


In [15]:
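# Drop the leftover CSV index column.
movies = movies.drop(columns=["Unnamed: 0"])
# Replace missing values with empty strings so text columns can be concatenated.
movies.fillna("", inplace=True)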
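The extra column can also be avoided at load time. The following is a minimal sketch, assuming the CSV was saved with its index as an unnamed first column; the two-row `csv_text` is a hypothetical stand-in for `movie_dataset.csv`, not the real file.

```python
import io

import pandas as pd

# Hypothetical two-row CSV that mimics a file saved together with its index.
csv_text = (
    ",movie_id,movie_name\n"
    "0,tt15354916,Jawan\n"
    "1,tt15748830,Jaane Jaan\n"
)

# Default read: the unnamed first column shows up as "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the row index instead of a data column.
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))     # ['Unnamed: 0', 'movie_id', 'movie_name']
print(list(without_extra.columns))  # ['movie_id', 'movie_name']
```

Dropping the column after loading, as done above, gives the same result; `index_col=0` only saves a step.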

In [16]:
# Join the text columns into one string per movie for vectorization.
movies["combined_features"] = (
    movies["genre"] + " " + movies["overview"] + " "
    + movies["director"] + " " + movies["cast"]
)
movies[["movie_name", "combined_features"]].head()

,movie_name,combined_features
0,Jawan,"Action, Thriller A high-octane action thriller..."
1,Jaane Jaan,"Crime, Drama, Mystery A single mother and her ..."
2,Jailer,"Action, Comedy, Crime A retired jailer goes on..."
3,Rocky Aur Rani Kii Prem Kahaani,"Comedy, Drama, Family Flamboyant Punjabi Rocky..."
4,OMG 2,"Comedy, Drama An unhappy civilian asks the cou..."
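Filling missing values before concatenation matters because a missing entry in any one column turns the whole combined string into a missing value. The sketch below illustrates this on a hypothetical two-row frame (`toy` is made up for the example; in the real dataset only `year` has missing values).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing director.
toy = pd.DataFrame({
    "genre": ["Action", "Drama"],
    "director": ["Atlee", np.nan],
})

# Without filling, the second combined value is missing entirely.
raw = toy["genre"] + " " + toy["director"]

# With empty-string filling, the remaining text is preserved.
filled = toy.fillna("")
safe = (filled["genre"] + " " + filled["director"]).str.strip()

print(raw.isna().tolist())  # [False, True]
print(safe.tolist())        # ['Action Atlee', 'Drama']
```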


## Model / System Design

The recommendation system is based on:
- TF-IDF vectorization for text representation
- Cosine similarity to measure similarity between movies


In [17]:
# Turn each movie's combined text into a TF-IDF vector, ignoring English stop words.
tfidf = TfidfVectorizer(stop_words="english")
feature_matrix = tfidf.fit_transform(movies["combined_features"])
feature_matrix.shape

(2199, 11196)

In [18]:
# Pairwise cosine similarity between every pair of movies.
cosine_sim = cosine_similarity(feature_matrix)
cosine_sim.shape

(2199, 2199)
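To make the two steps concrete, here is a small sketch on a hypothetical three-document corpus (the `docs` strings are invented, not taken from the dataset). It shows that documents sharing terms get a positive cosine similarity, while documents with no terms in common score zero.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-corpus: two action thrillers and one romantic comedy.
docs = [
    "action thriller soldier revenge",
    "action thriller police chase",
    "romantic comedy wedding family",
]

# One row per document, one column per distinct term.
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Square matrix: entry [i, j] is the cosine similarity of documents i and j.
toy_sim = cosine_similarity(toy_matrix)

print(toy_sim.shape)          # (3, 3)
print(toy_sim[0, 1] > 0)      # True: the two thrillers share terms
print(toy_sim[0, 2] == 0)     # True: no shared terms with the comedy
```

Each document has a similarity of 1 with itself, which is why a recommender built on this matrix has to exclude the selected movie from its own results.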

## User Input Based Recommendation

The system takes a movie name as input from the user and recommends similar Bollywood movies based on content similarity.


In [19]:
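def recommended_movies_user_input(movie_name, top_n=5):
    """Print the top_n movies most similar to movie_name."""
    query = movie_name.strip().lower()
    if not query:
        # Nothing typed yet; interact also runs the function once on display.
        return
    movie_titles = movies["movie_name"].str.strip().str.lower()
    matches = movie_titles[movie_titles == query]
    if matches.empty:
        print("Movie not found in dataset. Please try again.")
        return
    # Use the first match if several rows share the same title.
    idx = matches.index[0]
    # Rank all movies by similarity to the selected one, highest first.
    similarity_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    # Exclude the selected movie itself, then keep the top_n best matches.
    similarity_scores = [(i, s) for i, s in similarity_scores if i != idx][:max(top_n, 0)]
    print("\nRecommended movies:\n")
    for i, score in similarity_scores:
        print("-", movies.iloc[i]["movie_name"])


movies_input_widget = widgets.Text(
    description='Movie Name:',
    placeholder='Type movie name here...'
)

interact(recommended_movies_user_input, movie_name=movies_input_widget)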

interactive(children=(Text(value='', description='Movie Name:', placeholder='Type movie name here...'), IntSli…

<function __main__.recommended_movies_user_input(movie_name, top_n=5)>
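For use outside a notebook widget, the same lookup can be written as a function that returns its results instead of printing them. The following is a sketch under stated assumptions, not the notebook's implementation: `recommend`, `toy_movies`, and the "Film A/B/C" titles are hypothetical names introduced for illustration, and the similarity matrix is rebuilt on toy data so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend(title, titles, sim, top_n=5):
    """Return the top_n titles most similar to `title` with their scores."""
    lowered = titles.str.strip().str.lower()
    matches = np.flatnonzero((lowered == title.strip().lower()).to_numpy())
    if matches.size == 0:
        # Unknown title: return an empty frame rather than printing.
        return pd.DataFrame(columns=["movie_name", "score"])
    idx = matches[0]
    # Sort positions by descending similarity, then drop the query itself.
    order = np.argsort(-sim[idx], kind="stable")
    order = order[order != idx][:max(top_n, 0)]
    return pd.DataFrame({
        "movie_name": titles.to_numpy()[order],
        "score": sim[idx][order],
    })


# Hypothetical stand-in for the real dataset.
toy_movies = pd.DataFrame({
    "movie_name": ["Film A", "Film B", "Film C"],
    "combined_features": [
        "action thriller soldier revenge",
        "action thriller police chase",
        "romantic comedy wedding family",
    ],
})
toy_features = TfidfVectorizer(stop_words="english").fit_transform(
    toy_movies["combined_features"]
)
toy_cosine = cosine_similarity(toy_features)

result = recommend("film a", toy_movies["movie_name"], toy_cosine, top_n=1)
print(result["movie_name"].tolist())  # ['Film B']
```

Returning a DataFrame keeps the similarity scores available for inspection and makes the function easier to test than one that only prints.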

## Evaluation & Analysis

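The recommendation system suggests movies whose combined text (genre, overview, director, cast) is most similar to that of the selected movie.  
Cosine similarity is the score used to rank candidates; it measures textual similarity between movies rather than recommendation quality, and no ground-truth relevance labels are used in this notebook.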
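One simple way to sanity-check recommendations without user ratings is to measure how much the genres of the recommended movies overlap with those of the selected movie. The sketch below uses Jaccard overlap on comma-separated genre strings; the helper names (`genre_set`, `genre_jaccard`, `mean_genre_overlap`) are hypothetical and this check is not part of the notebook. Because genre is already part of the combined text, it is only a rough proxy, not an independent evaluation.

```python
def genre_set(genre_text):
    """Split a comma-separated genre string into a set of lowercase labels."""
    return {g.strip().lower() for g in genre_text.split(",") if g.strip()}


def genre_jaccard(genre_a, genre_b):
    """Jaccard overlap: shared genres divided by all genres in either movie."""
    set_a, set_b = genre_set(genre_a), genre_set(genre_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def mean_genre_overlap(query_genre, recommended_genres):
    """Average Jaccard overlap between the query and each recommendation."""
    if not recommended_genres:
        return 0.0
    scores = [genre_jaccard(query_genre, g) for g in recommended_genres]
    return sum(scores) / len(scores)


# Genre strings in the same format as the dataset's genre column.
overlap = mean_genre_overlap(
    "Action, Thriller",
    ["Action, Comedy, Crime", "Comedy, Drama"],
)
print(overlap)  # 0.125
```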

### Strengths
- No user ratings required
- Simple and explainable
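- Efficient for a dataset of this size (2,199 movies); the full pairwise similarity matrix grows quadratically with the number of movies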

### Limitations
- No personalization
- Dependent on quality of metadata


## Ethical Considerations & Responsible AI

- No personal or sensitive user data is used
- Dataset bias may affect recommendations
- Recommendations are assistive, not prescriptive


## Conclusion & Future Scope

A content-based Bollywood movie recommendation system was implemented using TF-IDF vectorization and cosine similarity over movie metadata.

### Future Scope
- Hybrid recommendation system
- User rating integration
- Web application deployment
