###Bollywood Movie Recommendation System

##Selected Project Track: Content Recommendation System

#Problem Statement

With thousands of movies available on streaming platforms, users often find it difficult to discover movies aligned with their interests. This project aims to build a content-based movie recommendation system that suggests movies similar to a selected movie based on its content features.

#Objective

The objective of this project is to build a content-based recommendation system that suggests Bollywood movies similar to a selected movie using metadata such as genre, overview, director, and cast.

#Real-World Relevance

Content-based recommendation systems are widely used by platforms like Netflix and Amazon Prime to enhance user experience by providing personalized movie suggestions.

#Import Required Libraries

The following libraries are used for data handling, text processing, and similarity computation.

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

##Data Loading

The Bollywood movie dataset is loaded from a CSV file.

In [None]:
movies=pd.read_csv('movie_dataset.csv')
movies.head()

##Dataset Overview

Basic information about the dataset including shape, column names, and data types.

In [None]:
# Only the last expression of a cell is displayed, so print the shape explicitly
print(movies.shape)
movies.info()
movies.isnull().sum()

##Data Cleaning & Preprocessing

-Removed the unnecessary index column (Unnamed: 0), an artifact of CSV indexing that does not contribute to the recommendation process

-Handled missing values by replacing them with blanks

-Combined relevant text features into a single column

-Left the year column out of the similarity computation, as it does not describe a movie's content

In [None]:
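movies = movies.drop(columns=["Unnamed: 0"], errors="ignore")
# Fill missing values only in the text columns used for similarity,
# so that other columns (such as year) keep their original types
text_cols = ["genre", "overview", "director", "cast"]
movies[text_cols] = movies[text_cols].fillna("")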

In [None]:
movies["combined_features"] = (
    movies["genre"] + " " + movies["overview"] + " "
    + movies["director"] + " " + movies["cast"]
)
movies[["movie_name", "combined_features"]].head()

##Model / System Design
The recommendation system is based on:

-TF-IDF vectorization for text representation

-Cosine similarity to measure similarity between movies

In [None]:
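# Convert the combined text into TF-IDF vectors, ignoring common English stop words
tfidf = TfidfVectorizer(stop_words="english")
feature_matrix = tfidf.fit_transform(movies["combined_features"])
feature_matrix.shape  # (number of movies, vocabulary size)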

In [None]:
# Pairwise cosine similarity between every pair of movies
cosine_sim = cosine_similarity(feature_matrix)
cosine_sim.shape  # (number of movies, number of movies)
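
To make the two steps above more concrete, here is a minimal sketch on a toy corpus. The strings in `toy_features` are made up for illustration and are not taken from the dataset; the names `toy_tfidf`, `toy_matrix`, and `toy_sim` are likewise hypothetical. Movies that share words should get a similarity above zero, and movies with no words in common should get zero.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical combined-feature strings, not taken from the dataset
toy_features = [
    "action thriller spy agent mission",     # movie 0
    "action thriller police chase mission",  # movie 1
    "romance family wedding drama",          # movie 2
]

toy_tfidf = TfidfVectorizer(stop_words="english")
toy_matrix = toy_tfidf.fit_transform(toy_features)  # one row per movie
toy_sim = cosine_similarity(toy_matrix)             # 3 x 3 similarity matrix

# Row 0 holds the similarity of movie 0 to every movie, itself included
print(toy_sim.round(2))
```

Movie 0 and movie 1 share three words, so their score is positive, while movie 0 and movie 2 share none, so their score is zero. Each movie has a similarity of 1 with itself, which is why the recommendation function has to exclude the selected movie from its own results.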

##User Input Based Recommendation
The system takes a movie name as input from the user and recommends similar Bollywood movies based on content similarity.

#Instructions :
The dataset contains movies only up to the year 2023; therefore, newly released movies are not included in the CSV file.
To ensure correct results, all code cells must be executed sequentially from top to bottom before providing any input.

In [None]:
def recommended_movies_user_input(movie_name, top_n=5):
    """Print the top_n movies most similar to movie_name."""
    # Match titles case-insensitively, ignoring surrounding whitespace
    movie_titles = movies["movie_name"].str.lower().str.strip()
    query = movie_name.lower().strip()
    matches = np.flatnonzero((movie_titles == query).to_numpy())
    if len(matches) == 0:
        print("Movie not found in dataset. Please try again.")
        return
    idx = matches[0]  # row position of the first matching title
    similarity_scores = list(enumerate(cosine_sim[idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    # Exclude the selected movie itself, then keep the top_n most similar
    similarity_scores = [(i, s) for i, s in similarity_scores if i != idx][:top_n]
    print("\nRecommended movies:\n")
    for i, score in similarity_scores:
        print("-", movies.iloc[i]["movie_name"])


movie_name = input("Enter movie name: ")
recommended_movies_user_input(movie_name)
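
Because the lookup requires an exact title match, a small typo leads to "Movie not found". One possible extension, sketched here with the standard-library `difflib` module, is to suggest close titles. The helper `suggest_titles`, its parameters, and the `sample_titles` list are hypothetical and not part of the original notebook; in practice the titles could come from `movies["movie_name"]`.

```python
import difflib


def suggest_titles(query, titles, max_suggestions=3, cutoff=0.6):
    """Return titles that closely match query, ignoring case."""
    # Map lower-cased titles back to their original spelling
    lookup = {t.lower(): t for t in titles}
    close = difflib.get_close_matches(
        query.lower().strip(), list(lookup), n=max_suggestions, cutoff=cutoff
    )
    return [lookup[c] for c in close]


# Hypothetical titles for illustration only
sample_titles = ["Dangal", "Lagaan", "Sholay", "Swades"]
print(suggest_titles("dangl", sample_titles))
```

The `cutoff` value of 0.6 is the `difflib` default; raising it makes suggestions stricter, lowering it makes them more permissive.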

##Evaluation & Analysis
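The recommendation system provides relevant movie suggestions based on textual similarity.

Cosine similarity between TF-IDF vectors is the measure used to rank movies. It quantifies how similar two movies are, not how good the recommendations are; no separate evaluation metric is computed in this notebook.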
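
As a rough sanity check on the recommendations, one option is to measure how often a recommended movie shares at least one genre with the selected movie. The sketch below is an assumption-laden illustration: `genre_overlap_rate` is a hypothetical helper, the genre strings are made up, and genres are assumed to be separated by commas or spaces. Since genre is also one of the input features, this is only a weak proxy and not an independent measure of quality.

```python
def genre_overlap_rate(query_genres, recommended_genres):
    """Fraction of recommendations sharing at least one genre with the query."""
    if not recommended_genres:
        return 0.0
    # Assumes genres are separated by commas and/or spaces
    query_set = set(query_genres.lower().replace(",", " ").split())
    hits = 0
    for genres in recommended_genres:
        rec_set = set(genres.lower().replace(",", " ").split())
        if query_set & rec_set:
            hits += 1
    return hits / len(recommended_genres)


# Hypothetical genre strings for a selected movie and its five recommendations
rate = genre_overlap_rate(
    "Action, Thriller",
    ["Action, Drama", "Thriller", "Romance", "Comedy, Action", "Drama"],
)
print(f"Genre overlap rate: {rate:.2f}")
```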


##Strengths
No user ratings required

Simple and explainable

Efficient for a dataset of this size (the full pairwise similarity matrix grows quadratically with the number of movies)
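
On scalability: `cosine_similarity(feature_matrix)` builds an n × n matrix for n movies, so memory use grows quadratically with the size of the catalog. One possible way to limit this is to compute similarities only for the selected movie when they are needed. The sketch below assumes toy strings in place of `movies["combined_features"]`; `top_similar`, `toy_features`, and `toy_matrix` are hypothetical names.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical combined-feature strings standing in for the real dataset
toy_features = [
    "action thriller spy mission",
    "action thriller police chase",
    "romance family wedding",
    "spy mission undercover agent",
]
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_features)


def top_similar(matrix, idx, top_n=2):
    """Return row positions of the top_n rows most similar to row idx."""
    # Compare one row against all rows: result has shape (1, n_movies)
    scores = cosine_similarity(matrix[idx], matrix).ravel()
    scores[idx] = -1.0  # exclude the query row itself
    return np.argsort(scores)[::-1][:top_n].tolist()


print(top_similar(toy_matrix, 0))
```

This computes a single row of similarities per query instead of storing all pairs, at the cost of repeating the computation for each request.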

##Limitations
No personalization

Dependent on quality of metadata

##Ethical Considerations & Responsible AI

No personal or sensitive user data is used

Dataset bias may affect recommendations

Recommendations are assistive, not prescriptive

##Conclusion
A content-based Bollywood movie recommendation system was implemented using TF-IDF vectorization and cosine similarity over movie metadata.

##Future Scope
Hybrid recommendation system

User rating integration

Web application deployment