#Movie Recommendation System Using Collaborative Filtering

##Introduction
A recommendation system is a technology that provides personalized suggestions or recommendations to users based on their preferences, interests, and past behaviors. It is widely used in various industries, including e-commerce, streaming platforms, social media and more to enhance user experience, increase engagement and drive sales.
Recommendation systems utilize various techniques and algorithms to analyze and understand user data. Here are a few commonly used approaches:
1. Collaborative Filtering
2. Content-Based Filtering
3. Hybrid

**This system uses the collaborative filtering technique.**

**Collaborative Filtering** is a popular technique used in recommendation systems that aims to provide personalized recommendations to users based on their preferences and behaviors. It leverages the collective information from a group of users to make predictions and suggest items that the users might be interested in.


The core idea behind collaborative filtering is that if two users have similar tastes and preferences, the items they like or dislike are likely to be similar as well. By analyzing the historical data of user-item interactions, such as ratings, reviews, or purchase history, collaborative filtering algorithms can identify patterns and similarities between users to make recommendations.
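
As a minimal sketch of this idea, the snippet below compares users by the cosine similarity of their rating vectors. The rating matrix and the `cosine` helper are hypothetical and are not part of the project data; 0 stands for "not rated".

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 means "not rated".
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],  # user A
    [4.0, 5.0, 0.0, 2.0],  # user B (tastes close to A)
    [1.0, 0.0, 5.0, 4.0],  # user C (tastes differ from A)
])

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_ab = cosine(ratings[0], ratings[1])
sim_ac = cosine(ratings[0], ratings[2])
print(f"A vs B: {sim_ab:.3f}, A vs C: {sim_ac:.3f}")
```

User A scores much closer to user B than to user C, so movies that B liked would be the more natural candidates to suggest to A.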

##Problem Statement

The movie industry has witnessed tremendous growth in content, with a vast array of movies available across different genres and languages. However, there has been little work on Bangla movie recommendation systems. To enhance the experience of local users and improve movie discovery, there is a need for an effective recommendation system that provides personalized movie recommendations based on the ratings other users have given to specific movies.


The key challenges that need to be addressed in developing a collaborative filtering recommendation system for **Bangla Movie** recommendations are:


**Data Sparsity:** The user-movie interaction data available for training the recommendation system is often sparse, as users typically only rate or interact with a small fraction of the vast movie catalog. This sparsity poses a challenge in accurately capturing user preferences and finding similar users or movies.


**Cold-Start Problem:** The system needs to handle the cold-start problem, wherein new users or movies have limited or no historical data. The recommendation system must be able to make relevant movie recommendations even for these users or movies with limited information.


**Scalability and Performance:** As the movie recommendation system needs to handle a large number of users and movies, it should be scalable and capable of processing real-time recommendation requests. It should efficiently process large volumes of data to provide timely and accurate recommendations.



**Overcoming Genre Bias:** The existing recommendation system may have a bias towards popular genres, resulting in a limited diversity of movie recommendations. The collaborative filtering system should mitigate this bias and provide a balance between popular genres and niche or personalized movie recommendations.




The objective is to develop a collaborative filtering recommendation system that leverages user-movie interaction data to provide accurate and personalized movie recommendations, overcoming challenges such as data sparsity, cold-start problem, scalability, and genre bias. The system aims to enhance user satisfaction, improve movie discovery, and increase user engagement on the movie platform.

###Data Collection and Preprocessing

####Importing Library
Here we import the libraries required for our system.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

###Dataset
We collected our dataset from [Kaggle](https://www.kaggle.com/datasets/codernob/bangla-movie-userrating-dataset).

This dataset contains 4987 user reviews of Bangla movies from the IMDb website. The original dataset has 8 columns, and we add a ninth (Image_ID):

1. User_No.: the order in which user reviews were collected.
2. User_name: username of the user's IMDb account.
3. Review title: title of the textual review.
4. Review Rating: numerical rating of the movie given by a specific user.
5. Review date: date on which the review was published.
6. Review_body: the textual review given by a user to a certain movie.
7. Movie_name: name of the movie that was reviewed.
8. Movie_ID: IMDb ID of the movie. This can be used to create a link to the movie's IMDb page.
9. Image_ID: link to an image for the movie. This column is not part of the original dataset.


We obtained the dataset as a CSV (comma-separated values) file and load it into our notebook. The original dataset had no image links for the movies; we added them so that the system's output can be presented more attractively.

In [None]:
df = pd.read_csv('DataSet - DataSet(1).csv')

In [None]:
df.head()

Unnamed: 0,User_No.,User_name,Review title,Review Rating,Review date,Review_body,Movie_name,Movie_ID,Image_ID
0,0,adnannizhum,One Time Watch,7,1-Nov-21,It is an average film. One time watch. But Faz...,Khachar Bhitor Ochin Pakhi,tt15756034,https://m.media-amazon.com/images/M/MV5BN2NlZT...
1,1,SoumikBanerjee25,These kinds of subjects demand a rather seriou...,3,20-Nov-21,Another sub-par attempt from the Bengali Indus...,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
2,2,MandalBros-5,A GOOD Bengali Film. 👌,8,30-Oct-21,"The story of this film is very simple, a polic...",F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
3,3,shovonbhattachrjee,Good Movie,6,26-Oct-21,The storyline of this movie is good and the ac...,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
4,4,anandolodh-96284,Good watch,7,23-Oct-21,Actually this plot is really strong but direct...,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...


In [None]:
df.shape

(4432, 9)

In [None]:
df.columns

Index(['User_No.', 'Review Rating', 'Review date', 'User_name', 'Movie_name',
       'Movie_ID', 'Image_ID'],
      dtype='object')

Here we create a new DataFrame containing only the columns required for the system.

In [None]:
df1=df[['User_No.', 'Review Rating', 'User_name', 'Movie_name',
       'Movie_ID', 'Image_ID']]

In [None]:
df1

Unnamed: 0,User_No.,Review Rating,User_name,Movie_name,Movie_ID,Image_ID
0,0,7,adnannizhum,Khachar Bhitor Ochin Pakhi,tt15756034,https://m.media-amazon.com/images/M/MV5BN2NlZT...
1,1,3,SoumikBanerjee25,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
2,2,8,MandalBros-5,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
3,3,6,shovonbhattachrjee,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
4,4,7,anandolodh-96284,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
...,...,...,...,...,...,...
4427,4982,1,yash-mahendra,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...
4428,4983,9,Pierre_Christen,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...
4429,4984,1,miguelopp,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...
4430,4985,5,dgerroll,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...


In [None]:
df['Movie_name'].value_counts()

Gumnaami               70
A Death in the Gunj    67
Asur                   50
Tiki Taka              50
Extraction             47
                       ..
Prem Kaa Game           1
Idiot Box               1
Jodi Ekdin              1
Dui Prithibi            1
Kankal                  1
Name: Movie_name, Length: 761, dtype: int64

In [None]:
count_series = df['Movie_name'].value_counts()
# Look up the number of reviews for one movie; default to 0 if it is absent.
count_of_tiki_taka = count_series.get('Tiki Taka', 0)
print(count_series)
print(count_of_tiki_taka)

Gumnaami               70
A Death in the Gunj    67
Asur                   50
Tiki Taka              50
Extraction             47
                       ..
Prem Kaa Game           1
Idiot Box               1
Jodi Ekdin              1
Dui Prithibi            1
Kankal                  1
Name: Movie_name, Length: 761, dtype: int64
50


In [None]:
df1['User_name'].value_counts()

smkbsws                         64
mysonamartya                    36
msunando                        28
sumankumarganguly-454-264875    26
MandalBros-5                    23
                                ..
koushikgd                        1
soli-873-85671                   1
musicguha                        1
joannewritesfor                  1
a_la_bakwaas                     1
Name: User_name, Length: 3115, dtype: int64

We count the unique values in the User_name column.

In [None]:
df1['User_name'].unique().shape

(3115,)

We keep only the users who have given more than one rating, for better results.

In [None]:
x = df1['User_name'].value_counts() > 1


In [None]:
x[x].shape

(520,)

In [None]:
y = x[x].index

In [None]:
y

Index(['smkbsws', 'mysonamartya', 'msunando', 'sumankumarganguly-454-264875',
       'MandalBros-5', 'SAMTHEBESTEST', 'sarkarsarbartha',
       'shovonbhattachrjee', 'rupak_speaking', 'iamaditisengupta',
       ...
       'azharmubin-194-580711', 'bkwrmgrl1', 'drsaicat', 'pradiptadas1982',
       'amirahul', 'tanvirahmedfahim', 'amrkroy', 'abhisheksaha-619', 'sol-',
       'proud_luddite'],
      dtype='object', length=520)
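
The count, threshold, and index steps above can also be expressed in a single pass with `groupby(...).transform("count")`. The sketch below uses a hypothetical miniature frame with the same column names as the project data; it is an alternative formulation, not the notebook's own code.

```python
import pandas as pd

# Hypothetical miniature of the ratings frame, using the same column names.
toy = pd.DataFrame({
    "User_name": ["a", "a", "b", "c", "c", "c"],
    "Movie_name": ["M1", "M2", "M1", "M1", "M2", "M3"],
    "Review Rating": [7, 8, 3, 9, 6, 5],
})

# Number of ratings per user, broadcast back to every row of that user.
ratings_per_user = toy.groupby("User_name")["Review Rating"].transform("count")

# Keep only rows from users with more than one rating.
active = toy[ratings_per_user > 1]
print(active["User_name"].unique())
```

User `b` has a single rating and is dropped, while all rows of users `a` and `c` are kept.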

Here we filter the full data down to the rows that belong to those users.

In [None]:
dfSort = df1[df1['User_name'].isin(y)]

In [None]:
dfSort

Unnamed: 0,User_No.,Review Rating,User_name,Movie_name,Movie_ID,Image_ID
1,1,3,SoumikBanerjee25,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
2,2,8,MandalBros-5,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
3,3,6,shovonbhattachrjee,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...
7,7,8,MandalBros-5,Hobu Chandra Raja Gobu Chandra Montri,tt15380598,https://m.media-amazon.com/images/M/MV5BN2Y0Y2...
9,9,3,senanindya,Munshigiri,tt15245506,https://m.media-amazon.com/images/M/MV5BMzg0Mz...
...,...,...,...,...,...,...
4398,4953,7,evanston_dad,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...
4400,4955,7,lasttimeisaw,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...
4401,4956,1,howard.schumann,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...
4403,4958,9,Amyth47,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...


In [None]:
dfSort['User_name'].value_counts()

smkbsws                         64
mysonamartya                    36
msunando                        28
sumankumarganguly-454-264875    26
MandalBros-5                    23
                                ..
souravray-kol                    2
ketgup83                         2
jahidhasan2009                   2
thirdvantagepoint                2
evanston_dad                     2
Name: User_name, Length: 520, dtype: int64

In [None]:
num_rating = dfSort.groupby('Movie_name')['Review Rating'].count().reset_index()

In [None]:
num_rating

Unnamed: 0,Movie_name,Review Rating
0,22 Shey Shraban,6
1,27 B Beadon Street,4
2,36 Chowringhee Lane,3
3,80 te Asio Na,1
4,A Death in the Gunj,15
...,...,...
568,Yuddha,1
569,Yugant,1
570,Zero Degree,1
571,Zombiesthaan,3


In [None]:
num_rating.rename(columns={"Review Rating": "num_of_rating"}, inplace=True)

In [None]:
num_rating

Unnamed: 0,Movie_name,num_of_rating
0,22 Shey Shraban,6
1,27 B Beadon Street,4
2,36 Chowringhee Lane,3
3,80 te Asio Na,1
4,A Death in the Gunj,15
...,...,...
568,Yuddha,1
569,Yugant,1
570,Zero Degree,1
571,Zombiesthaan,3


In [None]:
dfSort_F = dfSort.merge(num_rating, on='Movie_name')

In [None]:
dfSort_F

Unnamed: 0,User_No.,Review Rating,User_name,Movie_name,Movie_ID,Image_ID,num_of_rating
0,1,3,SoumikBanerjee25,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...,3
1,2,8,MandalBros-5,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...,3
2,3,6,shovonbhattachrjee,F.I.R NO. 339/07/06,tt15698592,https://m.media-amazon.com/images/M/MV5BZTA3MT...,3
3,7,8,MandalBros-5,Hobu Chandra Raja Gobu Chandra Montri,tt15380598,https://m.media-amazon.com/images/M/MV5BN2Y0Y2...,1
4,9,3,senanindya,Munshigiri,tt15245506,https://m.media-amazon.com/images/M/MV5BMzg0Mz...,2
...,...,...,...,...,...,...,...
1832,4953,7,evanston_dad,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...,9
1833,4955,7,lasttimeisaw,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...,9
1834,4956,1,howard.schumann,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...,9
1835,4958,9,Amyth47,The River,tt0043972,https://m.media-amazon.com/images/M/MV5BMzZiYz...,9


We keep only the movies that have more than two ratings, for better results in the model. Movies with two or fewer ratings carry too little signal to be significant in the model.

In [None]:
dfSort_Final = dfSort_F[dfSort_F['num_of_rating']>2]

In [None]:
dfSort_Final.sample(10)

Unnamed: 0,User_No.,Review Rating,User_name,Movie_name,Movie_ID,Image_ID,num_of_rating
361,1237,5,rupak_speaking,Bornoporichoy: A Grammar of Death,tt10651188,https://m.media-amazon.com/images/M/MV5BNzRkZT...,6
566,1692,9,pritambag,Debi,tt6520954,https://m.media-amazon.com/images/M/MV5BZDMwYm...,13
1755,4835,9,subhodeep,The Philosopher's Stone,tt0052046,https://m.media-amazon.com/images/M/MV5BZTMzMW...,5
586,1730,7,mkm-31183,Swapnajaal,tt5291604,https://m.media-amazon.com/images/M/MV5BODU2MT...,8
135,582,6,shovonbhattachrjee,Rawkto Rawhoshyo,tt12339374,https://m.media-amazon.com/images/M/MV5BOWJkYT...,5
564,1688,6,mmh-wakim32,Poramon 2,tt6750884,https://m.media-amazon.com/images/M/MV5BZDBjNj...,3
1533,4433,1,maharani_md,Distant Thunder,tt0069737,https://m.media-amazon.com/images/M/MV5BODZjNz...,6
342,1165,4,Ring_of_Sun,Gumnaami,tt10834986,https://m.media-amazon.com/images/M/MV5BZTE2OG...,22
898,2564,8,amarufa-92390,Chotushkone,tt4115752,https://m.media-amazon.com/images/M/MV5BYjU4Y2...,15
1724,4783,4,MartinHafer,The World of Apu,tt0052572,https://m.media-amazon.com/images/M/MV5BN2ZhM2...,26


In [None]:
dfSort_Final.shape

(1376, 7)

In [None]:
# Reassign rather than use inplace=True: dfSort_Final is a slice of dfSort_F,
# and modifying a slice in place raises SettingWithCopyWarning.
dfSort_Final = dfSort_Final.drop_duplicates(['User_name','Movie_name','Movie_ID'])


In [None]:
dfSort_Final.shape

(1355, 7)

##Methodology

Here we use a clustering model to build our system.
**Clustering models** allow you to categorize records into a certain number of clusters. This can help you identify natural groups in your data.

Clustering models focus on identifying groups of similar records and labeling the records according to the group to which they belong. This is done without the benefit of prior knowledge about the groups and their characteristics.

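We chose this approach to group our users according to their ratings for a movie. The implementation in the Model Training section uses scikit-learn's NearestNeighbors, which retrieves the most similar records rather than assigning explicit cluster labels.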

##Model Development

###Creating Pivot Table

A pivot table in pandas is an excellent tool for summarizing one or more numeric variables based on two other categorical variables. Here we use User_name as the **columns** and Movie_name as the **rows** (index).

In [None]:
data_pivot = dfSort_Final.pivot_table(columns='User_name', index='Movie_name', values='Review Rating')

In [None]:
data_pivot

User_name,3THEREAL,Aiyana_Rafayet_Zarah,Amyth47,Andy-296,AnonymousbutDilpreet002,Ashiqur_Rahman,Avinava89,Boba_Fett1138,Camoo,Cosmoeticadotcom,...,vinoodada,vishalshinde-03550,yashcyclist,yogesh-sharma888,zetes,ziggaziggaah,zihadmaniruzzaman,zkzuber,zoha-83094,zozaifmohammadalvi
Movie_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
22 Shey Shraban,,,,,,,,,,,...,,,,,,,,,,
27 B Beadon Street,,,,,,,,,,,...,,1.0,,,,,,,,
36 Chowringhee Lane,,,,,,,,,,,...,,,,,,,,,,
A Death in the Gunj,,,4.0,,,,,,,,...,1.0,,,,,,,,,
A River Called Titas,,,,7.0,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Tritio Adhyay,,,,,,,,,,,...,,,,,,,,,,
Unishe April,,,,,,,,,,,...,,,,,,,,,,
Vinci Da,,,,,,,,,,,...,,,,,,,,,,
Zombiesthaan,,,,,,,,,,,...,,,,,,,,,,


In [None]:
data_pivot.shape

(219, 487)

In [None]:
data_pivot.fillna(0, inplace=True)

In [None]:
data_pivot

User_name,3THEREAL,Aiyana_Rafayet_Zarah,Amyth47,Andy-296,AnonymousbutDilpreet002,Ashiqur_Rahman,Avinava89,Boba_Fett1138,Camoo,Cosmoeticadotcom,...,vinoodada,vishalshinde-03550,yashcyclist,yogesh-sharma888,zetes,ziggaziggaah,zihadmaniruzzaman,zkzuber,zoha-83094,zozaifmohammadalvi
Movie_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
22 Shey Shraban,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
27 B Beadon Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
36 Chowringhee Lane,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A Death in the Gunj,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A River Called Titas,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Tritio Adhyay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Unishe April,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Vinci Da,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zombiesthaan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
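
To make the pivot and fill steps above concrete, here is a small sketch on a hypothetical four-row frame that reuses the project's column names. Note that `pivot_table` averages duplicate (movie, user) pairs by default, which is one reason to remove duplicates beforehand.

```python
import pandas as pd

# Hypothetical ratings in long format (one row per review).
toy = pd.DataFrame({
    "User_name": ["a", "a", "b", "c"],
    "Movie_name": ["M1", "M2", "M1", "M2"],
    "Review Rating": [7, 8, 3, 9],
})

# Movies become rows, users become columns, ratings become cell values.
toy_pivot = toy.pivot_table(index="Movie_name", columns="User_name", values="Review Rating")

# Missing (movie, user) pairs are NaN; replace them with 0 for the model.
filled = toy_pivot.fillna(0)
print(filled)
```

Each movie is now a fixed-length vector of user ratings, which is the shape the nearest-neighbor model expects.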


We use a CSR (Compressed Sparse Row) sparse matrix so that the zero values in the pivot table are not stored explicitly.

In [None]:
from scipy.sparse import csr_matrix


In [None]:
data_sparse = csr_matrix(data_pivot)

In [None]:
data_sparse


<219x487 sparse matrix of type '<class 'numpy.float64'>'
	with 1355 stored elements in Compressed Sparse Row format>
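
With the figures reported above (1355 stored elements in a 219 x 487 matrix), only about 1.3% of the cells hold a rating. The sketch below shows how such a density can be computed; the small `dense` array is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical movie-by-user matrix with mostly empty cells.
dense = np.array([
    [0.0, 7.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
    [8.0, 0.0, 0.0, 0.0],
])
sparse = csr_matrix(dense)

# nnz is the number of explicitly stored (non-zero) entries.
density = sparse.nnz / (sparse.shape[0] * sparse.shape[1])
print(sparse.nnz, f"{density:.2%}")

# The same calculation with the shape and count shown in the notebook output.
reported_density = 1355 / (219 * 487)
print(f"{reported_density:.2%}")
```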

## Model Training

We use the scikit-learn library for the model. We use its NearestNeighbors model to find movies that received similar ratings from the same users.
Here we use the brute-force algorithm of NearestNeighbors because our data is sparse.

In [None]:
from sklearn.neighbors import NearestNeighbors
model = NearestNeighbors(algorithm='brute')


In [None]:
model.fit(data_sparse)
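
When no metric is given, `NearestNeighbors` uses Euclidean distance (Minkowski with p=2). As a hedged aside, the sketch below contrasts that default with `metric="cosine"` on a hypothetical matrix; cosine distance is an alternative that this project does not use, shown only to illustrate how the choice of metric can change the nearest neighbor.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical data: rows are movies, columns are users.
toy = csr_matrix(np.array([
    [5.0, 4.0, 0.0, 0.0],   # movie 0
    [10.0, 9.0, 0.0, 0.0],  # movie 1: same raters, similar pattern, larger scale
    [0.0, 0.0, 3.0, 4.0],   # movie 2: different raters
    [5.0, 0.0, 0.0, 0.0],   # movie 3: one shared rater
]))

euclid = NearestNeighbors(algorithm="brute").fit(toy)
cosine = NearestNeighbors(algorithm="brute", metric="cosine").fit(toy)

# Ask for 2 neighbors: the first is the query row itself, the second is the closest other row.
_, euclid_idx = euclid.kneighbors(toy[0], n_neighbors=2)
_, cosine_idx = cosine.kneighbors(toy[0], n_neighbors=2)
print("euclidean:", euclid_idx[0], "cosine:", cosine_idx[0])
```

Euclidean distance favors movie 3 because its raw values are close, while cosine distance favors movie 1 because the rating pattern points in nearly the same direction.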

##Results and Discussion

In [None]:
data_pivot.shape[0]

219

In [None]:
data_pivot.iloc[205].name

"The Philosopher's Stone"

In [None]:
data_pivot.index[205]

"The Philosopher's Stone"

In [None]:
distance, suggestion = model.kneighbors(data_pivot.iloc[205,:].values.reshape(1,-1),n_neighbors=5)

In [None]:
distance

array([[ 0.        , 15.09966887, 15.13274595, 15.49193338, 15.49193338]])

In [None]:
suggestion

array([[205,  64, 215,  24,  39]])

In [None]:
for i in range(len(suggestion)):
  print(data_pivot.index[suggestion[i]])

Index(['The Philosopher's Stone', 'Distant Thunder', 'Unishe April',
       'Bancharamer Bagan', 'Branches of the Tree'],
      dtype='object', name='Movie_name')


In [None]:
movie_name = data_pivot.index
movie_name

Index(['22 Shey Shraban', '27 B Beadon Street', '36 Chowringhee Lane',
       'A Death in the Gunj', 'A River Called Titas', 'Abby Sen', 'Abohomaan',
       'After the Night... Dawn', 'Aguner Poroshmoni', 'Ahare Mon',
       ...
       'The Unnamed', 'The World of Apu', 'The Zoo',
       'Third Person Singular Number', 'Three Daughters', 'Tritio Adhyay',
       'Unishe April', 'Vinci Da', 'Zombiesthaan', 'Zulfiqar'],
      dtype='object', name='Movie_name', length=219)

##Testing and Saving the Model

In [None]:
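import pickle
# Use context managers so each file is flushed and closed after writing.
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
with open('movie_name.pkl', 'wb') as f:
    pickle.dump(movie_name, f)
with open('Final_Data.pkl', 'wb') as f:
    pickle.dump(dfSort_Final, f)
with open('data_pivot.pkl', 'wb') as f:
    pickle.dump(data_pivot, f)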
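
The saved files can be read back with `pickle.load`. The following is a minimal round-trip sketch; the temporary directory and the small `toy_pivot` frame are hypothetical stand-ins for the real artifacts.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for the pivot table saved above.
toy_pivot = pd.DataFrame(
    [[7.0, 0.0], [0.0, 9.0]],
    index=pd.Index(["M1", "M2"], name="Movie_name"),
    columns=["a", "b"],
)

# Write to a temporary location, then load the object back.
path = os.path.join(tempfile.mkdtemp(), "data_pivot.pkl")
with open(path, "wb") as f:
    pickle.dump(toy_pivot, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.equals(toy_pivot))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.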

In [None]:
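def recommend_movie(movie_name):
  # Row position of the requested movie in the pivot table.
  movie_id = np.where(data_pivot.index == movie_name)[0][0]
  # Five nearest rows; the first hit is the queried movie itself (distance 0).
  distance, suggestion = model.kneighbors(data_pivot.iloc[movie_id,:].values.reshape(1,-1),n_neighbors=5)

  # suggestion has shape (1, 5), so the outer loop runs once.
  for i in range(len(suggestion)):
    movies = data_pivot.index[suggestion[i]]
    for j in movies:
      print(j)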

In [None]:
movie_name = 'Ahare Mon'
recommend_movie(movie_name)

Ahare Mon
F.I.R NO. 339/07/06
Zombiesthaan
Sabuj Dwiper Raja
Chaamp
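
The first title printed is the queried movie itself, because its distance to itself is zero. A possible variant, sketched below under stated assumptions, returns a list, skips the query movie, and returns an empty list for unknown titles. The names `recommend_titles`, `toy_pivot`, and `toy_model` are hypothetical and the data is made up.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical movie-by-user rating table.
toy_pivot = pd.DataFrame(
    [[5.0, 4.0, 0.0], [5.0, 5.0, 0.0], [0.0, 0.0, 4.0], [4.0, 4.0, 1.0]],
    index=pd.Index(["M1", "M2", "M3", "M4"], name="Movie_name"),
    columns=["a", "b", "c"],
)
toy_model = NearestNeighbors(algorithm="brute").fit(toy_pivot.values)

def recommend_titles(title, pivot, knn, k=2):
    """Return up to k nearest movie titles, excluding the query movie."""
    if title not in pivot.index:
        return []  # unknown title: nothing to recommend
    row = pivot.index.get_loc(title)
    query = pivot.iloc[row].to_numpy().reshape(1, -1)
    # Request one extra neighbor because the query row is returned too.
    _, idx = knn.kneighbors(query, n_neighbors=min(k + 1, len(pivot)))
    return [pivot.index[i] for i in idx[0] if i != row][:k]

print(recommend_titles("M1", toy_pivot, toy_model))
```

Returning a list instead of printing makes the function easier to reuse, for example from a web front end.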


##Conclusion

After completing this project, we realized that there is little existing work on Bangla movie recommendation. We will continue working on this project and plan to try a hybrid technique next.


I would like to express my deepest gratitude to all the individuals and organizations who have supported me throughout this project.


First and foremost, I would like to thank my mentor, Mohammad Rifat Ahmmad Rashid, for his guidance, expertise, and unwavering support. His valuable insights and feedback have been instrumental in shaping this project.


I am also grateful to my teammates who have provided assistance and encouragement at every step of the way. Their collaboration and willingness to share their knowledge have been invaluable.



##Members Details

Zeshan Ahmed
ID: 2019-3-60-081

Fatima Noor
ID: 2020-2-60-129

Israt Jahan Jarin
ID: 2020-2-60-032

#Movie Recommendation System Using Content Based Filtering


##Introduction
A recommendation system is a technology that provides personalized suggestions or recommendations to users based on their preferences, interests, and past behaviors. It is widely used in various industries, including e-commerce, streaming platforms, social media and more to enhance user experience, increase engagement and drive sales. Recommendation systems utilize various techniques and algorithms to analyze and understand user data. Here are a few commonly used approaches:

 1.   Collaborative Filtering
 2.   Content-Based Filtering
 3.   Hybrid

This system uses the content-based filtering technique.

##Problem Statement
The problem at hand is to develop an efficient content-based filtering system that can accurately recommend relevant items to users based on their preferences and characteristics.


Currently, users are overwhelmed with a vast amount of content available across various platforms, making it challenging for them to discover items that align with their interests. Traditional recommendation systems often rely on collaborative filtering techniques, which are based on user behavior and often suffer from the "cold start" problem for new users or items.

Content-based filtering aims to tackle this issue by focusing on the inherent characteristics of items, such as their attributes, metadata, or textual descriptions. However, there are several challenges that need to be addressed:


**Data Representation**: Developing an effective method to represent the content of items using suitable features or descriptors that capture their characteristics accurately.

**Feature Extraction**: Designing algorithms and techniques to extract relevant features from item content, including text, images, audio, or video, to create a comprehensive profile.

**User Preference Modeling**: Creating a robust user preference model by analyzing the historical interactions and feedback provided by users to generate personalized recommendations.

**Scalability**: Ensuring that the system can handle large-scale datasets efficiently, as the amount of content and user interactions continues to grow rapidly.

**Diversity and Serendipity**: Balancing the recommendation system's ability to provide personalized recommendations while also introducing serendipitous and diverse items to enhance user satisfaction and exploration.

The goal is to develop an advanced content-based filtering system that can overcome these challenges and provide accurate, personalized, and diverse recommendations to users based on their unique preferences and the inherent characteristics of the content.

###Importing Library

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

In [None]:
movies=pd.read_csv('imdb bangla movie dataset.csv')
credits=pd.read_csv('imdb bangla movie dataset.csv')


In [None]:
movies.head(10)


Unnamed: 0,Image_ID,Title,Director,Cast,Year,Genre,Synopsis,image_url
0,tt0043026,Tathapi,Manoj Bhattacharya,"Bhanu Bannerjee, Gangapada Basu, Bijon Bhattac...",1950.0,Drama,,https://m.media-amazon.com/images/S/sash/NapCx...
1,tt0042719,Mashaal,Nitin Bose,"Ashok Kumar, Sumitra Devi, Ruma Guha Thakurta,...",1950.0,Drama,,https://m.media-amazon.com/images/M/MV5BYmJjZj...
2,tt0267730,Mantramugdhu,Bimal Roy,"Jiben Bose, Reba Bose, Tulsi Chakraborty, Jaha...",1949.0,Drama,,https://m.media-amazon.com/images/S/sash/NapCx...
3,tt0231225,Bamuner Meye,Ajoy Kar,"Sunil Das Gupta, Anubha Gupta, Tulsi Lahiri, S...",1949.0,Drama,,https://m.media-amazon.com/images/S/sash/NapCx...
4,tt0214563,Cartoon,Dhirendranath Ganguly,,1949.0,Drama,,https://m.media-amazon.com/images/S/sash/NapCx...
5,tt0156695,Kavi,Debaki Bose,"Robin Majumdar, Nitish Mukherjee, Anubha Gupta...",1949.0,Drama,"Kavi is a 1949 Indian Bengali film, directed ...",https://m.media-amazon.com/images/M/MV5BOWE2MT...
6,tt0152283,Sankalpa,Agradoot,"N.B. Agrami, Bibhuti Laha, Sikharani Bag, Moli...",1949.0,Drama,,https://m.media-amazon.com/images/M/MV5BOGJkZj...
7,tt0243353,Kalo Chhaya,Premendra Mitra,"Gurudas Bannerjee, Dhiraj Bhattacharya, Sipra ...",1948.0,Drama,,https://m.media-amazon.com/images/S/sash/NapCx...
8,tt0157061,Sir Sankarnath,Debaki Bose,"Ajit Bandyopadhyay, Jiben Bose, Tulsi Chakrabo...",1948.0,Drama,,https://m.media-amazon.com/images/S/sash/NapCx...
9,tt0156490,Drishtidan,Nitin Bose,"Asitbaran, Sunanda Banerjee, Biman Bannerjee, ...",1948.0,Drama,,https://m.media-amazon.com/images/M/MV5BZGM1Yj...


In [None]:
credits.head(7)


In [None]:
movies.shape

In [None]:
credits.shape

In [None]:
movies.merge(credits,on='Title')

In [None]:
movies.head(2)

In [None]:
movies.iloc[0]

In [None]:
movies.columns

In [None]:
movies[['Title','Director','Cast', 'Genre']]

In [None]:
movies.isnull().sum()

In [None]:
movies.duplicated().sum()

In [None]:
new_df=movies[['Title','Director','Cast','Year']]


In [None]:
new_df.head()

In [None]:
movies.head()

In [None]:
import ast

In [None]:
def convert(text):
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])
    return L

In [None]:
movies.dropna(inplace=True)

In [None]:
import pickle


In [None]:
pickle.dump(movies,open('movie_list.pkl','wb'))


In [None]:

pickle.dump(movies,open('movie_list1.pkl','wb'))

In [None]:

movies['Title'].values

array(['Tathapi', 'Mashaal', 'Mantramugdhu', ..., 'Chha-e Chhuti',
       'Ebadat', 'Madly Bangali'], dtype=object)

In [None]:
# Serialize the DataFrame as a plain dict for later loading.
with open('movie_dict.pkl', 'wb') as f:
    pickle.dump(movies.to_dict(), f)
In [None]:
new = movies.drop(columns=['Cast','Year'])

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000,stop_words='english')


In [None]:
vector = cv.fit_transform(new['Title']).toarray()

In [None]:
vector.shape


(2298, 2899)
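
The vectors above are built from the Title column only, and the `recommend('Cartoon')` output at the end of this notebook shows a similarity of 0.0 for every neighbor. One possible extension, sketched here as an assumption rather than as the notebook's method, is to vectorize a combined text field built from other columns such as Director and Genre. The frame and the `tags` column below are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical movies with placeholder directors and genres.
toy = pd.DataFrame({
    "Title": ["Film One", "Film Two", "Film Three"],
    "Director": ["Director X", "Director X", "Director Y"],
    "Genre": ["Drama", "Drama", "Comedy"],
})

# Collapse spaces inside names so each director becomes a single token.
director_token = toy["Director"].str.replace(" ", "", regex=False)
toy["tags"] = (director_token + " " + toy["Genre"]).str.lower()

# Bag-of-words counts over the combined tags.
cv_toy = CountVectorizer()
tag_vectors = cv_toy.fit_transform(toy["tags"]).toarray()
print(sorted(cv_toy.vocabulary_))
print(tag_vectors)
```

Movies that share a director or genre now share non-zero columns, which gives cosine similarity something to work with.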

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
similarity = cosine_similarity(vector)


In [None]:
similarity



array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])
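
The matrix above has ones on the diagonal because every movie is identical to itself, and zeros wherever two titles share no token. A small sketch on hypothetical count vectors shows how `cosine_similarity` produces these values and how one entry matches the formula dot(a, b) / (|a| |b|).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical token counts: rows are movies, columns are vocabulary terms.
counts = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 2],
])
sim = cosine_similarity(counts)

# Manual check of one entry against the cosine formula.
manual = counts[0] @ counts[1] / (np.linalg.norm(counts[0]) * np.linalg.norm(counts[1]))
print(sim.round(3))
print(round(float(manual), 3))
```

Rows 0 and 2 share no term, so their similarity is exactly 0, while rows 0 and 1 share one of two terms each and score 0.5.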

In [None]:
new[new['Title'] == 'Tathapi'].index[0]



0

In [None]:
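def recommend(movie):
    # Row label of the requested title.
    index = new[new['Title'] == movie].index[0]
    # Pair every movie position with its similarity score, highest first.
    distances = sorted(list(enumerate(similarity[index])),reverse=True,key = lambda x: x[1])
    # Skip the top entry (expected to be the movie itself) and print the next five (position, score) pairs.
    for i in distances[1:6]:
        print(i)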




In [None]:
recommend('Cartoon')



(0, 0.0)
(1, 0.0)
(2, 0.0)
(3, 0.0)
(5, 0.0)
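
The output above lists (position, score) pairs rather than titles. The sketch below is one way to return titles instead, and it looks up the movie by position so that it still works if the DataFrame index has gaps (for example after `dropna`). The function name `recommend_by_title`, the frame, and the similarity matrix are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index has gaps, as can happen after dropping rows.
toy = pd.DataFrame({"Title": ["T0", "T1", "T2", "T3"]}, index=[0, 2, 5, 9])
toy_similarity = np.array([
    [1.0, 0.2, 0.9, 0.0],
    [0.2, 1.0, 0.1, 0.4],
    [0.9, 0.1, 1.0, 0.3],
    [0.0, 0.4, 0.3, 1.0],
])

def recommend_by_title(movie, frame, sim, k=2):
    """Return the k most similar titles, using positions rather than index labels."""
    titles = frame["Title"].to_numpy()
    matches = np.flatnonzero(titles == movie)
    if matches.size == 0:
        return []  # unknown title
    pos = matches[0]
    # Sort positions by descending similarity; the stable sort keeps ties in original order.
    order = np.argsort(-sim[pos], kind="stable")
    return [str(titles[i]) for i in order if i != pos][:k]

print(recommend_by_title("T2", toy, toy_similarity))
```

Looking up `T3` by its index label (9) would fall outside the 4 x 4 similarity matrix, whereas its position (3) is always valid.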


In [None]:
pickle.dump(similarity,open('similarity.pkl','wb'))