### What is a Recommendation System?

- Recommender systems are software tools and techniques that provide suggestions for items that are most likely of interest to a particular user.

### Three Types of Recommendation Systems

1. Content-Based
2. Collaborative Filtering Based
3. Hybrid Recommendation System

### 1. Content-Based

- A content-based recommender system relies on the similarity between items to make recommendations.
- Content-based recommender systems are based on the idea that if you like one item, you’re likely to like other items that are similar to it.
- For example, if you’re looking for a new movie to watch, a content-based recommender system might recommend movies similar to ones you’ve watched in the past.
- Content-based recommender systems are commonly used for music, books, and movies.
- They can also be used to recommend products, services, or even websites.
- To build a content-based recommender system, you first need to define what similarity means, for example by comparing genre, director, cast, and plot.
- Once the model has a similarity measure between items, it can recommend the items most similar to those a user already likes.
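
As a rough illustration of "defining similarity", the sketch below compares items by the words in a short tag string, using cosine similarity over word counts. The titles and tags are made-up toy data, and `cosine_similarity` and `most_similar` are hypothetical helper names; this is one simple way to measure similarity, not necessarily the one used later in this project.

```python
import math
from collections import Counter

# Toy data (hypothetical): each item is described by a short string of tags
tags = {
    "Movie A": "crime drama mafia family",
    "Movie B": "crime drama police family",
    "Movie C": "animation comedy toy adventure",
}

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two strings."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(title, tags, k=1):
    """Return the k items whose tags are closest to those of `title`."""
    others = [t for t in tags if t != title]
    others.sort(key=lambda t: cosine_similarity(tags[title], tags[t]), reverse=True)
    return others[:k]

print(most_similar("Movie A", tags))  # ['Movie B']
```

"Movie A" and "Movie B" share three of their four tags, so they score 0.75, while "Movie C" shares none and scores 0.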

### 2. Collaborative Filtering Based

- A Collaborative Filtering recommender system makes predictions about what a user might want to buy or watch based on the past behavior of other users.
- The algorithm looks at the items that other users with similar taste have purchased or rated highly, and recommends those items to the current user.
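
A minimal sketch of the user-based idea, assuming a tiny hand-written ratings dictionary (the users, titles, and ratings are invented for illustration, and `cosine` and `recommend` are hypothetical helpers). Each unseen movie is scored by the ratings of other users, weighted by how similar those users are to the target user.

```python
import math

# Toy ratings (hypothetical): user -> {movie: rating on a 1-5 scale}
ratings = {
    "alice": {"Heat": 5, "Alien": 4, "Up": 1},
    "bob":   {"Heat": 5, "Alien": 5, "Se7en": 4},
    "carol": {"Up": 5, "Coco": 5, "Heat": 1},
}

def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=1):
    """Rank movies the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("alice", ratings))  # ['Se7en']
```

Here "bob" rates movies much like "alice" does, so his rating for a movie she has not seen outweighs the suggestion from "carol", whose taste differs.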

### 3. Hybrid Recommendation System

- A hybrid recommendation system combines collaborative filtering and content-based filtering to provide recommendations to users.
- For example, the system can first use collaborative filtering to identify movies that similar users have rated highly, and then use content-based filtering to narrow the recommendations down to movies that share attributes with the user's own highly rated movies.
- Because it combines both approaches, it is one of the most popular kinds of recommendation system. Entertainment giants such as YouTube and Netflix use hybrid recommendation systems.
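
The two-stage example above can be sketched as follows. This assumes both scores are already available as plain dictionaries; the item names, the numbers, and the `hybrid_recommend` helper are all hypothetical, and real systems combine the two signals in many other ways.

```python
def hybrid_recommend(cf_scores, content_sim, top_n=3, k=2):
    """Two-stage hybrid: shortlist by collaborative score, re-rank by content."""
    # Stage 1: keep the top_n candidates that similar users rated highly
    candidates = sorted(cf_scores, key=cf_scores.get, reverse=True)[:top_n]
    # Stage 2: order the shortlist by similarity to the user's own favourites
    reranked = sorted(candidates, key=lambda m: content_sim.get(m, 0.0), reverse=True)
    return reranked[:k]

# Hypothetical scores for four items
cf_scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}    # from collaborative filtering
content_sim = {"A": 0.2, "B": 0.6, "C": 0.9, "D": 1.0}  # from content-based filtering

print(hybrid_recommend(cf_scores, content_sim))  # ['C', 'B']
```

Item "D" has the highest content similarity but never reaches the second stage, because similar users did not rate it highly.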

#### Data Source = https://www.imdb.com/search/title/?num_votes=10000,&sort=user_rating,desc&title_type=feature

- The data was collected from the IMDb page above through web scraping.


In [1]:
import numpy as np
import pandas as pd 

In [2]:
movies = pd.read_csv("D:\\projects\\ml\\movie_recommendation_system\\movies_scraping\\IMDB 10000 Movies Dataset.csv")

In [3]:
movies.head()

Unnamed: 0.1,Unnamed: 0,Movie,Year,Rating,Metascore,Genre,Director,Actor1,Actor2,Actor3,Actor4,Runtime,Votes,Gross,Discription,Poster
0,0,The Shawshank Redemption,1994,9.3,82.0,Drama,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,142 min,2763687,$28.34M,"Over the course of several years, two convicts...",https://m.media-amazon.com/images/S/sash/4Fyxw...
1,1,The Godfather,1972,9.2,100.0,"Crime,Drama",Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,175 min,1923327,$134.97M,"Don Vito Corleone, head of a mafia family, dec...",https://m.media-amazon.com/images/S/sash/4Fyxw...
2,2,Ramayana: The Legend of Prince Rama,1993,9.2,,"Animation,Action,Adventure",Ram Mohan,Yûgô Sakô,Koichi Saski,Arun Govil,Nikhil Kapoor,135 min,12073,,An anime adaptation of the Hindu epic the Rama...,https://m.media-amazon.com/images/S/sash/4Fyxw...
3,3,Hababam Sinifi,1975,9.2,,"Comedy,Drama",Ertem Egilmez,Kemal Sunal,Münir Özkul,Halit Akçatepe,Tarik Akan,87 min,41901,,"Lazy, uneducated students share a very close b...",https://m.media-amazon.com/images/S/sash/4Fyxw...
4,4,DAMaN,2022,9.1,,"Adventure,Drama",Lenka Debiprasad,Vishal Mourya,Karan Kandhapan,Babushan Mohanty,Dipanwit Dashmohapatra,121 min,13337,,"The film is set in 2015. Sid, is a young docto...",https://m.media-amazon.com/images/S/sash/4Fyxw...


In [4]:
movies.shape

(10195, 16)

In [5]:
# Inspect column dtypes and non-null counts

movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10195 entries, 0 to 10194
Data columns (total 16 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Unnamed: 0   10195 non-null  int64  
 1   Movie        10195 non-null  object 
 2   Year         10195 non-null  object 
 3   Rating       10195 non-null  float64
 4   Metascore    8065 non-null   float64
 5   Genre        10195 non-null  object 
 6   Director     10195 non-null  object 
 7   Actor1       10184 non-null  object 
 8   Actor2       10183 non-null  object 
 9   Actor3       10178 non-null  object 
 10  Actor4       10169 non-null  object 
 11  Runtime      10195 non-null  object 
 12  Votes        10195 non-null  object 
 13  Gross        7218 non-null   object 
 14  Discription  10195 non-null  object 
 15  Poster       10195 non-null  object 
dtypes: float64(2), int64(1), object(13)
memory usage: 1.2+ MB


In [6]:
movies.isnull().sum()

Unnamed: 0        0
Movie             0
Year              0
Rating            0
Metascore      2130
Genre             0
Director          0
Actor1           11
Actor2           12
Actor3           17
Actor4           26
Runtime           0
Votes             0
Gross          2977
Discription       0
Poster            0
dtype: int64

#### Based on my personal experience, I recommend movies using the attributes below

- Movie Name
- Movie Year
- Genre
- Director Name
- Actors (`Actor1` to `Actor4`)
- Description (story of the movie / movie plot; stored in the `Discription` column)

In [7]:
# taking required attributes 

new_movies = movies[["Movie", "Year", "Genre", "Director", "Actor1", "Actor2", "Actor3", "Actor4", "Discription", "Poster"]]

In [8]:
new_movies.duplicated().sum()

196

In [9]:
new_movies = new_movies.drop_duplicates(keep='first')

In [10]:
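# reset_index() keeps the old index as a new "index" column, which is why the shape below has 11 columns
new_movies = new_movies.reset_index()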

In [11]:
new_movies.shape

(9999, 11)

In [12]:
new_movies.head(5)

Unnamed: 0,index,Movie,Year,Genre,Director,Actor1,Actor2,Actor3,Actor4,Discription,Poster
0,0,The Shawshank Redemption,1994,Drama,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,"Over the course of several years, two convicts...",https://m.media-amazon.com/images/S/sash/4Fyxw...
1,1,The Godfather,1972,"Crime,Drama",Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,"Don Vito Corleone, head of a mafia family, dec...",https://m.media-amazon.com/images/S/sash/4Fyxw...
2,2,Ramayana: The Legend of Prince Rama,1993,"Animation,Action,Adventure",Ram Mohan,Yûgô Sakô,Koichi Saski,Arun Govil,Nikhil Kapoor,An anime adaptation of the Hindu epic the Rama...,https://m.media-amazon.com/images/S/sash/4Fyxw...
3,3,Hababam Sinifi,1975,"Comedy,Drama",Ertem Egilmez,Kemal Sunal,Münir Özkul,Halit Akçatepe,Tarik Akan,"Lazy, uneducated students share a very close b...",https://m.media-amazon.com/images/S/sash/4Fyxw...
4,4,DAMaN,2022,"Adventure,Drama",Lenka Debiprasad,Vishal Mourya,Karan Kandhapan,Babushan Mohanty,Dipanwit Dashmohapatra,"The film is set in 2015. Sid, is a young docto...",https://m.media-amazon.com/images/S/sash/4Fyxw...


In [13]:
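# Only the first three actors are used (the four-actor version is kept below for reference).
# If any of the actor columns is NaN for a row, the concatenated Cast value is NaN as well.
# new_movies["Cast"] = new_movies["Actor1"] + "," + new_movies["Actor2"] + "," + new_movies["Actor3"] + "," + new_movies["Actor4"]
new_movies["Cast"] = new_movies["Actor1"] + "," + new_movies["Actor2"] + "," + new_movies["Actor3"]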
new_movies["Cast"]

0                   Tim Robbins,Morgan Freeman,Bob Gunton
1                      Marlon Brando,Al Pacino,James Caan
2                       Yûgô Sakô,Koichi Saski,Arun Govil
3                  Kemal Sunal,Münir Özkul,Halit Akçatepe
4          Vishal Mourya,Karan Kandhapan,Babushan Mohanty
                              ...                        
9994            Rob Schneider,Eddie Griffin,Jeroen Krabbé
9995             Robert Englund,Lisa Zane,Shon Greenblatt
9996    Marlon Wayans,Jaime Pressly,Cedric The Enterta...
9997         Abhinay Raj Singh,Ranveer Singh,Varun Sharma
9998          Tyler Perry,Taraji P. Henson,Adam Rodriguez
Name: Cast, Length: 9999, dtype: object

In [14]:
new_movies = new_movies[["Movie", "Year", "Genre", "Director", "Cast", "Discription"]]

In [15]:
new_movies.shape

(9999, 6)

In [16]:
new_movies.head()

Unnamed: 0,Movie,Year,Genre,Director,Cast,Discription
0,The Shawshank Redemption,1994,Drama,Frank Darabont,"Tim Robbins,Morgan Freeman,Bob Gunton","Over the course of several years, two convicts..."
1,The Godfather,1972,"Crime,Drama",Francis Ford Coppola,"Marlon Brando,Al Pacino,James Caan","Don Vito Corleone, head of a mafia family, dec..."
2,Ramayana: The Legend of Prince Rama,1993,"Animation,Action,Adventure",Ram Mohan,"Yûgô Sakô,Koichi Saski,Arun Govil",An anime adaptation of the Hindu epic the Rama...
3,Hababam Sinifi,1975,"Comedy,Drama",Ertem Egilmez,"Kemal Sunal,Münir Özkul,Halit Akçatepe","Lazy, uneducated students share a very close b..."
4,DAMaN,2022,"Adventure,Drama",Lenka Debiprasad,"Vishal Mourya,Karan Kandhapan,Babushan Mohanty","The film is set in 2015. Sid, is a young docto..."


In [17]:
import re

import spacy
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Build these once; recreating them for every row is slow.
# Requires the NLTK stop-word and tokenizer data and the spaCy "en_core_web_sm" model.
STOP_WORDS = set(stopwords.words('english'))
NLP = spacy.load("en_core_web_sm")


def remove_stop_words(text):
    tokens = word_tokenize(text)  # Tokenize the input text into words
    # Drop stop words (compared case-insensitively); kept words retain their original casing
    filtered_tokens = [word for word in tokens if word.lower() not in STOP_WORDS]
    return ' '.join(filtered_tokens)  # Reconstruct the filtered text


def description_processing(text):
    # Delete every character that is not an ASCII letter or whitespace,
    # so hyphenated words are fused together
    return re.sub(r'[^a-zA-Z\s]', '', text)


def lemmatize_discription(text):
    # Process the text with the spaCy model loaded above
    doc = NLP(text)

    # Lemmatize each token and join them back into a string
    return ' '.join(token.lemma_ for token in doc)

In [18]:
new_movies["Discription"] = new_movies["Discription"].apply(remove_stop_words)
new_movies["Discription"] = new_movies["Discription"].apply(description_processing)
new_movies["Discription"] = new_movies["Discription"].apply(lemmatize_discription)

In [19]:
new_movies["Discription"]

0       course several year   two convict form friends...
1       Vito Corleone   head mafia family   decide han...
2       anime adaptation Hindu epic Ramayana   Lord Ra...
3       lazy   uneducated student share close bond   l...
4       film set    Sid   young doctor complete MBBS p...
                              ...                        
9994    deuce trick manwhore TJ   Amsterdam manwhore m...
9995    dreamhaunte Freddy Krueger return prowl nightm...
9996    exorcise demon ex   Malcolm start fresh new gi...
9997    two set identical twin accidentally separate b...
9998    Madea catch sixteenyearold Jennifer two young ...
Name: Discription, Length: 9999, dtype: object
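
The output above contains runs of spaces and fused words such as "sixteenyearold", because symbols are deleted rather than replaced. A possible alternative, shown only as a sketch (`clean_description` is a hypothetical helper and is not used elsewhere in this notebook), is to replace symbols with a space and then collapse repeated whitespace:

```python
import re

def clean_description(text):
    """Replace non-letters with spaces, then collapse repeated whitespace."""
    letters_only = re.sub(r"[^a-zA-Z\s]", " ", text)
    return re.sub(r"\s+", " ", letters_only).strip()

print(clean_description("sixteen-year-old Jennifer , two"))
# sixteen year old Jennifer two
```

Applying this instead of `description_processing` would change the tags shown in the rest of the notebook, so it is offered as an option rather than a drop-in replacement.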

In [20]:
import re

def movie_processing(text):
    # Remove symbols using regex
    text_without_symbols = re.sub(r'[^a-zA-Z\s]', ' ', text)
    
    # Convert to lowercase
    lowercase_text = text_without_symbols.lower()
    
    return lowercase_text

In [21]:
new_movies["Movie_Tag"] = new_movies["Movie"].apply(movie_processing)

In [22]:
new_movies

Unnamed: 0,Movie,Year,Genre,Director,Cast,Discription,Movie_Tag
0,The Shawshank Redemption,1994,Drama,Frank Darabont,"Tim Robbins,Morgan Freeman,Bob Gunton",course several year two convict form friends...,the shawshank redemption
1,The Godfather,1972,"Crime,Drama",Francis Ford Coppola,"Marlon Brando,Al Pacino,James Caan",Vito Corleone head mafia family decide han...,the godfather
2,Ramayana: The Legend of Prince Rama,1993,"Animation,Action,Adventure",Ram Mohan,"Yûgô Sakô,Koichi Saski,Arun Govil",anime adaptation Hindu epic Ramayana Lord Ra...,ramayana the legend of prince rama
3,Hababam Sinifi,1975,"Comedy,Drama",Ertem Egilmez,"Kemal Sunal,Münir Özkul,Halit Akçatepe",lazy uneducated student share close bond l...,hababam sinifi
4,DAMaN,2022,"Adventure,Drama",Lenka Debiprasad,"Vishal Mourya,Karan Kandhapan,Babushan Mohanty",film set Sid young doctor complete MBBS p...,daman
...,...,...,...,...,...,...,...
9994,Deuce Bigalow: European Gigolo,2005,Comedy,Mike Bigelow,"Rob Schneider,Eddie Griffin,Jeroen Krabbé",deuce trick manwhore TJ Amsterdam manwhore m...,deuce bigalow european gigolo
9995,Freddy's Dead: The Final Nightmare,1991,"Fantasy,Horror",Rachel Talalay,"Robert Englund,Lisa Zane,Shon Greenblatt",dreamhaunte Freddy Krueger return prowl nightm...,freddy s dead the final nightmare
9996,A Haunted House 2,2014,"Comedy,Fantasy,Horror",Michael Tiddes,"Marlon Wayans,Jaime Pressly,Cedric The Enterta...",exorcise demon ex Malcolm start fresh new gi...,a haunted house
9997,Cirkus,2022,"Comedy,Drama",Rohit Shetty,"Abhinay Raj Singh,Ranveer Singh,Varun Sharma",two set identical twin accidentally separate b...,cirkus


In [23]:
import re

def genre_processing(text):
    # Remove symbols using regex
    text_without_symbols = re.sub(r'[^a-zA-Z\s]', ' ', text)
    
    # Convert to lowercase
    lowercase_text = text_without_symbols.lower()
    
    return lowercase_text

In [24]:
new_movies["Genre"] = new_movies["Genre"].apply(genre_processing)

In [25]:
new_movies

Unnamed: 0,Movie,Year,Genre,Director,Cast,Discription,Movie_Tag
0,The Shawshank Redemption,1994,drama,Frank Darabont,"Tim Robbins,Morgan Freeman,Bob Gunton",course several year two convict form friends...,the shawshank redemption
1,The Godfather,1972,crime drama,Francis Ford Coppola,"Marlon Brando,Al Pacino,James Caan",Vito Corleone head mafia family decide han...,the godfather
2,Ramayana: The Legend of Prince Rama,1993,animation action adventure,Ram Mohan,"Yûgô Sakô,Koichi Saski,Arun Govil",anime adaptation Hindu epic Ramayana Lord Ra...,ramayana the legend of prince rama
3,Hababam Sinifi,1975,comedy drama,Ertem Egilmez,"Kemal Sunal,Münir Özkul,Halit Akçatepe",lazy uneducated student share close bond l...,hababam sinifi
4,DAMaN,2022,adventure drama,Lenka Debiprasad,"Vishal Mourya,Karan Kandhapan,Babushan Mohanty",film set Sid young doctor complete MBBS p...,daman
...,...,...,...,...,...,...,...
9994,Deuce Bigalow: European Gigolo,2005,comedy,Mike Bigelow,"Rob Schneider,Eddie Griffin,Jeroen Krabbé",deuce trick manwhore TJ Amsterdam manwhore m...,deuce bigalow european gigolo
9995,Freddy's Dead: The Final Nightmare,1991,fantasy horror,Rachel Talalay,"Robert Englund,Lisa Zane,Shon Greenblatt",dreamhaunte Freddy Krueger return prowl nightm...,freddy s dead the final nightmare
9996,A Haunted House 2,2014,comedy fantasy horror,Michael Tiddes,"Marlon Wayans,Jaime Pressly,Cedric The Enterta...",exorcise demon ex Malcolm start fresh new gi...,a haunted house
9997,Cirkus,2022,comedy drama,Rohit Shetty,"Abhinay Raj Singh,Ranveer Singh,Varun Sharma",two set identical twin accidentally separate b...,cirkus


In [26]:
import re

def director_processing(text):
    # Remove symbols using regex
    text_without_symbols = re.sub(r'[^a-zA-Z\s]', '', text)
    
    # Convert to lowercase
    lowercase_text = text_without_symbols.lower()
    
    lowercase_text = lowercase_text.replace(" ", "")
    
    return lowercase_text

In [27]:
new_movies["Director"] = new_movies["Director"].apply(director_processing)

In [28]:
new_movies

Unnamed: 0,Movie,Year,Genre,Director,Cast,Discription,Movie_Tag
0,The Shawshank Redemption,1994,drama,frankdarabont,"Tim Robbins,Morgan Freeman,Bob Gunton",course several year two convict form friends...,the shawshank redemption
1,The Godfather,1972,crime drama,francisfordcoppola,"Marlon Brando,Al Pacino,James Caan",Vito Corleone head mafia family decide han...,the godfather
2,Ramayana: The Legend of Prince Rama,1993,animation action adventure,rammohan,"Yûgô Sakô,Koichi Saski,Arun Govil",anime adaptation Hindu epic Ramayana Lord Ra...,ramayana the legend of prince rama
3,Hababam Sinifi,1975,comedy drama,ertemegilmez,"Kemal Sunal,Münir Özkul,Halit Akçatepe",lazy uneducated student share close bond l...,hababam sinifi
4,DAMaN,2022,adventure drama,lenkadebiprasad,"Vishal Mourya,Karan Kandhapan,Babushan Mohanty",film set Sid young doctor complete MBBS p...,daman
...,...,...,...,...,...,...,...
9994,Deuce Bigalow: European Gigolo,2005,comedy,mikebigelow,"Rob Schneider,Eddie Griffin,Jeroen Krabbé",deuce trick manwhore TJ Amsterdam manwhore m...,deuce bigalow european gigolo
9995,Freddy's Dead: The Final Nightmare,1991,fantasy horror,racheltalalay,"Robert Englund,Lisa Zane,Shon Greenblatt",dreamhaunte Freddy Krueger return prowl nightm...,freddy s dead the final nightmare
9996,A Haunted House 2,2014,comedy fantasy horror,michaeltiddes,"Marlon Wayans,Jaime Pressly,Cedric The Enterta...",exorcise demon ex Malcolm start fresh new gi...,a haunted house
9997,Cirkus,2022,comedy drama,rohitshetty,"Abhinay Raj Singh,Ranveer Singh,Varun Sharma",two set identical twin accidentally separate b...,cirkus


In [29]:
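# Cast values that are NaN (rows with a missing actor) become the literal string "nan",
# so the string processing below does not fail on them
new_movies["Cast"] = new_movies["Cast"].astype(str)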

In [30]:
import re

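def cast_processing(text):
    # Remove spaces so each actor's full name becomes a single token
    text = text.replace(" ", "")
    
    # Replace everything except ASCII letters (including the comma separators) with a space.
    # Note: accented letters are replaced too, so "Yûgô Sakô" becomes "y g sak"
    text_without_symbols = re.sub(r'[^a-zA-Z\s]', ' ', text)
    
    # Convert to lowercase
    lowercase_text = text_without_symbols.lower()
    
    return lowercase_text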

In [31]:
new_movies["Cast"] = new_movies["Cast"].apply(cast_processing)

In [32]:
new_movies

Unnamed: 0,Movie,Year,Genre,Director,Cast,Discription,Movie_Tag
0,The Shawshank Redemption,1994,drama,frankdarabont,timrobbins morganfreeman bobgunton,course several year two convict form friends...,the shawshank redemption
1,The Godfather,1972,crime drama,francisfordcoppola,marlonbrando alpacino jamescaan,Vito Corleone head mafia family decide han...,the godfather
2,Ramayana: The Legend of Prince Rama,1993,animation action adventure,rammohan,y g sak koichisaski arungovil,anime adaptation Hindu epic Ramayana Lord Ra...,ramayana the legend of prince rama
3,Hababam Sinifi,1975,comedy drama,ertemegilmez,kemalsunal m nir zkul halitak atepe,lazy uneducated student share close bond l...,hababam sinifi
4,DAMaN,2022,adventure drama,lenkadebiprasad,vishalmourya karankandhapan babushanmohanty,film set Sid young doctor complete MBBS p...,daman
...,...,...,...,...,...,...,...
9994,Deuce Bigalow: European Gigolo,2005,comedy,mikebigelow,robschneider eddiegriffin jeroenkrabb,deuce trick manwhore TJ Amsterdam manwhore m...,deuce bigalow european gigolo
9995,Freddy's Dead: The Final Nightmare,1991,fantasy horror,racheltalalay,robertenglund lisazane shongreenblatt,dreamhaunte Freddy Krueger return prowl nightm...,freddy s dead the final nightmare
9996,A Haunted House 2,2014,comedy fantasy horror,michaeltiddes,marlonwayans jaimepressly cedrictheentertainer,exorcise demon ex Malcolm start fresh new gi...,a haunted house
9997,Cirkus,2022,comedy drama,rohitshetty,abhinayrajsingh ranveersingh varunsharma,two set identical twin accidentally separate b...,cirkus
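
In the table above, names with accented letters are broken into fragments (for example `y g sak` and `m nir zkul`), because the ASCII-only pattern treats accented letters as symbols. One way to avoid this, sketched here with the standard-library `unicodedata` module, is to fold accents to their base letters first. `fold_accents` and `cast_tokens` are hypothetical helpers, not part of the original notebook.

```python
import re
import unicodedata

def fold_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def cast_tokens(cast):
    """Turn 'Name One,Name Two' into one lowercase token per actor."""
    names = [fold_accents(name) for name in cast.split(",")]
    tokens = [re.sub(r"[^a-zA-Z]", "", name).lower() for name in names]
    return " ".join(token for token in tokens if token)

print(cast_tokens("Yûgô Sakô,Koichi Saski,Arun Govil"))
# yugosako koichisaski arungovil
```

Letters that have no decomposition into a base letter plus an accent would still be dropped by the regex, so this is a partial fix.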


In [33]:
new_movies.isnull().sum()

Movie          0
Year           0
Genre          0
Director       0
Cast           0
Discription    0
Movie_Tag      0
dtype: int64

In [34]:
new_movies.dropna(inplace=True)

In [35]:
new_movies.isnull().sum()

Movie          0
Year           0
Genre          0
Director       0
Cast           0
Discription    0
Movie_Tag      0
dtype: int64

In [36]:
new_movies.shape

(9999, 7)

In [37]:
new_movies["Tags"] = new_movies["Movie_Tag"] + " " + new_movies["Genre"] + " " + new_movies["Director"] + " " + new_movies["Cast"] + " " + new_movies["Discription"]


In [38]:
new_movies = new_movies[["Movie", "Tags"]]


In [39]:
final = new_movies

In [40]:
final

Unnamed: 0,Movie,Tags
0,The Shawshank Redemption,the shawshank redemption drama frankdarabont t...
1,The Godfather,the godfather crime drama francisfordcoppola m...
2,Ramayana: The Legend of Prince Rama,ramayana the legend of prince rama animation ...
3,Hababam Sinifi,hababam sinifi comedy drama ertemegilmez kemal...
4,DAMaN,daman adventure drama lenkadebiprasad vishalmo...
...,...,...
9994,Deuce Bigalow: European Gigolo,deuce bigalow european gigolo comedy mikebige...
9995,Freddy's Dead: The Final Nightmare,freddy s dead the final nightmare fantasy hor...
9996,A Haunted House 2,a haunted house comedy fantasy horror michae...
9997,Cirkus,cirkus comedy drama rohitshetty abhinayrajsing...


In [41]:
final["Tags"][1]

'the godfather crime drama francisfordcoppola marlonbrando alpacino jamescaan Vito Corleone   head mafia family   decide hand empire young son Michael   however   decision unintentionally put life love one grave danger'

In [42]:
from sqlalchemy import create_engine

In [43]:
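# Connection details (user, password, host, database) are specific to the author's machine; replace them with your own
my_conn = create_engine("mysql+mysqldb://root:kalpit@localhost/movies")

# Append the rows to the movies_data table; to_sql returns the number of rows written
final.to_sql(con=my_conn, name='movies_data', if_exists='append', index=False)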

9999
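
The connection string above embeds the database password directly in the notebook. A common alternative is to read the credentials from environment variables and build the URL at run time. The sketch below assumes variable names such as `MOVIES_DB_PASSWORD`, which are invented for this example, and `build_mysql_url` is a hypothetical helper. The result can be passed to `create_engine` in place of the literal string.

```python
import os
from urllib.parse import quote_plus

def build_mysql_url(env=os.environ):
    """Build a SQLAlchemy-style MySQL URL from environment variables."""
    user = env.get("MOVIES_DB_USER", "root")          # hypothetical variable name
    password = quote_plus(env["MOVIES_DB_PASSWORD"])  # URL-encode special characters
    host = env.get("MOVIES_DB_HOST", "localhost")
    database = env.get("MOVIES_DB_NAME", "movies")
    return f"mysql+mysqldb://{user}:{password}@{host}/{database}"

# Example with an explicit mapping instead of the real environment
print(build_mysql_url({"MOVIES_DB_PASSWORD": "p@ss"}))
# mysql+mysqldb://root:p%40ss@localhost/movies
```

Encoding the password matters because characters such as `@` would otherwise be read as part of the URL structure.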