# Recommendation Systems
* Recommendation systems are a class of information filtering systems that predict the rating or preference a user would give to an item and recommend the most relevant items accordingly. They are widely used on online platforms to personalize user experiences, increase engagement, and drive revenue.

* There are two main types of recommendation systems:

1. **Collaborative Filtering**: This approach recommends items by analyzing interactions between users and items. It identifies similarities between users or items based on their historical interactions (e.g., ratings, purchases) and predicts ratings or preferences for unseen items.

2. **Content-Based Filtering**: This method recommends items to users based on the characteristics of items and user profiles. It analyzes item attributes (e.g., genre, keywords) and user preferences to generate recommendations. Content-based filtering does not rely on the interactions of other users; it relies on the similarity between items or on the match between item features and a user's preferences.
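
For contrast with the content-based approach, the collaborative idea can be sketched on a tiny, made-up ratings matrix. The matrix, the `item_similarity` helper, and the variable names are all hypothetical and only illustrate how item-to-item similarity can be derived from user interactions alone.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def item_similarity(matrix):
    """Cosine similarity between the item columns of a user-item matrix."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    normalized = matrix / norms
    return normalized.T @ normalized

item_sim = item_similarity(ratings)

# Sort items by similarity to item 0 (highest first) and skip item 0 itself.
most_similar_to_item0 = int(np.argsort(item_sim[0])[::-1][1])
print(most_similar_to_item0)
```

No item attributes are used here: items 0 and 1 come out as similar only because the same users rated them alike.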

**This project uses the Content-Based Filtering method, with movie genres as the item features.**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
#Load the Dataset.
df=pd.read_csv('/content/movies.csv')
df

Unnamed: 0,index,genres,title
0,0,Action Adventure Fantasy Science Fiction,Avatar
1,1,Adventure Fantasy Action,Pirates of the Caribbean: At World's End
2,2,Action Adventure Crime,Spectre
3,3,Action Crime Drama Thriller,The Dark Knight Rises
4,4,Action Adventure Science Fiction,John Carter
...,...,...,...
4688,4688,Foreign Thriller,Cavite
4689,4689,Action Crime Thriller,El Mariachi
4690,4690,Comedy Romance,Newlyweds
4691,4691,Comedy Drama Romance TV Movie,"Signed, Sealed, Delivered"


In [3]:
print(f'The Dataset has {df.shape[0]} records')

The Dataset has 4693 records


**Each row in the dataset holds a movie's index number, title, and genres.**

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4693 entries, 0 to 4692
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   index   4693 non-null   int64 
 1   genres  4666 non-null   object
 2   title   4693 non-null   object
dtypes: int64(1), object(2)
memory usage: 110.1+ KB


In [5]:
df.isnull().sum()

index      0
genres    27
title      0
dtype: int64

* The genres column has 27 missing values; index and title have none.

In [6]:
#Printing the missing records.
missing_rows = df[df.isnull().any(axis=1)]
missing_rows

Unnamed: 0,index,genres,title
3883,3883,,Iguana
3904,3904,,Sardaarji
3978,3978,,Sharkskin
4014,4014,,"The Book of Mormon Movie, Volume 1: The Journey"
4027,4027,,Hum To Mohabbat Karega
4197,4197,,The Algerian
4218,4218,,Crowsnest
4289,4289,,Lisa Picard Is Famous
4304,4304,,Sparkler
4317,4317,,Childless


In [7]:
# Drop the missing records
df=df.dropna()
df=df.reset_index(drop=True)
df.shape

(4666, 3)
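
Why the index is reset after dropping rows can be seen on a small, made-up frame (the `toy` data below is hypothetical). It is a minimal sketch of the same two calls used above.

```python
import pandas as pd

# Hypothetical frame with one missing genre.
toy = pd.DataFrame({
    "index": [0, 1, 2, 3],
    "genres": ["Action", None, "Drama", "Comedy"],
    "title": ["A", "B", "C", "D"],
})

dropped = toy.dropna()
labels_before = dropped.index.tolist()     # row labels keep a gap where the row was removed

cleaned = dropped.reset_index(drop=True)
labels_after = cleaned.index.tolist()      # row labels are contiguous again

# The stored 'index' column is ordinary data and keeps the original numbering.
stale_column = cleaned["index"].tolist()

print(labels_before, labels_after, stale_column)
```

After the reset, the row labels match row positions, which is what any matrix built row by row from the cleaned frame will follow. The `index` column no longer matches those positions once a row has been dropped.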

In [9]:
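# Distinct genre tokens in the dataset.
# Note: splitting on whitespace breaks multi-word genres apart,
# so "Science Fiction" and "TV Movie" each show up as two tokens.
l=[]

for genr in df['genres'].str.split():
  for gen_l in genr:
    l.append(gen_l)

print(f'There are a total of {len(set(l))} distinct genre tokens in the dataset:\n')
list(set(l))

There are a total of 22 distinct genre tokens in the dataset: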



['Crime',
 'Music',
 'Foreign',
 'Adventure',
 'Fiction',
 'Documentary',
 'Fantasy',
 'War',
 'Action',
 'History',
 'Animation',
 'Thriller',
 'Romance',
 'Horror',
 'Science',
 'Movie',
 'TV',
 'Mystery',
 'Family',
 'Western',
 'Comedy',
 'Drama']
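
The list above contains 'Science', 'Fiction', 'TV' and 'Movie' as separate entries because the genre strings are split on whitespace. A possible way to keep multi-word genres together is sketched below. It assumes that "Science Fiction" and "TV Movie" are the only multi-word genre names, and `split_genres` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Assumption: these are the only multi-word genre names in the data.
MULTI_WORD_GENRES = ["Science Fiction", "TV Movie"]

def split_genres(genre_string, multi_word=MULTI_WORD_GENRES):
    """Split a space-separated genre string, keeping multi-word genres intact."""
    for name in multi_word:
        # Temporarily glue the words together so split() does not separate them.
        genre_string = genre_string.replace(name, name.replace(" ", "_"))
    return [token.replace("_", " ") for token in genre_string.split()]

# Two genre strings taken from the table shown earlier.
sample = pd.Series([
    "Action Adventure Fantasy Science Fiction",
    "Comedy Drama Romance TV Movie",
])

unique_genres = sorted({g for row in sample for g in split_genres(row)})
print(unique_genres)
```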

In [10]:
#Vectorization: turn each genre string into a TF-IDF feature vector.
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer=TfidfVectorizer()
feature_vectors=vectorizer.fit_transform(df['genres'])
print(feature_vectors)

  (0, 9)	0.47078448271289786
  (0, 17)	0.47078448271289786
  (0, 8)	0.5072564746038137
  (0, 1)	0.4129302770799773
  (0, 0)	0.35903119212216394
  (1, 8)	0.6798414654204339
  (1, 1)	0.5534224573590233
  (1, 0)	0.4811851676700948
  (2, 4)	0.6186047005372246
  (2, 1)	0.5929228054558979
  (2, 0)	0.5155296026840344
  (3, 18)	0.4864449153883293
  (3, 6)	0.36260910785806355
  (3, 4)	0.610656274445227
  (3, 0)	0.5089055842412188
  (4, 9)	0.5462835180571768
  (4, 17)	0.5462835180571768
  (4, 1)	0.4791513160665072
  (4, 0)	0.4166085118068071
  (5, 8)	0.6798414654204339
  (5, 1)	0.5534224573590233
  (5, 0)	0.4811851676700948
  (6, 7)	0.6242695332620867
  (6, 2)	0.7812090308238484
  (7, 9)	0.5462835180571768
  :	:
  (4654, 4)	0.6103457700864108
  (4655, 6)	1.0
  (4656, 12)	1.0
  (4657, 6)	1.0
  (4658, 12)	0.7228458198467108
  (4658, 3)	0.4515021677956472
  (4658, 18)	0.5231058336569835
  (4659, 6)	1.0
  (4660, 18)	0.4335803062242938
  (4660, 6)	0.3232024080245719
  (4660, 9)	0.594789173363792
  (4

In [12]:
#Getting the similarity matrix using Cosine Similarity
from sklearn.metrics.pairwise import cosine_similarity
similarity=cosine_similarity(feature_vectors)
print(similarity)

[[1.         0.74613936 0.42992699 ... 0.         0.         0.        ]
 [0.74613936 1.         0.57620199 ... 0.         0.         0.        ]
 [0.42992699 0.57620199 1.         ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 1.         0.30603907 0.        ]
 [0.         0.         0.         ... 0.30603907 1.         0.        ]
 [0.         0.         0.         ... 0.         0.         1.        ]]


In [13]:
print(similarity.shape)

(4666, 4666)
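
Each entry of the matrix is the cosine of the angle between two feature vectors: their dot product divided by the product of their lengths. A small check against scikit-learn, using two made-up vectors `u` and `v`, might look like this:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up feature vectors.
u = np.array([1.0, 2.0, 0.0])
v = np.array([2.0, 1.0, 0.0])

# Cosine similarity by hand: dot product over the product of the norms.
manual = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# The same value from scikit-learn, which expects 2-D inputs (one row per sample).
library = cosine_similarity(u.reshape(1, -1), v.reshape(1, -1))[0, 0]

print(manual, library)
```

The matrix is square and symmetric with ones on the diagonal, since every movie is compared with every other movie and with itself.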


In [14]:
#creating the list of all movie names
list_of_all_titles=df['title'].tolist()
print(list_of_all_titles)

['Avatar', "Pirates of the Caribbean: At World's End", 'Spectre', 'The Dark Knight Rises', 'John Carter', 'Spider-Man 3', 'Tangled', 'Avengers: Age of Ultron', 'Harry Potter and the Half-Blood Prince', 'Batman v Superman: Dawn of Justice', 'Superman Returns', 'Quantum of Solace', "Pirates of the Caribbean: Dead Man's Chest", 'The Lone Ranger', 'Man of Steel', 'The Chronicles of Narnia: Prince Caspian', 'The Avengers', 'Pirates of the Caribbean: On Stranger Tides', 'Men in Black 3', 'The Hobbit: The Battle of the Five Armies', 'The Amazing Spider-Man', 'Robin Hood', 'The Hobbit: The Desolation of Smaug', 'The Golden Compass', 'King Kong', 'Titanic', 'Captain America: Civil War', 'Battleship', 'Jurassic World', 'Skyfall', 'Spider-Man 2', 'Iron Man 3', 'Alice in Wonderland', 'X-Men: The Last Stand', 'Monsters University', 'Transformers: Revenge of the Fallen', 'Transformers: Age of Extinction', 'Oz: The Great and Powerful', 'The Amazing Spider-Man 2', 'TRON: Legacy', 'Cars 2', 'Green Lant

In [15]:
#getting movie name from the user
movie_name=input('Enter your fav movie:')

print(movie_name)

Enter your fav movie:Iron man
Iron man


In [16]:
#Finding the close movie match for the input movie name
import difflib
find_close_match=difflib.get_close_matches(movie_name, list_of_all_titles)
print(find_close_match)

['Iron Man', 'Iron Man 3', 'Iron Man 2']


In [17]:
close_match=find_close_match[0]
print(close_match)

Iron Man
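
`difflib.get_close_matches` returns an empty list when nothing is similar enough, in which case taking element `[0]` raises an `IndexError`. A guarded lookup could be sketched as follows. `closest_title` is a hypothetical helper, the short `titles` list is illustrative, and the cutoff of 0.6 is simply difflib's default.

```python
import difflib

def closest_title(query, titles, cutoff=0.6):
    """Return the best fuzzy match for query among titles, or None if none is close enough."""
    matches = difflib.get_close_matches(query, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Illustrative subset of titles.
titles = ["Iron Man", "Iron Man 2", "Iron Man 3", "Avatar"]

best = closest_title("Iron man", titles)
missing = closest_title("zzzzzz", titles)
print(best, missing)
```

The comparison is case-sensitive, so lowercasing both the query and the titles before matching may give better results for inputs typed in a different case.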


In [18]:
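#Finding the row position of the movie in the cleaned DataFrame.
# The rows of `similarity` follow df's reset index, not the 'index' column,
# which still carries the numbering from before the missing rows were dropped.
index_of_movie=df.index[df['title']==close_match][0]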
index_of_movie

67

In [19]:
# getting the similarity score for each movie
similarity_score=list(enumerate(similarity[index_of_movie]))
print(similarity_score)

[(0, 0.8617951432750771), (1, 0.46563893539088363), (2, 0.4988737631265892), (3, 0.2120143981009078), (4, 1.0), (5, 0.46563893539088363), (6, 0.0), (7, 1.0), (8, 0.244692952755389), (9, 0.46563893539088363), (10, 0.8617951432750771), (11, 0.4474917333880216), (12, 0.46563893539088363), (13, 0.37532407146790564), (14, 0.8617951432750771), (15, 0.244692952755389), (16, 1.0), (17, 0.46563893539088363), (18, 0.8173023338214698), (19, 0.46563893539088363), (20, 0.46563893539088363), (21, 0.6349398678600584), (22, 0.302495274599838), (23, 0.302495274599838), (24, 0.5751843236377824), (25, 0.0), (26, 1.0), (27, 0.9290453925250153), (28, 0.9290453925250153), (29, 0.5379003219367802), (30, 0.46563893539088363), (31, 1.0), (32, 0.244692952755389), (33, 0.9290453925250153), (34, 0.0), (35, 1.0), (36, 1.0), (37, 0.244692952755389), (38, 0.46563893539088363), (39, 1.0), (40, 0.21611604056764608), (41, 0.9290453925250153), (42, 0.0), (43, 0.7993144266420141), (44, 0.4166085118068071), (45, 0.6705809

* The movie at index 0 has a similarity score of 0.86 with respect to the input movie and the movie at index 1 scores 0.47, whereas the movie at index 4 scores 1.0, which means both movies have exactly the same genres.


In [20]:
# sorting movies based on similarity score
sorted_similarity_score=sorted(similarity_score, key=lambda x: x[1], reverse=True)
print(sorted_similarity_score)

[(4, 1.0), (7, 1.0), (16, 1.0), (26, 1.0), (31, 1.0), (35, 1.0), (36, 1.0), (39, 1.0), (47, 1.0), (51, 1.0), (52, 1.0), (56, 1.0), (67, 1.0), (78, 1.0), (84, 1.0), (90, 1.0), (93, 1.0), (99, 1.0), (100, 1.0), (109, 1.0), (156, 1.0), (167, 1.0), (172, 1.0), (180, 1.0), (181, 1.0), (191, 1.0), (205, 1.0), (227, 1.0), (228, 1.0), (231, 1.0), (239, 1.0), (257, 1.0), (396, 1.0), (461, 1.0), (477, 1.0), (489, 1.0), (501, 1.0), (502, 1.0), (505, 1.0), (571, 1.0), (1067, 1.0), (1470, 1.0), (1961, 1.0), (2330, 1.0), (2391, 1.0), (2402, 1.0), (2450, 1.0), (2854, 1.0), (2936, 1.0), (3238, 1.0), (3951, 1.0), (4022, 1.0), (148, 0.9456976068460854), (338, 0.9456976068460854), (27, 0.9290453925250153), (28, 0.9290453925250153), (33, 0.9290453925250153), (41, 0.9290453925250153), (75, 0.9290453925250153), (106, 0.9290453925250153), (120, 0.9290453925250153), (121, 0.9290453925250153), (123, 0.9290453925250153), (125, 0.9290453925250153), (147, 0.9290453925250153), (164, 0.9290453925250153), (201, 0.92

* Many movies have a perfect score of 1.0 with the input movie (the input movie itself, at index 67, is among them), so only the first 10 are shown.

In [22]:
# Top 10 movies with the highest similarity scores
print('Movies you might like: \n')
for i, (index, score) in enumerate(sorted_similarity_score[:10], start=1):
    # Look up the title by row label; df has a 0..n-1 index after reset_index
    title_from_index=df.loc[index, 'title']
    print(i, '-', title_from_index)

Movies you might like: 

1 - John Carter
2 - Avengers: Age of Ultron
3 - The Avengers
4 - Captain America: Civil War
5 - Iron Man 3
6 - Transformers: Revenge of the Fallen
7 - Transformers: Age of Extinction
8 - TRON: Legacy
9 - Star Trek Into Darkness
10 - Pacific Rim
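
The steps above can be gathered into one function. The following is a minimal sketch rather than the notebook's own code: `recommend` and the `toy` frame are hypothetical, the genre string given for Iron Man is assumed for illustration, and unlike the loop above it skips the matched movie itself.

```python
import difflib
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend(query, frame, sim_matrix, top_n=10):
    """Return up to top_n titles most similar to the closest match for query."""
    titles = frame["title"].tolist()
    matches = difflib.get_close_matches(query, titles, n=1)
    if not matches:
        return []
    # Row of the matched title; assumes frame has a 0..n-1 index.
    row = frame.index[frame["title"] == matches[0]][0]
    # Stable sort on negated scores: highest first, ties kept in row order.
    order = np.argsort(-sim_matrix[row], kind="stable")
    order = [i for i in order if i != row][:top_n]
    return frame.loc[order, "title"].tolist()

# Hypothetical four-movie frame; Iron Man's genre string is assumed.
toy = pd.DataFrame({
    "title": ["Iron Man", "John Carter", "Newlyweds", "Spectre"],
    "genres": [
        "Action Science Fiction Adventure",
        "Action Adventure Science Fiction",
        "Comedy Romance",
        "Action Adventure Crime",
    ],
})
toy_similarity = cosine_similarity(TfidfVectorizer().fit_transform(toy["genres"]))

suggestions = recommend("Iron man", toy, toy_similarity, top_n=2)
print(suggestions)
```

Returning an empty list for an unmatched query leaves it to the caller to decide how to report that no movie was found.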
