# Recommendation Systems - Content Based Filtering (Part 2)
> Part 2 of a three-part series on building a beginner's recommendation system with Python. This post provides a simple implementation of content-based filtering in Python.

- toc: true
- comments: true
- categories: [python, recommendation system, relevancy, collaborative filtering, content-based filtering, demographic filtering]

## Content Based Filtering

This type of recommendation system works by finding similarities between items. If a user has liked or wishlisted some items in the past, it tries to find similar items and recommends them to the user.

Content-based matching is also used in web search, for example by Google, to surface the webpages relevant to the search keywords. It is combined with a citation model such as PageRank (references to a webpage from other webpages) and a behavioral model (the activity on the webpage) to arrive at the final results.

We see this type of recommendation at work while searching for items in various apps. On Netflix, content-based filtering appears to carry some weight in the 'Because you watched xxx' section (read the article: https://qz.com/1059434/netflix-finally-explains-how-its-because-you-watched-recommendation-tool-works/)

![](img/Netflix3.png)

## Loading the Data

Let's load the CSV file that we saved in Part 1 of this blog!

In [54]:
import pandas as pd 
import numpy as np
import warnings
warnings.filterwarnings("ignore")

data=pd.read_csv('movies_database.csv')

Let's look at the first few rows of our data!

In [55]:
data = data[['movie_title','overview','cast','genres','keywords','director']]
data.head()

Unnamed: 0,movie_title,overview,cast,genres,keywords,director
0,Avatar,"In the 22nd century, a paraplegic Marine is di...","['Sam Worthington', 'Zoe Saldana', 'Sigourney ...","['Action', 'Adventure', 'Fantasy']","['culture clash', 'future', 'space war']",James Cameron
1,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","['Johnny Depp', 'Orlando Bloom', 'Keira Knight...","['Adventure', 'Fantasy', 'Action']","['ocean', 'drug abuse', 'exotic island']",Gore Verbinski
2,Spectre,A cryptic message from Bond’s past sends him o...,"['Daniel Craig', 'Christoph Waltz', 'Léa Seydo...","['Action', 'Adventure', 'Crime']","['spy', 'based on novel', 'secret agent']",Sam Mendes
3,The Dark Knight Rises,Following the death of District Attorney Harve...,"['Christian Bale', 'Michael Caine', 'Gary Oldm...","['Action', 'Crime', 'Drama']","['dc comics', 'crime fighter', 'terrorist']",Christopher Nolan
4,John Carter,"John Carter is a war-weary, former military ca...","['Taylor Kitsch', 'Lynn Collins', 'Samantha Mo...","['Action', 'Adventure', 'Science Fiction']","['based on novel', 'mars', 'medallion']",Andrew Stanton


We can find the similarity scores between movies based on various metadata. We will first build a model looking at movie plot summaries given in the 'overview' column and then refine our recommendations by including actor, director, genre, etc.

## Plot Description based Recommendation

### Creating a TF-IDF Vectorizer

We first need to convert each overview into its word vector. Next, we will have to find the Term Frequency - Inverse Document Frequency (TF-IDF) vector for each overview.

The TF-IDF algorithm is used to weight a word in each document and assign the importance of the word based on the following two factors:
- Term Frequency (TF): The number of times the word appears in the document (in our case, the movie plot description)
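- Inverse Document Frequency (IDF): A measure of how rare the word is across the whole corpus (in our case, the corpus of movie plot descriptions). It is typically the logarithm of the total number of documents divided by the number of documents containing the word, so words that appear in many documents receive a lower weight.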

The formula below is used for the TF-IDF calculation:
![](img/tfidf-formula.png)
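
As a rough illustration of how the two factors interact, here is a minimal pure-Python sketch on a made-up three-document corpus. It assumes the common textbook variant `tf * log(N / df)`; the corpus, the variable names and the `tf_idf` helper are hypothetical, and scikit-learn's `TfidfVectorizer` uses a smoothed IDF with L2 normalization by default, so its numbers will differ.

```python
import math
from collections import Counter

# Toy corpus standing in for movie overviews (made up for illustration)
docs = [
    "a marine is sent to an alien moon",
    "a spy uncovers a sinister organization",
    "a marine fights a sinister alien",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term
doc_freq = Counter(term for doc in tokenized for term in set(doc))

def tf_idf(term, doc_tokens):
    """Textbook TF-IDF: relative term frequency times log(N / df)."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    idf = math.log(n_docs / doc_freq[term])
    return tf * idf

first = tokenized[0]
for term in ("a", "marine", "moon"):
    print(term, round(tf_idf(term, first), 3))
```

The word "a" occurs in every document, so its IDF is log(1) = 0 and it carries no weight. "moon" occurs in only one document and therefore scores higher than "marine", which occurs in two.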

In Python, the scikit-learn library has a pre-built TF-IDF vectorizer that calculates the TF-IDF score of each word in each document's description.

Let's now build a TF-IDF matrix for our data! (Link - https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html)

In [11]:
#Import TfIdfVectorizer from scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer

#Define a TF-IDF Vectorizer Object. Remove all english stop words such as 'the', 'a', 'an'
tfidf = TfidfVectorizer(stop_words='english')

#Replace NaN with an empty string
data['overview'] = data['overview'].fillna('')

#Construct the required TF-IDF matrix by fitting and transforming the data
tfidf_matrix = tfidf.fit_transform(data['overview'])

#Output the shape of tfidf_matrix
tfidf_matrix.shape

(4803, 20978)

We can see that over 20,000 unique words are used to describe the 4803 movies in our dataset. Let's see what the TF-IDF matrix looks like!

In [72]:
#Convert the TF-IDF matrix to a pandas DataFrame to inspect the weight of each word.
#get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2.
doc_term_matrix = tfidf_matrix.toarray()
df = pd.DataFrame(doc_term_matrix, 
                  columns=tfidf.get_feature_names_out(), index=data.overview)
df.to_csv('movies_database_tfidf.csv', index=True)

In [75]:
df.head()

overview,00,000,007,07am,10,100,1000,101,108,10th,...,zuckerberg,zula,zuzu,zyklon,æon,éloigne,émigré,été,única,über
"In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy.",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"John Carter is a war-weary, former military captain who's inexplicably transported to the mysterious and exotic planet of Barsoom (Mars) and reluctantly becomes embroiled in an epic conflict. It's a world on the brink of collapse, and Carter rediscovers his humanity when he realizes the survival of Barsoom and its people rests in his hands.",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Computing Similarity Score using Cosine Similarity

With this matrix in hand, we can now compute a similarity score. We will be using the cosine similarity to calculate a numeric quantity that denotes the similarity between two movies.

Mathematically, it measures the cosine of the angle between two vectors in a multi-dimensional space. Cosine similarity is useful because two similar documents that are far apart by Euclidean distance (for example, because one is much longer than the other) may still point in nearly the same direction. The smaller the angle, the higher the cosine similarity.

![](img/cosine-similarity.png)

Reference - https://www.machinelearningplus.com/nlp/cosine-similarity/
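
To see why direction matters more than length, here is a small pure-Python sketch comparing cosine similarity with Euclidean distance. The three vectors are made-up word counts and the helper names are hypothetical; the `long_doc` vector is simply `short_doc` scaled by 10, as if the same text were repeated ten times.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

short_doc = [1, 2, 0]    # word counts of a short document
long_doc = [10, 20, 0]   # same word proportions, ten times longer
other_doc = [0, 1, 3]    # a document about something else

print(cosine(short_doc, long_doc), euclidean(short_doc, long_doc))
print(cosine(short_doc, other_doc), euclidean(short_doc, other_doc))
```

By Euclidean distance, `short_doc` is closer to `other_doc` than to `long_doc`, even though `long_doc` has exactly the same word proportions. Cosine similarity ranks them the other way round, which is the behavior we want for text.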

In [78]:
# Compute Cosine Similarity
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
print(cosine_sim)

[[1.         0.         0.         ... 0.         0.         0.        ]
 [0.         1.         0.         ... 0.02160533 0.         0.        ]
 [0.         0.         1.         ... 0.01488159 0.         0.        ]
 ...
 [0.         0.02160533 0.01488159 ... 1.         0.01609091 0.00701914]
 [0.         0.         0.         ... 0.01609091 1.         0.01171696]
 [0.         0.         0.         ... 0.00701914 0.01171696 1.        ]]


In [38]:
#Let's create a dataframe of the similarity matrix with rows and columns as movie titles

sim = pd.DataFrame(cosine_sim, 
                  columns=data.movie_title, index=data.movie_title)
sim.head()

movie_title,Avatar,Pirates of the Caribbean: At World's End,Spectre,The Dark Knight Rises,John Carter,Spider-Man 3,Tangled,Avengers: Age of Ultron,Harry Potter and the Half-Blood Prince,Batman v Superman: Dawn of Justice,...,On The Downlow,Sanctuary: Quite a Conundrum,Bang,Primer,Cavite,El Mariachi,Newlyweds,"Signed, Sealed, Delivered",Shanghai Calling,My Date with Drew
Avatar,1.0,0.0,0.0,0.024995,0.0,0.030353,0.0,0.037581,0.0,0.0,...,0.0,0.0,0.029175,0.042176,0.0,0.0,0.0,0.0,0.0,0.0
Pirates of the Caribbean: At World's End,0.0,1.0,0.0,0.0,0.033369,0.0,0.0,0.022676,0.0,0.0,...,0.0,0.0,0.006895,0.0,0.0,0.0,0.0,0.021605,0.0,0.0
Spectre,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.030949,0.02483,0.0,...,0.027695,0.0,0.0,0.0,0.017768,0.0,0.0,0.014882,0.0,0.0
The Dark Knight Rises,0.024995,0.0,0.0,1.0,0.010433,0.005145,0.012601,0.026954,0.020652,0.13374,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033864,0.042752,0.022692
John Carter,0.0,0.033369,0.0,0.010433,1.0,0.0,0.009339,0.037407,0.0,0.017148,...,0.01273,0.0,0.0,0.0,0.0,0.0,0.0,0.006126,0.0,0.0


We have now computed the similarity score of each movie with all the other movies based on plot description. Note that the similarity of a movie with itself is 1, which can be seen on the diagonal of the above matrix.

### Implementing the Recommendation System

Let's implement a recommendation system where we can input a movie title and the model returns the top 10 movies similar to the movie.

In [49]:
# Build a reverse mapping from movie titles to row indices
indices = pd.Series(data.index, index=data['movie_title']).drop_duplicates()
print(indices)

movie_title
Avatar                                         0
Pirates of the Caribbean: At World's End       1
Spectre                                        2
The Dark Knight Rises                          3
John Carter                                    4
                                            ... 
El Mariachi                                 4798
Newlyweds                                   4799
Signed, Sealed, Delivered                   4800
Shanghai Calling                            4801
My Date with Drew                           4802
Length: 4803, dtype: int64


In [50]:
# Function that takes in movie title as input and outputs most similar movies

def get_recommendations(title, cosine_sim=cosine_sim):
    # Get the index of the movie that matches the title
    idx = indices[title]

    # Get the pairwise similarity scores of all movies with that movie
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the movies in descending order of similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the scores of the 10 most similar movies, skipping the first one since it is the movie itself
    sim_scores = sim_scores[1:11]

    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]

    # Return the top 10 most similar movies
    return data['movie_title'].iloc[movie_indices]

In [57]:
get_recommendations('The Godfather')

2731     The Godfather: Part II
1873                 Blood Ties
867     The Godfather: Part III
3727                 Easy Money
3623                       Made
3125                     Eulogy
3896                   Sinister
4506            The Maid's Room
3783                        Joe
2244      The Cold Light of Day
Name: movie_title, dtype: object

We can see the model did a good job of finding the Godfather trilogy and other crime movies such as 'Blood Ties'. However, it can be improved further in the following ways:
- Including other features such as director, genre, etc.: People interested in 'The Godfather' may also be interested in other movies directed by Francis Ford Coppola. Let's try to include this information in our model.
- Using the behavior of other users: People might be interested in different genres based on the movies watched by other users. We can address this using collaborative filtering, which will be discussed in Part 3 of this blog.

## Cast, Genres and Keywords Based Recommendation

Now we will build our model based on the top 3 actors, the director, the top 3 genres and the top 3 keywords of each movie. First, let's take another look at our dataset.

In [56]:
#Let's look at our data again
data.head()

Unnamed: 0,movie_title,overview,cast,genres,keywords,director
0,Avatar,"In the 22nd century, a paraplegic Marine is di...","['Sam Worthington', 'Zoe Saldana', 'Sigourney ...","['Action', 'Adventure', 'Fantasy']","['culture clash', 'future', 'space war']",James Cameron
1,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","['Johnny Depp', 'Orlando Bloom', 'Keira Knight...","['Adventure', 'Fantasy', 'Action']","['ocean', 'drug abuse', 'exotic island']",Gore Verbinski
2,Spectre,A cryptic message from Bond’s past sends him o...,"['Daniel Craig', 'Christoph Waltz', 'Léa Seydo...","['Action', 'Adventure', 'Crime']","['spy', 'based on novel', 'secret agent']",Sam Mendes
3,The Dark Knight Rises,Following the death of District Attorney Harve...,"['Christian Bale', 'Michael Caine', 'Gary Oldm...","['Action', 'Crime', 'Drama']","['dc comics', 'crime fighter', 'terrorist']",Christopher Nolan
4,John Carter,"John Carter is a war-weary, former military ca...","['Taylor Kitsch', 'Lynn Collins', 'Samantha Mo...","['Action', 'Adventure', 'Science Fiction']","['based on novel', 'mars', 'medallion']",Andrew Stanton


### Data Cleaning

First, let's clean our data by converting all the text to lowercase and removing the spaces within each name, so that every name becomes a single token. Example: 'Christian Bale' would be converted to 'christianbale'.

In [59]:
# Function to convert all strings to lower case and strip names of spaces
def clean_data(x):
    if isinstance(x, list):
        return [str.lower(i.replace(" ", "")) for i in x]
    else:
        #Check if director exists. If not, return empty string
        if isinstance(x, str):
            return str.lower(x.replace(" ", ""))
        else:
            return ''

In [60]:
# Apply clean_data function to your features.
features = ['cast', 'keywords', 'director', 'genres']

for feature in features:
    data[feature] = data[feature].apply(clean_data)

Let's now combine all the metadata (keywords, actors, director and genres) into a single string per movie, to be fed into the count vectorizer.

In [67]:
def create_combined_features(x):
    return ' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + ' ' + x['director'] + ' ' + ' '.join(x['genres'])
data['combined_features'] = data.apply(create_combined_features, axis=1)

### Creating a Count Vectorizer

We now have a combined features column in our data. Next, we apply the count vectorizer, which builds a vocabulary from the entire corpus and counts how often each word occurs in each document.
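
To make the idea concrete, here is a minimal pure-Python sketch of what a count matrix contains. The two strings and the variable names are made up for illustration; scikit-learn's `CountVectorizer` additionally handles tokenization rules, stop words and sparse storage.

```python
from collections import Counter

# Two made-up "combined features" strings
docs = [
    "jamescameron action adventure",
    "samworthington action action",
]

# Vocabulary: every distinct token in the corpus, in sorted order
vocab = sorted({token for doc in docs for token in doc.split()})

# One row per document, one column per vocabulary word
count_matrix = []
for doc in docs:
    counts = Counter(doc.split())
    count_matrix.append([counts[word] for word in vocab])

print(vocab)
for row in count_matrix:
    print(row)
```

Each row is the raw count of every vocabulary word in that document. Unlike TF-IDF, no word is down-weighted for being common across the corpus.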

In [69]:
# Import CountVectorizer and create the count matrix
from sklearn.feature_extraction.text import CountVectorizer

count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(data['combined_features'])
count_matrix.shape

(4803, 2469)

We can see that 2469 unique words are used to describe the 4803 movies in our dataset. Let's see what the count matrix looks like!

In [76]:
#Convert the count matrix to a pandas DataFrame to inspect the word frequencies.
#get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2.
doc_term_matrix = count_matrix.toarray()
df = pd.DataFrame(doc_term_matrix, 
                  columns=count.get_feature_names_out(), index=data.combined_features)
df.to_csv('movies_database_countmatrix.csv', index=True)

In [77]:
df.head()

combined_features,aaronhann,aaronschneider,abelferrara,abrams,adambrooks,adamcarolla,adamgoldberg,adamgreen,adamjayepstein,adammarcus,...,zackward,zakpenn,zalbatmanglij,zhangyimou,zoranlisinac,àlexpastor,álexdelaiglesia,émilegaudreault,érictessier,étiennefaure
"[ ' c u l t u r e c l a s h ' , ' f u t u r e ' , ' s p a c e w a r ' ] [ ' s a m w o r t h i n g t o n ' , ' z o e s a l d a n a ' , ' s i g o u r n e y w e a v e r ' ] jamescameron [ ' a c t i o n ' , ' a d v e n t u r e ' , ' f a n t a s y ' ]",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
"[ ' o c e a n ' , ' d r u g a b u s e ' , ' e x o t i c i s l a n d ' ] [ ' j o h n n y d e p p ' , ' o r l a n d o b l o o m ' , ' k e i r a k n i g h t l e y ' ] goreverbinski [ ' a d v e n t u r e ' , ' f a n t a s y ' , ' a c t i o n ' ]",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
"[ ' s p y ' , ' b a s e d o n n o v e l ' , ' s e c r e t a g e n t ' ] [ ' d a n i e l c r a i g ' , ' c h r i s t o p h w a l t z ' , ' l é a s e y d o u x ' ] sammendes [ ' a c t i o n ' , ' a d v e n t u r e ' , ' c r i m e ' ]",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
"[ ' d c c o m i c s ' , ' c r i m e f i g h t e r ' , ' t e r r o r i s t ' ] [ ' c h r i s t i a n b a l e ' , ' m i c h a e l c a i n e ' , ' g a r y o l d m a n ' ] christophernolan [ ' a c t i o n ' , ' c r i m e ' , ' d r a m a ' ]",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
"[ ' b a s e d o n n o v e l ' , ' m a r s ' , ' m e d a l l i o n ' ] [ ' t a y l o r k i t s c h ' , ' l y n n c o l l i n s ' , ' s a m a n t h a m o r t o n ' ] andrewstanton [ ' a c t i o n ' , ' a d v e n t u r e ' , ' s c i e n c e f i c t i o n ' ]",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
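
One thing stands out in the output above: apart from the director, each `combined_features` string is split character by character. This suggests that `cast`, `keywords` and `genres` came back from the CSV as strings such as `"['Action', 'Adventure']"` rather than as Python lists, so `' '.join(...)` joined individual characters. scikit-learn's default tokenizer ignores single-character tokens, so most likely only the director names end up in the vocabulary. The sketch below assumes the columns hold stringified lists and parses them with `ast.literal_eval`; `parse_list` is a hypothetical helper, not part of the original code.

```python
import ast

def parse_list(value):
    """Turn a stringified list such as "['a', 'b']" back into a real list."""
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return []
        return parsed if isinstance(parsed, list) else []
    return []  # NaN or any other unexpected type

raw = "['culture clash', 'future', 'space war']"

# What happens when the raw string is cleaned and joined directly
broken = ' '.join(raw.replace(" ", "").lower())
print(broken)

# Parse first, then clean each name and join
fixed = ' '.join(name.replace(" ", "").lower() for name in parse_list(raw))
print(fixed)
```

With pandas, this could be applied as `data[col] = data[col].apply(parse_list)` for the three list columns before calling `clean_data`. The matrix shape and the recommendations shown in this post would then be expected to change.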


### Computing Similarity Score using Cosine Similarity

In [83]:
# Compute the Cosine Similarity matrix based on the count_matrix
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim2 = cosine_similarity(count_matrix, count_matrix)

In [84]:
#Let's create a dataframe of the similarity matrix with rows and columns as movie titles
sim = pd.DataFrame(cosine_sim2, 
                  columns=data.movie_title, index=data.movie_title)
sim.head()

movie_title,Avatar,Pirates of the Caribbean: At World's End,Spectre,The Dark Knight Rises,John Carter,Spider-Man 3,Tangled,Avengers: Age of Ultron,Harry Potter and the Half-Blood Prince,Batman v Superman: Dawn of Justice,...,On The Downlow,Sanctuary: Quite a Conundrum,Bang,Primer,Cavite,El Mariachi,Newlyweds,"Signed, Sealed, Delivered",Shanghai Calling,My Date with Drew
Avatar,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Pirates of the Caribbean: At World's End,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Spectre,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
The Dark Knight Rises,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
John Carter,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


We have now computed the similarity score of each movie with all the other movies based on actors, director, keywords and genres. Note that the similarity of a movie with itself is 1, which can be seen on the diagonal of the above matrix.

### Implementing the Recommendation System

In [85]:
# Reset index of our main DataFrame and construct reverse mapping as before
data = data.reset_index()
indices = pd.Series(data.index, index=data['movie_title'])

In [87]:
get_recommendations('The Godfather', cosine_sim2)

1018           The Cotton Club
1167                   Dracula
1209             The Rainmaker
1525            Apocalypse Now
2333     Peggy Sue Got Married
2600          New York Stories
2731    The Godfather: Part II
3012             The Outsiders
3337             The Godfather
3401                     Twixt
Name: movie_title, dtype: object

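We can see that the results are dominated by movies directed by Francis Ford Coppola, such as 'Apocalypse Now' and 'The Godfather: Part II'. Several of them, such as 'The Cotton Club', are also crime films. Note that 'The Godfather' itself appears in the list: when several movies tie at the top similarity score, the queried movie is not guaranteed to be sorted first, so skipping the first result does not always remove it.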
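
In the output above, 'The Godfather' (index 3337) shows up in its own recommendations. Slicing `sim_scores[1:11]` assumes the queried movie is always sorted first, which does not hold when other movies share the same top score. A minimal sketch of one way around this is to filter out the query index explicitly; `skip_first` mirrors the slicing approach for comparison, and both function names and the toy matrix are hypothetical.

```python
def skip_first(sim_matrix, idx, k=10):
    """Slicing approach: sort all scores and drop the first entry."""
    scores = sorted(enumerate(sim_matrix[idx]), key=lambda p: p[1], reverse=True)
    return [j for j, _ in scores[1:k + 1]]

def top_similar(sim_matrix, idx, k=10):
    """Drop the query item by index before sorting, so ties cannot leak it."""
    scores = [(j, s) for j, s in enumerate(sim_matrix[idx]) if j != idx]
    scores.sort(key=lambda p: p[1], reverse=True)
    return [j for j, _ in scores[:k]]

# Toy similarity matrix: items 0 and 1 have identical features (score 1.0)
toy_sim = [
    [1.0, 1.0, 0.2],
    [1.0, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]

print(skip_first(toy_sim, 1, k=2))   # includes item 1 itself
print(top_similar(toy_sim, 1, k=2))  # never includes item 1
```

The same filter could be applied inside `get_recommendations` by skipping the entry whose index equals `idx` before taking the top 10.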

The recommender can be improved further by adding more features, such as the production company (e.g. DC or Marvel), release date, etc.

## Endnotes

I hope this post has helped you understand how to implement content-based filtering using a sample dataset of ~5000 English movies. Feel free to play around with the code by opening it in Colab or cloning the repo on GitHub.

As we have seen, the content-based method only has to analyze the items and a single user’s profile to make a recommendation, which makes the process less cumbersome. Content-based filtering can therefore produce reliable results even with few users in the system. However, if the content doesn’t contain enough information to discriminate between items precisely, the recommendations risk being imprecise. This can be partly overcome with collaborative filtering, which provides recommendations based on similarities in the purchase behavior of users. We will discuss this method in the last part of this blog series.

If you have any comments or suggestions, please comment below or reach out to me on [Twitter](https://twitter.com/rahulsingla0959) or [LinkedIn](https://www.linkedin.com/in/rahul-singla1/).