## Article Recommendation System using Cosine Similarity

In this notebook, I follow Aman Kharwal's tutorial and dataset to make an article recommendation system, adding in my own notes along the way. The tutorial can be found at:

https://thecleverprogrammer.com/2021/11/10/article-recommendation-system-with-machine-learning/

First, let's import the necessary packages and load the dataset.

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/articles.csv", encoding='latin1')
data.head()

Unnamed: 0,Article,Title
0,Data analysis is the process of inspecting and...,Best Books to Learn Data Analysis
1,The performance of a machine learning algorith...,Assumptions of Machine Learning Algorithms
2,You must have seen the news divided into categ...,News Classification with Machine Learning
3,When there are only two classes in a classific...,Multiclass Classification Algorithms in Machin...
4,The Multinomial Naive Bayes is one of the vari...,Multinomial Naive Bayes in Machine Learning


In [2]:
data.describe()

Unnamed: 0,Article,Title
count,34,34
unique,33,33
top,You must have seen the news divided into categ...,News Classification with Machine Learning
freq,2,2


So there are 34 articles, though only 33 are unique: one article appears twice. Next, we convert the 'Article' column into a list, then use the `TfidfVectorizer` class to transform the articles' text into a tf-idf representation.

In [3]:
articles = data['Article'].tolist()
# `input` expects 'filename', 'file', or 'content' (the default), so the documents are passed to fit_transform instead
uni_tfidf = text.TfidfVectorizer(stop_words="english")
uni_matrix = uni_tfidf.fit_transform(articles)
uni_matrix

<34x407 sparse matrix of type '<class 'numpy.float64'>'
	with 846 stored elements in Compressed Sparse Row format>
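
To make the tf-idf step less of a black box, here is a minimal from-scratch sketch in plain Python. It assumes the smoothed idf formula `ln((1 + n) / (1 + df)) + 1` and L2 row normalisation, which are, to my understanding, the `TfidfVectorizer` defaults. The stop-word set, the toy documents, and the helper names `tokenize` and `tfidf_rows` are all made up for illustration and are not part of the tutorial.

```python
import math
import re
from collections import Counter

# Tiny stand-in stop-word list (an assumption; scikit-learn's "english" list is much longer).
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "in", "to"}

def tokenize(doc):
    """Lowercase, keep tokens of two or more word characters, drop stop words."""
    tokens = re.findall(r"\b\w\w+\b", doc.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf_rows(docs):
    """Return (vocabulary, rows); each row is an L2-normalised tf-idf vector."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocab, rows

# Hypothetical toy corpus, not taken from the dataset.
toy_docs = [
    "Naive Bayes is a classification algorithm",
    "Clustering is an unsupervised algorithm",
    "Bayes theorem in classification",
]
vocab, rows = tfidf_rows(toy_docs)
print(vocab)
```

In the first toy document, "naive" appears in only one document and "algorithm" in two, so "naive" receives the larger weight. This is the behaviour that lets rare, topic-specific words dominate the similarity scores.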

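Next, we apply cosine similarity to `uni_matrix` to measure how similar the articles' text is. Entry (i, j) of the result is the similarity between articles i and j, so the matrix is symmetric with ones on the diagonal.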

In [4]:
uni_sim = cosine_similarity(uni_matrix)
uni_sim

array([[1.        , 0.02858003, 0.02014231, ..., 0.12022323, 0.00455773,
        0.02511323],
       [0.02858003, 1.        , 0.07651482, ..., 0.30365338, 0.27795728,
        0.00383369],
       [0.02014231, 0.07651482, 1.        , ..., 0.08401534, 0.05252305,
        0.03233971],
       ...,
       [0.12022323, 0.30365338, 0.08401534, ..., 1.        , 0.12620279,
        0.04275628],
       [0.00455773, 0.27795728, 0.05252305, ..., 0.12620279, 1.        ,
        0.02113943],
       [0.02511323, 0.00383369, 0.03233971, ..., 0.04275628, 0.02113943,
        1.        ]])
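
As a sanity check on what these numbers mean, the sketch below computes cosine similarity by hand: the dot product of two vectors divided by the product of their lengths. The helper names `cosine` and `similarity_matrix` and the toy vectors are hypothetical, and returning 0.0 for an all-zero vector is a choice made here for simplicity.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat an empty document as similar to nothing
    return dot / (norm_u * norm_v)

def similarity_matrix(rows):
    """Pairwise cosine similarity between all rows."""
    return [[cosine(u, v) for v in rows] for u in rows]

# Hypothetical toy vectors standing in for three rows of a tf-idf matrix.
toy_rows = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 2.0],
]
sim = similarity_matrix(toy_rows)
for row in sim:
    print([round(value, 3) for value in row])
```

Rows 0 and 2 share no non-zero component, so their similarity is 0, while rows 1 and 2 overlap in the last component and score about 0.707. As in `uni_sim`, the result is symmetric and each row is perfectly similar to itself.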

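Finally, we define a function `recommend_articles` that accepts a numpy array (one row of `uni_sim`) and returns a comma-separated string of the titles of the four articles most similar to the article that row corresponds to. The slice `[-5:-1]` takes the five highest-scoring indices and drops the last one, which is normally the article itself (similarity 1). The titles are listed from least to most similar.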

In [6]:
def recommend_articles(x):
    return ', '.join(data['Title'].loc[x.argsort()[-5:-1]])
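
The slice `x.argsort()[-5:-1]` keeps four indices and relies on the article itself landing in the last position of the ascending sort. Since `describe()` reports one duplicated article, ties at similarity 1 are possible, and in that case it may be the duplicate rather than the article itself that gets dropped. The sketch below is a hedged alternative, not the tutorial's method: a hypothetical helper `top_similar`, written in plain Python, that skips the article's own index explicitly and returns the best match first.

```python
def top_similar(sim_row, own_index, k=4):
    """Return the indices of the k most similar items, best first, skipping own_index."""
    # Sort all indices by similarity, highest first (ties keep their original order).
    order = sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)
    return [j for j in order if j != own_index][:k]

# Hypothetical similarity row for article 0 (values made up for illustration).
toy_row = [1.0, 0.2, 0.9, 0.5, 0.1, 0.7]
print(top_similar(toy_row, own_index=0))

# With an exact duplicate at index 1, the article's own index is still the one excluded.
row_with_duplicate = [1.0, 1.0, 0.3]
print(top_similar(row_with_duplicate, own_index=0, k=2))
```

Using this in the notebook would mean passing the row index alongside each row, for example with `enumerate(uni_sim)`, and mapping the returned indices to `data['Title']`.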

We append the recommended articles to the data table as a new column.

In [7]:
data['Recommended Articles'] = [recommend_articles(x) for x in uni_sim]
data.head()

Unnamed: 0,Article,Title,Recommended Articles
0,Data analysis is the process of inspecting and...,Best Books to Learn Data Analysis,"Introduction to Recommendation Systems, Best B..."
1,The performance of a machine learning algorith...,Assumptions of Machine Learning Algorithms,"Applications of Deep Learning, Best Books to L..."
2,You must have seen the news divided into categ...,News Classification with Machine Learning,"Language Detection with Machine Learning, Appl..."
3,When there are only two classes in a classific...,Multiclass Classification Algorithms in Machin...,"Assumptions of Machine Learning Algorithms, Be..."
4,The Multinomial Naive Bayes is one of the vari...,Multinomial Naive Bayes in Machine Learning,"Assumptions of Machine Learning Algorithms, Me..."


Let's check the recommended articles for "Multinomial Naive Bayes in Machine Learning".

In [27]:
print(data['Title'][4])
print(data['Recommended Articles'][4])

Multinomial Naive Bayes in Machine Learning
Assumptions of Machine Learning Algorithms, Mean Shift Clustering in Machine Learning, Language Detection with Machine Learning, Naive Bayes Algorithm in Machine Learning


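One of the recommended titles also contains the words "Naive Bayes". Note that similarity is computed from the 'Article' text, not from the titles, so the other recommendations are likely driven by general machine learning vocabulary that their article text shares with this one.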
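
One way to test a guess like this is to look at which shared terms contribute most to the score. For L2-normalised tf-idf vectors, cosine similarity is the sum over shared terms of the product of their weights, so ranking those products shows what drives a recommendation. The helper `top_shared_terms` and the weights below are hypothetical and made up for illustration; they are not values from the dataset.

```python
def top_shared_terms(weights_a, weights_b, k=3):
    """Rank terms present in both documents by their contribution to the dot product."""
    shared = set(weights_a) & set(weights_b)
    contribution = {t: weights_a[t] * weights_b[t] for t in shared}
    # Largest contribution first; ties broken alphabetically for a stable result.
    return sorted(contribution, key=lambda t: (-contribution[t], t))[:k]

# Hypothetical term -> tf-idf weight mappings for two articles.
doc_a = {"naive": 0.6, "bayes": 0.6, "learning": 0.3, "machine": 0.3, "multinomial": 0.3}
doc_b = {"naive": 0.5, "bayes": 0.5, "learning": 0.4, "machine": 0.4, "gaussian": 0.4}
print(top_shared_terms(doc_a, doc_b))
```

In the real notebook, the per-term weights could be read from the rows of `uni_matrix` together with the fitted vectorizer's vocabulary.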