Many websites recommend content similar to what you are already watching or reading. Recommending items based on the content a user is currently consuming is a technique for building recommendation systems known as content-based filtering.

Many popular news websites use content-based recommendation systems. These systems measure the similarity between the article you are reading and the other articles on the site, then recommend the most similar ones.

In [2]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity
import plotly.express as px
import plotly.graph_objects as go


In [3]:
data = pd.read_csv("/content/News.csv")
data.head()

Unnamed: 0,ID,News Category,Title,Summary
0,N88753,lifestyle,"The Brands Queen Elizabeth, Prince Charles, an...","Shop the notebooks, jackets, and more that the..."
1,N45436,news,Walmart Slashes Prices on Last-Generation iPads,Apple's new iPad releases bring big deals on l...
2,N23144,health,50 Worst Habits For Belly Fat,These seemingly harmless habits are holding yo...
3,N86255,health,Dispose of unwanted prescription drugs during ...,
4,N93187,news,The Cost of Trump's Aid Freeze in the Trenches...,Lt. Ivan Molchanets peeked over a parapet of s...


In [4]:
# Types of News Categories
categories = data["News Category"].value_counts()
label = categories.index
counts = categories.values
# x and y are passed as arrays, so no DataFrame argument is needed
figure = px.bar(x=label, y=counts,
                title="Types of News Categories")
figure.show()

There are two ways to build a recommendation system using this dataset:

The first is to use the News Category column as the feature for finding similarities. These recommendations may not hold the user's attention for long. Suppose a user is reading about a cricket match and is recommended news about other sports such as wrestling, hockey, or football: the category matches, but the articles may have little to do with what the user is actually reading.

The second is to use the title or the summary as the feature for finding similarities. This gives more relevant recommendations, because they are based on the content of the article the user is already reading.

So we will compare news articles by their title or summary. Here I will use the Title column. If you wish to use the Summary column instead, first drop the rows with null values, as that column contains more than 5000 of them.
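
If you go with the summary column, the preparation step might look like the following minimal sketch. The small DataFrame is made-up stand-in data (not rows from News.csv), and `sample` and `summary_data` are names introduced only for this illustration.

```python
import pandas as pd

# Made-up stand-in rows; the real data comes from News.csv.
sample = pd.DataFrame({
    "Title": ["Title A", "Title B", "Title C"],
    "Summary": ["First summary text", None, "Third summary text"],
})

# Drop rows whose Summary is missing, then renumber the rows so that
# row positions in a similarity matrix line up with the DataFrame index.
summary_data = sample.dropna(subset=["Summary"]).reset_index(drop=True)
feature = summary_data["Summary"].tolist()
print(len(sample), "rows before,", len(summary_data), "rows after")
```

Resetting the index matters here because the lookup from a title to a row position assumes the index runs from 0 to n-1 without gaps.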

In [None]:
feature = data["Title"].tolist()
tfidf = text.TfidfVectorizer(input='content', stop_words="english")
tfidf_matrix = tfidf.fit_transform(feature)
similarity = cosine_similarity(tfidf_matrix)
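
To see what the similarity matrix contains, here is a small sketch on three made-up titles (they are not taken from the dataset). Each title becomes a TF-IDF vector, and the cosine similarity between two vectors is higher when the titles share more distinctive words.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up titles for illustration only.
titles = [
    "Apple iPad prices drop",
    "Walmart cuts iPad prices",
    "Healthy habits for better sleep",
]

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(titles)   # one row per title, one column per term
scores = cosine_similarity(matrix)     # square matrix of pairwise similarities

# Titles 0 and 1 share the terms "ipad" and "prices";
# title 2 shares no terms with title 0, so that score is zero.
print(scores.round(2))
```

Entry `scores[i][j]` is the similarity between title `i` and title `j`, so each title has a similarity of 1 with itself along the diagonal.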

In [None]:
indices = pd.Series(data.index, index=data['Title'])
indices = indices[~indices.index.duplicated(keep='first')]  # one row per title

In [None]:
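def news_recommendation(Title, similarity = similarity):
    # Row position of the requested article
    index = indices[Title]
    # Pair every article's position with its similarity to that article
    similarity_scores = list(enumerate(similarity[index]))
    # Most similar first
    similarity_scores = sorted(similarity_scores,
                               key=lambda x: x[1], reverse=True)
    # Leave out the article itself, then keep the 10 closest matches
    similarity_scores = [s for s in similarity_scores if s[0] != index][:10]
    newsindices = [i[0] for i in similarity_scores]
    return data['Title'].iloc[newsindices]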

print(news_recommendation("Walmart Slashes Prices on Last-Generation iPads"))
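
The full similarity matrix has one row and one column per article, so its memory use grows with the square of the number of articles. One possible alternative, sketched here rather than taken from the notebook above, is to compare only the requested title against all the others at query time. The sample titles and the `recommend_on_demand` helper are hypothetical and exist only for this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up stand-in titles; the real ones come from News.csv.
titles = pd.Series([
    "Walmart cuts iPad prices",
    "Apple iPad prices drop",
    "New iPad deals this week",
    "Healthy habits for better sleep",
])

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(titles)

def recommend_on_demand(position, top_n=2):
    """Hypothetical helper: rank all titles against one row of the matrix."""
    # Compare a single row with every row: a 1 x n result instead of n x n.
    row = tfidf_matrix[position:position + 1]
    scores = cosine_similarity(row, tfidf_matrix).ravel()
    ranked = np.argsort(-scores, kind="stable")   # highest score first
    ranked = ranked[ranked != position][:top_n]   # leave out the query itself
    return titles.iloc[ranked]

print(recommend_on_demand(0))
```

This trades a little extra work per query for a much smaller memory footprint, which may be worthwhile when the dataset has many thousands of articles.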