TextRank Method

In [6]:
import pandas as pd
import nltk
import networkx as nx
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords

nltk.download('punkt')
nltk.download('stopwords')

def preprocess_text(text):
    """Split raw text into a list of sentences."""
    return sent_tokenize(text)

def build_similarity_matrix(sentences, stop_words):
    """Return the pairwise cosine similarity of the sentences' TF-IDF vectors."""
    vectorizer = TfidfVectorizer(stop_words=stop_words)
    tfidf_matrix = vectorizer.fit_transform(sentences)
    similarity_matrix = cosine_similarity(tfidf_matrix)
    return similarity_matrix

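def text_rank(sentences, similarity_matrix):
    """Score sentences with PageRank and return (score, sentence) pairs, best first."""
    # Each sentence is a node; edge weights are the cosine similarities.
    nx_graph = nx.from_numpy_array(similarity_matrix)
    scores = nx.pagerank(nx_graph)
    ranked_sentences = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
    return ranked_sentences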

def summarize_text(text, num_sentences=2):
    """Build an extractive summary from the top-ranked sentences."""
    stop_words = list(stopwords.words('english'))  # TfidfVectorizer takes the stop words as a list
    sentences = preprocess_text(text)
    similarity_matrix = build_similarity_matrix(sentences, stop_words)
    ranked_sentences = text_rank(sentences, similarity_matrix)
    # Sentences are joined in rank order, not in their original document order.
    summary = ' '.join(sent[1] for sent in ranked_sentences[:num_sentences])
    return summary


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
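
The nx.pagerank call inside text_rank is what turns the similarity matrix into sentence scores. As a rough illustration of that step only, the sketch below runs a damped power iteration on a small hand-made matrix in plain Python. It is not NetworkX's implementation: the function name pagerank_power_iteration, the toy matrix, and the tolerance and iteration limits are assumptions chosen for this example; only the damping factor 0.85 matches the NetworkX default.

```python
def pagerank_power_iteration(weights, damping=0.85, tol=1e-8, max_iter=200):
    """Score the nodes of a weighted graph given as a square list-of-lists matrix."""
    n = len(weights)
    # Row-normalise so each row is a probability distribution over neighbours.
    transition = []
    for row in weights:
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 / n] * n)  # node with no edges: jump uniformly
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = [
            (1.0 - damping) / n
            + damping * sum(scores[j] * transition[j][i] for j in range(n))
            for i in range(n)
        ]
        # Stop once the scores have essentially stopped changing.
        if sum(abs(a - b) for a, b in zip(new_scores, scores)) < tol:
            return new_scores
        scores = new_scores
    return scores

# Hypothetical similarity matrix for three sentences: sentence 1 overlaps
# with both of the others, while sentences 0 and 2 share nothing.
toy_similarity = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
toy_scores = pagerank_power_iteration(toy_similarity)
print(toy_scores)
```

In this toy case the middle sentence receives the highest score because it is similar to both of the others, which is the intuition behind picking the top-ranked sentences as the summary.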


In [7]:
data1 = pd.read_csv('test.csv')

In [8]:
for i in range(20):
    generated_summary = summarize_text(data1.article[i], num_sentences=2)
    print(f"Generated Summary: {generated_summary}")
    print(f"Reference Summary: {data1.highlights[i]}")
    # Compare lengths in whitespace-separated words: generated vs. reference.
    generated_words = generated_summary.split()
    reference_words = data1.highlights[i].split()
    print(len(generated_words), len(reference_words))
    print("\n")

Generated Summary: But these tests are conducted using planes with 31 inches between each row of seats, a standard which on some airlines has decreased, reported the Detroit News. British Airways has a seat pitch of 31 inches, while easyJet has 29 inches, Thomson's short haul seat pitch is 28 inches, and Virgin Atlantic's is 30-31.
Reference Summary: Experts question if  packed out planes are putting passengers at risk .
U.S consumer advisory group says minimum space must be stipulated .
Safety tests conducted on planes with more leg room than airlines offer .
54 36


Generated Summary: Next level drunk: Intoxicated Rahul Kumar, 17, climbed into the lions' enclosure at a zoo in Ahmedabad and began running towards the animals shouting 'Today I kill a lion!' Brave fool: Fortunately, Mr Kumar  fell into a moat as he ran towards the lions and could be rescued by zoo security staff before reaching the animals (stock image) Kumar later explained: 'I don't really know why I did it.
Reference 
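
The loop above prints only the word counts of the two summaries, which says nothing about how much content they share. One possible way to get a rough signal is a unigram overlap score in the spirit of ROUGE-1. The sketch below is an assumption for illustration, not an official ROUGE implementation: the function name unigram_overlap is hypothetical, and it tokenises by lowercasing and splitting on whitespace, so punctuation stays attached to words.

```python
from collections import Counter

def unigram_overlap(generated, reference):
    """Return (precision, recall, f1) of word overlap between two texts."""
    gen_counts = Counter(generated.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((gen_counts & ref_counts).values())
    precision = overlap / max(sum(gen_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up sentences, used only to show the call.
p, r, f = unigram_overlap("the cat sat on the mat", "the cat lay on a mat")
print(round(p, 3), round(r, 3), round(f, 3))
```

Inside the loop it could be called as unigram_overlap(generated_summary, data1.highlights[i]), assuming both values are plain strings.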

In [9]:
text='''Nearly two decades ago, Facebook exploded on college campuses as a site for students to stay in touch. Then came Twitter, where people posted about what they had for breakfast, and Instagram, where friends shared photos to keep up with one another.
Today,Instagram and Facebook feeds are full of ads and sponsored posts. TikTok and Snapchat are stuffed with videos from influencers promoting dish soaps and dating apps. And soon, Twitter posts that gain the most visibility will come mostly from subscribers who pay for the exposure and other perks.
Social media is, in many ways, becoming less social. The kinds of posts where people update friends and family about their lives have become harder to see over the years as the biggest sites have become increasingly “corporatized.” Instead of seeing messages and photos from friends and relatives about their holidays or fancy dinners, users of Instagram, Facebook, TikTok, Twitter and Snapchat now often view professionalized content from brands, influencers and others that pay for placement.'''
reference_summary="""
The text discusses the evolution of social media from platforms for personal connection and updates to spaces dominated by ads, sponsored posts, and professional content. Initially, sites like Facebook and Instagram were used by friends and family to share personal updates and photos. However, over the years, these platforms, along with Twitter, TikTok, and Snapchat, have become increasingly commercialized. As a result, personal posts are now overshadowed by content from brands, influencers, and paid promotions, making social media less social and more corporate."""
summary = summarize_text(text, num_sentences=3)
print("generated_summary is:\n", summary)
print("reference_summary is:", reference_summary)

generated_summary is:
 The kinds of posts where people update friends and family about their lives have become harder to see over the years as the biggest sites have become increasingly “corporatized.” Instead of seeing messages and photos from friends and relatives about their holidays or fancy dinners, users of Instagram, Facebook, TikTok, Twitter and Snapchat now often view professionalized content from brands, influencers and others that pay for placement. Today,Instagram and Facebook feeds are full of ads and sponsored posts. Then came Twitter, where people posted about what they had for breakfast, and Instagram, where friends shared photos to keep up with one another.
reference_summary is: 
The text discusses the evolution of social media from platforms for personal connection and updates to spaces dominated by ads, sponsored posts, and professional content. Initially, sites like Facebook and Instagram were used by friends and family to share personal updates and photos. However,