TF-IDF Summarization

In [1]:
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords
import numpy as np

In [2]:
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [3]:
def preprocess_text(text):
    # Split the text into sentences with NLTK's Punkt tokenizer.
    sentences = sent_tokenize(text)
    return sentences

In [4]:
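def compute_tfidf(sentences):
    # Treat each sentence as a document; drop English stop words.
    stop_words = stopwords.words('english')
    vectorizer = TfidfVectorizer(stop_words=stop_words)
    # Sparse matrix: one row per sentence, one column per term.
    tfidf_matrix = vectorizer.fit_transform(sentences)
    return tfidf_matrix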
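
To make the weights in `tfidf_matrix` less opaque, the following is a minimal pure-Python sketch of what `TfidfVectorizer` computes with its default settings: raw term counts, a smoothed IDF of `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row. The helper name `tfidf_rows` and the whitespace tokenization are assumptions made for illustration; the real vectorizer also lowercases, applies its own token pattern, and removes the stop words passed to it.

```python
import math
from collections import Counter


def tfidf_rows(tokenized_sentences):
    """Return one {term: weight} dict per sentence (hypothetical helper)."""
    n = len(tokenized_sentences)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for tokens in tokenized_sentences for term in set(tokens))
    rows = []
    for tokens in tokenized_sentences:
        counts = Counter(tokens)
        # Raw count times smoothed inverse document frequency.
        weights = {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # L2-normalize so each sentence vector has unit length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return rows


sentences = ["cats chase mice", "dogs chase cats", "mice eat cheese"]
rows = tfidf_rows([s.split() for s in sentences])
for sentence, row in zip(sentences, rows):
    print(sentence, {t: round(w, 3) for t, w in row.items()})
```

Terms that occur in fewer sentences (such as "dogs" here) receive a larger weight than terms shared across sentences.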

In [5]:
def rank_sentences(sentences, tfidf_matrix):
    # Score each sentence by the sum of its TF-IDF weights.
    sentence_scores = np.sum(tfidf_matrix.toarray(), axis=1)
    # Indices of sentences from highest to lowest score.
    ranked_sentence_indices = np.argsort(sentence_scores)[::-1]
    ranked_sentences = [sentences[i] for i in ranked_sentence_indices]
    return ranked_sentences
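
One property of this scoring is worth keeping in mind: because each row is L2-normalized by default and `rank_sentences` sums the row, a sentence with more distinct terms tends to score higher. The sketch below illustrates this under the simplifying assumption that all `k` terms in a sentence carry equal weight, in which case the row sum works out to `sqrt(k)`. The helper name `row_sum_for_equal_weights` is hypothetical and is not part of the notebook's pipeline.

```python
import math


def row_sum_for_equal_weights(k):
    """Sum of an L2-normalized row with k equal non-zero weights (hypothetical helper)."""
    raw = [1.0] * k  # k terms with identical raw weight (an assumption)
    norm = math.sqrt(sum(w * w for w in raw))
    return sum(w / norm for w in raw)  # mathematically equal to sqrt(k)


for k in (1, 4, 9, 16):
    print(k, round(row_sum_for_equal_weights(k), 3))
```

This may partly explain why long sentences are often selected; whether that helps or hurts summary quality depends on the data.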

In [6]:
def summarize_text(text, num_sentences=2):
    sentences = preprocess_text(text)
    tfidf_matrix = compute_tfidf(sentences)
    ranked_sentences = rank_sentences(sentences, tfidf_matrix)
    # Join the top-ranked sentences (in score order, not document order).
    summary = ' '.join(ranked_sentences[:num_sentences])
    return summary
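
`summarize_text` joins sentences in score order, so the summary can present them out of their original sequence (in the social media example further down, the opening sentence of the text ends up last). A common variation keeps the top-scoring sentences but emits them in document order. The sketch below shows only that selection step; `select_in_document_order` and the hard-coded scores are hypothetical stand-ins for what the ranking step would provide.

```python
def select_in_document_order(sentences, scores, num_sentences=2):
    """Pick the highest-scoring sentences, then restore their original order."""
    # Indices sorted by descending score.
    by_score = sorted(range(len(sentences)), key=lambda i: -scores[i])
    # Keep the top indices and sort them back into document order.
    top = sorted(by_score[:num_sentences])
    return " ".join(sentences[i] for i in top)


sentences = ["First point.", "Minor aside.", "Key finding.", "Closing remark."]
scores = [1.4, 0.3, 2.1, 0.9]  # made-up scores for illustration
print(select_in_document_order(sentences, scores))
```

Keeping document order usually reads more naturally, at the cost of no longer showing which sentence ranked highest.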

In [7]:
import pandas as pd

In [8]:
data1=pd.read_csv('test.csv')

In [9]:
# Compare generated and reference summaries for the first 20 articles.
for i in range(20):
    generated_summary = summarize_text(data1.article[i], num_sentences=2)
    print(f"Generated Summary: {generated_summary}")
    print(f"Reference Summary: {data1.highlights[i]}")
    # Word counts of the generated and reference summaries.
    generated_words = generated_summary.split()
    reference_words = data1.highlights[i].split()
    print(len(generated_words), len(reference_words))
    print("\n")

Generated Summary: This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans. But could crowding on planes lead to more serious issues than fighting for space in the overhead lockers, crashing elbows and seat back kicking?
Reference Summary: Experts question if  packed out planes are putting passengers at risk .
U.S consumer advisory group says minimum space must be stipulated .
Safety tests conducted on planes with more leg room than airlines offer .
67 36


Generated Summary: Brave fool: Fortunately, Mr Kumar  fell into a moat as he ran towards the lions and could be rescued by zoo security staff before reaching the animals (stock image) Kumar later explained: 'I don't really know why I did it. Next level drunk: Intoxicated Rahul Kumar, 17, climbed into the lions' enclosure at a zoo in Ahmedabad 

In [10]:
text='''Nearly two decades ago, Facebook exploded on college campuses as a site for students to stay in touch. Then came Twitter, where people posted about what they had for breakfast, and Instagram, where friends shared photos to keep up with one another.
Today, Instagram and Facebook feeds are full of ads and sponsored posts. TikTok and Snapchat are stuffed with videos from influencers promoting dish soaps and dating apps. And soon, Twitter posts that gain the most visibility will come mostly from subscribers who pay for the exposure and other perks.
Social media is, in many ways, becoming less social. The kinds of posts where people update friends and family about their lives have become harder to see over the years as the biggest sites have become increasingly “corporatized.” Instead of seeing messages and photos from friends and relatives about their holidays or fancy dinners, users of Instagram, Facebook, TikTok, Twitter and Snapchat now often view professionalized content from brands, influencers and others that pay for placement.'''

reference_summary="""
The text discusses the evolution of social media from platforms for personal connection and updates to spaces dominated by ads, sponsored posts, and professional content. Initially, sites like Facebook and Instagram were used by friends and family to share personal updates and photos. However, over the years, these platforms, along with Twitter, TikTok, and Snapchat, have become increasingly commercialized. As a result, personal posts are now overshadowed by content from brands, influencers, and paid promotions, making social media less social and more corporate."""
summary=summarize_text(text,num_sentences=2)
print("generated_summary is:\n",summary)
print("reference_summary is:",reference_summary)

generated_summary is:
 The kinds of posts where people update friends and family about their lives have become harder to see over the years as the biggest sites have become increasingly “corporatized.” Instead of seeing messages and photos from friends and relatives about their holidays or fancy dinners, users of Instagram, Facebook, TikTok, Twitter and Snapchat now often view professionalized content from brands, influencers and others that pay for placement. Nearly two decades ago, Facebook exploded on college campuses as a site for students to stay in touch.
reference_summary is: 
The text discusses the evolution of social media from platforms for personal connection and updates to spaces dominated by ads, sponsored posts, and professional content. Initially, sites like Facebook and Instagram were used by friends and family to share personal updates and photos. However, over the years, these platforms, along with Twitter, TikTok, and Snapchat, have become increasingly commercialized
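
The cells above compare generated and reference summaries only by eye and by word count. As a rough quantitative check, the sketch below computes a ROUGE-1-style unigram precision, recall and F1 by hand. It is a simplification (lowercased whitespace tokens, no stemming or punctuation handling), the function name `unigram_overlap` is hypothetical, and its numbers should not be expected to match an official ROUGE implementation.

```python
from collections import Counter


def unigram_overlap(generated, reference):
    """Return (precision, recall, f1) over clipped unigram matches."""
    gen_counts = Counter(generated.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    matches = sum((gen_counts & ref_counts).values())
    precision = matches / max(sum(gen_counts.values()), 1)
    recall = matches / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if matches else 0.0
    return precision, recall, f1


generated = "social media is becoming less social"
reference = "social media has become less social and more corporate"
precision, recall, f1 = unigram_overlap(generated, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In the notebook, the same function could be called with `summary` and `reference_summary`, or inside the loop over `data1`, to get one score per article.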