# AML Project - Abstractive Text Summarization

## Code for pushing the trained model to Hugging Face

Problem Statement: Given a news article, generate a two-to-three-sentence summary and a headline for the article.

The summary should be abstractive rather than extractive. An abstractive summary is made of newly generated sentences, which need not appear verbatim in the news article, whereas an extractive summary copies sentences from it.
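One rough way to check whether a summary is abstractive is to measure how many of its n-grams never occur in the article. The sketch below is only an illustration: the helper name `novel_ngram_ratio`, the whitespace tokenization, and the toy sentences are assumptions, not part of this project.

```python
def novel_ngram_ratio(article: str, summary: str, n: int = 2) -> float:
    """Fraction of the summary's n-grams that never appear in the article."""

    def ngrams(text: str):
        # Naive lowercase whitespace tokenization (an assumption for this sketch).
        tokens = text.lower().split()
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    article_ngrams = set(ngrams(article))
    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        return 0.0
    novel = sum(1 for gram in summary_ngrams if gram not in article_ngrams)
    return novel / len(summary_ngrams)


article = "the cat sat on the mat and looked at the dog"
extractive = "the cat sat on the mat"              # copied from the article
abstractive = "a cat rested while watching a dog"  # newly worded

print(novel_ngram_ratio(article, extractive))   # every bigram is copied
print(novel_ngram_ratio(article, abstractive))  # every bigram is new
```

A ratio near 0 suggests the summary mostly copies the article, while a ratio near 1 suggests it is mostly reworded.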

### Importing necessary packages

In [2]:
import torch

# import pegasus generator and tokenizer
from transformers import PegasusForConditionalGeneration, PegasusTokenizerFast
from huggingface_hub import whoami, HfFolder, create_repo

# visualisation
# import plotly.express as px
# import plotly.graph_objects as go
# import plotly.offline as pyo

# pyo.init_notebook_mode() 

In [3]:
# log in to the Hugging Face Hub
from huggingface_hub import notebook_login
notebook_login()


### Load the trained model

In [4]:
print(torch.__version__)
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(torch_device)

2.0.1
cpu


In [7]:
# local fine-tuned checkpoint
model_path = "../../E/GH/model/checkpoint-47000"
# tokenizer shared by both models
pegasus_tokenizer = PegasusTokenizerFast.from_pretrained("google/pegasus-large")
# base model, used below to generate the summary
pegasus_large = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large").to(torch_device)
# fine-tuned model, used below to generate the headline
pegasus_finetuned = PegasusForConditionalGeneration.from_pretrained(model_path, local_files_only=True).to(torch_device)

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['model.encoder.embed_positions.weight', 'model.decoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Running the model on a test case

In [8]:
# function to get the summary of a text or a list of texts
def get_summary(tokenizer, model, x):
    # tokenize, truncating to the tokenizer's maximum input length
    x_tokenized = tokenizer(x, truncation=True, padding=True, return_tensors="pt").to(torch_device)
    print("Input X tokenized. Generating Summary ...")
    # generate() returns tensors on the model's device, so no extra .to() is needed
    y_pred_tokenized = model.generate(**x_tokenized)
    print("Summary Generated. Decoding Summary ...")
    y_pred = tokenizer.batch_decode(y_pred_tokenized, skip_special_tokens=True)
    print("Summary Decoded.")
    return y_pred

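# function to calculate the ROUGE score
# m is a metric object that exposes update() and compute()
def calculate_rouge(m, y_pred, y):
    # split predictions and references into whitespace tokens
    candidate = [i.split() for i in y_pred]
    reference = [i.split() for i in y]
    # print(candidate, reference)

    m.update((candidate, reference))

    return m.compute()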
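
The metric object `m` is never created in this notebook, so what `calculate_rouge` computes depends on the library behind it. As a rough illustration of what a ROUGE-1 score measures, here is a minimal pure-Python sketch based on unigram overlap. The function name `rouge1` and the whitespace tokenization are assumptions, and the values may differ from those of a full ROUGE implementation (no stemming, single reference only).

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall and F1 between two strings."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the cat sat", "the cat sat on the mat")
print(scores)
```

Here all three candidate tokens appear in the reference (precision 1.0), but they cover only three of its six tokens (recall 0.5).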

In [9]:
test_text = """
Elon Musk said on Tuesday that he would “reverse the permanent ban” of former President Donald J. Trump on Twitter and let him back on the social network, in one of the first specific comments by Mr. Musk, the world’s richest man, of how he would change the social media service.

Mr. Musk, who struck a deal last month to buy Twitter for $44 billion, said at a Financial Times conference that the company’s decision to bar Mr. Trump last year for tweets about the riots at the U.S. Capitol was “a mistake because it alienated a large part of the country and did not ultimately result in Donald Trump not having a voice.” He added that it was “morally wrong and flat-out stupid” and that “permanent bans just fundamentally undermine trust in Twitter.”

Mr. Musk’s remarks were a preview of the kinds of sweeping changes he might make at Twitter, which he is expected to take ownership of in the next six months. The billionaire, who also leads the electric carmaker Tesla and the rocket company SpaceX, has called himself a “free speech absolutist” and has said he is unhappy with how Twitter decides what can and cannot be posted online.

But up until Tuesday, Mr. Musk, 50, had spoken mostly in general terms and had not singled out Twitter accounts that might be affected by his takeover. He had called free speech “the bedrock of a functioning democracy” and had spoken of his desire to give people more control over their own social media feeds. By specifying that Mr. Trump could return to the platform, Mr. Musk uncorked a political firestorm.

Mr. Trump wielded Twitter for many years as both a megaphone and a cudgel, rallying his 88 million followers on issues such as immigration while also going after opponents. That avenue was cut off in January 2021 when Twitter, along with Facebook and other platforms, barred Mr. Trump from posting in the wake of the attack on the U.S. Capitol building. Twitter said at the time that Mr. Trump had violated policies and risked inciting violence among his supporters. Facebook barred Mr. Trump for similar reasons.

Mr. Trump, who has since begun a social media site called Truth Social, did not respond to a request for comment. Last month, Mr. Trump said that even with Mr. Musk buying Twitter, he did not plan to return to the platform and was “going to stay on Truth.”

Twitter declined to comment.

Derrick Johnson, the president of the N.A.A.C.P., said that free speech online needed to come with guardrails.

“Mr. Musk: Free speech is wonderful, hate speech is unacceptable,” he said. “Do not allow 45 to return to the platform. Do not allow Twitter to become a petri dish for hate speech or falsehoods that subvert our democracy.”

But Jack Dorsey, a founder and board member of Twitter, tweeted that permanent suspensions of individual users “are a failure” of the company and largely “don’t work.” Mr. Dorsey, who was chief executive of Twitter when Mr. Trump was barred, had said last year that booting the president was the right decision for the company, but backtracked on Tuesday by calling it “a business decision” and saying “it shouldn’t have been.”

Even with Mr. Musk’s comments, Mr. Trump’s potential return to Twitter remains far from assured. Mr. Musk is mercurial and has a history of saying things he does not follow through on. In 2018, he famously declared that he planned to take Tesla private and had the funding to do so, when he did not. Even his most devoted followers sometimes wonder whether his obscurantist tweets are serious or are made in jest.

Investors have also questioned whether Mr. Musk’s deal for Twitter will be completed. The company’s shares closed Tuesday at $47.26, well below the $54.20 that Mr. Musk agreed to pay for them. Mr. Musk is also still securing financing for his takeover. While venture-capital firms and some big banks have lined up to invest, the billionaire is on the hook to provide as much as $21 billion of his own cash. He has not detailed where he will obtain that money.

Mr. Musk referred to the possibility of the deal not closing on Tuesday and of nothing happening with Mr. Trump’s Twitter account. “Obviously, I don’t own Twitter yet,” he said at the Financial Times conference. “So this is not a thing that will definitely happen because what if I don’t own Twitter.”

Mr. Trump long bedeviled social media companies because of how he pushed the line on speech, sometimes spreading lies and bullying people. But the companies’ moves to bar him drew accusations, especially from conservatives, that they were engaging in censorship and were biased against Republican voices.

Mr. Musk seemed to echo some of those conservative complaints on Tuesday, accusing Twitter of “a strong left bias, because it’s based in San Francisco” and saying “victory would be that the most far-right 10 percent and the most far-left 10 percent are equally upset.”

Some of the companies have since shied from appearing as the final word on who gets to say what online. Facebook referred Mr. Trump’s case to its Oversight Board, a company-appointed panel of academics, journalists and former members of government. The board ruled that Facebook was right to bar Mr. Trump, but it said the company had not thoroughly explained its decision and should revisit an indefinite suspension.

In June, Facebook said Mr. Trump’s ban would last at least two years, keeping the former president off the site through the 2022 midterm elections.

A blockbuster deal. Elon Musk, the world’s wealthiest man, capped what seemed an improbable attempt by the famously mercurial billionaire to buy Twitter for roughly $44 billion. Here’s how the deal unfolded:

The initial offer. Mr. Musk made an unsolicited bid worth more than $40 billion for the influential social network, saying that he wanted to make Twitter a private company and that he wanted people to be able to speak more freely on the service.

The response. Twitter’s board countered Mr. Musk’s offer with a defense mechanism known as a “poison pill.” This well-worn corporate tactic makes a company less palatable to a potential acquirer by making it more expensive for them to buy shares above a certain threshold.

Securing financing. Though his original offer had scant details and was received skeptically by Wall Street, Mr. Musk has been moving swiftly to secure commitments worth $46.5 billion to finance his bid, putting pressure on Twitter’s board to take his advances seriously.

Striking a deal. With the financing in place, Twitter’s board met with Mr. Musk to discuss his offer. The two sides soon reached a deal, with the social media company agreeing to sell itself for $54.20 a share.

What’s next? Shareholders will vote on the offer, which will also be reviewed by regulators. The deal is expected to take three to six months to close. In the meantime, scrutiny is likely to be intense and several questions remain about Mr. Musk’s plans for the company.

“Free speech is essential to a functioning democracy. Do you believe Twitter rigorously adheres to this principle?” he asked.

At another point, Mr. Musk wondered, “Is a new platform needed?”

After Mr. Musk inked the deal to buy Twitter last month, he reiterated his free speech stance and said he would take the company private to improve the service. He added that he hoped to increase trust by making Twitter’s technology more transparent, defeating the bots that spam people on the platform and “authenticating all humans.” He also said he hoped his worst critics would remain on Twitter, because “that is what free speech means.”

On Tuesday, he became more specific. “Permanent bans should be extremely rare,” Mr. Musk said, adding that they should be reserved “for accounts that are bots or spam” and “where there’s just no legitimacy to the account at all.”

But he also said that “doesn’t mean that somebody gets to say whatever they want to say.” Mr. Musk said he was in favor of temporary suspensions of accounts “if they say something that is illegal or otherwise just, you know, destructive to the world.” He also raised the idea that a particular tweet could be “made invisible or have very limited traction.”

Apart from Mr. Trump, others who have been indefinitely barred from Twitter for violating its policies include Representative Marjorie Taylor Greene, Republican of Georgia, the far-right figure Milo Yiannopoulos, and celebrities like Tila Tequila. Twitter also labels tweets that are factually inaccurate or that might incite violence.

Inside Twitter on Tuesday, some employees worried that Mr. Musk’s changes would unwind years of work on the company’s policies and unravel millions of dollars of investment in content moderation to stem abuse on the platform, four current and former employees said. Some said they hoped Mr. Musk would lose interest in the site, while others have begun reaching out to recruiters and friends at other tech companies for new opportunities.

Still others were excited at the prospect of Mr. Musk in charge, the current and former employees said. Mr. Musk has pitched investors on quintupling Twitter’s revenue and on the service topping more than 900 million users by 2028, up from 217 million or so today.
"""

In [10]:
%%time
text_summary = get_summary(pegasus_tokenizer, pegasus_large, test_text)
print(text_summary)

Input X tokenized. Generating Summary ...
Summary Generated. Decoding Summary ...
Summary Decoded.
['Elon Musk said on Tuesday that he would “reverse the permanent ban” of former President Donald J. Trump on Twitter and let him back on the social network, in one of the first specific comments by Mr. Musk, who struck a deal last month to buy Twitter for $44 billion, said at a Financial Times conference that the company’s decision to bar Mr. The billionaire, who also leads the electric carmaker Tesla and the rocket company SpaceX, has called himself a “free speech absolutist” and has said he is unhappy with how Twitter decides what can and cannot be posted online.']
CPU times: user 33.5 s, sys: 5.31 s, total: 38.8 s
Wall time: 29.6 s
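
Because `get_summary` tokenizes with `truncation=True`, any part of the article beyond the tokenizer's maximum input length is dropped before generation, which may explain why the summary above covers only the opening paragraphs. One possible workaround is to split a long article into paragraph-aligned chunks and summarize each chunk separately. The sketch below is not part of the original pipeline: the helper `chunk_paragraphs` and the `max_words` budget are hypothetical, and word counts are only a crude stand-in for token counts.

```python
def chunk_paragraphs(text: str, max_words: int = 400) -> list:
    """Group blank-line-separated paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words still becomes its own chunk,
    so it may be truncated later by the tokenizer.
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        words = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


demo_chunks = chunk_paragraphs("a b c\n\nd e\n\nf g h i", max_words=5)
print(demo_chunks)
```

Each chunk could then be passed to `get_summary` as one element of a list, since that function accepts either a single text or a list of texts.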


In [11]:
%%time
text_headline = get_summary(pegasus_tokenizer, pegasus_finetuned, text_summary)
print(text_headline)

Input X tokenized. Generating Summary ...
Summary Generated. Decoding Summary ...
Summary Decoded.
['Will reverse permanent ban on Trump on Twitter: Elon Musk']
CPU times: user 2.69 s, sys: 467 ms, total: 3.16 s
Wall time: 2.81 s
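
The two cells above form a two-stage pipeline: `pegasus_large` produces the summary, and the fine-tuned checkpoint turns that summary into a headline. A minimal sketch of wrapping both stages in one call follows. The function `summarize_and_headline` and its parameter names are hypothetical, and the demo uses stand-in lambdas so that it runs without loading any model.

```python
from typing import Callable, Dict, List


def summarize_and_headline(
    text: str,
    summarize: Callable[[str], List[str]],
    make_headline: Callable[[List[str]], List[str]],
) -> Dict[str, List[str]]:
    """Run the summary stage, then feed its output to the headline stage."""
    summaries = summarize(text)
    headlines = make_headline(summaries)
    return {"summary": summaries, "headline": headlines}


# In this notebook the stages could be wired up like this (not run here):
# result = summarize_and_headline(
#     test_text,
#     summarize=lambda x: get_summary(pegasus_tokenizer, pegasus_large, x),
#     make_headline=lambda s: get_summary(pegasus_tokenizer, pegasus_finetuned, s),
# )

# Stand-in stages so the sketch runs without any model.
demo = summarize_and_headline(
    "long article text",
    summarize=lambda x: [x.upper()],
    make_headline=lambda s: [s[0][:4]],
)
print(demo)
```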


In [12]:
whoami()

{'type': 'user',
 'id': '62591773a68a1fd2395f75c4',
 'name': 'kubershahi',
 'fullname': 'Kuber Shahi',
 'email': 'shahikuber97@gmail.com',
 'emailVerified': True,
 'plan': 'NO_PLAN',
 'canPay': False,
 'isPro': False,
 'periodEnd': None,
 'avatarUrl': 'https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/1654633500663-62591773a68a1fd2395f75c4.png?w=200&h=200&f=face',
 'orgs': [],
 'auth': {'type': 'access_token',
  'accessToken': {'displayName': 'Jupyter Notebook', 'role': 'write'}}}

In [13]:
hf_token = HfFolder().get_token()

In [14]:
# create_repo("pegasus-inshorts", token=hf_token)

In [14]:
pegasus_tokenizer.push_to_hub("pegasus-inshorts")


spiece.model:   0%|          | 0.00/1.91M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/kubershahi/pegasus-inshorts/commit/c699129fab6a53dce062df44827b3b117e3b2eaf', commit_message='Upload tokenizer', commit_description='', oid='c699129fab6a53dce062df44827b3b117e3b2eaf', pr_url=None, pr_revision=None, pr_num=None)

In [15]:
pegasus_finetuned.push_to_hub("pegasus-inshorts")

pytorch_model.bin:   0%|          | 0.00/2.28G [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/kubershahi/pegasus-inshorts/commit/77a417061de65e2ac4d178576250ded5c3e07c33', commit_message='Upload PegasusForConditionalGeneration', commit_description='', oid='77a417061de65e2ac4d178576250ded5c3e07c33', pr_url=None, pr_revision=None, pr_num=None)
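
The last three cells can be folded into one helper that optionally creates the repository and then pushes the tokenizer and the model to it. The sketch below is an illustration under assumptions: `push_artifacts` is a hypothetical helper, and the demo uses fake objects instead of real Hub calls so that it runs offline. `exist_ok=True` is the `create_repo` option that avoids an error when the repository already exists.

```python
def push_artifacts(repo_id, artifacts, create_repo_fn=None):
    """Optionally create the repo, then push each artifact to it."""
    if create_repo_fn is not None:
        # With exist_ok=True, creation does not fail if the repo already exists.
        create_repo_fn(repo_id, exist_ok=True)
    return [artifact.push_to_hub(repo_id) for artifact in artifacts]


# In this notebook the call could look like this (not run here):
# push_artifacts(
#     "pegasus-inshorts",
#     [pegasus_tokenizer, pegasus_finetuned],
#     create_repo_fn=create_repo,
# )


class FakeArtifact:
    """Stand-in for a tokenizer or model; only mimics push_to_hub()."""

    def __init__(self, name):
        self.name = name

    def push_to_hub(self, repo_id):
        return f"{self.name} -> {repo_id}"


calls = []
results = push_artifacts(
    "demo-repo",
    [FakeArtifact("tokenizer"), FakeArtifact("model")],
    create_repo_fn=lambda repo_id, exist_ok=False: calls.append((repo_id, exist_ok)),
)
print(results)
```

Pushing the tokenizer and the model to the same repository, as the cells above do, lets both be loaded later from a single repo id.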