# Original code using Brown Corpus

## Import of libraries

In [3]:
import nltk
from nltk.corpus import stopwords
from nltk.corpus import brown
from nltk.tokenize import sent_tokenize
import numpy as np
import re

#Establishing stopwords variable for later use
stopwords = stopwords.words('english')

## Creating dictionaries

In [2]:
#Function to get rid of contractions - necessary to do so when removing stopwords. 
def contractions(word):
    word = re.sub(r"won\'t", "will not", word)
    word = re.sub(r"can\'t", "cannot", word)
    word = re.sub(r"n\'t", " not", word)
    word = re.sub(r"\'re", " are", word)
    word = re.sub(r"\'s", " is", word)
    word = re.sub(r"\'d", " would", word)
    word = re.sub(r"\'ll", " will", word)
    word = re.sub(r"\'t", " not", word)
    word = re.sub(r"\'ve", " have", word)
    word = re.sub(r"\'m", " am", word)
    return word

In [3]:
#Creating dictionaries - articles dictionary which has {fileid : article text}
#Catids dictionary which has {category : list of fileids in that category} - use for reference to find a fileid

articles={}
catids={}
for category in brown.categories():
    for fileid in brown.fileids(category):
        articles[fileid] = ' '.join(brown.words(fileids=[fileid]))
        catids.setdefault(category, []).append(fileid)
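
As a rough illustration of how the `catids` dictionary can serve as a reference, the sketch below looks up the category that contains a given fileid. The helper name `find_category` and the two-category `toy_catids` dictionary are assumptions made for this example; in the notebook, the real `catids` built above would be passed instead.

```python
def find_category(fileid, catids):
    """Return the first category whose fileid list contains `fileid`, or None."""
    for category, fileids in catids.items():
        if fileid in fileids:
            return category
    return None

# Assumption: a hand-written subset standing in for the full catids dictionary.
toy_catids = {
    "adventure": ["cn01", "cn02"],
    "news": ["ca01", "ca02"],
}

found = find_category("cn01", toy_catids)
missing = find_category("zz99", toy_catids)
print(found)
print(missing)
```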

In [4]:
#Brown article used for summarization
articles['cn01']

"Dan Morgan told himself he would forget Ann Turner . He was well rid of her . He certainly didn't want a wife who was fickle as Ann . If he had married her , he'd have been asking for trouble . But all of this was rationalization . Sometimes he woke up in the middle of the night thinking of Ann , and then could not get back to sleep . His plans and dreams had revolved around her so much and for so long that now he felt as if he had nothing . The easiest thing would be to sell out to Al Budd and leave the country , but there was a stubborn streak in him that wouldn't allow it . The best antidote for the bitterness and disappointment that poisoned him was hard work . He found that if he was tired enough at night , he went to sleep simply because he was too exhausted to stay awake . Each day he found himself thinking less often of Ann ; ; each day the hurt was a little duller , a little less poignant . He had plenty of work to do . Because the summer was unusually dry and hot , the sprin

In [5]:
#Creating new dictionary of {fileid : articles text cleaned without contractions or stopwords}

nostop_articles={}
for key, val in articles.items():
    #Cleaning text while keeping stopwords & contractions.
    #Punctuation is kept because the sentence tokenizers used later rely on it.
    #Note: str.replace() matches literal text, not a regex, so a pattern such as
    #"[^a-zA-Z0-9]" would need re.sub() to have any effect.
    cleaned = re.sub(r'''\s([?.!",](?:\s|$))''', r'\1', val)
    cleaned = cleaned.replace("``", "").replace("''", "")
    articles[key] = cleaned

    #Cleaning text, expanding contractions, and removing stopwords
    expanded = contractions(cleaned)
    nostop_articles[key] = ' '.join(x for x in nltk.word_tokenize(expanded) if x not in stopwords)

In [6]:
#This is why it is important to expand contractions before removing stopwords.

###See text below, original 3rd sentence: 
    #He certainly didn't want a wife who was fickle as Ann .
    
###Third sentence with stopwords removed, but contractions not removed:
    #He certainly n't want wife fickle Ann .
    
###Third sentence with contraction elongated and then stopwords removed
    #He certainly want wife fickle Ann.
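
The effect described above can be reproduced with a small, self-contained sketch. It assumes a tiny hand-picked stopword set and a simple regex tokenizer in place of NLTK's stopword list and `word_tokenize`; `tokenize` and `expand` are hypothetical stand-ins written only for this illustration, and `expand` handles just the `n't` case.

```python
import re

# Assumption: a tiny illustrative stopword set, not NLTK's full English list.
STOPWORDS = {"did", "not", "a", "who", "was", "as"}

def tokenize(text):
    """Rough stand-in for a word tokenizer: splits off "n't" and punctuation."""
    return re.findall(r"n't|\w+(?=n't)|\w+|[^\w\s]", text)

def expand(text):
    """Hypothetical, reduced version of the contractions() function."""
    return re.sub(r"n't", " not", text)

def remove_stopwords(text):
    """Drop every token that appears in the stopword set (case-sensitive)."""
    return " ".join(t for t in tokenize(text) if t not in STOPWORDS)

sentence = "He certainly didn't want a wife who was fickle as Ann ."

without_expansion = remove_stopwords(sentence)
with_expansion = remove_stopwords(expand(sentence))

print(without_expansion)
print(with_expansion)
```

Without expansion, the orphaned token `n't` survives because it is not in the stopword set. With expansion, `did` and `not` are both removed as stopwords, which also means the negation disappears from the cleaned sentence.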

##### Manual Summarization of the article

##### Brent_Summary =

"""Due to his unhappy marriage, Morgan keeps himself busy at work to avoid his wife and his thoughts 
about his unhappiness. One day, two stragglers appear on Morgan's property in bad shape; hungry, tired, and injured. 
Morgan takes them in cautiously, considering if they could be sent to kill him by his enemies. Their weakness eases 
Morgan's concerns of danger as he feeds them and gives them a place to stay. As the young couple were wandering for 
work, they ask Morgan if they can work for him. Unsure because he does not have money to pay them, their desperation
convinces Morgan to let them stay and work for their keep."""


## Extractive Methods

### Gensim with TextRank

In [7]:
import gensim
from gensim.summarization import summarize

# Ensure the installed version of gensim is 3.8.3 or older. Versions 4.0 and newer no longer
# include the gensim.summarization module, as it is no longer maintained.
# pip install gensim==3.8.1

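A quick way to confirm the version requirement before importing `gensim.summarization` is to inspect the installed package version. The sketch below is only an illustration: the helper name `has_summarization_module` is an assumption, and it relies solely on the major version number, following the note that the module is absent from 4.0 onward.

```python
from importlib.metadata import PackageNotFoundError, version

def has_summarization_module(gensim_version: str) -> bool:
    """Return True if the version string predates gensim 4.0."""
    major = int(gensim_version.split(".")[0])
    return major < 4

try:
    installed = version("gensim")
    print(installed, has_summarization_module(installed))
except PackageNotFoundError:
    print("gensim is not installed")
```
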
#### Gensim Stopwords not Removed

In [8]:
short_summary = summarize(articles['cn01'], word_count = 100)
print(short_summary)

When, in late afternoon on the last day in June, he saw two people top the ridge to the south and walk toward the house, he quit work immediately and strode to his rifle.
?  We didn't want town work , Jones said.
She stared at him, her eyes wide as she thought about what he had said ; ; then she murmured :  You're very kind, Mr. Morgan.
When they were finally satisfied, Jones said,  I think he's going to give us work .
He said :  If it's all right with you, Mr. Morgan, I'll sleep out here on the couch.


#### Gensim Stopwords Removed

In [9]:
short_summary = summarize(nostop_articles['cn01'], word_count = 100)
print(short_summary)

When , late afternoon last day June , saw two people top ridge south walk toward house , quit work immediately strode rifle .
? We want town work , Jones said .
When meal ready , told Jones wash , going front room , woke girl .
She stared , eyes wide thought said ; ; murmured : You kind , Mr. Morgan .
When finally satisfied , Jones said , I think going give us work .
He said : If right , Mr. Morgan , I sleep couch .


### SUMY LexRank

In [10]:
#Import summarizers 
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

#### LexRank - stopwords not removed

In [11]:
#Use parser and tokenizer from library to prep text for use 
parsed = PlaintextParser.from_string(articles['cn01'],Tokenizer('english'))

In [12]:
#Create the summarizer model, then apply it to the parsed document created above
#and specify the number of sentences in the output summary. Sentence count is
#a mandatory field.

lexrank_summarizer = LexRankSummarizer()
lexrank_summary = lexrank_summarizer(parsed.document, sentences_count=5)

In [13]:
#To see the summary output cleanly, use a for loop. LexRank selects the sentences that are
#most similar to the other sentences; sumy returns the selected sentences in document order.

for sen in lexrank_summary:
    print(sen)

There was more to this than Jones had told him.
I'll see , Morgan said.
You fell down in front of the house, and I carried you in.
I mean, we don't have any way to get there and we can't expect you to quit work just to take us to town .
He said :  If it's all right with you, Mr. Morgan, I'll sleep out here on the couch.
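
LexRank scores a sentence by how similar it is to the other sentences in the document. The sketch below illustrates that idea in plain Python; it is a simplified approximation, not sumy's implementation. It assumes raw word counts instead of TF-IDF weights, applies no similarity threshold, and uses four made-up example sentences; the function names are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centrality_scores(sentences, damping=0.85, iterations=50):
    """Power iteration over the row-normalized sentence similarity graph."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(sentences)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for j in range(n):
            incoming = sum(scores[i] * sim[i][j] / row_sums[i]
                           for i in range(n) if row_sums[i])
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores

# Assumption: invented sentences for illustration, not taken from the corpus.
example_sentences = [
    "Morgan saw two people walk toward the house.",
    "The two people were tired and hungry.",
    "Morgan gave the two people food in the house.",
    "The summer was unusually dry.",
]

scores = centrality_scores(example_sentences)
best = max(range(len(scores)), key=scores.__getitem__)
print(example_sentences[best])
```

The third sentence shares the most vocabulary with the others, so it receives the highest score, while the unrelated sentence about the summer receives the lowest.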


#### LexRank with stopwords removed

In [14]:
#Using parser with text that had stopwords removed and contractions elongated.

parsed = PlaintextParser.from_string(nostop_articles['cn01'],Tokenizer('english'))

In [15]:
lexrank_summary_stopremoved = lexrank_summarizer(parsed.document, sentences_count=5)

In [16]:
for sen in lexrank_summary_stopremoved:
    print(sen)

I see , Morgan said .
You fell front house , I carried .
I mean , way get expect quit work take us town .
We see , Morgan said .
He said : I going bed .


### SUMY LSA (Latent semantic analysis)

In [17]:
from sumy.summarizers.lsa import LsaSummarizer

#### LSA (Latent semantic analysis) with stopwords/contractions

In [18]:
LSAparsed = PlaintextParser.from_string(articles['cn01'],Tokenizer('english'))

In [19]:
LSA_summarizer = LsaSummarizer()
LSA_summary = LSA_summarizer(LSAparsed.document,5)

In [20]:
for sen in LSA_summary:
    print(sen)

No girl would go this far to fool a man so she could kill him.
He had seen a few nester wagons go through the country, the families almost starving to death, but he had never seen any of them on foot and as bad off as these two.
When they were finally satisfied, Jones said,  I think he's going to give us work .
He said :  If it's all right with you, Mr. Morgan, I'll sleep out here on the couch.
They were a pair of lost, whipped kids, Morgan thought as he went to bed.


#### LSA (Latent semantic analysis) without stopwords/contractions

In [21]:
LSAparsed = PlaintextParser.from_string(nostop_articles['cn01'],Tokenizer('english'))

In [22]:
LSA_summarizer = LsaSummarizer()
LSA_summary = LSA_summarizer(LSAparsed.document,5)

In [23]:
for sen in LSA_summary:
    print(sen)

He stopped every minutes leaned shovel studied horizon , nothing happened , day dragging monotonous calm .
When , late afternoon last day June , saw two people top ridge south walk toward house , quit work immediately strode rifle .
No one walked country , least Ed Dow Dutch Renfro rest Bar B crew .
It must hurt even walk , sole completely left foot Morgan saw bruised bleeding .
Morgan returned kitchen , built fire , carried several buckets water spring poured copper boiler placed stove .


### SUMY LUHN

In [24]:
# Import the summarizer library
from sumy.summarizers.luhn import LuhnSummarizer

In [25]:
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser

#### SUMY LUHN with stopwords/contractions

In [26]:
# Using the parser functions
luhnparser=PlaintextParser.from_string(articles['cn01'],Tokenizer('english'))

In [27]:
# Create the summarizer, using 5 sentences as the sentences_count parameter
luhn_summarizer=LuhnSummarizer()
luhn_summary=luhn_summarizer(luhnparser.document,sentences_count=5)

In [28]:
for sentence in luhn_summary:
    print(sentence)

They crawled through the north fence and came on toward him, and now he saw that both were young, not more than nineteen or twenty.
He had seen a few nester wagons go through the country, the families almost starving to death, but he had never seen any of them on foot and as bad off as these two.
He was silent a moment, thinking he could use a man this time of year, and if the girl could cook, it would give him more time in the meadows, but he knew nothing about the couple.
I mean, we don't have any way to get there and we can't expect you to quit work just to take us to town .
I just can't take any chances on getting her pregnant, and if we were sleeping together  He stopped, embarrassed, and Morgan said,  I understand that, but I don't savvy why you'd go off and leave your jobs in the first place .


#### SUMY LUHN without stopwords/contractions

In [29]:
# Using the parser functions with the updated article with stopwords and contractions removed
luhnparser=PlaintextParser.from_string(nostop_articles['cn01'],Tokenizer('english'))

In [30]:
luhn_summarizer=LuhnSummarizer()
luhn_summary=luhn_summarizer(luhnparser.document,sentences_count=5)

In [31]:
for sentence in luhn_summary:
    print(sentence)

Each day found thinking less often Ann ; ; day hurt little duller , little less poignant .
He silent moment , thinking could use man time year , girl could cook , would give time meadows , knew nothing couple .
I could use help , Morgan said finally , I afford pay anything .
When meal ready , told Jones wash , going front room , woke girl .
When finally satisfied , Jones said , I think going give us work .


### SUMY KL-SUM

In [32]:
# Import the summarizer library
from sumy.summarizers.kl import KLSummarizer

#### SUMY KL-Sum with stopwords/contractions

In [33]:
# Using the parser functions already imported for the LSA algorithm
klparser=PlaintextParser.from_string(articles['cn01'],Tokenizer('english'))

In [34]:
kl_summarizer=KLSummarizer()
kl_summary=kl_summarizer(klparser.document,sentences_count=5)

In [35]:
for sentence in kl_summary:
    print(sentence)

In the kitchen , he said.
He brought his Winchester in from the front of the house, then faced the boy.
We didn't want town work , Jones said.
When he saw the expression in her eyes, he knew he couldn't send them on.
He nodded at the door in front of him.


#### SUMY KL-Sum without stopwords/contractions

In [36]:
# Using the parser functions with the updated article with stopwords and contractions removed
klparser=PlaintextParser.from_string(nostop_articles['cn01'],Tokenizer('english'))

In [37]:
kl_summarizer=KLSummarizer()
kl_summary=kl_summarizer(klparser.document,sentences_count=5)

In [38]:
for sentence in kl_summary:
    print(sentence)

When , late afternoon last day June , saw two people top ridge south walk toward house , quit work immediately strode rifle .
They dirty , clothes torn , girl exhausted fell still twenty feet front door .
When meal ready , told Jones wash , going front room , woke girl .
When finally satisfied , Jones said , I think going give us work .
He said : If right , Mr. Morgan , I sleep couch .


## Abstractive Methods

In [39]:
# Install required libraries
# !pip install transformers
# !pip install torch
# !pip install TensorFlow 
# !pip install sentencepiece

In [2]:
# Import required libraries
from nltk.corpus import stopwords, brown
from nltk.tokenize import sent_tokenize
from transformers import T5Tokenizer, T5Config, T5ForConditionalGeneration
from transformers import BartForConditionalGeneration, BartTokenizer, BartConfig
from transformers import GPT2Tokenizer,GPT2LMHeadModel
from summarizer import Summarizer,TransformerSummarizer
from transformers import XLMWithLMHeadModel, XLMTokenizer
import re

In [5]:
# Create an empty dictionary to store the articles
articles={}

# Populate the dictionary with the articles in the brown corpus
for category in brown.categories():
    for fileid in brown.fileids(category):
        articles[fileid] = ' '.join(brown.words(fileids=[fileid]))

# The article text is kept as-is, punctuation included, because the summarizers below
# operate on full sentences. Note that str.replace() matches literal text, not a regular
# expression, so stripping characters with a pattern such as "[^a-zA-Z]" would require re.sub().

In [6]:
# Put fileids and article text in lists for ease of access
fileids=[]
text=[]
for x,y in articles.items():
    fileids.append(x)
    text.append(y)
    
sample_text = text[0].strip().replace("\n","")

### T5 Transformers

In [64]:
# Summarization with T5 Transformers

# T5 performs different tasks depending on the prefix at the beginning of the text.
# Therefore, we need to add the string "summarize:" at the start of the input text.
text = "summarize:" + sample_text

# Instantiate the pretrained "t5-small" model and tokenizer
my_model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# T5 is an encoder-decoder model, meaning that the input text must be supplied as a sequence of ids.
# Hence, we convert the text into the required format through a process called encoding.
input_ids=tokenizer.encode(
    text,                        # sequence to be encoded
    return_tensors='pt',         # return PyTorch tensors
    max_length=512,              # maximum length to use by one of the truncation/padding parameter
    truncation=True              # truncate to a maximum length specified with the argument max_length
)

# Pass the encoded text to the function which returns a sequence of ids corresponding to the summary
summary_ids = my_model.generate(
    input_ids,                   # previously encoded text
    num_beams=4,                 # number of beams for beam search
    no_repeat_ngram_size=2,      # all ngrams of the specified size can occur only once
    min_length=100,               # the minimum length of the sequence to be generated
    max_length=200,              # the maximum length of the sequence to be generated
    early_stopping=True          # stop the beam search when at least num_beams sentences are finished per batch
)

# Decode the generated ids back into the summary text
t5_summary = tokenizer.decode(
    summary_ids[0],              # generated summary token ids
    skip_special_tokens=True     # remove special tokens in the decoding
)

# Print the summary
print("T5 summary:", t5_summary, sep="\n\n")

T5 summary:

Dan Morgan told himself he would forget Ann Turner. He was well rid of her - He certainly didn't want a wife who was fickle as Ann, but all of this was rationalization! sometimes woke up in the middle of the night thinking of Ann, and then could not get back to sleep. His plans and dreams had revolved around her so much and for so long that now... the easiest thing would be to sell out to Al Budd and leave the country
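
The `no_repeat_ngram_size=2` argument used above means that no two-token sequence may appear twice in the generated output. The sketch below shows that rule in plain Python as an illustration of the idea; it is not the transformers implementation, and the helper name `banned_next_tokens` and the word-level tokens are assumptions made for readability (the real models work on subword ids).

```python
def banned_next_tokens(generated, ngram_size):
    """Return the tokens that would complete an n-gram already present in `generated`."""
    if ngram_size < 1 or len(generated) < ngram_size - 1:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = generated[i:i + ngram_size]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned

tokens = ["he", "was", "tired", "and", "he"]
banned_bigram = banned_next_tokens(tokens, 2)
banned_trigram = banned_next_tokens(["a", "b", "c", "a", "b"], 3)
print(banned_bigram)
print(banned_trigram)
```

After the second "he", the token "was" is banned because the bigram "he was" has already been generated; during decoding, such tokens are excluded from the candidates at that step.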


### BART Transformers

In [65]:
# Summarization with BART Transformers

# Instantiate the pretrained "bart-large-cnn" model, which is fine-tuned for the summarization task
tokenizer=BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model=BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

# Pass the input text in the form of a sequence of ids using the batch_encode_plus() function with the tokenizer
inputs = tokenizer.batch_encode_plus( 
    [sample_text],               # sequence to be encoded
    return_tensors='pt',         # return PyTorch tensors
    max_length=512,              # maximum length to use by one of the truncation/padding parameter
    truncation=True              # truncate to a maximum length specified with the argument max_length
)

# Pass the encoded sequence to the function to generate the ids of the summarized output
summary_ids = model.generate(
    inputs['input_ids'],         # the sequence used as a prompt for the generation
    num_beams=4,                 # number of beams for beam search
    no_repeat_ngram_size=2,      # all ngrams of the specified size can occur only once
    min_length=100,               # the minimum length of the sequence to be generated
    max_length=200,              # the maximum length of the sequence to be generated
    early_stopping=True          # stop the beam search when at least num_beams sentences are finished per batch
)

# Convert the encoded sequence of ids to text
bart_summary = tokenizer.decode(
    summary_ids[0],              # list of tokenized input ids
    skip_special_tokens=True     # remove special tokens in the decoding
)

# Print the summary
print("BART summary:", bart_summary, sep="\n\n")

BART summary:

Dan Morgan told himself he would forget Ann Turner. He didn't want a wife who was fickle as Ann. If he had married her, he'd have been asking for trouble. The best antidote for the bitterness and disappointment that poisoned him was hard work. When he saw two people top the ridge to the south and walk toward the house, he quit work immediately and strode to his rifle. It could be some kind of trick Budd had thought up. No one walked in this country, least of all Ed Dow or Dutch Renfro.


### GPT-2 Transformers

In [7]:
GPT2_model = TransformerSummarizer(
    transformer_type="GPT2",
    transformer_model_key="gpt2-large")

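# Summarize with GPT-2 via the TransformerSummarizer wrapper. The result is stored in
# gpt2_summary so that gensim's summarize() function imported earlier is not overwritten.
gpt2_summary = ''.join(GPT2_model(sample_text, 
    min_length=60, 
    max_length=100))

print("GPT-2 summary:", gpt2_summary, sep = "\n\n")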

GPT-2 summary:

The best antidote for the bitterness and disappointment that poisoned him was hard work . Both had blonde hair and blue eyes , and there was even a faint similarity of features . He brought his Winchester in from the front of the house , then faced the boy . They might kill him in his sleep , thinking there was money in the house . She rubbed her eyes and stretched , then sat up , her hands going to her hair . `` I don't have many strays coming to my front door '' , he said . When he saw the expression in her eyes , he knew he couldn't send them on . He said : `` If it's all right with you , Mr. Morgan , I'll sleep out here on the couch .


### XLNet Transformers

In [8]:
xln_model = TransformerSummarizer(
    transformer_type="XLNet",
    transformer_model_key="xlnet-base-cased")

full = ''.join(xln_model(sample_text, 
                      min_length=60, 
                      max_length = 100))

print("XLNet summary:", full, sep = "\n\n")

Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetModel: ['lm_loss.weight', 'lm_loss.bias']
- This IS expected if you are initializing XLNetModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLNetModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


XLNet summary:

The best antidote for the bitterness and disappointment that poisoned him was hard work . The grass in the meadows came fast , now that the warm weather was here . When they were closer and he saw that one was a woman , he was more puzzled than ever . Reaching the house ahead of them , he waited with his Winchester in his hands . She rubbed her eyes and stretched , then sat up , her hands going to her hair . When they were finally satisfied , Jones said , `` I think he's going to give us work '' . He said : `` You'll feel a lot better after you have a bath . They were a pair of lost , whipped kids , Morgan thought as he went to bed .


# Updated code with new text

In [11]:
# The NY Times article sits behind a paywall, so it was downloaded as a PDF and then converted to a text file
with open("Opinion _ How Is the U.S. Economy Doing_ - The New York Times.txt", encoding="utf8") as f:
    article2 = f.read()

In [12]:
## Full text
article2 

'Last week’s employment report was puzzling. The Bureau of Labor Statistics carries out\ntwo separate surveys, one of employers and another of households; we normally expect the\ntwo to paint a similar picture. This time, not so much.\nThe employer survey was, to use the technical term, meh — 210,000 jobs added, a\nrespectable number but not what many had hoped for. The household survey, however, was\nterrific; in particular, the employment rate among prime-age adults, a key measure of labor\nmarket health, is beginning to approach prepandemic levels.\nWell, we shouldn’t make too much of the apparent inconsistencies in the report. Noisy data\nhappens, and overall the economic picture looks pretty good — indeed, in many ways this\nlooks like the best economic recovery in many decades.\nYet consumers appear to be feeling very downbeat — or at least that’s what they tell\nsurveys like the famous Michigan Survey of Consumers. And this perception of a bad\neconomy is clearly weighing on Pre
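
The text above contains hard line breaks (`\n`) in the middle of sentences, left over from the PDF conversion. Summarizers that treat a newline as a sentence boundary may return cut-off fragments on such input. A minimal sketch of one way to unwrap the lines before summarizing is shown below; the helper name `unwrap_lines` and the rule of treating blank lines as paragraph breaks are assumptions, and the sample string is shortened from the article text.

```python
import re

def unwrap_lines(text):
    """Join hard-wrapped lines with a space; keep blank lines as paragraph breaks."""
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)

wrapped = (
    "The Bureau of Labor Statistics carries out\n"
    "two separate surveys.\n"
    "\n"
    "This time, not so much."
)
unwrapped = unwrap_lines(wrapped)
print(unwrapped)
```

Applied to the notebook, this would be something like `article2 = unwrap_lines(article2)` before the summarizers are called; the outputs shown in this section were produced without that step.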

### Summary of NY Times article created using Quillbot (https://quillbot.com/summarize) 

"" Last week's employment report from the Bureau of Labor Statistics was puzzling. Employer survey added 210,000 jobs, a respectable number but not what many had hoped for. But household survey added a lot more jobs, and key measure of labor market health is beginning to approach prepandemic levels. Frum asks: Is this a bad economy despite data showing it as verygood? And if it really isn't a bad recession, why does the public say it is?

Inflation has a corrosive effect on confidence even when incomes are keeping up, as it creates the perception that things are out of control. The Michigan Surveys don't ask quite the same questions, but they do ask people how their current financial situation compares with five years earlier. 63 percent say they're better off, the same number as in September 1984, just before Ronald Reagan won an electoral landslide.""

## Extractive Methods

### Gensim with TextRank

In [46]:
short_summary = summarize(article2, word_count = 100)
print(short_summary)

That said, surveys about inflation also illustrate the point that when you talk to consumers,
So what question are people really answering when asked about the state of the economy?
Another clue is that you get very different answers when you ask people “How are you
doing?” rather than “How is the economy doing?” The Langer Consumer Confidence Index
asks people separately about the national economy — where their assessment is dismal —
The Michigan Surveys don’t ask quite the same questions, but they do ask people
As I said, part of the answer is probably that inflation unnerves people even when their


### SUMY LexRank

In [47]:
parsed = PlaintextParser.from_string(article2,Tokenizer('english'))

In [48]:
lexrank_summarizer = LexRankSummarizer()
lexrank_summary = lexrank_summarizer(parsed.document, sentences_count=5)

In [49]:
for sen in lexrank_summary:
    print(sen)

This time, not so much.
I don’t think it’s a crude case of “people are being lied to by the corporate media,” although if you ask me, it’s silly when people in the media get all huffy over any suggestion that how they report on the economy has an influence on public perceptions.
That said, surveys about inflation also illustrate the point that when you talk to consumers, the questions they answer may not be the ones you thought you were asking.
The Michigan Surveys don’t ask quite the same questions, but they do ask people how their current financial situation compares with five years earlier; 63 percent say they’re better off, the same number as in September 1984, just before Ronald Reagan won an electoral landslide with claims that it was “morning in America.” Aside from looking at what people say, surely it makes sense to look at what they do.
As I said, part of the answer is probably that inflation unnerves people even when their incomes are keeping up.


### SUMY LSA (Latent semantic analysis)

In [50]:
LSAparsed = PlaintextParser.from_string(article2,Tokenizer('english'))

In [51]:
LSA_summarizer = LsaSummarizer()
LSA_summary = LSA_summarizer(LSAparsed.document,5)

In [52]:
for sen in LSA_summary:
    print(sen)

Let’s start with the obvious culprit, inflation, which is indeed running hotter than it has for decades.
Republicans say, bizarrely, that current economic conditions are much worse than they were in March 2009, when the economy was losing 800,000 jobs a month.
That is, businesses are investing as if they see a booming economy and expect the boom to continue.
As I said, part of the answer is probably that inflation unnerves people even when their incomes are keeping up.
Finally, as I also said, it’s implausible to assert that the tone of media coverage is irrelevant.


### SUMY Luhn

In [53]:
luhnparser=PlaintextParser.from_string(article2,Tokenizer('english'))

In [54]:
luhn_summarizer=LuhnSummarizer()
luhn_summary=luhn_summarizer(luhnparser.document,sentences_count=5)

In [55]:
for sentence in luhn_summary:
    print(sentence)

I don’t think it’s a crude case of “people are being lied to by the corporate media,” although if you ask me, it’s silly when people in the media get all huffy over any suggestion that how they report on the economy has an influence on public perceptions.
And my sense is that inflation has a corrosive effect on confidence even when incomes are keeping up, because it creates the perception that things are out of control.
That said, surveys about inflation also illustrate the point that when you talk to consumers, the questions they answer may not be the ones you thought you were asking.
Another clue is that you get very different answers when you ask people “How are you doing?” rather than “How is the economy doing?” The Langer Consumer Confidence Index asks people separately about the national economy — where their assessment is dismal — and about their personal financial situation, where their rating is high by historical standards.
The Michigan Surveys don’t ask quite the same questi

### SUMY KL-Sum

In [56]:
klparser=PlaintextParser.from_string(article2,Tokenizer('english'))

In [57]:
kl_summarizer=KLSummarizer()
kl_summary=kl_summarizer(klparser.document,sentences_count=5)

In [58]:
for sentence in kl_summary:
    print(sentence)

Just to be clear, I genuinely want to know the answer to these questions.
(If it doesn’t, why do they bother?)
That said, surveys about inflation also illustrate the point that when you talk to consumers, the questions they answer may not be the ones you thought you were asking.
The Michigan Surveys don’t ask quite the same questions, but they do ask people how their current financial situation compares with five years earlier; 63 percent say they’re better off, the same number as in September 1984, just before Ronald Reagan won an electoral landslide with claims that it was “morning in America.” Aside from looking at what people say, surely it makes sense to look at what they do.
Don’t let the doomsayers tell you otherwise.


## Abstractive Methods

### T5 Transformers

In [61]:
# Summarization with T5 Transformers

# T5 performs different tasks depending on the prefix at the beginning of the text.
# Therefore, we need to add the string "summarize:" at the start of the input text.
text = "summarize:" + article2

# Instantiate the pretrained "t5-small" model and tokenizer
my_model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# T5 is an encoder-decoder model, meaning that the input text must be supplied as a sequence of ids.
# Hence, we convert the text into the required format through a process called encoding.
input_ids=tokenizer.encode(
    text,                        # sequence to be encoded
    return_tensors='pt',         # return PyTorch tensors
    max_length=512,              # maximum length to use by one of the truncation/padding parameter
    truncation=True              # truncate to a maximum length specified with the argument max_length
)

# Pass the encoded text to the function which returns a sequence of ids corresponding to the summary
summary_ids = my_model.generate(
    input_ids,                   # previously encoded text
    num_beams=4,                 # number of beams for beam search
    no_repeat_ngram_size=2,      # all ngrams of the specified size can occur only once
    min_length=100,               # the minimum length of the sequence to be generated
    max_length=200,              # the maximum length of the sequence to be generated
    early_stopping=True          # stop the beam search when at least num_beams sentences are finished per batch
)

# Decode the generated ids back into the summary text
t5_summary = tokenizer.decode(
    summary_ids[0],              # generated summary token ids
    skip_special_tokens=True     # remove special tokens in the decoding
)

# Print the summary
print("T5 summary:", t5_summary, sep="\n\n")

T5 summary:

the employment rate among prime-age adults is beginning to approach prepandemic levels. consumer perception of a bad economy is clearly weighing on president Biden’s approval rating, which raises the question: Are consumers right? rising prices have certainly eroded many workers’ wage gains, although real personal income per capita is still above its prepundetic level even though the government is no longer handing out lots of money. if you talk to consumers, the questions they answer may not be the ones you thought you were going to be


### BART Transformers

In [62]:
# Summarization with BART Transformers

# Instantiate the pretrained "bart-large-cnn" model, which is fine-tuned for the summarization task
tokenizer=BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model=BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

# Pass the input text in the form of a sequence of ids using the batch_encode_plus() function with the tokenizer
inputs = tokenizer.batch_encode_plus( 
    [article2],               # sequence to be encoded
    return_tensors='pt',         # return PyTorch tensors
    max_length=512,              # maximum length to use by one of the truncation/padding parameter
    truncation=True              # truncate to a maximum length specified with the argument max_length
)

# Pass the encoded sequence to the function to generate the ids of the summarized output
summary_ids = model.generate(
    inputs['input_ids'],         # the sequence used as a prompt for the generation
    num_beams=4,                 # number of beams for beam search
    no_repeat_ngram_size=2,      # all ngrams of the specified size can occur only once
    min_length=100,               # the minimum length of the sequence to be generated
    max_length=200,              # the maximum length of the sequence to be generated
    early_stopping=True          # stop the beam search when at least num_beams sentences are finished per batch
)

# Convert the encoded sequence of ids to text
bart_summary = tokenizer.decode(
    summary_ids[0],              # list of tokenized input ids
    skip_special_tokens=True     # remove special tokens in the decoding
)

# Print the summary
print("BART summary:", bart_summary, sep="\n\n")

BART summary:

Last week’s employment report was puzzling. The employer survey was, to use the technical term, meh — 210,000 jobs added. But the household survey, however, wasterrific; in particular, the employment rate among prime-age adults is beginning to approach prepandemic levels. Are consumers right? Is this a bad economy despite data showing it as very good? And why does the public say it is? I genuinely want to know the answer to these questions.


### GPT-2

In [15]:
GPT2_model = TransformerSummarizer(
    transformer_type="GPT2",
    transformer_model_key="gpt2-large")

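# Summarize with GPT-2 via the TransformerSummarizer wrapper. The result is stored in
# gpt2_summary so that gensim's summarize() function imported earlier is not overwritten.
gpt2_summary = ''.join(GPT2_model(article2, 
    min_length=60, 
    max_length=150))

print("GPT-2 summary:", gpt2_summary, sep = "\n\n")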

GPT-2 summary:

The employer survey was, to use the technical term, meh — 210,000 jobs added, a
respectable number but not what many had hoped for. And this perception of a bad
economy is clearly weighing on President Biden’s approval rating. And if we turn our attention from consumers to businesses, what we see is a huge surge in
capital expenditures. Finally, as I also said, it’s implausible to assert that the tone of media coverage is irrelevant.


### XLNet

In [16]:
xln_model = TransformerSummarizer(
    transformer_type="XLNet",
    transformer_model_key="xlnet-base-cased")

full = ''.join(xln_model(article2, 
                      min_length=60, 
                      max_length = 150))

print("XLNet summary:", full, sep = "\n\n")


XLNet summary:

The employer survey was, to use the technical term, meh — 210,000 jobs added, a
respectable number but not what many had hoped for. And this perception of a bad
economy is clearly weighing on President Biden’s approval rating. If
consumers are really as depressed as the sentiment numbers say, why are retail sales
running so high? As I said, part of the answer is probably that inflation unnerves people even when their
incomes are keeping up.
