### Is the t5-base summary good enough?

It is not easy to say or measure whether a summary produced by a model is good. I am going to automatically summarize a text for which a human-produced summary is available. Let's try this text from the website http://explainwell.org/index.php/table-of-contents-synthesize-text/examples-of-summaries/

In [1]:
ARTICLE = """Johannes Gutenberg (1398 – 1468) was a German goldsmith and publisher who introduced printing to Europe. His introduction of mechanical movable type printing to Europe started the Printing Revolution and is widely regarded as the most important event of the modern period. It played a key role in the scientific revolution and laid the basis for the modern knowledge-based economy and the spread of learning to the masses.

Gutenberg many contributions to printing are: the invention of a process for mass-producing movable type, the use of oil-based ink for printing books, adjustable molds, and the use of a wooden printing press. His truly epochal invention was the combination of these elements into a practical system that allowed the mass production of printed books and was economically viable for printers and readers alike.

In Renaissance Europe, the arrival of mechanical movable type printing introduced the era of mass communication which permanently altered the structure of society. The relatively unrestricted circulation of information—including revolutionary ideas—transcended borders, and captured the masses in the Reformation. The sharp increase in literacy broke the monopoly of the literate elite on education and learning and bolstered the emerging middle class."""
ARTICLE

'Johannes Gutenberg (1398 – 1468) was a German goldsmith and publisher who introduced printing to Europe. His introduction of mechanical movable type printing to Europe started the Printing Revolution and is widely regarded as the most important event of the modern period. It played a key role in the scientific revolution and laid the basis for the modern knowledge-based economy and the spread of learning to the masses.\n\nGutenberg many contributions to printing are: the invention of a process for mass-producing movable type, the use of oil-based ink for printing books, adjustable molds, and the use of a wooden printing press. His truly epochal invention was the combination of these elements into a practical system that allowed the mass production of printed books and was economically viable for printers and readers alike.\n\nIn Renaissance Europe, the arrival of mechanical movable type printing introduced the era of mass communication which permanently altered the structure of societ

The human-produced summary:

In [2]:
sumRef ="""The German Johannes Gutenberg introduced printing in Europe. His invention had a decisive contribution in spread of mass-learning and in building the basis of the modern society.

Gutenberg major invention was a practical system permitting the mass production of printed books. The printed books allowed open circulation of information, and prepared the evolution of society from to the contemporary knowledge-based economy."""

In [3]:
sumRef

'The German Johannes Gutenberg introduced printing in Europe. His invention had a decisive contribution in spread of mass-learning and in building the basis of the modern society.\n\nGutenberg major invention was a practical system permitting the mass production of printed books. The printed books allowed open circulation of information, and prepared the evolution of society from to the contemporary knowledge-based economy.'

In [4]:
# some pre-processing:
ARTICLE = ARTICLE.replace('.', '.<eos>')
ARTICLE = ARTICLE.replace('?', '?<eos>')
ARTICLE = ARTICLE.replace('!', '!<eos>')
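
To make the effect of these replacements visible, here is a minimal sketch that applies the same three substitutions to a short made-up string. The helper name `mark_sentence_ends` and the sample text are assumptions for illustration only, not part of the original notebook.

```python
def mark_sentence_ends(text, marker="<eos>"):
    """Append `marker` after every '.', '?' and '!' in `text`."""
    # Same three replacements as in the cell above, applied one after another.
    for punct in (".", "?", "!"):
        text = text.replace(punct, punct + marker)
    return text


# Hypothetical sample text, used only to show the effect.
sample = "Who was Gutenberg? A publisher. He printed books!"
marked = mark_sentence_ends(sample)
print(marked)

# Caveat: a plain string replace also fires inside numbers and abbreviations.
print(mark_sentence_ends("3.14"))
```

Note that the marker is added after every period, including ones that do not end a sentence, so this step is best seen as a rough heuristic.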

In [5]:
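# now the model with a tokenizer:
# T5 is an encoder-decoder model, so it is loaded with the seq2seq class
# (AutoModelWithLMHead is deprecated in recent transformers releases)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained('t5-base')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-base', return_dict=True)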



In [6]:
# tokenize the input text, ARTICLE:
inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors='pt', max_length=512, truncation=True)

In [7]:
# now generate the summary:
summary_ids = model.generate(inputs, max_length=75, min_length=50, length_penalty=1., num_beams=3)
summary = tokenizer.decode(summary_ids[0])
summary


'<pad> Johannes Gutenberg was a goldsmith and publisher who introduced printing to Europe. his introduction of mechanical movable type printing to Europe started the Printing Revolution. it played a key role in the scientific revolution and laid the basis for the modern knowledge-based economy.</s>'
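
The decoded summary still carries the special tokens `<pad>` and `</s>`, and in the cells that follow they are passed on to the paraphrase classifier together with the text. One option is to call `tokenizer.decode(summary_ids[0], skip_special_tokens=True)`. The sketch below shows an equivalent plain-string cleanup; the helper name `strip_special_tokens` and the shortened sample string are assumptions for illustration.

```python
def strip_special_tokens(text, special_tokens=("<pad>", "</s>")):
    """Remove the given special-token strings and surrounding whitespace."""
    for token in special_tokens:
        text = text.replace(token, "")
    return text.strip()


# Shortened, hypothetical version of the decoded output above.
raw = "<pad> Johannes Gutenberg was a goldsmith and publisher.</s>"
clean = strip_special_tokens(raw)
print(clean)
```

Whether removing the tokens changes the paraphrase score is not tested here.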

To see whether the summary produced by the t5-base model says the same thing as the reference (human-produced) summary, I'll use a trick: check whether one summary is a paraphrase of the other. This is not exactly the same task, but a paraphrase-detection model produces a score (from 0 to 1) that indicates how close one text is to the other.

In [8]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

In [9]:
tokenizer1 = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model1 = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

In [10]:
sequence_0 = summary
sequence_1 = sumRef
classes = ["not paraphrase", "is paraphrase"]

In [11]:
paraphrase = tokenizer1(sequence_0, sequence_1, return_tensors="pt")
paraphrase_classification_logits = model1(**paraphrase).logits
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

not paraphrase: 91%
is paraphrase: 9%


Not very close.
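
The percentages come from a softmax over the classifier's two output logits. Since the same tokenize, classify, softmax steps are repeated for every pair of texts, they could be wrapped in one function. The sketch below is one way to do that, under stated assumptions: the names `softmax` and `paraphrase_probabilities` are hypothetical, the softmax is written in plain Python instead of `torch.softmax`, and the logits in the worked example are made up.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() from overflowing.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def paraphrase_probabilities(tokenizer, model, text_a, text_b,
                             labels=("not paraphrase", "is paraphrase")):
    """Return {label: probability} for a sentence-pair classifier.

    Assumes tokenizer(text_a, text_b, return_tensors="pt") returns a mapping
    and that model(**inputs).logits[0].tolist() yields one score per label.
    """
    inputs = tokenizer(text_a, text_b, return_tensors="pt")
    logits = model(**inputs).logits[0].tolist()
    return dict(zip(labels, softmax(logits)))


# Worked example of the softmax step alone, with made-up logits.
probs = softmax([1.0, -1.0])
print([round(p, 3) for p in probs])
```

With the objects from the cells above, the call would look like `paraphrase_probabilities(tokenizer1, model1, summary, sumRef)`.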

### Pipeline summarizer

Let's try the summarizer from the Hugging Face pipeline function:

In [12]:
from transformers import pipeline
summarizer = pipeline("summarization")



In [14]:
result = summarizer(ARTICLE, max_length=75, min_length=50, do_sample=False)

In [18]:
summaryPL = result[0]['summary_text']

In [19]:
summaryPL

' Johannes Gutenberg (1398 – 1468) was a German goldsmith and publisher who introduced printing to Europe . Gutenberg many contributions to printing are: the invention of a process for mass-producing movable type, the use of oil-based ink for printing books, adjustable molds, and use of a wooden printing press .'

In [20]:
sequence_0 = summaryPL
sequence_1 = sumRef

In [21]:
paraphrase = tokenizer1(sequence_0, sequence_1, return_tensors="pt")
paraphrase_classification_logits = model1(**paraphrase).logits
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

not paraphrase: 95%
is paraphrase: 5%


The score is even lower.

In [22]:
# compare the two automatically produced summaries - for completeness:
sequence_0 = summaryPL
sequence_1 = summary

In [23]:
paraphrase = tokenizer1(sequence_0, sequence_1, return_tensors="pt")
paraphrase_classification_logits = model1(**paraphrase).logits
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

not paraphrase: 95%
is paraphrase: 5%


But wait, the t5 summarizer allows us to use more beams (I used 3). Let's try 9 beams:

In [24]:
inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors='pt', max_length=512, truncation=True)
summary_ids = model.generate(inputs, max_length=75, min_length=50, length_penalty=1., num_beams=9)
summary = tokenizer.decode(summary_ids[0])
summary

'<pad> Johannes Gutenberg (1398 – 1468) was a german goldsmith and publisher who introduced printing to Europe. his introduction of mechanical movable type printing to Europe started the Printing Revolution. it played a key role in the scientific revolution and laid the basis for the modern knowledge-based economy.</s>'

Very little change: it added the years of birth and death and the word "german". Let's see whether the paraphrase score changed:

In [26]:
sequence_0 = summary
sequence_1 = sumRef

In [27]:
paraphrase = tokenizer1(sequence_0, sequence_1, return_tensors="pt")
paraphrase_classification_logits = model1(**paraphrase).logits
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

not paraphrase: 95%
is paraphrase: 5%


Now the score dropped from 9% to 5%, even though this summary is more accurate... OK
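
Because the paraphrase score moves in an unexpected direction here, a simple word-overlap measure can serve as a rough cross-check. The sketch below counts shared words between a candidate and a reference, in the spirit of ROUGE-1. It is a swapped-in technique, not what the notebook uses and not an official ROUGE implementation; the function name `unigram_overlap` and the two sample sentences are assumptions for illustration.

```python
import re
from collections import Counter


def unigram_overlap(candidate, reference):
    """Word-level precision, recall and F1 between two texts."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical short sentences; four of the five words are shared.
scores = unigram_overlap("Gutenberg introduced printing to Europe",
                         "Gutenberg introduced printing in Europe")
print(scores)
```

With the notebook's variables, it could be called as `unigram_overlap(summary, sumRef)`. Word overlap rewards copied wording and ignores meaning, so it complements the paraphrase score; it does not replace it.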

In [28]:
# let's compare a summary to the original article:
sequence_0 = summary
sequence_1 = ARTICLE

In [29]:
paraphrase = tokenizer1(sequence_0, sequence_1, return_tensors="pt")
paraphrase_classification_logits = model1(**paraphrase).logits
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

not paraphrase: 95%
is paraphrase: 5%


Still, anyone can read all the versions and decide for themselves whether an automatically produced summary is good enough.