# Zero-shot summaries

In this part we use Hugging Face's high-level `pipeline` API to create summaries with a pre-trained model, without any fine-tuning. Three main steps take place when you pass text to a pipeline:

1) The text is preprocessed into a format the model can understand.

2) The preprocessed inputs are passed to the model.

3) The predictions of the model are post-processed, so you can make sense of them.
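The three steps above can be sketched by hand with the lower-level `transformers` classes. This is a minimal sketch, not the pipeline's actual implementation: the function name `summarize_manually` is hypothetical, it assumes PyTorch is installed, and its output may differ slightly from the pipeline's because the pipeline applies its own generation defaults.

```python
def summarize_manually(text, model_name="facebook/bart-large-cnn", max_length=60):
    """Run the three pipeline steps by hand and return the summary string."""
    # Imported lazily so the heavy dependencies load only when the function runs.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # 1) Preprocess: turn the raw text into token IDs (PyTorch tensors).
    inputs = tokenizer(text, return_tensors="pt", truncation=True)

    # 2) Model: generate the token IDs of the summary.
    summary_ids = model.generate(**inputs, max_length=max_length)

    # 3) Post-process: decode the token IDs back into readable text.
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]
```

Calling `summarize_manually("some long text ...")` downloads the checkpoint on first use, so it is best tried on a single short example.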

In [4]:
!pip install transformers
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")





The next line shows which model the pipeline is using (here, the one we passed explicitly). If no model is specified, the pipeline falls back to a default for the task, which is listed in the source code for pipelines: https://github.com/huggingface/transformers/blob/master/src/transformers/pipelines/__init__.py

In [5]:
summarizer.model.config.name_or_path

'facebook/bart-large-cnn'

In [6]:
# Read the preprocessed test set
import pandas as pd
df_test = pd.read_csv('./DialogueSum/test.csv')
ref_summaries = list(df_test['summary'])  # reference (human-written) summaries
texts = list(df_test['text'])             # source texts to summarize


In [13]:
# Run the pipeline on a single example
summarizer(texts[0], max_length=60)

[{'summary_text': 'This review paper presents the results, which cover the study of current problems of approximation theory in abstract linear spaces. Such research has been actively developed since the 2000s, based on the ideas and approaches initiated in the articles by Stepanets. In particular, the review contains results concerning the'}]

Next we run the pipeline over all 2,000 examples. Because this takes a while (around 50 minutes), we print a counter every 100 examples to track progress.

In [None]:
candidate_summaries = []

for i, text in enumerate(texts):
    # Print a progress counter every 100 examples
    if i % 100 == 0:
        print(i)
    # truncation=True cuts inputs that exceed the model's maximum input length
    candidate = summarizer(text, min_length=5, max_length=60, truncation=True)
    candidate_summaries.append(candidate[0]['summary_text'])

0
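The loop above calls the pipeline once per text. As a possible alternative, a pipeline also accepts a list of texts together with a `batch_size` argument. The helper below is a minimal sketch under that assumption; the name `summarize_in_batches` is hypothetical, and batching is not guaranteed to be faster, especially on CPU.

```python
def summarize_in_batches(summarizer, texts, batch_size=8, **generate_kwargs):
    """Summarize texts chunk by chunk and return the summary strings in order."""
    summaries = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        print(f"{start}/{len(texts)}")  # progress counter
        outputs = summarizer(chunk, batch_size=batch_size, **generate_kwargs)
        for output in outputs:
            # Depending on the library version, each item is a dict
            # or a one-element list wrapping that dict.
            if isinstance(output, list):
                output = output[0]
            summaries.append(output["summary_text"])
    return summaries
```

With the objects defined in this notebook, a call could look like `summarize_in_batches(summarizer, texts, min_length=5, max_length=60, truncation=True)`.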


Saving the candidate summaries in case we want to investigate further.

In [22]:
# pandas was already imported as pd when reading the dataset
df = pd.DataFrame(candidate_summaries, columns=["Predictions"])
df.to_csv("./output_shilfer_60_True.csv")






In [23]:
# Show the first 10 predictions
candidate_summaries[:10]

[' This review paper presents the study of current problems of approximation theory in abstract linear spaces . Such research has been actively developed since the 2000s, based on the ideas and approaches initiated in the articles by Stepanets .',
 ' In this talk I will describe the deep influence Planck had on the development of statistical mechanics . I will also report on a still unsolved problem in statistical mechanics, historically related to the properties of black-body radiation .',
 ' The paper deals with the solution of Shevrin ans Sapir problem . Infinite finitely presented nilsemigroup is constructed . Construction is based on aperiodic tilings, Goodman-Strauss type theorems on uniformly elliptic space .',
 ' Ecodriving guidance includes courses or suggestions for human drivers to improve driving behaviour, reducing energy use and emissions . A standard agreement on the guidance design has not been reached, leading to difficulties in designing and implementing eco-driving g

Calculating the ROUGE scores

In [24]:
!pip install datasets
!pip install rouge-score
# Note: recent releases of `datasets` deprecate load_metric in favor of the separate `evaluate` library
from datasets import load_metric
metric = load_metric("rouge")



In [25]:
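def calc_rouge_scores(candidates, references):
    """Return the mid F-measure of each ROUGE variant as a percentage."""
    result = metric.compute(predictions=candidates, references=references, use_stemmer=True)
    # Each value is an aggregate with low/mid/high estimates; keep the mid F-measure, scaled to 0-100
    result = {key: round(value.mid.fmeasure * 100, 1) for key, value in result.items()}
    return result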

In [26]:
calc_rouge_scores(candidate_summaries, ref_summaries)

{'rouge1': 25.6, 'rouge2': 11.7, 'rougeL': 20.1, 'rougeLsum': 20.1}
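To make the `rouge1` number more concrete, here is a simplified sketch of a ROUGE-1 F-measure for a single candidate/reference pair. It is an illustration only: the function name `rouge1_f1` is hypothetical, it uses plain whitespace tokenization with no stemming, and it does not aggregate over the test set, so it will not reproduce the library's scores exactly.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F-measure between one candidate and one reference."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())  # share of candidate words that match
    recall = overlap / sum(ref_counts.values())      # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score * 100, 1))  # 83.3
```

In the example, five of the six words overlap, so precision and recall are both 5/6. ROUGE-2 applies the same idea to bigrams, and ROUGE-L is based on the longest common subsequence instead.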