 Description: Text summarization using BART

In [1]:
# Import libraries
from transformers import AutoTokenizer, BartForConditionalGeneration


In [2]:
# Load pre-trained model
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

In [3]:
# Define article to summarize
ARTICLE_TO_SUMMARIZE = """PG&E stated it scheduled the blackouts in response to
forecasts for high winds amid dry conditions. The aim is to reduce the
risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by
the shutoffs which were expected to last through at least midday tomorrow."""
print("ARTICLE TO SUMMARIZE:", ARTICLE_TO_SUMMARIZE)
# Tokenize the article; truncation=True makes the 1024-token input cap explicit
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, truncation=True, return_tensors="pt")

ARTICLE TO SUMMARIZE: PG&E stated it scheduled the blackouts in response to
forecasts for high winds amid dry conditions. The aim is to reduce the
risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by
the shutoffs which were expected to last through at least midday tomorrow.
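The tokenizer call above caps the input at 1024 tokens, so anything beyond that is dropped before the model sees it. A minimal sketch for checking whether a given text would be affected follows; `exceeds_limit` is a hypothetical helper (not part of transformers), and it assumes the tokenizer returns an `input_ids` list when called on a single string, as Hugging Face tokenizers do.

```python
def exceeds_limit(text, tokenizer, max_length=1024):
    """Return (n_tokens, too_long) for `text` without truncating it.

    `exceeds_limit` is an illustrative helper, not a transformers API.
    """
    # Calling the tokenizer without max_length/truncation gives the full
    # encoding, including any special tokens the tokenizer adds.
    n_tokens = len(tokenizer(text)["input_ids"])
    return n_tokens, n_tokens > max_length
```

With the objects defined in this notebook, `exceeds_limit(ARTICLE_TO_SUMMARIZE, tokenizer)` would report the article's token count and whether it is over the cap.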


In [4]:
# Generate a summary with beam search, capped at 20 tokens (too short here: the output is cut off mid-sentence)
summary_ids = model.generate(inputs["input_ids"], num_beams=2, min_length=0, max_length=20)
summary_text = tokenizer.batch_decode(
    summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]

In [5]:
print("SUMMARY OF ARTICLE:", summary_text)

SUMMARY OF ARTICLE: PG&E scheduled the blackouts in response to high winds amid dry conditions. The
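The summary above stops at a dangling "The" because generation hit the 20-token cap. Raising `max_length` is one remedy; another is to post-process the text. The sketch below is a hypothetical helper (`trim_to_last_sentence` is not part of transformers) that drops whatever follows the last sentence-ending punctuation mark. It is a simple heuristic and may misjudge abbreviations such as "U.S." that end in a period.

```python
import re


def trim_to_last_sentence(text):
    """Drop any trailing fragment after the last sentence-ending mark."""
    # Match '.', '!' or '?' only when followed by whitespace or end of string.
    matches = list(re.finditer(r"[.!?](?=\s|$)", text))
    if not matches:
        return text  # no complete sentence found; leave the text unchanged
    return text[: matches[-1].end()]


truncated = (
    "PG&E scheduled the blackouts in response to high winds "
    "amid dry conditions. The"
)
print(trim_to_last_sentence(truncated))
# prints: PG&E scheduled the blackouts in response to high winds amid dry conditions.
```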


In [6]:
# Generate the summary again with max_length=30 so the second sentence fits
summary_ids = model.generate(inputs["input_ids"], num_beams=2, min_length=0, max_length=30)
summary_text = tokenizer.batch_decode(
    summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]

In [7]:
print("SUMMARY OF ARTICLE:", summary_text)

SUMMARY OF ARTICLE: PG&E scheduled the blackouts in response to high winds amid dry conditions. The aim is to reduce the risk of wildfires.
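Cells 4 and 6 differ only in `max_length`, so the tokenize, generate, and decode steps can be wrapped in one function. This is a sketch rather than a definitive implementation: `summarize` and its parameter `max_summary_tokens` are hypothetical names, and the function assumes `model` and `tokenizer` are the objects loaded in cell 2. It also passes the tokenizer's `attention_mask` to `generate`, which the cells above omit.

```python
def summarize(text, model, tokenizer, max_summary_tokens=30, num_beams=2):
    """Tokenize `text`, run beam search, and decode the first summary."""
    # Encode the input, truncating explicitly at the 1024-token cap.
    inputs = tokenizer(
        [text], max_length=1024, truncation=True, return_tensors="pt"
    )
    # Generate summary token ids; max_length bounds the summary length.
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=num_beams,
        min_length=0,
        max_length=max_summary_tokens,
    )
    # Decode the single sequence in the batch back to text.
    return tokenizer.batch_decode(
        summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]
```

In the notebook, `summarize(ARTICLE_TO_SUMMARIZE, model, tokenizer, max_summary_tokens=20)` and the same call with `max_summary_tokens=30` would correspond to the two runs above.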
