<a href="https://colab.research.google.com/github/karlbuscheck/bart-summarization-exploration/blob/main/bart_summarization_exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring Abstractive Summarization with BART


Quickly explore a pretrained transformer (`facebook/bart-large-cnn`) for abstractive summarization using a real-world science article about pumas and penguins from [The NYT](https://www.nytimes.com/2025/12/16/science/penguins-pumas-patagonia.html).

This notebook:
- Uses Hugging Face's high-level `pipeline` abstraction to run a summarization task with `facebook/bart-large-cnn` (see the [model card](https://huggingface.co/facebook/bart-large-cnn))
- Summarizes a long-form science story and debugs issues with BART's 1024-token limit
- Produces an end-to-end transformer demo on CPU
- Explores next steps, including chunking and long-form models

## Import `pipeline` and load the article

Begin by importing `pipeline` and loading a sliver of the full article to test out the wiring.

In [None]:
# Import Hugging Face's pipeline, a high-level abstraction
# that bundles the tokenizer, model, and inference logic into a callable object
from transformers import pipeline

# Initialize a summarization pipeline using BART
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=-1)

# Load a sample article
ARTICLE = """Penguins throughout the southern seas have to worry about being picked off by seals or hunted by orcas. On land, they can find safety in numbers. But in the Patagonia region of Argentina, the flightless seabirds are becoming snacks for an unexpected land predator: pumas.
New research, published Wednesday in the journal Proceedings of the Royal Society B, offers “a beautiful blend of animal movement and ‘who eats what,’” said Jake Goheen, a wildlife ecologist at Iowa State University who was not involved in the research.
He noted that pumas usually prefer to prey on grazing mammals, not birds as small as Magellanic penguins.
“It’s an extraordinary example of how flexible large carnivores can be,” Dr. Goheen said.
"""

# Run the sample article and print a short summary
print(summarizer(ARTICLE, truncation=True, max_new_tokens=80)[0]["summary_text"])

Device set to use cpu
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Flightless seabirds are becoming snacks for pumas in Argentina. Pumas usually prefer to prey on grazing mammals, not birds as small as Magellanic penguins. New research offers “a beautiful blend of animal movement and ‘who eats what,’” expert says.


*Flightless seabirds are becoming snacks...*

That's quite the phrase for our model to surface (it is lifted almost verbatim from the article's opening paragraph), and the summary overall is pretty decent, too. Let's see what it can do with more context.

## Summarize the full article + next steps

Having tested the workflow, let's see what happens when we run the full article through the model.

In [None]:
# Grab the full article text from the NYT
ARTICLE = """Penguins throughout the southern seas have to worry about being picked off by seals or hunted by orcas. On land, they can find safety in numbers. But in the Patagonia region of Argentina, the flightless seabirds are becoming snacks for an unexpected land predator: pumas.
New research, published Wednesday in the journal Proceedings of the Royal Society B, offers “a beautiful blend of animal movement and ‘who eats what,’” said Jake Goheen, a wildlife ecologist at Iowa State University who was not involved in the research.
He noted that pumas usually prefer to prey on grazing mammals, not birds as small as Magellanic penguins.
“It’s an extraordinary example of how flexible large carnivores can be,” Dr. Goheen said.
In the early 20th century, widespread sheep ranching vanquished pumas from Patagonia. With those predators gone, Magellanic penguins, which had mostly lived on oceanic islands, established large breeding colonies on Argentina’s coast. Conservation efforts have brought pumas back to the landscape, setting the stage for new interactions between these animals.

Mitchell Serota, an ecologist and lead author of the study, was interested in how Magellanic penguins, as a new food source, were reshaping the movement patterns of pumas across the landscape. He was also curious about how pumas interacted with each other, and their population density.

To understand the behavioral changes, Dr. Serota, who completed the research at the University of California, Berkeley, and some of his colleagues put GPS collars on 14 pumas in Monte León National Park. They collected information from 2019 to 2023. Because penguins are migratory animals and are present at the breeding colony in the park for just over half the year, the scientists tracked how the pumas moved and interacted across seasons.
They found that the behavior of the pumas changed as they spent more time near the penguin colony. Pumas that hunted penguins had smaller territories than pumas that did not, and the big cats interacted with each other more frequently around the colony.
Briana Abrahms, a wildlife ecologist at the University of Washington who was not involved in the research, was familiar with the puma attacks on penguins. She had studied a penguin colony north of Monte León and thought attacks were relatively rare.

“What surprised me initially, although I think it makes complete sense, is just how much predation is happening on these penguins,” she said, “and how much the pumas have adapted to this new food source.”
After integrating GPS tracking with camera trap data, scientists also found what might be the highest density of pumas ever documented at a specific site, Dr. Serota said. Although pumas are typically solitary creatures, their population density in this area was roughly double that observed elsewhere, leading to increased interactions among the felines.
Dr. Serota likened their presence to that of grizzly bears tolerating each other during the salmon run. “Penguins appear to be doing something similar for pumas,” he said. “Food can bring predators together.”

Changes to ecosystems can affect when, where and how predators obtain their food, leading to broader ecological effects. For pumas in the region, which typically feed on guanacos, a llamalike herbivore, those ecological effects are still unknown.

“Because pumas and guanacos form the dominant predator-prey relationship in the region, changes in how pumas move and hunt can have these massive ripple effects,” Dr. Serota said.
Defenseless penguins, an easy catch for pumas, might even find themselves part of this chain reaction. “Will we see a situation in the future where the penguins go back to living mostly on oceanic islands?” Dr. Goheen said.
For Dr. Serota, the study showed that a new predator-prey relationship like that between pumas and penguins transforms the ecosystem.
“Restoring wildlife in today’s changed landscapes doesn’t simply rewind ecosystems to the past,” Dr. Serota said. “It can create these entirely new interactions that reshape animal behavior and populations in really unexpected ways.”
A common assumption in scientific literature is that reintroducing large carnivores to ecosystems can revert an ecosystem to what it once was. But over the period of time that carnivores were absent, other things have changed too. “You’re putting carnivores back into an ecosystem that doesn’t necessarily resemble the one from which they went locally extinct,” Dr. Goheen said. The animals find new situations to contend with.

“As scientists, we should get comfortable with that,” he said, “and not sell to the general public that, Hey, if we restore carnivores, they’re going to have all these other kind of chain-reaction type benefits for everything else in the ecosystem.”
He added: “We should restore carnivores because they deserve to be there and because we’re the ones that eradicated them to begin with.”
"""
# Ensure that the tokenizer respects BART's 1024-token context limit (the pipeline won't truncate reliably otherwise)
summarizer.tokenizer.model_max_length = 1024

print(summarizer(ARTICLE, truncation=True, max_new_tokens=80)[0]["summary_text"])

Pumas are eating Magellanic penguins in the Patagonia region of Argentina. Scientists tracked pumas with GPS collars in Monte León National Park. Pumas that hunted penguins had smaller territories than those that did not. The big cats interacted with each other more frequently around the colony.


When we run the full article through BART, the summary becomes noticeably weaker: more literal and less insightful. The likely cause is no mystery. BART has a fixed context window of 1024 tokens, so with `truncation=True` any text past that limit is dropped before the model sees it. The model was also fine-tuned on CNN/DailyMail news articles, which front-load their key points, rather than on long-form, narrative science writing. The result shows it: the summary draws only on the first half of the article, and none of the later material on puma density, guanacos, or the debate over restoring carnivores makes it in.
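
To check how much of the article actually reaches the model, it helps to count tokens against the limit. The sketch below is a minimal, hedged example: `truncation_report` is a hypothetical helper (not part of `transformers`), and it accepts any `encode` callable that maps text to a list of tokens. The whitespace split used in the demo is only a stand-in so the block runs without downloading a model, and it will usually undercount BART's subword tokens.

```python
def truncation_report(text, encode, limit=1024):
    """Report how many tokens of `text` survive truncation at `limit`.

    `encode` is any callable returning a list of tokens for a string
    (hypothetical interface; a real tokenizer's encode method fits).
    """
    total = len(encode(text))
    kept = min(total, limit)
    return {
        "total_tokens": total,
        "kept_tokens": kept,
        "dropped_tokens": total - kept,
        # Share of the input the model would actually see
        "fraction_seen": kept / total if total else 1.0,
    }


# Stand-in demo: whitespace splitting instead of a real subword tokenizer
sample = "word " * 1500
report = truncation_report(sample, str.split)
print(report["total_tokens"], report["kept_tokens"], report["dropped_tokens"])
```

In this notebook, `truncation_report(ARTICLE, summarizer.tokenizer.encode)` should give the real counts. Note that `encode` includes BART's special start and end tokens in the total.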

**Next Steps**:
- **Chunk the article**, summarize each chunk, and then *summarize the summaries*
- **Use long-context models**, like [LED](https://huggingface.co/docs/transformers/en/model_doc/led?usage=Pipeline) or [LongT5](https://huggingface.co/docs/transformers/en/model_doc/longt5)
- **Tune generation parameters**, such as `num_beams`, `length_penalty`, and `no_repeat_ngram_size`
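
The first next step, chunking and then summarizing the summaries, can be sketched as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: `chunk_paragraphs` and `summarize_long` are hypothetical helpers, the 900-token budget is an arbitrary choice that leaves headroom under BART's 1024-token limit, and both `count_tokens` and `summarize` are passed in as callables so the block runs on its own with simple stand-ins.

```python
def chunk_paragraphs(text, count_tokens, max_tokens=900):
    """Greedily pack newline-separated paragraphs into chunks of at most
    `max_tokens` tokens. A single paragraph larger than the budget is kept
    as its own oversized chunk rather than split mid-paragraph."""
    chunks, current, current_len = [], [], 0
    for para in (p.strip() for p in text.split("\n")):
        if not para:
            continue  # skip blank lines between paragraphs
        n = count_tokens(para)
        if current and current_len + n > max_tokens:
            chunks.append("\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_long(text, summarize, count_tokens, max_tokens=900):
    """Summarize each chunk, then summarize the joined partial summaries."""
    partials = [summarize(c) for c in chunk_paragraphs(text, count_tokens, max_tokens)]
    if len(partials) <= 1:
        return partials[0] if partials else ""
    return summarize(" ".join(partials))


# Stand-ins so the sketch runs without a model (assumptions, not real tooling)
def word_count(s):
    return len(s.split())


def first_word(s):
    return s.split()[0]


demo_text = "a b c\nd e f\n\ng h i j"
demo_chunks = chunk_paragraphs(demo_text, word_count, max_tokens=6)
demo_summary = summarize_long(demo_text, first_word, word_count, max_tokens=6)
print(demo_chunks)
print(demo_summary)

# Possible wiring to this notebook's pipeline (untested here):
# count_tokens = lambda s: len(summarizer.tokenizer.encode(s))
# summarize = lambda s: summarizer(s, truncation=True, max_new_tokens=80)[0]["summary_text"]
# print(summarize_long(ARTICLE, summarize, count_tokens))
```

Splitting on paragraph boundaries keeps quotes and their attributions together. The trade-off is that each chunk is summarized without seeing the rest of the article, so the second pass can only work with what the partial summaries preserved.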