## **PDF Summarizer**

In [1]:
!pip install transformers torch



In [2]:
from transformers import pipeline

# Load summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

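def summarize_text(text, max_length=130, min_length=30):
    """Return a summary of `text`, or a placeholder if it has fewer than 20 words."""
    if not text or len(text.split()) < 20:
        return "Summary not available (Text too short)"

    # max_length and min_length are counted in model tokens, not words.
    # do_sample=False disables sampling, so the output is deterministic.
    summary = summarizer(text, max_length=max_length, min_length=min_length, do_sample=False)
    return summary[0]['summary_text']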

Device set to use cpu


In [3]:
test_abstract = """
Neil Alden Armstrong (August 5, 1930 – August 25, 2012) was an American astronaut and aeronautical engineer who, in 1969, became the first person to walk on the Moon. He was also a naval aviator, test pilot, and university professor.

Armstrong was born and raised near Wapakoneta, Ohio. He entered Purdue University, studying aeronautical engineering, with the U.S. Navy paying his tuition under the Holloway Plan.
"""

print(summarize_text(test_abstract))

Your max_length is set to 130, but your input_length is only 103. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=51)


Armstrong was born and raised near Wapakoneta, Ohio. He entered Purdue University, studying aeronautical engineering, with the U.S. Navy paying his tuition under the Holloway Plan.
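
The warning above appears because `max_length=130` is larger than the 103-token input. One way to avoid it is to cap both length limits from the input's token count before calling the pipeline. The helper below is a minimal sketch and not part of the original notebook: `fit_lengths` is a hypothetical name, and capping at half the input length is an assumption that mirrors the `max_length=51` suggestion in the warning.

```python
def fit_lengths(input_tokens, max_length=130, min_length=30):
    """Cap summary length limits so they stay below the input length.

    `input_tokens` is the number of model tokens in the input.
    Returns a (max_length, min_length) tuple.
    """
    # Assumption: cap the summary at half the input length, as the warning suggests.
    capped_max = max(2, min(max_length, input_tokens // 2))
    # min_length must stay strictly below max_length.
    capped_min = max(1, min(min_length, capped_max - 1))
    return capped_max, capped_min


print(fit_lengths(103))   # (51, 30)
print(fit_lengths(40))    # (20, 19)
print(fit_lengths(2000))  # (130, 30)
```

In this notebook the token count could probably be taken from the pipeline's own tokenizer, for example `len(summarizer.tokenizer(text)["input_ids"])`, with the two returned values passed on as `max_length` and `min_length`.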


In [4]:
from transformers import pipeline

# Load summarization pipeline (same model as in In [2]; this reload is redundant if that cell has already run)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def chunk_text(text, max_tokens=500):
    """
    Split long text into chunks of at most `max_tokens` whitespace-separated words.
    Despite its name, `max_tokens` counts words, not model tokens.
    """
    words = text.split()
    return [' '.join(words[i:i+max_tokens]) for i in range(0, len(words), max_tokens)]

def summarize_text(text, max_length=130, min_length=30):
    if not text or len(text.split()) < 20:
        return "Summary not available (Text too short)"

    chunks = chunk_text(text)
    summaries = []

    for chunk in chunks:
        try:
            summary = summarizer(chunk, max_length=max_length, min_length=min_length, do_sample=False)
            summaries.append(summary[0]['summary_text'])
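        except Exception as e:
            # Keep going, but report why the chunk failed instead of hiding the error.
            print(f"Chunk failed to summarize: {e}")
            summaries.append("[Chunk failed to summarize]")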

    return "\n".join(summaries)

# Example
test_abstract = """
Neil Alden Armstrong (August 5, 1930 – August 25, 2012) was an American astronaut and aeronautical engineer who, in 1969, became the first person to walk on the Moon. He was also a naval aviator, test pilot, and university professor.

Armstrong was born and raised near Wapakoneta, Ohio. He entered Purdue University, studying aeronautical engineering, with the U.S. Navy paying his tuition under the Holloway Plan. Later, he became a test pilot and flew over 200 different models of aircraft.

He joined NASA in 1962, and his first spaceflight was the Gemini 8 mission in 1966, where he performed the first docking of two spacecraft. His second and final spaceflight was Apollo 11 in July 1969, when he and Buzz Aldrin descended to the lunar surface.

Armstrong’s iconic words, “That's one small step for [a] man, one giant leap for mankind,” were heard by millions. After the Apollo program, he became a professor and served on multiple commissions related to spaceflight safety.

Armstrong was widely respected and remained a private figure until his death in 2012.
"""

print(summarize_text(test_abstract))

Device set to use cpu


Neil Alden Armstrong was an American astronaut and aeronautical engineer. In 1969, he became the first person to walk on the Moon. He was also a naval aviator, test pilot, and university professor.
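
`chunk_text` cuts every 500 words, so a chunk boundary can fall in the middle of a sentence. A possible refinement, sketched below under stated assumptions, packs whole sentences into each chunk instead. `chunk_by_sentences` is a hypothetical helper that is not in the original notebook, and the regex split on `.`, `!`, or `?` followed by whitespace is a rough heuristic that will wrongly split abbreviations such as "U.S. Navy".

```python
import re


def chunk_by_sentences(text, max_words=500):
    """Group whole sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` becomes its own oversized chunk.
    """
    # Heuristic sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if n == 0:
            continue
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six seven. Eight nine."
print(chunk_by_sentences(sample, max_words=5))
# ['One two three.', 'Four five six seven.', 'Eight nine.']
```

The returned list has the same shape as the output of `chunk_text`, so it could stand in for that call inside `summarize_text`. The word budget is still only a proxy for the model's token limit.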
