# **Text Summarization**

* Text summarization is the automated process of condensing a long text into a shorter, coherent summary that preserves the original meaning and key information.
* It relies on Natural Language Processing (NLP) techniques to understand context and identify essential details, using either extractive methods (selecting key sentences from the source) or abstractive methods (generating new sentences).
* Text summarization is a core NLP application, used across industries for faster information retrieval and document analysis.
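
To make the extractive side of that distinction concrete, here is a minimal sketch of a frequency-based extractive summarizer. It is not the method used in the rest of this notebook (which is abstractive). The function name `extractive_summary`, the tiny `STOP_WORDS` set, and the regex-based sentence splitting are all illustrative assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "it", "for",
              "by", "with", "as", "its", "that", "this", "on", "are"}


def _content_words(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    # Word frequencies over the whole text.
    freq = Counter(_content_words(text))

    def score(sentence):
        tokens = _content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the top ones, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Every sentence in the output is copied verbatim from the input, which is what separates extractive from abstractive summarization.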

## **Using a Pretrained Model**

In [2]:
# importing libraries

from transformers import pipeline

In [2]:
# Define the text

ARTICLE = """
The Amazon rainforest is the largest tropical rainforest in the world, covering an area of
about 5.5 million square kilometers. It spans nine countries, with the majority in Brazil,
followed by Peru, Colombia, Venezuela, Ecuador, Bolivia, Guyana, Suriname, and French Guiana.
It is vital for the world's climate, as its dense foliage absorbs millions of tons of carbon
dioxide every year, a process that helps to stabilize global warming.
The rainforest is also home to an astonishing variety of wildlife, including over 40,000
plant species, 3,000 types of fish, 1,300 species of birds, and hundreds of mammals.
Deforestation, driven primarily by cattle ranching and agriculture, poses a serious threat
to this critical ecosystem, leading to habitat loss and a reduction in its carbon absorption capacity.
"""

In [6]:
# loading a pretrained model

# 'facebook/bart-large-xsum' is a BART checkpoint fine-tuned on the XSum dataset;
# 'facebook/bart-large-cnn' (fine-tuned on CNN/DailyMail) is a drop-in alternative.
# Both are abstractive: they generate a new summary rather than extracting sentences.

print("Initializing summarization pipeline...")
summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-xsum"
)
print("Pipeline initialized successfully.")

Initializing summarization pipeline...


Device set to use cpu


Pipeline initialized successfully.


In [7]:
# generating the summary

print("Generating summary...")
summary = summarizer(
    ARTICLE,
    max_length=130,
    min_length=30,
    do_sample=False
)

Generating summary...
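
BART-family checkpoints have a fixed maximum input length, so an article much longer than the one above would need to be split before summarizing. The following is a rough sketch of one way to do that. The helper names `chunk_by_words` and `summarize_long` are hypothetical, and word count is used only as a crude stand-in for token count, so `max_words=400` is an assumed, conservative default rather than a documented limit.

```python
def chunk_by_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text, summarize_fn, max_words=400):
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    return " ".join(summarize_fn(chunk) for chunk in chunk_by_words(text, max_words))
```

With the pipeline from this notebook, `summarize_fn` could be `lambda chunk: summarizer(chunk, max_length=130, min_length=30, do_sample=False)[0]["summary_text"]`. Splitting on word boundaries can cut a sentence in half, so a sentence-aware splitter would likely give cleaner partial summaries.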


In [8]:
# extracting the summary

summarized_text = summary[0]['summary_text']

print("ORIGINAL TEXT:")
print(ARTICLE)
print("-" * 50)
print("SUMMARIZED TEXT:")
print(summarized_text)
print("-" * 50)

ORIGINAL TEXT:

The Amazon rainforest is the largest tropical rainforest in the world, covering an area of 
about 5.5 million square kilometers. It spans nine countries, with the majority in Brazil, 
followed by Peru, Colombia, Venezuela, Ecuador, Bolivia, Guyana, Suriname, and French Guiana. 
It is vital for the world's climate, as its dense foliage absorbs millions of tons of carbon 
dioxide every year, a process that helps to stabilize global warming. 
The rainforest is also home to an astonishing variety of wildlife, including over 40,000 
plant species, 3,000 types of fish, 1,300 species of birds, and hundreds of mammals. 
Deforestation, driven primarily by cattle ranching and agriculture, poses a serious threat 
to this critical ecosystem, leading to habitat loss and a reduction in its carbon absorption capacity.

--------------------------------------------------
SUMMARIZED TEXT:
The Amazon rainforest is one of the world's most important natural resources, and is a major source 