<a href="https://colab.research.google.com/github/harshitpathak10088/Text-Summarizer/blob/main/Text_Summarizer_using_BART%2C_T5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install rouge_score
!pip install evaluate

# 📌 Text Summarization Project

# ----------------------------
# 1. Import Required Libraries
# ----------------------------
import nltk
import spacy
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

from transformers import pipeline
import evaluate

nltk.download("punkt")
nltk.download("punkt_tab")
nltk.download("stopwords")
nlp = spacy.load("en_core_web_sm")

# ----------------------------
# 2. Preprocessing Function
# ----------------------------
# Build the stopword set once; calling stopwords.words() for every token is slow.
STOP_WORDS = set(stopwords.words("english"))

def preprocess_text(text):
    """
    Preprocess text:
    - Lowercasing
    - Stopword removal
    - Lemmatization
    """
    doc = nlp(text.lower())
    tokens = [
        token.lemma_ for token in doc
        if token.is_alpha and token.text not in STOP_WORDS
    ]
    return " ".join(tokens)

sample_text = """Natural Language Processing (NLP) is a field of AI
that focuses on the interaction between humans and computers using language.
Summarization is an important application of NLP where large texts are
condensed into shorter, meaningful versions while retaining core information."""

clean_text = preprocess_text(sample_text)
print("✅ Preprocessed Text:", clean_text)

# ----------------------------
# 3. Extractive Summarization (Baseline)
# ----------------------------
def extractive_summary(text, num_sentences=2):
    sentences = sent_tokenize(text)
    return " ".join(sentences[:num_sentences])  # simple: first N sentences

print("\n📌 Extractive Summary:")
print(extractive_summary(sample_text))

# ----------------------------
# 4. Abstractive Summarization (Transformers)
# ----------------------------
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

generated_summary = summarizer(
    sample_text,
    max_length=60,
    min_length=20,
    do_sample=False
)[0]['summary_text']

print("\n📌 Abstractive Summary (BART):")
print(generated_summary)

# ----------------------------
# 5. Evaluation (ROUGE-1 Score)
# ----------------------------
rouge = evaluate.load("rouge")

reference = """NLP enables communication between humans and computers.
Summarization condenses long texts while keeping key information."""

scores = rouge.compute(predictions=[generated_summary], references=[reference])
print("\n📊 ROUGE-1 Score:", scores["rouge1"])


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


✅ Preprocessed Text: natural language processing nlp field ai focus interaction human computer use language summarization important application nlp large text condense short meaningful version retain core information

📌 Extractive Summary:
Natural Language Processing (NLP) is a field of AI 
that focuses on the interaction between humans and computers using language. Summarization is an important application of NLP where large texts are 
condensed into shorter, meaningful versions while retaining core information.


Device set to use cpu
Your max_length is set to 60, but your input_length is only 58. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=29)



📌 Abstractive Summary (BART):
Natural Language Processing (NLP) is a field of AI that focuses on the interaction between humans and computers using language. Summarization is an important application of NLP where large texts are condensed into shorter, meaningful versions.

📊 ROUGE-1 Score: 0.27450980392156865
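
To make the score above easier to interpret, here is a minimal sketch of a ROUGE-1 F-measure computed by hand from unigram overlap. It assumes lowercase alphanumeric tokenization and no stemming; `tokenize` and `rouge1_f1` are hypothetical helper names, not part of `evaluate` or `rouge_score`, and the library may tokenize differently on other inputs. For the BART summary and reference used in this cell, the overlap is 7 unigrams out of 36 predicted and 15 reference tokens, which gives 14/51, roughly 0.2745.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and keep only runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between a predicted and a reference summary."""
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    # Clipped overlap: a unigram counts at most as often as it occurs in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Strings copied from the BART run in this notebook.
bart_summary = (
    "Natural Language Processing (NLP) is a field of AI that focuses on the "
    "interaction between humans and computers using language. Summarization is "
    "an important application of NLP where large texts are condensed into "
    "shorter, meaningful versions."
)
reference = (
    "NLP enables communication between humans and computers. "
    "Summarization condenses long texts while keeping key information."
)

print(rouge1_f1(bart_summary, reference))
```

Because the summary is more than twice as long as the reference, precision (7/36) is much lower than recall (7/15), which is what pulls the F-measure down.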


In [None]:

# 📌 Text Summarization Project (Clean + Fixed)

# ----------------------------
# 1. Import Required Libraries
# ----------------------------
import nltk
import spacy
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

from transformers import pipeline
import evaluate   # instead of datasets.load_metric

nltk.download("punkt")
nltk.download("punkt_tab")
nltk.download("stopwords")
nlp = spacy.load("en_core_web_sm")

# ----------------------------
# 2. Preprocessing Function
# ----------------------------
# Build the stopword set once; calling stopwords.words() for every token is slow.
STOP_WORDS = set(stopwords.words("english"))

def preprocess_text(text):
    """
    Preprocess text:
    - Lowercasing
    - Stopword removal
    - Lemmatization
    """
    doc = nlp(text.lower())
    tokens = [
        token.lemma_ for token in doc
        if token.is_alpha and token.text not in STOP_WORDS
    ]
    return " ".join(tokens)

sample_text = """Natural Language Processing (NLP) is a field of AI
that focuses on the interaction between humans and computers using language.
Summarization is an important application of NLP where large texts are
condensed into shorter, meaningful versions while retaining core information."""

clean_text = preprocess_text(sample_text)
print("✅ Preprocessed Text:", clean_text)

# ----------------------------
# 3. Extractive Summarization (Baseline)
# ----------------------------
def extractive_summary(text, num_sentences=2):
    sentences = sent_tokenize(text)
    return " ".join(sentences[:num_sentences])  # simple: first N sentences

print("\n📌 Extractive Summary:")
print(extractive_summary(sample_text))

# ----------------------------
# 4. Abstractive Summarization (Transformers)
# ----------------------------
summarizer = pipeline("summarization", model="t5-small")

generated_summary = summarizer(
    sample_text,
    max_length=120,
    min_length=40,
    num_beams=6,
    length_penalty=1.0,
    early_stopping=True,
    do_sample=False
)[0]['summary_text']

print("\n📌 Abstractive Summary (T5):")
print(generated_summary)

# ----------------------------
# 5. Evaluation (ROUGE-1 Score)
# ----------------------------
rouge = evaluate.load("rouge")

reference = """NLP enables communication between humans and computers.
Summarization condenses long texts while keeping key information."""

scores = rouge.compute(predictions=[generated_summary], references=[reference])
print("\n📊 ROUGE-1 Score:", scores["rouge1"])



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


✅ Preprocessed Text: natural language processing nlp field ai focus interaction human computer use language summarization important application nlp large text condense short meaningful version retain core information

📌 Extractive Summary:
Natural Language Processing (NLP) is a field of AI 
that focuses on the interaction between humans and computers using language. Summarization is an important application of NLP where large texts are 
condensed into shorter, meaningful versions while retaining core information.


Device set to use cpu
Your max_length is set to 120, but your input_length is only 57. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=28)
Both `max_new_tokens` (=256) and `max_length`(=120) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



📌 Abstractive Summary (T5):
natural language processing (NLP) is a field of AI that focuses on the interaction between humans and computers using language . large texts are condensed into shorter, meaningful versions while retaining core information .

📊 ROUGE-1 Score: 0.3404255319148936
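
The pipeline warnings in both runs propose a `max_length` of about half the input length (29 for 58 tokens, 28 for 57). A minimal sketch of deriving generation lengths from the input token count follows. `suggest_lengths` and its `ratio` and `min_ratio` parameters are hypothetical, not part of `transformers`, and the input length is assumed to have been counted with the model's own tokenizer.

```python
def suggest_lengths(input_length, ratio=0.5, min_ratio=0.25):
    """Return (min_length, max_length) as fractions of the input token count."""
    # Assumption: halving the input mirrors the value the warning proposes.
    max_length = max(1, int(input_length * ratio))
    # Keep min_length strictly below max_length so generation can stop early.
    min_length = min(int(input_length * min_ratio), max_length - 1)
    return max(0, min_length), max_length


# Input lengths reported in the two warnings above.
for n in (58, 57):
    print(n, suggest_lengths(n))
```

The resulting pair can be passed as `min_length` and `max_length` to the `summarizer(...)` call. In the T5 run the log indicates that `max_new_tokens` took precedence over `max_length`, so the limit may need to be set through that argument instead.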
