<a href="https://colab.research.google.com/github/surendraveeri/TEXT-SUMMARIZATION-AND-EVALUATION-USING-GOOGLE-PEGASUS/blob/main/NLP_Mini_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# =============================================================
# 🧠 TEXT SUMMARIZATION AND EVALUATION USING GOOGLE PEGASUS (WITH GRADIO)
# =============================================================

# 🎯 Objective:
# Summarize long text using Hugging Face's Pegasus model,
# evaluate the summary using ROUGE metrics, and display results via Gradio UI.

# =============================================================
# STEP 1: INSTALL DEPENDENCIES
# =============================================================
!pip install transformers rouge-score gradio --quiet

# =============================================================
# STEP 2: IMPORT LIBRARIES
# =============================================================
# Only pipeline is needed; it loads the Pegasus model and tokenizer by name.
from transformers import pipeline
from rouge_score import rouge_scorer
import gradio as gr
import torch

# =============================================================
# STEP 3: LOAD PEGASUS MODEL AND TOKENIZER
# =============================================================
device = 0 if torch.cuda.is_available() else -1
model_name = "google/pegasus-cnn_dailymail"

summarizer = pipeline("summarization", model=model_name, tokenizer=model_name, device=device)

print("✅ Model loaded successfully on", "GPU" if device == 0 else "CPU")

# =============================================================
# STEP 4: DEFINE SUMMARIZATION + EVALUATION FUNCTION
# =============================================================
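def summarize_and_evaluate(text):
    # Guard against missing or whitespace-only input.
    if not text or not text.strip():
        return "⚠️ Please enter some text.", "", ""

    # --- Generate Summary ---
    # truncation=True clips inputs that exceed the model's maximum input length
    # rather than passing over-long text to the model.
    summary = summarizer(text, max_length=180, min_length=40, do_sample=False, truncation=True)[0]['summary_text']

    # --- Evaluate using ROUGE ---
    # No human-written reference summary is available here, so the source text
    # serves as the reference (target) and the generated summary as the prediction.
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
    score = scorer.score(text, summary)
    rouge1 = round(score['rouge1'].fmeasure, 2)
    rougeL = round(score['rougeL'].fmeasure, 2)

    return summary, f"📊 ROUGE-1 Score: {rouge1}", f"📈 ROUGE-L Score: {rougeL}"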

# =============================================================
# STEP 5: BUILD GRADIO FRONTEND
# =============================================================
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
    # 🧠 Text Summarization and Evaluation using Google Pegasus
    Paste or type any paragraph below 👇 and click **Generate Summary**
    ---
    ✅ Generates human-like summaries using **Pegasus**
    ✅ Evaluates summary quality using **ROUGE-1** and **ROUGE-L**
    ✅ GPU-optimized for Google Colab (T4)
    """)

    text_input = gr.Textbox(label="📝 Enter your paragraph", lines=10, placeholder="Type or paste your text here...")
    summarize_btn = gr.Button("✨ Generate Summary")

    summary_output = gr.Textbox(label="📄 Generated Summary", lines=6)
    rouge1_output = gr.Textbox(label="📊 ROUGE-1 Score")
    rougeL_output = gr.Textbox(label="📈 ROUGE-L Score")

    summarize_btn.click(
        fn=summarize_and_evaluate,
        inputs=text_input,
        outputs=[summary_output, rouge1_output, rougeL_output]
    )

demo.launch()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-cnn_dailymail and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Device set to use cuda:0


✅ Model loaded successfully on GPU
It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://70f508b3539abda45c.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
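
To make the two scores reported by the app easier to interpret, here is a minimal pure-Python sketch of what ROUGE-1 and ROUGE-L F-measures compute. It is an illustration only, not the `rouge_score` implementation: the helper names (`tokenize`, `f_measure`, `rouge1_f`, `lcs_length`, `rougeL_f`) are hypothetical, the tokenizer is a simple lowercase alphanumeric split, and no stemming is applied, so the numbers can differ from what `RougeScorer(..., use_stemmer=True)` returns.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric runs, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def f_measure(matches, n_reference, n_candidate):
    # Harmonic mean of precision and recall; 0.0 when nothing matches.
    if matches == 0:
        return 0.0
    precision = matches / n_candidate
    recall = matches / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge1_f(reference, candidate):
    ref, cand = tokenize(reference), tokenize(candidate)
    # Clipped unigram overlap: a token counts at most as often as it
    # appears in both texts.
    overlap = sum((Counter(ref) & Counter(cand)).values())
    return f_measure(overlap, len(ref), len(cand))


def lcs_length(a, b):
    # Dynamic-programming longest common subsequence, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f(reference, candidate):
    ref, cand = tokenize(reference), tokenize(candidate)
    # ROUGE-L uses the longest common subsequence instead of unigram overlap,
    # so it is sensitive to word order.
    return f_measure(lcs_length(ref, cand), len(ref), len(cand))


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(round(rouge1_f(reference, "the cat on the mat"), 3))
    print(round(rougeL_f(reference, "mat the on cat the"), 3))
```

Because the notebook passes the full source text as the reference, recall is measured against every word of the input. A short summary of a long passage will therefore tend to score low on F-measure even when it is accurate, so the scores are best read as a rough overlap indicator rather than a quality grade.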


