
# 06 – Extractive and Abstractive Summarization

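This notebook summarizes multilingual policy-response posts in two ways: extractively with
TextRank, which selects the most central existing sentences, and abstractively with pretrained
transformer models (BART/T5), which generate new summary text. Both methods are applied to the
English posts.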

In [None]:
from google.colab import drive

# Mount Google Drive so the artifacts saved by notebook 01 are readable here.
drive.mount('/content/drive')


In [None]:
import pickle
import pathlib

# Artifacts written by 01_data_loading_and_preprocessing.ipynb on Google Drive.
artifacts_root = pathlib.Path("/content/drive/MyDrive/My_NLP_Learning/Public_Response_Analysis")
artifacts_path = artifacts_root / "artifacts" / "preprocessing_outputs.pkl"

if artifacts_path.exists():
    with open(artifacts_path, "rb") as f:
        artifacts = pickle.load(f)
    # Preprocessed posts; this notebook uses the "text", "language", and "topic" columns.
    df = artifacts["df"]
    print("Loaded preprocessing artifacts and DataFrame.")
else:
    raise FileNotFoundError(
        "Artifacts not found. Please run 01_data_loading_and_preprocessing.ipynb first "
        "and execute the 'Save preprocessing artifacts' cell."
    )


## Extractive summarization with TextRank

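For each policy topic, we concatenate that topic's English posts into one document and run the
TextRank implementation from the `summa` package, keeping roughly the top 30% of sentences
(`ratio=0.3`) as the extractive summary. TextRank treats sentences as nodes in a graph, weights
the edges by the similarity between sentence pairs, and ranks sentences with a PageRank-style
algorithm, so the selected sentences are those most similar to the rest of the topic's posts.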
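To make the ranking step concrete, here is a minimal from-scratch sketch of sentence-level TextRank. It is an illustration, not the `summa` implementation: the function names (`split_sentences`, `similarity`, `textrank`), the regex sentence splitter, and the similarity measure (shared words divided by the sum of the log sentence lengths, as in the original TextRank paper) are assumptions chosen for clarity. It does no stemming or stopword filtering, which library implementations typically add.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after . ! ? followed by whitespace, or on newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def similarity(tokens_a, tokens_b):
    """Shared-word count normalized by log sentence lengths; 0 when undefined."""
    if not tokens_a or not tokens_b:
        return 0.0
    overlap = len(set(tokens_a) & set(tokens_b))
    denom = math.log(len(tokens_a)) + math.log(len(tokens_b))
    return overlap / denom if overlap and denom > 0 else 0.0


def textrank(text, ratio=0.3, damping=0.85, iterations=50):
    """Return the top `ratio` fraction of sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n < 2:
        return []  # a graph with a single node cannot rank anything
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    # Symmetric weighted adjacency matrix over sentences (no self-loops).
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    # Weighted PageRank by power iteration; the comprehension reads the previous scores.
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    k = max(1, int(n * ratio))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Applied to a topic document, `textrank(long_text, ratio=0.3)` returns a list of sentences comparable in spirit to the `summarize(long_text, ratio=0.3)` call in the next cell, though the exact selections will generally differ because of the different preprocessing and similarity function.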

In [None]:
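!pip install -q summa

from summa.summarizer import summarize

# Restrict to English posts; summa's summarize() defaults to English preprocessing.
english_df = df[df["language"] == "en"]

for topic in english_df["topic"].unique():
    subset = english_df[english_df["topic"] == topic]
    # Concatenate the topic's posts, one per line, into a single document.
    long_text = "\n".join(subset["text"].dropna().astype(str))
    print(f"\n===== Topic: {topic} =====")
    try:
        # Keep roughly the top 30% of sentences, ranked by TextRank score.
        summary = summarize(long_text, ratio=0.3)
    except ValueError:
        summary = ""
    # summa may return an empty string (rather than raising) when too few
    # sentences share words to build a graph, so check the result explicitly.
    if summary.strip():
        print("Extractive summary:")
        print(summary)
    else:
        print("Not enough text for summarization.")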


## Abstractive summarization with BART/T5

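We apply the Hugging Face `summarization` pipeline with `facebook/bart-large-cnn`, a BART model
fine-tuned on CNN/DailyMail news articles, to a sample of five English posts. For posts in other
languages, either translate them to English first or use a multilingual model such as a
summarization-tuned mT5 checkpoint.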
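Individual posts are short, but a whole topic's concatenated text can exceed the 1024-token input limit of `facebook/bart-large-cnn`, and truncation would silently drop everything past that point. One common workaround is to summarize the text in chunks and join the partial summaries. The sketch below packs whole sentences into chunks under a word budget; the helper name `chunk_sentences` and the 400-word default are hypothetical choices, with words used as a rough proxy because subword tokens usually outnumber words.

```python
import re


def chunk_sentences(text, max_words=400):
    """Pack consecutive sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` still becomes its own
    (oversized) chunk; the summarization pipeline's truncation handles it.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]
    chunks, current, count = [], [], 0
    for sent in sentences:
        n_words = len(sent.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With the `summarizer` pipeline from the next cell, a topic-level summary could then be assembled as `" ".join(summarizer(c, max_length=60, min_length=15, do_sample=False)[0]["summary_text"] for c in chunk_sentences(long_text))`. Summarizing the concatenated result a second time is a further option if it is still too long.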

In [None]:
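!pip install -q transformers sentencepiece

from transformers import pipeline

# BART fine-tuned on CNN/DailyMail news summaries; the weights download on first use.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Summarize the first five English posts individually.
sample_posts = english_df["text"].dropna().astype(str).head(5).tolist()
for i, text in enumerate(sample_posts, start=1):
    print(f"\n--- Post {i} ---")
    print("Original:", text)
    # do_sample=False gives deterministic output; truncation=True cuts inputs longer than
    # the model's 1024-token limit instead of failing. Very short posts may trigger a
    # warning that max_length exceeds the input length.
    result = summarizer(text, max_length=60, min_length=15, do_sample=False, truncation=True)
    print("Abstractive summary:", result[0]["summary_text"])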
