<a href="https://colab.research.google.com/github/harshrupendrasingh/Summary/blob/main/Summary_Bart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import re
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

# Device setup
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load BART model and tokenizer
print("🔄 Loading BART model...")
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)

# ==========================
# Feature Engineering
# ==========================

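def clean_text(text):
    """Cleans and normalizes input text."""
    text = re.sub(r'\s+', ' ', text)                         # Normalize whitespace (also lets (...) span line breaks)
    text = re.sub(r'\[[^\]]*\]', '', text)                   # Remove [references]
    text = re.sub(r'\(.*?\)', '', text)                      # Remove (parentheticals)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)        # Remove URLs
    text = re.sub(r'\s+', ' ', text)                         # Collapse gaps left by the removals
    text = re.sub(r' ([.,;:!?])(?=\s|$)', r'\1', text)       # Drop a space stranded before punctuation
    return text.strip()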

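def split_into_chunks(text, max_tokens=1024):
    """Splits long text into chunks of at most `max_tokens` ids, each wrapped in <s> ... </s>."""
    # Encode without special tokens so <s>/</s> can be added to every chunk, not only the first/last
    input_ids = tokenizer.encode(text, add_special_tokens=False)
    body_len = max_tokens - 2  # leave room for <s> and </s> (BART has 1024 positions)
    chunks = []

    for i in range(0, len(input_ids), body_len):
        body = input_ids[i:i + body_len]
        ids = [tokenizer.bos_token_id] + body + [tokenizer.eos_token_id]
        chunks.append(torch.tensor([ids]))  # shape [1, chunk_len]

    return chunks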

# ==========================
# Summarization Pipeline
# ==========================

def generate_summary(text, max_length=150, min_length=40, do_clean=True):
    """Generate summary for text input."""
    if do_clean:
        text = clean_text(text)

    # Handle long inputs with chunking
    input_ids_chunks = split_into_chunks(text)

    summaries = []
    for input_ids in input_ids_chunks:
        input_ids = input_ids.to(device)
        summary_ids = model.generate(
            input_ids,
            max_length=max_length,
            # Don't force a summary longer than a short (e.g. final) chunk
            min_length=min(min_length, input_ids.shape[-1]),
            num_beams=4,
            length_penalty=2.0,
            early_stopping=True
        )
        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        summaries.append(summary)

    return ' '.join(summaries)

# ==========================
# CLI interface
# ==========================

if __name__ == "__main__":
    print("\n📜 Paste your long document below. Press Enter twice to summarize:\n")
    lines = []
    while True:
        try:
            line = input()
        except EOFError:  # stdin ended (e.g. piped input) without a blank line
            break
        if not line:
            break
        lines.append(line)

    raw_text = "\n".join(lines)

    print("\n🧠 Generating summary...\n")
    summary = generate_summary(raw_text)
    print("📄 Summary:\n")
    print(summary)


🔄 Loading BART model...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


📜 Paste your long document below. Press Enter twice to summarize:

The Industrial Revolution was the transition to new manufacturing processes in Europe and the United States. This transition included going from hand production methods to machines, new chemical manufacturing and iron production processes...


🧠 Generating summary...

📄 Summary:

The Industrial Revolution was the transition to new manufacturing processes in Europe and the United States. This transition included going from hand production methods to machines. New chemical manufacturing and iron production processes were developed.
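`split_into_chunks` cuts the token sequence at fixed offsets, so a chunk boundary can land in the middle of a sentence and each half is then summarized without the other. One common alternative is to pack whole sentences greedily into a token budget. The sketch below assumes a naive regex sentence splitter and a caller-supplied `count_tokens` function; `pack_sentences` and its parameters are hypothetical names, not part of this notebook or of `transformers`. In the notebook, `count_tokens` could be `lambda s: len(tokenizer.encode(s, add_special_tokens=False))`, with a budget somewhat under 1024 to leave room for `<s>`, `</s>` and small BPE differences between encoding sentences separately and together.

```python
import re
from typing import Callable, List


def pack_sentences(text: str, count_tokens: Callable[[str], int], budget: int) -> List[str]:
    """Greedily groups whole sentences into chunks whose summed token count stays within `budget`.

    A single sentence longer than `budget` becomes its own oversized chunk; the caller
    still has to truncate or split it before passing it to the model.
    """
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk if adding this sentence would exceed the budget
        if current and current_len + n > budget:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Toy usage with a whitespace word count standing in for a real tokenizer
print(pack_sentences("One two three. Four five. Six seven eight nine. Ten.",
                     lambda s: len(s.split()), budget=5))
# → ['One two three. Four five.', 'Six seven eight nine. Ten.']
```

Each returned string could then be encoded and passed to `model.generate` the same way the fixed-size chunks are. The greedy rule keeps sentences intact at the cost of uneven chunk sizes.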
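`generate_summary` runs beam search with `num_beams=4` and `length_penalty=2.0`. In Hugging Face beam search, a finished hypothesis is ranked roughly by its summed token log-probabilities divided by its length raised to `length_penalty`; because log-probabilities are negative, a penalty above 1.0 shifts the ranking toward longer summaries. The numbers below are made up purely to illustrate that arithmetic; they are not model outputs, and `beam_score` is a hypothetical helper.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized beam score: sum of log-probs / length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)


# Two made-up finished hypotheses: (summed log-probability, length in tokens)
short_hyp = (-4.0, 10)
long_hyp = (-9.0, 20)

for lp in (1.0, 2.0):
    short_score = beam_score(*short_hyp, lp)
    long_score = beam_score(*long_hyp, lp)
    winner = "short" if short_score > long_score else "long"
    print(f"length_penalty={lp}: short={short_score:.4f} long={long_score:.4f} -> {winner} wins")
# → length_penalty=1.0: short=-0.4000 long=-0.4500 -> short wins
# → length_penalty=2.0: short=-0.0400 long=-0.0225 -> long wins
```

With the penalty at 1.0 the shorter hypothesis has the better average log-probability, but at 2.0 the longer one overtakes it, which is why `length_penalty=2.0` tends to produce fuller summaries.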
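The CLI loop stops at the first empty line, so a pasted document with blank lines between paragraphs is cut off after its first paragraph. A minimal sketch of one workaround reads until a sentinel line or the end of input instead; the sentinel `END` and the name `read_until_sentinel` are hypothetical choices. `read_line` defaults to the built-in `input`, and is a parameter only so the function can be exercised without a terminal.

```python
from typing import Callable, List


def read_until_sentinel(read_line: Callable[[], str] = input, sentinel: str = "END") -> str:
    """Collects lines until one equals `sentinel` (ignoring surrounding spaces) or input ends.

    Blank lines are kept, so paragraph breaks inside the pasted text survive.
    """
    lines: List[str] = []
    while True:
        try:
            line = read_line()
        except EOFError:  # input() raises EOFError when stdin is exhausted (e.g. piped input)
            break
        if line.strip() == sentinel:
            break
        lines.append(line)
    return "\n".join(lines)
```

In the `__main__` block, `raw_text = read_until_sentinel()` could replace the `while` loop, with the prompt changed to ask the user to type `END` on its own line. With the default `do_clean=True`, `clean_text` collapses the kept newlines, so they do not reach the model.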
