<a href="https://colab.research.google.com/github/sriramkumar25/GenAIProjects/blob/main/Summarization_System_with_T5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Summarization System with T5

## Overview
This project uses the **T5 (Text-to-Text Transfer Transformer)** model for **text summarization**, treating summarization as a text generation task.
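
As a rough illustration of the text-to-text framing, T5 picks the task from a plain-text prefix placed in front of the input. The helper below is a hypothetical sketch (the name `to_t5_input` and the whitespace cleanup are assumptions, not part of this notebook) showing how such an input string could be assembled:

```python
def to_t5_input(task_prefix: str, text: str) -> str:
    """Build a T5-style input string of the form '<task prefix>: <text>'.

    Hypothetical helper for illustration; collapsing whitespace is an
    assumption made here, not something T5 requires.
    """
    cleaned = " ".join(text.split())           # collapse runs of whitespace and newlines
    prefix = task_prefix.strip().rstrip(":")   # accept "summarize" or "summarize:"
    return f"{prefix}: {cleaned}"


print(to_t5_input("summarize", "T5 casts   every task\nas text in, text out."))
print(to_t5_input("translate English to German", "The house is wonderful."))
```

The notebook itself builds the same kind of string inline, by concatenating `"summarize: "` with the user's text.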

## Features
- **Pre-trained T5 Model**: Uses the `t5-large` checkpoint from Hugging Face, prompted for summarization with the `summarize:` prefix.
- **Custom Input**: Summarize user-provided text or documents.
- **Efficient**: Inputs are truncated to 512 tokens and summaries are capped at 150 tokens by default, generated with 4-beam search.

## Tools & Technologies
- **T5 Model** (from Hugging Face)
- **Python** (Transformers, PyTorch)

## Workflow
1. **Load Model**: Import T5 model and tokenizer.
2. **Input Text**: Provide text for summarization.
3. **Generate Summary**: Produce a concise summary.


In [None]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

In [2]:
# Initialize the model and tokenizer with t5-large
model_name = "t5-large"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)




You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [3]:

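def summarize_text(text, max_length=150):
    """Return a summary of `text` generated by T5 with beam search."""
    # T5 selects the task from a text prefix, so prepend "summarize: "
    input_text = "summarize: " + text
    # Anything beyond 512 tokens is truncated and never reaches the model
    inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)

    # Generate the summary with 4-beam search
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        min_length=min(30, max_length),  # keep min_length from exceeding max_length
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)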
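
Because `summarize_text` truncates its input to 512 tokens, longer documents lose everything past that point. One possible workaround, sketched below under stated assumptions, is to split the text into chunks, summarize each chunk, and join the results. The names `split_into_chunks` and `summarize_long` are hypothetical, and the 350-word default is only a rough guess at what fits within 512 tokens, since word counts do not map exactly to token counts.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 350) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words.

    Words are whitespace-separated; this is a proxy for token count,
    not an exact measure (assumption for illustration).
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize_fn: Callable[[str], str], max_words: int = 350) -> str:
    """Summarize each chunk independently and join the partial summaries."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(summarize_fn(chunk) for chunk in chunks)
```

With the model loaded, this could be called as `summarize_long(long_document, summarize_text)`, where `long_document` is any string. Chunks are summarized independently, so the joined result may repeat information or miss connections that span chunk boundaries.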


In [7]:
print("T5 Summarization System")
print("Enter the text you want to summarize (or type 'exit' to quit):")

while True:
    # Get user input
    user_input = input("Your Text: ")

    # Check for exit condition
    if user_input.strip().lower() == "exit":
        print("Exiting the summarization system. Goodbye!")
        break

    # Summarize and print the result
    summary = summarize_text(user_input)
    print("\nSummary:", summary)
    print("\n" + "-"*50 + "\n")

T5 Summarization System
Enter the text you want to summarize (or type 'exit' to quit):
Your Text: Text summarization is usually implemented by natural language processing methods, designed to locate the most informative sentences in a given document. On the other hand, visual content can be summarized using computer vision algorithms. Image summarization is the subject of ongoing research; existing approaches typically attempt to display the most representative images from a given image collection, or generate a video that only includes the most important content from the entire collection. Video summarization algorithms identify and extract from the original video content the most important frames, and/or the most important video segments, normally in a temporally ordered fashion. Video summaries simply retain a carefully selected subset of the original video frames and, therefore, are not identical to the output of video synopsis algorithms, where new video frames are being synthesiz