<a href="https://colab.research.google.com/github/louisesaavedra25-lang/Iris_sample/blob/main/summarizerAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# USER-PROMPT BASED SUMMARIZER



In [9]:
# Install huggingface transformers
!pip install transformers --quiet
!pip install torch --quiet

# Imports
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
import textwrap

# Using a high-quality and free summarization model (fast & accurate)
model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)

# Summarizer Agent
def summarize_text(text: str, style: str = "paragraph", max_chunk: int = 500) -> str:
    """
    Summarize text using Hugging Face pipeline.

    Args:
        text (str): Long text to summarize.
        style (str): "bullet" or "paragraph" output.
        max_chunk (int): Max characters per chunk (textwrap.wrap counts
            characters, not tokens); used to handle long text.

    Returns:
        str: Summarized text.
    """
    # Split long text into chunks of at most max_chunk characters
    chunks = textwrap.wrap(text, max_chunk)
    summaries = []

    for chunk in chunks:
        result = summarizer(chunk, max_length=150, min_length=40, do_sample=False)
        summaries.append(result[0]['summary_text'])

    full_summary = " ".join(summaries)

    # Convert to bulleted form
    if style.lower() == "bullet":
        sentences = full_summary.split(". ")
        bullet_summary = "\n- " + "\n- ".join([s.strip() for s in sentences if s.strip()]) + "."
        return bullet_summary

    return full_summary

# Example is from: https://www.allianzcare.com/en/support/health-and-wellness/national-healthcare-systems/healthcare-in-philippines.html
long_text = """
Doctors and nursing staff in public hospitals are highly proficient, but public healthcare in the Philippines faces some limitations.
Despite having achieved universal healthcare, the Philippines still struggles with unequal access to medical care.
As such, the standard of public healthcare in the Philippines generally varies from excellent in urban centres to poor in rural areas.
Public healthcare also faces strain from treating the large number of Filipinos who rely on it.
There is also a trend of Filipino medical staff migrating to Western countries, which has resulted in understaffing in some hospitals and delays in treatment.
Public healthcare in the Philippines is administered by PhilHealth, a government-owned corporation.
PhilHealth subsidises a variety of treatments including inpatient care and non-emergency surgeries.
Both local citizens and legal residents are entitled to join a PhilHealth programme.
"""

# Paragraph summary
paragraph_summary = summarize_text(long_text, style="paragraph")
print("Paragraph Summary:\n")
print(paragraph_summary)

# Bullet summary
bullet_summary = summarize_text(long_text, style="bullet")
print("\nBullet Summary:\n")
print(bullet_summary)


Device set to use cpu
Your max_length is set to 150, but your input_length is only 89. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=44)
Your max_length is set to 150, but your input_length is only 80. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=40)
Your max_length is set to 150, but your input_length is only 89. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=44)


Paragraph Summary:

Despite achieving universal healthcare, the Philippines still struggles with unequal access to medical care. The standard of public healthcare in the Philippines generally varies from excellent in urban centres to poor in rural areas. Public healthcare also faces strain from treating the large number of Filipinos who rely on it. Public healthcare in the Philippines is administered by PhilHealth, a government-owned corporation. PhilHealth subsidises a variety of treatments including inpatient care and non-emergency surgeries. Both local citizens and legal residents are entitled to join a PhilHealth programme.


Your max_length is set to 150, but your input_length is only 80. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=40)



Bullet Summary:


- Despite achieving universal healthcare, the Philippines still struggles with unequal access to medical care
- The standard of public healthcare in the Philippines generally varies from excellent in urban centres to poor in rural areas
- Public healthcare also faces strain from treating the large number of Filipinos who rely on it
- Public healthcare in the Philippines is administered by PhilHealth, a government-owned corporation
- PhilHealth subsidises a variety of treatments including inpatient care and non-emergency surgeries
- Both local citizens and legal residents are entitled to join a PhilHealth programme..
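
In the bullet output above, the last bullet ends with `..` and the other bullets have no closing period. This happens because the summary is split on `". "` and a single period is then appended to the joined string. A minimal sketch of an alternative formatter follows. The name `to_bullets` is hypothetical, and the regex-based sentence split is an assumption that will mis-split abbreviations such as "e.g.".

```python
import re

def to_bullets(summary: str) -> str:
    """Format a summary as one '- ' bullet per sentence.

    Hypothetical replacement for the bullet branch of summarize_text.
    """
    # Split on whitespace that follows ., ! or ?; the lookbehind keeps
    # the punctuation attached to its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    bullets = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        # Add a period only when the sentence has no closing punctuation.
        if sentence[-1] not in ".!?":
            sentence += "."
        bullets.append(f"- {sentence}")
    return "\n".join(bullets)
```

With this helper, the bullet branch could return `to_bullets(full_summary)` so that every bullet ends with exactly one punctuation mark.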

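The `max_length` warnings in the cell output appear because each chunk is only 80 to 89 tokens long while `max_length` is fixed at 150. One possible way to avoid them is to scale the bounds to the size of each chunk. The sketch below assumes a hypothetical helper `length_bounds` (not part of transformers) and a target ratio of one half, which matches the value the warning itself suggests (44 for an 89-token input). The caps are illustrative assumptions.

```python
def length_bounds(input_tokens: int, ratio: float = 0.5,
                  max_cap: int = 150, min_floor: int = 10) -> tuple[int, int]:
    """Return (max_length, min_length) scaled to the input size.

    Hypothetical helper for illustration; ratio and caps are assumptions.
    """
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    # Scale to the input, but stay between min_floor and max_cap.
    max_length = max(min_floor, min(max_cap, int(input_tokens * ratio)))
    # Keep min_length below max_length and never above the original 40.
    min_length = max(1, min(40, max_length // 2))
    return max_length, min_length
```

Inside the loop, the token count of a chunk can be taken as `len(tokenizer(chunk)["input_ids"])`, and the two returned values passed to `summarizer(chunk, max_length=..., min_length=..., do_sample=False)`.
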

**Brief Description**
This AI agent condenses long texts into concise, readable summaries. It can produce either a bulleted or a paragraph-style summary, which makes it suitable for articles, reports, and similar material.

Features:
1. Uses the Hugging Face Transformers library with the free facebook/bart-large-cnn model (a BART model pre-trained on English text and fine-tuned on the CNN/Daily Mail dataset).
2. The tokenizer converts the text into numeric token IDs that the model can process.
3. Long texts are split into shorter chunks (500 characters each by default), which are summarized one at a time and then joined.
4. Needs no API key and runs locally after a one-time download of the model weights, so it is easy to integrate into any Python application.
5. Offers two summary formats: bulleted and paragraph.
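
The chunking in feature 3 uses `textwrap.wrap`, which breaks at any whitespace once the character limit is reached, so a chunk can end in the middle of a sentence. A minimal sketch of chunking at sentence boundaries instead is shown below. The name `chunk_by_sentence` is hypothetical, the character budget mirrors the original `max_chunk`, and the regex sentence split is a simplifying assumption.

```python
import re

def chunk_by_sentence(text: str, max_chars: int = 500) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    Hypothetical helper; a single sentence longer than max_chars is kept
    whole rather than cut, so a chunk may exceed the budget in that case.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Collapse newlines and repeated spaces, then split after ., ! or ?.
    normalized = " ".join(text.split())
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", normalized) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Replacing `textwrap.wrap(text, max_chunk)` with `chunk_by_sentence(text, max_chunk)` would keep each sentence intact within a single chunk, at the cost of slightly uneven chunk sizes.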