# Exploring Multi-Platform Text Summarization

This project demonstrates how modern NLP models can be applied to text summarization using three different approaches:

- Hugging Face Transformers → leverages open-source pre-trained models (like BART, T5, or Pegasus) for local summarization pipelines.
- Ollama → runs powerful open-source LLMs locally through a simple API, ensuring privacy and flexibility in model choice.
- Google Gemini API → uses Google’s state-of-the-art cloud-based generative AI models for high-quality, long-context summarization.

By comparing these three platforms, the project highlights their strengths, limitations, and best-fit scenarios. It serves as both a learning exercise in NLP model deployment and a practical demonstration of how summarization can be adapted to different infrastructures—open-source, local-first, and cloud-native.

## 1) Hugging Face

With Hugging Face Transformers, you load a pre-trained seq2seq model (e.g., BART or T5) and call the ready-made `pipeline("summarization")`. It runs locally on CPU or GPU and makes a good reproducible baseline. For long articles, you usually chunk the text, summarize each chunk, and stitch the partial summaries together.

In [1]:
!pip -q install -U "transformers>=4.43" "accelerate" "sentencepiece"
from transformers import pipeline



### Loading a summarization pipeline

BART is a strong default for news and general text.

In [2]:
# Switch to 'google/pegasus-xsum' for very short abstracts.
summarizer = pipeline(
    task="summarization",
    model="facebook/bart-large-cnn",
)





Device set to use cpu


### Summarize short text

In [3]:
text = """Natural language processing (NLP) enables computers to understand text and speech..."""
print(summarizer(text, max_length=140, min_length=60, do_sample=False)[0]["summary_text"])


Your max_length is set to 140, but your input_length is only 17. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=8)


Natural language processing (NLP) enables computers to understand text and speech. NLP can be used to improve speech and language comprehension. For more information on NLP, visit www.nLP.org.uk. For information on how to use NLP in your own language, visit the NLP website. For details on how NLP works in the U.S., visit NLP.gov.
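
The warning above appears because `max_length` (140) is far larger than the 17-token input. The `min_length=60` setting likely also pushed the model to pad the summary with sentences that are not in the source, such as the invented URLs. One way to avoid both is to scale the length bounds to the input size. The helper below is a minimal sketch: `pick_summary_lengths` and its `ratio`, `floor`, and `ceiling` parameters are hypothetical names introduced here, not part of Transformers.

```python
def pick_summary_lengths(n_input_tokens, ratio=0.5, floor=8, ceiling=140):
    """Return (min_length, max_length) scaled to the input size.

    max_length is roughly `ratio` of the input, clamped to [floor, ceiling];
    min_length is half of max_length and always strictly smaller than it.
    """
    max_len = max(floor, min(ceiling, int(n_input_tokens * ratio)))
    min_len = max(1, min(max_len - 1, max_len // 2))
    return min_len, max_len


# Assumed usage with the objects defined in this notebook (not run here):
# n = len(summarizer.tokenizer(text)["input_ids"])
# lo, hi = pick_summary_lengths(n)
# summarizer(text, min_length=lo, max_length=hi, do_sample=False)

print(pick_summary_lengths(17))    # short input like the example above
print(pick_summary_lengths(1000))  # long input is capped at the ceiling
```

For the 17-token input this yields a `max_length` of 8, which matches the value the warning itself suggests.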


### Handling long documents (chunk → summarize → stitch)

In [4]:
from transformers import AutoTokenizer
import math


In [5]:
tok = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

In [6]:
def chunk_by_tokens(s, tokenizer, max_tokens=900):
    ids = tokenizer(s, return_tensors=None, truncation=False)["input_ids"]
    for i in range(0, len(ids), max_tokens):
        yield tokenizer.decode(ids[i:i+max_tokens], skip_special_tokens=True)
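
`chunk_by_tokens` cuts at fixed token offsets, so a chunk can start or end in the middle of a sentence. A possible alternative is to pack whole sentences into each chunk. The sketch below is not the notebook's method: `chunk_by_sentences` is a hypothetical helper, the regex sentence splitter is deliberately naive, and `count_tokens` is an injected function so the example runs without downloading a tokenizer.

```python
import re


def chunk_by_sentences(text, count_tokens, max_tokens=900):
    """Group whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, used = [], [], 0
    for sent in sentences:
        n = count_tokens(sent)
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))  # flush the full chunk
            current, used = [], 0
        current.append(sent)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Whitespace word count stands in for a real tokenizer in this demo.
demo = "One two three. Four five six seven. Eight nine."
print(chunk_by_sentences(demo, lambda s: len(s.split()), max_tokens=7))
```

With the notebook's tokenizer, `count_tokens` could be something like `lambda s: len(tok(s, add_special_tokens=False)["input_ids"])`. Per-sentence counts only approximate the token count of the joined chunk, so leaving some headroom below the model limit is sensible.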

In [8]:
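def summarize_long(s, chunk_tokens=900):
    """Summarize each chunk, then summarize the joined partial summaries."""
    chunks = list(chunk_by_tokens(s, tok, chunk_tokens))
    partial = [summarizer(c, max_length=160, min_length=60, do_sample=False)[0]["summary_text"]
               for c in chunks]
    # Second pass compresses the partials. truncation=True guards against a
    # joined text that exceeds the model's input limit when there are many chunks.
    joined = " ".join(partial)
    final = summarizer(joined, max_length=160, min_length=80, do_sample=False,
                       truncation=True)[0]["summary_text"]
    return final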


Testing it on a short sample:

In [9]:
# Can be replaced with any long article or document
sample_text = """
Natural Language Processing (NLP) is a field of Artificial Intelligence (AI)
that helps computers understand and work with human language.
It is used in applications like chatbots, translation, summarization,
and sentiment analysis. NLP combines linguistics with machine learning models
to process large amounts of text data effectively.
"""


final_summary = summarize_long(sample_text)
print("---- SUMMARY ----")
print(final_summary)


Your max_length is set to 160, but your input_length is only 75. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=37)
Your max_length is set to 160, but your input_length is only 64. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=32)


---- SUMMARY ----
Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) that helps computers understand and work with human language. NLP combines linguistics with machine learning models to process large amounts of text data effectively. It is used in applications like chatbots, translation, summarization, and sentiment analysis. It can also be used to create chatbots and translate text into other languages.


For long documents, read the text from a file:

In [11]:
with open("my_long_doc.txt", "r", encoding="utf-8") as f:
    text = f.read()

summary = summarize_long(text)
print("---- SUMMARY ----")
print(summary)


Your max_length is set to 160, but your input_length is only 63. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=31)


---- SUMMARY ----
Artificial Intelligence (AI) has become one of the most transformativetechnologies of the 21st century. It refers to the simulation of human intelligence in machines that are programmed to think, reason, and learn. AI systems are capable of performing tasks such as speech recognition, decision-making, and language translation. They can also be used to make decisions about what to do in the future.


## 2) Ollama

Ollama is a lightweight local server that runs open models (Llama, Mistral, Gemma, Qwen, etc.). You send an HTTP request (e.g., `POST /api/generate`) with a “summarize …” prompt, and it returns the generated text, streamed by default. It is privacy-friendly because everything runs locally, but quality and speed depend on the model you pull.

### Install Ollama

In [17]:
# Install Ollama (Linux)
!curl -fsSL https://ollama.com/install.sh | sh


>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


### Starting the server (and keeping it running in background)

In [19]:
# Start the daemon in the background
!nohup ollama serve > /dev/null 2>&1 &

# Give it a few seconds, then verify the HTTP API is alive
!sleep 5
!curl -s http://localhost:11434/api/tags || echo "Ollama API not responding yet"


{"models":[]}
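
A fixed `sleep 5` can be too short on a slow machine and wasteful on a fast one. As an alternative, the server can be polled until it answers. This is a minimal sketch using only the standard library: `wait_until_ready` and `ollama_is_up` are hypothetical helper names, and the URL is the same `/api/tags` endpoint used in the `curl` check above.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(probe, timeout=30.0, interval=0.5):
    """Call probe() until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def ollama_is_up(url="http://localhost:11434/api/tags"):
    """Return True if the tags endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused or timed out: not ready yet


# Assumed usage after starting `ollama serve` (not run here):
# if not wait_until_ready(ollama_is_up):
#     raise RuntimeError("Ollama API not responding")
```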

### Pull a small-ish model (Colab free tier can’t handle huge ones)

In [20]:
# Examples: choose ONE that fits memory. Smaller = more likely to run on free Colab.
!ollama pull llama3.2:3b   # tiny & fast
# ollama pull mistral:7b
# ollama pull gemma2:2b



### Calling the API from Python

In [21]:
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2:3b"  # use the model you pulled above

def ollama_summarize(text, sentences=3):
    prompt = (
        f"Summarize the following text in {sentences} concise sentences. "
        f"Be factual and avoid repetition.\n\nTEXT:\n{text}\n\nSUMMARY:"
    )
    r = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False,
              "options": {"temperature": 0.2, "num_predict": 256}},
        timeout=120
    )
    r.raise_for_status()
    return r.json().get("response", "").strip()

print(ollama_summarize("NLP lets computers process human language..."))


Here is a summary of the provided text in 3 concise sentences:

Natural Language Processing (NLP) enables computers to analyze and understand human language. This technology allows computers to process and interpret human communication, such as speech, text, or gestures. By leveraging NLP, computers can perform tasks that would be challenging for them to accomplish on their own, such as translation, sentiment analysis, and text summarization.
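
The call above sets `"stream": False`, so the whole answer arrives in one JSON object. With streaming enabled, `/api/generate` sends newline-delimited JSON objects, each carrying a `response` fragment, with `done` set to true on the last one. The sketch below shows one way to reassemble them. `collect_stream` is a hypothetical helper, and the demo feeds it hand-written lines rather than a live server.

```python
import json


def collect_stream(lines):
    """Join the "response" fragments from an iterable of NDJSON lines."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # skip blank keep-alive lines
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        event = json.loads(raw)
        parts.append(event.get("response", ""))
        if event.get("done"):
            break  # final object of the stream
    return "".join(parts).strip()


# Assumed usage with requests against a running server (not run here):
# with requests.post(OLLAMA_URL, stream=True, timeout=120,
#                    json={"model": MODEL, "prompt": prompt, "stream": True}) as r:
#     r.raise_for_status()
#     print(collect_stream(r.iter_lines()))

fake_lines = [
    b'{"response": "NLP ", "done": false}',
    b"",
    b'{"response": "works.", "done": true}',
]
print(collect_stream(fake_lines))
```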


### Long documents with Ollama (same chunking idea)

In [22]:
def ollama_summarize_long(text, sentences=5, chunk_chars=4000):
    # simple char-based chunks (you can switch to token-based later)
    chunks = [text[i:i+chunk_chars] for i in range(0, len(text), chunk_chars)]
    partials = [ollama_summarize(c, sentences=4) for c in chunks]
    return ollama_summarize(" ".join(partials), sentences=sentences)
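
For very long inputs, the joined partial summaries can themselves be longer than one chunk. A possible extension, sketched below under that assumption, repeats the chunk-and-summarize step until the text fits. `summarize_recursive` and `max_rounds` are hypothetical names. The summarizer is passed in as `summarize_fn`, so the example runs with a stand-in function instead of a live model.

```python
def summarize_recursive(text, summarize_fn, chunk_chars=4000, max_rounds=5):
    """Chunk and summarize repeatedly until the text fits in one chunk.

    max_rounds bounds the loop in case summarize_fn does not shrink its input.
    """
    for _ in range(max_rounds):
        if len(text) <= chunk_chars:
            break
        chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
        text = " ".join(summarize_fn(c) for c in chunks)
    return summarize_fn(text)  # final pass over the (now short) text


# Stand-in "summarizer" that keeps the first 10 characters of its input.
fake_summarize = lambda s: s[:10]
print(len(summarize_recursive("a" * 100, fake_summarize, chunk_chars=20)))
```

With the function defined earlier, the call would look like `summarize_recursive(text, ollama_summarize)`.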


### Test the summarizer

In [23]:
print(ollama_summarize("NLP lets computers process human language..."))


Here is a summary of the provided text in 3 concise sentences:

Natural Language Processing (NLP) enables computers to understand, interpret, and generate human language. This technology allows computers to process and analyze vast amounts of human-generated data, such as text or speech. By leveraging NLP, computers can perform tasks that would be challenging for humans alone, such as language translation, sentiment analysis, and chatbots.


### Test on long text

In [24]:
with open("my_long_doc.txt", "r", encoding="utf-8") as f:
    text = f.read()

print(ollama_summarize_long(text, sentences=5))

Here is a 5-sentence summary of the text:

Artificial Intelligence (AI) simulates human intelligence in machines that can think, reason, and learn. AI has various applications, including Natural Language Processing and computer vision. However, its development raises ethical concerns such as privacy, bias, job displacement, and decision accountability. To mitigate these risks, it is essential to prioritize ethical principles and regulations in AI development. Effective governance will help maximize the benefits of AI while minimizing its negative consequences.


## 3) Gemini API

The Gemini API exposes Google’s hosted models through the Google GenAI SDK (`google-genai`). You create a client, call `generate_content` with a summarization instruction, and get back high-quality results. The hosted models support much longer context than most local models. An API key is required.

### Install and set the API key

In [25]:
!pip -q install -U google-genai
import os
os.environ["GEMINI_API_KEY"] = "<YOUR_GEMINI_API_KEY>"  # placeholder; never commit a real key



In [30]:
import os
os.environ["GEMINI_API_KEY"] = "<YOUR_GEMINI_API_KEY>"  # placeholder; supply your own key


In [31]:
from google import genai
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
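
Writing the key into a notebook cell makes it easy to leak when the notebook is shared. One safer pattern, sketched here as an assumption rather than an SDK feature, is to read the key from the environment and fall back to a hidden prompt. `load_api_key` is a hypothetical helper. Its `env` and `ask` parameters exist only so the function can be exercised without real input.

```python
import os
from getpass import getpass


def load_api_key(name="GEMINI_API_KEY", env=os.environ, ask=getpass):
    """Return the key from `env`, prompting once (hidden input) if missing."""
    key = env.get(name, "").strip()
    if not key:
        key = ask(f"Enter {name}: ").strip()
        if not key:
            raise ValueError(f"{name} is required")
        env[name] = key  # cache so later cells can read it
    return key


# Assumed usage (prompts only when the variable is not already set):
# client = genai.Client(api_key=load_api_key())
```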


### Summarize with the SDK

In [32]:
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from env

def gemini_summarize(text, style="bullet points", length="5-7 bullets"):
    prompt = (
        f"Summarize the following text as {length} in {style}. "
        f"Be faithful to the source and avoid hallucinations.\n\n{text}"
    )
    resp = client.models.generate_content(
        model="gemini-2.5-flash",    # fast/affordable; try "gemini-2.5-pro" for higher quality
        contents=prompt,
        config={"temperature": 0.2}
    )
    # resp.text can be None (e.g., when no text is returned), so fall back to ""
    return (resp.text or "").strip()

print(gemini_summarize("NLP lets computers process human language..."))


Here's a summary of the text in 5-7 bullet points:

*   NLP enables computers to process human language, facilitating tasks like translation and sentiment analysis.
*   It is an interdisciplinary field, blending AI, linguistics, and computer science.
*   Key techniques utilized include tokenization, parsing, and machine learning algorithms.
*   Challenges in NLP involve addressing ambiguity, understanding context, and overcoming data scarcity.
*   NLP is transforming various industries, from healthcare to customer service.
*   Advancements, such as large language models, are continuously pushing its capabilities further.


### For Long Documents

In [33]:
with open("my_long_doc.txt", "r", encoding="utf-8") as f:
    text = f.read()

print(gemini_summarize(text))


Here is a summary of the text in 5 bullet points:

*   Artificial Intelligence (AI) simulates human intelligence in machines, allowing them to think, reason, and learn, performing tasks like speech recognition and decision-making.
*   Natural Language Processing (NLP) is a key AI subfield that enables machines to understand and generate human language, used in chatbots, virtual assistants, and translation tools.
*   Computer vision is another major AI area, allowing machines to interpret visual information for applications such as facial recognition, autonomous driving, and medical image analysis.
*   AI raises significant ethical concerns, including issues of privacy, bias in algorithms, potential job displacement, and decision accountability.
*   Overall, AI is a transformative technology with diverse and rapidly expanding applications that are reshaping industries and impacting daily life.


## Project Summary

We built a Text Summarization System using three different methods:

- Hugging Face – local pre-trained transformer models such as FLAN-T5 or BART (this notebook uses `facebook/bart-large-cnn`).
- Ollama – local inference engine for open-source Llama, Gemma, and Mistral models.
- Gemini API – Google’s cloud-hosted Gemini models (e.g., `gemini-2.5-flash`).
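
To compare the three methods side by side, a small harness can run each one on the same text and record latency and output length. This is only a sketch: `compare_summarizers` is a hypothetical helper, and the demo uses stand-in functions instead of the real backends, so no timings are reported here.

```python
import time


def compare_summarizers(summarizers, text):
    """Run each summarizer on `text`; record latency, word count, and errors."""
    rows = []
    for name, fn in summarizers.items():
        start = time.perf_counter()
        try:
            summary, error = fn(text), None
        except Exception as exc:  # keep going if one backend is unavailable
            summary, error = "", str(exc)
        rows.append({
            "name": name,
            "seconds": round(time.perf_counter() - start, 3),
            "words": len(summary.split()),
            "error": error,
        })
    return rows


# Stand-ins for the real backends defined in this notebook.
demo_rows = compare_summarizers(
    {"first-sentence": lambda t: t.split(".")[0] + ".", "upper": str.upper},
    "NLP helps computers read text. It has many uses.",
)
for row in demo_rows:
    print(row["name"], row["words"], row["error"])
```

With the functions from this notebook, the dictionary could be `{"hf": summarize_long, "ollama": ollama_summarize_long, "gemini": gemini_summarize}`.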

### 🔧 Approach

- Hugging Face
  - Load summarization models from `transformers`.
  - Handle long documents with chunking → summarize each chunk → stitch into a final summary.
- Ollama
  - Run an Ollama server (`ollama serve`).
  - Use Python `requests` to call the Ollama API (`/api/generate`).
  - Summarize short or long texts (with simple chunking for long docs).
- Gemini API
  - Get an API key from Google AI Studio.
  - Use the `google-genai` SDK to connect.
  - Generate summaries with different styles (bullets, paragraphs, short/long).
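
The style variations mentioned in the last item come down to changing the instruction in the prompt. The sketch below shows one way to organize that. `STYLES` and `build_summary_prompt` are hypothetical names introduced for illustration, and the resulting string would be passed as `contents` to `generate_content` (or as the prompt to any of the other backends).

```python
# Instruction templates keyed by style name (assumed set, extend as needed).
STYLES = {
    "bullets": "as {n} bullet points",
    "paragraph": "as one paragraph of about {n} sentences",
    "tldr": "in a single sentence",
}


def build_summary_prompt(text, style="bullets", n=5):
    """Build a summarization prompt for the given style and target size."""
    if style not in STYLES:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(STYLES)}")
    instruction = STYLES[style].format(n=n)
    return (
        f"Summarize the following text {instruction}. "
        f"Be faithful to the source.\n\n{text}"
    )


print(build_summary_prompt("NLP lets computers process human language...", "paragraph", n=3))
```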