With large language models (LLMs), **summarization** is the task of condensing a longer piece of content, anything from a tweet thread to a 300-page legal brief, into a **shorter, more digestible form that preserves the key information and intent of the source**.

---

### Why we care

| Use-case                  | What a summary unlocks                                    |
| ------------------------- | --------------------------------------------------------- |
| **News & research**       | Rapid catch-up on breaking stories or new papers.         |
| **Enterprise docs**       | Digests of minutes, support tickets, compliance reports.  |
| **Personal productivity** | “Too-long-didn’t-read” for emails, articles, group chats. |
| **Code reviews**          | Plain-English explanation of what a pull request changes. |

---

### How LLMs perform summarization under the hood

1. **Encoding the source**
   The model tokenizes and embeds the input, capturing semantic relations in its hidden states.

2. **Decoding with length control**
   The decoder (or next-token predictor) is guided, via prompt instructions or fine-tuning on summarization data, to produce far fewer tokens than it consumed.

3. **Attention filtering**
   Self-attention helps the model decide which spans of the source warrant focus, enabling it to drop peripheral details.

4. **Compression strategies**

   * *Extractive*: literally copy the most salient sentences/phrases.
   * *Abstractive*: paraphrase and synthesize new sentences never seen verbatim in the source (the dominant method in modern LLMs).
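
To make the extractive/abstractive distinction concrete, here is a minimal sketch of a classic *extractive* baseline: score each sentence by the average frequency of its words and copy the top ones verbatim. This is not how an LLM works internally; the stop-word list, the regex sentence splitter, and the function names are all illustrative assumptions.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
              "that", "for", "on", "was", "with", "as"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text: str) -> list[str]:
    """Lower-cased words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Copy the highest-scoring sentences verbatim, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average (not total) frequency, so long sentences are not favoured.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

Every sentence in the result is a literal copy from the source, which is exactly what an abstractive summary is free not to be.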

---

### Prompt patterns that steer summarization

| Pattern                | Example                                                |
| ---------------------- | ------------------------------------------------------ |
| **TL;DR**              | “TL;DR: …”                                             |
| **Bullets**            | “Summarize the main points as bullet points.”          |
| **Headline & tagline** | “Give me a newspaper headline and one-sentence blurb.” |
| **Role-specific**      | “Explain this paper so a high-schooler gets it.”       |
| **Length hints**       | “≤ 100 words” or “in exactly three sentences.”         |
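
The patterns in the table can be kept as reusable templates. The sketch below is one possible arrangement; the template wording, the `build_summary_prompt` helper, and its default options are assumptions made for illustration, not part of any library.

```python
# Hypothetical prompt templates, one per pattern in the table above.
PROMPT_PATTERNS = {
    "tldr": "{text}\n\nTL;DR:",
    "bullets": "Summarize the main points of the text below as bullet points.\n\n{text}",
    "headline": "Give me a newspaper headline and a one-sentence blurb for the text below.\n\n{text}",
    "role": "Explain the text below so that {audience} can follow it.\n\n{text}",
    "length": "Summarize the text below in at most {max_words} words.\n\n{text}",
}

# Defaults used when the caller does not override an option.
DEFAULT_OPTIONS = {"audience": "a high-schooler", "max_words": 100}


def build_summary_prompt(pattern: str, text: str, **options) -> str:
    """Fill the chosen template with the source text and any options."""
    if pattern not in PROMPT_PATTERNS:
        raise ValueError(f"Unknown pattern {pattern!r}; choose from {sorted(PROMPT_PATTERNS)}")
    values = {**DEFAULT_OPTIONS, **options}
    # str.format ignores keyword arguments the template does not use.
    return PROMPT_PATTERNS[pattern].format(text=text, **values)
```

For example, `build_summary_prompt("length", article, max_words=50)` produces a length-constrained prompt, and the resulting string can be passed to any chat model's `invoke` call.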

---


### Bottom line

Summarization with LLMs is *intelligent compression*: selectively re-expressing the essence of a text while striving to remain faithful to it. Abstractive LLMs outshine classic extractive methods in fluency and adaptability, but they bring new challenges, chiefly factual drift (summaries that state things the source does not support), that must be countered with careful prompting, retrieval aids, and post-generation verification.
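
As a rough illustration of post-generation verification, the sketch below flags numbers (years, counts) that appear in a summary but nowhere in the source. It is a deliberately crude heuristic under the assumption that invented figures are a common symptom of drift; it cannot detect paraphrased or non-numeric errors, and the function name is hypothetical.

```python
import re

# Matches integers and simple decimals/grouped numbers such as 1879, 3.14 or 17,617.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(source: str, summary: str) -> list[str]:
    """Return numbers found in the summary that never occur in the source."""
    source_numbers = set(NUMBER.findall(source))
    flagged = []
    for number in NUMBER.findall(summary):
        if number not in source_numbers and number not in flagged:
            flagged.append(number)
    return flagged
```

An empty list does not prove the summary is faithful; a non-empty one is simply a cheap signal that a closer check (human or model-based) is worthwhile.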


In [1]:
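# Standard library
import os
from pathlib import Path

# Third-party
import tiktoken
import wikipedia
from dotenv import load_dotenv
from IPython.display import Markdown, display
# langchain_core.messages is the current home of the message classes
# (langchain.schema is a legacy import path).
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI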

from deepeval import evaluate
from deepeval.metrics import SummarizationMetric
from deepeval.test_case import LLMTestCase

In [2]:
dotenv_path = Path("../../.env")
load_dotenv(dotenv_path=dotenv_path)

True

In [3]:
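# Fail early with a clear message if the key was not loaded from .env.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; check the .env file.")
model_eval = ChatOpenAI(model="gpt-4o-mini")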

In [4]:
enc = tiktoken.get_encoding("o200k_base")

## Retrieving the context (from Wikipedia)

In [5]:
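wikipedia.set_lang("en")
# auto_suggest=False looks up the exact title instead of a search suggestion.
page = wikipedia.page("Albert Einstein", auto_suggest=False)
context = page.content
display(Markdown("**Context:**"))
display(Markdown("----"))
# Show only the first and last 500 characters of the article.
display(Markdown(f"{context[:500]}...{context[-500:]}"))
display(Markdown("----"))
# Token count under o200k_base; the local model's tokenizer may count differently.
display(Markdown(f"Estimated number of tokens: {len(enc.encode(context))}"))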

**Context:**

----

Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist who is best known for developing the theory of relativity. Einstein also made important contributions to quantum mechanics. His mass–energy equivalence formula E = mc2, which arises from special relativity, has been called "the world's most famous equation". He received the 1921 Nobel Prize in Physics for his services to theoretical physics, and especially for his discovery of the law of the photoelectric eff...in
Finding aid to Albert Einstein Collection from Center for Jewish History


==== Digital collections ====
The Digital Einstein Papers — An open-access site for The Collected Papers of Albert Einstein, from Princeton University
Albert Einstein Digital Collection from Vassar College Digital Collections
Newspaper clippings about Albert Einstein in the 20th Century Press Archives of the ZBW
Albert – The Digital Repository of the IAS, which contains many digitized original documents and photographs

----

Estimated number of tokens: 17617
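
A context of this size may not fit in every model's window. One common workaround is to split the text into token-bounded chunks, summarize each, and then summarize the summaries. The sketch below shows only the splitting step, under stated assumptions: `chunk_by_tokens` is a hypothetical helper, and the whitespace tokenizer is a stand-in so the example runs without any model. With tiktoken, one would pass `enc.encode` and `enc.decode` instead.

```python
from typing import Callable, Sequence


def chunk_by_tokens(
    text: str,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    max_tokens: int,
    overlap: int = 0,
) -> list[str]:
    """Split text into chunks of at most max_tokens tokens.

    Consecutive chunks share `overlap` tokens so that sentences cut at a
    boundary still appear whole in at least one chunk.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = encode(text)
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reached the end of the text
    return chunks


# Toy whitespace tokenizer, used here only to keep the example self-contained.
def toy_encode(text: str) -> list[str]:
    return text.split()


def toy_decode(tokens: Sequence) -> str:
    return " ".join(tokens)
```

Cutting on raw token boundaries can split a sentence (or, with byte-level tokenizers, a character) in two, which is why a small overlap is often used.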

## Question

In [6]:
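# Note: the context is ~17.6k tokens. Ollama's default context window can be
# smaller than that, in which case the input is truncated; ChatOllama accepts
# a num_ctx argument to raise it.
chat = ChatOllama(model="llama3.2")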

prompt = f"""
Based on the following context:

{context}

Summarize the life and work of Albert Einstein, emphasizing the following points:
- Where he was born and in what year
- Who his parents were
- Where he studied and worked
- What were his main achievements
"""

response = chat.invoke(prompt)
answer = response.content

prompt_aux = f"""
{prompt[:500]}

...

{prompt[-500:]}
"""

display(Markdown("**Prompt:**"))
display(Markdown("----"))
display(Markdown(prompt_aux))
display(Markdown("----"))

display(Markdown("**Answer:**"))
display(Markdown("----"))
display(Markdown(answer))
display(Markdown("----"))

**Prompt:**

----



Based on the following context:

Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist who is best known for developing the theory of relativity. Einstein also made important contributions to quantum mechanics. His mass–energy equivalence formula E = mc2, which arises from special relativity, has been called "the world's most famous equation". He received the 1921 Nobel Prize in Physics for his services to theoretical physics, and especially for his discovery o

...

Princeton University
Albert Einstein Digital Collection from Vassar College Digital Collections
Newspaper clippings about Albert Einstein in the 20th Century Press Archives of the ZBW
Albert – The Digital Repository of the IAS, which contains many digitized original documents and photographs

Summarize the life and work of Albert Einstein, emphasizing the following points:
- Where he was born and in what year
- Who his parents were
- Where he studied and worked
- What were his main achievements



----

**Answer:**

----

Albert Einstein (1879-1955) was a renowned German-born physicist who is widely regarded as one of the most influential scientists of the 20th century.

**Early Life**

Einstein was born on March 14, 1879, in Ulm, Kingdom of Württemberg, German Empire, to Hermann and Pauline Einstein. His parents were both Jewish, and his father was an engineer and salesman. Einstein's early life was marked by a curious and imaginative childhood, which laid the foundation for his future scientific endeavors.

**Education and Career**

Einstein studied physics at the Swiss Federal Polytechnic University in Zurich, where he graduated in 1900 with a degree in physics. He then worked as a patent clerk in Bern, Switzerland, where he evaluated patents related to electrical and mechanical inventions. During this period, Einstein developed his theory of special relativity, which posits that the laws of physics are the same for all observers in uniform motion relative to one another.

In 1905, a "miracle year" in which Einstein published four groundbreaking papers, including his famous equation E=mc². This paper introduced the concept of mass-energy equivalence and revolutionized our understanding of space, time, and matter.

**Main Achievements**

Einstein's main achievements include:

1. **Theory of Special Relativity (1905)**: Introduced the concept of time dilation, length contraction, and relativity of simultaneity.
2. **Theory of General Relativity (1915)**: Described gravity as the curvature of spacetime caused by massive objects. This theory predicted phenomena such as black holes and gravitational waves.
3. **Mass-Energy Equivalence (E=mc²)**: Demonstrated that mass and energy are interchangeable, with equal amounts of each being equivalent to a certain amount of energy.
4. **Photons and the Photoelectric Effect**: Showed that light is composed of particles called photons, which have both wave-like and particle-like properties.

**Legacy**

Einstein's work had a profound impact on our understanding of the universe, from the smallest subatomic particles to the vast expanse of cosmological scales. His theories of relativity revolutionized physics, astronomy, and engineering, influencing fields such as quantum mechanics, astrophysics, and nuclear physics.

Throughout his life, Einstein was known for his passion for peace, civil rights, and social justice. He was a vocal advocate for these causes, often using his fame to raise awareness about important issues.

**Personal Life**

Einstein married Mileva Marić in 1903, with whom he had two sons, Hans Albert and Eduard. The couple divorced in 1919. Einstein then married Elsa Löwenthal in 1919, who remained his wife until her death in 1936. He was known for his love of music, particularly violin, and enjoyed playing the violin to relax.

Einstein's legacy extends far beyond his scientific contributions, inspiring generations of scientists, thinkers, and humanitarians. His commitment to peace, social justice, and human rights continues to inspire people around the world.

----

## Summarization evaluation

In [7]:
metric = SummarizationMetric(
    model="gpt-4o-mini",
    assessment_questions=[
        "Does the answer indicate where he was born?",
        "Does the answer indicate when he was born?",
        "Does the answer indicate who his parents were?",
        "Does the answer indicate where he studied?",
        "Does the answer indicate where he worked?",
        "Does the answer indicate what his greatest achievements were?",
    ],
    include_reason=True,
)

test_case = LLMTestCase(
    input=context,
    actual_output=answer
)

metric.measure(test_case)

display(Markdown("**Eval:**"))
display(Markdown("----"))
display(Markdown(f"**Grade**: {metric.score}"))
display(Markdown("----"))
display(Markdown(f"**Reason**: {metric.reason}"))
display(Markdown("----"))

**Eval:**

----

**Grade**: 0.8333333333333334

----

**Reason**: The score is 0.83 because the summary provides additional context about Einstein's family background and personal interests, which enhances understanding, but it lacks some details from the original text that could have been included for a more comprehensive summary.

----
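
To help read the grade, the sketch below mirrors how the deepeval documentation describes the metric: an *alignment* score (is the summary supported by the source?) and a *coverage* score (does the summary answer the assessment questions?), with the final score being the smaller of the two. This is a simplified reading, not deepeval's implementation, and the function names are assumptions.

```python
def coverage_score(answers: list[bool]) -> float:
    """Fraction of assessment questions the summary answers ("yes" verdicts)."""
    return sum(answers) / len(answers) if answers else 0.0


def summarization_score(alignment: float, coverage: float) -> float:
    """Final score as the minimum of the two components."""
    return min(alignment, coverage)


# Hypothetical verdicts: five of the six questions answered, fully aligned summary.
example = summarization_score(alignment=1.0, coverage=coverage_score([True] * 5 + [False]))
```

With six questions, a coverage of 5/6 gives 0.8333..., which matches the grade above numerically. The recorded output does not say which component produced it, so treat that as one possible explanation rather than a diagnosis.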