# Summarise Daily Snippets into Journal Entries

This notebook compares several models for summarising a list of daily snippets into a single coherent journal entry.

In [59]:
from IPython.display import Markdown, HTML, display

snippets = [
    "The alarm blared, and I instantly wished for more sleep.",
    "Coffee was a burnt necessity, not a luxury.",
    "My email inbox was a chaotic, overflowing mess.",
    "The team meeting was a flurry of ideas, leaving my head spinning.",
    "Lunch was a rushed sandwich, barely a pause in the day.",
    "A brief park walk offered a moment of sunshine and bird song.",
    "Coding became an endless cycle of debugging frustration.",
    "My kid's call brought a hilarious school story, needing to be remembered.",
    "Dinner was simple pasta, a comfort against exhaustion.",
    "A bizarrely fluffy, cloud-like dog crossed my path.",
    "The grocery run included a movie chat with Sarah.",
    "Home brought aching feet and the promise of bed.",
    "My brain buzzed with floating pancake dreams and looming tasks.",
    "I had to remind myself to call Mom and pay that bill.",
    "The laundry pile loomed, a testament to my busy life.",
    "The project deadline pressed down, demanding organization.",
    "A sudden worry about the locked front door crept in.",
    "A late-night social media scroll proved a poor choice.",
    "A strange noise stirred my anxiety, house settling or something else?",
    "All that remained was the desperate plea for sleep.",
]

In [77]:
def display_markdown(checkpoint_name, summary):
    """Render a summary as a Markdown blockquote under a bold model name."""
    lines = summary.splitlines()
    quoted_lines = ["> " + line for line in lines]
    markdown_summary = "\n".join(quoted_lines)
    display(Markdown(f"**{checkpoint_name}:**\n{markdown_summary}\n"))

## T5

See the [T5 documentation](https://huggingface.co/docs/transformers/en/model_doc/t5). T5 treats every task as text-to-text and selects the task with a short prefix on the input, such as `summarize: `. This section tries several T5 checkpoints run locally through Hugging Face Transformers.
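
The cells below all build the model input the same way: the task prefix followed by the snippets joined with newlines. A minimal sketch of that step in isolation, where `build_t5_input` is a hypothetical helper name (not part of this notebook or of Transformers) and the sample strings are made up:

```python
def build_t5_input(snippets, prefix="summarize: "):
    """Join snippets into one string and prepend the T5 task prefix."""
    # Strip stray whitespace and drop empty snippets before joining.
    cleaned = [s.strip() for s in snippets if s.strip()]
    return prefix + "\n".join(cleaned)


example = build_t5_input(["Coffee was burnt.", "  Lunch was rushed. ", ""])
print(example)
```

Keeping the prefix as a parameter makes it easy to try other prompts later without touching the joining logic.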

In [22]:
t5_checkpoints = [
    "google-t5/t5-small",
    "google-t5/t5-base",
    "google-t5/t5-large",
    # "google-t5/t5-3b",
    # "google-t5/t5-11b",
]

### Using Default Parameters

In [23]:
from transformers import pipeline

def create_summarizer(checkpoint):
    return pipeline("summarization", model=checkpoint)

t5_summarizers = [create_summarizer(checkpoint) for checkpoint in t5_checkpoints]

Device set to use cuda:0
Device set to use cuda:0
Device set to use cuda:0


In [67]:
for summarizer in t5_summarizers:
    summary = summarizer("summarize: " + "\n".join(snippets))[0]["summary_text"]
    display_markdown(summarizer.model.name_or_path, summary)

**google-t5/t5-small:**
> my kid's call brought a hilarious school story, needing to be remembered . my brain buzzed with floating pancake dreams and looming tasks . a sudden worry about the locked front door crept in .


**google-t5/t5-base:**
> cnn's kelly wallace sat down for a few hours to get a good night's sleep . she wished for more sleep after a day of debugging and email . wallace says her brain buzzed with floating pancake dreams and looming tasks .


**google-t5/t5-large:**
> coffee was a burnt necessity, not a luxury, for sarah . her brain buzzed with floating pancake dreams and looming tasks . now she's sharing her tips on how to get more sleep .


In [68]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

for checkpoint in t5_checkpoints:
    tokenizer = T5Tokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)
    
    input_ids = tokenizer("summarize: " + "\n".join(snippets), return_tensors="pt").input_ids
    outputs = model.generate(input_ids, min_length=50, max_length=350, num_beams=4, length_penalty=3, do_sample=False)

    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    display_markdown(checkpoint, summary)

**google-t5/t5-small:**
> my kid's call brought a hilarious school story, needing to be remembered. my brain buzzed with floating pancake dreams and looming tasks. a sudden worry about the locked front door crept in. a late-night social media scroll proved a poor choice.


**google-t5/t5-base:**
> cnn.com's ireport boot camp challenges ireporters to get more sleep . cnn.com's ireport boot camp challenges ireporters to get more sleep . cnn.com's ireport boot camp challenges ireporters to get more sleep . do you know a hero? nominations are open for 2013 cnn heroes . ireport.com: do you have a story to share? share it with c


**google-t5/t5-large:**
> coffee was a burnt necessity, not a luxury, for sarah . her brain buzzed with floating pancake dreams and looming tasks . sarah's new book, "sleep, sleep, sleep," is out now .

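The `t5-base` output above loops on the same sentence several times. One rough way to flag such loops automatically is to measure how many word n-grams occur more than once. The sketch below is an illustration only: `repeated_ngram_ratio` is a hypothetical helper, and the two sample strings are invented rather than taken from a model.

```python
from collections import Counter


def repeated_ngram_ratio(text, n=3):
    """Return the fraction of word n-grams that occur more than once."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of an n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


looping = "get more sleep . get more sleep . get more sleep ."
varied = "coffee was a burnt necessity and lunch was a rushed sandwich"
print(repeated_ngram_ratio(looping), repeated_ngram_ratio(varied))
```

If the ratio is high, passing `no_repeat_ngram_size` to `model.generate` is one option worth trying; whether it improves these particular outputs has not been tested here.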

In [69]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
from IPython.display import display, Markdown, clear_output
import ipywidgets as widgets

tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-large")
model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-large")

min_length = widgets.IntSlider(value=50, min=50, max=500, description="Min Length:")
max_length = widgets.IntSlider(value=350, min=50, max=500, description="Max Length:")
num_beams = widgets.IntSlider(value=4, min=1, max=10, description="Beams:")
length_penalty = widgets.IntSlider(value=1, min=-3, max=3, description="Length Penalty:")
temperature = widgets.FloatSlider(value=1.0, min=0.1, max=2.0, step=0.1, description="Temp:")
top_k = widgets.IntSlider(value=50, min=1, max=200, description="Top K:")
top_p = widgets.FloatSlider(value=0.95, min=0.1, max=1.0, step=0.05, description="Top P:")

button = widgets.Button(description="Summarize")

output_area = widgets.Output()
spinner = widgets.HTML("<i class='fa fa-spinner fa-spin' style='font-size:24px; color:white; padding: 4px'></i>")
spinner.layout.display = 'none'

def summarize(b):
    with output_area:
        clear_output()
        spinner.layout.display = ''
        display(spinner)
        try:
            text_input = "\n".join(snippets)
            inputs = tokenizer.encode("summarize: " + text_input, return_tensors="pt", max_length=1024, truncation=True)
            summary_ids = model.generate(
                inputs,
                min_length=min_length.value,
                max_length=max_length.value,
                num_beams=num_beams.value,
                length_penalty=length_penalty.value,
                # temperature, top_k and top_p only take effect when sampling is enabled
                do_sample=True,
                temperature=temperature.value,
                top_k=top_k.value,
                top_p=top_p.value,
                early_stopping=False
            )
            summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
            clear_output()
            display_markdown(model.name_or_path, summary)
        except Exception as e:
            clear_output()
            print(f"An error occurred: {e}")
        finally:
            spinner.layout.display = 'none'

button.on_click(summarize)

widgets.VBox([min_length, max_length, num_beams, length_penalty, temperature, top_k, top_p, button, output_area])


## FLAN-T5

FLAN-T5 is T5 further fine-tuned on instruction-style tasks, so it can be prompted with a plain-language instruction instead of a fixed task prefix.

In [73]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

flan_t5_checkpoints = [
    "google/flan-t5-small",
    "google/flan-t5-base",
    "google/flan-t5-large",
]

for checkpoint in flan_t5_checkpoints:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    
    inputs = tokenizer(
        "Summarise these daily snippets into a coherent journal entry in a reflective and personal tone: " 
        + "\n".join(snippets), return_tensors="pt")
    outputs = model.generate(**inputs, min_length=50, max_length=350, num_beams=6, length_penalty=3)

    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    display_markdown(checkpoint, summary)

**google/flan-t5-small:**
> In a journal, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippets of daily snippets, snippet


**google/flan-t5-base:**
> The alarm blared, and I instantly wished for more sleep. Coffee was a burnt necessity, not a luxury. My email inbox was a chaotic, overflowing mess. Lunch was a rushed sandwich, barely a pause in the day. Coding became an endless cycle of debugging frustration. My kid's call brought a hilarious school story, needing to be remembered. Dinner was simple pasta, a comfort against exhaustion. A bizarrely fluffy, cloud-like dog crossed my path. The grocery run included a movie chat with Sarah. Home brought aching feet and the promise of bed. My brain buzzed with floating pancake dreams and looming tasks. The project deadline pressed down, demanding organization. A sudden worry about the locked front door crept in. A late-night social media scroll proved a poor choice. A strange noise stirred my anxiety, house settling or something else? All that remains was the desperate plea for sleep.


**google/flan-t5-large:**
> The alarm blared, and I instantly wished for more sleep. Coffee was a burnt necessity, not a luxury. My email inbox was a chaotic, overflowing mess. The team meeting was a flurry of ideas, leaving my head spinning. Lunch was a rushed sandwich, barely a pause in the day. A brief park walk offered a moment of sunshine and bird song. Coding became an endless cycle of debugging frustration. My kid's call brought a hilarious school story, needing to be remembered. Dinner was simple pasta, a comfort against exhaustion. A bizarrely fluffy, cloud-like dog crossed my path. The grocery run included a movie chat with Sarah. Home brought aching feet and the promise of bed. My brain buzzed with floating pancake dreams and looming tasks. I had to remind myself to call Mom and pay that bill. The laundry pile loomed, a testament to my busy life. The project deadline pressed down, demanding organization. A sudden worry about the locked front door crept in. A late-night social media scroll proved a poor choice. A strange noise stirred my anxiety, house settling or something else? All that remained was the desperate plea for sleep.

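The `flan-t5-base` and `flan-t5-large` outputs above read as near-verbatim copies of the input rather than summaries. A rough way to quantify that is to count how many snippets reappear word for word in the output. This is a sketch under simple assumptions (exact substring match after whitespace normalisation); `verbatim_copy_ratio` is a hypothetical helper and the sample data is invented.

```python
def verbatim_copy_ratio(snippets, summary):
    """Return the fraction of snippets that appear word for word in the summary."""
    if not snippets:
        return 0.0
    # Normalise whitespace so line breaks in the summary do not hide a match.
    normalised = " ".join(summary.split())
    copied = sum(1 for s in snippets if " ".join(s.split()) in normalised)
    return copied / len(snippets)


sample_snippets = ["Coffee was burnt.", "Lunch was rushed.", "The dog was fluffy."]
sample_summary = "Coffee was burnt. Lunch was rushed. I need more rest."
print(verbatim_copy_ratio(sample_snippets, sample_summary))
```

A ratio close to 1 suggests the model is echoing its input; a lower value does not by itself mean the summary is good, only that it is less extractive.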

## BART fine-tuned on CNN/DailyMail

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
print(summarizer("\n".join(snippets), min_length=50, max_length=350, do_sample=False))

## Pegasus

In [None]:
from transformers import AutoTokenizer, PegasusForConditionalGeneration

pegasus_model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
pegasus_tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")
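# Use the Pegasus tokenizer and model here, not the FLAN-T5 objects left over from the earlier cell
inputs = pegasus_tokenizer("\n".join(snippets), max_length=1024, truncation=True, return_tensors="pt")
summary_ids = pegasus_model.generate(inputs["input_ids"], min_length=50, max_length=350)
pegasus_tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]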

## Gemini API

In [78]:
from kaggle_secrets import UserSecretsClient
from google import genai

user_secrets = UserSecretsClient()
gemini_apiKey = user_secrets.get_secret("GEMINI_API_KEY")

In [79]:
from google.genai import errors

try:
    google_client = genai.Client(api_key=gemini_apiKey)
    response = google_client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Summarise these daily snippets into a coherent journal entry in a reflective and personal tone in under 250 words: " + "\n".join(snippets)
    )
    display_markdown("GEMINI API", response.text)
# APIError lives in google.genai.errors rather than on the top-level genai module
except errors.APIError as e:
    print(f"Gemini API Error: {e}")
    # TODO: use local model

**GEMINI API:**
> Today felt like running on a treadmill that kept speeding up. The alarm was a cruel joke, coffee a desperate act, and the workday a blur of emails and brainstorming that left me utterly drained. That brief walk in the park was the only genuine breath of fresh air. The relentless debugging wore me down, and I almost forgot the joy in my kid’s hilarious school story – I need to hold onto those moments.
> 
> Evenings are a fragile truce between comfort and obligation. Simple pasta eased the edge, but then came the grocery run, deadlines looming, and anxieties bubbling up. That fluffy dog, Sarah's impromptu movie review, a fleeting hope in the mess. Now, lying in bed, my mind is still a ping-pong table of pancake dreams, to-dos, and that unsettling house noise. All I want is sleep, a sweet release from the relentless hum of it all. I swear, tomorrow I need to find a way to slow down.

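
The TODO in the Gemini cell mentions falling back to a local model when the API call fails. A minimal sketch of that pattern, with stand-in backends instead of real model calls: `summarise_with_fallback`, `failing_remote` and `local_stub` are all hypothetical names introduced for illustration.

```python
def summarise_with_fallback(text, backends):
    """Try each (name, callable) backend in order; return the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(text)
        except Exception as exc:  # broad on purpose: any failure triggers the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All backends failed: " + "; ".join(errors))


def failing_remote(text):
    # Stand-in for a remote API call that is unavailable.
    raise ConnectionError("remote API unreachable")


def local_stub(text):
    # Stand-in for a local model; here it just returns the first line.
    return text.splitlines()[0]


used, result = summarise_with_fallback(
    "Coffee was burnt.\nLunch was rushed.",
    [("remote", failing_remote), ("local", local_stub)],
)
print(used, result)
```

In the notebook, the remote backend could wrap the Gemini call and the local one could wrap one of the summarisers loaded earlier; returning the backend name makes it clear which model produced the entry.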