In [0]:
%%capture
#### RUN THIS IN EVERY NEW COLAB SESSION
#### RUN IT if you change runtimes
#### shouldn't need to run after a kernel restart in the same session

from google.colab import drive
import sys
from pathlib import Path

drive.mount('/content/drive')
COLAB_NOTEBOOKS_DIR = Path("/content/drive/MyDrive/Colab Notebooks")

########## MODIFY THIS PATH AS NEEDED ##########
WORKING_DIR = COLAB_NOTEBOOKS_DIR / "Homework_12"
################################################### 
sys.path.append(str(WORKING_DIR))

# ✅ Now you can import from helpers.py in your homework folder

# ✅ Install JupyterLab so the nbconvert lab template becomes available
%pip install -q jupyterlab jupyterlab_widgets
!jupyter nbconvert --to html --template lab --stdout --output dummy /dev/null || true

# ✅ Install the introdl course package
!wget -q https://github.com/DataScienceUWL/DS776/raw/main/Lessons/Course_Tools/introdl.zip
!unzip -q introdl.zip -d introdl_pkg
%pip install -q -e introdl_pkg --no-cache-dir

src_path = Path("introdl_pkg/src").resolve()
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

# Reload the introdl package (no kernel restart needed)
import importlib
import introdl
importlib.reload(introdl)

In [0]:
#### Run this cell later when you want to export your notebook to HTML
# see post @420 in Piazza for how to do this in CoCalc

from introdl.utils import convert_nb_to_html
my_html_file = (WORKING_DIR / "Homework_12_MY_NAME.html").resolve()  # change file name as needed
my_notebook_path = (WORKING_DIR / "Homework_12_Colab_Version.ipynb").resolve()  # must include the name of this notebook
convert_nb_to_html(output_filename=my_html_file, notebook_path=my_notebook_path)

# Homework 12 - Text Summarization

We're going to work with conversational data in this homework.  The `SAMsum` dataset consists of chat-like conversations and summaries like this:

Conversation-
```
Olivia: Who are you voting for in this election?
Oliver: Liberals as always.
Olivia: Me too!!
Oliver: Great
```

Summary-
```
Olivia and Olivier are voting for liberals in this election.
```

Applications for this kind of summarization include generating chat and meeting summaries.

Throughout this assignment you'll work with the first 100 conversations and summaries from the validation split of ["spencer/samsum_reformat"](https://huggingface.co/datasets/spencer/samsum_reformat) on Hugging Face.

## Task 1 - Build a zero-shot LLM conversation summarizer (10 points)

Use either an 8B local Llama model or an API-based model like `gemini-2.0-flash-lite` or better to build an `llm_summarizer` function that takes as input a list of conversations and returns a list of extracted summaries.  Your function should be constructed similarly to `llm_classifier` or `llm_ner_extractor` in Lessons 8 and 10, respectively.  

Put some effort into the prompt so that it generates succinct summaries of conversations that identify both the topics and the people.

Your list of returned summaries should be cleanly extracted summaries with no additional text such as parts of the input prompt.

Give a qualitative evaluation of the first three generated summaries compared to the ground-truth summaries.
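
The prompt-and-extract pattern described above can be sketched as follows. This is only a minimal sketch under stated assumptions: `build_zero_shot_prompt`, `extract_summary`, and the `generate` callable are hypothetical names, not the course package's API, and `generate` stands in for whatever local or API-based model call you use. The lesson versions of `llm_classifier` and `llm_ner_extractor` may be structured differently.

```python
from typing import Callable, List

# Hypothetical instruction text; tune the wording for your own model.
INSTRUCTIONS = (
    "You summarize chat conversations. Write one or two sentences that name "
    "the people involved and the main topics. Return only the summary."
)


def build_zero_shot_prompt(conversation: str) -> str:
    """Wrap one conversation in a delimited zero-shot prompt."""
    return (
        f"{INSTRUCTIONS}\n\n"
        f"Conversation:\n<<<\n{conversation.strip()}\n>>>\n\n"
        "Summary:"
    )


def extract_summary(raw_output: str, prompt: str = "") -> str:
    """Remove an echoed prompt and a leading 'Summary:' label from model output."""
    text = raw_output
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]
    text = text.strip()
    if text.lower().startswith("summary:"):
        text = text[len("summary:"):].strip()
    # Keep only the first paragraph in case the model keeps generating.
    return text.split("\n\n")[0].strip()


def llm_summarizer(conversations: List[str], generate: Callable[[str], str]) -> List[str]:
    """Summarize each conversation; `generate` maps a prompt string to raw model text."""
    summaries = []
    for conversation in conversations:
        prompt = build_zero_shot_prompt(conversation)
        summaries.append(extract_summary(generate(prompt), prompt))
    return summaries


# Stand-in for a real model call so the sketch runs offline (assumption).
def echo_model(prompt: str) -> str:
    return prompt + " Olivia and Oliver both plan to vote for the Liberals."


print(llm_summarizer(["Olivia: Who are you voting for?\nOliver: Liberals."], echo_model))
```

Local models often echo the prompt back in their output, which is why the extraction step strips it before returning the summary.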

## Task 2 - Build a few-shot LLM conversation summarizer (6 points)

Follow the same instructions as in Task 1, but add a few examples from the training data.  Don't simply pick the first examples.  Instead, take some care to choose diverse conversations and/or conversations that are difficult to summarize.
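
One possible way to assemble a few-shot prompt is sketched below. Everything here is an assumption for illustration: `pick_examples_by_length` is a crude, hypothetical diversity heuristic (it samples evenly across conversation length), and it is no substitute for reading the training conversations and choosing examples by hand.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (conversation, summary)


def pick_examples_by_length(pairs: List[Pair], k: int = 3) -> List[Pair]:
    """Crude diversity heuristic: pick k pairs spread evenly across conversation length."""
    ranked = sorted(pairs, key=lambda pair: len(pair[0]))
    if k >= len(ranked):
        return ranked
    if k == 1:
        return [ranked[len(ranked) // 2]]
    step = (len(ranked) - 1) / (k - 1)
    return [ranked[round(i * step)] for i in range(k)]


def build_few_shot_prompt(conversation: str, examples: List[Pair]) -> str:
    """Place worked examples before the conversation to be summarized."""
    parts = [
        "Summarize each conversation in one or two sentences, "
        "naming the people and the main topics."
    ]
    for example_conversation, example_summary in examples:
        parts.append(
            f"Conversation:\n{example_conversation.strip()}\n"
            f"Summary: {example_summary.strip()}"
        )
    # The final block is left open so the model completes the summary.
    parts.append(f"Conversation:\n{conversation.strip()}\nSummary:")
    return "\n\n".join(parts)
```

Ending the prompt with an open `Summary:` line mirrors the format of the examples, which tends to make the output easier to extract cleanly.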

## Task 3 - Refine the llm_score function (10 points)

For this task you can use a local Llama model or an API-based model.  (I personally find the API-based models much easier to use.)

Start with the `llm_score` function from last week and refine the prompt so that the scores better reflect the semantic similarity between two texts.  Here are some guidelines that you should incorporate into your prompt:

- A score of **100** means the texts have **identical meaning**.
- A score of **80–99** means they are **strong paraphrases** or very similar in meaning.
- A score of **50–79** means they are **somewhat related**, but not expressing the same idea.
- A score of **1–49** means they are **barely or loosely related**.
- A score of **0** means **no semantic similarity**.
- Take into account word meaning, order, and structure.
- Synonyms count as matches.
- Do not reward scrambled words unless they convey the same meaning.
- Make the prompt few-shot by including several text pairs and the corresponding similarity scores.

Demonstrate your `llm_score` function by applying it to the 7 sentence pairs from the lesson.  Comment on the performance of the scoring.  Does it still get fooled by the sixth and seventh pairs like BERTScore did?
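
The guidelines above could be turned into a few-shot scoring prompt along these lines. This is a hedged sketch, not the lesson's `llm_score`: `build_score_prompt` and `parse_score` are hypothetical helpers, and the example pairs and their scores are illustrative assumptions that you should replace with your own.

```python
import re
from typing import List, Tuple

RUBRIC = """Rate how similar in meaning Text A and Text B are, from 0 to 100.
- 100: identical meaning.
- 80-99: strong paraphrases or very similar in meaning.
- 50-79: somewhat related, but not expressing the same idea.
- 1-49: barely or loosely related.
- 0: no semantic similarity.
Consider word meaning, order, and structure. Synonyms count as matches.
Do not reward scrambled words unless they convey the same meaning.
Reply with the integer score only."""

# Illustrative pairs and scores (assumptions, not taken from the lesson).
EXAMPLES: List[Tuple[str, str, int]] = [
    ("The cat sat on the mat.", "A cat was sitting on the mat.", 95),
    ("The dog chased the cat.", "The cat chased the dog.", 40),
    ("I love pizza.", "The stock market fell today.", 0),
]


def build_score_prompt(text_a: str, text_b: str, examples=EXAMPLES) -> str:
    """Combine the rubric, scored example pairs, and the pair to be scored."""
    parts = [RUBRIC]
    for a, b, score in examples:
        parts.append(f"Text A: {a}\nText B: {b}\nScore: {score}")
    parts.append(f"Text A: {text_a}\nText B: {text_b}\nScore:")
    return "\n\n".join(parts)


def parse_score(response: str) -> int:
    """Return the first integer in the model response, clamped to the range 0-100."""
    match = re.search(r"-?\d+", response)
    if match is None:
        raise ValueError(f"No integer score found in: {response!r}")
    return max(0, min(100, int(match.group())))
```

Clamping and raising on a missing number keeps a single malformed response from silently skewing the average over 100 examples.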


## Task 4 - Evaluate a Pre-trained Model and LLM_summarizer (10 points)

For this task you're going to qualitatively and quantitatively compare the generated summaries from:
1. The already fine-tuned Hugging Face model ['philschmid/flan-t5-base-samsum'](https://huggingface.co/philschmid/flan-t5-base-samsum)
2. The zero-shot or few-shot LLM summarizer from above.

If, for some reason, you can't get the specified Hugging Face model to work, then find a different Hugging Face summarization model that has already been fine-tuned on SAMsum.

First, qualitatively compare the first three generated summaries from each approach to the ground-truth summaries.  Explain how the two approaches seem to be working on the three examples.

Second, compute ROUGE scores, BERTScore, and llm_score for the first 100 examples in the validation set. 

What do these scores suggest about the performance of the two approaches?  Is one approach clearly better than the other?  Is llm_score working well as a metric?  Does it agree with the other metrics?
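
To make concrete what ROUGE-1 measures, here is a simplified unigram-overlap sketch. It is an illustration only, under the assumption of lowercase whitespace tokenization. Packages such as `evaluate` or `rouge_score` handle tokenization more carefully and also report ROUGE-2 and ROUGE-L, so use one of those for the reported scores. The helper names are hypothetical.

```python
from collections import Counter
from typing import List


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap with lowercase whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_rouge1_f1(predictions: List[str], references: List[str]) -> float:
    """Average the simplified score over paired predictions and references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    scores = [rouge1_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


# Word order is ignored, so a scrambled sentence still gets a perfect score.
print(rouge1_f1("the dog chased the cat", "the cat chased the dog"))
```

Because unigram overlap ignores word order, the scrambled pair in the last line scores as a perfect match, which is one reason to compare ROUGE against a meaning-aware metric such as `llm_score`.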

## Task 5 - Comparison and Reflection (4 points)

* Give a brief summary of what you learned in this assignment.

* What did you find most difficult to understand?

### Exporting to HTML

We've added a helper function to the course package that exports your notebook to HTML.  It uses the preferred formatting and cleans outputs that sometimes cause errors with that format.

To use it, first update the course package (you'll need to do this separately on the home server and the compute server, whichever you plan to use):

In [0]:
#!pip install ~/Lessons/Course_Tools/introdl

Restart the kernel.

Add this code cell to your notebook and run it (I don't think it matters where you put it in the notebook).  Modify the filename as desired.  This isn't particularly fast, so you may need to wait 40 to 60 seconds the first time, and maybe 10 to 15 seconds thereafter.

In [5]:
from introdl.utils import convert_nb_to_html
convert_nb_to_html("HW12_Jeff_Bagggett.html")

[INFO] Using notebook: Homework_12_(UPDATED).ipynb
[INFO] Temporary copy created: /tmp/tmplmu0lu7h/Homework_12_(UPDATED).ipynb


[NbConvertApp] Converting notebook /tmp/tmplmu0lu7h/Homework_12_(UPDATED).ipynb to html


[NbConvertApp] Writing 282653 bytes to /home/user/Homework/Homework_12/HW12_Jeff_Bagggett.html


[SUCCESS] HTML export complete: /home/user/Homework/Homework_12/HW12_Jeff_Bagggett.html


Now your HTML file should be in the same directory as the notebook.