# Clinical Note Generation: From SFT to RL Fine-tuning with NeMo Microservices

<div style="background: linear-gradient(135deg, #1a1a2e 0%, #16213e 100%); padding: 25px; border-radius: 10px; margin-bottom: 20px;">
<h2 style="color: #00d4ff; margin-top: 0;">Training LLMs to Match a Specific Clinical Note Format & Style</h2>
<p style="color: #e0e0e0; font-size: 14px;">
This notebook demonstrates how to fine-tune LLMs to generate clinical notes in a <strong>specific format and style</strong> using two complementary techniques:
</p>
<ul style="color: #e0e0e0; font-size: 14px;">
<li><strong>SFT (Supervised Fine-Tuning)</strong> — Teaches the target FORMAT (structure, sections, information extraction)</li>
<li><strong>DPO (Direct Preference Optimization)</strong> — Refines the STYLE (phrasing, verbosity)</li>
</ul>
</div>

### The Problem We're Solving

**Clinical notes need to follow specific formats and styles:**
- Different healthcare systems have different documentation standards
- Physicians spend time reformatting notes to match institutional templates
- Consistency in note format improves downstream workflows (billing, handoffs, EHR integration)

**Our Goal:** Fine-tune an LLM to generate clinical notes that match a **specific format and writing style**.

<div style="background: #1a1a2e; padding: 15px; border-radius: 8px; border-left: 4px solid #00d4ff; margin: 10px 0;">
<strong style="color: #00d4ff;">Key Distinction:</strong> <span style="color: #e0e0e0;">This is a <strong>format/style transformation task</strong>, not a clinical reasoning task. The medical content is already in the conversation. We are teaching the model how to present it in a specific format.</span>
</div>


---

### The Dataset: Augmented Clinical Notes

We use the **[AGBonnet/augmented-clinical-notes](https://huggingface.co/datasets/AGBonnet/augmented-clinical-notes)** dataset from Hugging Face.

### What's in the Dataset?

| Field | Description | Example |
|-------|-------------|---------|
| `conversation` | Doctor-patient dialogue transcript | "Doctor: What brings you in today? Patient: I have chest pain..." |
| `note` | Corresponding clinical note | "HISTORY OF PRESENT ILLNESS: 45-year-old male presents with..." |

### Why This Dataset?

1. **Consistent target format** — Notes follow a specific documentation style we want the model to learn
2. **Paired data** — Each conversation has a corresponding note in our target format
3. **Format examples** — Shows the model exactly how to structure and present clinical information  
4. **Sufficient size** — 30K+ examples enable the model to learn the format patterns

<div style="background: #16213e; padding: 12px; border-radius: 6px; margin: 10px 0;">
<strong style="color: #ffd700;"> The dataset teaches FORMAT, not medicine:</strong> <span style="color: #e0e0e0;">We assume the model already knows medical terminology from pre-training. We're teaching it our specific way of organizing and presenting that information.</span>
</div>

### Dataset Splits

| Split | Size | Purpose |
|-------|------|---------|
| Training | 1,000 | Model learns patterns |
| Validation | 100 | Hyperparameter tuning |
| Test | 100 | Final evaluation |

---

## What We'll Cover

| Stage | Approach | What It Does |
|-------|----------|--------------|
| **Part 1** | Infrastructure Setup | Deploy NeMo Microservices + LLaMA 3.1 8B |
| **Part 2** | Baseline Evaluation | See how base model handles the task (spoiler: wrong format) |
| **Part 3** | SFT with LoRA | **Teach FORMAT** — structure, sections, info extraction |
| **Part 4** | Post-SFT Evaluation | Verify format is correct |
| **Part 5** | DPO Alignment | **Refine STYLE** — phrasing, verbosity, conventions |
| **Part 6** | Final Evaluation | Verify style matches preferences |


## Prerequisites
### Virtual Env
First, create a virtual environment in a terminal and select it as the kernel for this notebook. The `load_dotenv` import used later comes from the `python-dotenv` package.
```sh
python3 -m venv .myvenv
source .myvenv/bin/activate
pip3 install ipykernel python-dotenv
python3 -m ipykernel install --user --name=dli_venv --display-name "DLI Venv"
```

### API Keys Setup

This notebook requires two API keys to access the necessary resources:

1. **NVIDIA NGC API Key**  
   - Used for pulling models and resources from NVIDIA's NGC registry.
   - Obtain your NGC API key by logging in at [NGC NVIDIA](https://ngc.nvidia.com/), navigating to your profile (top right), and clicking "Setup" → "Get API Key".
   - [NGC API Key Documentation](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key)

2. **Hugging Face API Token**  
   - Needed for accessing models and datasets from Hugging Face Hub.
   - Create or log in to your account at [Hugging Face](https://huggingface.co/).
   - Go to your Settings → "Access Tokens" and generate a token with "read" access.
   - [HF Token Documentation](https://huggingface.co/docs/hub/security-tokens)

#### How to Provide Your Keys

Run the cell below to automatically create a `.env` file from the `.env.example` template. Then edit the `.env` file and replace the placeholder values with your actual API keys.

In [None]:
from pathlib import Path
import shutil

# Create .env from example if needed
if not Path('.env').exists():
    shutil.copy('.env.example', '.env')
    print("✓ .env created from example. Please update it with your real API keys.")
else:
    print("✓ .env file already exists")


In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

NGC_API_KEY = os.getenv("NGC_API_KEY")
HF_TOKEN = os.getenv("HF_TOKEN")
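
A quick check can catch a missing or blank key here, rather than much later when the deployment script fails. The sketch below is illustrative only: the helper name `missing_keys` is an assumption and is not part of any NeMo or Hugging Face tooling.

```python
import os

REQUIRED_KEYS = ("NGC_API_KEY", "HF_TOKEN")

def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or blank in env."""
    return [name for name in required if not (env.get(name) or "").strip()]

missing = missing_keys(os.environ)
if missing:
    print("Missing keys in .env:", ", ".join(missing))
else:
    print("All required keys are set")
```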

## Setup Steps

### Part 1: Set Docker Storage Location

In [None]:
# find the partition that has enough free space
!df -h

If the partition mounted on `/` has enough free space, you don't need to change the Docker storage location and can skip to Part 2: Setup Software Packages.

Otherwise, proceed to change the Docker storage location.
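
The same check can be done from Python with the standard library, which is handy if you want the notebook to warn you automatically. This is a minimal sketch: the 200 GiB threshold is an assumed placeholder, not a documented requirement, so adjust it to the models and images you plan to pull.

```python
import shutil

def free_gib(path):
    """Free space on the filesystem containing path, in GiB."""
    return shutil.disk_usage(path).free / 2**30

MIN_FREE_GIB = 200  # assumed threshold, not an official requirement

root_free = free_gib("/")
print(f"/ has {root_free:.0f} GiB free")
if root_free < MIN_FREE_GIB:
    print("Consider moving Docker's data-root to a larger partition")
```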

In [None]:
# assume we found it to be mounted to /ephemeral and not /
!mkdir -p /ephemeral/docker

Docker reads its storage location from the `data-root` key in `/etc/docker/daemon.json`. The entry should look like this (the next cell writes it for you):
```json
"data-root": "/ephemeral/docker"
```

In [None]:
import json
import subprocess

new_docker_storage_location = '/ephemeral/docker'

# Read the existing Docker config, or start fresh if it is missing or invalid
try:
    result = subprocess.run(['sudo', 'cat', '/etc/docker/daemon.json'],
                            capture_output=True, text=True)
    config = json.loads(result.stdout) if result.stdout.strip() else {}
except (json.JSONDecodeError, OSError):
    config = {}
print("config: ", config)
# Add/update the data-root setting
config['data-root'] = new_docker_storage_location
# Serialize back to JSON
config_json = json.dumps(config, indent=2)
# Write the new content through `sudo tee`; passing it on stdin avoids shell-quoting problems
ret = subprocess.run(['sudo', 'tee', '/etc/docker/daemon.json'],
                     input=config_json + '\n', text=True)

In [None]:
# Then reload docker
!sudo systemctl daemon-reload
!sudo systemctl restart docker

### Part 1.5: Fix UFW Firewall for Docker/Minikube Networking

If UFW (Uncomplicated Firewall) is active, its default FORWARD policy may block Docker container traffic, preventing minikube from reaching external registries like `registry.k8s.io`. This will cause `ImagePullBackOff` errors when enabling addons like ingress.

The fix below:
1. Changes UFW's default forward policy to ACCEPT (persistent)
2. Adds an iptables rule to allow Docker container traffic
3. Makes the iptables rule persistent across reboots

**Run this cell before running the deployment script.**


In [None]:
%%bash
# Fix UFW firewall to allow Docker container traffic (only runs if UFW is active)
if sudo ufw status | grep -q "Status: active"; then
    echo "UFW is active, configuring forward policy..."
    sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw
    sudo ufw reload
    echo "✓ UFW forward policy set to ACCEPT"
else
    echo "UFW is not active, skipping UFW config"
fi

# Add iptables rule to allow Docker container traffic (required for minikube to reach external registries)
sudo iptables -C DOCKER-USER -j ACCEPT 2>/dev/null || sudo iptables -I DOCKER-USER -j ACCEPT
echo "✓ iptables DOCKER-USER rule configured"

# Make the iptables rule persistent by creating a script that runs after Docker starts
cat << 'EOF' | sudo tee /etc/docker/docker-post-start.sh > /dev/null
#!/bin/bash
# Allow Docker container traffic to reach external networks
iptables -C DOCKER-USER -j ACCEPT 2>/dev/null || iptables -I DOCKER-USER -j ACCEPT
EOF
sudo chmod +x /etc/docker/docker-post-start.sh

# Create systemd override to run the script after Docker starts
sudo mkdir -p /etc/systemd/system/docker.service.d
cat << 'EOF' | sudo tee /etc/systemd/system/docker.service.d/iptables-fix.conf > /dev/null
[Service]
ExecStartPost=/etc/docker/docker-post-start.sh
EOF
sudo systemctl daemon-reload
echo "✓ iptables rule made persistent via systemd"


### Part 2: Setup Software Packages

The minikube deployment has the requirements listed below. The cells in this part install the ones that are most often missing (minikube, kubectl, Helm, the Hugging Face CLI, and jq).

- NGC account. If you don’t have an NGC account, you can create one on the [NGC Sign In page](https://ngc.nvidia.com/signin).

- Docker 27 or later.

- [Minikube](https://minikube.sigs.k8s.io/docs/start/) version 1.33 or later.

- NVIDIA Container Toolkit v1.16.2 or higher. Refer to [Installing the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

- NVIDIA GPU Driver 560.35.03 or higher. Refer to [Driver Installation Guide](https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/index.html).

- Kubernetes CLI, kubectl. Refer to [Install and Set Up kubectl on Linux](https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/) in the Kubernetes documentation.

- Helm CLI, helm. Refer to the [Helm documentation](https://helm.sh/docs/intro/install).

- Hugging Face CLI. You need this for uploading dataset files through NeMo Data Store. Refer to the [Hugging Face Hub CLI user guide](https://huggingface.co/docs/huggingface_hub/en/guides/cli) and the [Hugging Face Hub installation guide](https://huggingface.co/docs/huggingface_hub/en/installation).

- The jq command line tool. Refer to the [Download jq](https://jqlang.org/download/) page for instructions.


Reference: https://docs.nvidia.com/nemo/microservices/latest/get-started/setup/requirements.html#nemo-ms-get-started-requirements

#### Install minikube

In [None]:
!curl -LO https://github.com/kubernetes/minikube/releases/latest/download/minikube-linux-amd64

In [None]:
!sudo install minikube-linux-amd64 /usr/local/bin/minikube && rm minikube-linux-amd64

#### Install kubectl

In [None]:
!curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

In [None]:
!curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"

In [None]:
!echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check

You should see the message `kubectl: OK`.

In [None]:
!sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# check the kubectl version
!kubectl version --client

#### Install helm

In [None]:
!sudo apt-get install curl gpg apt-transport-https --yes
!curl -fsSL https://packages.buildkite.com/helm-linux/helm-debian/gpgkey | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
!echo "deb [signed-by=/usr/share/keyrings/helm.gpg] https://packages.buildkite.com/helm-linux/helm-debian/any/ any main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
!sudo apt-get update && sudo apt-get install helm --yes

#### Install Hugging Face CLI

In [None]:
!curl -LsSf https://hf.co/cli/install.sh | bash

Add the `hf` CLI installation directory to `$PATH`. For example, if the installation command above printed `[INFO] CLI location: /home/shadeform/.local/bin/hf`, add `/home/shadeform/.local/bin` to `$PATH`.

In [None]:
import os
import sys

# Define the new paths to add
new_path_dir = "/home/shadeform/.local/bin"
python3_venv_path = "/home/shadeform/.myvenv/bin"  # directory containing python3; PATH entries must be directories

paths_to_remove = [
    "/home/shadeform/.venv/bin",
    "/usr/lib/python310.zip",
    "/usr/lib/python3.10",
    "/usr/lib/python3.10/lib-dynload",
    "/home/shadeform/.env/bin",
    "/home/shadeform/.env/bin/python3",
]

# Get the current PATH string
current_path = os.environ.get('PATH', '')
print("current path: ", current_path)

# Split the PATH string into a list of paths
path_list = current_path.split(os.pathsep)

# Filter the list to remove any path present in paths_to_remove
new_path_list = [path for path in path_list if path not in paths_to_remove]

# Rejoin the list into a single string
new_path_string = os.pathsep.join(new_path_list)

# Update the environment variable
os.environ['PATH'] = new_path_string

# Append new directories to the PATH
os.environ['PATH'] = f"{new_path_dir}:{python3_venv_path}:{os.environ['PATH']}"

# Also, update sys.path for Python internal imports
if new_path_dir not in sys.path:
    sys.path.append(new_path_dir)

for p in paths_to_remove:
    if p in sys.path:
        sys.path.remove(p)

# Verify the change
print("New PATH:", os.environ['PATH'])
print("New sys.path:", sys.path)




In [None]:
!hf --help

#### Install Hugging Face Hub

In [None]:
!pip install --upgrade huggingface_hub

#### Install jq

In [None]:
!sudo apt-get install jq --yes

#### Install requests

In [None]:
%pip install requests

### Part 3: Install NeMo Microservices SDK

The NeMo Microservices Python SDK provides convenient access to the NeMo Microservices REST API from any Python 3.9+ application. We will install it now.

Reference: https://docs.nvidia.com/nemo/microservices/latest/get-started/sdk.html#gs-sdk

In [None]:
%pip install nemo-microservices

### Part 4: Deploy NeMo Microservices

Start a minikube cluster and install the NeMo microservices platform using the automated deployment scripts provided by NVIDIA.

Reference: https://docs.nvidia.com/nemo/microservices/latest/get-started/setup/minikube/minikube-script.html

In [None]:
# Download the sh scripts
!wget https://docs.nvidia.com/nemo/microservices/latest/_downloads/6875bbf5b6ea8a8f4c10f4273b648970/create-nmp-deployment.sh
!wget https://docs.nvidia.com/nemo/microservices/latest/_downloads/55da5d88e059a889f4aee44688d1bf55/destroy-nmp-deployment.sh
!chmod +x create-nmp-deployment.sh destroy-nmp-deployment.sh

Set your API keys now. Note that `NGC_API_KEY` and `NVIDIA_API_KEY` can be the same key.

In [None]:
import os

os.environ['NGC_API_KEY'] = NGC_API_KEY
os.environ['NVIDIA_API_KEY'] = NGC_API_KEY
os.environ['HF_TOKEN'] = HF_TOKEN

In [None]:
!which python3

In [None]:
!./create-nmp-deployment.sh

In a separate terminal, monitor the deployment status by viewing the status of pods:
```bash
watch -n 2 "kubectl get pods"
```

Make sure that GPU resources are available in minikube pods:
```bash
minikube ssh -- nvidia-smi
```

If something goes wrong, first tear down the deployment and then create it again:
```bash
./destroy-nmp-deployment.sh
./create-nmp-deployment.sh
```

In your terminal where you're monitoring the deployment status, it should look like this while the NIM is being deployed:


```
NAME                                                         READY   STATUS              RESTARTS       AGE
customizer-downloader-meta-llama-3-1-8b-instruct-2-0-nx95r   0/1     Completed           0              97s
customizer-downloader-meta-llama-3-2-1b-instruct-2-0-xkgj6   0/1     Completed           0              97s
modeldeployment-meta-llama-3-1-8b-instruct-8fb69cd-rzt5v     0/1     ContainerCreating   0              49s
nemo-core-api-54674f5989-hw5lt                               1/1     Running             0              3m33s
nemo-core-controller-9d9b6b99c-894hw                         1/1     Running             1 (108s ago)   3m33s
nemo-core-jobs-logcollector-7787759b6c-w66wc                 1/1     Running             0              3m33s
nemo-customizer-5949755697-bssg7                             1/1     Running             2 (2m9s ago)   3m33s
nemo-customizerdb-0                                          1/1     Running             0              3m33s
nemo-data-designer-6c6df98589-hfmcd                          1/1     Running             0              3m33s
nemo-data-store-556cb7ff85-j4vv2                             1/1     Running             0              3m32s
nemo-deployment-management-f489c5d5-vph7g                    1/1     Running             0              3m33s
nemo-entity-store-77bb854685-rtg8x                           1/1     Running             0              3m33s
nemo-entity-storedb-0                                        1/1     Running             0              3m33s
nemo-evaluator-7d4d8d6b7-hppwm                               2/2     Running             0              3m32s
nemo-evaluatordb-0                                           1/1     Running             0              3m33s
nemo-guardrails-555f5765-vsr4j                               1/1     Running             0              3m32s
nemo-guardrailsdb-0                                          1/1     Running             0              3m33s
nemo-jobsdb-0                                                1/1     Running             0              3m33s
nemo-nemo-operator-controller-manager-7bd775c8-gwlk9         2/2     Running             0              3m32s
nemo-nim-operator-bbdfdb6cb-lt7dx                            1/1     Running             0              3m33s
nemo-nim-proxy-b5f6b5765-vxjm2                               1/1     Running             0              3m33s
nemo-opentelemetry-collector-8f484bdff-9nqjn                 1/1     Running             0              3m33s
nemo-postgresql-0                                            1/1     Running             0              3m33s
```

Once the deployment completes, which takes about 10 minutes, it should look like this:
```
NAME                                                          READY   STATUS      RESTARTS        AGE
customizer-downloader-meta-llama-3-1-8b-instruct-2-0-c8lxd    0/1     Completed   0               6m23s
customizer-downloader-meta-llama-3-2-1b-instruct-2-0-nsrkj    0/1     Completed   0               6m23s
modeldeployment-meta-llama-3-1-8b-instruct-84769c8b9b-s8gt2   1/1     Running     0               5m26s
nemo-core-api-54674f5989-tsnmd                                1/1     Running     0               8m30s
nemo-core-controller-9d9b6b99c-f5gq7                          1/1     Running     1 (6m31s ago)   8m30s
nemo-core-jobs-logcollector-7787759b6c-f4d2q                  1/1     Running     0               8m30s
nemo-customizer-5949755697-9x6mz                              1/1     Running     0               8m30s
nemo-customizerdb-0                                           1/1     Running     0               8m30s
nemo-data-designer-6c6df98589-89k27                           1/1     Running     0               8m30s
nemo-data-store-556cb7ff85-wk47s                              1/1     Running     0               8m29s
nemo-deployment-management-f489c5d5-79lhg                     1/1     Running     0               8m30s
nemo-entity-store-77bb854685-gg8k9                            1/1     Running     0               8m30s
nemo-entity-storedb-0                                         1/1     Running     0               8m30s
nemo-evaluator-7d4d8d6b7-c954j                                2/2     Running     0               8m29s
nemo-evaluatordb-0                                            1/1     Running     0               8m30s
nemo-guardrails-555f5765-8llsr                                1/1     Running     0               8m30s
nemo-guardrailsdb-0                                           1/1     Running     0               8m30s
nemo-jobsdb-0                                                 1/1     Running     0               8m30s
nemo-nemo-operator-controller-manager-7bd775c8-9hh2p          2/2     Running     0               8m30s
nemo-nim-operator-bbdfdb6cb-jvqwb                             1/1     Running     0               8m30s
nemo-nim-proxy-b5f6b5765-7jdbb                                1/1     Running     0               8m30s
nemo-opentelemetry-collector-8f484bdff-4fnsc                  1/1     Running     0               8m30s
nemo-postgresql-0                                             1/1     Running     0               8m30s
```
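
If you would rather check readiness programmatically than watch the terminal, the `kubectl get pods` table can be parsed in a few lines. The sketch below assumes the column layout shown above (NAME, READY, STATUS first); the function name `pods_not_ready` is hypothetical, and the sample text is abbreviated from the listing above.

```python
def pods_not_ready(kubectl_output):
    """Return names of pods that are neither Completed nor fully Running."""
    pending = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        name, ready, status = fields[0], fields[1], fields[2]
        if status == "Completed":
            continue  # finished jobs such as the downloaders are fine
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            pending.append(name)
    return pending

sample = """\
NAME                                                         READY   STATUS              RESTARTS   AGE
customizer-downloader-meta-llama-3-1-8b-instruct-2-0-nx95r   0/1     Completed           0          97s
modeldeployment-meta-llama-3-1-8b-instruct-8fb69cd-rzt5v     0/1     ContainerCreating   0          49s
nemo-core-api-54674f5989-hw5lt                               1/1     Running             0          3m33s
"""
print(pods_not_ready(sample))
```

To run it against the live cluster, pass in the text returned by `subprocess.run(["kubectl", "get", "pods"], capture_output=True, text=True).stdout`.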

### Part 5: Ensure the LLM is Deployed

The `create-nmp-deployment.sh` script above also deploys the NIM. Now verify that the deployment is healthy.

Reference: https://docs.nvidia.com/nemo/microservices/latest/get-started/tutorials/deploy-nims.html

In [None]:
# Check status of deployment
!curl --location "http://nemo.test/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct" | jq

You should see the status details:
```json
  "status_details": {
    "description": "deployment \"modeldeployment-meta-llama-3-1-8b-instruct\" successfully rolled out\n",
    "status": "ready"
  },
```
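
The same status check can be polled from Python until the model is ready. This is a sketch under stated assumptions: it assumes `status_details` is a top-level key of the response, as the excerpt above suggests, and the timeout and polling interval are arbitrary illustrative values.

```python
import json
import time
import urllib.request

STATUS_URL = "http://nemo.test/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct"

def is_ready(payload):
    """True when the deployment payload reports status 'ready'."""
    return payload.get("status_details", {}).get("status") == "ready"

def wait_until_ready(url=STATUS_URL, timeout_s=900, poll_s=15):
    """Poll the status endpoint until it reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(url) as resp:
            if is_ready(json.load(resp)):
                return True
        time.sleep(poll_s)
    return False

# wait_until_ready()  # uncomment to poll the live deployment
```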

In [None]:
# Interact with the NIM
!curl -X POST \
  "http://nim.test/v1/chat/completions" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ \
    "model": "meta/llama-3.1-8b-instruct", \
    "messages": [ \
        { \
            "role":"user", \
            "content":"Hello! How are you?" \
        } \
    ], \
    "max_tokens": 32 \
  }' | jq
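
For use inside later Python cells, the same request can be sent with the standard library. The sketch below assumes the NIM returns an OpenAI-style response with the reply under `choices[0].message.content`; the helper names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request

NIM_URL = "http://nim.test/v1/chat/completions"

def build_chat_request(prompt, model="meta/llama-3.1-8b-instruct", max_tokens=32):
    """Build the JSON body for an OpenAI-style chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt, url=NIM_URL):
    """POST the prompt to the NIM and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]  # assumed response shape

# print(chat("Hello! How are you?"))  # uncomment once the NIM is ready
```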

Setup is now complete. Next, we use NeMo Microservices to evaluate and customize a model.

---

# Part 2: Model Fine-tuning Pipeline

Now that the infrastructure is deployed, we'll walk through the complete fine-tuning pipeline.

## How Supervised Fine-Tuning (SFT) Works

<div style="background: #1a1a2e; padding: 20px; border-radius: 10px; border-left: 4px solid #00d4ff;">

**SFT trains the model to predict each next token of an expert-written reference output, given the input and the reference tokens that precede it.**

```
Loss = -Σ_t log P(target_token_t | input_tokens, target_tokens_<t)
```

The model learns to:
1. **Recognize patterns** in clinical conversations
2. **Generate structured output** matching the training format
3. **Adopt domain vocabulary** and style

</div>
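
To make the loss concrete, the sketch below computes the average negative log-likelihood for two made-up sets of per-token probabilities. The numbers are hypothetical and only show the direction of the effect: the more probability the model puts on the reference tokens, the lower the loss.

```python
import math

def sequence_nll(token_probs):
    """Average negative log-likelihood over the target tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical probabilities the model assigns to each reference-note token
confident = [0.9, 0.8, 0.95]
uncertain = [0.2, 0.1, 0.3]

print(f"confident: {sequence_nll(confident):.3f}")
print(f"uncertain: {sequence_nll(uncertain):.3f}")
```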

### SFT Pipeline Visualization

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Doctor-Patient │ ──► │   LLM Forward    │ ──► │  Clinical Note  │
│   Conversation  │     │      Pass        │     │   (Generated)   │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                │                        │
                                │                        ▼
                                │              ┌─────────────────┐
                                │              │  Clinical Note  │
                                │              │   (Reference)   │
                                │              └─────────────────┘
                                │                        │
                                ▼                        ▼
                        ┌─────────────────────────────────────┐
                        │     Cross-Entropy Loss (NLL)        │
                        │   Compare predictions to reference  │
                        └─────────────────────────────────────┘
                                        │
                                        ▼
                        ┌─────────────────────────────────────┐
                        │   Backprop → Update LoRA Weights    │
                        └─────────────────────────────────────┘
```

### Why LoRA? When to Use Parameter-Efficient Fine-Tuning

<div style="background: #1a3a2e; padding: 15px; border-radius: 8px; border-left: 4px solid #4ade80; margin: 10px 0;">
<strong style="color: #4ade80;">Key Decision:</strong> <span style="color: #e0e0e0;">LoRA lets you fine-tune large models without the compute cost of full fine-tuning.</span>
</div>

**Low-Rank Adaptation (LoRA)** is a parameter-efficient fine-tuning technique that:
- Freezes the original model weights
- Injects small trainable matrices into attention layers
- Reduces trainable parameters from **billions** to **millions**
- Enables fine-tuning on consumer-grade GPUs

```
Original: W (frozen) 
LoRA:     W + ΔW = W + BA  (where B is d×r and A is r×k, with rank r much smaller than d and k)
```
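
The update rule can be illustrated with a toy example in plain Python. This is a sketch with made-up sizes and values, not how any training framework implements LoRA. It follows the common convention of initializing `B` to zero, so the adapted model starts out identical to the base model.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

d, r = 4, 1  # toy sizes; real layers have d in the thousands
W = [[float(i == j) for j in range(d)] for i in range(d)]  # frozen weight (identity here)
B = [[0.0] for _ in range(d)]      # d x r, zero-initialized
A = [[0.1, 0.2, 0.3, 0.4]]         # r x d

W_eff = add(W, matmul(B, A))       # W + BA
print(W_eff == W)                  # zero B leaves the base model unchanged

B[0][0] = 1.0                      # training would move B away from zero
W_eff = add(W, matmul(B, A))
print(W_eff[0])                    # only the low-rank update changed this row

full_params = d * d                # parameters in W
lora_params = d * r + r * d        # parameters in B and A
print(full_params, lora_params)
```

At this toy size the saving is only 2x. Because the adapter cost grows linearly in `d` while the full matrix grows quadratically, the saving becomes much larger at realistic layer sizes.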

### When to Use LoRA vs Full Fine-Tuning

| Scenario | Recommendation | Why |
|----------|----------------|-----|
| **Limited GPU memory** |  LoRA | 8B model fits on single 24GB GPU |
| **Multiple model variants** |  LoRA | Swap adapters without reloading base model |
| **Quick experimentation** |  LoRA | Faster training iterations |
| **Maximum performance** |  Consider Full | Full fine-tuning can be slightly better |
| **Domain shift is large** |  Consider Full | May need more capacity for big changes |

### LoRA Efficiency for This Notebook

| Model | Full Parameters | LoRA Parameters | Reduction |
|-------|-----------------|-----------------|-----------|
| LLaMA 3.1 8B | 8 billion | ~8 million | **99.9%** |

With `adapter_dim=16` and an 8B model, we train roughly 0.1% of the total parameters, while often achieving results comparable to full fine-tuning.
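
The order of magnitude in the table can be checked with a little arithmetic. The sketch below uses the published Llama 3.1 8B attention shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128). Which matrices NeMo Customizer actually adapts is not stated in this notebook, so both target sets shown here are assumptions for illustration.

```python
def lora_param_count(shapes, rank, num_layers):
    """Trainable LoRA parameters: rank * (d_in + d_out) per adapted matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers

# (d_in, d_out) for each attention projection in one layer
Q = O = (4096, 4096)
K = V = (4096, 1024)  # 8 KV heads x 128 dims

# Assumed target sets; the real Customizer configuration may differ
q_v_only = lora_param_count([Q, V], rank=16, num_layers=32)
all_attn = lora_param_count([Q, K, V, O], rank=16, num_layers=32)

total = 8.03e9  # approximate total parameter count of the base model
for label, n in [("q,v", q_v_only), ("q,k,v,o", all_attn)]:
    print(f"{label}: {n:,} trainable ({n / total:.2%} of the model)")
```

Both variants land in the range of several million to low tens of millions of trainable parameters, which is consistent with the "~0.1%" figure quoted above.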

---

## Tutorial for LoRA

We roughly follow the steps in https://docs.nvidia.com/nemo/microservices/latest/get-started/tutorials/customize-eval-loop.html for LoRA, with two changes:
- our own dataset
- our own LLM-as-a-judge metric

### Upload Datasets

#### 1. Set environment variables

In [None]:
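import os
from dotenv import load_dotenv

# Point the Hugging Face CLI at the NeMo Data Store instead of huggingface.co
os.environ['HF_ENDPOINT'] = 'http://data-store.test/v1/hf'
# Reload HF_TOKEN from .env so the CLI calls below can authenticate
load_dotenv()
os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN')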

#### 2. Create the dataset repository in the NeMo Data Store

In [None]:
!hf repo create notes-completions-format --repo-type dataset

#### 3. Create directory structure

In [None]:
!mkdir -p ~/tmp/notes-completions-format
!mkdir -p ~/tmp/notes-completions-format/training
!mkdir -p ~/tmp/notes-completions-format/testing
!mkdir -p ~/tmp/notes-completions-format/validation

#### 4. Copy data files

Copy each of the files from `prompt-completion-v2` into the directories in `~/tmp/notes-completions-format`.

In [None]:
!cp data_prep/prompt-completion-v2/training.jsonl ~/tmp/notes-completions-format/training/
!cp data_prep/prompt-completion-v2/testing.jsonl ~/tmp/notes-completions-format/testing/
!cp data_prep/prompt-completion-v2/validation.jsonl ~/tmp/notes-completions-format/validation/

#### 5. Upload to data store

In [None]:
!hf upload --repo-type dataset \
   default/notes-completions-format \
   ~/tmp/notes-completions-format

#### 6. Register dataset with NeMo

In [None]:
!curl -X POST "http://nemo.test/v1/datasets" \
   -H 'accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{ \
      "name": "notes-completions-format", \
      "namespace": "default", \
      "description": "This is a notes dataset", \
      "files_url": "hf://datasets/default/notes-completions-format", \
      "project": "sample_project_for_notes_completions" \
   }' | jq

### SFT Data Examples: Input-Output Pairs

Below is an example from the SFT training dataset showing the format the model learns during supervised fine-tuning. Each record contains a **prompt** (the instruction plus the conversation transcript) and a **completion** (the ground-truth clinical note).

#### Example 1: Carpenter Hand Injury Case

**Input (Prompt):**
<div style="background: #1a1a2e; padding: 15px; border-radius: 8px; margin: 10px 0; max-height: 1500px;">
<pre style="color: #a0a0a0; font-size: 12px; white-space: pre-wrap; margin: 0;">You are a helpful assistant that can generate a clinical note from a conversation transcript between a patient and a doctor. 

The conversation transcript is given below: 

Doctor: Good morning, what brings you here today?

Patient: Hi, Doctor. I was in an accident with a saw while I was working as a carpenter and I need to check on my hand.

Doctor: I see. Can you tell me what happened to your hand?

Patient: I had a saw accident and my left ring finger was cut off, and my middle finger was almost cut off and my little finger was fractured, and my thumb was injured.

Doctor: Hmm, I understand. How old are you?

Patient: I am 22 years old.

Doctor: Okay, let me take a look at your hand.

Doctor (checks the patient's hand): I see that you had a total amputation of your left ring finger and a near total amputation of your middle finger. And it looks like you had a comminuted fracture of your middle phalanx and an injury to your thumb.

Patient: Yes, that's right.

Doctor: I see that you underwent a replantation of your ring finger and a revascularization of your middle finger. And you also had an open reduction internal fixation of the fractures.

Patient: Yes, that's correct.

Doctor: That's great. How have you been feeling since the surgery?

Patient: I've been feeling good. I've been going through rehabilitation for the past three months.

Doctor: That's great to hear. How has your hand been healing?

Patient: It's been healing well. I've been able to go back to work as a carpenter.

Doctor: That's fantastic. I'm glad to hear that you're back to work.

Doctor (checks the patient's hand again): Everything looks good. I want you to continue with your rehabilitation and make sure you take care of your hand.

Patient: Okay, I will.

Doctor: If you experience any pain or discomfort, don't hesitate to come back and see me.

Patient: Okay, I will. Thank you, Doctor.

Doctor: No problem. Take care.</pre>
</div>

**Output (Ground-Truth Completion):**
<div style="background: #1a3a2e; padding: 15px; border-radius: 8px; margin: 10px 0; max-height: 400px;">
<pre style="color: #e0e0e0; font-size: 12px; white-space: pre-wrap; margin: 0;">

A 22-year-old male carpenter presented with a traumatic injury to his left hand sustained from a saw accident while working. The patient reported that his left ring finger was cut off, his middle finger was almost cut off, his little finger was fractured, and his thumb was injured.

On examination, the patient was found to have a total amputation of his left ring finger and a near total amputation of his middle finger. Additionally, there was a comminuted fracture of the middle phalanx and an injury to the thumb.

The patient underwent surgical intervention including replantation of the ring finger, revascularization of the middle finger, and open reduction internal fixation of the fractures. Following the surgical procedures, the patient has been undergoing rehabilitation for the past three months.

At follow-up, the patient reports that he has been feeling good and that his hand has been healing well. He has been able to return to work as a carpenter. On examination, the hand appears to be healing well with no signs of infection or complications. The patient is advised to continue with his rehabilitation and to take care of his hand. He is instructed to return if he experiences any pain or discomfort.</pre>
</div>
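
A pair like the one above could be serialized into one line of a `.jsonl` file roughly as follows. This is a sketch, not the notebook's actual data-prep script: the `prompt` field matches the `{{prompt}}` template used in the evaluation config, while the `completion` field name and the `to_record` helper are assumptions based on the prompt-completion naming of the data directory. The conversation and note strings are shortened placeholders.

```python
import json

# Instruction prefix copied from the example prompt above
SYSTEM_PREFIX = (
    "You are a helpful assistant that can generate a clinical note from a "
    "conversation transcript between a patient and a doctor. \n\n"
    "The conversation transcript is given below: \n\n"
)

def to_record(conversation, note):
    """Turn one (conversation, note) pair into a prompt/completion record."""
    return {"prompt": SYSTEM_PREFIX + conversation, "completion": note}

record = to_record(
    "Doctor: Good morning, what brings you here today?\n\nPatient: Hi, Doctor.",
    "A 22-year-old male carpenter presented with a traumatic injury to his left hand.",
)

# One JSON object per line; json.dumps escapes the embedded newlines
line = json.dumps(record, ensure_ascii=False)
print(line[:80])
```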


### Specify the Public API LLM as a Judge


In [None]:
%pip install nemo_microservices

In [None]:
from nemo_microservices import NeMoMicroservices

# Set variables and initialize the client
client = NeMoMicroservices(
    base_url="http://nemo.test",
    inference_base_url="https://integrate.api.nvidia.com",
)

nemo_data_store_url = "http://localhost:3000"

os.environ['HF_TOKEN'] = HF_TOKEN

In [None]:
judge_model = {
  "api_endpoint": {
    "url": "https://integrate.api.nvidia.com/v1/chat/completions",
    "model_id": "meta/llama-3.1-70b-instruct",
    "api_key": os.environ['NVIDIA_API_KEY']
  }
}
dataset_id = "default/notes-completions-format"

## Stage 1: Baseline Evaluation (Pre-Training)

Before any fine-tuning, we establish a baseline to measure improvement. This is critical for demonstrating the value of customization.

### Why Baseline Matters

```
┌─────────────────────────────────────────────────────────────────┐
│                     EVALUATION STRATEGY                         │
├─────────────────────────────────────────────────────────────────┤
│  Baseline  ──►  Post-SFT  ──►  Post-DPO                         │
│     │              │              │                             │
│     ▼              ▼              ▼                             │
│  Compare improvements at each stage to quantify ROI             │
└─────────────────────────────────────────────────────────────────┘
```
### Our Evaluation Metrics

| Metric           | Type         | Purpose |
|------------------|--------------|---------|
| **BLEU**         | Automated    | Measures token-level n-gram overlap between generated and reference text.<br>**How it's calculated:** BLEU = BP × exp(Σₙ wₙ · log pₙ), where pₙ is the modified (clipped) precision of n-grams of order n = 1 to 4, wₙ is the weight of each order (uniform, 1/4, in the standard setting), and BP (brevity penalty) penalizes outputs shorter than the reference.<br>**Range:** 0-1 (higher = more similar to reference; some tools report the same score scaled to 0-100).<br>**Limitations:** Only measures surface-level similarity, not semantic or clinical accuracy. |
| **LLM-as-Judge** | AI Evaluation| Uses a stronger language model (LLaMA 3.1 70B) to evaluate model outputs across several dimensions:<br>• **Similarity in Length:** Is the output comparable in length to the reference?<br>• **Correctness:** Is the clinical information accurate?<br>• **Enough Information:** Does it cover all required details?<br>• **Succinct:** Does it avoid unnecessary info?<br>**How it works:** The judge model receives the prompt, ideal response, and model output, then scores each metric from 0 (poor) to 100 (excellent).<br>**Advantages:** Captures semantic quality, clinical relevance, and stylistic preferences that BLEU cannot. |
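
To make the BLEU row concrete, here is a minimal, unsmoothed sentence-level sketch in plain Python. It assumes whitespace tokenization, a single reference, and uniform weights of 1/4 over 1- to 4-grams. The function names are illustrative, and the evaluator service may use a different implementation (for example corpus-level scoring or smoothing), so its scores need not match this sketch exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference (whitespace tokens)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped overlap: a candidate n-gram counts at most as often as it occurs in the reference
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: only candidates shorter than the reference are penalized
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the patient is healing well today"
print(sentence_bleu(reference, reference))
print(sentence_bleu("the patient is healing well", reference))
```

Because the unsmoothed score drops to zero whenever any n-gram order has no match, a heavily paraphrased note can score near zero even when it is clinically correct, which is why the judge metrics are reported alongside BLEU.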


---
Create an evaluation job for the model `meta/llama-3.1-8b-instruct` before any customization.

In [None]:
# Evaluation config: BLEU plus a customized LLM-as-judge metric with structured (JSON) scores
eval_config = {
      "type": "custom",
      "params": {
         "parallelism": 4
      },
      "tasks": {
         "my-custom-task": {
            "type": "completion",
            "params": {
               "template": {
                  "prompt": "{{prompt}}",
                  "max_tokens": 1024,
                  "temperature": 0.7,
                  "top_p": 0.9
               }
            },
            "dataset": {
               "files_url": "hf://datasets/default/notes-completions-format/testing/testing.jsonl"
            },
            "metrics": {
               "bleu": {
                  "type": "bleu",
                  "params": {"references": ["{{ideal_response}}"]}
               },
               "llm-judge": {
                  "type": "llm-judge",
                  "params": {
                    "model": judge_model,
                    "template": {
                      "messages": [
                          {"role": "system", "content": "You are an expert evaluator for answers to user queries. Your task is to assess responses to user queries based on helpfulness, relevance, accuracy, and clarity."},
                          {"role": "user", "content": "Calculate the following metrics for the response: User Query: {{prompt}} \n Ideal Response: {{ideal_response}} \n Model Response: {{output_text}} \n Metrics: 1. Similarity in Length (0-100): Is the model response similar in length to the ideal response? Not much shorter or longer. Count in number of words. 2. Correctness (0-100): Is the model response correct? 3. Enough Information (0-100): Does the model response contain all the information that the ideal response contains? 4. Succinct (0-100): Does the model response not contain unnecessary information that the ideal response doesn't contain? Instructions: Assign a score from 0 (poor) to 100 (excellent) for each metric."}
                      ]
                    },
                    "structured_output": {
                      "schema": {
                        "type": "object",
                        "properties": {
                          "similarity_in_length": {
                            "type": "integer",
                            "minimum": 0,
                            "maximum": 100
                          },
                          "correctness": {
                            "type": "integer",
                            "minimum": 0,
                            "maximum": 100
                          },
                          "enough_information": {
                            "type": "integer",
                            "minimum": 0,
                            "maximum": 100
                          },
                          "succinct": {
                            "type": "integer",
                            "minimum": 0,
                            "maximum": 100
                          }
                        },
                        "required": ["similarity_in_length", "correctness", "enough_information", "succinct"],
                        "additionalProperties": False
                      }
                    },
                    "scores": {
                      "similarity_in_length": {
                        "type": "integer",
                        "parser": {
                          "type": "json",
                          "json_path": "similarity_in_length"
                        }
                      },
                      "correctness": {
                        "type": "integer",
                        "parser": {
                          "type": "json",
                          "json_path": "correctness"
                        }
                      },
                      "enough_information": {
                        "type": "integer",
                        "parser": {
                          "type": "json",
                          "json_path": "enough_information"
                        }
                      },
                      "succinct": {
                        "type": "integer",
                        "parser": {
                          "type": "json",
                          "json_path": "succinct"
                        }
                      }
                    }
                  }
                }
              }
            }
         }
      }
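
The `structured_output` schema and the `scores` parsers above both expect the judge to return a JSON object with four integer fields between 0 and 100. As a rough local illustration of that contract (not the evaluator's actual parsing code; `parse_judge_scores` is a hypothetical helper), the following sketch parses and range-checks one judge response:

```python
import json

SCORE_KEYS = ("similarity_in_length", "correctness", "enough_information", "succinct")


def parse_judge_scores(raw_output):
    """Parse a judge response and check each score is an integer in [0, 100]."""
    data = json.loads(raw_output)
    scores = {}
    for key in SCORE_KEYS:
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 100:
            raise ValueError(f"score '{key}' is missing or out of range: {value!r}")
        scores[key] = value
    return scores


# Hypothetical judge output used only for illustration
example = '{"similarity_in_length": 70, "correctness": 85, "enough_information": 80, "succinct": 75}'
print(parse_judge_scores(example))
```

A check like this can be handy when inspecting the downloaded results archive by hand, since a malformed judge response is easier to spot than a silently missing score.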

In [None]:
job = client.evaluation.jobs.create(
   target={
      "type": "model",
      "model": {
         "api_endpoint": {
            "url": "http://nemo-nim-proxy:8000/v1/completions",
            "model_id": "meta/llama-3.1-8b-instruct"
         }
      }
   },
   config=eval_config
)
print(job)

Get the evaluation job ID.

In [None]:
evaluation_job_id = job.id

Wait until the evaluation job finishes.

In [None]:
import time
# Using the job ID from the previous step
job_status = None
while job_status not in ("completed", "failed", "cancelled"):
   job = client.evaluation.jobs.retrieve(evaluation_job_id)
   job_status = job.status
   print(job_status)
   if job_status == "failed":
       print(job)
   time.sleep(10)
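
The same polling pattern recurs for every evaluation and customization job in this notebook. One way to factor it out is a small helper with a timeout, sketched below. `wait_for_job` and its parameters are hypothetical names, not part of the `nemo_microservices` SDK; the helper only assumes you pass it a function that returns the current status string.

```python
import time


def wait_for_job(get_status, poll_seconds=10, timeout_seconds=3600,
                 terminal=("completed", "failed", "cancelled")):
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        print(status)
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Stand-in status source so the sketch runs without a live service
fake_statuses = iter(["created", "running", "completed"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0)
```

With the client from this notebook, the status source would be something like `lambda: client.evaluation.jobs.retrieve(evaluation_job_id).status`.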

View the evaluation results.

In [None]:
# Using the job ID from the previous step
results = client.evaluation.jobs.results(evaluation_job_id)
print(results.model_dump_json(indent=2))

Download the evaluation results to inspect the outputs, or to debug the job if it failed.


In [None]:
# Download evaluation results (v1 API)
results_zip = client.evaluation.jobs.download_results(evaluation_job_id)
# Save to file
results_zip.write_to_file('result_before_customize.zip')
print("Download completed.")

## Stage 2: Supervised Fine-Tuning (SFT) with LoRA

Now we fine-tune the model to learn clinical note generation patterns. Key configuration choices:

### Hyperparameter Decisions

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| `training_type` | `sft` | Supervised learning from examples |
| `finetuning_type` | `lora` | Parameter-efficient; ~0.1% of weights |
| `epochs` | `1` | Enough to converge without overfitting |
| `batch_size` | `16` | Balance between memory and gradient stability |
| `learning_rate` | `0.0001` | Conservative for stable LoRA training |
| `adapter_dim` | `16` | Rank of LoRA matrices; higher = more capacity |
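
To give a feel for why LoRA is parameter-efficient, the sketch below counts the extra parameters a rank-`r` adapter adds to a single weight matrix. The 4096×4096 shape is an illustrative assumption, and the overall fraction for a full model depends on which layers the service attaches adapters to, which is not specified here.

```python
def lora_param_count(d_in, d_out, rank):
    """LoRA adds two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_fraction(d_in, d_out, rank):
    """Adapter parameters as a fraction of the frozen d_out x d_in matrix."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)


# Illustrative shape only: one square 4096 x 4096 projection with adapter_dim = 16
extra = lora_param_count(4096, 4096, 16)
print(extra, f"{lora_fraction(4096, 4096, 16):.2%}")
```

Doubling `adapter_dim` doubles the adapter size, so the rank trades capacity against the number of trainable weights.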

### What Happens During Training

```
For each (conversation, clinical_note) pair:
    1. Model predicts each token of clinical_note given the conversation and the preceding reference tokens (teacher forcing)
    2. Loss = cross-entropy between the predicted token distributions and the actual tokens
    3. Only LoRA adapter weights are updated (base model frozen)
```
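
The loss in step 2 can be illustrated with a toy calculation. The sketch below averages the negative log-probability the model assigns to each reference token, counting only positions that belong to the clinical note. Masking out the prompt tokens is an assumption made for illustration (a common SFT convention); the function name and the probabilities are made up.

```python
import math


def masked_cross_entropy(token_log_probs, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1."""
    total = -sum(lp for lp, keep in zip(token_log_probs, loss_mask) if keep)
    count = sum(loss_mask)
    return total / count if count else 0.0


# Made-up log-probabilities the model assigns to the reference token at each position
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
# 0 = conversation (prompt) token, 1 = clinical_note token that contributes to the loss
mask = [0, 1, 1]
print(masked_cross_entropy(log_probs, mask))
```

A lower value means the model assigns higher probability to the reference note, which is what the training loss curve tracks.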

Expected training time: **~17 minutes** for 1 epoch on the 8B model.

## Customize the llama-3.1-8b-instruct Model
With the baseline evaluation in hand, we now fine-tune the model with LoRA. Set the number of epochs, learning rate, and other hyperparameters here. Note that we specify the dataset that we created and uploaded earlier.

In [None]:
customization_job = client.customization.jobs.create(
    config="meta/llama-3.1-8b-instruct@v1.0.0+80GB",
    dataset={
        "name": "notes-completions-format",
        "namespace": "default"
    },
    hyperparameters={
        "training_type": "sft",
        "finetuning_type": "lora",
        "epochs": 1,
        "batch_size": 16,
        "learning_rate": 0.0001,
        "lora": {"adapter_dim": 16}
    },
    project="test-notes-completions-project",
    ownership={
        "created_by": "your-username",
        "access_policies": {
            "arbitrary": "json"
        }
    },
    output_model="default/test-notes-completions-example-model@v1"
)
print(customization_job)

In [None]:
# The job ID is available from the customization_job object created in the previous step
cust_id = customization_job.id

Optionally, run the commands below in your terminal to monitor the training job status:
```sh
watch -n 2 "kubectl get pods"
kubectl describe pod <name-of-pod>
kubectl logs <name-of-pod>
```

In [None]:
# Using the customization job ID from the previous step
import time
job_status = None
while job_status not in ("completed", "failed", "cancelled"):
   status = client.customization.jobs.status(cust_id)
   job_status = status.status
   print(job_status)
   time.sleep(10)
# This took about 17 minutes for the 8B model with the settings above (1 epoch)

After the job completes, we can view the training loss and validation loss recorded during training.

In [None]:
print(status.model_dump_json(indent=2))

###  Before & After: What SFT (LoRA) Changes

After downloading the results, compare `result_before_customize.zip` with `result_after_customize.zip` (the latter is produced by the post-SFT evaluation in Stage 3, which also walks through a concrete example).

---

## Stage 3: Post-SFT Evaluation

After customizing the model with SFT LoRA, we re-evaluate to quantify improvement.

### What to Look For

| Metric | Expected Change | Why |
|--------|-----------------|-----|
| **Similarity in Length** | ⬆️ Increase | Model learns appropriate note length |
| **Correctness** | ⬆️ Increase | Domain vocabulary and patterns |
| **Enough Information** | ⬆️ Increase | Learns what details to include |
| **Succinct** | ⬆️ Moderate | May over-generate initially |

> **Note:** SFT typically shows the largest improvement on format adherence. The model now "knows" what a clinical note looks like.

---

###  Real Example: Baseline vs Post-SFT

Below is an example showing the dramatic improvement SFT provides. 

####  Metrics Comparison

| Metric | Baseline | Post-SFT | Δ Change |
|--------|----------|----------|----------|
| **Similarity in Length** | ~10 | ~70 | +60 |
| **Correctness** | ~20 | ~85 | +65 |
| **Enough Information** | ~30 | ~80 | +50 |
| **Succinct** | ~40 | ~75 | +35 |

---

#### Example Conversation (Input) — *Carpenter Hand Injury Case*

<div style="background: #1a1a2e; padding: 15px; border-radius: 8px; margin: 10px 0; ">
<pre style="color: #a0a0a0; font-size: 12px; white-space: pre-wrap; margin: 0;">Doctor: Good morning, what brings you here today?

Patient: Hi, Doctor. I was in an accident with a saw while I was working as a carpenter and I need to check on my hand.

Doctor: I see. Can you tell me what happened to your hand?

Patient: I had a saw accident and my left ring finger was cut off, and my middle finger was almost cut off and my little finger was fractured, and my thumb was injured.

Doctor: Hmm, I understand. How old are you?

Patient: I am 22 years old.

Doctor: Okay, let me take a look at your hand.

Doctor (checks the patient's hand): I see that you had a total amputation of your left ring finger and a near total amputation of your middle finger. And it looks like you had a comminuted fracture of your middle phalanx and an injury to your thumb.

Patient: Yes, that's right.

Doctor: I see that you underwent a replantation of your ring finger and a revascularization of your middle finger. And you also had an open reduction internal fixation of the fractures.

Patient: Yes, that's correct.

Doctor: That's great. How have you been feeling since the surgery?

Patient: I've been feeling good. I've been going through rehabilitation for the past three months.

Doctor: That's great to hear. How has your hand been healing?

Patient: It's been healing well. I've been able to go back to work as a carpenter.

Doctor: That's fantastic. I'm glad to hear that you're back to work.

Doctor (checks the patient's hand again): Everything looks good. I want you to continue with your rehabilitation and make sure you take care of your hand.

Patient: Okay, I will.

Doctor: If you experience any pain or discomfort, don't hesitate to come back and see me.

Patient: Okay, I will. Thank you, Doctor.

Doctor: No problem. Take care.</pre>
</div>

---

####  Baseline Output (Before SFT)

<div style="background: #2d1a1a; padding: 15px; border-radius: 8px; margin: 10px 0; border-left: 4px solid #ff6b6b; ">
<pre style="color: #e0e0e0; font-size: 12px; white-space: pre-wrap; margin: 0;">Here is a clinical note based on the conversation:

**Patient's Clinical Note**

**Patient's Name:** [Insert patient's name]
**Date:** [Insert date]
**Reason for Visit:** Hand injury follow-up
**Chief Complaint:** Hand injury from saw accident

**History of Present Illness:** The patient is a 22-year-old male carpenter who sustained a traumatic injury to his left hand from a saw accident. He reports total amputation of the ring finger, near total amputation of the middle finger, fracture of the little finger, and thumb injury. He has undergone replantation and revascularization procedures.

**Assessment:** The patient's hand is healing well after surgery.

**Plan:** Continue rehabilitation.

Now, I'd like to assist you in generating a clinical note. Please provide the conversation transcript below.

**Is there anything else I can help you with?** I can assist you in generating another clinical note or provide any additional information you need.

**Is there anything else I can help you with?** I can assist you in generating another clinical note...

**Is there anything else I can help you with?** ...</pre>
<p style="color: #ff6b6b; font-size: 11px; margin: 8px 0 0 0;"><strong>Problems:</strong>  Template format with placeholders •  Excessive repetition •  Asks for more input •  Missing clinical details •  Not a complete clinical note</p>
</div>

---

####  Post-SFT Output (After LoRA Fine-tuning) — *1,806 characters*

<div style="background: #2a3a2e; padding: 15px; border-radius: 8px; margin: 10px 0; border-left: 4px solid #ffd700; ">
<pre style="color: #e0e0e0; font-size: 12px; white-space: pre-wrap; margin: 0;">Here is a clinical note based on the conversation:

**Patient:** [Patient's Name], 22-year-old male

**Chief Complaint:** Traumatic injury to left hand from saw accident

**History of Present Illness:**

Patient presented with a traumatic injury to his left hand, including total amputation of the left ring finger, near total amputation of the middle finger, comminuted fracture of the middle phalanx, and injury to the thumb. The patient underwent a replantation of the ring finger and revascularization of the middle finger, as well as an open reduction internal fixation of the fractures. The patient has been undergoing rehabilitation for the past three months and reports that his hand has been healing well.

**Physical Examination:**

Upon examination, the patient's left hand showed signs of a total amputation of the ring finger, near total amputation of the middle finger, and a comminuted fracture of the middle phalanx. The thumb also showed signs of injury. The patient's range of motion and sensation were normal. The patient's hand was found to be healing well with no signs of infection or complications.

**Assessment:**

The patient's condition is stable, and his hand is healing well. The patient has been able to return to work as a carpenter. The patient will continue with his rehabilitation and take care of his hand.

**Plan:**

The patient will continue with his rehabilitation and take care of his hand. He will also follow up with the doctor if he experiences any pain or discomfort. The patient will also be advised to return to the office if he notices any signs of infection or complications.

**Medications:**

None prescribed.

**Follow-up:**

The patient will follow up with the doctor in two weeks to assess the progress of his hand.

**Signature:**

[Doctor's Signature]</pre>
<p style="color: #ffd700; font-size: 11px; margin: 8px 0 0 0;"><strong>Improvements over baseline:</strong>  No repetition •  Complete clinical information •  Proper sections<br/><strong>Room for improvement:</strong>  Still uses template format •  Contains placeholders •  Verbose (1,806 chars) •  Redundant sections</p>
</div>

---

### Key SFT Learnings (FORMAT)

| Aspect | Baseline | Post-SFT |
|--------|----------|----------|
| **Structure** | Template with repetition & prompts | Organized clinical sections |
| **Content** | Incomplete, asks for more input | Extracts all clinical details |
| **Presentation** | Chatbot-style | Professional clinical note |
| **Completeness** | Missing key information | All relevant details included |

<div style="background: #16213e; padding: 12px; border-radius: 6px; margin: 10px 0;">
<strong style="color: #00d4ff;"> Note:</strong> <span style="color: #e0e0e0;">The Post-SFT output is now a proper clinical note, but it's still verbose with template formatting. <strong>See Cell 113 for how DPO refines this same example into a more concise, physician-preferred style.</strong></span>
</div>

In [None]:
customized_job = client.evaluation.jobs.create(
   target={
      "type": "model",
      "model": {
         "api_endpoint": {
            "url": "http://nemo-nim-proxy:8000/v1/completions",
            "model_id": "default/test-notes-completions-example-model@v1"
         }
      }
   },
   config=eval_config
)
print(customized_job)

Wait until the evaluation job finishes.

In [None]:
# The job ID is available from the customized_job object created in the previous step
customized_model_evaluation_job_id = customized_job.id
# Using the job ID from the previous step
job_status = None
while job_status not in ("completed", "failed", "cancelled"):
   job = client.evaluation.jobs.retrieve(customized_model_evaluation_job_id)
   job_status = job.status
   print(job_status)
   time.sleep(10)

After the job completes, you can see the results of the evaluation by using the following command:

In [None]:
# Using the job ID from the previous step
results = client.evaluation.jobs.results(customized_model_evaluation_job_id)
print(results.model_dump_json(indent=2))

Download the evaluation results to inspect the outputs.

In [None]:
# Download evaluation results (v1 API)
results_zip = client.evaluation.jobs.download_results(customized_model_evaluation_job_id)
# Save to file
results_zip.write_to_file('result_after_customize.zip')

---

# Part 3: Preference Alignment with DPO

After SFT teaches the model *what* to generate, **DPO teaches it *which* outputs are better**—aligning the model with physician preferences.

---

## How Direct Preference Optimization (DPO) Works

<div style="background: #1a1a2e; padding: 20px; border-radius: 10px; border-left: 4px solid #ff6b6b;">

**DPO directly optimizes the model using preference pairs, without training a separate reward model.**

Instead of the traditional RLHF pipeline (SFT → Reward Model → PPO), DPO simplifies to:

```
SFT → DPO (directly on preference pairs)
```

</div>

### The DPO Loss Function

```
L_DPO = -log σ(β · [log π(chosen|x)/π_ref(chosen|x) - log π(rejected|x)/π_ref(rejected|x)])
```

**In plain English:**
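- Compare how much more the model prefers the **chosen** response over the **rejected** response, measured relative to a frozen reference model `π_ref` (typically the SFT model)
- The loss is large when the model favors the rejected response and shrinks as the margin in favor of the chosen response grows
- The `β` hyperparameter scales this margin and controls how tightly the policy is kept near the reference model (smaller `β` permits larger deviations; we use `β=0.1`)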
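
As a worked illustration of the loss above, the following sketch evaluates it for a single preference pair from four sequence log-probabilities. The function name and the example numbers are made up; in real training these log-probabilities come from summing token log-probs under the policy and the frozen reference model, and the loss is averaged over a batch.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair given sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp        # log pi/pi_ref for chosen
    rejected_logratio = policy_rejected_logp - ref_rejected_logp  # log pi/pi_ref for rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin); written in a numerically stable form
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: margin is 0, loss is log(2)
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy has shifted probability toward the chosen response: loss drops below log(2)
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

At the start of DPO training the policy equals the reference model, so every pair starts at a loss of log 2 (about 0.693); values below that indicate the model is moving toward the chosen responses.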

### DPO vs PPO vs GRPO

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        PREFERENCE ALIGNMENT METHODS                      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  PPO (Proximal Policy Optimization)                                      │
│  ┌──────────┐   ┌──────────────┐   ┌──────────┐   ┌──────────────┐     │
│  │   SFT    │──►│ Reward Model │──►│   PPO    │──►│ Aligned LLM  │     │
│  │  Model   │   │  (separate)  │   │  Actor-  │   │              │     │
│  └──────────┘   └──────────────┘   │  Critic  │   └──────────────┘     │
│                                     └──────────┘                        │
│  ✓ Most flexible, on-policy        ✗ Complex, unstable, expensive      │
│                                                                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  DPO (Direct Preference Optimization)                                    │
│  ┌──────────┐   ┌──────────────────────────────┐   ┌──────────────┐    │
│  │   SFT    │──►│  DPO Loss on Preference Pairs │──►│ Aligned LLM  │    │
│  │  Model   │   │   (no reward model needed)    │   │              │    │
│  └──────────┘   └──────────────────────────────┘   └──────────────┘    │
│  ✓ Simple, stable, efficient       ✗ Offline (limited exploration)      │
│                                                                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  GRPO (Group Relative Policy Optimization) — DeepSeek R1                 │
│  ┌──────────┐   ┌──────────────────────────────┐   ┌──────────────┐    │
│  │   SFT    │──►│  Sample N outputs per prompt  │──►│ Aligned LLM  │    │
│  │  Model   │   │  Use relative rewards within  │   │              │    │
│  └──────────┘   │  group (no critic model)      │   └──────────────┘    │
│                 └──────────────────────────────┘                        │
│  ✓ On-policy without critic        ✗ Requires group sampling            │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
```
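
The "relative rewards within group" step in the GRPO panel can be sketched as follows. This is an illustrative simplification: each sampled output's reward is normalized by the group mean and population standard deviation, and the `eps` term and function name are assumptions rather than code from a specific library.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one prompt's group of sampled outputs."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample receives the same reward
    return [(reward - mean) / (std + eps) for reward in rewards]


# Made-up rewards for four sampled notes from the same prompt
print(group_relative_advantages([0.2, 0.5, 0.5, 0.9]))
```

Outputs scoring above the group average get a positive advantage and those below get a negative one, which is how GRPO avoids training a separate critic model.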

### Why DPO for Clinical Notes?

| Advantage | Explanation |
|-----------|-------------|
| **No reward model** | Simpler pipeline, less infrastructure |
| **Stable training** | Avoids reward hacking and PPO instability |
| **Sample efficient** | Learns from preference pairs directly |
| **Clinician-friendly** | Easy to collect "better vs. worse" judgments from physicians |

### Does DPO Really Help for Format/Style Tasks?

<div style="background: #2d1a1a; padding: 15px; border-radius: 8px; border-left: 4px solid #ff6b6b; margin: 10px 0;">
<strong style="color: #ff6b6b;">Yes — DPO refines STYLE when the FORMAT is already correct:</strong> <span style="color: #e0e0e0;">After SFT teaches structure, DPO teaches the model which stylistic choices are preferred (concise phrasing, abbreviation conventions, voice).</span>
</div>

**What DPO optimizes for format/style tasks:**

| Aspect | SFT Alone | SFT + DPO |
|--------|-----------|-----------|
| **Verbosity** | Variable length outputs | Consistent, preferred length |
| **Phrasing** | Any correct phrasing | Preferred phrasing style |
| **Abbreviations** | Mixed (CT vs computed tomography) | Consistent convention |
| **Voice** | Active/passive varies | Preferred voice consistently |

---

## Creating the DPO Dataset: Preference Pairs

### How Preference Data is Generated

DPO requires **preference pairs**: for each prompt, we need a "chosen" (better) and "rejected" (worse) response.

```
┌─────────────────────────────────────────────────────────────────────────┐
│                    DPO PREFERENCE PAIR FORMAT                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  {                                                                      │
│    "prompt": "Conversation transcript...",                              │
│    "chosen_response": "High-quality clinical note preferred by          │
│                        physicians...",                                  │
│    "rejected_response": "Lower-quality or less preferred clinical       │
│                          note..."                                       │
│  }                                                                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

### Methods to Create Preference Pairs

| Method | How It Works | Pros | Cons |
|--------|--------------|------|------|
| **Physician Ranking** | Clinicians compare pairs of outputs | Highest quality signal | Expensive, slow |
| **AI Feedback** | Use stronger model (GPT-4, LLaMA-70B) to rank | Scalable, fast | May not capture clinical nuance |
| **Rule-based** | Programmatic quality criteria | Consistent, fast | Limited to measurable criteria |
| **Hybrid** | AI initial ranking + physician verification | Balanced cost/quality | Requires pipeline |
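
As a small illustration of the rule-based row, the sketch below scores a note with three programmatic checks: bracketed placeholders, bolded section headers, and overall length. The penalty weights, the `max_chars` threshold, and the function name are arbitrary assumptions chosen for the example, not values used elsewhere in this notebook.

```python
import re

PLACEHOLDER = re.compile(r"\[[^\]]+\]")         # e.g. [Patient's Name]
SECTION_HEADER = re.compile(r"\*\*[^*]+:\*\*")  # e.g. **Plan:**


def rule_based_score(note, max_chars=1000):
    """Start at 100 and subtract penalties for template-style artifacts."""
    score = 100
    score -= 20 * len(PLACEHOLDER.findall(note))     # unfilled placeholders
    score -= 5 * len(SECTION_HEADER.findall(note))   # template section headers
    if len(note) > max_chars:
        score -= 20                                  # overly long note
    return max(score, 0)


narrative = "22-year-old male carpenter with saw injury to the left hand; replantation of the ring finger was done."
templated = "**Patient:** [Patient's Name], 22-year-old male\n\n**Plan:**\n\nContinue rehab.\n\n**Signature:**\n\n[Doctor's Signature]"
print(rule_based_score(narrative), rule_based_score(templated))
```

Rules like these only capture surface properties, so they are better suited to filtering obviously poor candidates than to ranking otherwise similar notes.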

### Our Approach: Creating DPO Data

For this notebook, preference pairs can be created by:

1. **Generate multiple outputs** from the SFT model for each prompt
2. **Rank by quality** using LLM-as-Judge or physician review
3. **Pair best with worst** — chosen = highest ranked, rejected = lowest ranked

```python
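# Example: creating DPO pairs from SFT model outputs.
# `generate` and `llm_judge` are placeholders for your own sampling and scoring functions.
def build_dpo_pairs(prompts, generate, llm_judge, num_candidates=4):
    dpo_pairs = []
    for prompt in prompts:
        outputs = [generate(prompt) for _ in range(num_candidates)]  # sample candidates
        scores = [llm_judge(prompt, output) for output in outputs]   # one score per candidate
        chosen = outputs[scores.index(max(scores))]
        rejected = outputs[scores.index(min(scores))]
        if chosen != rejected:  # skip prompts where all candidates tie
            dpo_pairs.append({"prompt": prompt, "chosen_response": chosen, "rejected_response": rejected})
    return dpo_pairs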
```

---

## Tutorial for DPO (Direct Preference Optimization)

DPO aligns the model to prefer better responses using preference pairs (chosen vs rejected).

**Prerequisites:**
- `train_dpo.jsonl` and `val_dpo.jsonl` files with format: `{"prompt": "...", "chosen_response": "...", "rejected_response": "..."}`
- NIM must be paused during DPO training (uses all GPUs)
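
Before uploading, it can help to sanity-check that every line of the JSONL files has the three expected fields. The sketch below is one possible check; `validate_dpo_jsonl` is a hypothetical helper, and the rule that chosen and rejected responses must differ is an assumption about what makes a useful pair rather than a documented requirement of the service.

```python
import json

REQUIRED_KEYS = ("prompt", "chosen_response", "rejected_response")


def validate_dpo_jsonl(path):
    """Return (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_number, "record is not a JSON object"))
                continue
            # Each required field must be a non-empty string
            missing = [key for key in REQUIRED_KEYS
                       if not isinstance(record.get(key), str) or not record[key].strip()]
            if missing:
                problems.append((line_number, f"missing or empty: {', '.join(missing)}"))
            elif record["chosen_response"] == record["rejected_response"]:
                problems.append((line_number, "chosen and rejected responses are identical"))
    return problems
```

For example, `validate_dpo_jsonl("data_prep/dpo_pairs/train_dpo.jsonl")` should return an empty list for a clean training file.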

<div style="background: #2d1a1a; padding: 15px; border-radius: 8px; border-left: 4px solid #ff6b6b; margin: 10px 0;">
<strong style="color: #ff6b6b;"> NeMo Microservices Limitation:</strong> <span style="color: #e0e0e0;">DPO in NeMo Microservices <strong>only supports full-weight training</strong> (<code>finetuning_type: "all_weights"</code>). LoRA is NOT supported for DPO in this platform.</span>
</div>

**What this means:**
- DPO requires all 8 GPUs with tensor parallelism
- NIM must be scaled down during training
- The output is a full fine-tuned model (not a LoRA adapter)

**Alternative for DPO with LoRA:** If you want to use LoRA for DPO (to save GPU memory and keep NIM running), use **Hugging Face TRL** instead of NeMo Microservices. See the optional TRL section below.


In [None]:
!hf repo create notes-dpo-format --repo-type dataset

In [None]:
!mkdir -p ~/tmp/notes-dpo-format/training ~/tmp/notes-dpo-format/validation
!cp data_prep/dpo_pairs/train_dpo.jsonl ~/tmp/notes-dpo-format/training/
!cp data_prep/dpo_pairs/val_dpo.jsonl ~/tmp/notes-dpo-format/validation/

In [None]:
!hf upload --repo-type dataset default/notes-dpo-format ~/tmp/notes-dpo-format

### DPO Data Examples: Good vs Bad Samples

Below are examples from the DPO training dataset showing **preference pairs** used to teach the model which outputs are preferred. Each example contains a **prompt**, a **chosen_response** (preferred/good), and a **rejected_response** (less preferred/bad).

#### Example 1: Carpenter Hand Injury Case

**Input (Prompt):**
<div style="background: #1a1a2e; padding: 15px; border-radius: 8px; margin: 10px 0; max-height: 1500px; overflow-y: scroll;">
<pre style="color: #a0a0a0; font-size: 12px; white-space: pre-wrap; margin: 0;">You are a helpful assistant that generates clinical notes from doctor-patient conversations.

Conversation:

Doctor: Good morning, what brings you here today?

Patient: Hi, Doctor. I was in an accident with a saw while I was working as a carpenter and I need to check on my hand.

Doctor: I see. Can you tell me what happened to your hand?

Patient: I had a saw accident and my left ring finger was cut off, and my middle finger was almost cut off and my little finger was fractured, and my thumb was injured.

Doctor: Hmm, I understand. How old are you?

Patient: I am 22 years old.

Doctor: Okay, let me take a look at your hand.

Doctor (checks the patient's hand): I see that you had a total amputation of your left ring finger and a near total amputation of your middle finger. And it looks like you had a comminuted fracture of your middle phalanx and an injury to your thumb.

Patient: Yes, that's right.

Doctor: I see that you underwent a replantation of your ring finger and a revascularization of your middle finger. And you also had an open reduction internal fixation of the fractures.

Patient: Yes, that's correct.

Doctor: That's great. How have you been feeling since the surgery?

Patient: I've been feeling good. I've been going through rehabilitation for the past three months.

Doctor: That's great to hear. How has your hand been healing?

Patient: It's been healing well. I've been able to go back to work as a carpenter.

Doctor: That's fantastic. I'm glad to hear that you're back to work.

Doctor (checks the patient's hand again): Everything looks good. I want you to continue with your rehabilitation and make sure you take care of your hand.

Patient: Okay, I will.

Doctor: If you experience any pain or discomfort, don't hesitate to come back and see me.

Patient: Okay, I will. Thank you, Doctor.

Doctor: No problem. Take care.

Generate a clinical note:</pre>
</div>

**✅ Chosen Response (Good Sample - Preferred):**
<div style="background: #1a3a2e; padding: 15px; border-radius: 8px; margin: 10px 0; border-left: 4px solid #4ade80; max-height: 500px; overflow-y: scroll;">
<pre style="color: #e0e0e0; font-size: 12px; white-space: pre-wrap; margin: 0;">This is 22 years old right handed male carpenter, his left hand was cut by electric saw sustained total amputation of his left ring finger just proximal to proximal interphalangeal joint & near total amputation of middle finger with cut both neurovascular junction at level of mid part of middle phalanx & open comminuted fracture of middle phalanx little finger with cut ulnar neurovascular bundle & tip injury thumb. Replantation of the ring, revascularization of middle & open reduction internal fixation of the fractures was done. After a period of rehabilitation for three months patient was back to his original work.</pre>
</div>
<p style="color: #4ade80; font-size: 11px; margin: 5px 0 0 0;"><strong>Why this is preferred:</strong> Direct clinical narrative • No template headers • No placeholders • Concise and precise (623 characters) • Professional surgical note style • Precise anatomical terminology • Ready for physician use</p>

**❌ Rejected Response (Bad Sample - Less Preferred):**
<div style="background: #2d1a1a; padding: 15px; border-radius: 8px; margin: 10px 0; border-left: 4px solid #ff6b6b; max-height: 1500px; overflow-y: scroll;">
<pre style="color: #e0e0e0; font-size: 12px; white-space: pre-wrap; margin: 0;">Here is a clinical note based on the conversation:

**Patient:** [Patient's Name], 22-year-old male

**Chief Complaint:** Traumatic injury to left hand from saw accident

**History of Present Illness:**

Patient presented with a traumatic injury to his left hand, including total amputation of the left ring finger, near total amputation of the middle finger, comminuted fracture of the middle phalanx, and injury to the thumb. The patient underwent a replantation of the ring finger and revascularization of the middle finger, as well as an open reduction internal fixation of the fractures. The patient has been undergoing rehabilitation for the past three months and reports that his hand has been healing well.

**Physical Examination:**

Upon examination, the patient's left hand showed signs of a total amputation of the ring finger, near total amputation of the middle finger, and a comminuted fracture of the middle phalanx. The thumb also showed signs of injury. The patient's range of motion and sensation were normal. The patient's hand was found to be healing well with no signs of infection or complications.

**Assessment:**

The patient's condition is stable, and his hand is healing well. The patient has been able to return to work as a carpenter. The patient will continue with his rehabilitation and take care of his hand.

**Plan:**

The patient will continue with his rehabilitation and take care of his hand. He will also follow up with the doctor if he experiences any pain or discomfort. The patient will also be advised to return to the office if he notices any signs of infection or complications.

**Medications:**

None prescribed.

**Follow-up:**

The patient will follow up with the doctor in two weeks to assess the progress of his hand.

**Signature:**

[Doctor's Signature]</pre>
</div>
<p style="color: #ff6b6b; font-size: 11px; margin: 5px 0 0 0;"><strong>Why this is rejected:</strong> Template format with headers • Placeholder text ([Patient's Name], [Doctor's Signature]) • Verbose sections • Redundant information • 1,806 characters (too verbose) • Requires editing before physician use</p>
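
A preference pair like the one above is usually stored as one JSON object per line. The sketch below assumes the field names `prompt` and `chosen_response` (both are referenced by the evaluation config later in this notebook) plus a `rejected_response` field, which is an assumption here. The helper name `make_dpo_record` and the shortened strings are illustrative only.

```python
import json

def make_dpo_record(prompt: str, chosen: str, rejected: str) -> dict:
    """Build one preference record. `rejected_response` is an assumed field name."""
    if chosen.strip() == rejected.strip():
        raise ValueError("chosen and rejected responses must differ")
    return {
        "prompt": prompt,
        "chosen_response": chosen,
        "rejected_response": rejected,
    }

# Shortened stand-ins for the carpenter hand injury example shown above
record = make_dpo_record(
    prompt="Doctor: Good morning, what brings you here today?\n...\nGenerate a clinical note:",
    chosen="This is 22 years old right handed male carpenter, ...",
    rejected="Here is a clinical note based on the conversation:\n\n**Patient:** [Patient's Name], ...",
)

# One JSON object per line (JSONL); newlines inside the text are escaped by json.dumps
line = json.dumps(record, ensure_ascii=False)
print(line[:80])
```

Keeping the pair in a single record makes it hard to pair a chosen response with the wrong prompt when the file is shuffled or split.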

---


In [None]:
!curl -X POST "http://nemo.test/v1/datasets" \
   -H 'accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{"name": "notes-dpo-format", "namespace": "default", "description": "Clinical notes DPO preference data", "files_url": "hf://datasets/default/notes-dpo-format", "project": "sample_project_for_notes_dpo"}' | jq
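
For readers who prefer to stay in Python, the same registration request can be assembled with the standard library. This is a sketch that mirrors the `curl` call above; the helper name `build_dataset_request` is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

DATASETS_URL = "http://nemo.test/v1/datasets"  # same endpoint as the curl call

def build_dataset_request(name: str, namespace: str, description: str, project: str) -> urllib.request.Request:
    """Assemble the POST request that registers a dataset (hypothetical helper)."""
    payload = {
        "name": name,
        "namespace": namespace,
        "description": description,
        "files_url": f"hf://datasets/{namespace}/{name}",
        "project": project,
    }
    return urllib.request.Request(
        DATASETS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )

req = build_dataset_request(
    name="notes-dpo-format",
    namespace="default",
    description="Clinical notes DPO preference data",
    project="sample_project_for_notes_dpo",
)
print(req.get_method(), req.full_url)
# To send it (requires a reachable NeMo deployment):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```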

In [None]:
import subprocess
import time

# Find and scale down NIM deployment
nim_deploy = subprocess.run(
    "kubectl get deployment -o name | grep modeldeployment | head -1",
    shell=True, capture_output=True, text=True
)

if nim_deploy.stdout.strip():
    NIM_DEPLOYMENT = nim_deploy.stdout.strip().replace("deployment.apps/", "")
    print(f"Found NIM deployment: {NIM_DEPLOYMENT}")
    subprocess.run(f"kubectl scale deployment {NIM_DEPLOYMENT} --replicas=0", shell=True)
    print("Scaling down NIM...")
    
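    # Wait for pod to terminate (up to 12 checks, 5 s apart)
    for i in range(12):
        time.sleep(5)
        pods = subprocess.run(
            f"kubectl get pods | grep {NIM_DEPLOYMENT[:30]} | grep -c Running",
            shell=True, capture_output=True, text=True
        )
        if int(pods.stdout.strip() or "0") == 0:
            print("NIM paused - GPUs available for DPO training")
            break
    else:
        # No break after all checks: pods are still running
        print("NIM pods still running after 60s - check before starting DPO training")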
else:
    print("No NIM deployment found")

In [None]:
dpo_job = client.customization.jobs.create(
    config="meta/llama-3.1-8b-instruct@v1.0.0+80GB",
    dataset={"name": "notes-dpo-format", "namespace": "default"},
    hyperparameters={
        "training_type": "dpo",
        "finetuning_type": "all_weights",
        "epochs": 1,
        "batch_size": 4,
        "learning_rate": 5e-6,
        "dpo": {"beta": 0.1}
    },
    project="test-notes-dpo-project",
    output_model="default/test-notes-dpo-model@v1"
)
print(dpo_job)
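
To build intuition for the `beta` hyperparameter, here is a minimal sketch of the standard DPO objective for a single preference pair. It illustrates the published formula and is not necessarily the exact implementation used by the customization service; the log-probabilities are made-up numbers.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))) for one pair."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable -log(sigmoid(margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Policy identical to the reference model: no preference learned yet
print(dpo_loss(-60.0, -100.0, -60.0, -100.0))

# Policy raises the chosen note and lowers the rejected one relative to the reference
print(dpo_loss(-50.0, -120.0, -60.0, -100.0, beta=0.1))
```

The first call returns `log(2)` because the margin is zero. A larger `beta` scales the margin, so the same shift away from the reference model moves the loss more strongly.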

In [None]:
# Monitor DPO job
dpo_job_id = dpo_job.id
job_status = None
while job_status not in ("completed", "failed", "cancelled"):
    status = client.customization.jobs.status(dpo_job_id)
    job_status = status.status
    print(job_status)
    if job_status == "failed":
        print(status.model_dump_json(indent=2))
    time.sleep(10)
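
The polling loop above runs until the job reaches a terminal state, however long that takes. A reusable variant with a timeout could look like the sketch below; `wait_for_job` and its parameters are hypothetical names, and the status source is passed in as a callable so the helper can be tried without a running job.

```python
import time
from typing import Callable

TERMINAL_STATES = ("completed", "failed", "cancelled")

def wait_for_job(fetch_status: Callable[[], str],
                 poll_seconds: float = 10.0,
                 timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_status() until a terminal state is seen or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_status()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still '{state}' after {timeout_seconds} seconds")
        sleep(poll_seconds)

# Demo with a fake status source instead of a real job
fake_states = iter(["created", "running", "running", "completed"])
final = wait_for_job(lambda: next(fake_states), poll_seconds=0, timeout_seconds=5)
print(final)
```

With the client from this notebook, the callable would be `lambda: client.customization.jobs.status(dpo_job_id).status`.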

In [None]:
# View DPO training results
print(status.model_dump_json(indent=2))

### Restart NIM for Evaluation

Deploy the full-weight DPO model as its own NIM so the evaluation job can query it. The base NIM is scaled down first to free GPUs for the new deployment.

In [None]:
# Deploy DPO model as NIM (uses all 8 GPUs)
import subprocess
import time

import requests

DPO_MODEL = "default/test-notes-dpo-model@v1"
DPO_DEPLOY_NAME = "dpo-llama-8b"

# Scale down base NIM to free GPUs
print("Scaling down base NIM to free GPUs...")
subprocess.run("kubectl scale deployment modeldeployment-meta-llama-3-1-8b-instruct --replicas=0", shell=True)
time.sleep(10)

# Create DPO NIM deployment via REST API
print(f"Creating DPO NIM deployment for {DPO_MODEL}...")
resp = requests.post(
    "http://nemo.test/v1/deployment/model-deployments",
    json={
        "name": DPO_DEPLOY_NAME,
        "namespace": "default",
        "config": {
            "model": DPO_MODEL,
            "nim_deployment": {
                "gpu": 8,
                "image_name": "nvcr.io/nim/meta/llama-3.1-8b-instruct",
                "image_tag": "1.8.3"
            }
        }
    }
)
print(f"Response: {resp.json()}")

# Wait for DPO NIM to be ready
print("\nWaiting for DPO NIM (TensorRT compilation ~5-10 min)...")
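for i in range(60):
    try:
        # Separate name so the training-job `status` object from earlier cells is not overwritten
        deploy_status = requests.get(f"http://nemo.test/v1/deployment/model-deployments/default/{DPO_DEPLOY_NAME}").json()
        state = deploy_status.get("status_details", {}).get("status", "unknown")
        print(f"  [{i*10}s] Status: {state}")
        if state == "ready":
            print("✓ DPO NIM ready!")
            break
    except Exception as e:
        print(f"  [{i*10}s] Checking... {e}")
    time.sleep(10)
else:
    # Loop ended without a break: the deployment never reported "ready"
    print("DPO NIM not ready after 10 minutes - check the deployment before running the evaluation")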

In [None]:
# DPO eval config using chosen_response from DPO validation data
dpo_eval_config = {
   "type": "custom",
   "params": {"parallelism": 1},
   "tasks": {
      "my-custom-task": {
         "type": "completion",
         "params": {"template": {"prompt": "{{prompt}}", "max_tokens": 1024, "temperature": 0.7, "top_p": 0.9}},
         "dataset": {"files_url": "hf://datasets/default/notes-dpo-format/validation/val_dpo.jsonl"},
         "metrics": {
            "bleu": {"type": "bleu", "params": {"references": ["{{chosen_response}}"]}},
            "llm-judge": {
               "type": "llm-judge",
               "params": {
                  "model": judge_model,
                  "template": {"messages": [
                     {"role": "system", "content": "You are an expert evaluator for clinical documentation."},
                     {"role": "user", "content": "Calculate metrics for: Query: {{prompt}}\nIdeal: {{chosen_response}}\nModel: {{output_text}}\nMetrics (0-100 each): 1.similarity_in_length 2.correctness 3.enough_information 4.succinct 5.professional_tone 6.clarity_directness"}
                  ]},
                  "structured_output": {"schema": {
                     "type": "object",
                     "properties": {
                        "similarity_in_length": {"type": "integer", "minimum": 0, "maximum": 100},
                        "correctness": {"type": "integer", "minimum": 0, "maximum": 100},
                        "enough_information": {"type": "integer", "minimum": 0, "maximum": 100},
                        "succinct": {"type": "integer", "minimum": 0, "maximum": 100},
                        "professional_tone": {"type": "integer", "minimum": 0, "maximum": 100},
                        "clarity_directness": {"type": "integer", "minimum": 0, "maximum": 100}
                     },
                     "required": ["similarity_in_length", "correctness", "enough_information", "succinct", "professional_tone", "clarity_directness"],
                     "additionalProperties": False
                  }},
                  "scores": {
                     "similarity_in_length": {"type": "integer", "parser": {"type": "json", "json_path": "similarity_in_length"}},
                     "correctness": {"type": "integer", "parser": {"type": "json", "json_path": "correctness"}},
                     "enough_information": {"type": "integer", "parser": {"type": "json", "json_path": "enough_information"}},
                     "succinct": {"type": "integer", "parser": {"type": "json", "json_path": "succinct"}},
                     "professional_tone": {"type": "integer", "parser": {"type": "json", "json_path": "professional_tone"}},
                     "clarity_directness": {"type": "integer", "parser": {"type": "json", "json_path": "clarity_directness"}}
                  }
               }
            }
         }
      }
   }
}

dpo_eval_job = client.evaluation.jobs.create(
   target={
      "type": "model",
      "model": {
         "api_endpoint": {
            "url": "http://modeldeployment-default-dpo-llama-8b:8000/v1/completions",
            "model_id": "default/test-notes-dpo-model"
         }
      }
   },
   config=dpo_eval_config
)
print(dpo_eval_job)
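
The `structured_output` schema above constrains the judge to six integer scores between 0 and 100. The sketch below shows the same check done by hand on a raw judge reply, which can help when debugging a judge model that ignores the schema. `parse_judge_scores` is a hypothetical helper, and the sample values are the scores reported in the metrics table later in this section.

```python
import json

JUDGE_METRICS = (
    "similarity_in_length", "correctness", "enough_information",
    "succinct", "professional_tone", "clarity_directness",
)

def parse_judge_scores(raw: str) -> dict:
    """Parse a judge reply and enforce the same constraints as the JSON schema."""
    data = json.loads(raw)
    missing = [m for m in JUDGE_METRICS if m not in data]
    extra = [k for k in data if k not in JUDGE_METRICS]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    for name in JUDGE_METRICS:
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 100:
            raise ValueError(f"{name} must be an integer in [0, 100], got {value!r}")
    return data

raw_reply = json.dumps({
    "similarity_in_length": 95, "correctness": 98, "enough_information": 96,
    "succinct": 90, "professional_tone": 98, "clarity_directness": 97,
})
scores = parse_judge_scores(raw_reply)
print(scores["correctness"])
```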

In [None]:
# Wait for DPO evaluation to complete
dpo_eval_job_id = dpo_eval_job.id
job_status = None
while job_status not in ("completed", "failed", "cancelled"):
   job = client.evaluation.jobs.retrieve(dpo_eval_job_id)
   job_status = job.status
   print(job_status)
   if job_status == "failed":
       print(job)
   time.sleep(10)

In [None]:
# View DPO evaluation results
results = client.evaluation.jobs.results(dpo_eval_job_id)
print(results.model_dump_json(indent=2))
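
The raw JSON dump can be long. The sketch below flattens it into one line per score, assuming the results are nested as `tasks → metrics → scores → value`; that layout is an assumption and should be checked against the printed output above. `flatten_scores` is a hypothetical helper, and the sample values are the scores from the metrics table in the next section.

```python
def flatten_scores(results: dict) -> dict:
    """Flatten an assumed tasks -> metrics -> scores -> value layout into one flat dict."""
    flat = {}
    for task_name, task in results.get("tasks", {}).items():
        for metric_name, metric in task.get("metrics", {}).items():
            for score_name, score in metric.get("scores", {}).items():
                flat[f"{task_name}/{metric_name}/{score_name}"] = score.get("value")
    return flat

# Stand-in for the parsed results; the nesting here is an assumption
sample_results = {
    "tasks": {"my-custom-task": {"metrics": {"llm-judge": {"scores": {
        "similarity_in_length": {"value": 95},
        "correctness": {"value": 98},
        "enough_information": {"value": 96},
        "succinct": {"value": 90},
        "professional_tone": {"value": 98},
        "clarity_directness": {"value": 97},
    }}}}}
}

for key, value in sorted(flatten_scores(sample_results).items()):
    print(f"{key}: {value}")
```

With the real job, the input would be `json.loads(results.model_dump_json())`.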

In [None]:
# Download DPO evaluation results
results_zip = client.evaluation.jobs.download_results(dpo_eval_job_id)
results_zip.write_to_file('result_after_dpo.zip')
print("Download completed.")

###  Real Example: Post-SFT vs Post-DPO — *Same Case from Cell 89*

This continues the **Carpenter Hand Injury Case** from Cell 89, showing how DPO refines the SFT output into a more concise, physician-preferred style.

#### Post-DPO Model Evaluation Metrics

| Metric | Score | Interpretation |
|--------|-------|----------------|
| **Similarity in Length** | 95 | Excellent match to preferred length |
| **Correctness** | 98 | Clinical information highly accurate |
| **Enough Information** | 96 | All key details present |
| **Succinct** | 90 | Significantly reduced verbosity |
| **Professional Tone** | 98 | Clinical documentation style |
| **Clarity & Directness** | 97 | Clear, direct communication |

---

#### Example Conversation (Input) — *Same as Cell 89*

<div style="background: #1a1a2e; padding: 15px; border-radius: 8px; margin: 10px 0; ">
<pre style="color: #a0a0a0; font-size: 12px; white-space: pre-wrap; margin: 0;">Doctor: Good morning, what brings you here today?

Patient: Hi, Doctor. I was in an accident with a saw while I was working as a carpenter and I need to check on my hand.

Doctor: I see. Can you tell me what happened to your hand?

Patient: I had a saw accident and my left ring finger was cut off, and my middle finger was almost cut off and my little finger was fractured, and my thumb was injured.

Doctor: Hmm, I understand. How old are you?

Patient: I am 22 years old.

Doctor (checks the patient's hand): I see that you had a total amputation of your left ring finger and a near total amputation of your middle finger. And it looks like you had a comminuted fracture of your middle phalanx and an injury to your thumb.

Patient: Yes, that's right.

Doctor: I see that you underwent a replantation of your ring finger and a revascularization of your middle finger. And you also had an open reduction internal fixation of the fractures.

Patient: Yes, that's correct.

Doctor: How have you been feeling since the surgery?

Patient: I've been feeling good. I've been going through rehabilitation for the past three months.

Doctor: That's great to hear. How has your hand been healing?

Patient: It's been healing well. I've been able to go back to work as a carpenter.</pre>
</div>

---

####  Post-SFT Output — *Same as Cell 89* (1,806 characters)

<div style="background: #2a2a3e; padding: 15px; border-radius: 8px; margin: 10px 0; border-left: 4px solid #ffd700; ">
<pre style="color: #e0e0e0; font-size: 12px; white-space: pre-wrap; margin: 0;">Here is a clinical note based on the conversation:

**Patient:** [Patient's Name], 22-year-old male

**Chief Complaint:** Traumatic injury to left hand from saw accident

**History of Present Illness:**

Patient presented with a traumatic injury to his left hand, including total amputation of the left ring finger, near total amputation of the middle finger, comminuted fracture of the middle phalanx, and injury to the thumb. The patient underwent a replantation of the ring finger and revascularization of the middle finger, as well as an open reduction internal fixation of the fractures. The patient has been undergoing rehabilitation for the past three months and reports that his hand has been healing well.

**Physical Examination:**

Upon examination, the patient's left hand showed signs of a total amputation of the ring finger, near total amputation of the middle finger, and a comminuted fracture of the middle phalanx. The thumb also showed signs of injury. The patient's range of motion and sensation were normal. The patient's hand was found to be healing well with no signs of infection or complications.

**Assessment:**

The patient's condition is stable, and his hand is healing well. The patient has been able to return to work as a carpenter.

**Plan:**

The patient will continue with his rehabilitation and take care of his hand.

**Medications:** None prescribed.
**Follow-up:** The patient will follow up with the doctor in two weeks.
**Signature:** [Doctor's Signature]</pre>
<p style="color: #ffd700; font-size: 11px; margin: 8px 0 0 0;"><strong>Style issues:</strong>  Template format with headers •  Redundant sections •  Placeholder text [Patient's Name], [Doctor's Signature] •  1,806 characters — verbose</p>
</div>

---

####  Post-DPO Output (Physician-preferred concise style) — 623 characters

<div style="background: #1a3a2e; padding: 15px; border-radius: 8px; margin: 10px 0; border-left: 4px solid #4ade80;">
<pre style="color: #e0e0e0; font-size: 12px; white-space: pre-wrap; margin: 0;">This is 22 years old right handed male carpenter, his left hand was cut by electric saw sustained total amputation of his left ring finger just proximal to proximal interphalangeal joint & near total amputation of middle finger with cut both neurovascular junction at level of mid part of middle phalanx & open comminuted fracture of middle phalanx little finger with cut ulnar neurovascular bundle & tip injury thumb.

Replantation of the ring, revascularization of middle & open reduction internal fixation of the fractures was done. After a period of rehabilitation for three months patient was back to his original work.</pre>
<p style="color: #4ade80; font-size: 11px; margin: 8px 0 0 0;"><strong>Style improvements:</strong>  Direct clinical narrative •  No template headers •  No placeholders •  Precise anatomical detail •  Concise at 623 characters (65% reduction from SFT) •  Professional surgical note style</p>
</div>

---

###  Style Refinement: 65% Length Reduction

```
Post-SFT:  ████████████████████████████████████████  1,806 chars
Post-DPO:  ██████████████                            623 chars  (65% reduction)
```
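
The reduction figure follows directly from the two character counts: (1,806 − 623) / 1,806 ≈ 0.655, or roughly 65%. A small sketch that reproduces both the percentage and the bar lengths (the function names are illustrative):

```python
def length_reduction(before: int, after: int) -> float:
    """Percentage by which `after` is shorter than `before`."""
    return (before - after) / before * 100

def bar(chars: int, longest: int, width: int = 40) -> str:
    """Text bar scaled so that `longest` fills `width` cells."""
    filled = round(chars / longest * width)
    return "█" * filled + " " * (width - filled)

sft_chars, dpo_chars = 1806, 623
reduction = length_reduction(sft_chars, dpo_chars)

print(f"Post-SFT:  {bar(sft_chars, sft_chars)}  {sft_chars:,} chars")
print(f"Post-DPO:  {bar(dpo_chars, sft_chars)}  {dpo_chars:,} chars  ({reduction:.1f}% reduction)")
```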

<div style="background: #16213e; padding: 15px; border-radius: 8px; border-left: 4px solid #ffd700; margin: 15px 0;">
<strong style="color: #ffd700;"> Key Insight:</strong> <span style="color: #e0e0e0;">SFT teaches the model <em>what</em> information to include (FORMAT). DPO refines <em>how</em> to present it, preferring concise, direct clinical prose with precise anatomical terminology over verbose templates (STYLE).</span>
</div>

<div style="background: #1a1a2e; padding: 12px; border-radius: 6px; margin: 10px 0;">
<strong style="color: #00d4ff;">The Complete Pipeline (Same Carpenter Hand Injury Example Throughout):</strong>
<div style="color: #e0e0e0; margin-top: 8px;">
<strong style="color: #ff6b6b;">Baseline (Cell 89)</strong> → Template format, repetitive, asks for input, missing details<br/>
<strong style="color: #ffd700;">↓ SFT (Cell 89)</strong> → Learns format, includes all details, but verbose with headers (1,806 chars)<br/>
<strong style="color: #4ade80;">↓ DPO (This cell)</strong> → Concise narrative, precise terminology, physician-ready (623 chars)
</div>
</div>
