# OIE Single Note Conversion

This tutorial notebook sends a single clinical note to a local LLM and saves JSON, YAML, and XML representations of it. It uses the same local Hugging Face models (phi-4, Phi-3.5, Llama 3.1, Qwen3) and the same shared `LocalLLMExtractor` as the CLI.

## 1. Environment and paths

In [7]:
from pathlib import Path
import torch
import sys

PROJECT_ROOT = Path.cwd().parent
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

SAMPLES_DIR = PROJECT_ROOT / 'samples'
OUTPUT_DIR = PROJECT_ROOT / 'outputs_single_note'
OUTPUT_DIR.mkdir(exist_ok=True)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device, PROJECT_ROOT


('cuda',
 PosixPath('/home/cssmuadm/NLP/Talk_Artifacts/slm_med_talk/OIE_toolkit'))

## 2. Choose a local model

In [8]:
# Example local instruction-tuned models for OIE:
#   microsoft/phi-4
#   microsoft/Phi-3.5-mini-instruct
#   meta-llama/Meta-Llama-3.1-8B-Instruct
#   Qwen/Qwen3-14B
MODEL_NAME = 'microsoft/phi-4'
MODEL_NAME


'microsoft/phi-4'

## 3. Build the shared extractor (same config as CLI)

In [9]:
from oie_toolkit.generation import LocalGenerationConfig, LocalLLMExtractor

config = LocalGenerationConfig(
    model_name=MODEL_NAME,
    temperature=None,   # greedy decoding (do_sample=False)
    top_p=None,         # no nucleus sampling by default
    max_new_tokens=8192,
    torch_dtype='bfloat16',
    device_map=device,
)
extractor = LocalLLMExtractor(config=config)
extractor


Loading checkpoint shards:   0%|          | 0/6 [00:00<?, ?it/s]

Device set to use cuda


<oie_toolkit.generation.LocalLLMExtractor at 0x7f0fbb3527a0>
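
The comments on `temperature=None` and `top_p=None` above describe greedy decoding. As a rough sketch of how such settings could be turned into keyword arguments for a Hugging Face `generate` call: the helper `build_generate_kwargs` below is hypothetical, and the mapping that `LocalLLMExtractor` performs internally may differ.

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(
    temperature: Optional[float],
    top_p: Optional[float],
    max_new_tokens: int,
) -> Dict[str, Any]:
    """Hypothetical helper: map config values to generate() keyword arguments."""
    kwargs: Dict[str, Any] = {"max_new_tokens": max_new_tokens}
    if temperature is None or temperature == 0:
        # No temperature means deterministic, greedy decoding.
        kwargs["do_sample"] = False
    else:
        # Sampling is enabled only when a positive temperature is given.
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
        if top_p is not None:
            kwargs["top_p"] = top_p
    return kwargs


greedy_kwargs = build_generate_kwargs(None, None, 8192)
sampled_kwargs = build_generate_kwargs(0.7, 0.9, 8192)
```

Greedy decoding is a reasonable default for extraction because repeated runs on the same note should yield the same output, which makes the JSON, YAML, and XML artifacts easier to compare.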

## 4. Inspect the source note

In [10]:
note_path = SAMPLES_DIR / 'note1.txt'
note_text = note_path.read_text(encoding='utf-8')
print(f'Loaded {note_path.name} (note_id={note_path.stem})')
note_text[:1000]

Loaded note1.txt (note_id=note1)


'Patient Name: Ana Patel\nDOB: 1987-03-18\nEncounter Date: 2024-10-21\nMRN: CLN-1001\n\nChief Complaint:\n  Persistent headache and intermittent blurry vision over the past week.\n\nHistory of Present Illness:\n  Patient reports throbbing pain behind the eyes, worsened by screen time.\n  Denies nausea or vomiting. Sleep schedule irregular due to night shifts.\n\nMedications:\n  - ibuprofen 400 mg PRN\n  - oral contraceptive pill (ethinyl estradiol/norgestimate)\n\nAllergies:\n  - penicillin (rash)\n\nVitals:\n  BP 128/82 mmHg, HR 78 bpm, Temp 98.4 F, BMI 24.6\n\nAssessment:\n  Likely tension-type headache triggered by eye strain and poor sleep hygiene.\n\nPlan:\n  Initiate naproxen 500 mg BID with food for 5 days.\n  Recommend ergonomic workstation adjustments and 20-20-20 eye rest rule.\n  Schedule optometry referral; follow-up in two weeks.\n'
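
Before converting, it can help to list the section headers in the note so the generated output can later be checked for coverage. The snippet below is a minimal sketch that assumes headers are unindented lines ending in a colon, as in this sample note. The function `section_headers` is a hypothetical helper and is not part of `oie_toolkit`.

```python
import re
from typing import List

# Short excerpt of the sample note, inlined so the snippet runs on its own.
sample_note = (
    "Patient Name: Ana Patel\n"
    "MRN: CLN-1001\n"
    "\n"
    "Chief Complaint:\n"
    "  Persistent headache and intermittent blurry vision over the past week.\n"
    "\n"
    "Allergies:\n"
    "  - penicillin (rash)\n"
)


def section_headers(text: str) -> List[str]:
    """Return unindented lines that consist only of a label followed by a colon."""
    return re.findall(r"^([A-Za-z][^:\n]*):[ \t]*$", text, flags=re.MULTILINE)


headers = section_headers(sample_note)
print(headers)
```

Lines such as `Patient Name: Ana Patel` carry a value after the colon and are deliberately not treated as section headers here.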

## 5. Preview the JSON prompt

In [11]:
from oie_toolkit import build_conversion_prompt

prompt = build_conversion_prompt(note_text, note_id=note_path.stem, output_format='json')
print(prompt[:1200])

    Given the following document: \n Patient Name: Ana Patel
DOB: 1987-03-18
Encounter Date: 2024-10-21
MRN: CLN-1001

Chief Complaint:
  Persistent headache and intermittent blurry vision over the past week.

History of Present Illness:
  Patient reports throbbing pain behind the eyes, worsened by screen time.
  Denies nausea or vomiting. Sleep schedule irregular due to night shifts.

Medications:
  - ibuprofen 400 mg PRN
  - oral contraceptive pill (ethinyl estradiol/norgestimate)

Allergies:
  - penicillin (rash)

Vitals:
  BP 128/82 mmHg, HR 78 bpm, Temp 98.4 F, BMI 24.6

Assessment:
  Likely tension-type headache triggered by eye strain and poor sleep hygiene.

Plan:
  Initiate naproxen 500 mg BID with food for 5 days.
  Recommend ergonomic workstation adjustments and 20-20-20 eye rest rule.
  Schedule optometry referral; follow-up in two weeks.
. Extract all data in JSON format.
    Make sure that the JSON document is valid, provide reasonably detailed names for fields.

    If s

## 6. Generate JSON/YAML/XML with the local model

In [None]:
artifacts = {}
for fmt in ['json', 'yaml', 'xml']:
    # One generation per target format; each result is saved as <note_id>.<fmt>.
    payload = extractor.convert_note(note_text, note_id=note_path.stem, fmt=fmt)
    target = OUTPUT_DIR / f'{note_path.stem}.{fmt}'
    target.write_text(payload, encoding='utf-8')
    artifacts[fmt] = payload[:400]  # keep a short preview for display
artifacts
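
Model output is not guaranteed to be well formed, so it is worth checking that each payload parses before using it downstream. The following is a minimal sketch using only the standard library for JSON and XML. The helpers `strip_code_fence` and `is_well_formed` are hypothetical and not part of `oie_toolkit`, and the fence stripping assumes the model may wrap its answer in a Markdown code fence.

```python
import json
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # Markdown code fence marker


def strip_code_fence(text: str) -> str:
    """Hypothetical helper: drop a surrounding Markdown code fence, if any."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def is_well_formed(payload: str, fmt: str) -> bool:
    """Return True if the payload parses as the given format (json or xml)."""
    cleaned = strip_code_fence(payload)
    try:
        if fmt == "json":
            json.loads(cleaned)
        elif fmt == "xml":
            ET.fromstring(cleaned)
        else:
            raise ValueError(f"unsupported format: {fmt}")
    except (json.JSONDecodeError, ET.ParseError):
        return False
    return True


fenced_json = FENCE + "json\n" + '{"patient_name": "Ana Patel"}' + "\n" + FENCE
print(is_well_formed(fenced_json, "json"))
print(is_well_formed("<note><mrn>CLN-1001</mrn>", "xml"))
```

YAML is left out because the standard library has no YAML parser. If PyYAML is installed, `yaml.safe_load` could be used for the same kind of check.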
