# Testcase from Datamodel and PDF OCR (Variant 3)

This notebook documents the workflow to generate a new XML testcase by combining:
- OCR of a selected PDF (remixed from the Model Generation workflow)
- A selected Datamodel (XML)
- Optional notes to steer value selection (header/metadata)

It is instructional and not meant to be fully runnable.


## Inputs and preparation

- PDF document (from storage)
- Datamodel XML (from text or storage)
- Optional notes (focus on recipient, dates, references)
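
As a rough sketch, the inputs above could be modelled as a small request object. The field names follow the POST parameters listed in the call sequence (`documentFileId`, `dataModelFileId`/`dataModelText`, `description`). The class name `TestcaseRequest`, the `validate` method, and the rule that at least one datamodel source must be present are assumptions for illustration, not the server's actual validation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TestcaseRequest:
    """Hypothetical container for the workflow inputs (names are illustrative)."""

    document_file_id: str                     # PDF document in storage
    data_model_file_id: Optional[str] = None  # datamodel XML from storage
    data_model_text: Optional[str] = None     # datamodel XML pasted as text
    description: Optional[str] = None         # optional notes

    def validate(self) -> None:
        # Assumption: the PDF is mandatory and the datamodel may come from either source.
        if not self.document_file_id:
            raise ValueError("documentFileId is required")
        if not (self.data_model_file_id or self.data_model_text):
            raise ValueError("provide dataModelFileId or dataModelText")
```
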



## System prompt (exact)

This prompt matches the one used by the server route `app/api/workflows/testcase/variant3/route.ts`.



In [None]:
SYSTEM = """
You generate a single XML testcase that conforms to a given datamodel (XML).

Use the provided OCR markdown (from a letter-like PDF) to populate realistic values. Focus on header/metadata fields typically found in letters:
- Recipient data (names, address lines, ZIP + City)
- Sender/company names if present
- Letter/document date(s)
- Reference numbers if clearly present

Important rules:
- Produce one well-formed XML instance only. No prose, no markdown fences.
- Follow the datamodel semantics closely. Use sensible values inferred from the OCR where possible; otherwise choose concise realistic placeholders.
- Do not include the full letter body; prefer structured/metadata fields relevant to recipients, addresses, names, dates.
"""


## User prompt (exact)

This prompt mirrors the user message used by the server route. The datamodel, OCR text, and notes shown here are placeholders.


In [None]:
USER = """
Datamodel (XML):
<Datamodel> ... </Datamodel>

OCR Markdown (pages may be separated by === PAGE N ===):
=== PAGE 1 ===
...

Optional notes:
- focus on recipient
- prefer first date

Task: Generate ONE XML testcase instance matching the datamodel. Populate address/name/date/reference fields using the OCR when present. Return ONLY the XML.
""".strip()
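
The static template above can be filled from real inputs with a small helper. The following is a minimal sketch: `join_pages` and `build_user_prompt` are hypothetical names, and the layout simply follows the template shown here, which may differ in detail from what the server route does.

```python
from typing import Optional, Sequence

# Task line copied from the user prompt template.
TASK = (
    "Task: Generate ONE XML testcase instance matching the datamodel. "
    "Populate address/name/date/reference fields using the OCR when present. "
    "Return ONLY the XML."
)


def join_pages(pages: Sequence[str]) -> str:
    """Join per-page OCR markdown using the === PAGE N === separator."""
    return "\n\n".join(
        f"=== PAGE {i} ===\n{page.strip()}" for i, page in enumerate(pages, start=1)
    )


def build_user_prompt(
    datamodel_xml: str, ocr_markdown: str, notes: Optional[str] = None
) -> str:
    """Assemble the user message; the notes section is omitted when empty."""
    parts = [
        "Datamodel (XML):",
        datamodel_xml.strip(),
        "",
        "OCR Markdown (pages may be separated by === PAGE N ===):",
        ocr_markdown.strip(),
        "",
    ]
    if notes and notes.strip():
        parts += ["Optional notes:", notes.strip(), ""]
    parts.append(TASK)
    return "\n".join(parts)
```

Plain string concatenation is used instead of an f-string so that braces inside the datamodel or OCR text cannot be misread as format placeholders.
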


## Call sequence (abstract)

1. Accept a POST request with `documentFileId`, either `dataModelFileId` or `dataModelText`, and an optional `description`.
2. OCR the PDF to markdown via Mistral OCR.
3. Load the Datamodel XML.
4. Compose `SYSTEM`/`USER` and invoke.
5. Extract the XML from the model response (prefer a fenced XML block if one is present), then return `{ xml, markdown }`.
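
Step 5 can be sketched as follows. Although the system prompt asks for no markdown fences, models sometimes add them anyway, so this helper tries fenced blocks first and then falls back to the span between the first `<` and the last `>`. The function name `extract_xml`, the fallback heuristic, and the well-formedness check with `xml.etree.ElementTree` are assumptions for illustration. The server route may extract the XML differently.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Matches ```xml ... ``` or a bare ``` ... ``` block.
_FENCE = re.compile(r"```(?:xml)?\s*\n(.*?)```", re.DOTALL | re.IGNORECASE)


def extract_xml(raw: str) -> Optional[str]:
    """Return the first well-formed XML candidate found in raw, else None."""
    # 1. Prefer fenced blocks.
    candidates = [m.group(1).strip() for m in _FENCE.finditer(raw)]

    # 2. Fall back to everything between the first '<' and the last '>'.
    start, end = raw.find("<"), raw.rfind(">")
    if start != -1 and end > start:
        candidates.append(raw[start:end + 1].strip())

    # 3. Keep the first candidate that parses as well-formed XML.
    for candidate in candidates:
        try:
            ET.fromstring(candidate)
        except ET.ParseError:
            continue
        return candidate
    return None
```

Note that this only checks well-formedness. It does not validate the instance against the datamodel.
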



## Illustrative provider call (pseudo-Python)

Replace with your preferred SDK.


In [None]:
# Pseudo-code; not meant to run as-is (needs the `anthropic` package and an API key)
# import os
# from anthropic import Anthropic
# client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
# resp = client.messages.create(
#   model="claude-sonnet-4-0",
#   system=SYSTEM,  # the Messages API takes the system prompt as a top-level parameter, not as a message role
#   messages=[
#     {"role": "user", "content": USER},
#   ],
#   max_tokens=2000,
# )
# raw = resp.content[0].text  # text of the first content block



## Summary

- Inputs: PDF document, Datamodel XML, optional notes
- Steps: OCR to markdown → prompt with datamodel + OCR → extract XML
- Output: `{ xml, markdown }`

See the server implementation at `app/api/workflows/testcase/variant3/route.ts`. The UI lives at `app/workflows/testcase/page.tsx` (Variant 3 section).
