# NER with the LLM Claude using a few-shot prompt

This script facilitates semi-automatic named entity recognition (NER) in TEI/XML-encoded historical French texts using the large language model Claude. It operates in two main phases:

Annotation Phase: The script loads a TEI-encoded XML file, extracts each paragraph (\<p> element), and submits it to Claude with a few-shot prompt. The prompt instructs the model to wrap only person and place names in \<persName> and \<placeName> while preserving all other existing XML tags. Each response is checked for well-formedness by parsing it; well-formed paragraphs replace the originals in the TEI document, failures are logged and left unchanged, and the document is then saved with the annotated entities.
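
A few-shot prompt of this kind can also be assembled from a list of example pairs, which keeps the separators between examples explicit. The following is a minimal sketch, not the script's own code: `build_few_shot_prompt`, `FEW_SHOT_EXAMPLES`, and `INSTRUCTIONS` are hypothetical names, the instruction text is abbreviated, and only two of the script's six example pairs are reproduced.

```python
# Two of the example pairs used in the script's prompt (Input/Output 2 and 6).
FEW_SHOT_EXAMPLES = [
    (
        "Le comte de Schwerin se remettait, de même que mon fils, et les alarmes cessèrent.",
        "Le <persName>comte de Schwerin</persName> se remettait, de même que "
        "<persName>mon fils</persName>, et les alarmes cessèrent.",
    ),
    (
        "Je fus trouver ma chère Comtesse, où je trouvai le R.P. Brean.",
        "Je fus trouver <persName>ma chère Comtesse</persName>, où je trouvai le "
        "<persName>R.P. Brean</persName>.",
    ),
]

# Abbreviated instructions (assumption: the full wording lives in the script).
INSTRUCTIONS = (
    "Only wrap PERSON names in <persName> and PLACE names in <placeName>. "
    "Do not remove, change, or add anything else."
)


def build_few_shot_prompt(paragraph_xml, examples=FEW_SHOT_EXAMPLES,
                          instructions=INSTRUCTIONS):
    """Join instructions, numbered example pairs, and the new input with newlines."""
    parts = [instructions, ""]
    for i, (source, target) in enumerate(examples, start=1):
        parts.append(f"Input {i}: {source}")
        parts.append(f"Output {i}: {target}")
        parts.append("")  # blank line between example pairs
    parts.append(f"Input:\n{paragraph_xml}")
    parts.append("")
    parts.append("Output:")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_few_shot_prompt("<p>Nous arrivâmes à Wesel.</p>"))
```

Because every piece is joined with `"\n"`, no example can run into the next one, which is easy to get wrong when the prompt is written as one long sequence of adjacent string literals.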

In [1]:
import os
import anthropic
from lxml import etree

# === STEP 1: Claude Setup ===
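# Read the key from the environment instead of hard-coding it in the notebook.
api_key = os.environ.get("ANTHROPIC_API_KEY", "")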
client = anthropic.Anthropic(api_key=api_key)

# === STEP 2: Load TEI Document ===
tei_path = "input/french_plain.xml"
tree = etree.parse(tei_path)
ns = {'tei': 'http://www.tei-c.org/ns/1.0'}
paragraphs = tree.xpath('//tei:p', namespaces=ns)

# === STEP 3: Claude Annotation Function ===
def annotate_whole_paragraph_with_claude(paragraph_xml):
    prompt = (
        # Adjacent string literals are joined verbatim, so every piece carries
        # its own trailing space or newline.
        "You will receive a <p> element from a TEI/XML document. "
        "Only wrap PERSON names in <persName> and PLACE names in <placeName>. "
        "Do not remove, change, or add anything else. Keep all tags (e.g. <app>, <rdg>, <ptr>, etc.) exactly as they are.\n\n"
        "Do NOT wrap the output in any additional <root> or <document> tag.\n\n"
        "Also annotate indirectly mentioned entities like 'ma tante', 'mes enfants', 'mon beau-frère' or 'son page'. "
        "When you come across entities like 'comte de Schwerin' or 'madame de Strattmann', include the role title. "
        "Do not annotate religious entities.\n\n"
        "Input 1: Dès que nous fûmes à Wesel, on m’en<ptr target=\"facs_51\"/>voya chez mon "
        "oncle qui ne demeurait qu’à douze lieues de chez ma tante.\n"
        "Output 1: Dès que nous fûmes à <placeName>Wesel</placeName>, on m’en<ptr target=\"facs_51\"/>"
        "voya chez <persName>mon oncle</persName> qui ne demeurait qu’à douze lieues de "
        "chez <persName>ma tante</persName>.\n\n"
        "Input 2: Le comte de Schwerin se remettait, de même que mon fils, et les alarmes cessèrent.\n"
        "Output 2: Le <persName>comte de Schwerin</persName> se remettait, de même que "
        "<persName>mon fils</persName>, et les alarmes cessèrent.\n\n"
        "Input 3: <seg ana=\"#R11\">La comtesse de Dönhoff</seg> se releva bientôt de ses couches et on commença les assemblées. Il y en avait "
        "chez le duc de H. gouverneur de Königsberg, chez le feld-maréchal comte de Dohna et enfin chez toute la noblesse.\n"
        "Output 3: <seg ana=\"#R11\">La <persName>comtesse de Dönhoff</persName></seg> se releva bientôt de ses couches et on commença les assemblées. Il y en avait chez le <persName>duc de H. gouverneur de Königsberg</persName>, chez le <persName>feld-maréchal comte de Dohna</persName> et enfin chez toute la noblesse.\n\n"
        "Input 4: Je ne dois pas omettre ici que le discours de ma bonne et fidèle marchande d’étain, sur le sujet du Grand Doyen le baron de Schenck, me tenait toujours à l’esprit. Je consultai le R.P. Nahser et il conféra avec des personnes de probité.\n"
        "Output 4: Je ne dois pas omettre ici que le discours de ma bonne et fidèle <persName>marchande</persName> d’étain, sur le sujet du <persName>Grand Doyen le baron de Schenck</persName>, me tenait toujours à l’esprit. Je consultai le <persName>R.P. Nahser</persName> et il conféra avec des personnes de probité.\n\n"
        "Input 5: Ces deux messieurs et monsieur de Dankelmann étaient notre compagnie ordinaire, et y soupaient avec le ministre et le secrétaire.\n"
        "Output 5: Ces deux messieurs et <persName>monsieur de Dankelmann</persName> étaient notre compagnie ordinaire, et y soupaient avec <persName>le ministre</persName> et <persName>le secrétaire</persName>.\n\n"
        "Input 6: Je fus trouver ma chère Comtesse, où je trouvai le R.P. Brean.\n"
        "Output 6: Je fus trouver <persName>ma chère Comtesse</persName>, où je trouvai le <persName>R.P. Brean</persName>.\n\n"
        f"Input:\n{paragraph_xml}\n\n"
        "Output:\n"
    )

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=20000,
        temperature=0.0,
        system="You are an expert in TEI/XML and named entity recognition.",
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ]
    )

    return message.content[0].text.strip()  # Claude's response is a list of message blocks

# === STEP 4: Process and Annotate Each <p> Element ===
failures = []

for p in paragraphs:
    xml_id = p.attrib.get('{http://www.w3.org/XML/1998/namespace}id', '')
    print(f"🔍 Annotating paragraph {xml_id}...")

    # Serialize only the element itself; by default lxml also appends its tail text.
    paragraph_xml = etree.tostring(p, encoding="unicode", with_tail=False)

    try:
        annotated_xml = annotate_whole_paragraph_with_claude(paragraph_xml)
    except Exception as e:
        print(f"Claude call failed for paragraph {xml_id}: {e}")
        failures.append((xml_id, str(e)))
        continue

    try:
        new_p = etree.fromstring(annotated_xml.encode("utf-8"))
    except etree.XMLSyntaxError as parse_error:
        print(f"XML parsing failed for {xml_id}: {parse_error}")
        failures.append((xml_id, "Invalid XML from Claude"))
        continue

    # Keep the text/whitespace that followed the original <p>, then swap in the new element.
    new_p.tail = p.tail
    p.getparent().replace(p, new_p)

# === STEP 5: Write Result ===
output_path = "output/annotated_with_claude.xml"
os.makedirs(os.path.dirname(output_path), exist_ok=True)  # create output/ if it is missing
tree.write(output_path, encoding="utf-8", xml_declaration=True, pretty_print=True)
print(f"\nTEI document saved to: {output_path}")

# === STEP 6: Report Failures ===
if failures:
    print("\nThese paragraphs failed to annotate:")
    for fid, reason in failures:
        print(f"- {fid}: {reason}")
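
The loop in STEP 4 accepts any response that parses as XML, so a response that dropped a `<ptr>` or altered a word would still be written back into the document. A stricter check could compare the original and annotated paragraphs once the entity tags are ignored. The following is a minimal sketch under stated assumptions, not part of the original script: it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra dependencies, and `is_safe_annotation`, `local_name`, and `ALLOWED_NEW_TAGS` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Elements the model is allowed to add (assumption: mirrors the prompt's instructions).
ALLOWED_NEW_TAGS = {"persName", "placeName"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def is_safe_annotation(original_xml, annotated_xml):
    """Return True if annotated_xml is well-formed, keeps the text unchanged,
    and keeps every element other than persName/placeName in place."""
    try:
        original = ET.fromstring(original_xml)
        annotated = ET.fromstring(annotated_xml)
    except ET.ParseError:
        return False

    # 1. The character data must be identical once all tags are ignored.
    if "".join(original.itertext()) != "".join(annotated.itertext()):
        return False

    # 2. Ignoring entity tags, the elements must match in order, name, and attributes.
    def skeleton(root):
        return [
            (local_name(el.tag), sorted(el.attrib.items()))
            for el in root.iter()
            if local_name(el.tag) not in ALLOWED_NEW_TAGS
        ]

    return skeleton(original) == skeleton(annotated)
```

In the loop, a paragraph would then be replaced only when `is_safe_annotation(paragraph_xml, annotated_xml)` returns `True`; otherwise it could be appended to `failures` like any other rejected response.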


🔍 Annotating paragraph ...
Claude call failed for paragraph : "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted"
🔍 Annotating paragraph ...
Claude call failed for paragraph : "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted"
🔍 Annotating paragraph P.5...
Claude call failed for paragraph P.5: "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted"
🔍 Annotating paragraph P.18...
Claude call failed for paragraph P.18: "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted"
🔍 Annotating paragraph P.31...
Claude call