# Structured information extraction with a specialized small LLM

In this notebook we'll show how to do structured information extraction for the Parasol Insurance case data with a specialized Large Language Model (LLM). This LLM was fine-tuned specifically for extracting structured information and can thus be much smaller than a similarly capable generic LLM. That means we can also run it on CPU if latency is not a critical factor.

See the Parasol Insurance Lab extraction notebook for a comparison: https://github.com/rh-aiservices-bu/parasol-insurance/blob/main/lab-materials/03/03-03-information-extraction.ipynb

In the Parasol Insurance Lab notebook the LLM is prompted to extract the desired information one data point at a time. Having the extracted information in structured form means you can work with it programmatically, e.g. query a database for the extracted policy number. Proprietary LLM APIs can do structured extraction quite easily, but this is hard to replicate with smaller generic open-weight models.
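
To make the "query a database" benefit concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `policies` table, its columns, the `status` value and the `lookup_policy` helper are all hypothetical and exist only for illustration; the name and policy number are taken from the first case shown in this notebook.

```python
import sqlite3

# Hypothetical extraction result; the keys mirror the template used later on.
extracted = {"Customer": {"Name": "Eric Cline", "Policy Number": "BC-857143475"}}

# In-memory database with a made-up `policies` table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policies (policy_number TEXT PRIMARY KEY, holder TEXT, status TEXT)"
)
conn.execute(
    "INSERT INTO policies VALUES (?, ?, ?)",
    ("BC-857143475", "Eric Cline", "active"),
)

def lookup_policy(conn, policy_number):
    """Return (holder, status) for a policy number, or None if it is unknown."""
    return conn.execute(
        "SELECT holder, status FROM policies WHERE policy_number = ?",
        (policy_number,),
    ).fetchone()

# The structured output can be used directly as a query parameter.
policy = lookup_policy(conn, extracted["Customer"]["Policy Number"])
print(policy)
```

Because the policy number arrives as a plain dictionary value, no further text processing is needed before the lookup.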

### Insurance Case Data

First let's have a look at the unstructured insurance cases:

In [1]:
import json

# Load cases
cases_jsonl = '../testing/claims/insurance_claim_reports.jsonl'
cases = []

# Open the JSONL file and load each non-empty line as a dictionary
with open(cases_jsonl, 'r', encoding='utf-8') as file:
    for line in file:
        line = line.strip()
        if line:  # skip blank lines, which json.loads would reject
            cases.append(json.loads(line))

In [2]:
print(cases[0]['description'])


        Dear Parasol Insurance,

        My name is Eric Cline, and I am writing to file a claim for a recent car accident that occurred on 2024-01-06, 
        at approximately 6:30 PM. My policy number is BC-857143475.

        The accident took place at the intersection of Elm Ln and Pine Blvd. I was driving my vehicle, a yellow Ford Traverse with license plate 
        number 614 7962. At the same time, another vehicle, a blue BMW Civic with license plate number 094-RGL, 
        collided with my car. The driver, Samantha Juarez, failed to adhere to traffic rules, resulting in damage to both vehicles.

        I promptly exchanged information with the other driver and took photos of the accident scene, including damages to both vehicles.
        Attached to this email are the photos, a copy of the police report, and the estimate for the repair costs.

        Kindly assist in processing this claim and let me know the next steps. You can reach me at 242.261.8544 or bairddennis@vazq

### Framework Setup

First we need to install and import the necessary libraries. We'll use agno (a lightweight agent framework) and Ollama (a user-friendly tool to run LLMs on GPUs and even on commodity CPU hardware).

In [3]:
!uv pip install rich agno ollama -q

In [4]:
from rich.pretty import pprint
from agno.agent import Agent
from agno.models.ollama import Ollama

With Ollama we can either develop on a laptop or use the serving runtime on OpenShift.

In [5]:
OLLAMA_HOST="localhost"
#OLLAMA_HOST="semantic-sonnenschirm-predictor.demo.svc.cluster.local"

In [6]:
# Create the Ollama client (connectivity is tested in the next cell)
from ollama import Client
client = Client(
  host=f'http://{OLLAMA_HOST}:11434',
)
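
The client above receives a full URL, while the agent defined later is given the bare host name. A small helper can keep the two consistent. This is only a sketch: `ollama_base_url` is a hypothetical name, not part of the Ollama or agno libraries, and port 11434 is the one already used in this notebook.

```python
def ollama_base_url(host, port=11434, scheme="http"):
    """Build a base URL for an Ollama server from a bare host name.

    A host that already carries a scheme is returned as-is
    (minus any trailing slash).
    """
    if "://" in host:
        return host.rstrip("/")
    return f"{scheme}://{host}:{port}"

print(ollama_base_url("localhost"))  # http://localhost:11434
```

The returned string could then be passed as `host=` wherever a server address is needed.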

Test the connectivity to Ollama and list all available models. Ollama can load models on the fly, which helps a lot in the earlier, more experimental stages of development.

In [7]:
client.list()

ListResponse(models=[Model(model='sroecker/nuextract-tiny-v1.5:latest', modified_at=datetime.datetime(2025, 11, 11, 11, 48, 4, 601307, tzinfo=TzInfo(3600)), digest='be09317bfbbb1a839333bddaa0b13dbd59e1438ce11cebcd92430878a5cc9c3f', size=994157142, details=ModelDetails(parent_model='', format='gguf', family='qwen2', families=['qwen2'], parameter_size='494.03M', quantization_level='F16')), Model(model='Osmosis/Osmosis-Structure-0.6B:latest', modified_at=datetime.datetime(2025, 11, 11, 9, 12, 53, 635362, tzinfo=TzInfo(3600)), digest='f24ec096ac55ebf5641642423312930cca59ce05f31cc7e46212bcbb89a8070e', size=1198178321, details=ModelDetails(parent_model='', format='gguf', family='qwen3', families=['qwen3'], parameter_size='596M', quantization_level='unknown')), Model(model='granite4:1b', modified_at=datetime.datetime(2025, 11, 4, 12, 45, 36, 869590, tzinfo=TzInfo(3600)), digest='26e4cb132798939e63dbeae925ff3dd999aeac034edebefd325b6c5ff44df58c', size=3267420327, details=ModelDetails(parent_mod

### Structured extraction

We'll use a small specialized LLM fine-tuned by NuMind to extract information in a structured way. 

"NuExtract is a family of small open-source models that do only one thing: they extract information from documents and return a structured output (JSON). It turns out that, because they only do this one thing, they are very good at it." from https://numind.ai/blog/nuextract-1-5---multilingual-infinite-context-still-small-and-better-than-gpt-4o

"NuExtract-tiny-v1.5 is a fine-tuning of Qwen/Qwen2.5-0.5B, trained on a private high-quality dataset for structured information extraction. It supports long documents and several languages (English, French, Spanish, German, Portuguese, and Italian). To use the model, provide an input text and a JSON template describing the information you need to extract."
from https://huggingface.co/numind/NuExtract-1.5-tiny

<img src="nuextract-benchmark.png" width="512"/>

In the benchmark above, the zero-shot results of the NuExtract v1.5 model even beat GPT-4o, while the tiny version is much smaller and still has competitive accuracy.

First we pull the model, then we define the "agent" and make sure to set the temperature (think of it as the variability and randomness of the token sampling) to zero for consistent results.

In [8]:
client.pull("sroecker/nuextract-tiny-v1.5")

ProgressResponse(status='success', completed=None, total=None, digest=None)

In [9]:
extract_agent = Agent(
    model=Ollama(id="sroecker/nuextract-tiny-v1.5", options={"temperature": 0}, host=OLLAMA_HOST),
    description="You extract information.",
    structured_outputs=True,
)
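
To see why a temperature of zero gives consistent results, the following sketch shows the usual textbook idea of temperature-scaled softmax sampling. It does not reproduce Ollama's internals; the function name and the logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into sampling probabilities, scaled by temperature."""
    if temperature <= 0:
        # Limit case: all probability mass goes to the highest-scoring token,
        # which corresponds to greedy (deterministic) decoding.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, so repeated runs on the same input are far more likely to return the same extraction.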

Now we define a simple helper function that wraps a template in the prompt format needed for NuExtract models. The template can be freely modified to extract information in the desired structure. We start with something simple, the basic customer information:

In [10]:
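def predict_nuextract(input_text):
    # JSON template listing the fields to extract; the model fills in the empty values
    template = """
    {
        "Customer": {
            "Name": "",
            "Policy Number": "",
            "Telephone Number": "",
            "Email Address": "",
        }
    }
    """
    # Wrap the template and the input text in the NuExtract prompt format
    prompt = f"""<|input|>\n ### Template:\n{template}\n### Text:\n{input_text}\n\n<|output|>"""

    return prompt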

Let's have a look at the exact prompt that we are feeding to the model:

In [11]:
print(predict_nuextract(cases[0]['description']))

<|input|>
 ### Template:

    {
        "Customer": {
            "Name": "",
            "Policy Number": "",
            "Telephone Number": "",
            "Email Address": "",
        }
    }
    
### Text:

        Dear Parasol Insurance,

        My name is Eric Cline, and I am writing to file a claim for a recent car accident that occurred on 2024-01-06, 
        at approximately 6:30 PM. My policy number is BC-857143475.

        The accident took place at the intersection of Elm Ln and Pine Blvd. I was driving my vehicle, a yellow Ford Traverse with license plate 
        number 614 7962. At the same time, another vehicle, a blue BMW Civic with license plate number 094-RGL, 
        collided with my car. The driver, Samantha Juarez, failed to adhere to traffic rules, resulting in damage to both vehicles.

        I promptly exchanged information with the other driver and took photos of the accident scene, including damages to both vehicles.
        Attached to this email are t
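
Note that the template printed above has a trailing comma after the last key of the `Customer` object, which strict JSON parsers such as `json.loads` reject. One way to rule out such slips as templates grow is to keep the template as a Python dict and serialize it with `json.dumps`. The sketch below does that; `build_nuextract_prompt` and `customer_template` are hypothetical names, and the prompt layout is simply copied from `predict_nuextract`.

```python
import json

def build_nuextract_prompt(input_text, template):
    """Format a NuExtract-style prompt from a Python dict template.

    json.dumps guarantees that the template part is valid JSON.
    """
    template_json = json.dumps(template, indent=4)
    # Same layout as the predict_nuextract helper in this notebook.
    return f"<|input|>\n ### Template:\n{template_json}\n### Text:\n{input_text}\n\n<|output|>"

customer_template = {
    "Customer": {
        "Name": "",
        "Policy Number": "",
        "Telephone Number": "",
        "Email Address": "",
    }
}

prompt = build_nuextract_prompt("My policy number is BC-857143475.", customer_template)
print(prompt)
```

Whether the model's output changes when the trailing commas are removed has not been tested here, so the rest of this notebook keeps using `predict_nuextract` as defined above.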

With this template we can just run the LLM. For the developer this is as simple as calling an API:

In [12]:
result = extract_agent.run(predict_nuextract(cases[0]['description']))

Let's parse the result and extract information, like the customer name or policy number, from the structured output:

In [13]:
parsed_result = json.loads(result.content)
pprint(parsed_result)

In [14]:
parsed_result['Customer']['Name']

'Eric Cline'

In [15]:
parsed_result['Customer']['Policy Number']

'BC-857143475'
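
Calling `json.loads` directly raises an exception if the model ever wraps its JSON in extra text or returns malformed output. A more defensive parser might look like the sketch below. `parse_json_output` is a hypothetical helper, and it assumes the raw output is a string such as `result.content`.

```python
import json

def parse_json_output(raw):
    """Parse model output as JSON, tolerating text around the JSON object.

    Returns the parsed value, or None if no valid JSON is found.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fall back to the span between the first "{" and the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None

print(parse_json_output('Result: {"Customer": {"Name": "Eric Cline"}}'))
```

Returning `None` instead of raising lets a batch run over many cases continue and count failed extractions separately.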

To extract additional information we can simply edit the template. As with YAML in InstructLab, you only need to know how to edit JSON; no complicated regular expressions or programming knowledge are needed. Let's add the customer's address and vehicle, plus the date, time and location of the accident:

In [16]:
def predict_nuextract(input_text):
    # Extended JSON template: more customer fields plus a "Case" section
    template = """
    {
        "Customer": {
            "Name": "",
            "Address": "",
            "Policy Number": "",
            "Telephone Number": "",
            "Email Address": "",
            "Vehicle": "",
        },
        "Case": {
            "Accident Location": "",
            "Date and Time": "",
        }
    }
    """
    # Wrap the template and the input text in the NuExtract prompt format
    prompt = f"""<|input|>\n ### Template:\n{template}\n### Text:\n{input_text}\n\n<|output|>"""

    return prompt

Let's check the result with the new template:

In [17]:
result = extract_agent.run(predict_nuextract(cases[0]['description']))

In [18]:
parsed_result = json.loads(result.content)
pprint(parsed_result)

In this case the customer address was not recognized and extracted even though it is clearly contained in the case data. To test the performance more systematically, we generated 100 synthetic test cases with ground truth data and used them to check the performance of different model flavours (e.g. tiny and "smol") and the effect of different quantizations. We found that the FP16 quantization of the NuExtract tiny model yields the best performance/speed trade-off, with 99% accuracy.
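
How the accuracy figure was computed is not shown in this notebook. As one plausible metric, the sketch below scores exact matches per field against the ground truth. The helper names `flatten` and `field_accuracy` and the tiny example data are assumptions made for illustration, not the evaluation code actually used.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into {"Customer.Name": value, ...}."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def field_accuracy(predictions, ground_truths):
    """Share of ground-truth fields whose predicted value matches exactly
    (after stripping surrounding whitespace)."""
    correct = total = 0
    for predicted, expected in zip(predictions, ground_truths):
        flat_predicted = flatten(predicted)
        for path, value in flatten(expected).items():
            total += 1
            if str(flat_predicted.get(path, "")).strip() == str(value).strip():
                correct += 1
    return correct / total if total else 0.0

# Toy example: one case, two fields, one of them missed by the model.
truth = [{"Customer": {"Name": "Eric Cline", "Policy Number": "BC-857143475"}}]
predicted = [{"Customer": {"Name": "Eric Cline", "Policy Number": ""}}]
print(field_accuracy(predicted, truth))  # 0.5
```

Exact matching is strict: a correctly extracted phone number in a different format would count as an error, so a real evaluation may want to normalize values per field first.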