# NuExtract 2.0 Inference

In this notebook we will provide examples of how to use the NuExtract 2.0 models for inference.

First, let's load a model.

In [1]:
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_name = "numind/NuExtract-2.0-2B"
# model_name = "numind/NuExtract-2.0-8B"

processor = AutoProcessor.from_pretrained(model_name, 
                                          trust_remote_code=True, 
                                          padding_side='left',
                                          use_fast=True)
model = AutoModelForVision2Seq.from_pretrained(model_name, 
                                               trust_remote_code=True, 
                                               torch_dtype=torch.bfloat16,
                                               attn_implementation="flash_attention_2",
                                               device_map="auto")
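
The `attn_implementation="flash_attention_2"` setting above needs the separate `flash-attn` package to be installed. The following is a minimal sketch of a fallback, assuming that the `"sdpa"` attention implementation built into transformers is an acceptable substitute for this model; the helper name `pick_attn_implementation` is hypothetical and not part of any library.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash_attn is importable, else "sdpa"."""
    # find_spec only checks that the package can be located; it does not import it
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # assumption: "sdpa" is a suitable fallback for this model
    return "sdpa"


attn_implementation = pick_attn_implementation()
print(attn_implementation)
```

The returned string could then be passed as `attn_implementation=...` in the `from_pretrained` call.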

## Preparing Model Inputs

Before using the model, we also need to make sure our prompts are properly formatted to work with NuExtract. NuExtract expects all input information to come as a single user chat prompt, formatted as follows:

```python
f"""
# Template:
{template}
# Context:
{text}
"""
```
and if in-context examples are provided:
```python
f"""
# Template:
{template}
# Examples:
## Input:
{input1}
## Output:
{output1}
## Input:
{input2}
## Output:
{output2}
# Context:
{text}
"""
```

If you are working with image inputs, use image placeholders for `text`, `input1`, etc. The tokens representing the actual image content are injected at the location of these placeholders later.

The following function makes this formatting more convenient.

In [2]:
def construct_messages(document, template, examples=None, image_placeholder="<|vision_start|><|image_pad|><|vision_end|>"):
    """
    Construct the individual NuExtract message texts, prior to chat template formatting.
    """
    images = []
    # add few-shot examples if needed
    if examples is not None and len(examples) > 0:
        icl = "# Examples:\n"
        for row in examples:
            example_input = row['input']
            
            if not isinstance(row['input'], str):
                example_input = image_placeholder
                images.append(row['input'])
                
            icl += f"## Input:\n{example_input}\n## Output:\n{row['output']}\n"
    else:
        icl = ""
        
    # if input document is an image, set text to an image placeholder
    text = document
    if not isinstance(document, str):
        text = image_placeholder
        images.append(document)
    text = f"""# Template:\n{template}\n{icl}# Context:\n{text}"""
    
    messages = [
        {
            "role": "user",
            "content": [{"type": "text", "text": text}] + images,
        }
    ]
    return messages
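
Hand-written template strings are easy to mistype. As a sketch, the template can instead be built from a Python dictionary with `json.dumps`, which always yields valid JSON. The helper name `make_template` is hypothetical, and this assumes the default `json.dumps` serialization is an acceptable template format (it matches the one-line templates used in this notebook).

```python
import json


def make_template(schema: dict) -> str:
    """Serialize a template dictionary to a JSON string."""
    # ensure_ascii=False keeps non-ASCII field names readable
    return json.dumps(schema, ensure_ascii=False)


template = make_template({"names": ["verbatim-string"]})
print(template)
```

The resulting string can be passed as the `template` argument of `construct_messages`.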

## Inference
### Basic Example

Now we are ready to run the model!

Let's start with a basic text-only example, where we want to extract people's names from a short text.

In [3]:
from qwen_vl_utils import process_vision_info

template = """{"names": ["verbatim-string"]}"""
document = "John went to the restaurant with Mary. James went to the cinema."

# prepare the user message content
messages = construct_messages(document, template)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

image_inputs = process_vision_info(messages)[0]
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

Our NuExtract message is now formatted with the standard chat template; the tokenized version (`inputs`) will be given directly to the model.

In [4]:
print(text)

<|im_start|>user
# Template:
{"names": ["verbatim-string"]}
# Context:
John went to the restaurant with Mary. James went to the cinema.<|im_end|> 
<|im_start|>assistant



`image_inputs` is `None` in this case because this is a text-only example.

In [5]:
print(image_inputs)

None


Now let's actually run the model.

In [6]:
# we choose greedy decoding here, which works well for most information extraction tasks
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: Generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)



['{"names": ["John", "Mary", "James"]}']
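
The model returns its extraction as a JSON string, so it usually needs to be parsed before further use. Here is a minimal sketch using the standard `json` module; the helper name `parse_outputs` is hypothetical, and returning `None` for unparseable strings is just one possible error-handling choice.

```python
import json


def parse_outputs(output_text):
    """Parse each decoded string as JSON; use None where parsing fails."""
    parsed = []
    for raw in output_text:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            # e.g. generation was cut off by max_new_tokens
            parsed.append(None)
    return parsed


# sample value copied from the output printed above
output_text = ['{"names": ["John", "Mary", "James"]}']
results = parse_outputs(output_text)
print(results[0]["names"])
```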


### In-Context Examples

Sometimes the model might not perform as well as we want because our task is challenging or involves some degree of ambiguity. Alternatively, we may want the model to follow some specific formatting, or just give it a bit more help. In cases like this it can be valuable to provide "in-context examples" to help NuExtract better understand the task.

To do so, we pass `construct_messages` a list `examples` of dictionaries, each holding an input/output pair. In the example below, we show the model that we want the extracted names to be in capital letters with `++` on either side (for the sake of illustration).

In [7]:
template = """{"names": ["verbatim-string"]}"""
text = "John went to the restaurant with Mary. James went to the cinema."
examples = [
    {
        "input": "Stephen is the manager at Susan's store.",
        "output": """{"names": ["++STEPHEN++", "++SUSAN++"]}"""
    }
]

messages = construct_messages(text, template, examples)

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

image_inputs = process_vision_info(messages)[0]
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

We can see below that the in-context example has now been included in the model prompt, specifically between the template and context components.

In [8]:
print(text)

<|im_start|>user
# Template:
{"names": ["verbatim-string"]}
# Examples:
## Input:
Stephen is the manager at Susan's store.
## Output:
{"names": ["++STEPHEN++", "++SUSAN++"]}
# Context:
John went to the restaurant with Mary. James went to the cinema.<|im_end|> 
<|im_start|>assistant



In [9]:
# we choose greedy decoding here, which works well for most information extraction tasks
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: Generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

['{"names": ["++JOHN++", "++MARY++", "++JAMES++"]}']


Adding several in-context examples to your input may improve performance further.
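
When several examples are used, writing each output as a raw JSON string gets error-prone. The sketch below builds the `examples` list from (input, output dictionary) pairs; the helper name `make_examples` is hypothetical, and the second example sentence is made up purely for illustration.

```python
import json


def make_examples(pairs):
    """Turn (input, output_dict) pairs into the list of dicts construct_messages expects."""
    return [
        {"input": inp, "output": json.dumps(out, ensure_ascii=False)}
        for inp, out in pairs
    ]


examples = make_examples([
    ("Stephen is the manager at Susan's store.",
     {"names": ["++STEPHEN++", "++SUSAN++"]}),
    # made-up second example, for illustration only
    ("Priya emailed Tom about the invoice.",
     {"names": ["++PRIYA++", "++TOM++"]}),
])
print(examples[0]["output"])
```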

### Image Inputs

To give NuExtract an image instead of text, we pass a dictionary specifying the desired image file as the document, rather than a string, e.g. `{"type": "image", "image": "file://image.jpg"}`.

You can also specify an image URL (e.g. `{"type": "image", "image": "http://path/to/your/image.jpg"}`) or base64 encoding (e.g. `{"type": "image", "image": "data:image;base64,/9j/..."}`).
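
For the base64 option, a sketch of how such a dictionary might be built from raw image bytes with the standard `base64` module follows. The helper name `image_bytes_to_message` is hypothetical; the `data:image;base64,` prefix is the one shown above.

```python
import base64


def image_bytes_to_message(data: bytes) -> dict:
    """Wrap raw image bytes in an image dictionary using a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image", "image": f"data:image;base64,{encoded}"}


# In practice the bytes would come from a file, e.g. open("1.jpg", "rb").read().
# These four bytes are only the start of a JPEG header, not a valid image.
sample = b"\xff\xd8\xff\xe0"
image_message = image_bytes_to_message(sample)
print(image_message["image"])
```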

In the example below, we give an image of a receipt (`1.jpg`) and ask the model to extract the name of the store. We also provide an in-context learning (ICL) example of a receipt from Walmart (`0.jpg`).

In [10]:
template = """{"store": "verbatim-string"}"""
examples = [
    {
        "input": {"type": "image", "image": "file://0.jpg"},
        "output": """{"store": "WALMART"}"""
    }
]
document = {"type": "image", "image": "file://1.jpg"}

messages = construct_messages(document, template, examples)

text = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

image_inputs = process_vision_info(messages)[0]
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

Just like in the text-only case above, our in-context example has been included in the prompt before the main context.

In [11]:
print(text)

<|im_start|>user
# Template:
{"store": "verbatim-string"}
# Examples:
## Input:
<|vision_start|><|image_pad|><|vision_end|>
## Output:
{"store": "WALMART"}
# Context:
<|vision_start|><|image_pad|><|vision_end|><|im_end|> 
<|im_start|>assistant



Now if we look at `image_inputs` we will see that it contains actual images. When we pass this along with `text` to the `processor` it will automatically encode the images and inject a tokenized representation into the image placeholders within `text`.

In [12]:
print(image_inputs)

[<PIL.Image.Image image mode=RGB size=588x896 at 0x7FD1CCFF1A20>, <PIL.Image.Image image mode=RGB size=476x980 at 0x7FD1CCFF1750>]


In [13]:
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: Generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

['{"store": "TRADER JOE\'S"}']


### Batched Inference

Finally, we can run batched inference over a list of input examples, regardless of whether they contain text, images, and/or ICL examples.

In [14]:
inputs = [
    # image input with no ICL examples
    {
        "document": {"type": "image", "image": "file://0.jpg"},
        "template": """{"store_name": "verbatim-string"}""",
    },
    # image input with 1 ICL example
    {
        "document": {"type": "image", "image": "file://1.jpg"},
        "template": """{"store_name": "verbatim-string"}""",
        "examples": [
            {
                "input": {"type": "image", "image": "file://0.jpg"},
                "output": """{"store_name": "Walmart"}""",
            }
        ],
    },
    # text input with no ICL examples
    {
        "document": "John went to the restaurant with Mary. James went to the cinema.",
        "template": """{"names": ["verbatim-string"]}""",
    },
    # text input with ICL example
    {
        "document": "John went to the restaurant with Mary. James went to the cinema.",
        "template": """{"names": ["verbatim-string"]}""",
        "examples": [
            {
                "input": "Stephen is the manager at Susan's store.",
                "output": """{"names": ["STEPHEN", "SUSAN"]}"""
            }
        ],
    },
]

messages = [
    construct_messages(
        x["document"], 
        x["template"], 
        x.get("examples")
    ) for x in inputs
]

# apply chat template to each example individually
texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in messages
]

image_inputs = process_vision_info(messages)[0]
inputs = processor(
    text=texts,
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")


generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Batch Inference
generated_ids = model.generate(**inputs, **generation_config)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
for y in output_texts:
    print(y)

{"store_name": "WAL-MART"}
{"store_name": "Trader Joe's"}
{"names": ["John", "Mary", "James"]}
{"names": ["JOHN", "MARY", "JAMES"]}
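
For longer input lists, a single call may not fit in GPU memory, so it can help to split the list into smaller batches and run the steps above once per batch. This is a minimal sketch of the splitting step only; the helper name `chunked` and the `batch_size` value are hypothetical, and a suitable batch size depends on your hardware and inputs.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# stand-in for a list of input dictionaries like the one above
requests = [f"document {i}" for i in range(5)]
batches = list(chunked(requests, 2))
print([len(b) for b in batches])
```

Each batch would then go through `construct_messages`, the chat template, the `processor`, and `model.generate` exactly as in the cell above.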
