[bug] JSON schema validation throws an exception when an LLM generates an array instead of an object #594

@chomechome

Describe the bug
When an LLM generates a JSON array instead of an object, guard.__call__ and guard.parse raise a TypeError instead of reporting a validation failure. This breaks the reask cycle and forces the calling code to wrap every call in an extra try/except block.

Traceback (most recent call last):
  File "<redacted>", line 16, in <module>
    guard.parse(llm_output)
  File "<redacted>/lib/python3.9/site-packages/guardrails/guard.py", line 810, in parse
    return guard_context.run(
  File "<redacted>/lib/python3.9/site-packages/guardrails/guard.py", line 797, in __parse
    return self._sync_parse(
  File "<redacted>/lib/python3.9/site-packages/guardrails/guard.py", line 861, in _sync_parse
    call = runner(call_log=call_log, prompt_params=prompt_params)
  File "<redacted>/lib/python3.9/site-packages/guardrails/run.py", line 226, in __call__
    raise e
  File "<redacted>/lib/python3.9/site-packages/guardrails/run.py", line 176, in __call__
    iteration = self.step(
  File "<redacted>/lib/python3.9/site-packages/guardrails/utils/telemetry_utils.py", line 216, in to_trace_or_not_to_trace
    return fn(*args, **kwargs)
  File "<redacted>/lib/python3.9/site-packages/guardrails/run.py", line 338, in step
    raise e
  File "<redacted>/lib/python3.9/site-packages/guardrails/run.py", line 321, in step
    validated_output = self.validate(
  File "<redacted>/lib/python3.9/site-packages/guardrails/run.py", line 619, in validate
    validated_output = output_schema.validate(
  File "<redacted>/lib/python3.9/site-packages/guardrails/schema/json_schema.py", line 306, in validate
    raise TypeError(f"Argument `data` must be a dictionary, not {type(data)}.")
TypeError: Argument `data` must be a dictionary, not <class 'list'>.

Here is what the LLM generated (simplified from the real example):

[
  {"field": "hello, world!"}
]
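
For reference, the extra try/except mentioned above could look roughly like the sketch below. The helper name `parse_or_none` is hypothetical and not part of guardrails; `parse` stands in for a bound method such as `guard.parse`, and the sketch assumes the only failure to absorb is the TypeError from the traceback.

```python
from typing import Any, Callable, Optional


def parse_or_none(parse: Callable[[str], Any], llm_output: str) -> Optional[Any]:
    """Call `parse` and absorb the TypeError described in this issue.

    Hypothetical workaround helper: `parse` is any callable that takes the raw
    LLM output, for example a bound `guard.parse`.
    """
    try:
        return parse(llm_output)
    except TypeError:
        # Raised when the top-level JSON value is a list rather than a dict.
        return None
```

A caller would then treat a `None` result as a failed validation and decide on its own whether to retry, which is exactly the boilerplate the reask cycle is supposed to make unnecessary.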

To Reproduce

import pydantic
from guardrails import Guard


class Foo(pydantic.BaseModel):
    field: str


llm_output = '''```json
[
  {"field": "hello, world!"}
]
```'''

guard = Guard.from_pydantic(output_class=Foo)
guard.parse(llm_output)

Expected behavior
Since the schema declares the top-level type to be an object, an array should simply fail to match the schema: guard.parse should return a ValidationOutcome with validation_passed == False instead of raising.
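
A minimal sketch of the behavior I would expect, written with the standard library only. The `Outcome` dataclass and `check_top_level_object` function are hypothetical stand-ins for illustration, not guardrails' actual ValidationOutcome or validation code.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Outcome:
    """Hypothetical stand-in for a validation result object."""
    validation_passed: bool
    validated_output: Optional[dict] = None
    error: Optional[str] = None


def check_top_level_object(raw_json: str) -> Outcome:
    """Report a failed validation instead of raising on a non-object value."""
    try:
        data: Any = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return Outcome(False, error=f"invalid JSON: {exc}")
    if not isinstance(data, dict):
        # A list (or any other non-dict value) is a schema mismatch, not a crash.
        return Outcome(False, error=f"expected a JSON object, got {type(data).__name__}")
    return Outcome(True, validated_output=data)
```

With the array from this report, `check_top_level_object('[{"field": "hello, world!"}]')` returns an outcome whose `validation_passed` is False, which is the kind of result that could feed into a reask.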

Library version:
guardrails-ai==0.4.0
