# Fuzzy JSON Parser - Robust JSON Parsing with Error Correction

The fuzzy JSON parser handles malformed JSON from LLM outputs and other unreliable sources. It provides:

**Core Features:**
- **Multi-Stage Parsing**: Progressive fallback strategy (direct → clean → fix)
- **Quote Correction**: Converts single quotes to double quotes
- **Bracket Balancing**: Auto-closes unclosed brackets and braces
- **Whitespace Normalization**: Cleans up irregular spacing
- **Trailing Comma Removal**: Strips commas before closing brackets
- **Unquoted Key Fixing**: Ensures object keys are quoted

In [1]:
from lionherd_core.libs.string_handlers import fuzzy_json

## 1. Basic Usage - Valid JSON

Works with standard JSON strings.

In [2]:
# Valid JSON object
valid_obj = '{"name": "Alice", "age": 30, "active": true}'
result = fuzzy_json(valid_obj)
print(f"Type: {type(result)}")
print(f"Data: {result}")

Type: <class 'dict'>
Data: {'name': 'Alice', 'age': 30, 'active': True}


In [3]:
# Valid JSON array
valid_array = '[{"id": 1}, {"id": 2}, {"id": 3}]'
result = fuzzy_json(valid_array)
print(f"Type: {type(result)}")
print(f"Length: {len(result)}")
print(f"Data: {result}")

Type: <class 'list'>
Length: 3
Data: [{'id': 1}, {'id': 2}, {'id': 3}]


## 2. Single Quote Correction

Converts single-quoted strings to double-quoted ones. Single quotes are common when a model emits a Python `dict` repr instead of JSON.

In [4]:
# Single quotes instead of double
single_quotes = "{'name': 'Bob', 'role': 'engineer'}"
result = fuzzy_json(single_quotes)
print(f"✓ Parsed: {result}")

✓ Parsed: {'name': 'Bob', 'role': 'engineer'}


In [5]:
# Mixed quotes
mixed_quotes = "{'user': \"alice\", 'id': 123}"
result = fuzzy_json(mixed_quotes)
print(f"✓ Parsed: {result}")

✓ Parsed: {'user': 'alice', 'id': 123}
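
One simple way to get this effect is a blanket character replacement. This is a rough sketch, not the library's actual implementation. The helper name `swap_single_quotes` is hypothetical, and the approach assumes no apostrophes appear inside string values.

```python
import json


def swap_single_quotes(text: str) -> str:
    """Naively replace every single quote with a double quote.

    Assumption: values contain no apostrophes (e.g. "it's"); such a
    value would be corrupted by this blanket replacement.
    """
    return text.replace("'", '"')


fixed = swap_single_quotes("{'user': \"alice\", 'id': 123}")
print(fixed)              # {"user": "alice", "id": 123}
print(json.loads(fixed))  # {'user': 'alice', 'id': 123}
```

A production-grade fixer would need to track string boundaries so that apostrophes inside values survive.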


## 3. Trailing Comma Removal

Strips trailing commas that cause JSON parse errors.

In [6]:
# Trailing comma in object
trailing_obj = '{"x": 1, "y": 2,}'
result = fuzzy_json(trailing_obj)
print(f"✓ Parsed: {result}")

✓ Parsed: {'x': 1, 'y': 2}


In [7]:
# Trailing comma in array of objects
trailing_array = '[{"id": 1}, {"id": 2},]'
result = fuzzy_json(trailing_array)
print(f"✓ Parsed: {result}")

✓ Parsed: [{'id': 1}, {'id': 2}]
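
A trailing-comma fix is often done with a single regular expression. The sketch below shows one way to do it under stated assumptions and does not claim to match how `fuzzy_json` does it internally. `strip_trailing_commas` is a hypothetical name.

```python
import json
import re

# A comma, optional whitespace, then a closing brace or bracket.
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def strip_trailing_commas(text: str) -> str:
    """Drop commas that sit directly before } or ].

    Assumption: no string value contains a ",}" or ",]" sequence,
    because this regex does not track whether it is inside a string.
    """
    return TRAILING_COMMA.sub(r"\1", text)


print(strip_trailing_commas('{"x": 1, "y": 2,}'))  # {"x": 1, "y": 2}
print(json.loads(strip_trailing_commas('[{"id": 1}, {"id": 2},]')))
```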


## 4. Unquoted Key Fixing

Wraps bare object keys in double quotes. Unquoted keys are common in JavaScript-style object literals.

In [8]:
# Unquoted keys
unquoted = '{name: "Charlie", age: 25}'
result = fuzzy_json(unquoted)
print(f"✓ Parsed: {result}")

✓ Parsed: {'name': 'Charlie', 'age': 25}


In [9]:
# Complex nested unquoted keys
nested_unquoted = '{user: {name: "Dave", role: "admin"}, active: true}'
result = fuzzy_json(nested_unquoted)
print(f"✓ Parsed: {result}")

✓ Parsed: {'user': {'name': 'Dave', 'role': 'admin'}, 'active': True}
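
To make the idea concrete, here is a minimal regex-based sketch of key quoting. It is an illustration under assumptions, not the library's code: `quote_bare_keys` is a hypothetical helper, it only recognizes identifier-like keys, and it does not check whether a match falls inside a string value.

```python
import json
import re

# An opening brace or comma, then an identifier-like key, then a colon.
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')


def quote_bare_keys(text: str) -> str:
    """Wrap identifier-like object keys in double quotes.

    Assumption: string values never contain text shaped like
    ', key:' because the regex cannot tell values from structure.
    """
    return BARE_KEY.sub(r'\1"\2"\3', text)


nested = '{user: {name: "Dave", role: "admin"}, active: true}'
print(quote_bare_keys(nested))
print(json.loads(quote_bare_keys(nested)))
```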


## 5. Bracket Balancing

Appends the missing closing brackets and braces, which matters most when LLM output is truncated mid-structure.

In [10]:
# Unclosed object
unclosed_obj = '{"status": "running", "progress": 75'
result = fuzzy_json(unclosed_obj)
print(f"✓ Parsed: {result}")

✓ Parsed: {'status': 'running', 'progress': 75}


In [11]:
# Unclosed array
unclosed_array = '[{"id": 1}, {"id": 2}'
result = fuzzy_json(unclosed_array)
print(f"✓ Parsed: {result}")

✓ Parsed: [{'id': 1}, {'id': 2}]


In [12]:
# Deeply nested unclosed
deep_unclosed = '{"data": {"items": [{"value": 123'
result = fuzzy_json(deep_unclosed)
print(f"✓ Parsed: {result}")

✓ Parsed: {'data': {'items': [{'value': 123}]}}
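
Bracket balancing of this kind is commonly implemented with a stack. The sketch below is one possible approach, not necessarily what `fuzzy_json` does. `close_brackets` is a hypothetical name; the error messages are borrowed from the outputs in this notebook for readability. It skips over double-quoted strings but does not repair an unterminated string.

```python
import json

CLOSERS = {"{": "}", "[": "]"}


def close_brackets(text: str) -> str:
    """Append closers for every bracket still open at the end of text."""
    stack = []          # expected closing characters, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Ignore brackets inside strings; honor backslash escapes.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in CLOSERS:
            stack.append(CLOSERS[ch])
        elif ch in ("}", "]"):
            if not stack:
                raise ValueError("Extra closing bracket found.")
            if stack.pop() != ch:
                raise ValueError("Mismatched brackets.")
    # Close innermost structures first.
    return text + "".join(reversed(stack))


repaired = close_brackets('{"data": {"items": [{"value": 123')
print(repaired)              # {"data": {"items": [{"value": 123}]}}
print(json.loads(repaired))
```

The same stack also detects closers with no matching opener and closers of the wrong type, which is why those two cases can be reported as errors rather than repaired.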


## 6. Whitespace Normalization

Handles irregular spacing, tabs, and newlines around keys, values, and separators.

In [13]:
# Excessive whitespace
spaced = '{  "key"  :   "value"  ,  "num"  :  42  }'
result = fuzzy_json(spaced)
print(f"✓ Parsed: {result}")

✓ Parsed: {'key': 'value', 'num': 42}


In [14]:
# Mixed newlines and tabs
multiline = """{\n  "line1": "text",\n\t"line2": "more"\n}"""
result = fuzzy_json(multiline)
print(f"✓ Parsed: {result}")

✓ Parsed: {'line1': 'text', 'line2': 'more'}
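
For context, standard JSON treats spaces, tabs, and newlines between tokens as insignificant, so both inputs above are already valid for a strict parser. The sketch below uses the standard-library `json` module to show this. Whether `fuzzy_json` rewrites whitespace internally for these inputs is not something this notebook demonstrates.

```python
import json

spaced = '{  "key"  :   "value"  ,  "num"  :  42  }'
multiline = '{\n  "line1": "text",\n\t"line2": "more"\n}'

# Both parse without any repair step.
print(json.loads(spaced))     # {'key': 'value', 'num': 42}
print(json.loads(multiline))  # {'line1': 'text', 'line2': 'more'}
```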


## 7. Combined Error Correction

Real-world LLM output often has multiple issues simultaneously.

In [15]:
# Multiple issues: single quotes, unquoted keys, trailing comma, unclosed
messy = "{'user': {name: 'Eve', tags: ['admin', 'user',],}"
result = fuzzy_json(messy)
print(f"✓ Parsed: {result}")
print(f"Type: {type(result)}")
print(f"Nested access: {result['user']['name']}")

✓ Parsed: {'user': {'name': 'Eve', 'tags': ['admin', 'user']}}
Type: <class 'dict'>
Nested access: Eve


In [16]:
# LLM-style output with Python syntax
llm_output = "{'response': 'success', 'data': {count: 5, items: [1, 2, 3"
result = fuzzy_json(llm_output)
print(f"✓ Parsed: {result}")

✓ Parsed: {'response': 'success', 'data': {'count': 5, 'items': [1, 2, 3]}}


## 8. Error Cases

Not every string is recoverable. `fuzzy_json` raises `ValueError` or `TypeError` for the inputs below.

In [17]:
# Empty string
try:
    fuzzy_json("")
except ValueError as e:
    print(f"✓ Empty string rejected: {e}")

✓ Empty string rejected: Input string is empty


In [18]:
# Non-string input
try:
    fuzzy_json({"already": "dict"})
except TypeError as e:
    print(f"✓ Non-string rejected: {e}")

✓ Non-string rejected: Input must be a string


In [19]:
# Extra closing brackets (not fixable)
try:
    fuzzy_json('{"key": "value"}}')  # One too many closing braces
except ValueError as e:
    print(f"✓ Extra closing bracket detected: {e}")

✓ Extra closing bracket detected: Extra closing bracket found.


In [20]:
# Mismatched brackets (not fixable)
try:
    fuzzy_json('[{"key": "value"]')  # { is still open when ] arrives
except ValueError as e:
    print(f"✓ Mismatched brackets detected: {e}")

✓ Mismatched brackets detected: Mismatched brackets.


In [21]:
# Completely invalid syntax
try:
    fuzzy_json("This is not JSON at all!")
except ValueError as e:
    print(f"✓ Invalid JSON rejected: {e}")

✓ Invalid JSON rejected: Invalid JSON string


In [22]:
# List of primitives (not list[dict])
try:
    fuzzy_json("[1, 2, 3]")  # List of ints, not list[dict]
except TypeError as e:
    print(f"✓ Primitive list rejected: {e}")

✓ Primitive list rejected: fuzzy_json returns dict or list[dict], got list with non-dict element at index 0: int
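
Because these failures surface as `ValueError` or `TypeError`, calling code may want a single place to catch both. The wrapper below is a hypothetical convenience, not part of `lionherd_core`. It takes the parser as an argument so the sketch runs with the standard-library `json.loads` as a stand-in; passing `fuzzy_json` instead is assumed to work the same way.

```python
import json
from typing import Any, Callable


def parse_or_default(
    text: Any,
    parser: Callable[[Any], Any],
    default: Any = None,
) -> Any:
    """Return parser(text), or default if the parser rejects the input."""
    try:
        return parser(text)
    except (ValueError, TypeError):
        # json.JSONDecodeError is a ValueError subclass, so it lands here too.
        return default


print(parse_or_default('{"ok": true}', json.loads))                    # {'ok': True}
print(parse_or_default("This is not JSON at all!", json.loads, {}))    # {}
print(parse_or_default(None, json.loads, {}))                          # {}
```

Swallowing errors hides the reason for the failure, so logging the exception before returning the default is usually worthwhile.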


## 9. Performance Characteristics

The three-stage strategy tries the cheapest parse first and applies heavier repairs only when an earlier stage fails.

In [23]:
# Stage 1: Direct parse (fastest - no overhead)
valid = '{"fast": true}'
result = fuzzy_json(valid)
print(f"Stage 1 (direct): {result}")

Stage 1 (direct): {'fast': True}


In [24]:
# Stage 2: Quote + whitespace cleaning (light overhead)
needs_clean = "{'quotes': 'fixed',  'space': 'normalized'  }"
result = fuzzy_json(needs_clean)
print(f"Stage 2 (clean): {result}")

Stage 2 (clean): {'quotes': 'fixed', 'space': 'normalized'}


In [25]:
# Stage 3: Bracket balancing (full overhead)
needs_fix = '{"unclosed": "object"'
result = fuzzy_json(needs_fix)
print(f"Stage 3 (fix): {result}")

Stage 3 (fix): {'unclosed': 'object'}
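
The direct → clean → fix progression can be sketched as a loop over increasingly aggressive transforms. This is an illustrative reconstruction built on the standard-library `json` module, not the library's source. All helper names are hypothetical, the regex repairs ignore string boundaries, and the return-type check described in section 10 is omitted.

```python
import json
import re


def _clean(text: str) -> str:
    """Stage 2 sketch: quotes, bare keys, trailing commas, outer whitespace."""
    text = text.replace("'", '"')
    text = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', text)
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return text.strip()


def _close(text: str) -> str:
    """Stage 3 sketch: append closers for brackets still open at the end."""
    closers = {"{": "}", "[": "]"}
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(closers[ch])
        elif ch in "}]" and stack:
            stack.pop()
    return text + "".join(reversed(stack))


def parse_with_fallback(text: str):
    """Try direct, then cleaned, then cleaned-and-closed input."""
    stages = (lambda s: s, _clean, lambda s: _close(_clean(s)))
    for stage in stages:
        try:
            return json.loads(stage(text))
        except json.JSONDecodeError:
            continue  # fall through to the next, more expensive stage
    raise ValueError("Invalid JSON string")


print(parse_with_fallback('{"fast": true}'))
print(parse_with_fallback("{'user': {name: 'Eve', tags: ['admin', 'user',],}"))
```

Valid input returns on the first iteration without touching the regexes or the bracket scan, which is the point of ordering the stages by cost.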


## 10. Return Type Guarantees

A successful call returns a `dict` or a `list[dict]`, never a bare primitive or a list of primitives (see the `TypeError` in section 8).

In [26]:
# Object → dict
obj = fuzzy_json('{"key": "value"}')
print(f"Object type: {type(obj)} → {obj}")

Object type: <class 'dict'> → {'key': 'value'}


In [27]:
# Array → list
arr = fuzzy_json('[{"a": 1}, {"b": 2}]')
print(f"Array type: {type(arr)} → {arr}")

Array type: <class 'list'> → [{'a': 1}, {'b': 2}]


In [28]:
# Verify type annotation
from typing import get_type_hints

hints = get_type_hints(fuzzy_json)
print(f"Return type: {hints['return']}")

Return type: dict[str, typing.Any] | list[dict[str, typing.Any]]


## Summary Checklist

**Fuzzy JSON Essentials:**
- ✅ Three-stage fallback: direct → clean → fix
- ✅ Single quote → double quote conversion
- ✅ Auto-closes unclosed brackets/braces
- ✅ Strips trailing commas
- ✅ Quotes unquoted object keys
- ✅ Normalizes whitespace (spaces, newlines, tabs)
- ✅ Returns dict or list[dict] only
- ✅ Fast path for valid JSON (no overhead)
- ✅ Detects unfixable errors (extra brackets, mismatches)
- ✅ Uses orjson for performance

**Error Limits:**
- ❌ Cannot fix extra closing brackets
- ❌ Cannot fix mismatched bracket types
- ❌ Cannot parse non-JSON text
- ❌ Rejects empty strings and non-string input

**Use Cases:**
- LLM JSON output (often malformed)
- API responses with lenient formatting
- User-provided JSON (typos, Python syntax)
- Truncated JSON from streaming
- Log parsing with incomplete entries

**Next Steps:**
- See `lionherd_core.libs.string_handlers` for related utilities
- See `lionherd_core.base.Element` for structured entity serialization
- See `lionherd_core.ops` for LLM response parsing patterns