# LNDL Fuzzy Parsing - Handling Real LLM Outputs

LNDL fuzzy parsing handles typos and variations in LLM-generated structured outputs.

**Core Features:**
- **Fuzzy Name Matching**: Auto-corrects typos in model/field/spec names using Jaro-Winkler similarity
- **Configurable Thresholds**: Global or per-category thresholds (0.7-1.0)
- **Strict Mode**: Set threshold=1.0 for exact matching only
- **Tie Detection**: Raises `AmbiguousMatchError` when multiple candidates score within 0.05 of each other
- **Production Ready**: Default 0.85 threshold proven in production
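
To give a feel for the scores these thresholds are compared against, here is a minimal pure-Python sketch of textbook Jaro-Winkler. It assumes the usual prefix scale of 0.1 and a prefix cap of 4 characters; the library may use a different implementation or normalization (for example case folding), so treat the numbers as illustrative rather than as what `parse_lndl_fuzzy` computes internally.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    # Characters match only if they sit within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the shared prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


print(f"{jaro_winkler('Reprot', 'Report'):.4f}")  # 0.9611
print(f"{jaro_winkler('Rprt', 'Report'):.4f}")    # 0.9000
```

Under these assumptions a swapped-letter typo such as `Reprot` clears even a 0.95 threshold, while a heavily abbreviated `Rprt` only reaches 0.90.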

**Architecture:**
1. Parse LNDL response (extract lvars, lacts, OUT{})
2. Pre-correct typos using fuzzy matching
3. Call strict resolver with corrected inputs (zero duplication)
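
The three steps above can be sketched in a few lines. This is a simplified illustration, not the library's code: `pre_correct`, `strict_resolve`, and `resolve_fuzzy` are hypothetical names, and `difflib.SequenceMatcher.ratio()` stands in for Jaro-Winkler, so the 0.8 cutoff used here is not comparable to the library's 0.85 default.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in scorer: difflib's ratio, NOT Jaro-Winkler.
    return SequenceMatcher(None, a, b).ratio()


def pre_correct(name: str, valid_names: set[str], threshold: float = 0.8) -> str:
    """Step 2: map a possibly misspelled name onto a valid one."""
    if name in valid_names:
        return name
    if threshold >= 1.0:
        # Strict mode: no fuzzy scoring at all.
        raise KeyError(f"{name!r} not found (exact match required)")
    best_score, best = max((similarity(name, c), c) for c in sorted(valid_names))
    if best_score < threshold:
        raise KeyError(f"{name!r} not found above threshold {threshold}")
    return best


def strict_resolve(fields: dict[str, str], schema: set[str]) -> dict[str, str]:
    """Step 3: exact-match resolver that knows nothing about fuzzy matching."""
    unknown = set(fields) - schema
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    return fields


def resolve_fuzzy(fields: dict[str, str], schema: set[str], threshold: float = 0.8) -> dict[str, str]:
    # Correct the names first, then hand off to the unchanged strict resolver.
    corrected = {pre_correct(k, schema, threshold): v for k, v in fields.items()}
    return strict_resolve(corrected, schema)


schema = {"title", "summary", "confidence"}
print(resolve_fuzzy({"summry": "Review", "confidence": "0.95"}, schema))
```

Because correction happens before resolution, the strict resolver needs no fuzzy-specific branches, which is what the "zero duplication" note refers to.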

In [1]:
from pydantic import BaseModel, Field

from lionherd_core.lndl.errors import AmbiguousMatchError, MissingFieldError
from lionherd_core.lndl.fuzzy import parse_lndl_fuzzy
from lionherd_core.types import Operable, Spec

## 1. Setup - Define Models and Operable

Create example models representing structured LLM outputs.

In [2]:
# Example models for document analysis
class Report(BaseModel):
    title: str
    summary: str = "No summary"  # Default for focused examples
    confidence: float = Field(default=0.9, ge=0.0, le=1.0)


class Article(BaseModel):
    headline: str
    content: str = "No content"  # Default for focused examples
    author: str = "Unknown"


# Create Operable with specs (Article optional for focused examples)
operable = Operable(
    [
        Spec(Report, name="report"),
        Spec(Article, name="article", required=False),
    ]
)

print(f"Available models: {list(operable.allowed())}")

Available models: ['article', 'report']


## 2. Basic Fuzzy Matching - Auto-Correct Typos

The default threshold (0.85) automatically corrects common typos in model, field, and spec names.

In [3]:
# LLM response with typos in model/field/spec names
response_with_typos = """
<lvar Reprot.titel title>AI Safety Analysis</lvar>
<lvar Reprot.summry summary>Comprehensive review of AI safety practices.</lvar>
<lvar Reprot.confidence conf>0.95</lvar>

OUT{reprot: [title, summary, conf]}
"""

# Parse with default fuzzy matching
result = parse_lndl_fuzzy(response_with_typos, operable)

# Auto-corrected: Reprot→Report, titel→title, summry→summary, reprot→report
report = result.report
print(f"Title: {report.title}")
print(f"Summary: {report.summary}")
print(f"Confidence: {report.confidence}")
print("✓ All typos auto-corrected")

Title: AI Safety Analysis
Summary: Comprehensive review of AI safety practices.
Confidence: 0.95
✓ All typos auto-corrected


## 3. Strict Mode - Exact Matching Only

Set threshold=1.0 to require exact matches (no fuzzy correction).

In [4]:
# Same response with typos
response_with_typos = """
<lvar Reprot.titel title>AI Safety Analysis</lvar>
OUT{reprot: [title]}
"""

# Strict mode - raises error on typos
try:
    result = parse_lndl_fuzzy(response_with_typos, operable, threshold=1.0)
except MissingFieldError as e:
    print(f"✓ Strict mode error: {e}")
    print(f"Error type: {type(e).__name__}")

✓ Strict mode error: Model 'Reprot' not found. Available: ['Report', 'Article'] (strict mode: exact match required)
Error type: MissingFieldError


In [5]:
# Correct response passes strict mode
response_correct = """
<lvar Report.title title>AI Safety Analysis</lvar>
OUT{report: [title]}
"""

result = parse_lndl_fuzzy(response_correct, operable, threshold=1.0)
print("✓ Strict mode passed with exact names")
print(f"Title: {result.report.title}")

✓ Strict mode passed with exact names
Title: AI Safety Analysis


## 4. Custom Thresholds - Fine-Grained Control

Override the global threshold for specific categories (field, lvar, model, spec) with per-category arguments such as `threshold_field` and `threshold_model`.

In [6]:
# Response with a minor field typo (titl); the model name is spelled correctly
response_mixed = """
<lvar Report.titl title>Analysis</lvar>
OUT{report: [title]}
"""

# Lenient field matching (0.75), strict model matching (0.95)
result = parse_lndl_fuzzy(
    response_mixed,
    operable,
    threshold_field=0.75,  # Accept more field variations
    threshold_model=0.95,  # Require near-exact model names
)

print("✓ Field typo corrected (titl→title)")
print(f"Title: {result.report.title}")

✓ Field typo corrected (titl→title)
Title: Analysis


In [7]:
# Major model typo fails with strict threshold
response_bad_model = """
<lvar Rprt.title title>Analysis</lvar>
OUT{report: [title]}
"""

try:
    result = parse_lndl_fuzzy(response_bad_model, operable, threshold_model=0.95)
except MissingFieldError as e:
    print("✓ Major model typo rejected with strict threshold")
    print(f"Error: {e}")

✓ Major model typo rejected with strict threshold
Error: Model 'Rprt' not found above threshold 0.95. Available: ['Report', 'Article']


## 5. Ambiguous Matches - Tie Detection

Instead of guessing, the parser raises `AmbiguousMatchError` when multiple candidates score within 0.05 of each other.

In [8]:
# Create models with similar field names
class Document(BaseModel):
    content: str = "default content"  # Similar to 'context'
    context: str = "default context"


operable_ambiguous = Operable([Spec(Document, name="doc")])

# Typo close to both 'content' and 'context'
response_ambiguous = """
<lvar Document.contnt val>text</lvar>
OUT{doc: [val]}
"""

try:
    result = parse_lndl_fuzzy(response_ambiguous, operable_ambiguous, threshold=0.75)
except AmbiguousMatchError as e:
    print(f"✓ Ambiguous match detected: {e}")
    print(f"Error type: {type(e).__name__}")
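
The tie rule can be sketched independently of the scoring function. The helper below is a hypothetical illustration, not the library's implementation: `pick_best` and `AmbiguousNameError` are made-up names, the scores are passed in precomputed, and it assumes that only candidates at or above the threshold compete for a tie.

```python
class AmbiguousNameError(ValueError):
    """Hypothetical stand-in for the library's AmbiguousMatchError."""


def pick_best(
    name: str,
    scores: dict[str, float],
    threshold: float = 0.85,
    tie_margin: float = 0.05,
) -> str:
    """Return the best candidate, refusing to guess between close rivals."""
    eligible = sorted(
        ((score, cand) for cand, score in scores.items() if score >= threshold),
        reverse=True,
    )
    if not eligible:
        raise KeyError(f"{name!r} not found above threshold {threshold}")
    best_score, best = eligible[0]
    # Any other eligible candidate within the margin makes the match ambiguous.
    rivals = [cand for score, cand in eligible[1:] if best_score - score <= tie_margin]
    if rivals:
        raise AmbiguousNameError(
            f"{name!r} matches {[best, *rivals]} within {tie_margin}"
        )
    return best


# Clear winner: the runner-up is below the threshold, so there is no tie.
print(pick_best("titl", {"title": 0.96, "summary": 0.40}))  # title

# Textbook Jaro-Winkler scores 'contet' at about 0.971 against both names.
try:
    pick_best("contet", {"content": 0.971, "context": 0.971})
except AmbiguousNameError as e:
    print(f"Ambiguous: {e}")
```

Raising on a tie is the conservative choice: a silent correction to the wrong field would produce a valid-looking but incorrect object.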

## 6. Real-World Example - Complete LLM Response

Parse a realistic LLM response that mixes free-form prose, multi-line values, and several typos.

In [9]:
# Realistic LLM response with multiple typos
llm_response = """
Based on the analysis, here are my findings:

<lvar Reprot.titel title>Quarterly AI Safety Review</lvar>
<lvar Reprot.summry summary>This quarter showed significant improvements in AI safety protocols. Key areas include:
1. Enhanced monitoring systems
2. Improved incident response
3. Better alignment metrics</lvar>
<lvar Reprot.confidence conf>0.92</lvar>

OUT{reprot: [title, summary, conf]}
"""

# Parse with default fuzzy matching
result = parse_lndl_fuzzy(llm_response, operable)

report = result.report
print(f"Title: {report.title}")
print(f"\nSummary: {report.summary[:100]}...")
print(f"\nConfidence: {report.confidence}")
print("\n✓ Successfully parsed LLM response with multiple typos")

Title: Quarterly AI Safety Review

Summary: This quarter showed significant improvements in AI safety protocols. Key areas include:
1. Enhanced ...

Confidence: 0.92

✓ Successfully parsed LLM response with multiple typos


## 7. Lvar Reference Correction

Fuzzy matching also corrects typos in lvar references within OUT{} arrays.

In [10]:
# Lvar declarations are correct; the typos are in the OUT{} references
response_ref_typos = """
<lvar Report.title title>AI Safety</lvar>
<lvar Report.summary summary>Detailed analysis</lvar>
<lvar Report.confidence conf>0.88</lvar>

OUT{report: [titel, summry, conf]}
"""

# Auto-corrects: titel→title, summry→summary
result = parse_lndl_fuzzy(response_ref_typos, operable)

report = result.report
print(f"Title: {report.title}")
print(f"Summary: {report.summary}")
print(f"Confidence: {report.confidence}")
print("\n✓ Lvar reference typos auto-corrected")

Title: AI Safety
Summary: Detailed analysis
Confidence: 0.88

✓ Lvar reference typos auto-corrected


## 8. Threshold Comparison - Same Input, Different Thresholds

See how different thresholds affect parsing outcomes.

In [11]:
# Response with a one-character field typo (titl)
response_moderate = """
<lvar Report.titl title>Test</lvar>
OUT{report: [title]}
"""

# Test different thresholds
thresholds = [0.70, 0.85, 0.95, 1.0]

for threshold in thresholds:
    try:
        result = parse_lndl_fuzzy(response_moderate, operable, threshold=threshold)
        print(f"✓ threshold={threshold}: Success (titl→title corrected)")
    except MissingFieldError:
        print(f"✗ threshold={threshold}: Failed (typo too far from 'title')")

✓ threshold=0.7: Success (titl→title corrected)
✓ threshold=0.85: Success (titl→title corrected)
✓ threshold=0.95: Success (titl→title corrected)
✗ threshold=1.0: Failed (typo too far from 'title')
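
The result at 0.95 may look surprising, so here is the score worked by hand. This assumes textbook Jaro-Winkler with prefix scale p = 0.1 and a prefix cap of 4; the library's exact implementation may differ. For `titl` against `title` there are m = 4 matching characters, t = 0 transpositions, and a shared prefix of length ℓ = 4:

```latex
\begin{aligned}
\mathrm{Jaro}(\texttt{titl}, \texttt{title})
  &= \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)
   = \frac{1}{3}\left(\frac{4}{4} + \frac{4}{5} + \frac{4 - 0}{4}\right) \approx 0.9333 \\
\mathrm{JW}(\texttt{titl}, \texttt{title})
  &= \mathrm{Jaro} + \ell\, p\,(1 - \mathrm{Jaro})
   = 0.9333 + 4 \times 0.1 \times 0.0667 \approx 0.96
\end{aligned}
```

A score of about 0.96 clears the 0.70, 0.85, and 0.95 thresholds and fails only the exact-match requirement at 1.0, which agrees with the output above.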


## 9. Multiple Models - Cross-Model Correction

Fuzzy matching handles typos across different models in the same response.

In [12]:
# Response with lvars from multiple models
multi_model_response = """
<lvar Reprot.titel rpt_title>Safety Report</lvar>
<lvar Reprot.summry rpt_summary>Analysis findings</lvar>
<lvar Reprot.confidence rpt_conf>0.91</lvar>

<lvar Articl.headline art_head>Breaking News</lvar>
<lvar Articl.contnt art_body>Important update</lvar>
<lvar Articl.authr art_author>John Doe</lvar>

OUT{
    reprot: [rpt_title, rpt_summary, rpt_conf],
    articl: [art_head, art_body, art_author]
}
"""

# Parse with fuzzy matching
result = parse_lndl_fuzzy(multi_model_response, operable)

print("Report:")
print(f"  Title: {result.report.title}")
print(f"  Summary: {result.report.summary}")
print(f"  Confidence: {result.report.confidence}")

print("\nArticle:")
print(f"  Headline: {result.article.headline}")
print(f"  Content: {result.article.content}")
print(f"  Author: {result.article.author}")

print("\n✓ Typos corrected across multiple models")

Report:
  Title: Safety Report
  Summary: Analysis findings
  Confidence: 0.91

Article:
  Headline: Breaking News
  Content: Important update
  Author: John Doe

✓ Typos corrected across multiple models


## 10. Error Diagnostics - Helpful Messages

Error messages name the unmatched item, the threshold applied (or strict mode), and the available options.

In [13]:
# Completely wrong field name (no match)
response_no_match = """
<lvar Report.xyz val>Test</lvar>
OUT{report: [val]}
"""

try:
    result = parse_lndl_fuzzy(response_no_match, operable, threshold=0.85)
except MissingFieldError as e:
    print("Error message:")
    print(f"  {e}")
    print("\n✓ Error shows threshold and available fields")

Error message:
  Field (model report) 'xyz' not found above threshold 0.85. Available: ['title', 'summary', 'confidence']

✓ Error shows threshold and available fields


In [14]:
# Strict mode error - different message
response_typo = """
<lvar Report.titel val>Test</lvar>
OUT{report: [val]}
"""

try:
    result = parse_lndl_fuzzy(response_typo, operable, threshold=1.0)
except MissingFieldError as e:
    print("Strict mode error:")
    print(f"  {e}")
    print("\n✓ Indicates strict mode: exact match required")

Strict mode error:
  Field 'titel' not found in model Report. Available: ['title', 'summary', 'confidence'] (strict mode: exact match required)

✓ Indicates strict mode: exact match required


## Summary Checklist

**Fuzzy LNDL Parsing Essentials:**
- ✅ Auto-corrects typos in model/field/spec/lvar names using Jaro-Winkler similarity
- ✅ Default threshold (0.85) proven in production
- ✅ Strict mode (threshold=1.0) for exact matching
- ✅ Per-category thresholds (field, lvar, model, spec) for fine-grained control
- ✅ Tie detection prevents ambiguous matches (within 0.05 similarity)
- ✅ Clear error messages with available options
- ✅ Handles complex LLM responses with multiple models
- ✅ Zero-duplication architecture (pre-correct → strict resolver)

**Threshold Guidelines:**
- **0.70-0.80**: Lenient (accepts major variations)
- **0.85**: Production default (balanced tolerance)
- **0.90-0.95**: Strict (minor typos only)
- **1.0**: Exact matching (no fuzzy correction)

**Model name threshold**: Defaults to `max(threshold, 0.90)`, so model names are matched more strictly than other categories unless `threshold_model` is set explicitly.
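
The defaulting rules in this section can be sketched as a small resolver. This is a hypothetical helper, not part of the library: `resolve_thresholds` and `Thresholds` are made-up names, `threshold_lvar` and `threshold_spec` are assumed by analogy with the `threshold_field` and `threshold_model` arguments used earlier, and rejecting values outside 0.7-1.0 is an assumption based on the documented range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    field: float
    lvar: float
    model: float
    spec: float


def resolve_thresholds(
    threshold: float = 0.85,
    *,
    threshold_field: Optional[float] = None,
    threshold_lvar: Optional[float] = None,
    threshold_model: Optional[float] = None,
    threshold_spec: Optional[float] = None,
) -> Thresholds:
    """Fill in per-category thresholds from the global value."""

    def pick(override: Optional[float], default: float) -> float:
        value = default if override is None else override
        # Assumed validation: the documented range is 0.7-1.0.
        if not 0.7 <= value <= 1.0:
            raise ValueError(f"threshold {value} outside [0.7, 1.0]")
        return value

    return Thresholds(
        field=pick(threshold_field, threshold),
        lvar=pick(threshold_lvar, threshold),
        # Model names default to the stricter of the global value and 0.90.
        model=pick(threshold_model, max(threshold, 0.90)),
        spec=pick(threshold_spec, threshold),
    )


print(resolve_thresholds())
# Thresholds(field=0.85, lvar=0.85, model=0.9, spec=0.85)
print(resolve_thresholds(threshold_field=0.75, threshold_model=0.95))
# Thresholds(field=0.75, lvar=0.85, model=0.95, spec=0.85)
```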

**Next Steps:**
- See `parse_lndl` for strict parsing (no fuzzy matching)
- See `resolve_references_prefixed` for the underlying strict resolver
- See `LNDLOutput` for action execution lifecycle