# LNDL Parser - Lion Directive Language Parsing

LNDL (Lion Directive Language) is a markup language for LLM-to-system communication. The parser extracts structured outputs from AI responses using XML-like tags and Python-style syntax.

**Core Features:**
- **Lvar Tags**: Variable declarations `<lvar name>value</lvar>`
- **Lact Tags**: Action/tool calls `<lact name>function(...)</lact>`
- **OUT{} Block**: Structured output specification
- **Namespace Support**: Prefixed variables `<lvar Model.field alias>value</lvar>`
- **Balanced Parsing**: Handles nested braces and quoted strings
- **Action Execution**: Lazy execution; only actions referenced in the `OUT{}` block run

In [1]:
from lionherd_core.lndl.errors import MissingOutBlockError
from lionherd_core.lndl.parser import (
    extract_lacts,
    extract_lacts_prefixed,
    extract_lvars,
    extract_lvars_prefixed,
    extract_out_block,
    parse_out_block_array,
    parse_value,
)

## 1. Extracting Lvars (Legacy Format)

Legacy format uses simple names: `<lvar name>content</lvar>`
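This is a minimal sketch of how legacy extraction could work with a single regular expression in `DOTALL` mode. `LVAR_PATTERN` and `extract_lvars_sketch` are hypothetical names used only for illustration. This is not the `lionherd_core` implementation, and it deliberately ignores prefixed tags such as `<lvar Report.title t>`.

```python
import re

# Hypothetical pattern: a simple name, then any content (DOTALL lets "." match newlines)
LVAR_PATTERN = re.compile(r"<lvar\s+(\w+)\s*>(.*?)</lvar>", re.DOTALL)


def extract_lvars_sketch(text: str) -> dict[str, str]:
    """Map each legacy lvar name to its stripped content (later duplicates win)."""
    return {name: content.strip() for name, content in LVAR_PATTERN.findall(text)}


sample = "<lvar title>Intro</lvar>\n<lvar body>\n  line one\n  line two\n</lvar>"
print(extract_lvars_sketch(sample))
# {'title': 'Intro', 'body': 'line one\n  line two'}
```

The non-greedy `.*?` stops at the first `</lvar>`, so several tags in one response are captured separately.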

In [2]:
# Simple lvar extraction
response = """
<lvar title>Introduction to AI</lvar>
<lvar summary>This is a comprehensive guide to artificial intelligence covering neural networks, machine learning, and deep learning.</lvar>
<lvar author>Dr. Smith</lvar>
"""

lvars = extract_lvars(response)
print(f"Extracted {len(lvars)} lvars:")
for name, value in lvars.items():
    print(f"  {name}: {value[:50]}..." if len(value) > 50 else f"  {name}: {value}")

Extracted 3 lvars:
  title: Introduction to AI
  summary: This is a comprehensive guide to artificial intell...
  author: Dr. Smith


In [3]:
# Multiline content preserved
multiline = """
<lvar code>
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
</lvar>
"""

lvars = extract_lvars(multiline)
print("Multiline lvar content:")
print(lvars["code"])

Multiline lvar content:
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)


## 2. Extracting Lvars (Prefixed Format)

Prefixed format supports namespacing: `<lvar Model.field alias>value</lvar>`
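The prefixed form adds a `Model.field` namespace and an optional alias, and the alias becomes the local name. The sketch below assumes a regex with an optional alias group. When the alias is absent it falls back to the field name, which matches the behavior shown in `In [5]`. `PREFIXED_LVAR`, `LvarSketch`, and `extract_prefixed_sketch` are hypothetical, and the library's real metadata objects may carry more information.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: Model.field, an optional alias, then the content
PREFIXED_LVAR = re.compile(r"<lvar\s+(\w+)\.(\w+)(?:\s+(\w+))?\s*>(.*?)</lvar>", re.DOTALL)


@dataclass
class LvarSketch:
    model: str
    field: str
    local_name: str
    value: str


def extract_prefixed_sketch(text: str) -> dict[str, LvarSketch]:
    result = {}
    for model, field, alias, value in PREFIXED_LVAR.findall(text):
        local = alias or field  # findall yields '' for a missing alias, so fall back to the field
        result[local] = LvarSketch(model, field, local, value.strip())
    return result


found = extract_prefixed_sketch("<lvar Report.title t>Q3 Report</lvar><lvar Report.author>Jane</lvar>")
print(sorted(found))  # ['author', 't']
print(found["author"].model, found["author"].field)  # Report author
```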

In [4]:
# Namespaced lvars with aliases
response = """
<lvar Report.title t>Quarterly Financial Report</lvar>
<lvar Report.summary s>Revenue increased by 15% year-over-year.</lvar>
<lvar Report.author>John Doe</lvar>
"""

lvars = extract_lvars_prefixed(response)
print(f"Extracted {len(lvars)} prefixed lvars:")
for local_name, metadata in lvars.items():
    preview = metadata.value[:40] + "..." if len(metadata.value) > 40 else metadata.value
    print(f"  {local_name}: {metadata.model}.{metadata.field} = {preview}")

Extracted 3 prefixed lvars:
  t: Report.title = Quarterly Financial Report
  s: Report.summary = Revenue increased by 15% year-over-year.
  author: Report.author = John Doe


In [5]:
# Default to field name when no alias provided
response = "<lvar Report.author>John Doe</lvar>"
lvars = extract_lvars_prefixed(response)
metadata = lvars["author"]

print(f"Local name: {metadata.local_name}")  # Uses field name as default
print(f"Model: {metadata.model}")
print(f"Field: {metadata.field}")
print(f"Value: {metadata.value}")

Local name: author
Model: Report
Field: author
Value: John Doe


## 3. Extracting Lacts (Legacy Format)

Legacy format for action/tool declarations: `<lact name>function(...)</lact>`

In [6]:
# Simple action declarations
response = """
<lact search>search(query="artificial intelligence", limit=10)</lact>
<lact validate>validate_email(email="user@example.com")</lact>
<lact compute>calculate_sum(numbers=[1, 2, 3, 4, 5])</lact>
"""

lacts = extract_lacts(response)
print(f"Extracted {len(lacts)} actions:")
for name, call_str in lacts.items():
    print(f"  {name}: {call_str}")

Extracted 3 actions:
  search: search(query="artificial intelligence", limit=10)
  validate: validate_email(email="user@example.com")
  compute: calculate_sum(numbers=[1, 2, 3, 4, 5])
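The extracted call strings are ordinary Python call syntax, so the standard `ast` module is one way to split them into a function name and keyword arguments. `parse_call_sketch` is a hypothetical helper that supports only keyword arguments with literal values. A reference to another variable, such as `text=es` in the full example later, would need a resolution step that this sketch does not attempt. This is not necessarily how `lionherd_core` parses calls.

```python
import ast


def parse_call_sketch(call_str: str) -> tuple[str, dict]:
    """Split 'fn(a=1, b="x")' into ('fn', {'a': 1, 'b': 'x'}).

    Only keyword arguments with literal values are supported; positional
    arguments, **kwargs, variable references, and nested calls raise ValueError.
    """
    node = ast.parse(call_str.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {call_str!r}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only plain keyword arguments are supported in this sketch")
    # literal_eval accepts an AST node and rejects anything that is not a literal
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs


print(parse_call_sketch('search(query="artificial intelligence", limit=10)'))
# ('search', {'query': 'artificial intelligence', 'limit': 10})
```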


## 4. Extracting Lacts (Prefixed Format)

Prefixed format with namespace support and metadata: `<lact Model.field alias>function(...)</lact>`
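Direct actions reuse the same tag without a namespace. The sketch below uses one regex that accepts both shapes and reports `None` for the model and field of a direct action, mirroring the output of `In [8]`. `PREFIXED_LACT`, `LactSketch`, and `extract_lacts_sketch` are hypothetical names, not the library's.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: either "Model.field [alias]" or a bare "name"
PREFIXED_LACT = re.compile(
    r"<lact\s+(?:(\w+)\.(\w+)(?:\s+(\w+))?|(\w+))\s*>(.*?)</lact>", re.DOTALL
)


class LactSketch(NamedTuple):
    local_name: str
    model: Optional[str]
    field: Optional[str]
    call: str


def extract_lacts_sketch(text: str) -> dict[str, LactSketch]:
    result = {}
    for m in PREFIXED_LACT.finditer(text):
        model, field, alias, bare, call = m.groups()  # unmatched groups are None
        if bare is not None:  # direct action: no namespace
            local, model, field = bare, None, None
        else:
            local = alias or field  # default the local name to the field
        result[local] = LactSketch(local, model, field, call.strip())
    return result


demo = (
    '<lact Report.summary s>generate_summary(text="...")</lact>\n'
    '<lact search>search(query="AI", limit=5)</lact>'
)
for name, meta in extract_lacts_sketch(demo).items():
    print(name, meta.model, meta.field)
# s Report summary
# search None None
```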

In [7]:
# Namespaced actions
response = """
<lact Report.summary s>generate_summary(text="Long article...", max_words=100)</lact>
<lact Report.keywords k>extract_keywords(text="Article content", count=5)</lact>
<lact validate>validate_data(data={"key": "value"})</lact>
"""

lacts = extract_lacts_prefixed(response)
print(f"Extracted {len(lacts)} prefixed actions:")
for local_name, metadata in lacts.items():
    model_field = f"{metadata.model}.{metadata.field}" if metadata.model else "(direct)"
    call = metadata.call[:60] + "..." if len(metadata.call) > 60 else metadata.call
    print(f"  {local_name}: {model_field}")
    print(f"    Call: {call}")

Extracted 3 prefixed actions:
  s: Report.summary
    Call: generate_summary(text="Long article...", max_words=100)
  k: Report.keywords
    Call: extract_keywords(text="Article content", count=5)
  validate: (direct)
    Call: validate_data(data={"key": "value"})


In [8]:
# Direct action (no namespace)
response = '<lact search>search(query="AI", limit=5)</lact>'
lacts = extract_lacts_prefixed(response)
metadata = lacts["search"]

print(f"Local name: {metadata.local_name}")
print(f"Model: {metadata.model}")  # None for direct actions
print(f"Field: {metadata.field}")  # None for direct actions
print(f"Call: {metadata.call}")

Local name: search
Model: None
Field: None
Call: search(query="AI", limit=5)


In [9]:
# Reserved-keyword check: `print` is a Python builtin, not a reserved keyword,
# so no warning is recorded for it
import warnings

response = '<lact print>print("hello")</lact>'

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    lacts = extract_lacts_prefixed(response)

    if w:
        print(f"Warning: {w[0].message}")
    print(f"Action extracted: {list(lacts.keys())}")
    print("(Works in LNDL despite being a Python builtin)")

Action extracted: ['print']
(Works in LNDL despite being a Python builtin)
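No warning is recorded for `print`, which fits the line Python itself draws: `print` is a builtin function, not a reserved keyword. Below is a minimal sketch of a keyword-only check using the standard `keyword` module. `check_action_name` is hypothetical, and the assumption that `lionherd_core` checks exactly this set of names is not confirmed here.

```python
import builtins
import keyword
import warnings


def check_action_name(name: str) -> None:
    """Hypothetical check: warn only for reserved keywords; builtins are allowed."""
    if keyword.iskeyword(name):
        warnings.warn(f"Action name {name!r} is a Python reserved keyword", UserWarning)


print(keyword.iskeyword("print"), hasattr(builtins, "print"))  # False True
print(keyword.iskeyword("class"))  # True

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_action_name("print")  # builtin: no warning
    check_action_name("class")  # keyword: one warning
print(len(caught))  # 1
```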


## 5. Extracting OUT{} Blocks

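The `OUT{}` block specifies the structured output. The parser finds the matching closing brace by tracking nesting depth, so nested braces are handled, and braces inside quoted strings (including escaped quotes) are ignored.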
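The following sketch shows the balanced-brace scan. It assumes the block starts at the literal text `OUT{` and that both single and double quotes delimit strings. `extract_out_block_sketch` is hypothetical, and it raises `ValueError` where the library raises `MissingOutBlockError`.

```python
def extract_out_block_sketch(text: str) -> str:
    """Return the stripped body of the first OUT{...} block.

    Braces are counted by depth; characters inside quoted strings, including
    backslash-escaped quotes, do not change the depth.
    """
    start = text.find("OUT{")
    if start == -1:
        raise ValueError("No OUT{} block found")
    body_start = i = start + len("OUT{")
    depth = 1
    quote = None  # the active quote character, or None outside strings
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 2  # skip the escaped character
                continue
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[body_start:i].strip()
        i += 1
    raise ValueError("Unbalanced OUT{} block")


print(extract_out_block_sketch('OUT{ code: "f() { return {}; }", cfg: {"a": {"b": 1}} }'))
# code: "f() { return {}; }", cfg: {"a": {"b": 1}}
```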

In [10]:
# Simple OUT block
response = """
Here's the analysis:

OUT{
    title: report_title,
    summary: report_summary
}
"""

out_content = extract_out_block(response)
print("Extracted OUT block:")
print(out_content)

Extracted OUT block:
title: report_title,
    summary: report_summary


In [11]:
# Nested braces handled correctly
response = """
OUT{
    config: {"nested": {"key": "value"}},
    data: [1, 2, 3]
}
"""

out_content = extract_out_block(response)
print("Nested braces preserved:")
print(out_content)

Nested braces preserved:
config: {"nested": {"key": "value"}},
    data: [1, 2, 3]


In [12]:
# Strings with braces ignored during scanning
response = """
OUT{
    message: "This {is} a {string} with braces",
    code: "function() { return {}; }"
}
"""

out_content = extract_out_block(response)
print("Braces in strings ignored:")
print(out_content)

Braces in strings ignored:
message: "This {is} a {string} with braces",
    code: "function() { return {}; }"


In [13]:
# OUT block in lndl code fence
response = """
Here's the output:

```lndl
OUT{
    title: t,
    summary: s
}
```
"""

out_content = extract_out_block(response)
print("Extracted from code fence:")
print(out_content)

Extracted from code fence:
title: t,
    summary: s


In [14]:
# Missing OUT block raises error
response = "Just some text without an OUT block"

try:
    extract_out_block(response)
except MissingOutBlockError as e:
    print(f"Error: {e}")

Error: No OUT{} block found in response


In [15]:
# Unbalanced braces detected
response = "OUT{ field: value"  # Missing closing brace

try:
    extract_out_block(response)
except MissingOutBlockError as e:
    print(f"Error: {e}")

Error: Unbalanced OUT{} block


## 6. Parsing OUT{} Block Arrays

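OUT blocks support array syntax for multiple variables per field: `field:[var1, var2]`. `parse_out_block_array()` returns every variable reference as a list, even a single one, and returns literals as raw strings.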
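The sketch below reproduces the shapes shown in the outputs of this section. Bracketed lists and bare identifiers become lists of names, while quoted strings, numbers, and `true`/`false`/`null` stay raw strings. `split_top_level` and `parse_out_fields_sketch` are hypothetical helpers, and their quote handling does not support escaped quotes.

```python
import re

_NUMBER = re.compile(r"-?\d+(\.\d+)?$")
_LITERAL_WORDS = {"true", "false", "null"}


def split_top_level(text: str, sep: str = ",") -> list[str]:
    """Split on sep, ignoring separators inside [], {}, () or quotes (no escapes)."""
    parts, buf, depth, quote = [], [], 0, None
    for ch in text:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf).strip())
    return [p for p in parts if p]


def parse_out_fields_sketch(body: str) -> dict[str, object]:
    fields = {}
    for item in split_top_level(body):
        name, _, raw = item.partition(":")
        name, raw = name.strip(), raw.strip()
        if raw.startswith("[") and raw.endswith("]"):
            fields[name] = [v.strip() for v in raw[1:-1].split(",") if v.strip()]
        elif raw.startswith(('"', "'")) or _NUMBER.match(raw) or raw.lower() in _LITERAL_WORDS:
            fields[name] = raw  # literal: kept as raw text, not converted
        else:
            fields[name] = [raw]  # bare identifier: a single variable reference
    return fields


body = 'title: t,\ntags: [tag1, tag2],\nstatus: "draft",\ncount: 42'
print(parse_out_fields_sketch(body))
# {'title': ['t'], 'tags': ['tag1', 'tag2'], 'status': '"draft"', 'count': '42'}
```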

In [16]:
# Array syntax
out_content = """
title: t,
tags: [tag1, tag2, tag3],
author: a
"""

fields = parse_out_block_array(out_content)
print("Parsed fields:")
for field_name, value in fields.items():
    print(f"  {field_name}: {value} (type: {type(value).__name__})")

Parsed fields:
  title: ['t'] (type: list)
  tags: ['tag1', 'tag2', 'tag3'] (type: list)
  author: ['a'] (type: list)


In [17]:
# Literal values are recognized but returned as raw strings, not converted
out_content = """
title: "Literal String",
count: 42,
ratio: 3.14,
enabled: true,
disabled: false,
empty: null
"""

fields = parse_out_block_array(out_content)
print("Literal values:")
for field_name, value in fields.items():
    print(f"  {field_name}: {value!r} (type: {type(value).__name__})")

Literal values:
  title: '"Literal String"' (type: str)
  count: '42' (type: str)
  ratio: '3.14' (type: str)
  enabled: 'true' (type: str)
  disabled: 'false' (type: str)
  empty: 'null' (type: str)


In [18]:
# Variable references wrapped in lists
out_content = "title: report_title, author: author_name"

fields = parse_out_block_array(out_content)
print("Variable references:")
for field_name, value in fields.items():
    print(f"  {field_name}: {value} (wrapped in list for consistency)")

Variable references:
  title: ['report_title'] (wrapped in list for consistency)
  author: ['author_name'] (wrapped in list for consistency)


## 7. Parsing Values

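`parse_value()` converts string representations to Python objects: numbers, booleans (case-insensitive), `null` (mapped to `None`), lists, and dicts. Anything else is returned unchanged as a string.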
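One plausible way to get the conversions shown below is to handle the case-insensitive keywords first, then try `json.loads`, then `ast.literal_eval`, and finally fall back to the stripped string. `parse_value_sketch` is a hypothetical stand-in, not the library's code. In particular, whether `parse_value()` also accepts Python-style literals such as single-quoted dicts is an assumption.

```python
import ast
import json


def parse_value_sketch(raw: str):
    """Convert a string to a Python value, falling back to the stripped string."""
    text = raw.strip()
    lowered = text.lower()
    if lowered in ("true", "false"):  # case-insensitive booleans
        return lowered == "true"
    if lowered == "null":
        return None
    try:
        return json.loads(text)  # numbers, JSON lists/dicts, double-quoted strings
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(text)  # Python literals such as {'a': 1}
    except (ValueError, SyntaxError, TypeError):
        return text  # anything else stays a plain string


for s in ["42", "-3.5", "TRUE", "null", "[1, 2]", '{"k": "v"}', "hello world"]:
    print(repr(parse_value_sketch(s)))
# 42
# -3.5
# True
# None
# [1, 2]
# {'k': 'v'}
# 'hello world'
```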

In [19]:
# Numbers
print("Integer:", parse_value("42"), type(parse_value("42")))
print("Float:", parse_value("3.14"), type(parse_value("3.14")))
print("Negative:", parse_value("-10"), type(parse_value("-10")))

Integer: 42 <class 'int'>
Float: 3.14 <class 'float'>
Negative: -10 <class 'int'>


In [20]:
# Booleans and None
print("True:", parse_value("true"), type(parse_value("true")))
print("False:", parse_value("false"), type(parse_value("false")))
print("Null:", parse_value("null"), type(parse_value("null")))

# Case insensitive
print("TRUE:", parse_value("TRUE"))
print("False:", parse_value("False"))

True: True <class 'bool'>
False: False <class 'bool'>
Null: None <class 'NoneType'>
TRUE: True
False: False


In [21]:
# Lists and dicts
print("List:", parse_value("[1, 2, 3]"), type(parse_value("[1, 2, 3]")))
print("Dict:", parse_value('{"key": "value"}'), type(parse_value('{"key": "value"}')))
print("Nested:", parse_value('[{"a": 1}, {"b": 2}]'))

List: [1, 2, 3] <class 'list'>
Dict: {'key': 'value'} <class 'dict'>
Nested: [{'a': 1}, {'b': 2}]


In [22]:
# Strings (fallback)
print("String:", parse_value("hello world"), type(parse_value("hello world")))
print("Mixed:", parse_value("test-123-abc"))

String: hello world <class 'str'>
Mixed: test-123-abc


## 8. Full LNDL Workflow Example

Complete example showing variable extraction, action declaration, and OUT block parsing.

In [None]:
# Simulated LLM response with LNDL markup
llm_response = """
I'll analyze the document and provide a structured report.

<lvar Report.title t>Annual Performance Analysis 2024</lvar>
<lvar Report.executive_summary es>The company achieved record revenue of $1.2B,
representing 18% year-over-year growth. Key drivers included product innovation
and market expansion.</lvar>

<lact Report.keywords k>extract_keywords(text=es, count=5, min_score=0.8)</lact>
<lact Report.sentiment sent>analyze_sentiment(text=es)</lact>
<lact validate>validate_report(title=t, summary=es)</lact>

```lndl
OUT{
    title: t,
    executive_summary: es,
    keywords: k,
    sentiment: sent,
    status: "draft"
}
```

Note: The validation action is declared but not referenced in OUT{}, so it won't execute.
"""

print("=== LNDL Response Parsing ===")
print()

In [24]:
# Step 1: Extract lvars
lvars = extract_lvars_prefixed(llm_response)
print(f"1. Extracted {len(lvars)} lvars:")
for name, meta in lvars.items():
    value_preview = meta.value[:60] + "..." if len(meta.value) > 60 else meta.value
    print(f"   {name} ({meta.model}.{meta.field}): {value_preview}")
print()

1. Extracted 2 lvars:
   t (Report.title): Annual Performance Analysis 2024
   es (Report.executive_summary): The company achieved record revenue of $1.2B, 
representing ...



In [25]:
# Step 2: Extract lacts
lacts = extract_lacts_prefixed(llm_response)
print(f"2. Extracted {len(lacts)} action declarations:")
for name, meta in lacts.items():
    namespace = f"{meta.model}.{meta.field}" if meta.model else "direct"
    print(f"   {name} ({namespace}): {meta.call}")
print()

2. Extracted 3 action declarations:
   k (Report.keywords): extract_keywords(text=es, count=5, min_score=0.8)
   sent (Report.sentiment): analyze_sentiment(text=es)
   validate (direct): validate_report(title=t, summary=es)



In [26]:
# Step 3: Extract OUT block
out_content = extract_out_block(llm_response)
print("3. Extracted OUT{} block:")
print(out_content)
print()

3. Extracted OUT{} block:
title: t,
    executive_summary: es,
    keywords: k,
    sentiment: sent,
    status: "draft"



In [27]:
# Step 4: Parse OUT block
fields = parse_out_block_array(out_content)
print("4. Parsed OUT{} fields:")
for field_name, value in fields.items():
    value_type = "variable(s)" if isinstance(value, list) else "literal"
    print(f"   {field_name}: {value} ({value_type})")
print()

4. Parsed OUT{} fields:
   title: ['t'] (variable(s))
   executive_summary: ['es'] (variable(s))
   keywords: ['k'] (variable(s))
   sentiment: ['sent'] (variable(s))
   status: "draft" (literal)



In [None]:
# Step 5: Identify actions to execute (only those referenced in OUT{})
actions_to_execute = set()
for value in fields.values():
    if isinstance(value, list):  # literals are plain strings and reference nothing
        for var_name in value:
            if var_name in lacts:
                actions_to_execute.add(var_name)

print("5. Actions to execute (referenced in OUT{}):")
for action_name in sorted(actions_to_execute):  # sorted for a deterministic order
    meta = lacts[action_name]
    print(f"   {action_name}: {meta.call}")

not_executed = set(lacts) - actions_to_execute
if not_executed:
    print(f"\n   Not executed (not referenced in OUT): {', '.join(sorted(not_executed))}")

## 9. Edge Cases and Error Handling

In [29]:
# Escaped quotes in lvar content
response = r'<lvar code>print("Hello \"World\"")</lvar>'
lvars = extract_lvars(response)
print("Escaped quotes preserved:")
print(lvars["code"])

Escaped quotes preserved:
print("Hello \"World\"")


In [30]:
# Complex nested function calls
response = (
    '<lact process>transform(data=load_data(path="/data/file.csv"), filters=["a", "b"])</lact>'
)
lacts = extract_lacts(response)
print("Nested function calls:")
print(lacts["process"])

Nested function calls:
transform(data=load_data(path="/data/file.csv"), filters=["a", "b"])


In [31]:
# Whitespace handling - content is stripped
response = """
<lvar title>   Lots of Whitespace   </lvar>
"""
lvars = extract_lvars(response)
print("Content whitespace stripped:")
print(f"'{lvars['title']}'" if lvars else "No match")

Content whitespace stripped:
'Lots of Whitespace'


In [32]:
# Mixed literals and variables: bare identifiers such as `high` are treated
# as variable references, not string literals
out_content = """
status: "published",
version: 2,
title: t,
tags: [tag1, tag2],
priority: high
"""

fields = parse_out_block_array(out_content)
print("Mixed literals and variables:")
for field_name, value in fields.items():
    value_type = type(value).__name__
    print(f"  {field_name}: {value!r} ({value_type})")

Mixed literals and variables:
  status: '"published"' (str)
  version: '2' (str)
  title: ['t'] (list)
  tags: ['tag1', 'tag2'] (list)
  priority: ['high'] (list)


## 10. Performance Notes

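The parser extracts tag content with regular expressions in `DOTALL` mode, so `.` also matches newlines and multiline values are captured. The timing below is a single, machine-dependent measurement, and the printed limits are recommendations rather than hard constraints.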

In [33]:
import time

# Simulate large response
large_response = """
<lvar content>{}</lvar>
""".format("x" * 10000)

start = time.perf_counter()
lvars = extract_lvars(large_response)
elapsed = time.perf_counter() - start

print(f"Parsed 10KB lvar in {elapsed * 1000:.2f}ms")
print(f"Content length: {len(lvars['content'])} characters")
print("\nRecommended limits:")
print("  - Maximum response size: 50KB")
print("  - Optimal: <10KB per response")
print("  - For >100KB: Consider streaming parsers")

Parsed 10KB lvar in 0.08ms
Content length: 10000 characters

Recommended limits:
  - Maximum response size: 50KB
  - Optimal: <10KB per response
  - For >100KB: Consider streaming parsers
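The `DOTALL` flag is what lets multiline lvar content through, because without it `.` stops at newlines. Here is a quick check with the standard `re` module. The pattern is illustrative, not the library's.

```python
import re

text = "<lvar code>\nline 1\nline 2\n</lvar>"
pattern = r"<lvar (\w+)>(.*?)</lvar>"

print(re.findall(pattern, text))  # []
print(re.findall(pattern, text, re.DOTALL))  # [('code', '\nline 1\nline 2\n')]
```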


## Summary Checklist

**LNDL Parser Essentials:**
- ✅ Extract variables with `extract_lvars()` (legacy) or `extract_lvars_prefixed()` (namespaced)
- ✅ Extract actions with `extract_lacts()` (legacy) or `extract_lacts_prefixed()` (namespaced)
- ✅ Extract OUT{} blocks with balanced brace scanning
- ✅ Parse OUT{} arrays with `parse_out_block_array()`
- ✅ Convert string values to Python objects with `parse_value()`
- ✅ Lazy action execution - only referenced actions run
- ✅ Namespace support for structured outputs (Model.field)
- ✅ Handles nested braces, quoted strings, multiline content
- ✅ Warns on Python reserved keyword conflicts

**Action Execution Workflow:**
1. Parse response → extract lvars, lacts, OUT{}
2. Parse OUT{} → identify referenced variables/actions
3. Execute only referenced actions
4. Substitute results and validate
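Steps 3 and 4 can be sketched as a resolver over the output of `parse_out_block_array()`. Literals pass through, lvar names are replaced by their values, and each referenced action is called once, on demand. `resolve_out_sketch` is hypothetical. Each action is modeled as a zero-argument callable standing in for a bound tool call, literals are not converted, and validation is omitted.

```python
from collections.abc import Callable
from typing import Any


def resolve_out_sketch(
    fields: dict[str, Any],
    lvar_values: dict[str, str],
    lact_calls: dict[str, Callable[[], Any]],
) -> tuple[dict[str, Any], set[str]]:
    """Resolve OUT{} fields, running only the actions they reference.

    Returns the resolved output and the names of the actions that actually ran.
    """
    results: dict[str, Any] = {}  # cache so each action runs at most once
    output: dict[str, Any] = {}
    for field, value in fields.items():
        if not isinstance(value, list):
            output[field] = value  # literal: passed through unchanged
            continue
        resolved = []
        for name in value:
            if name in lact_calls:
                if name not in results:
                    results[name] = lact_calls[name]()  # lazy: executed only when referenced
                resolved.append(results[name])
            elif name in lvar_values:
                resolved.append(lvar_values[name])
            else:
                raise KeyError(f"OUT{{}} references undefined name {name!r}")
        output[field] = resolved[0] if len(resolved) == 1 else resolved
    return output, set(results)


demo_fields = {"title": ["t"], "keywords": ["k"], "status": '"draft"'}
demo_lvars = {"t": "Annual Report"}
demo_lacts = {
    "k": lambda: ["revenue", "growth"],
    "validate": lambda: 1 / 0,  # would fail if it ran; it is never referenced
}
out, ran = resolve_out_sketch(demo_fields, demo_lvars, demo_lacts)
print(out)  # {'title': 'Annual Report', 'keywords': ['revenue', 'growth'], 'status': '"draft"'}
print(ran)  # {'k'}
```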

**Next Steps:**
- See `lndl.core` for high-level parsing with Spec/Operable integration
- See `types.ActionCall` for action execution lifecycle
- See `types.LNDLOutput` for complete output handling