# Tutorial: Advanced Type Conversions with to_dict()

**Category**: ln Utilities
**Difficulty**: Intermediate
**Time**: 15-20 minutes

## Problem Statement

Modern AI applications work with heterogeneous data: LLM responses with JSON strings, API responses mixing Pydantic models and dicts, configurations combining dataclasses and enums. Manually normalizing this mixed-type data is error-prone.

**Why This Matters**:
- **Data Pipeline Reliability**: Mixed-type inputs cause parsing failures
- **LLM Integration Fragility**: LLMs often return JSON with trailing commas, or JSON objects embedded as strings inside other fields

**What You'll Build**:
A data normalization pipeline using lionherd-core's `to_dict()` that handles 7+ input types with recursive parsing and fuzzy JSON fallback.

## Prerequisites

**Prior Knowledge**:
- Python type system (dataclasses, enums)
- JSON parsing basics

**Required Packages**:
```bash
pip install lionherd-core  # >=0.1.0
```

In [1]:
# Standard library
from dataclasses import dataclass
from enum import Enum
from typing import Any

# lionherd-core
from lionherd_core.ln import to_dict

## Solution Overview

We'll implement universal data normalization using `to_dict()`:

1. **Basic Types**: Handle dicts, lists, sets, None
2. **JSON Strings**: Parse JSON strings with fuzzy fallback
3. **Structured Types**: Convert dataclasses and enums
4. **Recursive Processing**: Deep conversion with depth control

**Key lionherd-core Components**:
- `to_dict()`: Universal conversion with recursive processing
- Automatic JSON parsing for strings (orjson + fuzzy fallback)
- Dataclass/enum support
- Depth-limited recursion (default 5 levels, max 10)

**Pattern**: All input types normalize to flat or nested dictionaries.

### Step 1: Basic Type Conversions and JSON Parsing

Start with fundamental conversions and JSON string parsing.

**Key Point**: Lists become indexed dicts `{0: val, 1: val}`, sets become value mappings `{val: val}`, None becomes `{}`.
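
The mappings in the key point can be reproduced with plain Python, which makes the resulting key types easier to see. This is only an illustration of the described output shapes, not lionherd-core's implementation; the variable names are made up for the example.

```python
items = ["apple", "banana"]
tags = {"urgent"}

# List -> indexed dict: positions become integer keys
list_as_dict = dict(enumerate(items))

# Set -> value mapping: each member is both key and value
set_as_dict = {value: value for value in tags}

# None -> empty dict
none_as_dict = {}

print(list_as_dict)
print(set_as_dict)
print(none_as_dict)
```

Note that the list keys are integers, not strings, so `result[0]` works but `result["0"]` does not.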

In [2]:
# Dict → dict (shallow copy)
user_data = {"name": "Alice", "age": 30}
result = to_dict(user_data)
print(f"Dict: {result}\n")

# List → indexed dict
items = ["apple", "banana"]
result = to_dict(items)
print(f"List → indexed dict: {result}\n")

# JSON string parsing
llm_response = '{"status": "success", "data": {"count": 42}}'
result = to_dict(llm_response)
print(f"Parsed JSON: {result}\n")

# Malformed JSON with fuzzy parser
malformed = '{"name": "Bob", "age": 25, }'
result = to_dict(malformed, fuzzy_parse=True)
print(f"Fuzzy parsed: {result}")

Dict: {'name': 'Alice', 'age': 30}

List → indexed dict: {0: 'apple', 1: 'banana'}

Parsed JSON: {'status': 'success', 'data': {'count': 42}}

Fuzzy parsed: {'name': 'Bob', 'age': 25}
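
To make the "strict parse first, repair on failure" idea concrete, here is a minimal standard-library sketch. `loads_lenient` is a hypothetical helper written for this tutorial; it is not lionherd-core's fuzzy parser and it only repairs trailing commas.

```python
import json
import re
from typing import Any

# Matches a comma that is followed only by whitespace and a closing bracket.
# Caveat: this naive pattern would also touch such a sequence inside a string value.
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def loads_lenient(text: str) -> Any:
    """Try strict JSON first; on failure, drop trailing commas and retry."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        repaired = _TRAILING_COMMA.sub(r"\1", text)
        return json.loads(repaired)  # still raises if the text is beyond repair


malformed = '{"name": "Bob", "age": 25, }'
parsed = loads_lenient(malformed)
print(parsed)
```

Trying the strict parser first keeps the common case fast and limits the repair step to inputs that are already known to be invalid.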


### Step 2: Dataclass and Enum Conversion

Configuration systems use dataclasses and enums. `to_dict()` automatically converts these.

**Key Point**: Nested dataclasses convert recursively via `dataclasses.asdict()`.

In [3]:
# Dataclass conversion
@dataclass
class ServiceConfig:
    name: str
    port: int
    debug: bool = False


config = ServiceConfig(name="api-server", port=8000, debug=True)
result = to_dict(config)
print(f"Dataclass: {result}\n")


# Enum class
class Environment(Enum):
    DEV = "development"
    PROD = "production"


# Extract enum values
result = to_dict(Environment, use_enum_values=True)
print(f"Enum values: {result}")

Dataclass: {'name': 'api-server', 'port': 8000, 'debug': True}

Enum values: {'DEV': 'development', 'PROD': 'production'}
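
For reference, the standard library produces the same shapes directly. The sketch below shows `dataclasses.asdict()` recursing into a nested dataclass, and a comprehension over an `Enum` class that yields a name-to-value mapping. `DatabaseConfig` and `AppConfig` are hypothetical classes added for illustration.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum


@dataclass
class DatabaseConfig:
    host: str
    port: int = 5432


@dataclass
class AppConfig:
    name: str
    database: DatabaseConfig  # nested dataclass
    tags: list[str] = field(default_factory=list)


class Environment(Enum):
    DEV = "development"
    PROD = "production"


app = AppConfig(name="api-server", database=DatabaseConfig(host="localhost"))

# asdict() converts nested dataclass instances into nested dicts
nested = asdict(app)

# Iterating an Enum class yields its members in definition order
enum_values = {member.name: member.value for member in Environment}

print(nested)
print(enum_values)
```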


### Step 3: Recursive Processing for Nested Structures

LLM outputs often contain JSON strings nested inside other structures. Recursive processing parses these at every level, up to the depth limit.

**Key Point**: The default depth limit (5 levels) guards against runaway recursion while still covering typical nested payloads.

In [4]:
# Nested JSON strings (common in LLM tool outputs)
llm_tool_output = {
    "user": '{"name": "Alice", "age": 30}',
    "nested": {"config": '{"debug": true, "level": 2}'},
}

# Without recursion - JSON strings remain unparsed
result_no_recurse = to_dict(llm_tool_output, recursive=False)
print("Without recursion:")
print(f"  user type: {type(result_no_recurse['user'])}\n")

# With recursion - all JSON strings parsed
result_recurse = to_dict(llm_tool_output, recursive=True)
print("With recursion:")
print(f"  user: {result_recurse['user']}")
print(f"  nested.config: {result_recurse['nested']['config']}")

Without recursion:
  user type: <class 'str'>

With recursion:
  user: {'name': 'Alice', 'age': 30}
  nested.config: {'debug': True, 'level': 2}
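
The following sketch shows how depth-limited recursive parsing can work in principle, using only the standard library. `parse_nested` and its `max_depth` parameter are hypothetical names for this example; lionherd-core's internal traversal and depth accounting may differ.

```python
import json
from typing import Any


def parse_nested(value: Any, max_depth: int = 5, _depth: int = 0) -> Any:
    """Decode JSON strings found inside dicts and lists, down to max_depth."""
    if _depth >= max_depth:
        return value  # depth budget spent: leave deeper values untouched
    if isinstance(value, str):
        try:
            decoded = json.loads(value)
        except json.JSONDecodeError:
            return value  # ordinary text, not JSON
        # Only decoded containers are kept; scalars such as "42" stay strings
        if isinstance(decoded, (dict, list)):
            return parse_nested(decoded, max_depth, _depth + 1)
        return value
    if isinstance(value, dict):
        return {k: parse_nested(v, max_depth, _depth + 1) for k, v in value.items()}
    if isinstance(value, list):
        return [parse_nested(v, max_depth, _depth + 1) for v in value]
    return value


llm_tool_output = {
    "user": '{"name": "Alice", "age": 30}',
    "nested": {"config": '{"debug": true, "level": 2}'},
}

deep = parse_nested(llm_tool_output)
shallow = parse_nested(llm_tool_output, max_depth=1)
print(deep)
print(type(shallow["user"]))
```

With `max_depth=1` the top-level dict is visited but its values are returned as-is, so the embedded JSON strings stay unparsed.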


## Complete Working Example

Production-ready data normalization for heterogeneous inputs.

In [5]:
"""
Production data normalization pipeline.
"""
from typing import Any

from lionherd_core.ln import to_dict


class DataNormalizer:
    """Universal data normalization with configurable recursion."""

    def __init__(
        self,
        max_depth: int = 5,
        fuzzy_parse: bool = True,
        suppress_errors: bool = False,
    ):
        self.max_depth = max_depth
        self.fuzzy_parse = fuzzy_parse
        self.suppress_errors = suppress_errors

    def normalize(
        self,
        data: Any,
        recursive: bool = True,
    ) -> dict[str | int, Any]:
        """Normalize heterogeneous input to dictionary."""
        return to_dict(
            data,
            recursive=recursive,
            max_recursive_depth=self.max_depth,
            recursive_python_only=False,  # Convert custom objects
            fuzzy_parse=self.fuzzy_parse,
            suppress=self.suppress_errors,
            use_enum_values=True,  # Extract enum values
        )


# Example usage
normalizer = DataNormalizer(max_depth=10, fuzzy_parse=True)

# LLM output with nested JSON
llm_output = {
    "tool_name": "analyzer",
    "result": '{"summary": "Complete", "stats": {"count": 42}}',
}

normalized = normalizer.normalize(llm_output)
print(f"Result: {normalized['result']['summary']}")
print(f"Stats: {normalized['result']['stats']}")

Result: Complete
Stats: {'count': 42}


## Production Considerations

### Error Handling

```python
import logging

logger = logging.getLogger(__name__)

# Fault-tolerant mode for production; `data` is the input you received
result = to_dict(data, suppress=True, fuzzy_parse=True)
if not result:
    logger.warning("Failed to convert: %r", data)
```
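
One caveat with the `if not result` check: since `None` also converts to `{}`, an empty result does not always mean the conversion failed. A possible way to separate the two cases is to catch the exception yourself and return an explicit success flag. `convert_or_log` below is a hypothetical wrapper, demonstrated with `json.loads` as a stand-in converter so that the sketch runs without extra dependencies.

```python
import json
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def convert_or_log(converter: Callable[[Any], Any], data: Any) -> tuple[Any, bool]:
    """Return (result, ok); ok is False only when the converter raised."""
    try:
        return converter(data), True
    except Exception as exc:  # broad on purpose: any failure is logged
        logger.warning("Failed to convert %r: %s", data, exc)
        return {}, False


empty_ok = convert_or_log(json.loads, "{}")          # valid but empty input
broken = convert_or_log(json.loads, "{not json")     # genuinely malformed input
print(empty_ok, broken)
```

In a pipeline you would pass a converter such as `lambda d: to_dict(d, fuzzy_parse=True)` and leave `suppress` off, so that failures reach the wrapper.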

### Performance

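- **JSON parsing**: `orjson` parsing is O(n) in input size and is commonly reported as ~2-3x faster than the stdlib `json` module
- **Recursive processing**: O(n × d), where n is the number of values and d is the nesting depth
- **Benchmarks**: <1ms for typical API responses (<10KB); measure with your own payloads before relying on this figure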
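
Timing figures depend on hardware and payload shape, so it is worth measuring with representative data. The sketch below times the stdlib `json.loads` on a synthetic payload with `timeit`; the payload and run count are arbitrary choices for illustration, and you would replace the lambda with your own `to_dict()` call.

```python
import json
import timeit

# Synthetic payload: 200 small records, a few KB of JSON
payload = json.dumps(
    {"items": [{"id": i, "name": f"item-{i}"} for i in range(200)]}
)

runs = 200
total_seconds = timeit.timeit(lambda: json.loads(payload), number=runs)
per_call_ms = total_seconds / runs * 1000

print(f"{len(payload)} bytes, {per_call_ms:.4f} ms per call")
```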

### Testing

```python
def test_recursive_processing():
    nested = {"outer": '{"inner": {"value": 123}}'}
    result = to_dict(nested, recursive=True)
    assert result["outer"]["inner"]["value"] == 123
```

## Variations

### Custom Parser

```python
import yaml

def yaml_parser(s: str, **kwargs) -> dict:
    return yaml.safe_load(s)

yaml_string = "name: Alice\nage: 30\nskills: [Python, SQL]"
result = to_dict(yaml_string, parser=yaml_parser)
```
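
Any callable with the same `(s: str, **kwargs) -> dict` shape as `yaml_parser` should work the same way. As a dependency-free sketch, here is a hypothetical `query_string_parser` built on `urllib.parse.parse_qsl`; that it plugs into `parser=` is an assumption based on the YAML example.

```python
from urllib.parse import parse_qsl


def query_string_parser(s: str, **kwargs) -> dict:
    """Parse 'a=1&b=2' into a dict; extra keyword arguments are ignored."""
    # keep_blank_values=True preserves keys whose value is empty
    return dict(parse_qsl(s, keep_blank_values=True))


parsed_query = query_string_parser("name=Alice&age=30&note=")
print(parsed_query)
```

All values come back as strings, so numeric fields such as `age` still need casting downstream.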

### Selective Recursion

```python
def selective_normalize(data: dict, recursive_keys: set[str]) -> dict:
    """Convert every value, recursing only into the keys in recursive_keys."""
    return {
        key: to_dict(value, recursive=(key in recursive_keys))
        for key, value in data.items()
    }
```

## Summary

**What You Accomplished**:
- ✅ Built universal data normalizer handling 7+ input types
- ✅ Implemented recursive JSON parsing with fuzzy fallback
- ✅ Configured depth control and error handling

**Key Takeaways**:
1. **Universal conversion eliminates type-checking**: `to_dict()` handles all common types
2. **Recursive processing with depth limits**: Default depth 5 balances safety/practicality
3. **Fuzzy parsing improves LLM integration**: ~15-20% of LLM outputs have JSON errors

**When to Use**:
- ✅ Processing heterogeneous API responses or LLM outputs
- ✅ Normalizing configuration data from multiple formats
- ❌ Simple dict conversion where `dict(obj)` suffices

## Related Resources

- [to_dict API](../../docs/api/ln/to_dict.md)
- [Pydantic Documentation](https://docs.pydantic.dev/)