# Node Reference Guide

**Node** is the polymorphic container in lionherd. It extends Element with arbitrary content, an optional embedding, and an automatic subclass registry.

## Key Features

1. **Element Inheritance**: Inherits `id`, `metadata`, `created_at` from Element base class
2. **Content Polymorphism**: `content: Any` accepts primitives, collections, or nested Elements
3. **Embedding Support**: Optional `embedding: list[float]` that also accepts a JSON string, for compatibility with database columns
4. **Auto-Registry**: Subclasses auto-register via `__pydantic_init_subclass__` for polymorphic deserialization
5. **Serialization Modes**: `python`/`json`/`db` modes with automatic `lion_class` injection
6. **Pydapter Integration**: TOML/YAML adapters on the base class; each subclass has its own isolated adapter registry and must register adapters explicitly (a Rust-like explicit pattern)

## Design Philosophy

- **Composition over inheritance**: Flexible content field enables graph-of-graphs patterns
- **Zero-config subclasses**: The registry eliminates registration boilerplate, so subclasses "just work" with polymorphic deserialization
- **Database-first**: `mode="db"` stores metadata under `node_metadata` to avoid column name conflicts

## Setup

In [None]:
from lionherd_core.base.node import Node, NODE_REGISTRY
from lionherd_core.base.element import Element

# Define sample Node subclasses (auto-register)
class PersonNode(Node):
    """Node representing a person."""
    name: str = "Unknown"
    age: int = 0

class DocumentNode(Node):
    """Node representing a document."""
    title: str = "Untitled"
    body: str = ""

## 1. Element Inheritance

Node extends Element, inheriting automatic ID generation, metadata, and timestamps.

In [None]:
# Create base Node - inherits Element features
node = Node(content="Hello World")

print(f"ID (UUID): {node.id}")
print(f"Created at: {node.created_at}")
print(f"Metadata: {node.metadata}")
print(f"Content: {node.content}")
print(f"\nClass name: {node.class_name(full=True)}")

## 2. Content Field Polymorphism

The `content` field accepts **any Python value** - primitives, collections, or nested Elements.

In [None]:
# Content can be anything
node_str = Node(content="plain string")
node_dict = Node(content={"key": "value", "nested": [1, 2, 3]})
node_list = Node(content=["a", "b", "c"])

# Content can be nested Node (graph-of-graphs pattern)
inner = PersonNode(name="Alice", age=30)
outer = Node(content=inner)

print(f"String content: {node_str.content}")
print(f"Dict content: {node_dict.content}")
print(f"Nested Node: {outer.content.name}, age {outer.content.age}")

## 3. Nested Element Serialization

When `content` contains Elements, the `_serialize_content` field serializer converts them to dicts automatically.

In [None]:
# Create Node with nested PersonNode
person = PersonNode(name="Bob", age=25, content="engineer")
wrapper = Node(content=person)

# Serialize - content becomes dict automatically
data = wrapper.to_dict()
print(f"Serialized content type: {type(data['content'])}")
print(f"Content includes lion_class: {'lion_class' in data['content']['metadata']}")

# Deserialize - content becomes PersonNode automatically
restored = Node.from_dict(data)
print(f"\nRestored content type: {type(restored.content)}")
print(f"Restored name: {restored.content.name}")
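
The serialization side of this behavior can be sketched without the library. The block below is a minimal illustration, not lionherd's implementation: `SketchElement`, `SketchPerson`, and `serialize_content` are hypothetical names, and the sketch only handles an element placed directly in `content` (whether the real serializer also walks lists and dicts is not shown in this guide).

```python
from typing import Any


class SketchElement:
    """Hypothetical stand-in for Element/Node in this sketch."""

    def __init__(self, content: Any = None, **fields: Any) -> None:
        self.content = content
        self.fields = fields

    def to_dict(self) -> dict:
        return {
            **self.fields,
            # Nested elements are converted; plain values pass through.
            "content": serialize_content(self.content),
            # The class name travels with the data so it can be restored later.
            "metadata": {"lion_class": type(self).__name__},
        }


class SketchPerson(SketchElement):
    pass


def serialize_content(value: Any) -> Any:
    """Turn a nested element into a dict; return every other value unchanged."""
    if isinstance(value, SketchElement):
        return value.to_dict()
    return value


wrapper = SketchElement(content=SketchPerson(name="Bob", age=25, content="engineer"))
data = wrapper.to_dict()
print(data["content"]["metadata"]["lion_class"])  # SketchPerson
print(data["content"]["content"])                 # engineer
```

Because each nested dict carries its own `lion_class`, a deserializer has enough information to rebuild the inner object as its original subclass.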

## 4. Embedding Field

Optional `embedding: list[float]` with validation and JSON string coercion for DB compatibility.

In [None]:
# Embedding as list (standard)
node1 = Node(content="text", embedding=[0.1, 0.2, 0.3])
print(f"Embedding: {node1.embedding}")

# Embedding from JSON string (DB compatibility)
import orjson
json_str = orjson.dumps([0.4, 0.5, 0.6]).decode()
node2 = Node(content="text", embedding=json_str)
print(f"From JSON string: {node2.embedding}")

# Integer coercion to float
node3 = Node(content="text", embedding=[1, 2, 3])
print(f"Int coerced to float: {node3.embedding}")
print(f"All floats: {all(isinstance(x, float) for x in node3.embedding)}")
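
For readers who want to see what this kind of coercion involves, here is a minimal standalone sketch. `coerce_embedding` is a hypothetical helper written for this guide, not the validator lionherd uses, and it relies on the standard-library `json` module rather than `orjson`.

```python
import json
from typing import Any, Optional


def coerce_embedding(value: Any) -> Optional[list[float]]:
    """Hypothetical helper: normalize an embedding to list[float], or None."""
    if value is None:
        return None
    # A database column may return the vector as a JSON-encoded string.
    if isinstance(value, str):
        value = json.loads(value)
    if not isinstance(value, (list, tuple)):
        raise ValueError("embedding must be a list of numbers")
    # bool is a subclass of int, so it is rejected explicitly.
    if any(isinstance(x, bool) or not isinstance(x, (int, float)) for x in value):
        raise ValueError("embedding must contain only numbers")
    return [float(x) for x in value]


print(coerce_embedding("[0.4, 0.5, 0.6]"))  # [0.4, 0.5, 0.6]
print(coerce_embedding([1, 2, 3]))          # [1.0, 2.0, 3.0]
```

Doing the string decoding before the element check means both input shapes go through the same validation path.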

## 5. NODE_REGISTRY Auto-Registration

Subclasses auto-register via `__pydantic_init_subclass__` - enables polymorphic deserialization.

In [None]:
# Check registry
print("Registry keys:", list(NODE_REGISTRY.keys()))
print(f"\nPersonNode registered: {'PersonNode' in NODE_REGISTRY}")
print(f"DocumentNode registered: {'DocumentNode' in NODE_REGISTRY}")

# Dynamic subclass registration
class CustomNode(Node):
    custom_field: str = "test"

print(f"\nCustomNode auto-registered: {'CustomNode' in NODE_REGISTRY}")
print(f"Registry returns correct class: {NODE_REGISTRY['CustomNode'] is CustomNode}")
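
The registration mechanism can be approximated in plain Python. The sketch below uses the standard `__init_subclass__` hook as a stand-in for Pydantic's `__pydantic_init_subclass__`; `SKETCH_REGISTRY`, `SketchBase`, and the subclasses are hypothetical names, and lionherd's actual registry may differ in detail.

```python
SKETCH_REGISTRY: dict[str, type] = {}


class SketchBase:
    """Hypothetical base class; stands in for Node in this sketch."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once for every subclass, at class-definition time.
        SKETCH_REGISTRY[cls.__name__] = cls


class SketchPerson(SketchBase):
    pass


class SketchDocument(SketchBase):
    pass


print(sorted(SKETCH_REGISTRY))                          # ['SketchDocument', 'SketchPerson']
print(SKETCH_REGISTRY["SketchPerson"] is SketchPerson)  # True
```

One consequence of keying by bare class name, at least in this sketch: two subclasses with the same name defined in different modules would overwrite each other's entry.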

## 6. Polymorphic Deserialization

`from_dict()` reads the `lion_class` entry in the metadata and routes to the matching subclass in `NODE_REGISTRY`.

In [None]:
# Serialize different node types
person = PersonNode(name="Charlie", age=35)
doc = DocumentNode(title="Spec", body="Requirements")

person_data = person.to_dict()
doc_data = doc.to_dict()

print(f"Person lion_class: {person_data['metadata']['lion_class']}")
print(f"Doc lion_class: {doc_data['metadata']['lion_class']}")

# Deserialize via base Node.from_dict() - polymorphic routing
restored_person = Node.from_dict(person_data)
restored_doc = Node.from_dict(doc_data)

print(f"\nRestored person type: {type(restored_person).__name__}")
print(f"Restored doc type: {type(restored_doc).__name__}")
print(f"Person name: {restored_person.name}")
print(f"Doc title: {restored_doc.title}")
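
The routing step itself is small. The following sketch shows one way it could work, using dataclasses and a hand-built lookup table; `SketchNode`, `SketchPerson`, and `SKETCH_REGISTRY` are hypothetical, and choices such as stripping `lion_class` from the restored metadata or falling back to the calling class are assumptions of this sketch rather than documented lionherd behavior.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchNode:
    content: Any = None
    metadata: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        data = dict(vars(self))
        # Inject the class name so the dict can be routed back later.
        data["metadata"] = {**self.metadata, "lion_class": type(self).__name__}
        return data

    @classmethod
    def from_dict(cls, data: dict) -> "SketchNode":
        data = dict(data)  # copy, so the caller's dict is not mutated
        metadata = dict(data.pop("metadata", {}))
        class_name = metadata.pop("lion_class", None)
        # Unknown or missing names fall back to the class the call was made on.
        target = SKETCH_REGISTRY.get(class_name, cls)
        return target(**data, metadata=metadata)


@dataclass
class SketchPerson(SketchNode):
    name: str = "Unknown"
    age: int = 0


# Hand-built here for brevity; an auto-registry would populate this instead.
SKETCH_REGISTRY = {"SketchNode": SketchNode, "SketchPerson": SketchPerson}

restored = SketchNode.from_dict(SketchPerson(name="Charlie", age=35).to_dict())
print(type(restored).__name__, restored.name)  # SketchPerson Charlie
```

The caller only ever names the base class; the data decides which subclass is built.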

## 7. Heterogeneous Collections (Real-World DB Scenario)

A single query can return mixed node types; polymorphic deserialization restores each record to its own class.

In [None]:
# Simulate DB query returning mixed types with node_metadata
db_records = [
    {"name": "Alice", "age": 30, "node_metadata": {"lion_class": "PersonNode"}},
    {"title": "Report", "body": "Data", "node_metadata": {"lion_class": "DocumentNode"}},
    {"name": "Bob", "age": 25, "node_metadata": {"lion_class": "PersonNode"}},
]

# Deserialize - each gets correct type
nodes = [Node.from_dict(record) for record in db_records]

print("Deserialized types:")
for i, node in enumerate(nodes):
    print(f"  [{i}] {type(node).__name__}", end="")
    if isinstance(node, PersonNode):
        print(f" - {node.name}, age {node.age}")
    elif isinstance(node, DocumentNode):
        print(f" - {node.title}")
    else:
        print()  # terminate the line for any other node type

## 8. Serialization Modes

`to_dict()` supports three modes: `python` (in-memory), `json` (APIs), and `db` (database, with `node_metadata`).

In [None]:
node = PersonNode(name="David", age=40)

# Python mode - preserves datetime/UUID objects
python_dict = node.to_dict(mode="python")
print(f"Python mode has 'metadata': {'metadata' in python_dict}")
print(f"Created_at type: {type(python_dict['created_at']).__name__}")

# JSON mode - serializes to strings
json_dict = node.to_dict(mode="json")
print(f"\nJSON mode has 'metadata': {'metadata' in json_dict}")
print(f"Created_at type: {type(json_dict['created_at']).__name__}")

# DB mode - uses node_metadata (avoids column conflicts)
db_dict = node.to_dict(mode="db")
print(f"\nDB mode has 'node_metadata': {'node_metadata' in db_dict}")
print(f"DB mode has 'metadata': {'metadata' in db_dict}")
print(f"lion_class in node_metadata: {'lion_class' in db_dict['node_metadata']}")
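
To make the difference between the modes concrete, here is a small standalone sketch. `to_dict_sketch` is a hypothetical function, not lionherd's `to_dict()`. It assumes that `db` mode stringifies values the same way `json` mode does and differs only in the metadata key; the guide does not confirm that detail.

```python
from datetime import datetime, timezone
from uuid import UUID, uuid4


def to_dict_sketch(node_id: UUID, created_at: datetime, metadata: dict,
                   mode: str = "python") -> dict:
    """Hypothetical mode switch: python keeps objects, json/db emit strings."""
    if mode not in ("python", "json", "db"):
        raise ValueError(f"unknown mode: {mode!r}")
    data = {"id": node_id, "created_at": created_at, "metadata": dict(metadata)}
    if mode in ("json", "db"):
        # Assumption: both wire formats use string forms of UUID and datetime.
        data["id"] = str(node_id)
        data["created_at"] = created_at.isoformat()
    if mode == "db":
        # Rename the key so it cannot collide with a column named "metadata".
        data["node_metadata"] = data.pop("metadata")
    return data


node_id = uuid4()
created = datetime.now(timezone.utc)
for mode in ("python", "json", "db"):
    out = to_dict_sketch(node_id, created, {"lion_class": "PersonNode"}, mode)
    print(mode, sorted(out), type(out["created_at"]).__name__)
```

A deserializer that accepts all three shapes only needs to look for the class name under either `metadata` or `node_metadata`.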

## 9. Serialization Roundtrip (All Modes)

All modes support lossless roundtrip with polymorphic type preservation.

In [None]:
original = DocumentNode(title="Architecture", body="System design")

for mode in ["python", "json", "db"]:
    # Serialize, then deserialize through the base class
    data = original.to_dict(mode=mode)
    restored = Node.from_dict(data)

    # Verify that type, fields, and ID survive the roundtrip
    print(
        f"{mode:8} -> Type: {type(restored).__name__:12} "
        f"Title: {restored.title:15} ID match: {restored.id == original.id}"
    )

## 10. Pydapter Integration (Isolated Registry Pattern)

The base Node has TOML and YAML adapters built in. Subclasses get **isolated registries** (a Rust-like explicit pattern), so each subclass must register the adapters it needs.

In [None]:
from pydapter.adapters import TomlAdapter, YamlAdapter

# Base Node has toml/yaml built-in
base_node = Node(content="test")
toml_str = base_node.adapt_to("toml")
print("Base Node TOML (first 100 chars):")
print(toml_str[:100])

# Subclasses have ISOLATED registries - must register explicitly
PersonNode.register_adapter(TomlAdapter)
person = PersonNode(name="Eve", age=28)
person_toml = person.adapt_to("toml")
print("\nPersonNode TOML (first 100 chars):")
print(person_toml[:100])

# Roundtrip with polymorphism
restored = Node.adapt_from(person_toml, "toml")
print(f"\nRestored type: {type(restored).__name__}")
print(f"Restored name: {restored.name}")
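
The isolation idea can be illustrated independently of pydapter. In the sketch below, every subclass receives a fresh, empty adapter table when it is defined, so nothing registered on one class leaks into another. `Adaptable`, `Tagged`, and the JSON "adapter" are hypothetical, and raising `KeyError` for a missing adapter is a choice made for this sketch, not a statement about how lionherd or pydapter report the error.

```python
import json
from typing import Callable


class Adaptable:
    """Hypothetical base class that ships with one built-in adapter."""

    _adapters: dict[str, Callable[[dict], str]] = {"json": json.dumps}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Each subclass starts with its own empty table instead of
        # sharing (or inheriting) the parent's.
        cls._adapters = {}

    @classmethod
    def register_adapter(cls, key: str, dump: Callable[[dict], str]) -> None:
        cls._adapters[key] = dump

    def adapt_to(self, key: str) -> str:
        try:
            dump = type(self)._adapters[key]
        except KeyError:
            raise KeyError(
                f"{type(self).__name__} has no {key!r} adapter; register it first"
            ) from None
        return dump(vars(self))


class Tagged(Adaptable):
    def __init__(self, tag: str) -> None:
        self.tag = tag


try:
    Tagged("x").adapt_to("json")
except KeyError as exc:
    print("before registration:", exc)

Tagged.register_adapter("json", json.dumps)
print(Tagged("x").adapt_to("json"))  # {"tag": "x"}
```

The trade-off is a little extra ceremony per subclass in exchange for never being surprised by an adapter that some other class registered.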

## 11. Advanced: created_at_format Options

Control timestamp serialization: `datetime` (object), `isoformat` (string), `timestamp` (float).

In [None]:
node = Node(content="test")

# Datetime format (default for python mode)
data1 = node.to_dict(mode="python", created_at_format="datetime")
print(f"datetime format: {type(data1['created_at']).__name__} = {data1['created_at']}")

# ISO format (string)
data2 = node.to_dict(mode="python", created_at_format="isoformat")
print(f"isoformat: {type(data2['created_at']).__name__} = {data2['created_at']}")

# Timestamp format (float)
data3 = node.to_dict(mode="python", created_at_format="timestamp")
print(f"timestamp: {type(data3['created_at']).__name__} = {data3['created_at']}")
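
The three formats map directly onto standard `datetime` operations. The helper below is a hypothetical sketch of that mapping, not lionherd's code; only the option names `datetime`, `isoformat`, and `timestamp` come from this guide.

```python
from datetime import datetime, timezone
from typing import Union


def format_created_at(value: datetime,
                      created_at_format: str = "datetime") -> Union[datetime, str, float]:
    """Hypothetical helper mirroring the three created_at_format options."""
    if created_at_format == "datetime":
        return value              # keep the object as-is
    if created_at_format == "isoformat":
        return value.isoformat()  # ISO 8601 string
    if created_at_format == "timestamp":
        return value.timestamp()  # seconds since the Unix epoch, as float
    raise ValueError(f"unknown created_at_format: {created_at_format!r}")


moment = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(format_created_at(moment, "isoformat"))  # 2024-01-01T00:00:00+00:00
print(format_created_at(moment, "timestamp"))  # 1704067200.0
```

Using a timezone-aware datetime, as in the example, keeps the `timestamp` result independent of the machine's local time zone.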

## Summary Checklist

### Core Patterns Demonstrated

- ✅ **Element Inheritance**: Node inherits `id`, `metadata`, `created_at` from Element
- ✅ **Content Polymorphism**: `content: Any` accepts primitives, collections, nested Elements
- ✅ **Nested Serialization**: Elements in content auto-serialize via `_serialize_content`
- ✅ **Embedding Field**: `list[float]` with JSON string coercion + int→float coercion
- ✅ **Auto-Registry**: `__pydantic_init_subclass__` registers subclasses in `NODE_REGISTRY`
- ✅ **Polymorphic Deserialization**: `from_dict()` routes via `lion_class` metadata
- ✅ **Heterogeneous Collections**: A single entry point, `Node.from_dict()`, restores each record in a mixed result set to its own type
- ✅ **Serialization Modes**: `python`/`json`/`db` with `node_metadata` for DB
- ✅ **Pydapter Integration**: Isolated per-subclass adapter registries with explicit registration (Rust-like)
- ✅ **Timestamp Control**: `created_at_format` options (datetime/isoformat/timestamp)

### Real-World Use Cases

1. **Graph Databases**: Heterogeneous node types in single query → polymorphic deserialization
2. **Nested Composition**: Node contains Graph contains Nodes (graph-of-graphs)
3. **Vector Search**: Embedding field for semantic search with DB JSON compatibility
4. **External Formats**: TOML/YAML serialization with type preservation
5. **DB Storage**: `mode="db"` with `node_metadata` avoids column conflicts

### Key Design Decisions

- **Zero-config subclasses**: Registry eliminates manual registration boilerplate
- **Isolated adapter registries**: Prevents pollution, explicit > implicit
- **Composition over hierarchy**: Flexible content field enables dynamic structures
- **Database-first design**: JSON string coercion, `node_metadata` field, serialization modes