Adaptive Runtime

Runtime Intelligence Layer for Stateful AI Systems

Not a chatbot framework. Not an LLM wrapper. Not a workflow builder.

An adaptive runtime intelligence layer — the missing piece between your AI logic and production reality.

The Problem

Most AI frameworks solve the model problem.
Nobody solves the runtime problem.

Your AI agent in development:   Works perfectly.
Your AI agent in production:    Crashes. Forgets state. Retries blindly. Dies silently.

Production AI systems fail because of:

💥 No crash recovery — state lost on restart
🧠 No memory — agent forgets context between sessions
🔁 Retry chaos — blind retries with no back-off
📉 No confidence scoring — decisions made without certainty
🌊 No contextual awareness — can't adapt to changing conditions

Adaptive Runtime fixes this.

See It Running

[16:08:13][RUNTIME]          Event received: service_overload
[16:08:13][CONTEXT_ENGINE]   risk=high  stability=low  pressure=0.65
[16:08:13][CONFIDENCE_ENGINE] confidence=0.84
[16:08:13][DECISION_ENGINE]  ACTION: RESTART_SERVICE
[16:08:13][STATE_ENGINE]     State persisted
[16:08:13][RECOVERY_ENGINE]  Checkpoint #3 created

  → restart_service  [high]  conf=0.840

[16:08:14][RUNTIME]          Event received: anomaly_detected
[16:08:14][CONTEXT_ENGINE]   risk=low   stability=stable  pressure=0.32
[16:08:14][CONFIDENCE_ENGINE] confidence=0.62
[16:08:14][DECISION_ENGINE]  ACTION: FLAG_FOR_REVIEW
[16:08:14][STATE_ENGINE]     State persisted

  → flag_for_review  [low]   conf=0.620

The runtime thinks, decides, remembers, and recovers — automatically.

How It Works

Event (CPU spike, anomaly, timeout, auth failure...)
  │
  ▼
┌─────────────────┐
│  Context Engine │  → Analyzes conditions: risk, stability, pressure score
└────────┬────────┘
         │
         ▼
┌──────────────────────┐
│  Confidence Engine   │  → Calculates adaptive confidence (with decay + history)
└────────┬─────────────┘
         │
         ▼
┌──────────────────┐
│  Decision Engine │  → Selects action: restart / throttle / rollback / recover...
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   State Engine   │  → Persists state to SQLite (survives crashes)
└────────┬─────────┘
         │
         ▼
┌──────────────────────┐
│   Recovery Engine    │  → Creates checkpoint, handles retry with back-off
└──────────────────────┘

Quick Start

pip install pydantic aiosqlite

import asyncio
from adaptive_runtime import Runtime

async def main():
    runtime = Runtime(agent_id="my-agent")
    await runtime.start()

    result = await runtime.process({
        "type": "service_overload",
        "severity": 0.82,
        "cpu": 94,
        "memory": 88,
    })

    print(result.action)      # "restart_service"
    print(result.confidence)  # 0.7831
    print(result.reason)      # "high_resource_pressure"
    print(result.priority)    # "high"

    await runtime.stop()

asyncio.run(main())

That's it. No API keys. No cloud setup. No GPU. Runs on a $5 VPS.

Killer Example: Adaptive Monitoring System

import asyncio
from adaptive_runtime import Runtime

async def monitor():
    runtime = Runtime(agent_id="prod-monitor", checkpoint_every=5)

    # Subscribe to critical events
    @runtime.bus.subscribe("anomaly_detected")
    async def on_anomaly(event):
        print(f"  ⚠ Anomaly handler fired — severity={event['severity']}")

    await runtime.start()

    # Simulate real production events
    events = [
        {"type": "service_overload", "severity": 0.91, "cpu": 96, "memory": 92},
        {"type": "anomaly_detected",  "severity": 0.74, "error_rate": 0.6},
        {"type": "auth_failure",      "severity": 0.55},
        {"type": "timeout",           "severity": 0.45, "latency_ms": 4200},
        {"type": "recovery_needed",   "severity": 0.30},
    ]

    for event in events:
        result = await runtime.process(event)
        print(f"  [{result.priority.upper()}] {event['type']:25s} → {result.action}")

    # Runtime remembers everything
    history = await runtime.event_history(limit=5)
    print(f"\n  Last {len(history)} events remembered across sessions.")

    await runtime.stop()

asyncio.run(monitor())

Output:

  [HIGH]    service_overload          → scale_up_immediate
  [NORMAL]  anomaly_detected          → flag_for_review
  ⚠ Anomaly handler fired — severity=0.74
  [NORMAL]  auth_failure              → trigger_security_audit
  [LOW]     timeout                   → cache_warmup
  [LOW]     recovery_needed           → run_recovery

  Last 5 events remembered across sessions.

Why Not LangChain?

This question will come up. Here's the honest answer:

	LangChain / AutoGen	Adaptive Runtime
Purpose	LLM orchestration	Runtime behavior
Core abstraction	Prompt chains	Stateful events
Intelligence	Language model	Probabilistic engine
Dependencies	Heavy (openai, tiktoken, ...)	Minimal (pydantic, aiosqlite)
GPU required	Sometimes	Never
Crash recovery	❌	✅ Built-in
State persistence	External setup	✅ Built-in SQLite
Confidence scoring	❌	✅ Adaptive
Runs on $5 VPS	Barely	✅ Designed for it
Use case	Chat, RAG, agents	Runtime resilience

TL;DR: LangChain makes LLMs useful. Adaptive Runtime makes AI systems reliable.
They solve different problems. Use both, or use this standalone.

Runtime Philosophy

Most AI problems in production are not model problems.
They are runtime problems.

Adaptive Runtime is built around the belief that future AI systems need:

Memory — state that survives crashes and restarts
Resilience — self-healing with checkpoints and retry logic
Contextual behavior — decisions that adapt to real conditions
Confidence awareness — knowing how certain a decision is
Lightweight cognition — intelligence without neural dependency

Not just prompts. Not just workflows. Runtime intelligence.

The 5 Core Engines

1. State Engine

Persistent agent memory. Survives crashes. SQLite by default.

await state_engine.save_state({"health": "ok", "version": "1.2"})
state = await state_engine.load_state()          # Restored after restart
await state_engine.patch_state({"last": "ok"})   # Partial update

2. Context Engine

Transforms raw signals into contextual understanding — no ML needed.

ctx = context_engine.analyze({
    "type": "service_overload", "cpu": 94, "memory": 88, "severity": 0.82
})
# → risk="high", stability="low", context="resource_pressure", pressure=0.65

3. Confidence Engine

Adaptive probabilistic scoring with historical weighting and decay.

conf = confidence_engine.calculate(event, context_risk="high")
# → conf.final = 0.7831  (lower when risk is high, adapts from history)

confidence_engine.record_outcome(success=True, confidence=0.78, context_risk="high")

4. Decision Engine

Explainable rule-based action selection. Extensible with custom rules.

decision = decision_engine.decide(event, "resource_pressure", "high", 0.78)
# → action="restart_service", reason="high_resource_pressure", priority="high"

# Add your own rules:
custom_rules = [("my_context", "high", 0.70, "my_action", "my_reason")]
engine = DecisionEngine(custom_rules=custom_rules)

5. Recovery Engine

Crash recovery, checkpoint snapshots, exponential back-off retry.

await recovery_engine.create_checkpoint(state)    # Save checkpoint
state = await recovery_engine.restore_latest()    # Restore after crash
result = await recovery_engine.retry(fn, fallback=fallback_fn)  # Retry with back-off

Designed for Constrained Environments

✅ Raspberry Pi
✅ $5 VPS (512MB RAM)  
✅ Old laptop
✅ Edge devices
✅ Offline / air-gapped systems
✅ Serverless (cold start friendly)

No GPU. No cloud lock-in. No heavy ML frameworks.
Just Python + asyncio + SQLite.

Project Structure

adaptive_runtime/
│
├── core/
│   ├── state_engine.py       # State persistence and memory
│   ├── context_engine.py     # Event → contextual classification
│   ├── confidence_engine.py  # Adaptive probabilistic confidence
│   ├── decision_engine.py    # Rule-based action selection
│   └── recovery_engine.py    # Crash recovery + retry orchestration
│
├── runtime/
│   ├── runtime_manager.py    # Main orchestrator (Runtime class)
│   ├── event_bus.py          # Async pub/sub event bus
│   └── cache.py              # TTL-based in-memory cache
│
├── storage/
│   ├── sqlite_store.py       # Async SQLite persistence
│   └── memory_store.py       # In-process ephemeral store (testing)
│
├── observability/
│   ├── logger.py             # Structured color logger
│   └── metrics.py            # Lightweight in-memory metrics
│
├── examples/
│   ├── agent_demo.py         # Basic event processing
│   ├── monitoring_demo.py    # Continuous monitoring + event bus
│   └── automation_demo.py    # Retry + crash recovery
│
└── tests/
    └── test_engines.py       # 12 unit tests — all engines

Run the Examples

# Clone
git clone https://github.com/stateflow-dev/adaptive-runtime.git
cd adaptive-runtime

# Install
pip install pydantic aiosqlite

# Run demos
python examples/agent_demo.py
python examples/monitoring_demo.py
python examples/automation_demo.py

# Run tests
pip install pytest pytest-asyncio
pytest tests/ -v
# → 12 passed

Roadmap

	Feature	Status
✅	5 Core Engines	Tier 1 — Released
✅	SQLite + Memory store	Tier 1 — Released
✅	Async event bus	Tier 1 — Released
✅	Retry + crash recovery	Tier 1 — Released
🔜	REST API adapter (FastAPI)	Tier 2
🔜	Multi-agent orchestration	Tier 2
🔜	Plugin system	Tier 2
🔜	Real-time dashboard	Tier 2
🔜	Distributed runtime	Tier 3

Benchmarks

Measured on a mid-range Windows laptop (Python 3.10, SQLite, no GPU).

Metric	Result
Cold start	446 ms
Idle memory	29 MB
CPU idle usage	<0%
SQLite save latency	36.5 ms avg (n=50)
SQLite load latency	2.7 ms avg (n=50)
Event processing	109.2 ms avg (n=50)
GPU required	❌ Never

Runs comfortably on a $5 VPS (512MB RAM). No GPU. No cloud lock-in.

Contributing

Issues and PRs welcome. Please open an issue first for major changes.

License

"The biggest AI problems in production are not model problems.
They are runtime problems."

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.github/workflows		.github/workflows
core		core
examples		examples
observability		observability
runtime		runtime
storage		storage
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
demo.yml		demo.yml
demo_record.py		demo_record.py
pyproject.toml		pyproject.toml
render1779972984877.gif		render1779972984877.gif

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Adaptive Runtime

The Problem

See It Running

How It Works

Quick Start

Killer Example: Adaptive Monitoring System

Why Not LangChain?

Runtime Philosophy

The 5 Core Engines

1. State Engine

2. Context Engine

3. Confidence Engine

4. Decision Engine

5. Recovery Engine

Designed for Constrained Environments

Project Structure

Run the Examples

Roadmap

Benchmarks

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Adaptive Runtime

The Problem

See It Running

How It Works

Quick Start

Killer Example: Adaptive Monitoring System

Why Not LangChain?

Runtime Philosophy

The 5 Core Engines

1. State Engine

2. Context Engine

3. Confidence Engine

4. Decision Engine

5. Recovery Engine

Designed for Constrained Environments

Project Structure

Run the Examples

Roadmap

Benchmarks

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages