Formally verified state machine for LLM orchestration — TLA+ model checking with 136K states, 8 safety invariants, zero violations #196257
Replies: 2 comments 1 reply
-
|
💬 Your Product Feedback Has Been Submitted 🎉 Thank you for taking the time to share your insights with us! Your feedback is invaluable as we build a better GitHub experience for all our users. Here's what you can expect moving forward ⏩
Where to look to see what's shipping 👀
What you can do in the meantime 💻
As a member of the GitHub community, your participation is essential. While we can't promise that every suggestion will be implemented, we want to emphasize that your feedback is instrumental in guiding our decisions and priorities. Thank you once again for your contribution to making GitHub even better! We're grateful for your ongoing support and collaboration in shaping the future of our platform. ⭐ |
Beta Was this translation helpful? Give feedback.
-
|
eu quero muito usar o git copilot mas eu nao tenho condicao eu sou esrudante atualmente estpu desempregado |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
🏷️ Discussion Type
Product Feedback
Body
We applied formal verification (TLA+ model checking) to an LLM orchestration system and wanted to share the results with the community.
Context
RAG Runtime Kernel is an open-source, filesystem-backed state management system for LLMs. It uses a deterministic state machine (BOOTING → READY → WORKING → CHECKPOINTING → CLOSING) with event sourcing, write-ahead logging, and atomic writes to give any LLM persistent, crash-recoverable memory.
The v3.2 Runtime Bridge (8 Python modules, 337 unit tests, 5811 lines) implements ENFORCED mode — hard runtime validation of every state transition.
Why formal verification?
Unit tests cover the cases you think of. Formal verification covers all reachable states, including adversarial crash/recovery interleavings that are nearly impossible to reach with conventional testing.
We wrote a 555-line TLA+ specification encoding the exact transition table, WAL semantics, proposal lifecycle, and crash/recovery behavior from the Python implementation.
TLC model checker results
All 8 safety invariants passed:
What we found during verification
TLC discovered a genuine BOOTING↔RECOVERY infinite loop where
RecoveryCompletenondeterministically chose BOOTING over READY forever. This was a real livelock that unit tests hadn't caught. We fixed it by strengthening fairness from WF to SF onRecoveryComplete(READY).Liveness properties (EventualProgress, EventualTermination, ProposalEventuallyResolved) are defined but deferred to Phase 2 — the bounded WAL model creates false positives when the WAL fills up.
Takeaway
If you're building systems where LLMs control state transitions, formal verification is practical and catches real bugs. The TLA+ spec took a few hours to write and found a livelock that 337 unit tests missed.
The full TLA+ spec, TLC configuration, and results are in the
formal/directory. AGPL-3.0 licensed.Interested in feedback from anyone working on formal methods for AI/LLM systems.
Guidelines
Beta Was this translation helpful? Give feedback.
All reactions