Formally verified state machine for LLM orchestration — TLA+ model checking with 136K states, 8 safety invariants, zero violations #196257

arcadamarket · 2026-05-19T00:54:38Z


🏷️ Discussion Type

Product Feedback

Body

We applied formal verification (TLA+ model checking) to an LLM orchestration system and wanted to share the results with the community.

Context

RAG Runtime Kernel is an open-source, filesystem-backed state management system for LLMs. It uses a deterministic state machine (BOOTING → READY → WORKING → CHECKPOINTING → CLOSING) with event sourcing, write-ahead logging, and atomic writes to give any LLM persistent, crash-recoverable memory.
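
For readers unfamiliar with the pattern, here is a minimal sketch of "write-ahead log first, then atomic state write" on a plain filesystem. It is not the project's code: the function names `append_wal` and `atomic_write_json`, the JSON-lines log format, and the file names are all assumptions made for illustration.

```python
import json
import os
import tempfile


def append_wal(wal_path: str, event: dict) -> None:
    """Append one JSON line to the write-ahead log and force it to disk."""
    with open(wal_path, "a", encoding="utf-8") as wal:
        wal.write(json.dumps(event) + "\n")
        wal.flush()
        os.fsync(wal.fileno())


def atomic_write_json(path: str, payload: dict) -> None:
    """Replace `path` so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # bytes reach disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    wal_file = os.path.join(workdir, "wal.jsonl")      # hypothetical file name
    state_file = os.path.join(workdir, "state.json")   # hypothetical file name
    append_wal(wal_file, {"seq": 1, "to": "READY"})    # log the intent first
    atomic_write_json(state_file, {"state": "READY"})  # then publish the state
```

If the process dies between the two calls, the log already holds the intended transition, so recovery can replay it; if it dies during the state write, the old state file is still intact.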

The v3.2 Runtime Bridge (8 Python modules, 337 unit tests, 5811 lines) implements ENFORCED mode — hard runtime validation of every state transition.
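
To make "hard runtime validation of every state transition" concrete, here is a rough sketch of what an enforced transition check can look like. The class names (`Machine`, `IllegalTransition`) are hypothetical, and the transition table contains only the linear path named above; the real table in the Runtime Bridge has more edges (RECOVERY, for one).

```python
from enum import Enum


class State(Enum):
    BOOTING = "BOOTING"
    READY = "READY"
    WORKING = "WORKING"
    CHECKPOINTING = "CHECKPOINTING"
    CLOSING = "CLOSING"


# Assumed table: just the linear path from the post, nothing else.
LEGAL = {
    State.BOOTING: {State.READY},
    State.READY: {State.WORKING},
    State.WORKING: {State.CHECKPOINTING},
    State.CHECKPOINTING: {State.CLOSING},
    State.CLOSING: set(),
}


class IllegalTransition(Exception):
    """Raised when a requested transition is not in the table."""


class Machine:
    def __init__(self) -> None:
        self.state = State.BOOTING
        self.wal = []  # in-memory stand-in for the write-ahead log

    def transition(self, target: State) -> None:
        if target not in LEGAL[self.state]:
            # Enforced mode: reject the move and leave state and log untouched.
            raise IllegalTransition(f"{self.state.name} -> {target.name}")
        self.wal.append((self.state, target))  # log entry before the state advances
        self.state = target


if __name__ == "__main__":
    m = Machine()
    m.transition(State.READY)
    try:
        m.transition(State.CLOSING)  # not in the assumed table
    except IllegalTransition as exc:
        print("rejected:", exc)
```

The point of rejecting before logging is that an illegal request leaves no trace in the log, so replay after a crash never sees an edge the table forbids.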

Why formal verification?

Unit tests cover the cases you think of. Formal verification covers all reachable states, including adversarial crash/recovery interleavings that are nearly impossible to reach with conventional testing.

We wrote a 555-line TLA+ specification encoding the exact transition table, WAL semantics, proposal lifecycle, and crash/recovery behavior from the Python implementation.

TLC model checker results

Metric	Value
States generated	136,193 total
Distinct states	84,261
Search depth	18
Time	6 seconds
Safety invariants checked	8
Violations found	0

All 8 safety invariants passed:

TypeInvariant — all variables hold declared types
TransitionSafety — every state reachable from BOOTING via legal edges only
SingleWriter — at most one proposal staged at a time
WALConsistency — WAL is append-only, monotone, never lags behind state
TerminalSafety — CLOSING is stable (no exit, no crash flag, no pending proposal)
NoDeadlock — non-terminal, non-crashed states always have enabled actions
CrashRecoveryConsistency — crashed=TRUE implies state=RECOVERY
WALPrecedesStateChange — WAL entry exists before state advances
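
For anyone who has not used a model checker: checking a safety invariant boils down to visiting every reachable state and evaluating a predicate on each one. The toy breadth-first search below illustrates the idea in Python. It is not our spec and not how TLC is implemented; the state tuple `(mode, crashed, wal_len)`, the `NEXT` edges, the `MAX_WAL` bound, and the two simplified invariants are assumptions chosen to keep the example small.

```python
from collections import deque

MAX_WAL = 4  # bound that keeps the toy state space finite

# Assumed edges for illustration only; the real table is in the spec.
NEXT = {
    "BOOTING": ["READY"],
    "READY": ["WORKING", "CLOSING"],
    "WORKING": ["CHECKPOINTING"],
    "CHECKPOINTING": ["READY"],
    "CLOSING": [],
}


def successors(state):
    mode, crashed, wal_len = state
    if crashed:
        # RecoveryComplete may land in BOOTING or READY.
        return [("BOOTING", False, wal_len), ("READY", False, wal_len)]
    out = []
    if mode != "CLOSING":
        out.append(("RECOVERY", True, wal_len))  # a crash can hit any live state
    if wal_len < MAX_WAL:
        # Every legal step appends one WAL entry.
        out.extend((target, False, wal_len + 1) for target in NEXT[mode])
    return out


INVARIANTS = {
    "CrashRecoveryConsistency": lambda s: (not s[1]) or s[0] == "RECOVERY",
    "TerminalSafety": lambda s: s[0] != "CLOSING" or not s[1],
}


def check(initial):
    """Return (violated invariant, offending state, states seen); None if all hold."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for name, holds in INVARIANTS.items():
            if not holds(state):
                return name, state, len(seen)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None, None, len(seen)


if __name__ == "__main__":
    violated, state, count = check(("BOOTING", False, 0))
    print("violation:", violated, "| distinct states:", count)
```

Because the search is exhaustive within the bound, a crash injected at every reachable state is covered automatically, which is the part that is hard to get from hand-written tests.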

What we found during verification

TLC discovered a genuine BOOTING↔RECOVERY infinite loop in which RecoveryComplete nondeterministically chose BOOTING over READY forever. This was a real livelock that the unit tests had not caught. We fixed it by strengthening the fairness condition on RecoveryComplete(READY) from weak fairness (WF) to strong fairness (SF).
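
A small sketch of why the fairness strength matters, under assumptions: the three-node graph, the function names, and the claim that RecoveryComplete(READY) is enabled only in RECOVERY are simplifications for illustration, not an excerpt from the spec. The first function finds a cycle that never reaches READY; the second asks whether a fairness assumption on an action that the cycle never takes rules that cycle out.

```python
def find_cycle_avoiding(graph, start, goal):
    """Return a cycle reachable from `start` that never visits `goal`, else None."""
    path, on_path, done = [], set(), set()

    def dfs(node):
        if node == goal or node in done:
            return None
        if node in on_path:
            # Back edge: the cycle is the tail of the current path.
            return path[path.index(node):] + [node]
        path.append(node)
        on_path.add(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(node)
        done.add(node)
        return None

    return dfs(start)


def excluded_by_fairness(cycle, enabled_states, strong):
    """True if fairness on an action never taken in `cycle` forbids that cycle."""
    on_cycle = set(cycle)
    if strong:
        # SF: being enabled infinitely often (somewhere on the cycle) is enough.
        return bool(on_cycle & enabled_states)
    # WF: the action must be enabled continuously (everywhere on the cycle).
    return on_cycle <= enabled_states


if __name__ == "__main__":
    # Assumed toy graph: BOOTING can crash into RECOVERY; recovery can go either way.
    graph = {
        "BOOTING": ["READY", "RECOVERY"],
        "RECOVERY": ["BOOTING", "READY"],
    }
    loop = find_cycle_avoiding(graph, "BOOTING", "READY")
    enabled = {"RECOVERY"}  # assumed: RecoveryComplete(READY) is enabled only here
    print("loop:", loop)
    print("ruled out by WF:", excluded_by_fairness(loop, enabled, strong=False))
    print("ruled out by SF:", excluded_by_fairness(loop, enabled, strong=True))
```

In this toy setting the action is enabled only intermittently along the loop, so weak fairness never forces it, while strong fairness does.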

Liveness properties (EventualProgress, EventualTermination, ProposalEventuallyResolved) are defined but deferred to Phase 2, because the bounded WAL model produces spurious liveness violations once the WAL fills up.

Takeaway

If you're building systems where LLMs control state transitions, formal verification is practical and catches real bugs. The TLA+ spec took a few hours to write and found a livelock that 337 unit tests missed.

The full TLA+ spec, TLC configuration, and results are in the formal/ directory. AGPL-3.0 licensed.

Interested in feedback from anyone working on formal methods for AI/LLM systems.

Guidelines

I have read and understood this category's guidelines before making this post.

EDDOEDDO · 2026-05-19T00:55:17Z

github-actions[bot]
Bot May 19, 2026

💬 Your Product Feedback Has Been Submitted 🎉

Thank you for taking the time to share your insights with us! Your feedback is invaluable as we build a better GitHub experience for all our users.

Here's what you can expect moving forward ⏩

- Your input will be carefully reviewed and cataloged by members of our product teams.
  - Due to the high volume of submissions, we may not always be able to provide individual responses.
  - Rest assured, your feedback will help chart our course for product improvements.
- Other users may engage with your post, sharing their own perspectives or experiences.
- GitHub staff may reach out for further clarification or insight.
  - We may 'Answer' your discussion if there is a current solution, workaround, or roadmap/changelog post related to the feedback.

Where to look to see what's shipping 👀

- Read the Changelog for real-time updates on the latest GitHub features, enhancements, and calls for feedback.
- Explore our Product Roadmap, which details upcoming major releases and initiatives.

What you can do in the meantime 💻

- Upvote and comment on other user feedback Discussions that resonate with you.
- Add more information at any point! Useful details include: use cases, relevant labels, desired outcomes, and any accompanying screenshots.

As a member of the GitHub community, your participation is essential. While we can't promise that every suggestion will be implemented, we want to emphasize that your feedback is instrumental in guiding our decisions and priorities.

Thank you once again for your contribution to making GitHub even better! We're grateful for your ongoing support and collaboration in shaping the future of our platform. ⭐

1 reply

EDDOEDDO May 19, 2026

Men

marcelomuckeiro2028-lang · 2026-05-19T04:19:55Z


I really want to use GitHub Copilot, but I can't afford it. I'm a student and I'm currently unemployed.

0 replies
