Python — 84-step root cause analysis with sequential-thinking MCP.
Live site: withagents.dev/posts/post-13-sequential-thinking
Field journal entry: withagents.dev/posts/post-13-sequential-thinking
Sequential thinking transforms chaotic debugging into systematic root cause analysis across multi-layer systems.
Two days. Four engineers. Nobody could find the bug. Then 84 sequential thinking steps traced it from a React component through 4 system layers to a single off-by-one error in a database query. This framework structures debugging as explicit thinking chains that maintain context across architectural boundaries, propagate quantitative constraints, and trace causality backward from symptoms to root causes.
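As a rough illustration of what an explicit thinking chain might hold, here is a minimal, self-contained sketch. It does not use this repo's package: `Step`, `Chain`, and their fields are hypothetical names chosen for the example. The only idea it shows is that every step records the layer it concerns and carries the same quantitative constraint forward.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One link in the chain (hypothetical structure, not the package's)."""
    number: int
    layer: str
    thought: str
    constraint: str  # carried unchanged from step to step


@dataclass
class Chain:
    symptom: str
    constraint: str
    steps: list[Step] = field(default_factory=list)

    def add(self, layer: str, thought: str) -> Step:
        # Every new step inherits the chain's constraint, so it cannot be forgotten.
        step = Step(len(self.steps) + 1, layer, thought, self.constraint)
        self.steps.append(step)
        return step

    def layers_crossed(self) -> list[str]:
        """Layers in order of first appearance."""
        seen: list[str] = []
        for step in self.steps:
            if step.layer not in seen:
                seen.append(step.layer)
        return seen


chain = Chain("audio skips first 3 seconds", "1 in 8 plays (12.5%)")
chain.add("React Player", "Playback starts late on some plays.")
chain.add("CDN", "Is one node serving a different byte range?")
chain.add("PostgreSQL", "Where does the stored offset come from?")
print(chain.layers_crossed())
```

Keeping the constraint on each step is a design choice for the sketch: any hypothesis that cannot explain "1 in 8" can be rejected at the step where it is raised.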
graph TD
    A[React Player] -->|chunk=0| B[API Gateway]
    B -->|byte range| C[CDN Node]
    C -->|offset from DB| D[PostgreSQL]
    D -->|integer division| E["offset = file_size // 8"]
    E -->|truncation| F[Skips WAV header bytes]
    F -->|audio starts late| A
    style E fill:#ef4444,color:#fff
    style F fill:#ef4444,color:#fff
    subgraph Thinking Chain
        S1[Step 1: Quantify constraint] --> S23[Step 23: Eliminate race condition]
        S23 --> S47[Step 47: Find CDN anomaly]
        S47 --> S68[Step 68: Trace backward to DB]
        S68 --> S84[Step 84: Root cause confirmed]
    end
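The `//` in the diagram is floor division, which silently discards any remainder. The sketch below shows that effect in isolation, using a made-up file size. The function names are hypothetical, and how the truncated offset interacts with the WAV header in the real system is not reproduced here.

```python
def floor_offset(file_size: int, parts: int = 8) -> int:
    """Offset as written in the diagram: floor division drops the remainder."""
    return file_size // parts


def lost_to_truncation(file_size: int, parts: int = 8) -> int:
    """Bytes unaccounted for when `parts` equal offsets are laid end to end."""
    return file_size - parts * floor_offset(file_size, parts)


file_size = 1_000_003  # hypothetical size, not taken from the incident
print(floor_offset(file_size))        # 125000
print(file_size / 8)                  # 125000.375
print(lost_to_truncation(file_size))  # 3
```

The lost amount is always `file_size % parts`, so the error is zero only for sizes that divide evenly, which is one reason a bug like this can pass tests that use round numbers.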
pip install -e .
seq-debug analyze --symptom "audio skips first 3 seconds" --frequency "12.5%" --layers react,api,cdn,db

from sequential_thinking_debugging.core import (
    ThinkingChain, SystemLayer, QuantitativeConstraint, DebugSession
)
# Define system layers
layers = [
    SystemLayer(name="React Player", entry_point="AudioPlayer.tsx"),
    SystemLayer(name="API Gateway", entry_point="routes/audio.py"),
    SystemLayer(name="CDN", entry_point="cdn-config.yaml"),
    SystemLayer(name="PostgreSQL", entry_point="queries/audio_chunks.sql"),
]
# Start a debugging chain
session = DebugSession(layers=layers)
chain = session.create_chain(
    symptom="Audio stories skip first 3 seconds",
    constraint=QuantitativeConstraint(
        description="Occurs 12.5% of the time (1 in 8 plays)",
        value=0.125,
    ),
)
# Each step builds on previous context
chain.add_step("12.5% = 1/8. What creates an 8-state cycle in this system?")
chain.add_step("CDN uses 8-server rotation. Could one node behave differently?")
# ...
chain.add_step("Root cause: integer division truncation in offset = file_size // 8")
report = chain.generate_report()
print(f"Root cause found at step {report.root_cause_step} of {report.total_steps}")
print(f"Layers crossed: {report.layers_traced}")

- Start with the quantitative constraint -- what specific frequency or pattern describes the failure?
- Map all system layers -- every boundary the data crosses
- Make hypotheses explicit -- state, predict, check, then move on
- Trace backward from symptom -- the bug lives at a layer boundary
- Do not stop at "unusual" -- unusual is not root cause; keep asking why
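
The first rule above (start with the quantitative constraint) can be sketched in a few lines: turn the observed failure rate into a small fraction, then ask which parts of the system cycle with that period. This is only an illustration. The helper names are hypothetical, and apart from the 8-server CDN rotation mentioned in the example chain, the component inventory is invented.

```python
from fractions import Fraction


def failure_fraction(failures: int, total: int, max_period: int = 16) -> Fraction:
    """Approximate an observed failure rate by a fraction with a small denominator."""
    return Fraction(failures, total).limit_denominator(max_period)


def candidates_with_period(components: dict[str, int], period: int) -> list[str]:
    """Names of components whose cycle length equals the failure period."""
    return sorted(name for name, size in components.items() if size == period)


# 12.5% of plays fail, expressed here as 125 of 1000 (illustrative counts).
rate = failure_fraction(125, 1000)

# Hypothetical inventory of things that rotate or cycle in the system.
components = {"cdn_nodes": 8, "api_workers": 4, "db_replicas": 2}
suspects = candidates_with_period(components, rate.denominator)
print(rate, suspects)
```

A match on period is a lead, not a verdict: it tells you where to form the next explicit hypothesis, which still has to be stated, predicted, and checked.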
This is the companion repo for Agentic Development #17: 84 Thinking Steps to Find a One-Line Bug.
MIT