Commit 3d8112e

committed
feat: add link post on dual-loop BDD and agent TDD workflows
1 parent 85ad6f6 commit 3d8112e

File tree

2 files changed, +79 -0 lines changed
  • www/links/2026
    • 02/03/where-are-the-test-driven-agentic-workflows-anyway
    • 03/15/dual-loop-bdd-new-red-green-tdd

www/links/2026/02/03/where-are-the-test-driven-agentic-workflows-anyway/index.md

Lines changed: 5 additions & 0 deletions
@@ -32,3 +32,8 @@ system prompts are interfering.
I honestly wouldn’t be surprised if next Tuesday Claude Code _does_ ship a TDD mode that
explicitly takes the LLM through the TDD process.

**Update:** Justin Searls has had some success getting Opus 4.6 to follow a [dual-loop BDD
workflow][dual-loop].

[dual-loop]: /links/2026/03/15/dual-loop-bdd-new-red-green-tdd/
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
---
tags:
- article
- link
author: eric
linkTitle: Dual-loop BDD is the new Red-green TDD
date: 2026-03-15 10:47:46 -07:00
linkUrl: https://justin.searls.co/posts/dual-loop-bdd-is-the-new-red-green-tdd/
teaser: >-
  Agents prompted with “red-green TDD” write most of the tests first, then all the code —
  skipping the part of TDD that actually matters. Dual-loop BDD might be a better prompt
  strategy.
---

[Justin Searls][link]:

> My initial prompt simply told the agent to practice “red-green TDD” (a phrase I had never
> heard of until it was discovered that LLMs apparently interpret it as “real TDD”). This
> approach turned out to be woefully insufficient. Why? Because agents follow the path of
> least resistance and will invariably write a shitload of unit tests chasing the local
> maximum of code coverage without any regard for the global maximum of making sure shit
> actually works.

While I haven’t had _bad_ results from the `red-green TDD` prompt trick, Claude Code has not
been doing what I expected with this instruction. What I see is that it:

1. Writes most, if not all, of the tests first.
2. Runs the tests to see them fail.
3. Writes the implementation.
4. Runs tests again, and if they don’t pass, fixes the implementation and/or the tests in a
   single step. (Opus 4.6 will often say something along the lines of “I’ll fix the tests
   and the code simultaneously.”)

This is not TDD! At least not as I know it and use it. What I expect the agent to do when
told to TDD is something more like this:

1. Write a single test.
2. Run the tests to make sure it fails in a _nontrivial and expected way_.
3. Write a minimal amount of code to make the test pass (and run the test again to make sure
   it passes).
4. Refactor the code, and run the tests to make sure they still pass.
5. Repeat until the planned change is complete.

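As a rough illustration of one pass through the loop above, here is a minimal Python sketch. The `slugify` example and every function name in it are hypothetical, chosen only for illustration. They do not come from Justin's post or from an actual agent session. One test is written first, run against a stub (red), then against the smallest change that satisfies it (green), then against a refactor that should leave it green.

```python
def run_test(slugify):
    """The single test from step 1, run against a given implementation.

    Returns "red" when the expectation fails and "green" when it passes.
    """
    return "green" if slugify("Dual loop BDD") == "dual-loop-bdd" else "red"


# Step 2: a stub makes the symbol exist, so the failure comes from the
# expectation rather than from a missing name.
def slugify_stub(title):
    return ""


# Step 3: the minimal code that satisfies this one test.
def slugify_minimal(title):
    return title.lower().replace(" ", "-")


# Step 4: a refactor; the same test is re-run to confirm it is still green.
def slugify_refactored(title):
    words = title.lower().split()
    return "-".join(words)


# Step 5 would repeat the cycle with the next single test.
for impl in (slugify_stub, slugify_minimal, slugify_refactored):
    print(impl.__name__, run_test(impl))
```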
This is why it has been puzzling to me to see folks say that the `red-green TDD` prompt is
actually getting agents to do TDD. You’re not doing TDD if you are writing all the tests at
once and seeing them all fail for trivial reasons like nonexistent
symbols/methods/properties. You actually need to see the expectations of the tests fail for
you to be sure that they are justifying their existence.

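To make the distinction concrete, the following Python sketch separates a failure caused by a missing symbol from a failure of the test's actual expectation. It rests on assumptions: the `classify` helper, the cart example, and the 8% tax figure are all hypothetical. Treating every `AttributeError` as trivial is also a simplification, since real code can raise one for other reasons.

```python
from types import SimpleNamespace


def classify(check, subject):
    """Run check(subject) and report why it failed, if it did."""
    try:
        check(subject)
    except AssertionError:
        # The expectation itself failed: the useful kind of red.
        return "expectation failed"
    except (AttributeError, NameError, ImportError) as exc:
        # The code under test does not exist yet: a trivial red.
        return f"trivial failure: {type(exc).__name__}"
    return "passed"


def total_includes_tax(cart):
    # Hypothetical expectation: 8% tax is added to the amount.
    assert cart.total(100) == 108


no_method = SimpleNamespace()                             # symbol missing
stubbed = SimpleNamespace(total=lambda amount: 0)         # symbol exists, wrong answer
finished = SimpleNamespace(total=lambda amount: amount * 108 // 100)

for cart in (no_method, stubbed, finished):
    print(classify(total_includes_tax, cart))
```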
I suspect part of the confusion here is that TDD is a buzzword that has lost its original
meaning, so when people say their agents are doing TDD they don’t know that they’re not.

Apparently though, Justin has gotten Opus 4.6 to do something a lot closer to what I have
been expecting:

> So yesterday I updated the prompt with the more sophisticated dual-loop approach developed
> by folks like [Dan North] and other adherents of [behavior-driven development] in the late
> aughts. It is best illustrated by two concentric circles: you begin each feature with a
> failing integration test, then dive into an inner loop for numerous red-green-refactor
> iterations of unit tests, then pop back out again once the outer loop’s integration test
> passes.

[Dan North]: https://dannorth.net/
[behavior-driven development]: https://en.wikipedia.org/wiki/Behavior-driven_development

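The two concentric loops in the quote can be sketched as control flow. This is only a toy model under stated assumptions, not Justin's prompt or any tool's behavior: the `parse_config` feature, the `impl` registry that stands in for "code written so far", and the `dual_loop` driver are all hypothetical. The outer check must start red, each inner unit check goes red then green, and the outer check is re-run at the end.

```python
# Stubs: every symbol exists from the start, so each red below is a failed
# expectation rather than a missing name.
impl = {
    "split_pairs": lambda text: [],
    "parse_pair": lambda pair: ("", 0),
}


def parse_config(text):
    """Hypothetical feature entry point: 'a=1,b=2' -> {'a': 1, 'b': 2}."""
    return dict(impl["parse_pair"](p) for p in impl["split_pairs"](text))


def outer_check():
    # Outer loop: one integration-level check for the whole feature.
    return parse_config("a=1,b=2") == {"a": 1, "b": 2}


def real_parse_pair(pair):
    key, value = pair.split("=", 1)
    return key, int(value)


# Inner loop: (name, unit check, minimal implementation) per iteration.
inner_steps = [
    ("split_pairs",
     lambda: impl["split_pairs"]("a=1,b=2") == ["a=1", "b=2"],
     lambda text: text.split(",")),
    ("parse_pair",
     lambda: impl["parse_pair"]("a=1") == ("a", 1),
     real_parse_pair),
]


def dual_loop(outer, steps):
    log = []
    if outer():
        raise RuntimeError("outer check should start red")
    log.append("outer: red")
    for name, unit_check, real in steps:
        if unit_check():
            raise RuntimeError(f"{name}: unit check should start red")
        log.append(f"{name}: red")
        impl[name] = real  # stands in for writing the minimal code
        if not unit_check():
            raise RuntimeError(f"{name}: unit check should now be green")
        log.append(f"{name}: green")
        # A refactor step, followed by re-running unit_check, would go here.
    log.append("outer: green" if outer() else "outer: red")
    return log


log = dual_loop(outer_check, inner_steps)
print("\n".join(log))
```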
I was [expecting] that we’d need a mode or custom agent workflow to pull this off. I’m still
doubtful that strictly adhering to a prompted workflow like this is a strength of agents,
but I’ll have to give this a try.

On a related note, I’ve started wondering if agents really need such a fine-grained TDD
loop. Maybe we just need to ask them to present the tests they plan to write during the
planning phase, in BDD style.

[expecting]: /links/2026/02/03/where-are-the-test-driven-agentic-workflows-anyway/
