|
| 1 | +--- |
| 2 | +tags: |
| 3 | + - article |
| 4 | + - link |
| 5 | +author: eric |
| 6 | +linkTitle: Dual-loop BDD is the new Red-green TDD |
| 7 | +date: 2026-03-15 10:47:46 -07:00 |
| 8 | +linkUrl: https://justin.searls.co/posts/dual-loop-bdd-is-the-new-red-green-tdd/ |
| 9 | +teaser: >- |
| 10 | + Agents prompted with “red-green TDD” write most of the tests first, then all the code — |
| 11 | + skipping the part of TDD that actually matters. Dual-loop BDD might be a better prompt |
| 12 | + strategy. |
| 13 | +--- |
| 14 | + |
| 15 | +[Justin Searls][link]: |
| 16 | + |
| 17 | +> My initial prompt simply told the agent to practice “red-green TDD” (a phrase I had never |
| 18 | +> heard of until it was discovered that LLMs apparently interpret it as “real TDD”). This |
| 19 | +> approach turned out to be woefully insufficient. Why? Because agents follow the path of |
| 20 | +> least resistance and will invariably write a shitload of unit tests chasing the local |
| 21 | +> maximum of code coverage without any regard for the global maximum of making sure shit |
| 22 | +> actually works. |
| 23 | +
|
| 24 | +While I haven’t had _bad_ results from the `red-green TDD` prompt trick, Claude Code has not |
| 25 | +been doing what I expected with this instruction. What I see is that it: |
| 26 | + |
| 27 | +1. Writes most, if not all, of the tests first. |
| 28 | +2. Runs the tests to see them fail. |
| 29 | +3. Writes the implementation. |
| 30 | +4. Runs tests again, and if they don’t pass, fixes the implementation and/or the tests in a |
| 31 | + single step. (Opus 4.6 will often say something along the lines of “I’ll fix the tests |
| 32 | + and the code simultaneously.”) |
| 33 | + |
| 34 | +This is not TDD! At least not as I know it and use it. What I expect the agent to do when |
| 35 | +told to TDD is something more like this: |
| 36 | + |
| 37 | +1. Write a single test. |
| 38 | +2. Run the tests to make sure the new test fails in a _nontrivial and expected way_. |
| 39 | +3. Write a minimal amount of code to make the test pass (and run the test again to make sure |
| 40 | + it passes). |
| 41 | +4. Refactor the code, and run the tests to make sure they still pass. |
| 42 | +5. Repeat until the planned change is complete. |
| 43 | + |
| 44 | +This is why it has been puzzling to me to see folks say that the `red-green TDD` prompt is |
| 45 | +actually getting agents to do TDD. You’re not doing TDD if you write all the tests at once |
| 46 | +and watch them all fail for trivial reasons like nonexistent symbols, methods, or |
| 47 | +properties. You need to see each test’s expectations fail before you can be sure the test |
| 48 | +justifies its existence. |
| 49 | + |
| 50 | +I suspect part of the confusion here is that TDD is a buzzword that has lost its original |
| 51 | +meaning, so people who say their agents are doing TDD often don’t realize that they aren’t. |
| 52 | + |
| 53 | +Apparently though, Justin has gotten Opus 4.6 to do something a lot closer to what I have |
| 54 | +been expecting: |
| 55 | + |
| 56 | +> So yesterday I updated the prompt with the more sophisticated dual-loop approach developed |
| 57 | +> by folks like [Dan North] and other adherents of [behavior-driven development] in the late |
| 58 | +> aughts. It is best illustrated by two concentric circles: you begin each feature with a |
| 59 | +> failing integration test, then dive into an inner loop for numerous red-green-refactor |
| 60 | +> iterations of unit tests, then pop back out again once the outer loop’s integration test |
| 61 | +> passes. |
| 62 | +
|
| 63 | +[Dan North]: https://dannorth.net/ |
| 64 | +[behavior-driven development]: https://en.wikipedia.org/wiki/Behavior-driven_development |
| 65 | + |
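As a rough illustration of the two concentric loops described in the quote, here is a small Python sketch of how a finished feature might look. The checkout domain and every name in it (`subtotal`, `apply_discount`, `checkout_total`, and the test classes) are assumptions made up for this example, not anything from Justin’s prompt. The comments mark the order in which each piece would have been written.

```python
import unittest


# Inner loop, iteration 1: driven by test_subtotal_sums_price_times_quantity.
def subtotal(items):
    """Sum price * quantity over (price_cents, quantity) pairs."""
    return sum(price * qty for price, qty in items)


# Inner loop, iterations 2 and 3: driven by the two discount tests.
def apply_discount(amount_cents, threshold_cents=10_000, percent=10):
    """Take `percent` off orders at or above the threshold (integer cents)."""
    if amount_cents >= threshold_cents:
        return amount_cents * (100 - percent) // 100
    return amount_cents


# Written last: the wiring that finally turns the outer test green.
def checkout_total(items):
    return apply_discount(subtotal(items))


class CheckoutFeatureTest(unittest.TestCase):
    """Outer loop: written first, red until the feature works end to end."""

    def test_large_order_gets_discount(self):
        self.assertEqual(checkout_total([(6_000, 2)]), 10_800)


class InnerLoopUnitTests(unittest.TestCase):
    """Inner loop: each test was added, seen red, then made green in turn."""

    def test_subtotal_sums_price_times_quantity(self):
        self.assertEqual(subtotal([(6_000, 2), (500, 1)]), 12_500)

    def test_discount_applies_at_threshold(self):
        self.assertEqual(apply_discount(10_000), 9_000)

    def test_no_discount_below_threshold(self):
        self.assertEqual(apply_discount(9_900), 9_900)
```

Saved as a module, this can be run with `python -m unittest`. The code only shows the end state. The discipline lives in the order of the steps, which is exactly the part an agent is tempted to skip.
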
| 66 | +I was [expecting] that we’d need a mode or custom agent workflow to pull this off. I’m still |
| 67 | +doubtful that strictly adhering to a prompted workflow like this is a strength of agents, |
| 68 | +but I’ll have to give this a try. |
| 69 | + |
| 70 | +On a related note, I’ve started wondering if agents really need such a fine-grained TDD |
| 71 | +loop. Maybe we just need to ask them to present the tests they plan to write during the |
| 72 | +planning phase, in BDD style. |
| 73 | + |
| 74 | +[expecting]: /links/2026/02/03/where-are-the-test-driven-agentic-workflows-anyway/ |