A collection of Test-Driven Development (TDD) exercises implementing classic programming katas, following Uncle Bob's Clean Code principles and Domain-Driven Design tactical patterns.
These katas demonstrate:
- Test-Driven Development: Production code written only to pass failing tests
- Clean Code Practices: Intent-revealing names, single responsibility, domain-focused abstractions
- Domain-Driven Design: Value Objects enforce domain constraints at the type level
- The Craftsman's Way: Quality is not a trade-off for speed; it is the only way to go fast
Implementation: lib/roman_numerals.dart
Tests: test/roman_numerals_test.dart
Converts integers (1-3999) to Roman numeral notation using a table-driven greedy algorithm.
Roman numerals represent numbers using seven basic symbols with specific combination rules:
Symbols:
I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000
Domain Rules:
- Additive Notation: Symbols placed in descending order are summed (e.g., VI = 6)
- Subtractive Notation: A smaller symbol placed before a larger one is subtracted from it (e.g., IV = 4)
- Repetition Limit: A symbol repeats at most three times in a row (e.g., III = 3, not IIII); V, L, and D never repeat
- Valid Range: Classical Roman numerals represent 1-3999
Subtractive Pairs (Domain Constraint): Only specific pairs use subtractive notation:
IV (4), IX (9), XL (40), XC (90), CD (400), CM (900)
Value Object Pattern: RomanNumeralInput enforces domain invariants (1-3999 range). Invalid inputs are impossible to construct.
Table-Driven Algorithm: The conversionRules table is the domain model—it directly represents Roman numeral encoding rules.
Greedy Decomposition: The algorithm mirrors the notation itself. It repeatedly emits the largest symbol or subtractive pair whose value still fits, then subtracts that value.
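The table-driven greedy conversion can be sketched in Dart roughly as follows. This is a minimal sketch, not the code in lib/roman_numerals.dart. It reuses the names mentioned in this README (`RomanNumeralInput`, `conversionRules`, `integerToRoman`). It assumes Dart 3 records for the rule table, and the `value` field and the error message are illustrative assumptions.

```dart
// Minimal sketch (assumes Dart 3 records); not the repository's implementation.

/// Value Object: a RomanNumeralInput can only exist for 1-3999.
class RomanNumeralInput {
  static const int min = 1;
  static const int max = 3999;

  final int value;

  RomanNumeralInput(this.value) {
    if (value < min || value > max) {
      throw ArgumentError.value(value, 'value', 'must be in the range $min-$max');
    }
  }
}

/// The domain model as data: every symbol and subtractive pair, largest first.
const List<(int, String)> conversionRules = [
  (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
  (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
  (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'),
  (1, 'I'),
];

/// Greedy decomposition: repeatedly emit the largest rule that still fits.
String integerToRoman(int number) {
  // Constructing the Value Object throws ArgumentError outside 1-3999.
  var remaining = RomanNumeralInput(number).value;
  final buffer = StringBuffer();
  for (final (value, symbol) in conversionRules) {
    while (remaining >= value) {
      buffer.write(symbol);
      remaining -= value;
    }
  }
  return buffer.toString();
}
```

Subtractive pairs are ordinary rows in the table, so the loop needs no special case for them. For example, 444 is consumed as CD, then XL, then IV.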
import 'package:tdd_katas/roman_numerals.dart';
integerToRoman(1994); // Returns: 'MCMXCIV'
integerToRoman(0); // Throws: ArgumentError
Implementation: lib/bowling_game.dart
Tests: test/bowling_game_test.dart
Calculates scores for a bowling game following official scoring rules with look-ahead bonus logic for strikes and spares.
A bowling game consists of 10 frames where players roll a ball to knock down pins:
Scoring Rules:
- Normal Frame: Sum of pins knocked down (e.g., 3 + 4 = 7)
- Spare (/): All 10 pins in 2 rolls → Score = 10 + next 1 roll
- Strike (X): All 10 pins in 1 roll → Score = 10 + next 2 rolls
- 10th Frame: Bonus rolls awarded if spare or strike achieved
State Management: Stores all rolls in a list and calculates score by iterating through frames, not individual rolls.
Look-Ahead Logic: Spares and strikes need the rolls that follow the frame for their bonus. The scoring loop reads ahead in the roll list (one roll for a spare, two for a strike) without advancing past those rolls.
Frame Advancement: Strikes consume 1 roll, spares/normal frames consume 2 rolls—the algorithm tracks position correctly.
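A compact Dart sketch of the frame-based scoring loop follows. It is illustrative, not the repository's lib/bowling_game.dart. The `_isStrike` and `_isSpare` names match helpers mentioned later in this README, but their bodies here are assumptions. The sketch also assumes a complete, valid game has been rolled: an unfinished game would index past the end of the roll list and throw a RangeError.

```dart
// Illustrative sketch of frame-based scoring (not the repository's code).
class BowlingGame {
  final List<int> _rolls = [];

  void roll(int pins) => _rolls.add(pins);

  /// Assumes a complete game; walks 10 frames, not individual rolls.
  int score() {
    var total = 0;
    var frameStart = 0; // index of the first roll of the current frame
    for (var frame = 0; frame < 10; frame++) {
      if (_isStrike(frameStart)) {
        // 10 + next two rolls; a strike consumes one roll.
        total += 10 + _rolls[frameStart + 1] + _rolls[frameStart + 2];
        frameStart += 1;
      } else if (_isSpare(frameStart)) {
        // 10 + next one roll; a spare consumes two rolls.
        total += 10 + _rolls[frameStart + 2];
        frameStart += 2;
      } else {
        // Normal frame: just the pins knocked down in its two rolls.
        total += _rolls[frameStart] + _rolls[frameStart + 1];
        frameStart += 2;
      }
    }
    return total;
  }

  bool _isStrike(int i) => _rolls[i] == 10;
  bool _isSpare(int i) => _rolls[i] + _rolls[i + 1] == 10;
}
```

Because the loop counts frames rather than rolls, the 10th frame's bonus rolls are read as look-ahead but never scored as frames of their own. No special-case code is needed for the 10th frame.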
import 'package:tdd_katas/bowling_game.dart';
final game = BowlingGame();
game.roll(10); // Strike!
game.roll(3);
game.roll(4);
// ... continue rolling
print(game.score()); // Calculates total with bonuses
Tests for simple cases (gutter game, one spare, one strike) drove an algorithm that automatically handles complex scenarios like perfect games (300 points) without explicit implementation. This is TDD's magic: correct abstractions emerge naturally.
Implementation: lib/gilded_rose.dart
Tests: test/gilded_rose_test.dart
A legacy code refactoring kata demonstrating how to safely transform deeply nested conditionals into clean, extensible code using characterization tests and the Strategy pattern.
An inn's inventory system that updates item quality daily based on complex business rules:
Item Types:
- Normal Items: Quality decreases by 1/day, 2/day after sell-by date
- Aged Brie: Quality increases by 1/day, 2/day after expiration (improves with age!)
- Sulfuras (Legendary): Never changes, quality always 80, never "expires"
- Backstage Passes: Complex appreciation:
  - More than 10 days: +1 quality/day
  - 10 days or less: +2 quality/day
  - 5 days or less: +3 quality/day
  - After concert (sellIn < 0): Quality drops to 0
- Conjured Items: Degrade twice as fast as normal items (2/day, 4/day after expiration)
Domain Constraints:
- Quality never negative (≥ 0)
- Quality never exceeds 50 (except Sulfuras at 80)
- Cannot modify the Item class (goblin constraint!)
Characterization Testing: Before touching the legacy code, 14 tests were written to capture existing behavior as a "safety net." They include a Golden Master test that simulates 30 days.
Strategy Pattern: Each item type gets its own updater class implementing the ItemUpdater interface. Item-specific conditionals move out of the update loop into small, separately testable classes, which supports the Open-Closed Principle.
Helper Methods & Constants: _degradeQuality() and _improveQuality() with automatic clamping eliminate scattered boundary checks. Domain constants (_minQuality, _maxQuality) remove magic numbers.
Factory Pattern: _selectUpdater() method chooses the appropriate strategy based on item name, enabling polymorphic dispatch.
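The Strategy design can be sketched end to end in Dart, assuming the rules listed above. This is an illustrative reconstruction, not the repository's lib/gilded_rose.dart. `SulfurasUpdater` and `BackstagePassUpdater` are hypothetical names. The name matching in `_selectUpdater` (for example, `startsWith('Backstage passes')`) is an assumption. `Item` is redeclared only so the block compiles on its own.

```dart
// Illustrative Strategy-pattern sketch (not the repository's code).
class Item {
  String name;
  int sellIn;
  int quality;
  Item(this.name, this.sellIn, this.quality);
}

const int _minQuality = 0;
const int _maxQuality = 50;

// Helpers clamp automatically, so no updater repeats boundary checks.
void _degradeQuality(Item item, int amount) =>
    item.quality = (item.quality - amount).clamp(_minQuality, _maxQuality);

void _improveQuality(Item item, int amount) =>
    item.quality = (item.quality + amount).clamp(_minQuality, _maxQuality);

abstract class ItemUpdater {
  void update(Item item);
}

class NormalItemUpdater implements ItemUpdater {
  @override
  void update(Item item) {
    _degradeQuality(item, 1);
    item.sellIn -= 1;
    if (item.sellIn < 0) _degradeQuality(item, 1); // 2/day after sell-by
  }
}

class AgedBrieUpdater implements ItemUpdater {
  @override
  void update(Item item) {
    _improveQuality(item, 1);
    item.sellIn -= 1;
    if (item.sellIn < 0) _improveQuality(item, 1); // 2/day after expiration
  }
}

class SulfurasUpdater implements ItemUpdater {
  @override
  void update(Item item) {} // legendary: sellIn and quality never change
}

class BackstagePassUpdater implements ItemUpdater {
  @override
  void update(Item item) {
    // Thresholds are checked before the day passes.
    final increase = item.sellIn <= 5 ? 3 : (item.sellIn <= 10 ? 2 : 1);
    _improveQuality(item, increase);
    item.sellIn -= 1;
    if (item.sellIn < 0) item.quality = _minQuality; // concert is over
  }
}

class ConjuredItemUpdater implements ItemUpdater {
  @override
  void update(Item item) {
    _degradeQuality(item, 2);
    item.sellIn -= 1;
    if (item.sellIn < 0) _degradeQuality(item, 2); // 4/day after expiration
  }
}

class GildedRose {
  final List<Item> items;
  GildedRose(this.items);

  void updateQuality() {
    for (final item in items) {
      _selectUpdater(item).update(item);
    }
  }

  // The only place that still branches on item names.
  ItemUpdater _selectUpdater(Item item) {
    if (item.name == 'Sulfuras, Hand of Ragnaros') return SulfurasUpdater();
    if (item.name == 'Aged Brie') return AgedBrieUpdater();
    if (item.name.startsWith('Backstage passes')) return BackstagePassUpdater();
    if (item.name.startsWith('Conjured')) return ConjuredItemUpdater();
    return NormalItemUpdater();
  }
}
```

In this shape, a new item type is one new `ItemUpdater` class plus one line in `_selectUpdater`. None of the existing updater classes change.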
Phase 1: Understand Legacy Code
// 60+ lines of 7-8 level nested conditionals
// Nearly incomprehensible logic mixing all item types
Phase 2: Characterization Tests (GREEN Phase)
- 14 comprehensive tests covering all item types
- Golden Master test: 30-day simulation baseline
- Result: Safety net established ✅
Phase 3: Refactor with Confidence (3 steps)
Step 1 - Extract Methods (REFACTOR):
// Before: 60 lines of nested hell
// After: Clean if-else delegating to 4 private methods
_updateNormalItem(), _updateAgedBrie(),
_updateBackstagePasses(), _updateSulfuras()
Step 2 - Strategy Pattern (REFACTOR):
abstract class ItemUpdater {
void update(Item item);
}
class NormalItemUpdater implements ItemUpdater { ... }
class AgedBrieUpdater implements ItemUpdater { ... }
// ... 4 concrete strategies
Step 3 - Domain Helpers (REFACTOR):
const int _minQuality = 0;
const int _maxQuality = 50;
void _degradeQuality(Item item, int amount) {
item.quality = (item.quality - amount).clamp(_minQuality, _maxQuality);
}
Phase 4: Add Conjured Items (RED-GREEN)
RED: Added 3 failing tests for Conjured items behavior
GREEN: Created ConjuredItemUpdater class—one class, one condition
Result: Feature added in minutes thanks to refactoring!
import 'package:tdd_katas/gilded_rose.dart';
final items = [
Item('Normal Sword', 10, 20),
Item('Aged Brie', 2, 0),
Item('Sulfuras, Hand of Ragnaros', 0, 80),
Item('Backstage passes to a TAFKAL80ETC concert', 15, 20),
Item('Conjured Mana Cake', 3, 6),
];
final gildedRose = GildedRose(items);
gildedRose.updateQuality(); // Updates all items per domain rules
- Characterization Tests = Freedom: With tests in place, aggressive refactoring felt safe. Every change was validated instantly.
- Strategy Pattern = Extensibility: Adding Conjured items took 5 minutes. Before refactoring, it would have meant diving into nested conditionals and risking bugs.
- Small Steps = Big Wins: Three refactoring commits transformed spaghetti into clean code. Each step kept tests GREEN, proving behavior preservation.
- Open-Closed Principle in Action: New item types don't modify existing updater classes. They add a new updater class plus one selection condition. The system is "open for extension, closed for modification."
Implementation: lib/string_calculator.dart
Tests: test/string_calculator_test.dart
A Bug Hunt Kata demonstrating how to use TDD to discover and fix bugs in existing code. Each bug is exposed with a RED test, then fixed with GREEN implementation.
A simple calculator that sums numbers from a string input with various delimiter support:
Features:
- Empty String: Returns 0
- Single Number: Returns that number ("5" → 5)
- Comma Delimiter: Sums comma-separated numbers ("1,2,3" → 6)
- Custom Delimiters: Supports the format "//[delimiter]\n[numbers]" ("//;\n1;2" → 3)
- Ignore Large Numbers: Numbers > 1000 are ignored ("2,1001" → 2)
Different from Previous Katas: This code wasn't built test-first. Instead, we started with existing, buggy code and used tests to expose and fix the bugs one by one.
Bug Hunt Process:
- RED: Write test exposing a specific bug
- GREEN: Fix only that bug
- Commit: Document the bug found and fixed
- Repeat: Move to next bug
Bug #1: Empty String Returns Wrong Value
- Bug: Returned 1 instead of 0
- Test: expect(calculator.add(''), equals(0))
- Fix: Changed return value from 1 to 0
- Commits: RED → GREEN
Bug #2: Single Number Off-By-One
- Bug: Added +1 to parsed number
- Test: expect(calculator.add('5'), equals(5))
- Expected: 5, Actual: 6
- Fix: Removed + 1 from parsing
- Commits: RED → GREEN
Bug #3: Summation Loop Misses Last Element
- Bug: Loop condition i < length - 1 skipped the last item
- Test: expect(calculator.add('1,2'), equals(3))
- Expected: 3, Actual: 1
- Fix: Changed to i < length
- Commits: RED → GREEN
Bug #4: Custom Delimiter Not Extracted
- Bug: Delimiter extraction line was commented out
- Test: expect(calculator.add('//;\n1;2'), equals(3))
- Error: FormatException trying to parse '1;2'
- Fix: Uncommented delimiter = parts[0].substring(2)
- Commits: RED → GREEN
Bug #5: Missing Feature - Ignore Numbers > 1000
- Bug: All numbers included in sum
- Test: expect(calculator.add('2,1001'), equals(2))
- Expected: 2, Actual: 1003
- Fix: Added .where((n) => n <= 1000) filter
- Commits: RED → GREEN
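With all five fixes applied, the calculator's behavior fits in a short Dart sketch. This is a reconstruction from the feature list and bug notes above, not the file in lib/string_calculator.dart. The header parsing (split on newlines, with the delimiter taken from `parts[0].substring(2)` as in Bug #4) is an assumption, as is the use of `fold` for summation.

```dart
// Reconstruction sketch of the fixed calculator (not the repository's code).
class StringCalculator {
  int add(String input) {
    if (input.isEmpty) return 0; // Bug #1: empty input sums to 0

    var delimiter = ',';
    var numbers = input;
    if (input.startsWith('//')) {
      // Custom delimiter header: "//[delimiter]\n[numbers]"
      final parts = input.split('\n');
      delimiter = parts[0].substring(2); // Bug #4: the extraction is active
      numbers = parts.sublist(1).join('\n');
    }

    return numbers
        .split(delimiter)
        .map(int.parse) // Bug #2: no "+ 1" after parsing
        .where((n) => n <= 1000) // Bug #5: ignore numbers above 1000
        .fold(0, (sum, n) => sum + n); // Bug #3: every element is summed
  }
}
```

Each comment marks where one of the RED tests above pins the behavior. Reintroducing any single bug should turn exactly one of those tests red again.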
import 'package:tdd_katas/string_calculator.dart';
final calculator = StringCalculator();
calculator.add(''); // Returns: 0
calculator.add('5'); // Returns: 5
calculator.add('1,2,3'); // Returns: 6
calculator.add('//;\n1;2'); // Returns: 3
calculator.add('2,1001'); // Returns: 2 (1001 ignored)
- Tests as Bug Detectors: Each test acted like a spotlight, illuminating exactly ONE bug at a time. No guessing: the test tells you what's broken.
- RED-GREEN Still Works: Even when fixing bugs (not adding features), the RED-GREEN rhythm provides safety. You're never fixing blind.
- Regression Prevention: After fixing each bug, ALL previous tests stay green. This proves you didn't break something while fixing something else.
- Incremental Debugging: Fixing one bug at a time with commits creates a clear audit trail. You can see exactly what each bug was and how it was fixed.
- Real-World Skill: This mirrors production work. Most code you touch is existing code with bugs, not greenfield TDD.
Implementation: lib/mars_rover.dart
Tests: test/mars_rover_test.dart
A Command Pattern kata simulating a robotic rover navigating a plateau on Mars. Demonstrates clean separation of concerns, value objects, and command-based control.
A rover explores a rectangular plateau with coordinate-based navigation:
Core Concepts:
- Position: (x, y) coordinates on the plateau grid
- Direction: Cardinal directions (N, E, S, W)
- Plateau: Grid with defined boundaries that wrap around (toroidal topology)
Commands:
- L: Turn left 90 degrees (changes direction, not position)
- R: Turn right 90 degrees (changes direction, not position)
- M: Move forward one grid point in the current direction
Example Navigation:
Starting: (0,0) facing North
Commands: "MMRMMLM"
- MM: Move to (0,2) facing North
- R: Turn to face East (still at 0,2)
- MM: Move to (2,2) facing East
- L: Turn to face North (still at 2,2)
- M: Move to (2,3) facing North
Result: (2,3) facing North
Direction Enum: Encapsulates rotation logic using modular arithmetic. Each direction knows how to turn left/right, eliminating conditional branching.
Value Objects:
- Position is immutable: movement returns new Position instances
- Plateau encapsulates boundary wrapping logic
- Prevents invalid states at the type level
Command Pattern (Implicit): The execute() method delegates to command handlers (turnLeft(), turnRight(), moveForward()). Each command is isolated and testable.
Wrapping Logic: Plateau boundaries wrap around (toroidal topology). Moving past edge (e.g., x=5→6 on 5x5 grid) wraps to opposite side (x=0).
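The design described above can be sketched in Dart 3. Its pieces are a behavior-carrying Direction enum, an immutable Position, wrapping in Plateau, and an execute() that dispatches single-letter commands. This is a hypothetical design, not the repository's lib/mars_rover.dart. Its constructor takes value objects instead of the named x/y/direction/plateau parameters shown in the usage example. `stepToward` and `wrap` are illustrative names. It also assumes coordinates run from 0 to width - 1, so on a width-5 plateau, moving east from x = 4 wraps to x = 0.

```dart
// Hypothetical sketch of the rover design (not the repository's code).
enum Direction {
  north,
  east,
  south,
  west; // declared clockwise, so +1 index is a right turn

  Direction turnRight() => Direction.values[(index + 1) % 4];
  Direction turnLeft() => Direction.values[(index + 3) % 4]; // +3 == -1 mod 4
}

/// Immutable value object: moving returns a new Position.
class Position {
  final int x;
  final int y;
  const Position(this.x, this.y);

  // Exhaustive switch expression: a missing Direction is a compile error.
  Position stepToward(Direction direction) => switch (direction) {
        Direction.north => Position(x, y + 1),
        Direction.east => Position(x + 1, y),
        Direction.south => Position(x, y - 1),
        Direction.west => Position(x - 1, y),
      };
}

/// Owns the wrapping rule (assumes coordinates 0..width-1 and 0..height-1).
class Plateau {
  final int width;
  final int height;
  const Plateau(this.width, this.height);

  // Dart's % never returns a negative result for a positive divisor,
  // so x = -1 wraps to width - 1 and x = width wraps to 0.
  Position wrap(Position p) => Position(p.x % width, p.y % height);
}

class Rover {
  Position position;
  Direction direction;
  final Plateau plateau;

  Rover(this.position, this.direction, this.plateau);

  // Each command is delegated to one small, separately testable operation.
  void execute(String commands) {
    for (final command in commands.split('')) {
      switch (command) {
        case 'L':
          direction = direction.turnLeft();
        case 'R':
          direction = direction.turnRight();
        case 'M':
          position = plateau.wrap(position.stepToward(direction));
        default:
          throw ArgumentError.value(command, 'commands', 'expected L, R or M');
      }
    }
  }
}
```

With this sketch, `Rover(const Position(1, 2), Direction.north, const Plateau(5, 5))` followed by `execute('LMLMLMLMM')` ends at (1, 3) facing north. That matches the usage example that follows.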
import 'package:tdd_katas/mars_rover.dart';
// Create rover at position (1,2) facing North on 5x5 plateau
final rover = Rover(
x: 1,
y: 2,
direction: 'N',
plateauWidth: 5,
plateauHeight: 5,
);
rover.execute('LMLMLMLMM');
print('Position: (${rover.x}, ${rover.y})'); // Position: (1, 3)
print('Direction: ${rover.direction}'); // Direction: N
- Enums as Behavior Carriers: The Direction enum doesn't just store values; it encapsulates rotation logic. Turning left/right becomes direction.turnLeft(), eliminating lookup tables.
- Value Objects Prevent Bugs: An immutable Position means movement can't corrupt state. The new position is calculated, validated, then assigned. Boundary wrapping is isolated in Plateau.
- Switch Expressions Shine: Modern Dart's switch expression makes direction-based movement elegant and exhaustive. The compiler enforces handling all directions.
- Modular Arithmetic for Rotation: (index + 1) % 4 handles right rotation elegantly. No if-statements, no edge cases: the math models the domain.
- Refactoring Without Fear: Two refactoring commits drastically improved code structure. Tests stayed green throughout, proving behavior preservation.
# Run all tests
dart test
# Run specific test file
dart test test/roman_numerals_test.dart
dart test test/bowling_game_test.dart
dart test test/gilded_rose_test.dart
dart test test/string_calculator_test.dart
dart test test/mars_rover_test.dart
# Run with coverage
dart test --coverage=coverage
"Clean code that works." (Ron Jeffries)
Every kata in this collection follows:
- Uncle Bob's Clean Code: Intent-revealing names, functions do one thing, no comments needed
- Kent Beck's TDD: Red-Green-Refactor discipline, tests first
- Eric Evans' DDD: Domain concepts drive the model, tactical patterns enforce boundaries
- The Boy Scout Rule: Every commit leaves the code cleaner than before
- Red: Write a failing test
- Green: Write the simplest code to pass
- Refactor: Clean up duplication, improve names
- Repeat: Let the design emerge from tests
Tests are organized by domain concepts, not technical structure:
Basic Symbols: Tests for the seven fundamental symbols (I, V, X, L, C, D, M)
Subtractive Notation: Tests for all six subtractive pairs, verifying the domain rule
Additive Combinations: Tests for repeated symbols and multi-symbol sequences
Complex Edge Cases: Stress tests combining multiple rules:
- 1994 → MCMXCIV (year notation)
- 3999 → MMMCMXCIX (maximum valid value)
- 444 → CDXLIV (all subtractive positions)
Constraint Validation: Boundary tests for the valid range (1-3999)
- Red: Tests for 1-5 (basic additive, first subtractive case)
- Green: Minimal implementation with conditionals
- Refactor: Extract symbol mapping, clarify intent
- Red: Tests for 6-10 (reveals pattern)
- Green: Extend conditionals
- Refactor: Recognize duplication → Table-driven approach emerges
- Red: Tests for 40-1000 (remaining symbols)
- Green: Extend conversion table (algorithm unchanged)
- Red: Edge cases and constraint tests
- Green: Add RomanNumeralInput Value Object
- Refactor: Extract validation, organize tests by domain concept
The Algorithm Never Changed: After the table-driven refactoring, adding 40-1000 required zero logic modifications. This validates the abstraction.
Type System as Domain Enforcer: RomanNumeralInput makes invalid states unrepresentable. You cannot obtain a RomanNumeralInput for 0 or 4000, because construction is rejected with an ArgumentError. Every instance that exists is therefore valid.
Tests as Living Documentation: Test names use ubiquitous language from the Roman numeral domain. A domain expert could read the test file and recognize the rules they explained.
Tests are organized by scoring complexity, mirroring how the domain rules build on each other:
Basic Scoring:
- Gutter game (all zeros)
- All ones (simple addition)
Spare Bonus (next 1 roll):
- One spare in first frame
- All spares (150 points)
Strike Bonus (next 2 rolls):
- One strike in first frame
- Perfect game (300 points)
Complex Scenarios:
- Combinations of strikes, spares, and normal frames
- Red: Gutter game test
- Green: Return 0 (simplest implementation)
- Red: All ones test
- Green: Store rolls, sum them in score()
- Refactor: Extract rollMany() helper, add setUp()
- Red: One spare test
- Green: Detect spare, add look-ahead bonus (+1 roll)
- Refactor: Extract _isSpare() helper
- Red: One strike test
- Green: Detect strike, add look-ahead bonus (+2 rolls)
- Refactor: Extract _isStrike(), clean up frame advancement
- Validate: Perfect game test passes without modification!
Emergent Design: The algorithm structure wasn't planned upfront. Tests for simple cases forced:
- Frame-based iteration (not roll-based)
- Index tracking (advancing by 1 or 2)
- Look-ahead logic (accessing future rolls)
The Perfect Game Moment: Writing code to handle "one spare" and "one strike" automatically handled "12 consecutive strikes" (300 points). The algorithm correctly models the domain, so all valid games work.
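The arithmetic behind the two headline totals works as follows. It assumes a perfect game of 12 strikes and an all-spares game in which every one of the 21 rolls knocks down 5 pins:

$$
\text{Perfect game} = \sum_{f=1}^{10} (10 + 10 + 10) = 10 \times 30 = 300
$$

$$
\text{All spares} = \sum_{f=1}^{10} (10 + 5) = 10 \times 15 = 150
$$

In the perfect game, the 11th and 12th strikes only serve as the 10th frame's bonus. That is why a loop over exactly 10 frames produces 300 without a special case.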
State vs. Behavior: Initially tempting to model Frame objects with state. TDD revealed a simpler truth: just store rolls and calculate on-demand. No frame objects needed.
Unlike the previous two katas (greenfield TDD), Gilded Rose simulates real-world legacy code refactoring. You inherit messy, working code with no tests and must:
- Understand what it does (without breaking it)
- Add tests to capture behavior
- Refactor safely
- Add new features
Tests are organized by item type behavior and include a Golden Master:
Normal Items:
- Quality degradation (1/day, 2/day after expiration)
- Quality never negative
Aged Brie:
- Quality appreciation (improves with age)
- Respects quality cap (≤ 50)
Sulfuras (Legendary Items):
- Never changes (quality, sellIn)
- Always quality 80
Backstage Passes:
- Threshold-based appreciation (10 days, 5 days)
- Drops to 0 after concert
Conjured Items (new feature):
- Degrades 2x faster than normal items
Golden Master Test:
- 30-day simulation with all item types
- Captures baseline output before refactoring
- Detects any behavioral regression
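One possible golden master is a single string holding every item's state for each simulated day. That string is captured once from the legacy code and compared on every later run. The sketch below assumes that approach. `goldenMasterSnapshot` is a hypothetical helper, the text format is illustrative, and `Item` is redeclared so the block stands alone. Items are formatted by the helper rather than through a `toString` override, because the Item class must not be modified.

```dart
// Hypothetical golden-master helper (not from the repository).
class Item {
  String name;
  int sellIn;
  int quality;
  Item(this.name, this.sellIn, this.quality);
}

/// Renders the whole inventory once per day for [days] simulated days.
/// Formats items externally, so the Item class itself is left untouched.
String goldenMasterSnapshot(
  List<Item> items,
  void Function() updateQuality, {
  int days = 30,
}) {
  final buffer = StringBuffer();
  for (var day = 0; day <= days; day++) {
    buffer.writeln('-------- day $day --------');
    for (final item in items) {
      buffer.writeln('${item.name}, ${item.sellIn}, ${item.quality}');
    }
    if (day < days) updateQuality(); // advance to the next day
  }
  return buffer.toString();
}
```

In a test, the baseline could be produced once with `goldenMasterSnapshot(items, gildedRose.updateQuality)` against the legacy `GildedRose` and stored. Each later run would then rebuild the snapshot from fresh items and check it with `expect(actual, equals(baseline))`.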
Phase 1: Create Legacy Code
- Intentionally nested 7-8 levels deep
- Mixed concerns (all item types in one method)
- Magic numbers scattered throughout
- Result: Represents realistic legacy code
Phase 2: Characterization Tests (GREEN)
- 14 comprehensive tests written before any refactoring
- Golden Master baseline captured
- Commit: "GREEN: Add characterization tests"
- Result: Safety net established ✅
Phase 3: Refactor in Small Steps (3 REFACTOR commits)
Step 1 - Extract Methods:
// Before: 60 lines, items[i] everywhere, deeply nested
for (var i = 0; i < items.length; i++) {
if (items[i].name != 'Aged Brie' && ...) {
if (items[i].quality > 0) {
if (items[i].name != 'Sulfuras...') {
// ... 5 more levels ...
// After: Clean delegation, readable
for (final item in items) {
  if (item.name == 'Sulfuras, Hand of Ragnaros') {
    _updateSulfuras(item);
  } else if (item.name == 'Aged Brie') {
    _updateAgedBrie(item);
  // ... Backstage passes branch delegates to _updateBackstagePasses(item) ...
  } else {
    _updateNormalItem(item);
  }
}
Commit: "REFACTOR: Extract item type methods from nested conditionals"
Step 2 - Introduce Strategy Pattern:
abstract class ItemUpdater {
void update(Item item);
}
class NormalItemUpdater implements ItemUpdater {
@override
void update(Item item) {
_degradeQuality(item, 1);
item.sellIn -= 1;
if (item.sellIn < 0) {
_degradeQuality(item, 1);
}
}
}
// ... 4 concrete strategies
ItemUpdater _selectUpdater(Item item) { ... }
Commit: "REFACTOR: Introduce Strategy pattern for item types"
Step 3 - Extract Domain Helpers:
const int _minQuality = 0;
const int _maxQuality = 50;
void _degradeQuality(Item item, int amount) {
item.quality = (item.quality - amount).clamp(_minQuality, _maxQuality);
}
void _improveQuality(Item item, int amount) {
item.quality = (item.quality + amount).clamp(_minQuality, _maxQuality);
}
Commit: "REFACTOR: Extract helper methods and domain constants"
All 14 tests stayed GREEN throughout! 🟢
Phase 4: Add Conjured Items (RED-GREEN-REFACTOR)
RED: Write failing tests
test('degrade in quality twice as fast as normal items', () {
final items = [Item('Conjured Mana Cake', 10, 20)];
GildedRose(items).updateQuality();
expect(items[0].quality, equals(18)); // -2 instead of -1
});
Result: 2 tests FAILING ❌ (Expected: 18, Actual: 19)
Commit: "RED: Add failing tests for Conjured items"
GREEN: Implement minimal solution
class ConjuredItemUpdater implements ItemUpdater {
@override
void update(Item item) {
_degradeQuality(item, 2); // 2x normal rate
item.sellIn -= 1;
if (item.sellIn < 0) {
_degradeQuality(item, 2); // 4/day after expiration (twice a normal item's 2/day)
}
}
}
ItemUpdater _selectUpdater(Item item) {
  // ... existing checks for the other item types return early here ...
  if (item.name.startsWith('Conjured')) {
    return ConjuredItemUpdater();
  }
  return NormalItemUpdater();
}
Result: All 17 tests PASSING ✅
Commit: "GREEN: Implement Conjured items degrading twice as fast"
REFACTOR: Already clean! No duplication, clear names, reusing helpers.
Decision: Skip refactor commit—code is already excellent.
| Metric | Before Refactoring | After Refactoring |
|---|---|---|
| Cyclomatic Complexity | ~25 (very high) | ~3 per class (low) |
| Lines per Method | 60+ | 5-10 |
| Max Nesting | 7-8 levels | 2 levels |
| Time to Add Feature | Hours (risky) | Minutes (safe) |
| Test Coverage | 0% (100% once characterization tests were added) | 100% (maintained) |
Characterization Tests Are Your Lifeline: Without tests, refactoring is guesswork. With tests, it's engineering. Every change validated in milliseconds.
Small Steps = Low Risk: Three refactoring commits, each preserving behavior. No "big bang" rewrite—steady, safe progress.
Strategy Pattern = Future-Proofing: Adding Conjured items demonstrated the payoff:
- Before refactoring: Would require diving into nested conditionals, risking bugs
- After refactoring: One new class, one condition, done in 5 minutes
Golden Master Testing: The 30-day simulation test caught edge cases that individual unit tests missed. It serves as a comprehensive regression detector.
Open-Closed Principle Validated: New item types extend the system without modifying existing updater classes. The design is "open for extension, closed for modification."
Refactoring ≠ Rewriting: We never changed what the code does, only how it's structured. Tests prove behavioral equivalence at every step.
| Aspect | Roman Numerals | Bowling Game | Gilded Rose | String Calculator | Mars Rover |
|---|---|---|---|---|---|
| Complexity | Beginner | Intermediate | Advanced | Beginner | Intermediate |
| Approach | Greenfield TDD | Greenfield TDD | Legacy refactoring | Bug hunting | Greenfield TDD |
| State | Stateless | Stateful | Stateful | Stateless | Stateful |
| Algorithm | Table-driven | Frame iteration | Strategy pattern | String parsing | Command pattern |
| Key Challenge | Pattern recognition | State & bonuses | Refactoring safely | Finding bugs | Navigation & wrapping |
| Design Pattern | Value Object | Implicit strategy | Explicit strategy | Filters & pipes | Command + Value Objects |
| Lines of Code | ~45 production | ~30 production | ~120 production | ~25 production | ~95 production |
| Test Count | ~15 tests | ~10 tests | ~17 tests | 6 tests | 23 tests |
| Aha! Moment | Table = data | Simple → complex | Refactor = safe | Tests find bugs | Enums carry behavior |
Roman Numerals:
- Converting domain rules into data structures
- Value Objects for enforcing constraints
- When to stop coding (algorithm emerges naturally)
Bowling Game:
- State management without over-engineering
- Look-ahead logic in sequential data
- How correct abstractions scale beyond test cases
Gilded Rose:
- Safely refactoring legacy code with characterization tests
- Strategy pattern for eliminating conditional complexity
- Open-Closed Principle for extensibility
- Working effectively with code you didn't write
String Calculator:
- Using tests to expose bugs in existing code
- Bug hunting with RED-GREEN discipline
- Incremental debugging with clear commits
- Regression prevention through test accumulation
Mars Rover:
- Command pattern for behavior delegation
- Value Objects for domain modeling (Position, Plateau, Direction)
- Enums as behavior carriers, not just constants
- Coordinate systems and wrapping logic
- Progressive refactoring with confidence
- Roman Numerals first: Learn TDD fundamentals without state complexity
- Bowling Game second: Apply TDD to stateful problems
- Mars Rover third: Master Command pattern and value objects
- Gilded Rose fourth: Refactor legacy code with tests as safety net
- String Calculator fifth: Practice bug hunting and fixing with TDD
- Next kata: Choose based on what you want to practice:
  - Prime Factors: Mathematical decomposition, algorithmic thinking
  - Tennis Scoring: State machines, domain language
  - FizzBuzz: Classic conditional logic exercise
- Gilded Rose Kata
- Clean Code by Robert C. Martin
- Domain-Driven Design by Eric Evans
- Test-Driven Development by Kent Beck
This is a learning exercise. Use freely for educational purposes.
Following The Craftsman's Way: Quality is not negotiable.