Fuzzy Text Search in Java 21 — A TDD Course

A hands-on implementation course that builds a fuzzy text search engine from scratch using Java 21, functional programming, and Test-Driven Development.

What you'll learn

Section	Concept	What you build
0	`Result<E, A>`	Sealed Success/Failure type with `map`, `flatMap`, `fold`
1	Exact matching	`Matcher.exact()` — find a string, return `Result<MatchError, Match>`
2	Edit distance	`Levenshtein.distance()` — recursive, then memoized
3	Bounded distance	`BoundedEditDistance.compute()` — short-circuit early
4	Similarity scoring	Normalized 0–1 score, all errors via `Result`
5	N-gram tokenization	`NgramTokenizer.tokenize()` → `Stream<Ngram>`
6	N-gram index	`IndexBuilder.build()` → immutable inverted index
7	Candidate retrieval	`CandidateFinder.find()` → candidates ranked by overlap
8	BK-tree	Functional construction and pruned distance search
9	End-to-end search	`FuzzySearch.search()` — the full pipeline

By the end you will have written a production-style fuzzy search library that can search thousands of terms by edit distance, efficiently, with zero nulls and zero exceptions for control flow.

Prerequisites

Familiarity with Java syntax (lambdas, generics, Stream API)
Comfortable with JUnit 5
A JDK 21 installation

How it works

Every section follows the same pattern:

Comment block — explains the concept and the API contract
Failing tests — JUnit 5 + AssertJ tests that exercise the concept, including edge cases
You implement — make the tests pass with no external libraries, no mutation, no null

Tests are ordered so that each section only depends on the sections before it. You can stop after any section and have a working (if limited) library.

Setup

./gradlew test   # run all tests — they all fail (UnsupportedOperationException)

Start with Section00_ResultTest.java and work forward. Your goal is to turn 61 red tests green, one section at a time.

Constraints

Rule	Reason
No `null` anywhere	Use `Optional` or `Result` instead
No exceptions for control flow	Propagate errors via `Result`
No mutation (`var`, setters, mutable collections)	Immutable data throughout
No Spring, Vavr, or NLP libraries	Standard library only
Everything typed via `Result<E, A>`	Explicit, composable error handling

Project structure

app/src/main/java/fuzzy/   ← type definitions and method stubs
app/src/test/java/fuzzy/   ← the tests you make pass

Each source file in main contains the API contract (records, sealed interfaces, method signatures) with stub bodies that throw UnsupportedOperationException. Your job is to replace those stubs with implementations that satisfy the tests.

Estimated time

Section	Concept	Approx. time
0	`Result<E, A>`	30–60 min
1	Exact matching	20–30 min
2	Edit distance	60–90 min
3	Bounded distance	30–45 min
4	Similarity scoring	20–30 min
5	N-gram tokenization	20–30 min
6	N-gram index	30–45 min
7	Candidate retrieval	30–45 min
8	BK-tree	90–120 min
9	End-to-end search	30–45 min
Total		6–9 hours

These are rough estimates for a developer comfortable with Java and functional idioms but unfamiliar with the specific algorithms. The hardest sections are 2 (getting memoization right without mutable shared state), 8 (BK-tree search with the triangle inequality), and 9 (wiring the pipeline). Expect Section 8 alone to take 2+ hours if you're encountering BK-trees for the first time.

If you've never used sealed types or written monadic error handling before, budget an extra hour for Section 0 — it pays off in every section that follows.

Audience

Developers who want to understand:

Monadic error handling with a custom Result type
Levenshtein distance and why memoization matters
Inverted indexes and n-gram tokenization
BK-trees and metric-space pruning
How to compose small, pure functions into a complete search pipeline
TDD as a design discipline, not just a testing afterthought

Acknowledgments

The course format — one concept, one comment block, one failing test — is inspired by Galois Fields for Great Good by xorvoid, which demonstrates how a complex subject can be taught through progressive, self-contained TDD exercises.

This course (all sections, type stubs, and test suites) was generated with the help of an AI coding assistant — designed as a challenge where the human writes the implementations and the machine provides the constraints.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
app		app
gradle		gradle
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fuzzy Text Search in Java 21 — A TDD Course

What you'll learn

Prerequisites

How it works

Setup

Constraints

Project structure

Estimated time

Audience

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Fuzzy Text Search in Java 21 — A TDD Course

What you'll learn

Prerequisites

How it works

Setup

Constraints

Project structure

Estimated time

Audience

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages