A hands-on implementation course that builds a fuzzy text search engine from scratch using Java 21, functional programming, and Test-Driven Development.
| Section | Concept | What you build |
|---|---|---|
| 0 | Result<E, A> |
Sealed Success/Failure type with map, flatMap, fold |
| 1 | Exact matching | Matcher.exact() — find a string, return Result<MatchError, Match> |
| 2 | Edit distance | Levenshtein.distance() — recursive, then memoized |
| 3 | Bounded distance | BoundedEditDistance.compute() — short-circuit early |
| 4 | Similarity scoring | Normalized 0–1 score, all errors via Result |
| 5 | N-gram tokenization | NgramTokenizer.tokenize() → Stream<Ngram> |
| 6 | N-gram index | IndexBuilder.build() → immutable inverted index |
| 7 | Candidate retrieval | CandidateFinder.find() → candidates ranked by overlap |
| 8 | BK-tree | Functional construction and pruned distance search |
| 9 | End-to-end search | FuzzySearch.search() — the full pipeline |
By the end you will have written a production-style fuzzy search library that can search thousands of terms by edit distance, efficiently, with zero nulls and zero exceptions for control flow.
- Familiarity with Java syntax (lambdas, generics, Stream API)
- Comfortable with JUnit 5
- A JDK 21 installation
Every section follows the same pattern:
- Comment block — explains the concept and the API contract
- Failing tests — JUnit 5 + AssertJ tests that exercise the concept, including edge cases
- You implement — make the tests pass with no external libraries, no mutation, no null
Tests are ordered so that each section only depends on the sections before it. You can stop after any section and have a working (if limited) library.
./gradlew test # run all tests — they all fail (UnsupportedOperationException)Start with Section00_ResultTest.java and work forward. Your goal is to turn 61 red tests green, one section at a time.
| Rule | Reason |
|---|---|
No null anywhere |
Use Optional or Result instead |
| No exceptions for control flow | Propagate errors via Result |
No mutation (var, setters, mutable collections) |
Immutable data throughout |
| No Spring, Vavr, or NLP libraries | Standard library only |
Everything typed via Result<E, A> |
Explicit, composable error handling |
app/src/main/java/fuzzy/ ← type definitions and method stubs
app/src/test/java/fuzzy/ ← the tests you make pass
Each source file in main contains the API contract (records, sealed interfaces, method signatures) with stub bodies that throw UnsupportedOperationException. Your job is to replace those stubs with implementations that satisfy the tests.
| Section | Concept | Approx. time |
|---|---|---|
| 0 | Result<E, A> |
30–60 min |
| 1 | Exact matching | 20–30 min |
| 2 | Edit distance | 60–90 min |
| 3 | Bounded distance | 30–45 min |
| 4 | Similarity scoring | 20–30 min |
| 5 | N-gram tokenization | 20–30 min |
| 6 | N-gram index | 30–45 min |
| 7 | Candidate retrieval | 30–45 min |
| 8 | BK-tree | 90–120 min |
| 9 | End-to-end search | 30–45 min |
| Total | 6–9 hours |
These are rough estimates for a developer comfortable with Java and functional idioms but unfamiliar with the specific algorithms. The hardest sections are 2 (getting memoization right without mutable shared state), 8 (BK-tree search with the triangle inequality), and 9 (wiring the pipeline). Expect Section 8 alone to take 2+ hours if you're encountering BK-trees for the first time.
If you've never used sealed types or written monadic error handling before, budget an extra hour for Section 0 — it pays off in every section that follows.
Developers who want to understand:
- Monadic error handling with a custom
Resulttype - Levenshtein distance and why memoization matters
- Inverted indexes and n-gram tokenization
- BK-trees and metric-space pruning
- How to compose small, pure functions into a complete search pipeline
- TDD as a design discipline, not just a testing afterthought
The course format — one concept, one comment block, one failing test — is inspired by Galois Fields for Great Good by xorvoid, which demonstrates how a complex subject can be taught through progressive, self-contained TDD exercises.
This course (all sections, type stubs, and test suites) was generated with the help of an AI coding assistant — designed as a challenge where the human writes the implementations and the machine provides the constraints.