Skip to content

patsh90/fuzzy-java

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Fuzzy Text Search in Java 21 — A TDD Course

A hands-on implementation course that builds a fuzzy text search engine from scratch using Java 21, functional programming, and Test-Driven Development.

What you'll learn

Section Concept What you build
0 Result<E, A> Sealed Success/Failure type with map, flatMap, fold
1 Exact matching Matcher.exact() — find a string, return Result<MatchError, Match>
2 Edit distance Levenshtein.distance() — recursive, then memoized
3 Bounded distance BoundedEditDistance.compute() — short-circuit early
4 Similarity scoring Normalized 0–1 score, all errors via Result
5 N-gram tokenization NgramTokenizer.tokenize()Stream<Ngram>
6 N-gram index IndexBuilder.build() → immutable inverted index
7 Candidate retrieval CandidateFinder.find() → candidates ranked by overlap
8 BK-tree Functional construction and pruned distance search
9 End-to-end search FuzzySearch.search() — the full pipeline

By the end you will have written a production-style fuzzy search library that can search thousands of terms by edit distance, efficiently, with zero nulls and zero exceptions for control flow.

Prerequisites

  • Familiarity with Java syntax (lambdas, generics, Stream API)
  • Comfortable with JUnit 5
  • A JDK 21 installation

How it works

Every section follows the same pattern:

  1. Comment block — explains the concept and the API contract
  2. Failing tests — JUnit 5 + AssertJ tests that exercise the concept, including edge cases
  3. You implement — make the tests pass with no external libraries, no mutation, no null

Tests are ordered so that each section only depends on the sections before it. You can stop after any section and have a working (if limited) library.

Setup

./gradlew test   # run all tests — they all fail (UnsupportedOperationException)

Start with Section00_ResultTest.java and work forward. Your goal is to turn 61 red tests green, one section at a time.

Constraints

Rule Reason
No null anywhere Use Optional or Result instead
No exceptions for control flow Propagate errors via Result
No mutation (var, setters, mutable collections) Immutable data throughout
No Spring, Vavr, or NLP libraries Standard library only
Everything typed via Result<E, A> Explicit, composable error handling

Project structure

app/src/main/java/fuzzy/   ← type definitions and method stubs
app/src/test/java/fuzzy/   ← the tests you make pass

Each source file in main contains the API contract (records, sealed interfaces, method signatures) with stub bodies that throw UnsupportedOperationException. Your job is to replace those stubs with implementations that satisfy the tests.

Estimated time

Section Concept Approx. time
0 Result<E, A> 30–60 min
1 Exact matching 20–30 min
2 Edit distance 60–90 min
3 Bounded distance 30–45 min
4 Similarity scoring 20–30 min
5 N-gram tokenization 20–30 min
6 N-gram index 30–45 min
7 Candidate retrieval 30–45 min
8 BK-tree 90–120 min
9 End-to-end search 30–45 min
Total 6–9 hours

These are rough estimates for a developer comfortable with Java and functional idioms but unfamiliar with the specific algorithms. The hardest sections are 2 (getting memoization right without mutable shared state), 8 (BK-tree search with the triangle inequality), and 9 (wiring the pipeline). Expect Section 8 alone to take 2+ hours if you're encountering BK-trees for the first time.

If you've never used sealed types or written monadic error handling before, budget an extra hour for Section 0 — it pays off in every section that follows.

Audience

Developers who want to understand:

  • Monadic error handling with a custom Result type
  • Levenshtein distance and why memoization matters
  • Inverted indexes and n-gram tokenization
  • BK-trees and metric-space pruning
  • How to compose small, pure functions into a complete search pipeline
  • TDD as a design discipline, not just a testing afterthought

Acknowledgments

The course format — one concept, one comment block, one failing test — is inspired by Galois Fields for Great Good by xorvoid, which demonstrates how a complex subject can be taught through progressive, self-contained TDD exercises.

This course (all sections, type stubs, and test suites) was generated with the help of an AI coding assistant — designed as a challenge where the human writes the implementations and the machine provides the constraints.

About

Fuzzy Text Search in Java 21 — A TDD Course

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors