feat: add fuzzy matching for TermQuery with Fuzziness enum #93
Add fuzziness parameter to TermQuery supporting Auto (0/1/2 edit distance based on term length) and Exact(n) modes. When fuzziness is set, matching uses Levenshtein edit distance instead of exact equality.

- Add Fuzziness enum (Auto, Exact(usize)) to cloudsearch-common
- Add fuzziness field to TermQuery with serde skip_serializing_if
- Implement fuzzy_term_match() and levenshtein_distance() in cloudsearch-index
- Update all TermQuery usages across the codebase to include fuzziness: None
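For readers unfamiliar with the approach, here is a minimal sketch of how the Auto thresholds and a Levenshtein check could fit together. It is not the code from this PR: the `max_edits` and `fuzzy_matches` helpers are hypothetical names, and measuring the query term's length in `char`s (rather than bytes) is an assumption.

```rust
/// Illustrative stand-in for the PR's `Fuzziness` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fuzziness {
    Auto,
    Exact(usize),
}

impl Fuzziness {
    /// Maximum edit distance allowed for `term` (hypothetical helper).
    /// Assumption: term length is counted in chars, not bytes.
    fn max_edits(self, term: &str) -> usize {
        match self {
            Fuzziness::Exact(n) => n,
            Fuzziness::Auto => match term.chars().count() {
                0..=2 => 0, // 1-2 char terms must match exactly
                3..=5 => 1,
                _ => 2,
            },
        }
    }
}

/// Classic two-row Levenshtein distance: insertions, deletions and
/// substitutions each cost 1. Comparison is case sensitive.
fn levenshtein_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // `prev` holds the distances for the previous row of the DP table.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1)      // deletion
                .min(curr[j] + 1);         // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical helper: does `indexed` match `query_term` within the
/// threshold derived from the query term?
fn fuzzy_matches(query_term: &str, indexed: &str, fuzziness: Fuzziness) -> bool {
    levenshtein_distance(query_term, indexed) <= fuzziness.max_edits(query_term)
}
```

Under these assumptions, `fuzzy_matches("admn", "admin", Fuzziness::Auto)` is true (a 4-character term allows one edit), while a 2-character term only matches exactly.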
- Add levenshtein_distance unit tests (empty, identical, one_edit, case_sensitive, complex)
- Add fuzzy_term_match unit tests (exact match, Auto mode, numeric values)
- Add three integration tests in coverage.rs covering edit distance matching, Auto threshold, and threshold rejection
- Fix score_query to properly reject fuzzy matches that return Some(false)
- Add Fuzziness serde rename_all = "lowercase" for JSON auto/exact
- Parse "fuzziness" key in parse_term_query (accepts "auto" or integer)
- Add ~ suffix query string syntax: field:value~auto, field:value~2
- Add query_has_fuzzy_term helper to detect fuzzy in nested Bool
- Validate search_after + fuzzy query combination in validate_search_request
- Add validate_search_request_rejects_fuzzy_with_search_after test
- Add doc comment on fuzzy_term_match return value semantics
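The nested-Bool detection and the search_after check can be pictured roughly as follows. This is a sketch under stated assumptions, not the PR's code: the `Query` and `SearchRequest` shapes, the `Bool` clause names, and the error message are all hypothetical; only the function names `query_has_fuzzy_term` and `validate_search_request` come from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fuzziness {
    Auto,
    Exact(usize),
}

#[derive(Debug)]
struct TermQuery {
    field: String,
    value: String,
    fuzziness: Option<Fuzziness>,
}

/// Hypothetical, simplified query tree; the real enum likely has more variants.
#[derive(Debug)]
enum Query {
    Term(TermQuery),
    Bool {
        must: Vec<Query>,
        should: Vec<Query>,
        must_not: Vec<Query>,
    },
    MatchAll,
}

/// Returns true if any term query in the tree has fuzziness set.
/// Assumption: any `Some(..)` counts as fuzzy, including `Exact(0)`.
fn query_has_fuzzy_term(query: &Query) -> bool {
    match query {
        Query::Term(t) => t.fuzziness.is_some(),
        Query::Bool { must, should, must_not } => must
            .iter()
            .chain(should)
            .chain(must_not)
            .any(|q| query_has_fuzzy_term(q)), // recurse into nested clauses
        Query::MatchAll => false,
    }
}

/// Hypothetical request shape with only the fields this check needs.
#[derive(Debug)]
struct SearchRequest {
    query: Query,
    search_after: Option<Vec<String>>,
}

/// Rejects the search_after + fuzzy combination; everything else passes.
fn validate_search_request(req: &SearchRequest) -> Result<(), String> {
    if req.search_after.is_some() && query_has_fuzzy_term(&req.query) {
        return Err("search_after cannot be combined with a fuzzy term query".to_string());
    }
    Ok(())
}
```

Walking the whole tree, rather than inspecting only the top-level query, is what lets the check catch a fuzzy term buried inside a `should` clause of a nested Bool.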
- Add comment explaining why unreachable!() is intentional in fuzzy_term_match (guarded by the is_none() check above; a panic is preferred to wrong answers)
- Add parse_term_query unit tests for fuzziness parsing (auto, AUTO, integer, zero, missing, wrong type, unknown string)
- Add note about ~ suffix vs wildcard detection order in query_string parser
Summary
- Add `Fuzziness` enum (`Auto` or `Exact(usize)`) to `cloudsearch-common`
- Add `fuzziness: Option<Fuzziness>` field to `TermQuery`
- Implement `fuzzy_term_match()` and `levenshtein_distance()` in `cloudsearch-index`
- When `fuzziness` is set, uses edit distance threshold instead of exact equality

Fuzziness behavior
- `Auto`: edit distance 0 for 1-2 char terms, 1 for 3-5 chars, 2 for 6+ chars
- `Exact(n)`: allow edit distance <= n

API Examples
JSON API:
```json
{"term": {"field": "name", "value": "admin", "fuzziness": "auto"}}
{"term": {"field": "name", "value": "admn", "fuzziness": 2}}
```

Query string: `field:value~auto`, `field:value~2`
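To illustrate the `~` suffix form, the following sketch parses a single `field:value~suffix` clause. The `parse_term_clause` helper is hypothetical and much simpler than a real query string parser. Two behaviors are assumptions rather than facts about this PR: `auto` is matched case-insensitively, and an unrecognized suffix falls back to a literal value.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fuzziness {
    Auto,
    Exact(usize),
}

/// Parses one `field:value[~fuzz]` clause into (field, value, fuzziness).
/// Hypothetical helper: it ignores quoting, wildcards and boolean operators.
fn parse_term_clause(clause: &str) -> Option<(String, String, Option<Fuzziness>)> {
    let (field, rest) = clause.split_once(':')?;
    if field.is_empty() || rest.is_empty() {
        return None;
    }
    // Only the last `~` is considered a potential fuzziness marker.
    if let Some((value, suffix)) = rest.rsplit_once('~') {
        let fuzziness = if suffix.eq_ignore_ascii_case("auto") {
            Some(Fuzziness::Auto)
        } else {
            suffix.parse::<usize>().ok().map(Fuzziness::Exact)
        };
        if let Some(f) = fuzziness {
            if !value.is_empty() {
                return Some((field.to_string(), value.to_string(), Some(f)));
            }
        }
    }
    // Assumption: with no recognized suffix, the whole remainder is the value.
    Some((field.to_string(), rest.to_string(), None))
}
```

With this sketch, `name:admin~auto` yields `Some(Fuzziness::Auto)`, `name:admn~2` yields `Some(Fuzziness::Exact(2))`, and `name:admin` carries no fuzziness.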
Test plan
- `cargo test --workspace`: all 387 tests pass
- `cargo clippy --workspace --all-targets`: clean