skimmatch is an in-process fzf/skim-style fuzzy finder for Python,
implemented in Rust.
It is designed for ranked abbreviation matching over a fixed list of candidate
strings. You give it strings such as filenames, references, titles, symbols, or
command labels; users type short abbreviation-style queries; skimmatch
returns the best candidates, scores, and optional highlight positions.
from skimmatch import Matcher
candidates = [
    "Follmer and Schied, Stochastic Finance, 2011",
    "Mildenhall and Major, Pricing Insurance Risk",
    "Wang distortion risk measures",
    "Archive reference catalogue",
]
matcher = Matcher(candidates)
for result in matcher.search("wang distortion", limit=3):
    print(result)

Example result:
{
    "index": 2,
    "score": 260,
    "text": "Wang distortion risk measures",
    "matches": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
}

Scores are backend-specific, and higher is better. Treat the numeric value as ranking information, not as a metric that is stable across versions.
skimmatch solves the same broad problem as interactive fuzzy finders such as
fzf and skim: finding good abbreviation matches quickly.
For example, a query like:
fs sf 2011
can match:
Follmer and Schied, Stochastic Finance, 2011
because the query characters and tokens appear in useful positions and in the right order.
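The "right order" requirement can be illustrated with a minimal pure-Python sketch. It only checks whether a match exists (case-insensitively, token by token); it does not score anything and is not how skimmatch or its Rust backends are implemented. The helper names `is_subsequence` and `matches_all_tokens` are hypothetical.

```python
def is_subsequence(token: str, text: str) -> bool:
    """Return True if the characters of token occur in text in order."""
    remaining = iter(text.lower())
    # `ch in remaining` consumes the iterator, so later characters
    # can only be found after earlier ones.
    return all(ch in remaining for ch in token.lower())


def matches_all_tokens(query: str, text: str) -> bool:
    """Every whitespace-separated token must match as an in-order subsequence."""
    return all(is_subsequence(token, text) for token in query.split())


candidate = "Follmer and Schied, Stochastic Finance, 2011"
print(matches_all_tokens("fs sf 2011", candidate))  # True
print(matches_all_tokens("fs sf 2012", candidate))  # False: no second "2"
```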
This is different from edit-distance fuzzy matching. Libraries such as
RapidFuzz, Levenshtein, or token-ratio matchers are excellent for typo
correction, deduplication, OCR cleanup, and record linkage. skimmatch is aimed
at fast candidate selection, interactive search, and highlightable abbreviation
matching.
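To make the contrast concrete, the sketch below compares a textbook Levenshtein distance with a plain in-order check. It is illustrative only: the functions `levenshtein` and `in_order` are written here for the example and are not part of skimmatch, RapidFuzz, or any other library mentioned above.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def in_order(query: str, text: str) -> bool:
    """True if every query character appears in text, in order."""
    remaining = iter(text)
    return all(ch in remaining for ch in query)


title = "stochastic finance"
for query in ("sf", "stochastic finanse"):
    print(repr(query), levenshtein(query, title), in_order(query, title))
```

The abbreviation `sf` is far from the title in edit distance but is a valid in-order match, while the misspelling is one edit away but is not an in-order match. That is why the two families of tools suit different jobs.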
- In-process Python extension: no external fzf executable required.
- Rust matching backends using SkimMatcherV2, nucleo-matcher, and frizbee.
- Preloaded candidate lists for fast repeated queries.
- Single-token and multi-token search modes.
- Optional highlight indices for UI rendering.
- Legacy tuple-returning APIs for compatibility with the earlier rustfuzz shape.
- Structured Matcher.search(...) API for new code.
- Backend argument already present, so future backends can be added without changing the public matcher classes.
When published on PyPI:
pip install skimmatch

From a local checkout:
uv pip install -e .

or build with maturin:
uv run maturin develop

The current package metadata targets Python 3.13 or newer.
Use Matcher for new code.
from skimmatch import Matcher
candidates = [
    "Buhlmann, Mathematical Methods in Risk Theory",
    "Cramer, Collective Risk Theory",
    "Mildenhall and Major, Pricing Insurance Risk",
    "Kaas, Goovaerts, Dhaene, and Denuit, Modern Actuarial Risk Theory",
]
matcher = Matcher(candidates)
results = matcher.search("risk theory", limit=5)
for result in results:
    print(result["index"], result["score"], result["text"])

By default, search:
- splits the query on whitespace;
- requires every query token to match;
- returns up to 20 results;
- includes candidate text;
- includes highlight positions.
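The defaults listed above can be modelled in a few lines of pure Python. This is a toy model of the documented behaviour only: `toy_search` and `find_positions` are hypothetical names, the matching is a naive greedy scan, and the score is an arbitrary stand-in rather than anything a real backend computes.

```python
from typing import Optional


def find_positions(token: str, text: str) -> Optional[list[int]]:
    """Greedy leftmost in-order match: matched indices, or None if no match."""
    positions: list[int] = []
    start = 0
    for ch in token.lower():
        # Compare character by character so indices refer to the original text.
        found = next((i for i in range(start, len(text)) if text[i].lower() == ch), -1)
        if found < 0:
            return None
        positions.append(found)
        start = found + 1
    return positions


def toy_search(candidates, query, limit=20, highlights=True,
               include_text=True, multi=True):
    """Toy model of the documented defaults; the score is NOT a backend score."""
    tokens = query.split() if multi else [query]
    results = []
    for index, text in enumerate(candidates):
        per_token = [find_positions(token, text) for token in tokens]
        if any(found is None for found in per_token):
            continue  # every token is required
        positions = sorted({p for found in per_token for p in found})
        span = positions[-1] - positions[0] + 1 if positions else 0
        result = {"index": index, "score": 10 * len(positions) - span}
        if include_text:
            result["text"] = text
        if highlights:
            result["matches"] = positions
        results.append(result)
    # Best score first; ties fall back to the original candidate index.
    results.sort(key=lambda r: (-r["score"], r["index"]))
    return results[:limit]


candidates = [
    "Buhlmann, Mathematical Methods in Risk Theory",
    "Cramer, Collective Risk Theory",
    "Mildenhall and Major, Pricing Insurance Risk",
    "Kaas, Goovaerts, Dhaene, and Denuit, Modern Actuarial Risk Theory",
]
for result in toy_search(candidates, "risk theory"):
    print(result["index"], result["text"])
```

The third candidate is dropped because it contains no "t", so the token "theory" cannot match it. The order of the remaining results depends on the toy score and says nothing about how a real backend would rank them.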
matcher = Matcher(candidates, backend="nucleo", threads=None) # or "skim" or "frizbee"
results = matcher.search(
    query,
    limit=20,
    highlights=True,
    include_text=True,
    multi=True,
)

Each result is a dictionary containing:
{
    "index": 0,         # original candidate index
    "score": 123,       # backend score, higher is better
    "text": "...",      # included when include_text=True
    "matches": [0, 3],  # included when highlights=True
}

query
The search string. In multi-token mode, whitespace-separated tokens are matched independently and every token must match the candidate.
limit
The maximum number of results to return. limit=0 returns an empty list.
highlights
When true, results include matches, a sorted and deduplicated list of matched
positions. Turn this off when you only need ranking; score-only matching does
less work.
include_text
When true, each result includes the original candidate string. Turn this off if you already have the candidate list and want smaller result objects.
multi
When true, the query is split on whitespace and all tokens are required. When false, the whole query is sent to the matcher as one pattern.
threads
Constructor option. For backend="nucleo" and backend="frizbee",
threads=None uses all available cores. Pass threads=1 for single-threaded
matching, or any positive integer to cap the number of worker threads. The
skim backend currently ignores this option.
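As an illustration of how the matches list described under highlights might drive UI rendering, here is a small sketch that wraps runs of matched positions in marker strings. It assumes the positions are character indices into the candidate text; `render_highlights` and its marker parameters are hypothetical and not part of skimmatch.

```python
def render_highlights(text: str, matches: list[int],
                      open_mark: str = "[", close_mark: str = "]") -> str:
    """Wrap each run of consecutive matched positions in marker strings."""
    matched = set(matches)
    out = []
    inside = False
    for i, ch in enumerate(text):
        if i in matched and not inside:
            out.append(open_mark)   # a highlighted run starts here
            inside = True
        elif i not in matched and inside:
            out.append(close_mark)  # the run ended on the previous character
            inside = False
        out.append(ch)
    if inside:
        out.append(close_mark)      # close a run that reaches the end of the text
    return "".join(out)


print(render_highlights("Wang distortion risk measures", [0, 1, 2, 3, 5, 6, 7]))
# [Wang] [dis]tortion risk measures
```

In a terminal UI the markers could be ANSI escape sequences, and in a web UI they could be opening and closing tags, provided the text itself is escaped first.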
The package also exports compatibility classes with tuple return shapes:
from skimmatch import FuzzyMatcher, FuzzyMatcherMulti, FuzzyMatcherMultiHi

FuzzyMatcher treats the whole query as one pattern.
matcher = FuzzyMatcher(candidates, threads=None)
indices, scores = matcher.query("sf", top_k=10)

FuzzyMatcherMulti splits the query on whitespace. Every token must match.
matcher = FuzzyMatcherMulti(candidates)
indices, scores = matcher.query("pricing insurance", top_k=10)

FuzzyMatcherMultiHi is like FuzzyMatcherMulti, but also returns highlight positions.
matcher = FuzzyMatcherMultiHi(candidates)
indices, scores, highlights = matcher.query("pricing insurance", top_k=10)

The available backends are:
backend="skim"
backend="nucleo"
backend="frizbee"

backend="skim" uses SkimMatcherV2 from the Rust fuzzy-matcher crate and
is kept for compatibility.
backend="nucleo" uses nucleo-matcher, the lower-level matcher from the
nucleo ecosystem. It is the default backend. It is a modern fzf-like backend
and may rank candidates differently from skim. Scores are backend-specific
and should not be compared between backends.
backend="frizbee" uses frizbee, a SIMD matcher with typo-resistant matching
support. skimmatch currently runs it with typo tolerance disabled for a closer
comparison with the other fzf-style backends. It matches against bytes, so
highlight lists are intentionally empty for this backend until Unicode offset
semantics are defined.
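The byte-versus-character issue is easy to see in plain Python. The sketch below is a general illustration of UTF-8 offsets, not a description of what frizbee or skimmatch do internally; `byte_to_char_offset` is a hypothetical helper.

```python
text = "F\u00f6llmer and Schied"  # "Föllmer and Schied", with a two-byte "ö"

char_index = text.index("m")                   # position in Unicode code points
byte_index = text.encode("utf-8").index(b"m")  # position in UTF-8 bytes
print(char_index, byte_index)                  # 4 5


def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a code point index.

    The offset must fall on a character boundary; otherwise decoding
    the prefix raises UnicodeDecodeError.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


print(byte_to_char_offset(text, byte_index))   # 4
```

For pure ASCII candidates the two offsets coincide, which is why the difference only shows up once non-ASCII text is involved.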
Scoring tends to reward:
- characters appearing in order;
- compact alignments;
- word-boundary matches;
- punctuation-separated and camel-case transitions;
- early matches;
- consecutive query-character matches;
- candidates that match every query token in multi-token mode.
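A toy scorer can show how heuristics like these interact. The weights below are invented for illustration, the alignment is a naive greedy leftmost scan, and none of it reflects the actual scoring in SkimMatcherV2, nucleo-matcher, or frizbee. `toy_score` is a hypothetical function.

```python
from typing import Optional


def toy_score(query: str, text: str) -> Optional[int]:
    """Score a greedy leftmost in-order match; None when there is no match."""
    score = 0
    start = 0
    previous = None
    for ch in query.lower():
        pos = next((i for i in range(start, len(text)) if text[i].lower() == ch), -1)
        if pos < 0:
            return None
        score += 10                          # base reward per matched character
        if pos == 0 or not text[pos - 1].isalnum():
            score += 8                       # start of text, word, or punctuation boundary
        elif text[pos].isupper() and text[pos - 1].islower():
            score += 6                       # camel-case transition
        if previous is not None and pos == previous + 1:
            score += 5                       # consecutive query characters
        score -= min(pos - start, 5)         # skipped characters: favours early, compact matches
        previous = pos
        start = pos + 1
    return score


for candidate in ("Stochastic Finance", "transfer", "FuzzyMatcher"):
    print(candidate, toy_score("sf", candidate))
```

With these made-up weights, "sf" scores higher against "Stochastic Finance" (two word-boundary hits) than against "transfer" (a consecutive but mid-word hit), and it does not match "FuzzyMatcher" at all.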
skimmatch returns candidates sorted by descending score. Ties are ordered by
the original candidate index for deterministic output.
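The ordering rule can be expressed as a single sort key. This sketch uses the standard-library `heapq.nsmallest` to keep only the top results; the function name `top_k` and the `(index, score)` pair layout are assumptions made for the example, not skimmatch internals.

```python
import heapq


def top_k(scored: list[tuple[int, int]], limit: int) -> list[tuple[int, int]]:
    """scored holds (index, score) pairs; return the best `limit` pairs."""
    # Negating the score makes the smallest key the best result,
    # and the original index breaks ties deterministically.
    return heapq.nsmallest(limit, scored, key=lambda pair: (-pair[1], pair[0]))


scored = [(0, 120), (1, 260), (2, 260), (3, 90)]
print(top_k(scored, 3))  # [(1, 260), (2, 260), (0, 120)]
```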
skimmatch is a good fit for:
- command palettes;
- file pickers;
- bibliography and reference search;
- symbol search;
- autocomplete over known labels;
- terminal or web UI candidate selection;
- fast repeated queries over a preloaded list.
It is probably not the right tool for:
- typo correction;
- deduplication;
- record linkage;
- token-sort similarity;
- OCR cleanup;
- semantic search;
- embedding-based retrieval.
Those are useful problems, but they are different from fzf/skim-style abbreviation matching.
Candidate strings are copied into Rust once when the matcher is constructed.
Repeated calls to query or search scan that Rust-owned list and return only
the final top results to Python.
For best performance:
- construct one matcher and reuse it across queries;
- set highlights=False when you only need indices and scores;
- set include_text=False when you already have the candidate strings;
- use limit to keep returned result objects small.
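The tips above might be combined as in the following sketch: build the matcher once, then ask only for ranking on each query. So that it runs without the compiled extension, it uses a hypothetical `StubMatcher` that mimics the documented `search` signature with a plain substring test. In real code you would pass a `Matcher(candidates)` instance instead. `SearchLike` and `ranked_indices` are also names invented for this example.

```python
from typing import Protocol


class SearchLike(Protocol):
    """Anything exposing the documented search(...) parameters."""

    def search(self, query: str, limit: int = 20, highlights: bool = True,
               include_text: bool = True, multi: bool = True) -> list[dict]: ...


def ranked_indices(matcher: SearchLike, query: str, limit: int = 10) -> list[int]:
    """Ask only for ranking: skip highlight positions and candidate text."""
    results = matcher.search(query, limit=limit, highlights=False, include_text=False)
    return [result["index"] for result in results]


class StubMatcher:
    """Hypothetical stand-in for Matcher; substring matching only."""

    def __init__(self, candidates):
        self.candidates = list(candidates)

    def search(self, query, limit=20, highlights=True, include_text=True, multi=True):
        hits = [i for i, text in enumerate(self.candidates)
                if query.lower() in text.lower()]
        return [{"index": i, "score": 0} for i in hits[:limit]]


# Construct once, reuse for every query.
matcher = StubMatcher(["Pricing Insurance Risk", "Collective Risk Theory"])
for query in ("risk", "theory"):
    print(query, ranked_indices(matcher, query))
```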
This project is a Python package with a Rust extension built by maturin.
Run the tests:
uv run pytest tests/test_skimmatch.py -q

Check Rust formatting:
cargo fmt --check

Important files:
- src/lib.rs: Rust/PyO3 extension implementation.
- python/skimmatch/__init__.py: Python re-exports.
- tests/test_skimmatch.py: API and behavior tests.
- pyproject.toml: Python packaging and maturin configuration.
- Cargo.toml: Rust crate configuration.
The public API accepts a backend argument. Today "skim", "nucleo", and
"frizbee" are implemented. frizbee is experimental and currently exposes
score/ranking behavior without highlight positions.
Unknown backend names currently raise ValueError.
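Because unknown names raise ValueError, calling code can try a preferred backend and fall back to another. The sketch below assumes the error is raised when the matcher is constructed, which the text above does not state explicitly. `build_with_fallback` and `stub_factory` are hypothetical; in real use the factory would be `Matcher`.

```python
def build_with_fallback(factory, candidates, backends):
    """Try each backend name in order; return (matcher, backend_name)."""
    last_error = None
    for name in backends:
        try:
            return factory(candidates, backend=name), name
        except ValueError as error:
            last_error = error  # unknown backend: try the next name
    raise ValueError(f"no usable backend among {list(backends)!r}") from last_error


def stub_factory(candidates, backend="nucleo"):
    """Hypothetical stand-in for Matcher that knows the three documented names."""
    if backend not in {"skim", "nucleo", "frizbee"}:
        raise ValueError(f"unknown backend: {backend!r}")
    return {"candidates": list(candidates), "backend": backend}


matcher, chosen = build_with_fallback(stub_factory, ["a", "b"], ["turbo", "nucleo"])
print(chosen)  # nucleo
```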
MIT.