
ijt/trigram

trigram


This Rust crate contains functions for fuzzy string matching.

It exports two functions. The similarity function returns the similarity of two strings, and the find_words_iter function returns an iterator of matches for a smaller string (needle) in a larger string (haystack).

The similarity of strings is computed based on their trigrams, meaning their 3-character substrings: https://en.wikipedia.org/wiki/Trigram.
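To make the trigram idea concrete, here is a minimal sketch of one way such a score can be computed. It is not the crate's actual implementation: the names trigrams and trigram_similarity are hypothetical, and the padding rule (two spaces in front, one behind) and the lowercasing are assumptions borrowed from the documented behaviour of pg_trgm. It handles single words only.

```rust
use std::collections::HashSet;

/// Hypothetical helper: the set of 3-character substrings of one word.
/// Assumption: the word is lowercased and padded pg_trgm-style with two
/// spaces in front and one space behind.
fn trigrams(word: &str) -> HashSet<String> {
    let padded: Vec<char> = format!("  {} ", word.to_lowercase()).chars().collect();
    // Slide a 3-character window over the padded word.
    padded
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Hypothetical helper: shared trigrams divided by all distinct trigrams.
fn trigram_similarity(a: &str, b: &str) -> f32 {
    let ta = trigrams(a);
    let tb = trigrams(b);
    let shared = ta.intersection(&tb).count() as f32;
    // The padding guarantees at least one trigram per word, so this is never zero.
    let total = ta.union(&tb).count() as f32;
    shared / total
}
```

Under these assumptions, "color" has 6 trigrams, "colour" has 7, and they share 4, so trigram_similarity("color", "colour") is 4/9, or about 0.444.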

Trying it out

Here is how to run the examples:

$ cargo run --example similarity color colour
...
0.44444445

$ cargo run --example find_words_iter
bufalo
buffalow
Bungalo
biffalo
buffaloo
huffalo
snuffalo
fluffalo
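
The idea of scanning a haystack for words that resemble a needle can be sketched as follows. This is an illustration only, not how find_words_iter is implemented: fuzzy_words, similarity_sketch, the word-splitting rule, and the threshold parameter are all hypothetical, and the scoring reuses the assumed pg_trgm-style padding.

```rust
use std::collections::HashSet;

/// Hypothetical helper: trigram set of one lowercased word, padded with
/// two leading spaces and one trailing space (an assumption, after pg_trgm).
fn trigrams(word: &str) -> HashSet<String> {
    let padded: Vec<char> = format!("  {} ", word.to_lowercase()).chars().collect();
    padded
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Hypothetical helper: shared trigrams divided by all distinct trigrams.
fn similarity_sketch(a: &str, b: &str) -> f32 {
    let (ta, tb) = (trigrams(a), trigrams(b));
    ta.intersection(&tb).count() as f32 / ta.union(&tb).count() as f32
}

/// Hypothetical helper: lazily yield the words of `haystack` whose score
/// against `needle` is at least `threshold`.
fn fuzzy_words<'h>(
    needle: &'h str,
    haystack: &'h str,
    threshold: f32,
) -> impl Iterator<Item = &'h str> + 'h {
    haystack
        // Assumption: a word is a maximal run of alphanumeric characters.
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .filter(move |w| similarity_sketch(needle, w) >= threshold)
}
```

Because the result is an iterator, matches are produced on demand: collecting fuzzy_words("buffalo", "bufalo cat buffalow dog", 0.3) into a Vec keeps "bufalo" and "buffalow" and drops the other two words.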

Usage

Add this to your Cargo.toml:

[dependencies]
trigram = "0.2.2"

and call it like this:

use trigram::similarity;

fn main() {
    // String literals are already &str, so no extra borrow is needed.
    println!("{}", similarity("rustacean", "crustacean"));
}

Background

The similarity function in this crate is a reverse-engineered approximation of the similarity function in the PostgreSQL pg_trgm extension: https://www.postgresql.org/docs/9.1/pgtrgm.html. It gives exactly the same answers in many cases. It may disagree in others, although no such cases are currently known. If you find a case where the answers don't match, please file an issue about it!

A good introduction to the Postgres version of this is given on Stack Overflow: https://stackoverflow.com/a/43161051/484529.
