Ukkonen's Approximate String Matching algorithm
JavaScript
Switch branches/tags
Latest commit 1a46b9f Nov 18, 2017 @sunesimonsen sunesimonsen 1.4.0

Readme.md

Ukkonen - Approximate String Matching

npm version Build Status

This project implements the Approximate String Matching algorithm by Esko Ukkonen extended with ideas from An Extension of Ukkonen's Enhanced Dynamic Programming ASM Algorith by Hal Berghel and David Roach.

Ukkonen's algorithm is very competitive with the Levenshtein distance and for longer strings it is much more performant than Levenshtein distance.

In addition to being a competitive alternative to Levenshtein distance, Ukkonen's algorithm also allows you to provide a threshold for the distance which increases the performance even more for texts that are longer than the threshold.

HTML diffing using Levenshtein HTML diffing using Ukkonen's algorithm

Above you can see the different of using Levenshtein distance and Ukkonen's algorithm for matching sub-trees when diffing HTML.

Install

npm install --save ukkonen

Usage

You can find the distance between the strings Ukkonen and Levenshtein the following way:

var ukkonen = require('ukkonen')

assert.equal(ukkonen('Ukkonen', 'Levenshtein'), 8)

If you want to limit the distance by a given threshold:

var ukkonen = require('ukkonen')

assert.equal(ukkonen('Ukkonen', 'Levenshtein', 6), 6)
assert.equal(ukkonen('Ukkonen', 'Levenshtein', 10), 8)

Platform support

The library is ES5 and will work with any JavaScript bundler in the browser as well as Node versions with ES5 support.

Benchmark

I have benchmarked the library against the fastest Levenshtein distance implementation on NPM.

              Edit distance one word
 245,499 op/s » ukkonen
 502,333 op/s » leven

              Edit distance on sentence with small differences
 767,359 op/s » ukkonen
 139,628 op/s » leven

              Edit distance on paragraphs with small differences
 237,857 op/s » ukkonen
   2,670 op/s » leven

              Edit distance on longer texts with small differences
 112,547 op/s » ukkonen
     683 op/s » leven

              Edit distance on longer texts with many differences
     372 op/s » ukkonen
     416 op/s » leven

              Edit distance on longer texts with small differences and a threshold of 10
 127,725 op/s » ukkonen
     678 op/s » leven

              Edit distance on longer texts with many differences and a threshold of 40
  84,959 op/s » ukkonen
     425 op/s » leven

Acknowledgements

Obviously the authors of the papers describing the algorithm Esko Ukkonen, Hal Berghel and David Roach.

I stole a lot of ideas from Sindre Sorhus's leven library and I also used it to test my implementation against.

License

MIT © Sune Simonsen