N-grams approximate string matching implementation in pure Python


This is a pure Python library that lets you compare texts or strings using an n-gram model and cosine similarity. N-grams are tuples of n consecutive tokens from a text. For example, if we treat words as tokens, the first few trigrams (3-grams) of the license are:

  • 'this work ‘as-is’',
  • 'work ‘as-is’ we',
  • '‘as-is’ we provide',
  • 'we provide no',
  • 'provide no warranty',
  • ...
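
The trigram list above can be reproduced with a few lines of plain Python. This is only a minimal sketch of the idea, not this library's API: the function name `word_ngrams` and the whitespace tokenization are assumptions made for illustration.

```python
def word_ngrams(text, n=3):
    """Return the word n-grams of `text` as a list of tuples.

    Hypothetical helper for illustration; tokens are obtained by
    lowercasing and splitting on whitespace.
    """
    tokens = text.lower().split()
    # Slide a window of length n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "this work ‘as-is’ we provide no warranty"
for gram in word_ngrams(sample):
    print(" ".join(gram))
```

A text with fewer than n tokens yields an empty list, so callers may want to handle very short inputs separately.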

Depending on what you choose as the basic token (words or characters), you can use this library for approximate string matching (finding misspellings, etc.) or as a "good enough" method of checking whether two texts are similar (Lee).
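
To make the approximate-matching use case concrete, here is a rough sketch of cosine similarity over character trigram counts. It uses only the standard library and is not taken from this library's code: the names `char_ngrams` and `cosine_similarity`, and the choice of raw counts as vector weights, are assumptions.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return the character n-grams of `text` (hypothetical helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def cosine_similarity(a, b, n=3):
    """Cosine similarity of the n-gram count vectors of `a` and `b`.

    Returns 0.0 when either string is too short to produce any n-gram.
    """
    va, vb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    # Dot product over the n-grams the two strings share.
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# A misspelling scores much closer to the intended word than an unrelated one.
print(cosine_similarity("misspelling", "mispelling"))
print(cosine_similarity("misspelling", "warranty"))
```

Passing word tokens instead of characters gives the document-level comparison; the vector arithmetic stays the same.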
