This is a pure Python library that lets you compare texts or strings using an n-gram model and cosine similarity. N-grams are tuples of length n made up of consecutive tokens from a text. For example, if we treat words as tokens, the first few trigrams (3-grams) of the license are:

  • 'this work ‘as-is’',
  • 'work ‘as-is’ we',
  • '‘as-is’ we provide',
  • 'we provide no',
  • 'provide no warranty',
  • ...
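
A minimal sketch of word-level n-gram extraction that reproduces the trigrams listed above. It assumes naive whitespace tokenization of an already lowercased, punctuation-free fragment. The `ngrams` helper is hypothetical and is not necessarily this library's API.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all n-grams (tuples of n consecutive tokens) from a token sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the tokens; len(tokens) - n + 1 windows fit
    # (an empty list if there are fewer than n tokens).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: words are obtained by a plain whitespace split.
words = "this work ‘as-is’ we provide no warranty".split()
for gram in ngrams(words, 3):
    print(" ".join(gram))
```

Seven tokens yield 7 - 3 + 1 = 5 trigrams, from 'this work ‘as-is’' to 'provide no warranty'.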

Depending on what you choose as the basic token (words or characters), you can use this library for approximate string matching (finding misspellings, etc.) or as a "good enough" way of checking whether two texts are similar (Lee).
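
Here is a rough sketch of how n-gram counts and cosine similarity combine for both token choices. Each text becomes a sparse vector of n-gram counts, and the score is cos(a, b) = (a · b) / (|a| |b|), which lies in [0, 1] for non-negative counts. The helper names, the default n values, and the lowercasing, whitespace splitting and lack of padding are all illustrative assumptions, not the library's actual interface.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every n-gram (tuple of n consecutive tokens)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    # Missing keys in a Counter read as 0, so only shared n-grams add to the dot product.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def char_similarity(s1: str, s2: str, n: int = 3) -> float:
    # Hypothetical helper: characters as tokens (a str is already a sequence of characters).
    return cosine_similarity(ngram_counts(s1.lower(), n), ngram_counts(s2.lower(), n))


def word_similarity(t1: str, t2: str, n: int = 2) -> float:
    # Hypothetical helper: words as tokens, via a naive lowercase whitespace split.
    return cosine_similarity(
        ngram_counts(t1.lower().split(), n), ngram_counts(t2.lower().split(), n)
    )


print(round(char_similarity("similarity", "similarty"), 3))  # 0.668
print(char_similarity("similarity", "banana"))  # 0.0
```

The misspelling "similarty" shares 5 of its 7 character trigrams with "similarity", so it scores 5 / sqrt(8 × 7) ≈ 0.668. "banana" shares none and scores 0.0. Character n-grams suit typo-tolerant matching. Word n-grams are coarser and better suited to comparing longer texts.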

N-grams approximate string matching implementation in pure Python
