similarity of the texts (Jaccard Similarity, Minhash, LSH)

sebSR/text-processing

Text processing

General info

A set of classes for classifying and analyzing texts.

Description

The project covers the following areas of text processing:

  • preprocessing: removing punctuation and stemming words
  • calculating the Jaccard similarity of texts, both in the classical (exact) way and with the MinHash algorithm
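
The two steps above can be sketched roughly as follows, using only the Python standard library. This is an illustration, not the project's own code: the function names (`preprocess`, `jaccard`, `minhash_signature`, `minhash_estimate`), the optional `stem` parameter, and the seeded SHA-1 hash family are all assumptions made for this example. The exact similarity is |A ∩ B| / |A ∪ B| over the sets of words; the MinHash estimate is the fraction of signature positions in which the two texts agree.

```python
import hashlib
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, stem=None):
    """Lowercase, strip punctuation, split on whitespace, optionally stem.

    `stem` is a hypothetical hook: any callable mapping a word to its stem.
    """
    words = text.lower().translate(_PUNCT_TABLE).split()
    return [stem(w) for w in words] if stem else words


def jaccard(a, b):
    """Exact Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty sets match
    return len(a & b) / len(a | b)


def _hash(token, seed):
    """Seeded 64-bit hash; one seed stands in for one hash function."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_perm=128):
    """MinHash signature: the minimum hash value under each seeded hash."""
    unique = set(tokens)
    if not unique:
        raise ValueError("cannot build a signature for an empty token set")
    return [min(_hash(t, seed) for t in unique) for seed in range(num_perm)]


def minhash_estimate(sig_a, sig_b):
    """Estimate Jaccard similarity as the share of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


tokens_a = preprocess("The quick brown fox jumps over the lazy dog.")
tokens_b = preprocess("The quick brown fox leaps over a lazy dog!")

exact = jaccard(tokens_a, tokens_b)  # 7 shared words / 10 distinct words = 0.7
approx = minhash_estimate(
    minhash_signature(tokens_a, num_perm=256),
    minhash_signature(tokens_b, num_perm=256),
)
print(exact, approx)
```

To add stemming, a stemmer such as NLTK's `PorterStemmer().stem` could be passed as the `stem` argument. The estimate gets closer to the exact value as `num_perm` grows, at the cost of longer signatures; the datasketch library provides a `MinHash` class built on the same idea.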

Technologies

The project was created with Python 3.7.5. Main libraries:

  • Natural Language Toolkit (NLTK)
  • datasketch
  • unittest
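
As an illustration of how `unittest` might be used here, the sketch below checks an exact Jaccard similarity function against three cases that can be verified by hand. The `jaccard` helper and the `JaccardTest` class are hypothetical names written for this example; they are not taken from the project's test suite.

```python
import unittest


def jaccard(a, b):
    """Hypothetical helper: exact Jaccard similarity of two token lists."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch
    return len(a & b) / len(a | b)


class JaccardTest(unittest.TestCase):
    def test_identical_sets(self):
        # Order and duplicates do not matter for set similarity.
        self.assertEqual(jaccard(["a", "b", "a"], ["b", "a"]), 1.0)

    def test_disjoint_sets(self):
        self.assertEqual(jaccard(["a", "b"], ["c", "d"]), 0.0)

    def test_partial_overlap(self):
        # Intersection {b, c} has 2 items, union {a, b, c, d} has 4.
        self.assertAlmostEqual(jaccard(["a", "b", "c"], ["b", "c", "d"]), 0.5)
```

Saved in a file, tests like these can be run with `python -m unittest`.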
