foreverycast/similarity

similarity

Search for duplicates

INTRODUCTION

The Similarity script searches for duplicates in master data. It uses the Levenshtein distance to find similar names.
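
To illustrate the metric mentioned above, here is a minimal sketch of a Levenshtein distance function in Go. It is not taken from duplicatesearch.go; the names `levenshtein` and `min3` are hypothetical and chosen only for this example. The distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, so a small distance suggests two names may be duplicates.

```go
package main

import "fmt"

// min3 returns the smallest of three ints (hypothetical helper for this sketch).
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
// Illustrative only; the repository's own implementation may differ.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	// prev holds the previous row of the dynamic-programming table,
	// curr holds the row currently being filled.
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // turning an empty prefix into j runes takes j insertions
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // turning i runes into an empty prefix takes i deletions
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(
				prev[j]+1,      // deletion
				curr[j-1]+1,    // insertion
				prev[j-1]+cost, // substitution or match
			)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
	fmt.Println(levenshtein("Müller", "Muller"))  // 1
}
```

Working on runes rather than bytes keeps the distance meaningful for names with non-ASCII characters. How small a distance must be to count as a duplicate is a separate choice that this sketch does not make.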

INSTALLATION

  • Create the CSV files to compare, then replace the file names in the conf JSON: "duplicates.csv" is the file to load and "result.csv" is the result CSV
  • If necessary, an additional file can be added, which is important when integrating a data set: set isAddFileToCheck = True and replace "additional.csv" with the file name
  • Go is required
  • The package is still in development
  • To run the script, execute go run duplicatesearch.go
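
As a rough illustration of the configuration step above, the following Go sketch parses a JSON configuration holding the three file names and the isAddFileToCheck flag. The key names `fileToLoad`, `resultFile`, and `additionalFile`, the `Config` type, and the `parseConfig` function are all assumptions made for this example; the actual key names and layout of the repository's conf file may differ, so check it before editing.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors a hypothetical conf JSON layout.
// Only isAddFileToCheck and the three file names come from the README;
// every other name here is an assumption.
type Config struct {
	FileToLoad       string `json:"fileToLoad"`       // assumed key, e.g. "duplicates.csv"
	ResultFile       string `json:"resultFile"`       // assumed key, e.g. "result.csv"
	AdditionalFile   string `json:"additionalFile"`   // assumed key, e.g. "additional.csv"
	IsAddFileToCheck bool   `json:"isAddFileToCheck"` // whether the additional file is used
}

// parseConfig decodes the JSON bytes into a Config.
func parseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	// Sample configuration embedded as a string so the sketch runs without any files.
	sample := []byte(`{
		"fileToLoad": "duplicates.csv",
		"resultFile": "result.csv",
		"additionalFile": "additional.csv",
		"isAddFileToCheck": true
	}`)

	cfg, err := parseConfig(sample)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("load:", cfg.FileToLoad, "write:", cfg.ResultFile)
	if cfg.IsAddFileToCheck {
		fmt.Println("also check:", cfg.AdditionalFile)
	}
}
```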

ABOUT THIS RELEASE

  • Tests are not ready
  • stopwords.csv does not work yet
