PRINT LEV GROUPS

This script prints groups of similar fields from the given input files or from stdin. Similarity is measured by the Levenshtein edit distance.
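The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below computes it in pure Python with the standard two-row dynamic-programming table. The function name `levenshtein` is illustrative only. The script itself presumably relies on the python-Levenshtein and fuzzywuzzy packages listed under Requirements rather than on code like this.

```python
def levenshtein(a, b):
    """Return the Levenshtein edit distance between strings a and b."""
    # Keep the shorter string in b so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a
    # At the start of row i, previous[j] is the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```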

You can set the similarity ratio as a number between 0 and 100, where 0 matches almost everything and 100 matches only exact duplicates.
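The following sketch shows one way a 0 to 100 threshold can turn pairwise distances into groups. It rests on two assumptions. First, the score is normalized as 100 * (1 - distance / length of the longer string); fuzzywuzzy's `fuzz.ratio` computes its score differently, so the numbers will not match the script's exactly. Second, groups are formed transitively as connected components. The networkx requirement hints at a graph-based approach, but that is an inference, and this sketch uses a small union-find instead. The names `similarity` and `group_similar` are hypothetical.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (two-row version)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,            # insertion
                               previous[j] + 1,               # deletion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Assumed normalization: map edit distance onto 0..100 (100 = identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))


def group_similar(fields: List[str], ratio: int) -> List[List[str]]:
    """Group fields whose pairwise similarity is >= ratio, transitively.

    Each field is a node and an edge joins two fields scoring at least
    `ratio`. The groups are the connected components, found with union-find.
    """
    parent = {i: i for i in range(len(fields))}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(fields)):
        for j in range(i + 1, len(fields)):
            if similarity(fields[i], fields[j]) >= ratio:
                parent[find(i)] = find(j)

    # Collect fields by their component root, preserving input order.
    groups = {}  # type: Dict[int, List[str]]
    for i, field in enumerate(fields):
        groups.setdefault(find(i), []).append(field)
    return list(groups.values())


names = ["Jon", "John", "Joan", "Mary", "Marie", "Zed"]
print(group_similar(names, 60))  # [['Jon', 'John', 'Joan'], ['Mary', 'Marie'], ['Zed']]
```

With a ratio of 0, every pair qualifies and all fields collapse into one group. With 100, only identical strings are joined, which matches the two extremes described above.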

Requirements

This project uses Python 3.5. Python 2.7 may work, but this is not guaranteed.

The following Python packages are required:

fuzzywuzzy (0.15.0)
networkx (1.11)
python-Levenshtein (0.12.0)
matplotlib (2.0.2)

You can install them with pip:

pip3 install fuzzywuzzy networkx python-Levenshtein matplotlib

You also need python-tk, which cannot be installed with pip. On Ubuntu, you can install it with:

sudo apt install python3-tk
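Before running the script, you can confirm that every dependency above is importable with a small check. This is a hypothetical helper, not part of the repository. It maps each package to the module name it provides (python-Levenshtein provides `Levenshtein`, and python3-tk provides `tkinter`). It then uses `importlib.util.find_spec` to look each module up without importing it.

```python
import importlib.util

# Package name -> module name it installs; python3-tk provides tkinter.
REQUIRED_MODULES = {
    "fuzzywuzzy": "fuzzywuzzy",
    "networkx": "networkx",
    "python-Levenshtein": "Levenshtein",
    "matplotlib": "matplotlib",
    "python3-tk": "tkinter",
}


def missing_modules():
    """Return the package names whose module cannot be found."""
    return [package for package, module in REQUIRED_MODULES.items()
            if importlib.util.find_spec(module) is None]


missing = missing_modules()
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All dependencies found.")
```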

Usage

# Get the latest snapshot
git clone --depth=1 https://github.com/maabdelatif/print-lev-groups.git myproject

# Change directory
cd myproject

# Run the script against the sample file small-file.txt with 60 as the similarity percentage
python3 print_lev_groups.py --files small-file.txt --ratio 60
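The command above passes `--files` and `--ratio`. Below is a hypothetical sketch of how such a command line could be parsed and how fields could be read either from files or, when none are given, from stdin. It is not the repository's actual code. Three details are assumptions: `--files` accepts several paths, the default ratio is 60, and each non-empty line counts as one field.

```python
import argparse
import fileinput
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Print groups of similar fields (hypothetical sketch).")
    # Assumption: --files takes zero or more paths; none means read stdin.
    parser.add_argument("--files", nargs="*", default=[],
                        help="input files (defaults to stdin)")
    # Assumption: default of 60 mirrors the usage example above.
    parser.add_argument("--ratio", type=int, default=60,
                        help="similarity threshold from 0 to 100")
    args = parser.parse_args(argv)
    if not 0 <= args.ratio <= 100:
        parser.error("--ratio must be between 0 and 100")
    return args


def read_fields(paths: List[str]) -> List[str]:
    """Read one field per non-empty line from the files, or stdin if none."""
    # fileinput reads sys.stdin when it is given an empty list of files.
    with fileinput.input(files=paths) as lines:
        return [line.strip() for line in lines if line.strip()]
```

Calling `read_fields(parse_args().files)` without `--files` makes `fileinput` read standard input, which matches the files-or-stdin behaviour described at the top.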

Disclaimer

Please do not use this in production code.

Credits

StackOverflow
The first-names.txt example file is from https://github.com/dominictarr/random-name/blob/master/first-names.txt
