levenfind

A tool to find pairs of similar files, typically for checking that your students don't cheat. It works on text files, so it should be pretty much agnostic with respect to the language used in the files (be it natural language or code). For now, we use the Levenshtein distance (also known as edit distance) to compare contents.

It takes all the files in a directory (by default the current one) and shows all pairs of files whose similarity is above a given threshold (60% by default). The algorithm is quadratic, so don't be surprised if it takes some time, even on directories with a few files, especially if some of them are big.
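
As a rough illustration of the pairwise scan described above, here is a minimal Python sketch. It is not levenfind's actual code: the function names, the sample contents, and in particular the way the distance is turned into a similarity score (one minus the distance divided by the longer length) are assumptions made for the example.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two sequences (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical inputs, lower as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def similar_pairs(texts, threshold=0.6):
    """Yield (name1, name2, score) for every pair scoring above the threshold."""
    for (n1, t1), (n2, t2) in combinations(sorted(texts.items()), 2):
        score = similarity(t1, t2)
        if score > threshold:
            yield n1, n2, score


# Hypothetical file contents, keyed by file name.
texts = {
    "alice.txt": "let rec fact n = if n = 0 then 1 else n * fact (n - 1)",
    "bob.txt": "let rec fact k = if k = 0 then 1 else k * fact (k - 1)",
    "carol.txt": "print_endline \"hello\"",
}
for n1, n2, score in similar_pairs(texts):
    print(f"{n1} {n2} {score:.0%}")
```

Every pair is compared once, so n files need n(n-1)/2 comparisons, and each comparison in this sketch fills a table whose size is the product of the two lengths. That is where the running time goes on large inputs.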

Usage

levenfind directory

Useful options include

--extension in order to specify the extension of files to consider,
--lines in order to compare files line by line instead of character by character (this is much faster, but will consider slightly different lines as distinct),
--threshold in order to specify the threshold above which similar files should be reported.
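
To see what the --lines trade-off means in practice, the following Python sketch computes the distance between the same two texts, once with characters as symbols and once with whole lines as symbols. It is only an illustration, not levenfind's code: `edit_distance` and the sample contents are hypothetical, and the helper is a compact memoized version meant for small inputs.

```python
from functools import lru_cache


def edit_distance(a, b):
    """Levenshtein distance between two strings or tuples (small inputs only)."""
    @lru_cache(maxsize=None)
    def go(i, j):
        if i == 0 or j == 0:
            return i + j  # only insertions or deletions remain
        return min(
            go(i - 1, j) + 1,                           # delete a[i-1]
            go(i, j - 1) + 1,                           # insert b[j-1]
            go(i - 1, j - 1) + (a[i - 1] != b[j - 1]),  # substitute
        )
    return go(len(a), len(b))


# Two hypothetical submissions that differ by a single character.
original = "x = 1\ny = 2\nprint(x + y)\n"
copied = "x = 1\ny = 3\nprint(x + y)\n"

original_lines = tuple(original.splitlines())
copied_lines = tuple(copied.splitlines())

chars = edit_distance(original, copied)
lines = edit_distance(original_lines, copied_lines)

print(f"characters: distance {chars} over {len(original)} symbols")
print(f"lines: distance {lines} over {len(original_lines)} symbols")
```

Both distances are 1, but one edit among 25 characters is a small change, whereas one edit among 3 lines is a large one: a line that differs by a single character counts as entirely different. The line-based comparison also works on far fewer symbols, which is why it is faster.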
