FSE '21 paper implementation (preprint)
The DOI of the executable file for the submitted version of NIL is `10.5281/zenodo.4492665`.
NIL is a clone detector using N-gram, Inverted index, and LCS. NIL provides scalable large-variance clone detection.
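The three ingredients named above can be illustrated with a small, generic sketch. This is not NIL's implementation: the function names and the toy token lists are hypothetical, and the sketch only shows what an N-gram set, an inverted index over N-grams, and an LCS length are.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of contiguous n-token windows in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_inverted_index(fragments, n):
    """Map each n-gram to the ids of the fragments that contain it."""
    index = defaultdict(set)
    for frag_id, tokens in fragments.items():
        for gram in ngrams(tokens, n):
            index[gram].add(frag_id)
    return index


def candidates(index, fragments, query_id, n):
    """Ids of other fragments sharing at least one n-gram with the query."""
    found = set()
    for gram in ngrams(fragments[query_id], n):
        found |= index[gram]
    found.discard(query_id)
    return found


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Toy token sequences (hypothetical, for illustration only).
fragments = {
    "f1": "int a = 0 ; a ++ ; return a ;".split(),
    "f2": "int b = 0 ; b ++ ; return b ;".split(),
    "f3": "print ( x )".split(),
}
index = build_inverted_index(fragments, 2)
print(candidates(index, fragments, "f1", 2))
print(lcs_length(fragments["f1"], fragments["f2"]))
```

The inverted index narrows the search to fragments that share N-grams, so the more expensive LCS comparison only runs on a small candidate set.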
- JDK 21+
- Clone this repository (`git clone https://github.com/kusumotolab/NIL`)
- Move into NIL's directory (`cd NIL`) and build NIL (`./gradlew ShadowJar`)
- Run NIL (`java -jar ./build/libs/NIL-all.jar [options]`)
- Check the result file
  - If you didn't specify the `-bce` option, the format is `/path/to/file_A,start_line_A,end_line_A,/path/to/file_B,start_line_B,end_line_B`.
  - If you specified the `-bce` option, the format is `dir_A,file_A,start_line_A,end_line_A,dir_B,file_B,start_line_B,end_line_B`.
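The two result formats can be read with a few lines of Python. This is a minimal sketch, not part of NIL: `ClonePair`, `parse_result_row`, and `read_clone_pairs` are hypothetical names, and it assumes the result file has no header row and that paths contain no commas. Joining `dir` and `file` with `/` in the `-bce` case is also an assumption made for illustration.

```python
import csv
from typing import NamedTuple


class ClonePair(NamedTuple):
    file_a: str
    start_a: int
    end_a: int
    file_b: str
    start_b: int
    end_b: int


def parse_result_row(fields, bce=False):
    """Convert one comma-separated result row into a ClonePair."""
    if bce:
        # -bce format: dir_A,file_A,start,end,dir_B,file_B,start,end
        dir_a, name_a, start_a, end_a, dir_b, name_b, start_b, end_b = fields
        file_a = f"{dir_a}/{name_a}"  # assumption: join with "/"
        file_b = f"{dir_b}/{name_b}"
    else:
        # default format: file_A,start,end,file_B,start,end
        file_a, start_a, end_a, file_b, start_b, end_b = fields
    return ClonePair(file_a, int(start_a), int(end_a),
                     file_b, int(start_b), int(end_b))


def read_clone_pairs(path, bce=False):
    """Read every non-empty row of a result file."""
    with open(path, newline="") as handle:
        return [parse_result_row(row, bce) for row in csv.reader(handle) if row]
```

For example, `read_clone_pairs("result_5_10_70.csv")` would return a list of `ClonePair` tuples for a run with the default options.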
| Name | Description | Default |
|---|---|---|
| `-s`, `--src` | Input source directory. You must specify the target directory. | None |
| `-mil`, `--min-line` | Minimum number of lines a code fragment must have to be treated as a clone. | 6 |
| `-mit`, `--min-token` | Minimum number of tokens a code fragment must have to be treated as a clone. | 50 |
| `-n`, `--n-gram` | N for N-gram. | 5 |
| `-p`, `--partition-size` | The number of partitions. | 10 |
| `-f`, `--filtration-threshold` | Threshold used in the filtration phase (%). | 10 |
| `-v`, `--verification-threshold` | Threshold used in the verification phase (%). | 70 |
| `-o`, `--output` | Output file name. | `result_{n}_{f}_{v}.csv` |
| `-t`, `--threads` | The number of threads used for parallel execution (both the Preprocess and Clone detection phases). | all threads |
| `-l`, `--language` | Target language. | java |
| `-bce`, `--bigcloneeval` | If you specify the `-bce` option, NIL outputs a result file compatible with BigCloneEval. | not specified |
| `-mif`, `--mutationinjectionframework` | If you specify the `-mif` option, NIL prints nothing to standard output except the output file name. | not specified |
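As a rough sketch of how the options combine, the helper below assembles a command line from the long option names in the table. `build_command` and `default_output_name` are hypothetical helpers, not part of NIL. The sketch assumes that each option takes its value as a separate, space-separated argument and that the `{n}`, `{f}`, and `{v}` placeholders in the default output name stand for the `-n`, `-f`, and `-v` values.

```python
def default_output_name(n=5, f=10, v=70):
    """Default result file name, assuming {n}, {f}, {v} are the -n, -f, -v values."""
    return f"result_{n}_{f}_{v}.csv"


def build_command(src, jar="./build/libs/NIL-all.jar", **options):
    """Build an argument list such as ['java', '-jar', jar, '--src', src, ...].

    Keyword names map to long options: partition_size -> --partition-size.
    A value of True is treated as a flag that takes no argument.
    """
    command = ["java", "-jar", jar, "--src", src]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            command.append(flag)
        else:
            command += [flag, str(value)]
    return command


# /path/to/src is a placeholder, not a real directory.
print(" ".join(build_command("/path/to/src", min_line=6, n_gram=5, bigcloneeval=True)))
print(default_output_name())
```

The resulting list can be passed to `subprocess.run` once NIL has been built.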
| Name | Option | Extension |
|---|---|---|
| Java | `java` | `.java` |
| C | `c` | `.c`, `.h` |
| C++ | `cpp` | `.cpp`, `.hpp` |
| C# | `cs`, `csharp` | `.cs` |
| Python | `py`, `python` | `.py` |
| Kotlin | `kt`, `kotlin` | `.kt` |
If you run NIL on a 250-MLOC codebase, we recommend setting the `-p` option to 135.
- Please refer to EXPERIMENTS.md