Plagiarism Detector in C++

Uses naive methods to detect observable plagiarism in text files.

Tests implemented

1. Token frequency matching

Target file's word count is matched with each individual textfile's token counts in the database folder to find a high degree of similarity in the tokens and their frequency of use.

2. N-Gram matching

A direct match of consecutive tokens (or ngrams) was performed to detect similarity in patterns and neighbourhood of tokens. The value of N for the N-Gram generation was varied and a cumulative result was obtained by a weighted average over all of the results.

3. Cosine matching

Cosine of the angle between the vectors obtained from the target and the base text files is computed to estimate the simiilarity in the token vectors of both the files.

Getting started

1. Place all the reference text files in the database directory.

2. Place all the text files required to be checked in the target directory.

3. (Optional) Edit the stopwords.txt text file as per requirement, to add words which are to be ignored in the analysis.

4. Change the database and target_folder variables with the actual location of them.

5. Compile run.cpp in C++11 (or above), with the command g++ run.cpp -std=c++11.

6. Run the generated executable with the command ./a.out (in Linux environment).

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
database		database
target		target
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
run.cpp		run.cpp
stopwords.txt		stopwords.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

database

database

target

target

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

run.cpp

run.cpp

stopwords.txt

stopwords.txt

Repository files navigation

Plagiarism Detector in C++

Tests implemented

1. Token frequency matching

2. N-Gram matching

3. Cosine matching

Getting started

About

Releases 4

Packages

Languages

License

sayansil/Plagiarism-Detector

Folders and files

Latest commit

History

Repository files navigation

Plagiarism Detector in C++

Tests implemented

1. Token frequency matching

2. N-Gram matching

3. Cosine matching

Getting started

About

Topics

Resources

License

Stars

Watchers

Forks

Languages