The document similarity checker is a tool that determines how similar two text documents are. It compares the content of the documents and reports a similarity score, helping users identify duplicate or near-duplicate documents.
- Programming languages used: Python, PHP
- Required libraries and packages: math, re, sys, collections.Counter
- Algorithm used: Cosine Similarity
- Input format: text files
- Clone the repository or download the source code.
- Install the required libraries and packages specified in the "Technical Details" section.
- Copy all contents into WAMP's www folder or XAMPP's htdocs folder and create a virtual host in the corresponding web server package.
- Access the virtual host created in the previous step. The text documents can be uploaded from this webpage.
- Choose two text files and click on "Upload".
- The output is displayed in the same browser window as a cosine distance.
Cosine Similarity = 1 - Cosine Distance
Similarity Percentage = Cosine Similarity * 100
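
The two formulas above can be illustrated with a minimal sketch that uses only the modules listed earlier (math, re, collections.Counter). The function names `text_to_vector`, `cosine_similarity`, and `similarity_report`, as well as the word-based tokenization, are assumptions made for illustration and may differ from the project's actual code.

```python
import math
import re
from collections import Counter

# Assumption: documents are tokenized into lowercase word tokens.
WORD_RE = re.compile(r"\w+")


def text_to_vector(text):
    """Turn a text into a word-frequency vector."""
    return Counter(WORD_RE.findall(text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two word-frequency vectors."""
    shared_words = set(vec_a) & set(vec_b)
    dot_product = sum(vec_a[w] * vec_b[w] for w in shared_words)
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document shares nothing with any other document.
        return 0.0
    return dot_product / (norm_a * norm_b)


def similarity_report(text_a, text_b):
    """Return (cosine distance, cosine similarity, similarity percentage)."""
    similarity = cosine_similarity(text_to_vector(text_a), text_to_vector(text_b))
    distance = 1.0 - similarity
    return distance, similarity, similarity * 100
```

For example, `similarity_report("the cat sat", "the cat ran")` shares two of three words per document, so the cosine similarity is 2/3 and the cosine distance is 1/3.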
The uploaded files are sent to the Python script as arguments.
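
As a rough sketch of the Python side of this hand-off, the script could read the two file paths from its argument list. This assumes the PHP upload handler invokes the script with exactly two positional arguments (the paths of the uploaded files); the names `read_documents` and `main` are hypothetical and not taken from the repository.

```python
import sys
from pathlib import Path


def read_documents(argv):
    """Read the two documents whose paths are given as argv[1] and argv[2]."""
    if len(argv) != 3:
        # Assumption: exactly two file paths are passed to the script.
        raise SystemExit(f"usage: {argv[0]} <first_file> <second_file>")
    first_path, second_path = Path(argv[1]), Path(argv[2])
    # errors="replace" keeps the script from failing on unexpected bytes.
    first_text = first_path.read_text(encoding="utf-8", errors="replace")
    second_text = second_path.read_text(encoding="utf-8", errors="replace")
    return first_text, second_text


def main(argv):
    """Load both documents and print their word counts as a quick check."""
    first_text, second_text = read_documents(argv)
    # The similarity computation would consume the two texts at this point.
    print(len(first_text.split()), len(second_text.split()))
```

Calling `main(sys.argv)` at the bottom of the script would make it runnable from the command line or from a PHP process call.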
The elevator pitch can be found here: https://github.com/sud0x00/DS-Checker/blob/0860d0f29435b0b8ee95b0384e325c1ff8918e8a/src/Elevator%20Pitch.pdf (src/Elevator Pitch.pdf)