A project that checks the percentage of document similarity using the cosine similarity formula

sud0x00/DS-Checker

DS-Checker (DocSim)

Project Goal:

The goal of this project is to develop a tool that measures the similarity between two text documents. It compares the content of the documents and reports a similarity score, helping users identify duplicate or near-duplicate documents.

Technical Details:

  • Programming languages used: Python, PHP
  • Required libraries and packages: math, re, sys, collections.Counter
  • Algorithm used: Cosine Similarity
  • Input format: text files
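
The listed modules are enough to compute cosine similarity between two texts. The following is a minimal sketch, assuming each document is treated as a bag-of-words count vector; the names `text_to_vector` and `cosine_similarity` are hypothetical and are not taken from the repository's source.

```python
import math
import re
from collections import Counter

# Assumption: a "word" is any run of letters, digits, or underscores.
WORD = re.compile(r"\w+")


def text_to_vector(text):
    """Turn text into a bag-of-words vector: lowercase word -> count."""
    return Counter(WORD.findall(text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[w] * vec_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares no words with any other
    return dot / (norm_a * norm_b)
```

Because word counts are never negative, the result stays between 0 (no shared words) and 1 (identical word distributions).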

Instructions:

  1. Clone the repository or download the source code.
  2. Make sure Python is installed. The modules listed in the "Technical Details" section (math, re, sys, collections) are part of the Python standard library, so nothing extra needs to be installed.
  3. Copy all contents into WAMP's www folder or XAMPP's htdocs folder, then create a virtual host in the corresponding web server package.
  4. Open the virtual host created in the previous step. The index page (screenshot: index) is where the text documents can be uploaded.
  5. Choose two text files and click on "Upload".
  6. The output, a cosine distance, is displayed in the same browser window. To convert it to a similarity percentage:
    Cosine Similarity = 1 - Cosine Distance
    Similarity Percentage = Cosine Similarity * 100
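
The conversion in step 6 can be written out as a short helper. This is only an illustrative sketch; the function name `similarity_percentage` is hypothetical and does not come from the repository.

```python
def similarity_percentage(cosine_distance):
    """Convert a cosine distance into a similarity percentage."""
    cosine_similarity = 1.0 - cosine_distance
    return cosine_similarity * 100.0


# A reported cosine distance of 0.25 corresponds to 75% similarity.
print(similarity_percentage(0.25))  # 75.0
```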

[Screenshot: upres] The uploaded files are sent to the Python script as arguments.
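
On the Python side, receiving the files as arguments might look like the sketch below. It assumes the two file paths arrive as the first and second command-line arguments and that the files are UTF-8 text; the function name `load_documents` and the usage message are hypothetical, not the repository's actual code.

```python
import sys


def load_documents(argv):
    """Read the two text files named in argv[1] and argv[2]."""
    if len(argv) != 3:
        # Assumption: exactly two paths follow the script name.
        raise SystemExit("usage: python script.py <first.txt> <second.txt>")
    texts = []
    for path in argv[1:3]:
        # Undecodable bytes are replaced rather than aborting the comparison.
        with open(path, encoding="utf-8", errors="replace") as handle:
            texts.append(handle.read())
    return texts[0], texts[1]


# Typical call from the script's entry point:
#     first_text, second_text = load_documents(sys.argv)
```

The two returned strings can then be handed to whatever routine computes the cosine distance.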

The elevator pitch can be found here: https://github.com/sud0x00/DS-Checker/blob/0860d0f29435b0b8ee95b0384e325c1ff8918e8a/src/Elevator%20Pitch.pdf (src/Elevator Pitch.pdf)
