- Clone the repository using Git.
- Open terminal -> run
docker build -t catchgpt .
. - To launch the application -> run
docker run -d -p 5000:5000 catchgpt
. - Open your preferred browser and navigate to
localhost:5000
.
- Clone the repository using Git.
- Open terminal -> type
pip install -r requirements.txt
orpip3 install -r requirements.txt
. - To launch the application -> type
python app.py
orpython3 app.py
. - Open your preferred browser and navigate to
http://127.0.0.1:5000/
.
CatchGPT is a tool that can determine if a particular text is written by an AI or a human being by analyzing the text's average perplexity per line score and its burstiness.
In general, perplexity is the measurement of how likely the model can predict the next word in an input text sequence. This is often used to gauge the performance of a natural language processing model. Texts with a high perplexity value are generally written by humans, while AI-generated text typically has lower perplexity values.
Additionally, in our case, burstiness refers to the variation (standard deviation was used in this implementation) in perplexity per sentence/line. Typically, humans tend to write in a "bursty fashion" as we would generally have short sentences mixed in with long sentences. On the other hand, NLP models typically write in a "uniform fashion" where almost all sentences have a similar length. Thus, text generated by ais such as ChatGPT, YouChat, Bing Search, and others, will have lower values of burstiness in comparison to human-written text.
This application was inspired by the rise of ChatGPT and its widespread use of plagiarism within the academic space. Similar applications have been made with similar implementations, but are behind a paywall (GPTZero has now limited its use to 1 request per hour). With this in mind, I wanted to make a more accessible application for all students and educators who want to check if a particular piece of text had been produced entirely by an AI, partially by an AI, or entirely produced by a human. Though this implementation is not entirely accurate and can be deceived by manually adding burstiness and perplexity to every sentence, it can be used in conjunction with human input to determine if a particular piece of text had been produced with the aid of an AI.
I have plans to incorporate the DetectGPT implementation in the near future to gain a better understanding of Machine-Generated Text (NLP Models), its detection, and learn more about machine learning.
Python (Flask), Hugging Face Pretrained GPT2 Model (to calculate perplexity of fixed-length models), JavaScript (jQuery), jCharts, HTML/CSS
Install requirements:
pip install -r requirements.txt
Start app:
python3 app.py
The app will be hosted on http://127.0.0.1:5000/
Utilized Hugging Face to calculate the perplexity of a given text.
Research on Perplexity in Language Models
Inspiration from GPTZero