TextBrain, a tool that automatically grades and validates texts. In order to validate texts, it uses the copyleaks API to check for plagiarism. It also uses a modified version of GPT-2 to detect the likelihood that the text was real or fake. Then it outputs a validation score using these 2 scores. In order to grade the text, it uses a neural network model trained on the automatic essay/text grading dataset on Kaggle found here here .Take this project and go build a profitable startup with it...!!!
See the full Jupyter Notebook For TextBrain here.
In recent years, the natural language processing community has seen the development of increasingly larger and larger language models.
A language model is a machine learning model that is trained to predict the next word given an input context. As such, a model can generate text by generating one word at a time. These predictions can even, to some extent, be constrained by human-provided input to control what the model writes about. Due to their modeling power, large language models have the potential to generate textual output that is indistinguishable from human-written text to a non-expert reader.
Language models achieve this with incredibly accurate distributional estimates of what words may follow in a given context. If a generation system uses a language model and predicts a very likely next words, the generation will look similar to what word a human would have picked in similar situation, despite not having much knowledge about the context itself. This opens up paths for malicious actors to use these tools to generate fake reviews, comments or news articles to influence the public opinion.
To prevent this from happening, we need to develop forensic techniques to detect automatically generated text. We make the assumption that computer generated text fools humans by sticking to the most likely words at each position, a trick that fools humans. In contrast, natural writing actually more frequently selects unpredictable words that make sense to the domain. That means that we can detect whether a text actually looks too likely to be from a human writer!
represents a visually forensic tool to detect text that was automatically generated from large language models.
Credits for the base repository go to this academic team and Siraj Raval. Credits for the other tools go to Google, Stripe, and OpenAI. I will also take ths opportunity to give thanks to all humans who perform selfless acts for others, in big ways and small. Thank you, please keep doing that.
- Flask
- GPT-2 model
- D3.js
- Stripe
- Firebase
- Copyleaks API
- Tensorflow.js
note- D3.js is already integrated
Install dependencies for Python >3.6 :
pip install -r requirements.txt
run server for gpt-2-small
:
python server.py
The demo instance now
start the server for BERT
:
python server.py --model BERT
the instance runs now at http://localhost:5001/client/index.html?nodemo. HINT: we only provide demo texts for gpt2-small
.
usage: server.py [-h] [--model MODEL] [--nodebug NODEBUG] [--address ADDRESS]
[--port PORT] [--nocache NOCACHE] [--dir DIR] [--no_cors]
optional arguments:
-h, --help show this help message and exit
--model MODEL choose either 'gpt-2-small' (default) or 'BERT' or your own
--nodebug NODEBUG server in non-debugging mode
--port PORT port to launch UI and API (default:5001)
--no_cors launch API without CORS support (default: False)
The backend defines a number of model api's that can be invoked by the server by starting it with the parameter --model NAME
. To add a custom model, you need to write your own api in backend/api.py
and add the decorator @register_api(name=NAME)
.
Each api needs to be a class that inherits from AbstractLanguageChecker
, which defines two functions check_probabilities
and postprocess
. Please follow the documentation within api.py
when implementing the class and the functions.
the source code for the front-end is in client/src
.
To modify, installing of node dependencies is necessary:
cd client/src; npm install; cd ../..
re-compilation of front-end:
> rm -rf client/dist;cd client/src/; npm run build; cd ../..