A simple web app that can be used to detect AI-generated text in the Indonesian language, using various AI models such as LSTM, GRU, Bi-LSTM, Bi-GRU, and IndoBERT

kevin-wijaya/AI-Generated-Text-Detection-with-Deep-Learning-Approach-on-Indonesian-Text

AI Generated Text Detection with Deep Learning Approach on Indonesian Text

methodology

About

Artificial intelligence (AI) is now widely used by the public, largely because of intelligent chatbots such as OpenAI's ChatGPT. People use ChatGPT for many purposes; students, for example, use it to understand material, do assignments, compose essays, and paraphrase journal articles. Paraphrasing text with ChatGPT and presenting the result as one's own writing in a paper can be considered a form of plagiarism. Telling AI-generated text apart from human-written text by hand takes a long time and requires an in-depth understanding of word patterns and arrangement, so a system that can detect whether a text was generated by AI is needed. This detection system uses a deep learning approach. Human text data was collected from the Detik news portal and the Quora question-and-answer website. AI text data was generated by paraphrasing the human text data. Vectorization uses Doc2Vec and the BERT tokenizer. The models studied are LSTM, GRU, Bi-LSTM, Bi-GRU, and BERT with the IndoBERT pre-trained model. Of the five, BERT reaches the best accuracy on the training data, while Bi-LSTM and Bi-GRU reach the best accuracy on the validation data.

Tech Stack

  • Modeling: NumPy, pandas, scikit-learn, Gensim, TensorFlow, PyTorch, Hugging Face
  • Web Application: Flask, jQuery, Tailwind CSS

Getting Started

These instructions will guide you through installing the project on your local machine for testing purposes. (Note: this project downloads large model files, so please be patient; setup may take several minutes.)

Requirements

This project requires Python 3.10.5.
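
Before installing, it can help to confirm which interpreter is active. The snippet below is a minimal sketch, not part of the project: the helper name `python_version_ok` is hypothetical, and it only checks the major and minor version (3.10) because this README does not say whether the patch version matters.

```python
import sys

# Version stated in this README; only major.minor is compared (an assumption).
REQUIRED = (3, 10)


def python_version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if python_version_ok():
        print("Python version looks compatible.")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Expected Python 3.10.x, found {found}.")
```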

Installation (Linux or macOS)

Clone this repository

git clone https://github.com/kevin-wijaya/AI-Generated-Text-Detection-with-Deep-Learning-Approach-on-Indonesian-Text.git

Rename the folder and change directory into it

mv AI-Generated-Text-Detection-with-Deep-Learning-Approach-on-Indonesian-Text ai-text-detection && cd ai-text-detection

Create a Python virtual environment to keep the dependencies isolated

python -m venv .venv

Install the prerequisite Python packages

python run.py pip install -r requirements.txt

Download the required model files (this step uses gdown and Git LFS)

gdown --folder 19fi_oNv42G5n27bO-W1f03PDPYckjgPX -O ./models/ && git lfs install && git clone https://huggingface.co/indolem/indobert-base-uncased ./models/indolem/indobert-base-uncased
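
Because the downloads are large, it may be worth checking that they landed where the command above puts them before starting the app. The following is a rough sketch under stated assumptions: the two paths come from the download command, while the helper `missing_model_dirs` is hypothetical and not part of this repository. It only checks that each directory exists and is not empty; it cannot tell whether Git LFS fetched the real weights or left small pointer files.

```python
from pathlib import Path

# Paths taken from the download command above; the helper name is hypothetical.
EXPECTED_DIRS = [
    Path("models"),
    Path("models/indolem/indobert-base-uncased"),
]


def missing_model_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that are absent or empty under `root`."""
    missing = []
    for relative in expected:
        target = Path(root) / relative
        # A directory counts as present only if it exists and holds something.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(relative)
    return missing


if __name__ == "__main__":
    problems = missing_model_dirs()
    if problems:
        print("Missing or empty:", ", ".join(str(p) for p in problems))
    else:
        print("Model directories found.")
```

Run it from the project root (the `ai-text-detection` folder) so the relative paths resolve.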

Start the app (app.py) through run.py and enjoy 😁

python run.py app

Usage

Using the web application takes three steps:

  1. Insert Text: Enter your text into the textarea provided.
  2. Detect: Click on the "Detect" button to process the text and obtain results.
  3. Change Models: Optionally, you can select different models from the options available to see varied results.

Reports

Below is a table showing the evaluation metrics from the experiments conducted:

Evaluation Metrics for Each Model

Model Precision (%) Recall (%) F1-Score (%) Accuracy (%)
LSTM 75 75 75 75
GRU 71 72 71 71
Bi-LSTM 77 77 77 77
Bi-GRU 77 77 77 77
IndoBERT 71 71 71 71

Evaluation Metrics for Each Label

Model Label Precision (%) Recall (%) F1-Score (%)
LSTM Human 72 77 75
LSTM AI 78 74 76
GRU Human 62 77 68
GRU AI 81 68 77
Bi-LSTM Human 73 79 76
Bi-LSTM AI 80 75 78
Bi-GRU Human 72 80 76
Bi-GRU AI 82 74 78
IndoBERT Human 67 73 70
IndoBERT AI 75 69 72
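
For readers unfamiliar with the columns above, the sketch below shows the standard definitions of per-label precision, recall, and F1-score, plus overall accuracy, in plain Python. It is illustrative only: the function names and the toy labels are assumptions, not the project's evaluation code or data, and this README does not state how its reported figures were computed or aggregated.

```python
def label_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for illustration only; these are not the project's data.
y_true = ["human", "human", "human", "human", "ai", "ai", "ai", "ai"]
y_pred = ["human", "human", "human", "ai", "ai", "ai", "human", "human"]

if __name__ == "__main__":
    for label in ("human", "ai"):
        p, r, f = label_metrics(y_true, y_pred, label)
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    print(f"accuracy={accuracy(y_true, y_pred):.2f}")
```

Multiplying each value by 100 gives percentages in the same form as the tables.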

Screenshots

Here are some screenshots of the application:

initial-screen

models-option

detected-as-ai-generated-text

detected-as-human-text

Author

  • Kevin Wijaya
