Detect-AI-Generated-Text

As artificially generated content becomes more and more realistic and widespread, there is an increasing need to discern between the two types of content. This project aims to address the challenge of distinction, using an encoder-only transformer architecture for the binary classification of AI-generated and human-written responses to a given prompt. A byte-pair encoding tokenizer with a vocabulary size of 8000 was used, learned positional embedding and sinusoidal positional embedding was tested and compared, different pooling strategies such as the use of a [CLS] token and mean pooling were compared, and various hyperparameters (attention heads, layers, layer size, dropout amount, activation function, batch size, learning rate, epochs, optimizer, betas, weight decay) were tuned to yield the highest validation accuracy.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
Confusion Matrix.png		Confusion Matrix.png
README.md		README.md
Transformer.ipynb		Transformer.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Detect-AI-Generated-Text

About

Uh oh!

Releases

Packages

Languages

kevinmeng2001/Detect-AI-Generated-Text

Folders and files

Latest commit

History

Repository files navigation

Detect-AI-Generated-Text

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages