As artificially generated content becomes more and more realistic and widespread, there is an increasing need to discern between the two types of content. This project aims to address the challenge of distinction, using an encoder-only transformer architecture for the binary classification of AI-generated and human-written responses to a given prompt. A byte-pair encoding tokenizer with a vocabulary size of 8000 was used, learned positional embedding and sinusoidal positional embedding was tested and compared, different pooling strategies such as the use of a [CLS] token and mean pooling were compared, and various hyperparameters (attention heads, layers, layer size, dropout amount, activation function, batch size, learning rate, epochs, optimizer, betas, weight decay) were tuned to yield the highest validation accuracy.
-
Notifications
You must be signed in to change notification settings - Fork 0
kevinmeng2001/Detect-AI-Generated-Text
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published