A FastAPI service for detecting AI-generated text, based on AIGC_text_detector and DivEye.
```shell
# Using the pre-built image from GHCR
docker compose -f docker-compose.yml up
```

The API will be available at http://localhost:8000.
A Chrome extension is available that automatically detects AI-generated text on article pages you visit.
- Go to the Releases page and download the latest `aletheia-extension-*.zip`.
- Unzip the file.
- Open `chrome://extensions` in Chrome and enable Developer mode.
- Click Load unpacked and select the unzipped folder.
Click the extension icon to open the popup. Set the API URL to point to your running Aletheia instance (default: http://localhost:8000). For more options (detection strategy, domain whitelist/blacklist), go to the extension's Settings page.
The extension will automatically detect article content on pages you visit and show a floating badge with the result.
Requires uv.
```shell
# Install dependencies
uv sync

# Run the server
uv run uvicorn app.main:app --reload
```

Or with Docker Compose:

```shell
docker compose up --build
```

Detect whether text is human-written or AI-generated.
Request body:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `text` | string | yes | | Text to detect |
| `lang` | string | no | auto-detect | `"zh"` for Chinese model, anything else for English |
| `model_id` | string | no | (by `lang`) | HuggingFace model ID; overrides `lang` |
| `strategy` | string | no | `"truncate"` | `"truncate"`, `"sliding_avg"`, `"sliding_weighted_avg"`, or `"sliding_vote"` |
| `early_stop` | bool | no | `false` | Stop early when confidence is high enough (sliding strategies only) |
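Put together as JSON, a request might look like the sketch below. The helper name is ours, not part of the service; field names follow the table above, and optional fields are simply omitted when unset:

```python
import json

def build_detect_payload(text, lang=None, model_id=None,
                         strategy="truncate", early_stop=False):
    """Assemble the JSON body for POST /detect, leaving out unset optional fields."""
    payload = {"text": text, "strategy": strategy, "early_stop": early_stop}
    if lang is not None:
        payload["lang"] = lang
    if model_id is not None:
        payload["model_id"] = model_id
    return json.dumps(payload)

body = build_detect_payload("Some sample text.", strategy="sliding_weighted_avg")
```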
Strategies:

- `truncate` – Truncate to 512 tokens. Fast, single forward pass.
- `sliding_avg` – Sliding window (512 tokens, stride 256). Average softmax scores across windows.
- `sliding_weighted_avg` – Sliding window. Confidence-weighted average: chunks with higher confidence contribute more.
- `sliding_vote` – Sliding window. Majority vote on the predicted label across windows.
Note: For most use cases, `truncate` is sufficient. For long texts where you want higher accuracy, use `sliding_weighted_avg` with `early_stop` enabled: it gives better results than plain averaging by weighting high-confidence chunks more heavily, and early stopping avoids unnecessary computation when the result is already clear.
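The sliding strategies differ only in how per-window scores are combined. Here is a rough sketch of the three aggregations plus early stopping, operating on each window's softmax score for the AI class. The function and threshold are illustrative assumptions, not the service's actual internals:

```python
def aggregate(scores, strategy="sliding_avg", early_stop=False, threshold=0.95):
    """Combine per-window AI-probabilities (one per 512-token window) into one score."""
    if strategy == "truncate":
        return scores[0]  # only the first window is scored
    if early_stop:
        # stop scanning once any window is confidently decided either way
        kept = []
        for s in scores:
            kept.append(s)
            if max(s, 1 - s) >= threshold:
                break
        scores = kept
    if strategy == "sliding_avg":
        return sum(scores) / len(scores)
    if strategy == "sliding_weighted_avg":
        # weight each window by its confidence, i.e. distance from 0.5
        weights = [abs(s - 0.5) for s in scores]
        total = sum(weights) or 1.0
        return sum(w * s for w, s in zip(weights, scores)) / total
    if strategy == "sliding_vote":
        votes = sum(1 for s in scores if s >= 0.5)
        return votes / len(scores)  # fraction of windows voting "ai"
    raise ValueError(f"unknown strategy: {strategy}")
```

Note how a single confident window dominates the weighted average, which is why it copes better with long texts that mix boilerplate and distinctive passages.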
Example:
```shell
curl -X POST http://localhost:8000/detect \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a sample text to detect."}'
```

Response:
```json
{
  "label": "human",
  "score": 0.98,
  "model_id": "yuchuantian/AIGC_detector_env3",
  "detected_lang": "en",
  "num_chunks": 1
}
```

Health check endpoint.
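The same detect request can be issued from Python with only the standard library. This is a sketch under assumptions: `build_request` and `detect` are our helper names, and the URL assumes a local instance:

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # adjust to your deployment

def build_request(text, **options):
    """Compose the POST /detect request; options map to the request-body fields."""
    body = json.dumps({"text": text, **options}).encode()
    return urllib.request.Request(
        f"{API_URL}/detect",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def detect(text, **options):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, **options)) as resp:
        return json.load(resp)

# detect("This is a sample text to detect.")  # requires a running server
```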
| Language | Model ID |
|---|---|
| English | yuchuantian/AIGC_detector_env3 |
| Chinese | yuchuantian/AIGC_detector_zhv3 |
You can use any HuggingFace `*ForSequenceClassification` model by passing `model_id` in the request.
Transformer-based sequence classifiers fine-tuned for AI-text detection. Source: YuchuanTian/AIGC_text_detector.
DivEye detects AI-generated text using surprisal-based statistical features that capture how unpredictability varies throughout a text. Human writing exhibits greater variability in lexical and structural unpredictability compared to LLM outputs. These features feed an XGBoost classifier, making it interpretable and robust to paraphrasing attacks.
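As a toy illustration of the underlying idea (not DivEye's actual feature set), one can summarize how per-token surprisal fluctuates through a text: human writing tends to swing between predictable and surprising tokens, while LLM output stays flatter. The surprisal traces below are synthetic:

```python
import math

def surprisal_features(surprisals):
    """Summarize variability of a per-token surprisal trace (toy version of
    the diversity-style features fed to a downstream classifier)."""
    n = len(surprisals)
    mean = sum(surprisals) / n
    var = sum((s - mean) ** 2 for s in surprisals) / n
    # variability of first differences: how sharply unpredictability swings
    diffs = [b - a for a, b in zip(surprisals, surprisals[1:])]
    diff_mean = sum(diffs) / len(diffs)
    diff_var = sum((d - diff_mean) ** 2 for d in diffs) / len(diffs)
    return {"mean": mean, "std": math.sqrt(var), "diff_std": math.sqrt(diff_var)}

# Synthetic traces: the "human" one alternates between low and high surprisal.
human_like = [2.0, 9.5, 1.2, 7.8, 0.9, 11.0, 3.1]
llm_like = [3.0, 3.4, 2.8, 3.1, 3.3, 2.9, 3.2]
```

In DivEye proper, features in this spirit (computed from a language model's token surprisals) are the inputs to an XGBoost classifier rather than being thresholded directly.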
Advik Raj Basani, Pin-Yu Chen. Diversity Boosts AI-Generated Text Detection. TMLR 2026.
Source: IBM/diveye