# AI Content Detector

This notebook builds a machine learning model that can detect whether text was written by a human or generated by AI (like ChatGPT). We'll use:

- **Dataset**: HC3 (Human-ChatGPT Comparison Corpus)
- **Model**: DistilBERT (a smaller, faster version of BERT)
- **Libraries**: Hugging Face Transformers, PyTorch, Pandas

Let's get started!

In [1]:

!pip install transformers datasets pandas scikit-learn torch

## Step 1: Load the Dataset

We'll use the HC3 dataset, which contains human-written and ChatGPT-generated answers to the same questions. Each question can have several answers of either kind.

In [2]:
# Import necessary libraries
from datasets import load_dataset
import pandas as pd

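# Load the combined "all" configuration of HC3.
# The shorter call without a config name,
#   load_dataset("Hello-SimpleAI/HC3", split='train'),
# is not sufficient here; trust_remote_code=True lets the dataset's
# loading script (HC3.py) run.
dataset = load_dataset("Hello-SimpleAI/HC3", "all", split='train', trust_remote_code=True)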

# Let's look at the first example to understand the structure
print("Dataset example:")
print(dataset[0])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Dataset example:
{'id': '0', 'question': 'Why is every book I hear about a " NY Times # 1 Best Seller " ? ELI5 : Why is every book I hear about a " NY Times # 1 Best Seller " ? Should n\'t there only be one " # 1 " best seller ? Please explain like I\'m five.', 'human_answers': ['Basically there are many categories of " Best Seller " . Replace " Best Seller " by something like " Oscars " and every " best seller " book is basically an " oscar - winning " book . May not have won the " Best film " , but even if you won the best director or best script , you \'re still an " oscar - winning " film . Same thing for best sellers . Also , IIRC the rankings change every week or something like that . Some you might not be best seller one week , but you may be the next week . I guess even if you do n\'t stay there for long , you still achieved the status . Hence , # 1 best seller .', "If you 're hearing about it , it 's because it was a very good or very well - publicized book ( or both ) , and a

## Step 2: Prepare the Data

Now we'll organize our data for training. We'll:
1. Label human-written texts as 0
2. Label AI-generated texts as 1
3. Filter out empty or very short answers (10 characters or fewer after stripping whitespace)

In [3]:
# Function to organize our data
def prepare_data(dataset):
    data = []
    # Process each example in the dataset
    for row in dataset:
        # Add human answers with label 0
        for ans in row['human_answers']:
            if ans and len(ans.strip()) > 10:  # Skip empty or very short answers
                data.append({"text": ans, "label": 0})

        # Add AI answers with label 1
        for ans in row['chatgpt_answers']:
            if ans and len(ans.strip()) > 10:  # Skip empty or very short answers
                data.append({"text": ans, "label": 1})

    return data

# Convert to flat format
flat_data = prepare_data(dataset)

# Create a pandas DataFrame
df = pd.DataFrame(flat_data)

# Show dataset statistics
print(f"Total examples: {len(df)}")
print(f"Human examples: {len(df[df['label'] == 0])}")
print(f"AI examples: {len(df[df['label'] == 1])}")
print("\nSample data:")
print(df.head())

Total examples: 85429
Human examples: 58544
AI examples: 26885

Sample data:
                                                text  label
0  Basically there are many categories of " Best ...      0
1  If you 're hearing about it , it 's because it...      0
2  One reason is lots of catagories . However , h...      0
3  There are many different best seller lists tha...      1
4  salt is good for not dying in car crashes and ...      0
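
The counts above show that the two classes are not balanced, which matters when reading accuracy numbers later. As a quick sanity check, the sketch below uses only the counts printed above to work out what a trivial "always predict Human" classifier would score. The variable names are illustrative and not part of the notebook.

```python
# Counts copied from the cell output above.
human_count = 58544
ai_count = 26885
total = human_count + ai_count

human_share = human_count / total
ai_share = ai_count / total

# A classifier that always answers "Human" would be right this often,
# so any reported accuracy should be read against this baseline.
majority_baseline_accuracy = max(human_share, ai_share)

print(f"Human share: {human_share:.1%}")
print(f"AI share: {ai_share:.1%}")
print(f"Majority-class baseline accuracy: {majority_baseline_accuracy:.1%}")
```

Because of this imbalance, the per-class precision and recall in Step 8 are more informative than accuracy alone.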


## Step 3: Split Data into Training and Validation Sets

We'll split our data into:
- 80% for training (to teach the model)
- 20% for validation (to evaluate how well it learned)

In [4]:
from sklearn.model_selection import train_test_split

# Split data into training and validation sets
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df['text'], df['label'], test_size=0.2, random_state=42
)

print(f"Training examples: {len(train_texts)}")
print(f"Validation examples: {len(val_texts)}")

Training examples: 68343
Validation examples: 17086
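
The split above is purely random, so the human/AI ratio in the validation set can drift slightly from the ratio in the full dataset. One common remedy is a stratified split, which scikit-learn supports through the `stratify` argument of `train_test_split`. The pure-Python sketch below illustrates the idea on toy labels; `stratified_split` and `toy_labels` are hypothetical names made up for this example, not something the notebook uses.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, val_idx) so each label keeps its share in both parts."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_val = round(len(indices) * test_size)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy labels with roughly the same 70/30 imbalance as the real data.
toy_labels = [0] * 70 + [1] * 30
train_idx, val_idx = stratified_split(toy_labels)

toy_val_labels = [toy_labels[i] for i in val_idx]
print(f"train: {len(train_idx)}, validation: {len(val_idx)}")
print(f"validation humans: {toy_val_labels.count(0)}, validation AI: {toy_val_labels.count(1)}")
```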


## Step 4: Set Up the AI Model

We'll use a pre-trained language model called DistilBERT. This model has already learned a lot about language from massive amounts of text. We'll fine-tune it for our specific task of detecting AI-generated content.

In [5]:
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

# Load a pre-trained model and tokenizer
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)

print("Model and tokenizer loaded successfully!")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model and tokenizer loaded successfully!


## Step 5: Prepare Text Data for Training

The model can't work with raw text directly, so we first convert it to numbers. The tokenizer does this by:
1. Breaking text into tokens (words or word pieces)
2. Converting tokens to numbers
3. Adding special tokens, then truncating or padding so that every sequence has the same length (at most 512 tokens here)
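
To make these three steps concrete, here is a toy sketch with a made-up eight-entry vocabulary. It is only an illustration: `toy_vocab` and `toy_encode` are hypothetical, and the real DistilBERT tokenizer uses a much larger WordPiece vocabulary that this sketch does not implement.

```python
# Toy vocabulary (an assumption for illustration, not DistilBERT's real one).
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "the": 4, "model": 5, "reads": 6, "numbers": 7}

def toy_encode(text, max_length=8):
    # 1. Break text into tokens (here: a lowercase whitespace split).
    tokens = text.lower().split()
    # 2. Convert tokens to ids, mapping unknown words to [UNK].
    ids = [toy_vocab.get(tok, toy_vocab["[UNK]"]) for tok in tokens]
    # 3. Truncate, add special tokens, then pad to a fixed length.
    ids = ids[: max_length - 2]
    ids = [toy_vocab["[CLS]"]] + ids + [toy_vocab["[SEP]"]]
    attention_mask = [1] * len(ids)
    padding = max_length - len(ids)
    ids += [toy_vocab["[PAD]"]] * padding
    attention_mask += [0] * padding  # 0 marks padding the model should ignore
    return {"input_ids": ids, "attention_mask": attention_mask}

encoded = toy_encode("The model reads numbers")
print(encoded["input_ids"])       # [2, 4, 5, 6, 7, 3, 0, 0]
print(encoded["attention_mask"])  # [1, 1, 1, 1, 1, 1, 0, 0]
```

The real tokenizer returns the same two fields, `input_ids` and `attention_mask`, which is why the dataset class in the next cell can simply loop over the encoding's keys.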

In [6]:
# Convert our text data into the format our model needs
train_encodings = tokenizer(list(train_texts), truncation=True, padding=True, max_length=512)
val_encodings = tokenizer(list(val_texts), truncation=True, padding=True, max_length=512)

# Create a PyTorch dataset
import torch

class AIDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # Convert data to PyTorch tensors
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

# Create our dataset objects
train_dataset = AIDataset(train_encodings, train_labels.tolist())
val_dataset = AIDataset(val_encodings, val_labels.tolist())

print("Data prepared for training!")

Data prepared for training!


In [7]:
!pip install --upgrade transformers



## Step 6: Set Up Training Configuration

Now we configure how the model will be trained:
- How many times to go through the data (epochs)
- How much data to process at once (batch size)
- How to optimize the learning process

In [8]:
import transformers
print(f"Transformers version: {transformers.__version__}")


from transformers import Trainer, TrainingArguments

# Training configuration
training_args = TrainingArguments(
    output_dir='./results',          # Where to save model checkpoints
    num_train_epochs=3,              # Number of passes over the training data
    per_device_train_batch_size=32,  # Training batch size per device
    per_device_eval_batch_size=64,   # Larger batches fit at eval time (no gradients stored)
    warmup_steps=200,                # Steps over which the learning rate ramps up
    weight_decay=0.01,               # Weight decay regularization
    logging_dir='./logs',            # Where to write logs
    fp16=True,                       # Mixed-precision training (requires a GPU)
    gradient_accumulation_steps=2,   # Accumulate gradients for an effective batch size of 64
    logging_steps=50,                # Log the training loss every 50 steps
    save_strategy="epoch",           # Save a checkpoint at the end of each epoch
    report_to=[],                    # Disable external logging integrations such as wandb
    dataloader_num_workers=2,        # Worker processes for data loading
    optim="adamw_torch",             # Use PyTorch's implementation of AdamW
)

# Set up the trainer
trainer = Trainer(
    model=model,                     # The model to train
    args=training_args,              # Training arguments
    train_dataset=train_dataset,     # Training dataset
    eval_dataset=val_dataset         # Evaluation dataset
)

print("Training configured and ready to start!")

Transformers version: 4.51.3
Training configured and ready to start!
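
It can help to estimate how many optimizer updates this configuration implies before starting a long run. The sketch below does the arithmetic from the values used above. It assumes a single GPU (`num_devices = 1` is an assumption), and the exact rounding is decided by the `Trainer`, so treat the result as an estimate.

```python
# Values taken from the cells above.
train_examples = 68343
per_device_train_batch_size = 32
gradient_accumulation_steps = 2
num_train_epochs = 3
num_devices = 1  # assumption: a single Colab GPU

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Batches the data loader yields per epoch (ceiling division,
# because the last batch may be partial).
batch_per_step = per_device_train_batch_size * num_devices
batches_per_epoch = -(-train_examples // batch_per_step)

# Approximate optimizer updates: one update per accumulated group of batches.
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_updates = updates_per_epoch * num_train_epochs

print(f"Effective batch size: {effective_batch_size}")
print(f"Batches per epoch: {batches_per_epoch}")
print(f"Estimated optimizer updates: {total_updates}")
```

Under these assumptions the 200 warmup steps cover roughly 6% of the run.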


## Step 7: Train the Model

This is the most time-consuming step. The model will:
1. Look at examples of human and AI text
2. Make predictions
3. Learn from its mistakes
4. Improve over time

This will take about 15-30 minutes on Google Colab with GPU enabled.
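
The four steps above are the same loop that any gradient-based training follows. As a toy illustration only, the sketch below runs that loop for a one-feature logistic regression in plain Python. It is not how DistilBERT is implemented, and the data, learning rate, and variable names are all made up for this example.

```python
import math

# Toy data: one numeric feature per example, label 0 or 1.
features = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 1, 1]

weight, bias = 0.0, 0.0
learning_rate = 0.5

def predict(x):
    # Sigmoid turns a raw score into a probability of label 1.
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

def mean_loss():
    # Binary cross-entropy averaged over the toy dataset.
    total = 0.0
    for x, y in zip(features, labels):
        p = predict(x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(features)

loss_history = [mean_loss()]
for epoch in range(200):
    grad_w = grad_b = 0.0
    for x, y in zip(features, labels):
        error = predict(x) - y   # steps 1-2: look at an example and predict
        grad_w += error * x      # step 3: measure how wrong the prediction was
        grad_b += error
    # Step 4: nudge the parameters in the direction that reduces the loss.
    weight -= learning_rate * grad_w / len(features)
    bias -= learning_rate * grad_b / len(features)
    loss_history.append(mean_loss())

print(f"loss before: {loss_history[0]:.3f}, after: {loss_history[-1]:.3f}")
```

The `Trainer` does the same thing at a much larger scale, with mini-batches, the AdamW optimizer, and millions of parameters instead of two.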

In [9]:
# Start training!
trainer.train()

print("Training complete!")

| Step | Training Loss |
|------|---------------|
| 50   | 0.565         |
| 100  | 0.1339        |
| 150  | 0.0838        |
| 200  | 0.0516        |
| 250  | 0.0323        |
| 300  | 0.0306        |
| 350  | 0.034         |
| 400  | 0.0257        |
| 450  | 0.0333        |
| 500  | 0.0176        |


Training complete!


## Step 8: Evaluate the Model

Let's see how well our model performs on the validation data it hasn't seen during training.

In [10]:
# Run evaluation
eval_results = trainer.evaluate()
print(f"Evaluation results: {eval_results}")

# More detailed metrics
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Get predictions for validation set
predictions = trainer.predict(val_dataset)
preds = np.argmax(predictions.predictions, axis=1)

# Print detailed metrics
print("\nDetailed Classification Report:")
print(classification_report(val_labels, preds, target_names=["Human", "AI"]))

# Create a confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(val_labels, preds))

Evaluation results: {'eval_loss': 0.015001952648162842, 'eval_runtime': 68.2347, 'eval_samples_per_second': 250.4, 'eval_steps_per_second': 3.913, 'epoch': 3.0}

Detailed Classification Report:
              precision    recall  f1-score   support

       Human       1.00      1.00      1.00     11812
          AI       0.99      1.00      1.00      5274

    accuracy                           1.00     17086
   macro avg       1.00      1.00      1.00     17086
weighted avg       1.00      1.00      1.00     17086


Confusion Matrix:
[[11769    43]
 [    8  5266]]
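
The classification report rounds to two decimals, which hides the small number of errors. The sketch below recomputes the main metrics directly from the confusion matrix printed above, using scikit-learn's convention that rows are true labels and columns are predicted labels. The variable names are illustrative.

```python
# Confusion matrix copied from the output above.
# Rows are true labels, columns are predicted labels, in the order [Human, AI].
confusion = [[11769, 43],
             [8, 5266]]

true_human, human_as_ai = confusion[0]
ai_as_human, true_ai = confusion[1]
total = true_human + human_as_ai + ai_as_human + true_ai

accuracy = (true_human + true_ai) / total
ai_precision = true_ai / (true_ai + human_as_ai)      # of texts flagged AI, how many were AI
ai_recall = true_ai / (true_ai + ai_as_human)         # of AI texts, how many were flagged
human_recall = true_human / (true_human + human_as_ai)

print(f"Accuracy:     {accuracy:.4f}")
print(f"AI precision: {ai_precision:.4f}")
print(f"AI recall:    {ai_recall:.4f}")
print(f"Human recall: {human_recall:.4f}")
```

In other words, 43 human answers were flagged as AI and 8 AI answers were missed, out of 17,086 validation examples.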


## Step 9: Test With Your Own Examples

Now let's create a function to test our model with any text input!

In [16]:
# Function to test our model with any text
def detect_ai(text):
    # Prepare the text for the model
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

    # Move inputs to the same device as the model
    device = next(model.parameters()).device  # Get model's device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Get model prediction (eval mode disables dropout so results are deterministic)
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)

    # Convert to probability
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Get prediction (0 = human, 1 = AI)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    confidence = probs[0][prediction].item() * 100

    result = "AI-Generated" if prediction == 1 else "Human-Written"
    return f"Prediction: {result} (Confidence: {confidence:.2f}%)"
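
The confidence value in this function comes from applying softmax to the model's two output scores (logits). The plain-Python sketch below shows that calculation on made-up logits; `example_logits` is a hypothetical value, not real model output.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input, in the order [human, AI].
example_logits = [-2.0, 3.0]
probs = softmax(example_logits)

prediction = probs.index(max(probs))  # 0 = human, 1 = AI
confidence = probs[prediction] * 100

print(f"probabilities: {[round(p, 4) for p in probs]}")
print(f"prediction: {prediction}, confidence: {confidence:.2f}%")
```

Note that this number is a softmax score, which reflects how far apart the two logits are. It is not guaranteed to be a calibrated probability of being correct.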

## Step 10: Save Your Model (Optional)

If you want to use this model later without retraining, you can save it now.

In [12]:
# Save the model and tokenizer
model.save_pretrained("./ai_detector_model")
tokenizer.save_pretrained("./ai_detector_tokenizer")
print("Model saved successfully!")

Model saved successfully!


## Step 11: Build a Simple Web Interface (Optional)

Let's create a simple web interface to test our model directly in the notebook!

In [17]:
# Install Gradio
!pip install gradio

import gradio as gr

# Create the interface
demo = gr.Interface(
    fn=detect_ai,
    inputs=gr.Textbox(placeholder="Enter text to check...", lines=5),
    outputs="text",
    title="AI Content Detector",
    description="Enter text to check if it's likely written by a human or AI"
)

# Launch the app
demo.launch()



In [18]:
from google.colab import drive
drive.mount('/content/drive')

MessageError: Error: credential propagation was unsuccessful

## Conclusion

Congratulations! You've built your own AI content detector that can:
- Analyze text to determine if it was written by a human or AI
- Provide confidence scores for its predictions
- Be easily tested through a simple web interface

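This model is trained and evaluated only on the HC3 dataset, so the near-perfect validation scores describe ChatGPT answers of the kind found in HC3. How well it handles text from other AI models, other domains, or edited AI text has not been measured here. To improve it further, you could: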
- Train with more diverse AI-generated text from different models
- Use a larger model or train for more epochs
- Collect more training data, especially edge cases

Remember that no AI detector is perfect - they work based on statistical patterns and can sometimes be wrong!