# LLM - Detect AI Generated Text
Identify which essay was written by a large language model

In [None]:
__author__ = "Pradeep Pujari"
__version__ = "Version 0"  

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cgpotts/cs224u/blob/main/hw_sentiment.ipynb)
[![Open in SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/cgpotts/cs224u/blob/main/hw_sentiment.ipynb)

If Colab is opened with this badge, please **save a copy to drive** (from the File menu) before running the notebook.

## Overview

In recent years, large language models (LLMs) have become increasingly sophisticated, capable of generating text that is difficult to distinguish from human-written text. In this competition, we hope to foster open research and transparency on AI detection techniques applicable in the real world.

This competition challenges participants to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM. The competition dataset comprises a mix of student-written essays and essays generated by a variety of LLMs.

**Data**

The dataset comprises a mix of student-written essays and essays generated by a variety of LLMs. `source_text` is given to the student together with instructions to write an essay.

`generated`: whether the essay was written by a student (0) or generated by an LLM (1). This field is the target and is not present in `test_essays.csv`.

What is the `prompt_name` column?
It does not appear to be the prompt given to an LLM. It is a theme or subject that the student can use together with the `source_text` column to write an essay.

## Set-up

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import re
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
import transformers
from transformers import AutoModel, AutoTokenizer
from sklearn.metrics import roc_auc_score

trainEssayDf=pd.read_csv("/kaggle/input/llm-detect-ai-generated-text/train_essays.csv")
trainPromptDf=pd.read_csv("/kaggle/input/llm-detect-ai-generated-text/train_prompts.csv")
additionTextDf=pd.read_csv("/kaggle/input/daigt-v2-train-dataset/train_v2_drcat_02.csv")


## Transformer fine-tuning

We're now going to move into a more modern mode: fine-tuning pretrained components. We'll use BERT-mini (originally from [the BERT repo](https://github.com/google-research/bert)) to test code rapidly and can then consider scaling up to larger models.

The transformers library does a lot of logging. To avoid ending up with a cluttered notebook, I am changing the logging level. You might want to skip this as you scale up to building production systems, since the logging is very good – it gives you a lot of insights into what the models and code are doing.

In [None]:
transformers.logging.set_verbosity_error()

Here we set ourselves up to use BERT-mini:

In [None]:
weights_name = "/kaggle/input/bert-mini"
#weights_name = "/kaggle/input/huggingface-bert/bert-base-cased"
bert = AutoModel.from_pretrained(weights_name)

bert_tokenizer = AutoTokenizer.from_pretrained(weights_name)

In [None]:
def clean_text(text):
    # Replace "\n\n" with a single space
    cleaned_text = re.sub(r'\n\n', ' ', text)
    # Replace "\'" with "'"
    cleaned_text = re.sub(r"\\'", "'", cleaned_text)
    return cleaned_text
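
A quick check of what `clean_text` does, as a minimal sketch. The sample string is made up for illustration, and the function body repeats the cell above so that the snippet runs on its own.

```python
import re

def clean_text(text):
    # Same two substitutions as the cell above.
    cleaned_text = re.sub(r'\n\n', ' ', text)
    cleaned_text = re.sub(r"\\'", "'", cleaned_text)
    return cleaned_text

# Hypothetical essay fragment with a paragraph break, an escaped
# apostrophe (backslash + quote), and one single newline.
sample = "Cars are useful.\n\nBut they don\\'t solve everything.\nNext line."
cleaned = clean_text(sample)
print(repr(cleaned))
```

Single newlines are left in place; only the double-newline paragraph breaks are collapsed into a space.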

In [None]:
trainEssayDf["cleaned_text"] = trainEssayDf["text"].apply(clean_text)

In [None]:
additionTextDf["cleaned_text"] = additionTextDf["text"].apply(clean_text)

In [None]:
trainPromptDf["cleaned_source_text"] = trainPromptDf["source_text"].apply(clean_text)

In [None]:
trainDf = pd.merge(trainEssayDf, trainPromptDf, on='prompt_id', how='inner')

In [None]:
columns = ['id','prompt_name','cleaned_text','instructions','cleaned_source_text','generated']
train_df = trainDf[columns]

In [None]:
train_df.sample(5)

In [None]:
data1=additionTextDf[["cleaned_text","label"]]
data2=train_df[["cleaned_text","generated"]].rename(columns={'generated': 'label'})

In [None]:
result_df = pd.concat([data1, data2], ignore_index=True)

In [None]:
result_df.head()

In [None]:
train,validation = train_test_split(result_df,test_size=0.2)

### Background: Tokenization

Tokenization in Transformer models is handled differently from tokenization in linear models. For Transformer models, we need to use the tokenizer that comes with the model so that we reliably have embedding representations for every token.
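
To make this concrete, here is a toy version of the greedy longest-match-first splitting that WordPiece tokenizers such as BERT's apply to each word. The vocabulary below is a hypothetical stand-in, and the real tokenizer also handles normalization, punctuation, and special tokens, so treat this as a sketch of the idea rather than the library's implementation.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece marker
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word maps to the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary; a real BERT vocabulary has tens of
# thousands of entries.
toy_vocab = {"token", "##ization", "##ize", "##s", "essay", "[UNK]"}

print(wordpiece_tokenize("tokenization", toy_vocab))
print(wordpiece_tokenize("essays", toy_vocab))
print(wordpiece_tokenize("zebra", toy_vocab))
```

A word that cannot be assembled from vocabulary pieces collapses to `[UNK]`. Pairing a model with a tokenizer built on a different vocabulary could therefore produce ids that the model has no trained embedding for.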

### Background: Representation

Having mapped our string to a list of tokens, we can use the `forward` method of the model to get representations.  
The value of `last_hidden_state` is the sequence of final output states from the model.  
The value of `pooler_output` comes from a set of currently random parameters sitting on top of the first output hidden state. It is a single vector representation per example. People often feel unsure of precisely what this model component is. Here we can have a quick look:


In [None]:
bert.pooler

So this is a dense linear layer (a single matrix of weights) with a bias term, and a tanh activation function is applied to the output. We could put a classifier head on top of this if we wanted to, but we might have mixed feelings about being stuck with that tanh step.

When examples from a single batch have different lengths, we need to mask the padded tokens to get the intended results from the model.
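
The effect of masking can be sketched with a masked softmax, which is roughly how attention layers ignore padding: scores at padded positions are pushed to a very large negative value before normalizing. The scores and mask below are toy values chosen for illustration, and NumPy stands in for the model's tensors.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Softmax over the last axis that gives padded positions zero weight."""
    # A large negative score becomes ~0 after exponentiation.
    masked_scores = np.where(mask == 1, scores, -1e9)
    shifted = masked_scores - masked_scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Two sequences of length 3; the second has padding in its last position.
scores = np.array([[2.0, 1.0, 0.5],
                   [2.0, 1.0, 5.0]])   # 5.0 sits on a padding position
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])

weights = masked_softmax(scores, mask)
print(weights.round(3))
```

Without the mask, the padding position in the second row would receive most of the weight, even though it carries no content.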

### Task 1: Batch tokenization

Your task here is to use the `batch_encode_plus` method for `bert_tokenizer` to tokenize a list of strings. 

In [None]:
def get_batch_token_ids(batch, tokenizer):
    """Map `batch` to a tensor of ids. The return
    value should meet the following specification:

    1. The max length should be 512.
    2. Examples longer than the max length should be truncated.
    3. Examples should be padded to the max length for the batch.
    4. The special token [CLS] should be added to the start and the
       special token [SEP] should be added to the end.
    5. The attention mask should be returned.
    6. The return value of each component should be a tensor.

    Parameters
    ----------
    batch: list of str
    tokenizer: Hugging Face tokenizer

    Returns
    -------
    dict with at least "input_ids" and "attention_mask" as keys,
    each with Tensor values

    """
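    # Tokenize the whole batch at once. `padding="longest"` pads to the
    # longest example in the batch, as item 3 of the docstring requires.
    encoding = tokenizer.batch_encode_plus(
        batch, max_length=512, padding="longest", truncation=True,
        add_special_tokens=True, return_attention_mask=True,
        return_tensors="pt")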

    return encoding
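
The requirements in the docstring can be mimicked on plain lists of integer ids, which makes the resulting shapes easy to inspect without loading a tokenizer. This is only an illustrative sketch: `toy_encode_batch` is a hypothetical helper, and the ids 101, 102, and 0 for `[CLS]`, `[SEP]`, and `[PAD]` are assumptions that follow the usual BERT vocabulary.

```python
def toy_encode_batch(batch_tokens, max_length=8, cls_id=101, sep_id=102, pad_id=0):
    """Truncate, add special tokens, pad to the longest example, build the mask.

    `batch_tokens` is a list of lists of integer token ids.
    """
    encoded = []
    for tokens in batch_tokens:
        # Reserve two positions for [CLS] and [SEP] before truncating.
        body = tokens[: max_length - 2]
        encoded.append([cls_id] + body + [sep_id])
    longest = max(len(ids) for ids in encoded)
    input_ids, attention_mask = [], []
    for ids in encoded:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# One short example and one that exceeds the (toy) max length of 8.
out = toy_encode_batch([[7, 8, 9], [5] * 10], max_length=8)
for ids, mask in zip(out["input_ids"], out["attention_mask"]):
    print(ids, mask)
```

Every row ends up with the same length, and the mask marks real tokens with 1 and padding with 0.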

### Task 2: Fine-tuning module

1. In the `__init__` method, define `self.classifier_layer` using [nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html).
2. Complete the `forward` method.

In [None]:
class BertClassifierModule(nn.Module):
    def __init__(self,
            n_classes,
            hidden_activation,
            weights_name=weights_name):
        """This module loads a Transformer based on  `weights_name`,
        puts it in train mode, adds a dense layer with the activation
        function given by `hidden_activation`, and puts a classifier
        layer on top of that as the final output. The output of
        the dense layer should have the same dimensionality as the
        model input.

        Parameters
        ----------
        n_classes : int
            Number of classes for the output layer
        hidden_activation : torch activation function
            e.g., nn.Tanh()
        weights_name : str
            Name of pretrained model to load from Hugging Face

        """
        super().__init__()
        self.n_classes = n_classes
        self.weights_name = weights_name
        self.bert = AutoModel.from_pretrained(self.weights_name)
        self.bert.train()
        self.hidden_activation = hidden_activation
        self.hidden_dim = self.bert.embeddings.word_embeddings.embedding_dim
        # Add the new parameters here using `nn.Sequential`.
        # We can define this layer as
        #
        #  h = f(cW1 + b_h)
        #  y = hW2 + b_y
        #
        # where c is the final hidden state above the [CLS] token,
        # W1 has dimensionality (self.hidden_dim, self.hidden_dim),
        # W2 has dimensionality (self.hidden_dim, self.n_classes),
        # f is the hidden activation, and we rely on the PyTorch loss
        # function to apply a softmax to y.
        self.classifier_layer = nn.Sequential(
            nn.Linear(self.hidden_dim, self.hidden_dim),  # W1
            self.hidden_activation,                       # Activation function
            nn.Linear(self.hidden_dim, self.n_classes)   # W2
        )

    def forward(self, indices, mask):
        """Process `indices` with `mask` by feeding these arguments
        to `self.bert` and then feeding the first-position ([CLS])
        hidden state in `last_hidden_state` to `self.classifier_layer`.

        Parameters
        ----------
        indices : tensor.LongTensor of shape (n_batch, k)
            Indices into the `self.bert` embedding layer. `n_batch` is
            the number of examples and `k` is the sequence length for
            this batch
        mask : tensor.LongTensor of shape (n_batch, k)
            Binary vector indicating which values should be masked.
            `n_batch` is the number of examples and `k` is the
            sequence length for this batch

        Returns
        -------
        tensor.FloatTensor
            Predicted values, shape `(n_batch, self.n_classes)`

        """
        # Process indices and mask through self.bert
        outputs = self.bert(indices, attention_mask=mask)

        # Extract the [CLS] token representation
        cls_token_representation = outputs.last_hidden_state[:, 0, :]

        # Apply the classifier layer
        logits = self.classifier_layer(cls_token_representation)

        return logits
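
The two equations in the comments of `__init__` can be traced with small arrays to check the shapes. This is a NumPy sketch with random stand-in values and toy sizes, not the module itself; tanh is picked here as one possible choice for the hidden activation `f`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_batch, hidden_dim, n_classes = 4, 8, 2   # toy sizes, not BERT-mini's

c = rng.normal(size=(n_batch, hidden_dim))            # stand-in for [CLS] states
W1 = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)
W2 = rng.normal(size=(hidden_dim, n_classes)) * 0.1
b_y = np.zeros(n_classes)

h = np.tanh(c @ W1 + b_h)   # h = f(cW1 + b_h), with f = tanh
y = h @ W2 + b_y            # y = hW2 + b_y, the logits

# The softmax is normally applied inside the loss; shown here explicitly.
exp = np.exp(y - y.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

print(h.shape, y.shape, probs.sum(axis=1))
```

The hidden layer keeps the dimensionality of its input, and the output layer maps each example to one logit per class.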

In [None]:
bert_module = BertClassifierModule(n_classes=2, hidden_activation=nn.Tanh())

### Classifier interface

In [None]:
import sys
sys.path.append('/kaggle/input/py-scripts')


from torch_shallow_neural_classifier import TorchShallowNeuralClassifier

class BertClassifier(TorchShallowNeuralClassifier):
    def __init__(self, weights_name, *args, **kwargs):
        self.weights_name = weights_name
        self.tokenizer = AutoTokenizer.from_pretrained(self.weights_name)
        super().__init__(*args, **kwargs)
        self.params += ['weights_name']

    def build_graph(self):
        return BertClassifierModule(
            self.n_classes_, self.hidden_activation, self.weights_name)

    def build_dataset(self, X, y=None):
        data = get_batch_token_ids(X, self.tokenizer)
        if y is None:
            dataset = torch.utils.data.TensorDataset(
                data['input_ids'], data['attention_mask'])
        else:
            self.classes_ = sorted(set(y))
            self.n_classes_ = len(self.classes_)
            class2index = dict(zip(self.classes_, range(self.n_classes_)))
            y = [class2index[label] for label in y]
            y = torch.tensor(y)
            dataset = torch.utils.data.TensorDataset(
                data['input_ids'], data['attention_mask'], y)
        return dataset

In [None]:
bert_finetune = BertClassifier(
    weights_name=weights_name,
    hidden_activation=torch.nn.ReLU(),
    eta=0.00005,          # Low learning rate for effective fine-tuning.
    batch_size=32,         # Small batches to avoid memory overload.
    gradient_accumulation_steps=4,  # Effective batch size: 32 * 4 = 128.
    early_stopping=True,  # Early-stopping
    n_iter_no_change=5)   # params.
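
Gradient accumulation multiplies the effective batch size: with `batch_size=32` and `gradient_accumulation_steps=4`, parameters are updated once per 128 examples. The sketch below illustrates the general idea on a one-parameter least-squares problem with made-up data. It is a generic illustration, not the implementation inside `TorchShallowNeuralClassifier`.

```python
import math

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]

batch_size = 2            # toy values, smaller than the ones used above
accumulation_steps = 4

# One gradient computed over all 8 examples at once.
full_grad = mse_grad(w, xs, ys)

# Four micro-batches of 2 examples; each gradient is scaled by
# 1 / accumulation_steps and summed before a (hypothetical) optimizer step.
accumulated = 0.0
for step in range(accumulation_steps):
    lo, hi = step * batch_size, (step + 1) * batch_size
    accumulated += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accumulation_steps

effective_batch_size = batch_size * accumulation_steps
print(effective_batch_size, math.isclose(full_grad, accumulated, rel_tol=1e-9))
```

With equally sized micro-batches, the accumulated gradient matches the full-batch gradient, while only one micro-batch has to fit in memory at a time.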

In [None]:
%%time
_ = bert_finetune.fit(train['cleaned_text'].tolist(),train['label'])

In [None]:
#preds = bert_finetune.predict(validation['cleaned_text'].tolist())

In [None]:
pred_probs = bert_finetune.predict_proba(validation['cleaned_text'].tolist())[:,1]

In [None]:
true_labels = validation['label'].tolist()

In [None]:
roc_auc = roc_auc_score(true_labels, pred_probs)
print(f'ROC-AUC Score: {roc_auc:.10f}')
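
ROC-AUC can be read as the probability that a randomly chosen LLM-generated essay receives a higher score than a randomly chosen student essay, with ties counted as half. The brute-force sketch below computes it that way on made-up labels and scores. `pairwise_auc` is a hypothetical helper for illustration, and `roc_auc_score` remains the practical choice for real data.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, scores))  # 0.75
```

Only the ordering of the scores matters, so the metric is unaffected by any monotonic rescaling of the predicted probabilities.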

In [None]:
#from sklearn.metrics import classification_report
#print(classification_report(validation['label'], preds, digits=3))

In [None]:
test_df=pd.read_csv("/kaggle/input/llm-detect-ai-generated-text/test_essays.csv")

In [None]:
test_df["cleaned_text"] = test_df["text"].apply(clean_text)

In [None]:
test_probs = bert_finetune.predict_proba(test_df['cleaned_text'].tolist())[:,1]

In [None]:
test_probs

In [None]:
subdf=pd.read_csv("/kaggle/input/llm-detect-ai-generated-text/sample_submission.csv")

In [None]:
#subdf

In [None]:
subdf['generated'] = test_probs

In [None]:
subdf.to_csv("submission.csv",index=False)
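
Before submitting, it can help to sanity-check the file. The sketch below assumes the submission has exactly two columns, `id` and `generated`, and that scores are probabilities in [0, 1]. `validate_submission` is a hypothetical helper, and the ids in the example are invented.

```python
import csv
import io

def validate_submission(csv_text, expected_ids):
    """Return a list of problems found in a submission CSV (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["id", "generated"]:
        problems.append(f"unexpected columns: {reader.fieldnames}")
        return problems
    rows = list(reader)
    if [row["id"] for row in rows] != list(expected_ids):
        problems.append("ids do not match the test set order")
    for row in rows:
        try:
            value = float(row["generated"])
        except ValueError:
            problems.append(f"{row['id']}: not a number")
            continue
        if not 0.0 <= value <= 1.0:
            problems.append(f"{row['id']}: {value} outside [0, 1]")
    return problems

# Made-up example rows standing in for the contents of submission.csv.
example = "id,generated\n0000aaaa,0.12\n1111bbbb,0.87\n"
print(validate_submission(example, ["0000aaaa", "1111bbbb"]))  # []
```

To check the real file, the text read from `submission.csv` and the `id` column of the test set could be passed in instead.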