# Toxicity identifier

I want to build a model that assesses the toxicity level of a text, so that its score can be used as a metric or added to the loss. I will train a simple model that uses the pretrained tokenizer of the model chosen for the final solution, followed by an embedding layer, several linear layers, and a sigmoid output. I will also convert tox_level to integer labels and train the model as a binary classifier.

In [1]:
import pandas as pd
import torch
import torch.utils.data as data_utils

In [2]:
!unzip ../data/interim/toxicity_levels.zip

Archive:  ../data/interim/toxicity_levels.zip
  inflating: data/interim/toxicity_levels.csv  


In [3]:
df = pd.read_csv('../data/interim/toxicity_levels.csv')
df.head()

Unnamed: 0,text,tox_level
0,"if Alkar floods her with her mental waste, it ...",0.981983
1,"If Alkar is flooding her with psychic waste, t...",0.014195
2,you're becoming disgusting.,0.999039
3,Now you're getting nasty.,0.065473
4,"well, we can spare your life.",0.985068


In [4]:
threshold = 0.5

# Binarize the continuous score: 1 = toxic, 0 = non-toxic
df['tox_level'] = df['tox_level'].apply(lambda x: 1 if x > threshold else 0)
df.head()

Unnamed: 0,text,tox_level
0,"if Alkar floods her with her mental waste, it ...",1
1,"If Alkar is flooding her with psychic waste, t...",0
2,you're becoming disgusting.,1
3,Now you're getting nasty.,0
4,"well, we can spare your life.",1
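The row-wise `apply` above can also be written as a vectorized comparison, which is a common pandas idiom and is usually faster on large frames. A minimal sketch, assuming the same 0.5 cut-off; the small frame below is illustrative data, not the real dataset:

```python
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
toy = pd.DataFrame({"tox_level": [0.981983, 0.014195, 0.5, 0.065473]})

threshold = 0.5

# Strictly greater than the threshold maps to 1, everything else to 0,
# matching the lambda used in the cell above.
toy["label"] = (toy["tox_level"] > threshold).astype(int)

# Class balance is worth checking before a stratified split.
class_counts = toy["label"].value_counts().to_dict()
print(class_counts)
```

Note that a score of exactly 0.5 falls into class 0 under this rule, as it does with the original lambda.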


## Dataset and Dataloader

In [5]:
from transformers import AutoTokenizer
model_checkpoint = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

In [6]:
def preprocessing_stage(sample):
    # in the preprocessing phase, I convert the input text to the list of tokens
    model_inputs = tokenizer(sample['text'], padding='max_length', max_length=256, truncation=True)
    return model_inputs['input_ids']

In [7]:
df['input_ids'] = df.apply(lambda x: preprocessing_stage(x), axis=1)

In [8]:
df.drop(columns=['text'], inplace=True)

In [9]:
from sklearn.model_selection import train_test_split

ratio = 0.2  # fraction of the data held out for validation
train, val = train_test_split(
    df, stratify=df['tox_level'], test_size=ratio, random_state=42
)

In [10]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

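def collate_batch(batch):
    # Each row of the array is (tox_level, input_ids), in DataFrame column order
    text_list, toxicity_list = [], []
    for _toxicity, _text in batch:
        text_list.append(_text)
        toxicity_list.append(_toxicity)
    # Token ids must be integers for the embedding; BCELoss expects float targets
    return torch.LongTensor(text_list).to(device), torch.FloatTensor(toxicity_list).to(device)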

In [11]:
batch_size = 512
train_dataloader = data_utils.DataLoader(
    train.to_numpy(), batch_size=batch_size, shuffle=True, collate_fn=collate_batch
)

val_dataloader = data_utils.DataLoader(
    val.to_numpy(), batch_size=batch_size, shuffle=False, collate_fn=collate_batch
)
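Passing `train.to_numpy()` to `DataLoader` works because each row unpacks into `(tox_level, input_ids)`, but it depends on the column order. A sketch of an alternative, assuming the same two columns, wraps pre-built tensors in a `TensorDataset` so that the default collate function can be used. The toy frame and the name `make_loader` are illustrative and not part of the notebook:

```python
import pandas as pd
import torch
import torch.utils.data as data_utils


def make_loader(frame, batch_size, shuffle):
    # Stack the fixed-length token id lists into one (N, L) integer tensor.
    input_ids = torch.tensor(frame["input_ids"].tolist(), dtype=torch.long)
    # BCE-style losses expect float targets.
    labels = torch.tensor(frame["tox_level"].tolist(), dtype=torch.float)
    dataset = data_utils.TensorDataset(input_ids, labels)
    return data_utils.DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)


# Toy data: three samples, already padded to length 4 with id 0.
toy = pd.DataFrame(
    {
        "tox_level": [1, 0, 1],
        "input_ids": [[5, 6, 1, 0], [7, 1, 0, 0], [8, 9, 3, 1]],
    }
)

loader = make_loader(toy, batch_size=2, shuffle=False)
first_ids, first_labels = next(iter(loader))
print(first_ids.shape, first_labels)
```

With this variant the batches stay on the CPU, so the training loop would move them to the target device itself.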

## Model

In [12]:
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class TextClassificationModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        # EmbeddingBag (default mode="mean") averages the embeddings of all ids in a row
        self.embedding = nn.EmbeddingBag(input_dim, 300)
        self.model = nn.Sequential(
            nn.Linear(300, 1000),
            nn.ReLU(True),
            nn.Linear(1000, 250),
            nn.ReLU(True),
            nn.Linear(250, 50),
            nn.ReLU(True),
            nn.Linear(50, 1)
        )

    def forward(self, text):
        text = self.embedding(text)
        return torch.sigmoid(self.model(text))
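
One thing to keep in mind with this architecture: `nn.EmbeddingBag` in its default `mean` mode averages every id in a row, so with `padding='max_length'` the pad tokens are averaged in as well. The sketch below shows one possible way to exclude them, using a masked mean over a plain `nn.Embedding`. It assumes the pad id is 0 (as it is for the T5 tokenizer); the class name `MaskedMeanEmbedding` is hypothetical and not part of the notebook.

```python
import torch
import torch.nn as nn


class MaskedMeanEmbedding(nn.Module):
    """Average token embeddings while ignoring padding positions."""

    def __init__(self, vocab_size, dim, pad_id=0):
        super().__init__()
        self.pad_id = pad_id
        # padding_idx keeps the pad vector at zero and out of the gradient.
        self.embedding = nn.Embedding(vocab_size, dim, padding_idx=pad_id)

    def forward(self, input_ids):
        mask = (input_ids != self.pad_id).unsqueeze(-1).float()  # (B, L, 1)
        summed = (self.embedding(input_ids) * mask).sum(dim=1)   # (B, dim)
        counts = mask.sum(dim=1).clamp(min=1.0)                  # avoid division by zero
        return summed / counts


torch.manual_seed(0)
pool = MaskedMeanEmbedding(vocab_size=20, dim=4)

short = torch.tensor([[3, 7, 1]])
padded = torch.tensor([[3, 7, 1, 0, 0, 0]])

# The pooled vector should not depend on how much padding was added.
same = torch.allclose(pool(short), pool(padded))
print(same)
```

The pooled output has the same `(batch, dim)` shape as the `EmbeddingBag` output, so the linear layers after it would not need to change.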

In [13]:
vocab_size = 32128

model = TextClassificationModel(vocab_size).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()
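
`nn.BCELoss` on top of a sigmoid is a valid setup, but the two steps are often fused with `nn.BCEWithLogitsLoss`, which the PyTorch documentation describes as more numerically stable. A small sketch with made-up logits, comparing the two formulations for moderate values. In that variant the model would return raw logits, and the sigmoid would be applied only at inference time:

```python
import torch
import torch.nn as nn

# Made-up logits and targets, only to compare the two loss formulations.
logits = torch.tensor([2.0, -1.5, 0.3, -0.2])
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Both compute the same mean binary cross-entropy for these inputs.
losses_match = torch.allclose(loss_from_probs, loss_from_logits, atol=1e-6)
print(losses_match)
```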

## Training

In [14]:
from tqdm.autonotebook import tqdm

def train_one_epoch(
    model,
    loader,
    optimizer,
    loss_fn,
    epoch_num=10
):
    loop = tqdm(
        enumerate(loader, 1),
        total=len(loader),
        desc=f"Epoch {epoch_num}: train",
        leave=True,
    )
    model.train()
    train_loss = 0.0
    for i, batch in loop:
        texts, labels = batch
        optimizer.zero_grad()

        outputs = model(texts).squeeze(1)
        loss = loss_fn(outputs, labels)
        
        loss.backward()

        optimizer.step()

        train_loss += loss.item()
        # BCELoss already averages over the batch, so divide by the number of batches
        loop.set_postfix({"loss": train_loss / i})

def val_one_epoch(
    model,
    loader,
    loss_fn,
    epoch_num=-1
):
    
    loop = tqdm(
        enumerate(loader, 1),
        total=len(loader),
        desc=f"Epoch {epoch_num}: val",
        leave=True,
    )
    val_loss = 0.0
    total = 0
    with torch.no_grad():
        model.eval()
        for i, batch in loop:
            texts, labels = batch

            outputs = model(texts).squeeze(1)
            loss = loss_fn(outputs, labels)
            
            total += len(labels)

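            # loss is a batch mean; weight it by the batch size so that
            # val_loss / total is the mean loss per sample
            val_loss += loss.item() * len(labels)
            loop.set_postfix({"loss": val_loss / total})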
       
    torch.cuda.empty_cache()
    return val_loss / total
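
The validation loop reports only the loss. For a classifier it can help to track accuracy as well. A sketch of a helper that could be accumulated alongside the loss; `binary_accuracy` is a hypothetical name, and the 0.5 decision threshold is an assumption that mirrors the label cut-off used earlier:

```python
import torch


def binary_accuracy(probs, labels, threshold=0.5):
    # Turn sigmoid outputs into hard 0/1 predictions.
    preds = (probs > threshold).float()
    # Fraction of predictions that match the float labels.
    return (preds == labels).float().mean().item()


# Made-up batch: three of the four predictions agree with the labels.
probs = torch.tensor([0.9, 0.2, 0.7, 0.4])
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])
accuracy = binary_accuracy(probs, labels)
print(accuracy)
```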

In [15]:
for epoch in range(1, 11):
    train_one_epoch(model, train_dataloader, optimizer, criterion, epoch_num=epoch)
    if epoch % 2 == 0:
        val_loss = val_one_epoch(model, val_dataloader, criterion, epoch)

Epoch 1: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 2: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 2: val:   0%|          | 0/287 [00:00<?, ?it/s]

Epoch 3: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 4: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 4: val:   0%|          | 0/287 [00:00<?, ?it/s]

Epoch 5: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 6: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 6: val:   0%|          | 0/287 [00:00<?, ?it/s]

Epoch 7: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 8: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 8: val:   0%|          | 0/287 [00:00<?, ?it/s]

Epoch 9: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 10: train:   0%|          | 0/1145 [00:00<?, ?it/s]

Epoch 10: val:   0%|          | 0/287 [00:00<?, ?it/s]

In [16]:
model_scripted = torch.jit.script(model)
model_scripted.save('../models/toxicity_identifier.pt')
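
A scripted module saved this way can be restored with `torch.jit.load` without the Python class definition. A self-contained sketch using a tiny stand-in module and a temporary directory (the real checkpoint path is not used here):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in for the classifier; only the save/load round trip matters.
tiny = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.Linear(2, 1), nn.Sigmoid())
tiny.eval()

example = torch.ones(1, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny_scripted.pt")
    torch.jit.script(tiny).save(path)
    # map_location lets a GPU-trained checkpoint load on a CPU-only machine.
    restored = torch.jit.load(path, map_location="cpu")
    with torch.no_grad():
        round_trip_ok = torch.allclose(tiny(example), restored(example))

print(round_trip_ok)
```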

## Manual tests

In [17]:
def check_toxicity(model, inference_result, tokenizer=tokenizer):
    # Use the globally selected device instead of hard-coding 'cuda'
    input_ids = tokenizer(inference_result, return_tensors="pt").input_ids.to(device)
    print(model(input_ids))

In [22]:
check_toxicity(model, "I love you so much")
check_toxicity(model, "I'm famous, and you're dead")
check_toxicity(model, "And it just helped that you have no morals or integrity")
check_toxicity(model, "Nolan will destroy it")

tensor([[1.0000]], device='cuda:0', grad_fn=<SigmoidBackward0>)
tensor([[1.]], device='cuda:0', grad_fn=<SigmoidBackward0>)
tensor([[0.0362]], device='cuda:0', grad_fn=<SigmoidBackward0>)
tensor([[1.]], device='cuda:0', grad_fn=<SigmoidBackward0>)


After running some manual tests, I conclude that this model and its minor modifications do not work as I expected: they cannot distinguish toxic from non-toxic phrases and often output 1. I am therefore abandoning the idea of using this model as a metric or as part of the loss.