<a href="https://colab.research.google.com/github/michealman114/Natural-Language-Models-for-Hate-Speech-Classification/blob/main/ClassificationWithBERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install transformers



In [2]:
#from transformers import DistilBertTokenizer, DistilBertModel, DistilBertConfig
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification, BertConfig
#from transformers import BertTokenizer, BertForSequenceClassification, BertConfig
from transformers import get_linear_schedule_with_warmup


import torch
import torch.nn as nn 
import torch.utils.data as torch_data
import torch.optim as optim

from tqdm import tqdm


import numpy as np
import random
import json

In [3]:
from torch import cuda

seed = 4814

if cuda.is_available():
    device = 'cuda'
    torch.cuda.manual_seed_all(seed)
    print("running on GPU:", torch.cuda.get_device_name(0))
else:
    device = 'cpu'
    print("running on CPU")


random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

running on CPU


<torch._C.Generator at 0x7f409b0bc430>

In [4]:
def getCommentsTitlesLabels(file_lines):
    comment_list = []
    title_list = []
    labels = []
    for line in file_lines:
        content = json.loads(line)

        comment = content['text']
        comment_list.append(comment)

        title = content['title']
        title_list.append(title)

        labels.append(content['label'])
    
    return comment_list,title_list,labels

Pick one of the following models
- BERT for sequence classification
- DistilBERT for sequence classification
- Customized DistilBERT for sequence classification

When we finally tuned the parameters and set up training perfectly we acheived extremely good performance from DistilBERT:
DistilBERT for sequence classification results (fine-tuned on 100, tested on 50):
- Accuracy: 0.91
- Precision, Recall, F1: (0.8656716417910447, 1.0, 0.928, None)
However this performance was on a small test dataset of 50 samples (never before seen, but nonetheless a small sample size), so its hard to say in retrospect if it would have performed this well on the whole. 

From manual testing with custom sentences however this model worked pretty well and intuitively. I didn't save this one unfortunately since I tried reproducing it immediately to no success.


DistilBERT for sequence classification results (fine-tuned on 100, validated on 50, tested on remaining 720)
- Validation Performance
    - Accuracy: 0.76
    - Precision, Recall, F1: (0.717948717948718, 0.9655172413793104, 0.8235294117647058, None)
- Test Performance
    - Accuracy: 0.6366197183098592
    - Precision, Recall, F1: (0.5748175182481752, 0.9264705882352942, 0.7094594594594595, None)

This model was very overzealous with classifying things as hate speech (low precision), but it also almost never missed hate speech (high recall).

DistilBERT for sequence classification results (fine-tuned on 300, validated on 100, tested on remaining 470)
- Validation Performance
    - Accuracy: 0.71
    - Precision, Recall, F1: (0.7254901960784313, 0.7115384615384616, 0.7184466019417477, None)
- Test Performance
    - Accuracy: 0.65
    - Precision, Recall, F1: (0.6115702479338843, 0.6883720930232559, 0.6477024070021883, None)
- 3 epochs were run again with model.eval() mode, validation results:
    - Accuracy: 0.75
    - Precision, Recall, F1: (0.7454545454545455, 0.7884615384615384, 0.766355140186916, None)
- 3 more epochs Test:
    - Accuracy: 0.717391304347826
    - Precision, Recall, F1: (0.683982683982684, 0.7348837209302326, 0.7085201793721975, None)


DistilBERT for sequence classification results (fine-tuned on ~700, validated on ~160)
- Validation Performance
    - Accuracy: 0.7325581395348837
    - Precision, Recall, F1: (0.6746987951807228, 0.7466666666666667, 0.708860759493671, None)
- After fine tuning for 3 more epochs:
    - Accuracy: 0.7616279069767442
    - Precision, Recall, F1: (0.7125, 0.76, 0.7354838709677418, None)


In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels = 2,
    output_attentions = False,
    output_hidden_states = False
)

In [5]:
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', do_lower_case=True)
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels = 2)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_transform.weight', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'pre_classifier.bias', 'pre_classifi

In [5]:
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', do_lower_case=True)

Back to normal stuff

In [6]:
original_lines = open("./Data/pruned-fox-news-comments.json", "r").readlines() # original 2015 data
original_comments, original_titles, original_labels = getCommentsTitlesLabels(original_lines)

num_samples = len(original_labels)

print(len(original_comments), len(original_titles), len(original_labels))

1525 1525 1525


In [7]:
num_positives = 0

for comment,title,label in zip(original_comments, original_titles, original_labels):
    if label == 1:
        num_positives += 1
print(num_positives, len(original_labels))

print(num_positives/len(original_labels)) #this is so sad!

435 1525
0.28524590163934427


In [8]:
new_comments = []
new_titles = []
new_labels = []

num_negatives = 0
for comment,title,label in zip(original_comments, original_titles, original_labels):
    if label == 1:
        new_comments.append(comment)
        new_titles.append(title)
        new_labels.append(label)
    if num_negatives < num_positives and label == 0:
        new_comments.append(comment)
        new_titles.append(title)
        new_labels.append(label)
        num_negatives += 1

print(num_negatives)
original_comments, original_titles, original_labels = new_comments, new_titles, new_labels    

435


In [9]:
clipped_comments = []
clipped_titles = []
clipped_labels = []

largest_size = 200

#clip unnecessarily long comments to improve training speed
for comment,title,label in zip(original_comments, original_titles, original_labels):
    comment_length = len(tokenizer.encode(comment))
    if comment_length > largest_size :
        continue
    clipped_comments.append(comment)
    clipped_titles.append(title)
    clipped_labels.append(label)

num_samples = len(clipped_labels)
print(len(clipped_comments), len(clipped_titles), len(clipped_labels))

860 860 860


In [10]:
import random

zipped = list(zip(clipped_comments, clipped_titles, clipped_labels))
random.shuffle(zipped)
shuffled_comments, shuffled_titles, shuffled_labels = zip(*zipped)


print(len(shuffled_comments), len(shuffled_titles), len(shuffled_labels))
print(shuffled_comments[:5])

860 860 860
("How embarrassing for the Navy ....and America. Liberals have gleefully embraced every known perversion and deviance and are angry that sane people refuse to accept their sickness as 'normal'. Liberals - the very worst among us. Liberalism - America's greatest enemy.", 'Sure you do. And you look like a fo\ufeffol.', "Now we have reverse discrimination, BIG TIME because of this PERCEIVED PROBLEM that our DEAR LEADER intended to remedy by EXECUTIVE FIAT! Just another reason why there should not be a federal department of education that doles out OUR OWN TAX MONEY and makes the states JUMP THROUGH HOOPS TO GET IT! Don't you think ENOUGH IS ENOUGH!", 'And this is the vaunted "European Society" America\'s liberals want us to emulate?', 'Just curious...why do or should we care what she has to say?')


In [11]:
input_ids = []
attention_masks = []


encoded_dict = tokenizer(
                    shuffled_comments,
                    add_special_tokens = True,
                    max_length = 200,
                    padding = True,
                    truncation = True,
                    return_tensors = 'pt',
                )
    


tokenized_comments = encoded_dict['input_ids']
attention_masks = encoded_dict['attention_mask']
all_labels = torch.tensor(shuffled_labels)


print('Original: ', shuffled_comments[0])
print('Token IDs:', tokenized_comments[0][:12])

Original:  How embarrassing for the Navy ....and America. Liberals have gleefully embraced every known perversion and deviance and are angry that sane people refuse to accept their sickness as 'normal'. Liberals - the very worst among us. Liberalism - America's greatest enemy.
Token IDs: tensor([  101,  2129, 16436,  2005,  1996,  3212,  1012,  1012,  1012,  1012,
         1998,  2637])


In [12]:
print(tokenized_comments.shape)
print(attention_masks.shape)
print(all_labels.shape)

torch.Size([860, 200])
torch.Size([860, 200])
torch.Size([860])


In [13]:
class BERT_raw_Dataset(torch.utils.data.Dataset): # renamed to ProcessingDataset to avoid reuse of name
    def __init__(self, comments, attention_masks, labels):
        """
        comments/titles: (batch_size, max_length, embed_dim)
        labels: (batch_size,)
        """
        #Initialization
        self.comments = comments
        self.attention_masks = attention_masks
        self.labels = labels
        self.length = labels.shape[0]

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Load data and get label
        comment = self.comments[index]
        attention_mask = self.attention_masks[index]
        label = self.labels[index]

        return comment,attention_mask,label

In [19]:
# Create a 85-15 train-validation split.
max_train = int(0.80 * num_samples)

max_val = all_labels.shape[0]
train_dataset = BERT_raw_Dataset(tokenized_comments[:max_train], attention_masks[:max_train], all_labels[:max_train])
val_dataset = BERT_raw_Dataset(tokenized_comments[max_train:max_val], attention_masks[max_train:max_val], all_labels[max_train:max_val])
#test_dataset = BERT_raw_Dataset(tokenized_comments[max_val:], attention_masks[max_val:], all_labels[max_val:])

In [20]:
train_loader = torch_data.DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = torch_data.DataLoader(val_dataset, batch_size=32, shuffle=False)
#test_loader = torch_data.DataLoader(test_dataset, batch_size=32, shuffle=False)

In [21]:
import sklearn
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

To train BERT/DistilBERT for sequence classification, the infrastructure is a little different

In [26]:
"""
Recommended parameters: lr = 5e-5, 3e-5, 2e-5
num_epochs = 2,3,4
https://arxiv.org/pdf/1810.04805.pdf

AdamW because it experimentally generalizes better: https://towardsdatascience.com/why-adamw-matters-736223f31b5d

standard lr scheduler from here: https://towardsdatascience.com/advanced-techniques-for-fine-tuning-transformers-82e4e61e16e
"""
optimizer = optim.AdamW(model.parameters(), lr = 2e-5)

num_epochs = 3

In [27]:
scheduler = get_linear_schedule_with_warmup(                
                optimizer = optimizer,
                num_warmup_steps = 0,
                num_training_steps = num_epochs * len(train_loader)
)

In [28]:
training_stats = []
loss_fn = nn.BCELoss()
model.eval()
#model.to(device)

for epoch in tqdm(range(num_epochs)):    
    epoch_training_loss = 0
    print()

    
    #model.train()
    for tokenized_comment, mask, label in train_loader:
        tokenized_comment = tokenized_comment.to(device)
        mask = mask.to(device)
        label = label.to(device)


        model.zero_grad()        

        outputs = model(tokenized_comment, attention_mask=mask, labels=label, return_dict=True)

        loss = outputs.loss
        logits = outputs.logits


        epoch_training_loss += loss.item()

        loss.backward()
        optimizer.step()
        #scheduler.step()
        
        print(f"batch of size {label.shape[0]} finished")

    print(f"epoch training loss = {epoch_training_loss}")


    #model.eval()

    all_val_preds = []
    all_val_labels = []

    with torch.no_grad():
        for tokenized_comment, mask, label in val_loader:
            tokenized_comment = tokenized_comment.to(device)
            mask = mask.to(device)
            label = label.to(device)

            outputs = model(tokenized_comment, attention_mask=mask, labels=label, return_dict=True)

            logits = outputs.logits #output values tensor[16,2] = (batch_size, num_classes) of output values prior to softmaxing
            preds = np.argmax(logits, axis=1)

            
            all_val_preds.append(preds.detach().cpu().numpy())
            all_val_labels.append(label.detach().cpu().numpy())

    
    
    all_val_preds = np.concatenate(all_val_preds)
    all_val_labels = np.concatenate(all_val_labels)
    print(all_val_preds)
    print(all_val_labels)

    print(f"EPOCH {epoch + 1} finished")
    print('Accuracy:', accuracy_score(all_val_labels,all_val_preds))
    print('Precision, Recall, F1:',precision_recall_fscore_support(all_val_labels, all_val_preds, average='binary'))


  0%|          | 0/3 [00:00<?, ?it/s]


batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 16 finished
epoch training loss = 4.590676262974739


 33%|███▎      | 1/3 [10:38<21:17, 638.96s/it]

[1 0 1 1 0 0 1 1 1 1 1 1 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 1 0 0 1 1 0 0 1
 0 0 0 1 1 0 0 1 0 0 1 1 1 1 1 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0
 1 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0 0 1 0 1 1 1 1 0 0 0 0 0
 0 1 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0
 0 0 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0]
[0 1 1 1 0 0 1 1 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0 0 1 0 0 0 0 1 1
 0 0 0 0 1 0 0 1 0 0 0 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 1 1 0 0
 1 0 1 0 0 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 1 0 1 1 1 0 1 0 0 1 1 1 0 1 0 0
 0 0 0 1 0 0 1 1 1 1 1 0 1 1 1 0 0 1 0 0 0 1 0 0]
EPOCH 1 finished
Accuracy: 0.7558139534883721
Precision, Recall, F1: (0.7391304347826086, 0.68, 0.7083333333333334, None)

batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 fini

 67%|██████▋   | 2/3 [21:21<10:40, 640.78s/it]

[1 0 0 1 0 0 1 1 1 1 1 1 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 1 0 0 1 1 0 1 1
 0 0 0 1 1 0 0 1 0 0 1 1 1 1 1 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0
 1 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 0 1 0 1 0 1 1 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 0 1 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 1 1 1 1 1 0 0
 0 0 0 1 0 0 1 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0]
[0 1 1 1 0 0 1 1 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0 0 1 0 0 0 0 1 1
 0 0 0 0 1 0 0 1 0 0 0 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 1 1 0 0
 1 0 1 0 0 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 1 0 1 1 1 0 1 0 0 1 1 1 0 1 0 0
 0 0 0 1 0 0 1 1 1 1 1 0 1 1 1 0 0 1 0 0 0 1 0 0]
EPOCH 2 finished
Accuracy: 0.7674418604651163
Precision, Recall, F1: (0.7272727272727273, 0.7466666666666667, 0.7368421052631579, None)

batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch of size 32 finished
batch o

100%|██████████| 3/3 [31:55<00:00, 638.54s/it]

[1 0 1 1 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 1 1 0 1 1
 0 0 0 1 1 0 0 1 0 0 1 1 1 1 1 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0
 1 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 0
 0 1 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 1 0 0 1 1 1 1 1 0 0
 0 0 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0]
[0 1 1 1 0 0 1 1 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0 0 1 0 0 0 0 1 1
 0 0 0 0 1 0 0 1 0 0 0 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 1 1 0 0
 1 0 1 0 0 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 1 0 1 1 1 0 1 0 0 1 1 1 0 1 0 0
 0 0 0 1 0 0 1 1 1 1 1 0 1 1 1 0 0 1 0 0 0 1 0 0]
EPOCH 3 finished
Accuracy: 0.7616279069767442
Precision, Recall, F1: (0.7125, 0.76, 0.7354838709677418, None)





In [60]:
sample_sentence = "Obama and a dog"
encoded_sentence = tokenizer(
                    sample_sentence,
                    add_special_tokens = True,
                    max_length = 200,
                    padding = True,
                    truncation = True,
                    return_tensors = 'pt',
                )
print(encoded_sentence)

{'input_ids': tensor([[ 101, 8112, 1998, 1037, 3899,  102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}


In [61]:
print(model(**encoded_sentence))

SequenceClassifierOutput(loss=None, logits=tensor([[-0.9432,  0.9099]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)


In [None]:
all_train_preds = []
all_train_labels = []

with torch.no_grad():
    for iter, (tokenized_comment, mask, label) in enumerate(test_loader):
        tokenized_comment = tokenized_comment.to(device)
        mask = mask.to(device)
        label = label.to(device)

        outputs = model(tokenized_comment, attention_mask=mask, labels=label, return_dict=True)

        logits = outputs.logits
        preds = np.argmax(logits, axis=1)
        preds = preds.detach().cpu().numpy()
        label = label.detach().cpu().numpy()

        if (iter+1) %5 == 0:
            print(f"iteration {iter} results")
            print('loss:', outputs.loss)
            print('Accuracy:', accuracy_score(label,preds))
            print('Precision, Recall, F1:',precision_recall_fscore_support(label, preds, average='binary'))
        
        all_train_preds.append(preds)
        all_train_labels.append(label)


all_train_preds = np.concatenate(all_train_preds)
all_train_labels = np.concatenate(all_train_labels)


print('\n===Aggregate Stats===')
print('Accuracy:', accuracy_score(all_train_labels,all_train_preds))
print('Precision, Recall, F1:',precision_recall_fscore_support(all_train_labels, all_train_preds, average='binary'))

In [29]:
torch.save(model, "./BERTs/TrainTrainDistil700.pt")

In [30]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [31]:
!cp ./BERTs/TrainTrainDistil700.pt /content/drive/MyDrive

# Alternate Approach
Training customized classifier built from scratch on top of DistilBERT below:

In [15]:
class BertClassifier(nn.Module):
  def __init__(self, bert = None):
    assert bert is not None
    super().__init__()
    self.dense1 = nn.Linear(768, 100)
    self.relu = nn.ReLU()
    self.dense2 = nn.Linear(100,1)
    self.dense = nn.Linear(768,1)
    self.sigmoid = nn.Sigmoid()
    self.model = bert
  
  def forward(self, batch):
    sent_output = self.model(**batch)
    CLS_hidden_state = sent_output.last_hidden_state[:,0,:] #(batch_size, embed_dim)
    output = CLS_hidden_state

    #output = self.dense1(output)
    #output = self.relu(output)
    output = self.dense(output)
    output = self.sigmoid(torch.squeeze(output))
    
    return output


In [None]:
distil_bert = DistilBertModel.from_pretrained('distilbert-base-uncased')
distil_bert.eval()
distil_classifier = BertClassifier(bert = distil_bert)


num_epochs = 3

distil_optimizer = optim.AdamW(distil_classifier.parameters(), lr = 5e-5, eps = 1e-8)
distil_scheduler = get_linear_schedule_with_warmup(                
                optimizer = distil_optimizer,
                num_warmup_steps = 0,
                num_training_steps = num_epochs * len(train_loader)
)

In [20]:
training_stats = []
loss_fn = nn.BCELoss()

distil_classifier.to(device)

for epoch in tqdm(range(num_epochs)):    
    epoch_training_loss = 0
    print()

    #distil_classifier.train()
    for tokenized_comment, mask, label in train_loader:
        tokenized_comment = tokenized_comment.to(device)
        mask = mask.to(device)
        label = label.to(device).type(torch.float32)


        distil_classifier.zero_grad()

        batch = {'input_ids': tokenized_comment, 'attention_mask':mask}
        preds = distil_classifier(batch)

        loss = loss_fn(preds, label)
        epoch_training_loss += loss.item()

        loss.backward()
        distil_optimizer.step()
        distil_scheduler.step()
        
        print(f"batch of size {label.shape[0]} finished")

    print(f"epoch training loss = {epoch_training_loss}")
    

    all_val_preds = []
    all_val_labels = []

    #distil_classifier.eval()
    with torch.no_grad():
        for iter, (tokenized_comment, mask, label) in enumerate(val_loader):
            tokenized_comment = tokenized_comment.to(device)
            mask = mask.to(device)
            label = label.to(device).type(torch.float32)

            batch = {'input_ids': tokenized_comment, 'attention_mask':mask}
            preds = distil_classifier(batch)
            
            all_val_preds.append(torch.round(preds).cpu().detach().numpy())
            all_val_labels.append(label.detach().cpu().numpy())

            break
            
   

    all_val_preds = np.concatenate(all_val_preds)
    all_val_labels = np.concatenate(all_val_labels)

    print(all_val_preds.shape, all_val_preds)
    print(all_val_labels.shape, all_val_labels)

    print(f"EPOCH {epoch + 1} finished")
    print('Accuracy:', accuracy_score(all_val_labels,all_val_preds))
    print('Precision, Recall, F1:',precision_recall_fscore_support(all_val_labels, all_val_preds, average='binary'))


  0%|          | 0/3 [00:00<?, ?it/s]




  _warn_prf(average, modifier, msg_start, len(result))
 33%|███▎      | 1/3 [00:08<00:16,  8.07s/it]

tensor([0.1690, 0.1579, 0.1325, 0.1654, 0.1553, 1.2467, 0.1757, 1.1699, 0.1874,
        0.1324, 0.1511, 0.2115, 0.1649, 1.1360, 0.1124, 0.1476])
tensor([0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0.])
(16,) [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
(16,) [0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0.]
EPOCH 1 finished
Accuracy: 0.8125
Precision, Recall, F1: (0.0, 0.0, 0.0, None)



  _warn_prf(average, modifier, msg_start, len(result))
 67%|██████▋   | 2/3 [00:12<00:06,  6.00s/it]

tensor([0.1690, 0.1579, 0.1325, 0.1654, 0.1553, 1.2467, 0.1757, 1.1699, 0.1874,
        0.1324, 0.1511, 0.2115, 0.1649, 1.1360, 0.1124, 0.1476])
tensor([0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0.])
(16,) [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
(16,) [0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0.]
EPOCH 2 finished
Accuracy: 0.8125
Precision, Recall, F1: (0.0, 0.0, 0.0, None)



  _warn_prf(average, modifier, msg_start, len(result))
100%|██████████| 3/3 [00:17<00:00,  5.79s/it]

tensor([0.1690, 0.1579, 0.1325, 0.1654, 0.1553, 1.2467, 0.1757, 1.1699, 0.1874,
        0.1324, 0.1511, 0.2115, 0.1649, 1.1360, 0.1124, 0.1476])
tensor([0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0.])
(16,) [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
(16,) [0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0.]
EPOCH 3 finished
Accuracy: 0.8125
Precision, Recall, F1: (0.0, 0.0, 0.0, None)



