---
title: "Syllogism Validation with BERT"
description: |
  Given two premises, this model classifies whether they form a valid syllogism, reaching 85% test accuracy on a roughly balanced (50/50) dataset.
author:
  - name: Jake Gehri
    url: {}
date: "2022-11-27"
categories: ["Python", "Deep Learning", "NLP"]
image: syllogism.jpeg
format: 
  html:
    df-print: paged
    toc: true
engine: knitr
---

In [1]:
import pandas as pd
import torch
from torch import nn
import torch.nn.functional as F
import transformers
from transformers import DistilBertTokenizer
from transformers import DistilBertForSequenceClassification
from transformers import Trainer, TrainingArguments
from datasets import load_metric
import numpy as np

In [2]:
df = pd.read_csv('Avicenna_Train.csv', encoding='ISO-8859-1')

In [3]:
df.head()

Unnamed: 0,Premise 1,Premise 2,Syllogistic relation,Conclusion
0,"unchecked imbalances in the society, will see ...",correct these imbalances requires in-depth kno...,no,No conclusion
1,"Chronic diseases are heart attacks and stroke,...",In populations that eat a regular high-fiber d...,yes,In populations that eat a regular high-fiber d...
2,Formative assessment encourages children to en...,An ideal learning environment uses formative a...,yes,An ideal learning environment encourages child...
3,Underrepresented female labor force in some pr...,Job discrimination comes with underrepresented...,yes,Job discrimination comes with not being able t...
4,damaged mentality in an individual brings seri...,Aggression harms the mentality of person.,yes,Aggression brings brings serious health proble...


In [4]:
# Binary target: 1 where 'Syllogistic relation' is 'yes', otherwise 0
df['label'] = df['Syllogistic relation'].eq('yes').mul(1)

In [5]:
# Join both premises into a single input string, separated by " : "
df['text'] = df['Premise 1'] + " : " + df['Premise 2']

In [38]:
df['label'].value_counts()

1    2427
0    2373
Name: label, dtype: int64

In [6]:
int(len(df) * 0.8)

3840

In [7]:
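# First 80% of rows (3840) for training, in file order and without shuffling
train_texts = df.iloc[:3840]['text'].values
train_labels = df.iloc[:3840]['label'].values

# Remaining 20% of rows (960) for validation
valid_texts = df.iloc[3840:]['text'].values
valid_labels = df.iloc[3840:]['label'].values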
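The split above slices the rows in file order. If the CSV happens to be sorted in some way, a shuffled split that preserves the class balance may be safer. The following is a minimal sketch using only the standard library; `stratified_split` is a hypothetical helper that is not part of the original notebook, and the toy labels simply reuse the class counts printed by `value_counts()`.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.8, seed=0):
    """Return (train_idx, valid_idx), splitting each class by train_frac."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_frac)
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])

    # Shuffle again so the classes are interleaved rather than grouped.
    rng.shuffle(train_idx)
    rng.shuffle(valid_idx)
    return train_idx, valid_idx


# Toy labels with the same class counts as the notebook's training file.
toy_labels = [1] * 2427 + [0] * 2373
train_idx, valid_idx = stratified_split(toy_labels)
print(len(train_idx), len(valid_idx))
```

With the notebook's dataframe, the returned index lists could be passed to `df.iloc[train_idx]` and `df.iloc[valid_idx]` in place of the fixed slices.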

In [8]:
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

In [9]:
train_encodings = tokenizer(list(train_texts), truncation=True, padding=True)
valid_encodings = tokenizer(list(valid_texts), truncation=True, padding=True)
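As a rough illustration of what `truncation=True, padding=True` does in the call above: sequences longer than the model limit are cut, and every sequence is padded to the longest one in the call, with an attention mask marking real tokens. The sketch below mimics that behaviour in plain Python. The token ids are made up, `pad_batch` is a hypothetical helper, and the pad id of 0 and limit of 512 are assumptions that match `distilbert-base-uncased` as far as I know. The real tokenizer also adds special tokens itself.

```python
def pad_batch(sequences, pad_id=0, max_length=512):
    """Truncate to max_length, then pad every sequence to the longest one."""
    truncated = [seq[:max_length] for seq in sequences]
    longest = max(len(seq) for seq in truncated)

    input_ids, attention_mask = [], []
    for seq in truncated:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up token id sequences of different lengths.
batch = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Because each `tokenizer(...)` call in the notebook receives a whole split at once, every example in that split is padded to the length of its longest example.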

In [10]:
class SyllogismDataset(torch.utils.data.Dataset):
    """Wraps tokenizer encodings and labels so Trainer can index them."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels
    
    def __getitem__(self, idx):
        # One example: each encoding field plus its label, as tensors
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item
    
    def __len__(self):
        return len(self.labels)


In [11]:
train_dataset = SyllogismDataset(train_encodings, train_labels)
valid_dataset = SyllogismDataset(valid_encodings, valid_labels)

In [12]:
# Trainer builds its own data loaders, so these two are not used below.
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=16, shuffle=True)
valid_dataloader = torch.utils.data.DataLoader(valid_dataset, batch_size=16, shuffle=False)

In [13]:
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier

In [14]:
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

In [15]:
model.train()

metrics = load_metric('accuracy')

In [16]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    
    predictions = np.argmax(logits, axis=-1)
    return metrics.compute(predictions=predictions, references=labels)
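To make the metric concrete: `compute_metrics` takes the index of the largest logit in each row as the predicted class, then compares it with the reference label. The sketch below reproduces that logic with plain Python lists standing in for NumPy arrays. The logits and labels are invented for illustration, and `accuracy_from_logits` is a hypothetical name.

```python
def argmax(row):
    """Index of the largest value in a list (first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Invented two-class logits: predicted classes are 0, 1, 1, 0.
toy_logits = [[2.0, -1.0], [0.3, 0.9], [-0.5, 0.1], [1.2, 1.1]]
toy_labels = [0, 1, 0, 0]
result = accuracy_from_logits(toy_logits, toy_labels)
print(result)  # three of four rows match, so accuracy is 0.75
```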

In [17]:
training_args = TrainingArguments(output_dir='./results', num_train_epochs=3, per_device_train_batch_size=16,
                                 per_device_eval_batch_size=16, logging_dir='./logs', logging_steps=72)

trainer = Trainer(model=model, 
                  args=training_args, 
                  train_dataset=train_dataset, 
                  eval_dataset=valid_dataset,
                  compute_metrics=compute_metrics
                 )

In [18]:
trainer.train()

***** Running training *****
  Num examples = 3840
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 720


Step,Training Loss
72,0.6585
144,0.4923
216,0.4134
288,0.2983
360,0.2532
432,0.2167
504,0.1786
576,0.1069
648,0.1068
720,0.0918


Saving model checkpoint to ./results/checkpoint-500
Configuration saved in ./results/checkpoint-500/config.json
Model weights saved in ./results/checkpoint-500/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=720, training_loss=0.281636557314131, metrics={'train_runtime': 97.56, 'train_samples_per_second': 118.081, 'train_steps_per_second': 7.38, 'total_flos': 289110097566720.0, 'train_loss': 0.281636557314131, 'epoch': 3.0})
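The step counts in the log follow from the training arguments. A short check, assuming a single device and no gradient accumulation (both as reported in the log):

```python
import math

num_examples = 3840   # "Num examples" in the training log
batch_size = 16       # per_device_train_batch_size
num_epochs = 3        # num_train_epochs
logging_steps = 72    # logging_steps

steps_per_epoch = math.ceil(num_examples / batch_size)  # 240
total_steps = steps_per_epoch * num_epochs              # 720

# Steps at which a training loss is logged: 72, 144, ..., 720
logged_steps = list(range(logging_steps, total_steps + 1, logging_steps))

# Throughput implied by the reported runtime of 97.56 seconds
steps_per_second = round(total_steps / 97.56, 2)        # 7.38

print(steps_per_epoch, total_steps, len(logged_steps), steps_per_second)
```

This accounts for the ten rows in the loss table and for the single checkpoint at step 500. As far as I know, 500 is the default save interval; the next multiple, 1000, lies beyond the 720 total steps.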

In [19]:
trainer.evaluate()

***** Running Evaluation *****
  Num examples = 960
  Batch size = 16


{'eval_loss': 0.4387502670288086,
 'eval_accuracy': 0.88125,
 'eval_runtime': 2.2301,
 'eval_samples_per_second': 430.476,
 'eval_steps_per_second': 26.905,
 'epoch': 3.0}

In [20]:
df_test = pd.read_csv('Avicenna_Test.csv', encoding='ISO-8859-1')

df_test['label'] = df_test['Syllogistic relation'].eq('yes').mul(1)
df_test['text'] = (df_test['Premise 1'] + " : " + df_test['Premise 2'])

test_texts = df_test['text'].values
test_labels = df_test['label'].values

test_encodings = tokenizer(list(test_texts), truncation=True, padding=True)

test_dataset = SyllogismDataset(test_encodings, test_labels)

test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=16, shuffle=False)

In [21]:
trainer.evaluate(test_dataset)

***** Running Evaluation *****
  Num examples = 1200
  Batch size = 16


{'eval_loss': 0.5759531855583191,
 'eval_accuracy': 0.8525,
 'eval_runtime': 2.8515,
 'eval_samples_per_second': 420.837,
 'eval_steps_per_second': 26.302,
 'epoch': 3.0}
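The two accuracy figures are easier to compare as counts. The arithmetic below uses only numbers reported in the evaluation outputs:

```python
# (accuracy, number of examples) as reported by trainer.evaluate()
reported = {
    "validation": (0.88125, 960),
    "test": (0.8525, 1200),
}

for name, (accuracy, n_examples) in reported.items():
    n_correct = round(accuracy * n_examples)
    n_wrong = n_examples - n_correct
    print(f"{name}: {n_correct} correct, {n_wrong} wrong out of {n_examples}")

valid_correct = round(0.88125 * 960)  # 846
test_correct = round(0.8525 * 1200)   # 1023
```

That is 114 misclassified validation pairs and 177 misclassified test pairs.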

In [22]:
sample_text = ['Socrates is a man : all men are mortal']
sample_label = [1]

In [23]:
sample_encoded = tokenizer(sample_text, truncation=True, padding=True)

In [25]:
sample_dataset = SyllogismDataset(sample_encoded, sample_label)
sample_dataset

<__main__.SyllogismDataset at 0x7f63a4fccd60>

In [33]:
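# Note: label_ids returns the labels passed in (sample_label), not the model's
# prediction; the model's logits are in the .predictions field.
trainer.predict(sample_dataset).label_ids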

***** Running Prediction *****
  Num examples = 1
  Batch size = 16


array([1])

In [32]:
sample_text_2 = ['If the streets are wet, it has rained recently : The streets are wet.']
sample_label_2 = [0]

sample_encoded_2 = tokenizer(sample_text_2, truncation=True, padding=True)

sample_dataset_2 = SyllogismDataset(sample_encoded_2, sample_label_2)

# Note: label_ids echoes sample_label_2; the model's logits are in .predictions.
trainer.predict(sample_dataset_2).label_ids

***** Running Prediction *****
  Num examples = 1
  Batch size = 16


array([0])
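The `PredictionOutput` returned by `trainer.predict` has a `label_ids` field holding the reference labels supplied with the dataset, and a `predictions` field holding the model's logits. To read off what the model itself predicts, take the argmax of a logits row, optionally after a softmax to get a confidence. The sketch below is plain Python; the logit values are hypothetical and were not produced by the trained model.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return (predicted class index, probability of that class)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Hypothetical logits for one premise pair: [score for class 0, score for class 1]
hypothetical_logits = [-1.2, 2.3]
label, confidence = predict_class(hypothetical_logits)
print(label, round(confidence, 3))
```

In the notebook, a row such as `trainer.predict(sample_dataset).predictions[0]` would take the place of `hypothetical_logits`.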