Source code of every test for task B with a BERT model

In [1]:
import numpy as np
import pandas as pd

# Load the subtask B CSV files (training, dev, and test splits).
def getData():
    df_train_data = pd.read_csv("data/Training_Data/subtaskB_data_all.csv")
    df_train_answers = pd.read_csv("data/Training_Data/subtaskB_answers_all.csv")

    # Attach the answers to the sentences by id, then drop the id column.
    df_train = pd.merge(df_train_data, df_train_answers, on='id', how='left').drop(['id'], axis=1)

    df_dev_data = pd.read_csv("data/Dev_Data/subtaskB_dev_data.csv")
    df_dev_answers = pd.read_csv("data/Dev_Data/subtaskB_gold_answers.csv")

    df_dev = pd.merge(df_dev_data, df_dev_answers, on='id', how='left').drop(['id'], axis=1)

    df_test_data = pd.read_csv("data/Test_Data/subtaskB_test_data.csv")
    df_test_answers = pd.read_csv("data/Test_Data/subtaskB_gold_answers.csv")

    df_test = pd.merge(df_test_data, df_test_answers, on='id', how='left').drop(['id'], axis=1)

    return df_train, df_dev, df_test

df_train_B, df_dev_B, df_test_B = getData()

df_train_B.head()

Unnamed: 0,FalseSent,OptionA,OptionB,OptionC,answer
0,He poured orange juice on his cereal.,Orange juice is usually bright orange.,Orange juice doesn't taste good on cereal.,Orange juice is sticky if you spill it on the ...,B
1,He drinks apple.,Apple juice are very tasty and milk too,Apple can not be drunk,Apple cannot eat a human,B
2,"Jeff ran 100,000 miles today","100,000 miles is way to long for one person to...","Jeff is a four letter name and 100,000 has six...","100,000 miles is longer than 100,000 km.",A
3,I sting a mosquito,A human is a mammal,A human is omnivorous,A human has not stings,C
4,A giraffe is a person.,Giraffes can drink water from a lake.,A giraffe is not a human being.,.Giraffes usually eat leaves.,B
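
A left merge keeps every sentence even when no answer row shares its id, so missing answers show up silently as NaN. The sketch below is one way to check for that. It uses two small hypothetical frames in place of the real CSV files; the `validate` and `indicator` arguments are standard `pd.merge` options, while the ids and the name `n_missing` are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the data and answers files.
data = pd.DataFrame({
    "id": [10, 11, 12],
    "FalseSent": ["He drinks apple.", "I sting a mosquito", "A giraffe is a person."],
})
answers = pd.DataFrame({"id": [10, 11], "answer": ["B", "C"]})

# validate="one_to_one" raises MergeError if an id is duplicated on either side;
# indicator=True adds a "_merge" column that flags rows without a matching answer.
merged = pd.merge(data, answers, on="id", how="left",
                  validate="one_to_one", indicator=True)

# Rows present only in the left frame have no answer.
missing = merged[merged["_merge"] == "left_only"]
n_missing = len(missing)
print(f"{n_missing} row(s) without an answer")
```

If the count is non-zero on the real files, those rows would need to be dropped or fixed before training.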


In [31]:
import spacy
nlp = spacy.load("en_core_web_sm")

Functions to pre-process the DataFrame

In [32]:
def lemmatizer(text):
    """
    Receives a string as input and returns it with every token lemmatized.
    """
    doc = nlp(text)
    # Join the lemmas with single spaces (avoids shadowing the built-in str).
    return " ".join(token.lemma_ for token in doc)

def removeStopWords(text):
    """
    Receives a string and removes the stop words from it.
    """
    doc = nlp(text)
    # Keep only the tokens that spaCy does not flag as stop words.
    return " ".join(token.text for token in doc if not token.is_stop)



In [33]:
def pre_process(df, function):
    # Work on a copy so the caller's DataFrame is left untouched and pandas
    # does not emit a SettingWithCopyWarning.
    newdf = df[['FalseSent', 'OptionA', 'OptionB', 'OptionC']].copy()
    # Apply the text function to every sentence column.
    for column in newdf.columns:
        newdf[column] = newdf[column].apply(function)
    return newdf

Subsample the training DataFrame

In [34]:
def subsampleData():
    # Subsample 1000 training rows (fixed seed for reproducibility).
    train = df_train_B.sample(n=1000, random_state=42)

    X_train = train[['FalseSent', 'OptionA', 'OptionB', 'OptionC']]
    y_train = train['answer']

    return X_train, y_train

# Use the dev set for testing.
X_test = df_dev_B[['FalseSent', 'OptionA', 'OptionB', 'OptionC']]
y_test = df_dev_B['answer']

Import the BERT model

In [35]:
from transformers import BertModel
from bert_sklearn import BertClassifier

In [36]:
model = BertClassifier(max_seq_length=64, train_batch_size=16)
#model.num_mlp_layers = 3
model.epochs = 3
#model.learning_rate = 4e-5

model

Building sklearn text classifier...


In [37]:
X_train_sample, y_train = subsampleData()

Fit with different pre-processing types

In [None]:
X_train_sample.head()

In [None]:
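import copy

# scikit-learn style fit() returns the estimator itself, so keep a snapshot;
# otherwise a later fit on the same object would overwrite this model.
model_classic = copy.deepcopy(model.fit(X_train_sample, y_train))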

Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 900, validation data size: 100


Training  : 100%|██████████████████████████████████████████████████████████| 57/57 [07:14<00:00,  7.62s/it, loss=0.738]
Validating: 100%|██████████████████████████████████████████████████████████████████████| 13/13 [00:41<00:00,  3.19s/it]

Epoch 1, Train loss: 0.7380, Val loss: 0.7107, Val accy: 45.00%



Training  : 100%|██████████████████████████████████████████████████████████| 57/57 [07:47<00:00,  8.20s/it, loss=0.681]
Validating: 100%|██████████████████████████████████████████████████████████████████████| 13/13 [00:38<00:00,  2.94s/it]

Epoch 2, Train loss: 0.6810, Val loss: 0.6907, Val accy: 56.00%



Training  :  75%|███████████████████████████████████████████▊              | 43/57 [06:29<02:09,  9.24s/it, loss=0.593]

With lemmatization only

In [38]:
X_train = pre_process(X_train_sample,lemmatizer)
X_train.head()

Unnamed: 0,sent0,sent1
6252,a duck walk on three leg,a duck walk on two leg
4684,Jack 's mom praise he because he break the plate,Jack 's mom condemn he because he break the p...
1731,People use electricity to buy thing,People use money to buy thing
4742,"the speaker be damage , thus I can not hear a...","the display be damage , thus I can not hear a..."
4521,Santa Claus be the legend of the East,Santa Claus be the legend of the west


In [None]:
import copy

# Snapshot the fitted estimator so the next fit does not overwrite it.
model_lemma = copy.deepcopy(model.fit(X_train, y_train))

With stop words removed

In [None]:
X_train = pre_process(X_train_sample,removeStopWords)
X_train.head()

In [None]:
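import copy

# Snapshot the fitted estimator so it stays independent of the other variants.
model_stopWords = copy.deepcopy(model.fit(X_train, y_train))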
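
The three fits above all go through the same `model` object. scikit-learn estimators conventionally return `self` from `fit`, and assuming `BertClassifier` follows that convention, assigning the result of `model.fit(...)` to a new name creates an alias, not a separate model. The sketch below illustrates the effect with `ToyEstimator`, a hypothetical stand-in class that is not part of bert_sklearn.

```python
import copy

class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn style estimator."""

    def __init__(self):
        self.trained_on = None

    def fit(self, X, y):
        # Remember the training data, then return the estimator itself,
        # as scikit-learn estimators conventionally do.
        self.trained_on = list(X)
        return self

base = ToyEstimator()

# Aliasing: both names refer to `base`, so the second fit overwrites the first.
first = base.fit(["raw text"], ["A"])
second = base.fit(["lemma text"], ["A"])
same_object = first is second

# Taking a deep copy right after each fit keeps the variants apart.
snap_raw = copy.deepcopy(base.fit(["raw text"], ["A"]))
snap_lemma = copy.deepcopy(base.fit(["lemma text"], ["A"]))

print(same_object, first.trained_on, snap_raw.trained_on, snap_lemma.trained_on)
```

Without a copy, scoring `first` would actually score whatever was fitted last, which makes the per-variant comparison meaningless.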

Model scores

In [None]:
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report

In [9]:
def test_performance(model, x_test, y_test):
    y_pred = model.predict(x_test)
    print(classification_report(y_pred=y_pred, y_true=y_test))
    # Return micro first, then macro, to match how the callers unpack the result.
    f1micro = f1_score(y_pred=y_pred, y_true=y_test, average="micro")
    f1macro = f1_score(y_pred=y_pred, y_true=y_test, average="macro")
    return f1micro, f1macro

In [10]:
f1micro, f1macro = test_performance(model_classic, X_test, y_test)
print(f"f1micro = {f1micro:.3f} and f1macro = {f1macro:.3f}")

Predicting: 100%|████████████████████████████████████████████████████████████████████| 125/125 [02:38<00:00,  1.27s/it]

              precision    recall  f1-score   support

           0       0.52      0.75      0.62       518
           1       0.48      0.24      0.32       479

    accuracy                           0.51       997
   macro avg       0.50      0.50      0.47       997
weighted avg       0.50      0.51      0.48       997

f1 = 0.469
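
The two F1 averages reported here can differ a lot when classes are predicted unevenly. As a rough illustration, the sketch below computes both by hand on made-up labels: micro averaging pools the true-positive, false-positive, and false-negative counts over all classes (for single-label classification this equals accuracy), while macro averaging takes the unweighted mean of the per-class F1 scores. The function name `f1_micro_macro` and the labels are hypothetical; the notebook itself relies on `sklearn.metrics.f1_score`.

```python
from collections import Counter

def f1_micro_macro(y_true, y_pred):
    """Return (micro F1, macro F1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: unweighted mean of the per-class F1 scores.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: F1 over the counts pooled across all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro

# Made-up example: a classifier that always answers "A".
y_true = ["A", "A", "A", "A", "B", "C"]
y_pred = ["A", "A", "A", "A", "A", "A"]
micro, macro = f1_micro_macro(y_true, y_pred)
print(f"micro = {micro:.3f}, macro = {macro:.3f}")
```

On these labels the micro score matches the accuracy of 4/6, while the macro score is pulled down by the two classes that are never predicted.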





In [None]:
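# Apply the same pre-processing to the evaluation data as was used for training.
f1micro, f1macro = test_performance(model_lemma, pre_process(X_test, lemmatizer), y_test)
print(f"f1micro = {f1micro:.3f} and f1macro = {f1macro:.3f}")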

In [None]:
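# Apply the same pre-processing to the evaluation data as was used for training.
f1micro, f1macro = test_performance(model_stopWords, pre_process(X_test, removeStopWords), y_test)
print(f"f1micro = {f1micro:.3f} and f1macro = {f1macro:.3f}")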

Saving a model

In [8]:
#save model to disk
savefile = 'BERT_TaskB.bin'
model.save(savefile)