In this notebook, we apply a first-level DeBERTa model, trained on the 2022 data and on pseudo-labeled 2021 data, to generate predictions for the holdout and test datasets. These predictions serve as one of the features for the second-level model, which is trained on the holdout dataset and then evaluated on the test set.

In [None]:
!pip install transformers datasets evaluate accelerate peft==0.12.0 patool

Collecting datasets
  Downloading datasets-3.0.1-py3-none-any.whl.metadata (20 kB)
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting peft==0.12.0
  Downloading peft-0.12.0-py3-none-any.whl.metadata (13 kB)
Collecting patool
  Downloading patool-3.0.0-py2.py3-none-any.whl.metadata (4.0 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.17-py310-none-any.whl.metadata (7.2 kB)
INFO: pip is looking at multiple versions of multiprocess to determine which version is compatible with other requirements. This could take a while.
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading peft-0.12.0-py3-none-any.whl (296 kB)

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
import patoolib
import os
import json
from collections import Counter

import pandas as pd

import torch
from torch import nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification, DataCollatorWithPadding
from peft import LoraConfig, get_peft_model, AutoPeftModelForSequenceClassification
from datasets import Dataset
from torch.utils.data import DataLoader

from sklearn.metrics import precision_score, recall_score, f1_score

from tqdm import tqdm

Let's extract and load the data, which was previously split into training, holdout, and test sets, and load the trained model. Since the model has already been trained, we only use it here to generate predictions for the holdout and test sets.

In [None]:
!mkdir -p data2022

In [None]:
BASIC_PATH = '/content/gdrive/MyDrive/ML/projects/feedback-prize/'
TRAINED_TOKENIZER = 'deberta-base-tokenizer'
BASE_MODEL = 'microsoft/deberta-v3-base'
DEBERTA_WEIGHTS = '1st_level_models/deberta_trained_2022_weights.pth'

TOKENIZER_PATH = BASIC_PATH + TRAINED_TOKENIZER
FINAL_WEIGHTS = BASIC_PATH + DEBERTA_WEIGHTS

SAVE_DATASETS_FOLDER = '1st_level_preds/'

In [None]:
patoolib.extract_archive(BASIC_PATH+'data/feedback-prize-effectiveness.zip', outdir = '/content/data2022')

INFO patool: Extracting /content/gdrive/MyDrive/ML/projects/feedback-prize/feedback-prize-effectiveness.zip ...
INFO patool: running /usr/bin/7z x -o/content/data2022 -- /content/gdrive/MyDrive/ML/projects/feedback-prize/feedback-prize-effectiveness.zip
INFO patool:     with input=''
INFO patool: ... /content/gdrive/MyDrive/ML/projects/feedback-prize/feedback-prize-effectiveness.zip extracted to `/content/data2022'.


'/content/data2022'

In [None]:
input_dir = '/content/data2022'

train_csv = os.path.join(input_dir, 'train.csv')

data_2022 = pd.read_csv(train_csv)

In [None]:
data_2022.head()

Unnamed: 0,discourse_id,essay_id,discourse_text,discourse_type,discourse_effectiveness
0,0013cc385424,007ACE74B050,"Hi, i'm Isaac, i'm going to be writing about h...",Lead,Adequate
1,9704a709b505,007ACE74B050,"On my perspective, I think that the face is a ...",Position,Adequate
2,c22adee811b6,007ACE74B050,I think that the face is a natural landform be...,Claim,Adequate
3,a10d361e54e4,007ACE74B050,"If life was on Mars, we would know by now. The...",Evidence,Adequate
4,db3e453ec4e2,007ACE74B050,People thought that the face was formed by ali...,Counterclaim,Adequate


In [None]:
# Sort the unique labels so that the label -> id mapping is reproducible
class_names = sorted(data_2022['discourse_effectiveness'].unique())

label_to_id = {label: i for i, label in enumerate(class_names)}
id_to_label = {i: label for i, label in enumerate(class_names)}

In [None]:
label_to_id

{'Adequate': 0, 'Effective': 1, 'Ineffective': 2}
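The mapping above is stable only because the class names are sorted before being enumerated. As a minimal sketch, with a small hard-coded label list standing in for the `discourse_effectiveness` column (an assumption for illustration), the encode/decode round trip looks like this:

```python
# Labels hard-coded for illustration; the notebook derives them from data_2022.
labels = ['Ineffective', 'Adequate', 'Effective', 'Adequate']

# Sorting the unique labels makes the label -> id assignment deterministic.
class_names = sorted(set(labels))
label_to_id = {label: i for i, label in enumerate(class_names)}
id_to_label = {i: label for label, i in label_to_id.items()}

# Encode to integer targets and decode back to the original strings.
encoded = [label_to_id[label] for label in labels]
decoded = [id_to_label[i] for i in encoded]

print(label_to_id)
print(encoded)
```

Because the IDs depend only on alphabetical order, the same mapping can be rebuilt in the second-level notebook without saving it to disk.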

In [None]:
# .map avoids the downcasting FutureWarning that .replace raises here
data_2022['target'] = data_2022['discourse_effectiveness'].map(label_to_id)


In [None]:
data_2022['text'] = data_2022['discourse_type'] + ' ' + data_2022['discourse_text']

In [None]:
tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH, clean_up_tokenization_spaces = True, use_fast = False)
model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL, num_labels = len(class_names), id2label = id_to_label, label2id = label_to_id)

peft_config = LoraConfig(task_type = "SEQ_CLS",
                         inference_mode = True,
                         target_modules = 'all-linear',
                         r = 8,
                         lora_alpha = 16,
                         lora_dropout = 0.2)

deberta_model = get_peft_model(model, peft_config)


Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
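To give a sense of what `r = 8` and `lora_alpha = 16` mean in the LoRA configuration above, here is a rough worked calculation for a single square linear layer. The 768 x 768 layer shape is an assumption chosen to match the hidden size of `deberta-v3-base`; the actual set of adapted layers depends on `target_modules = 'all-linear'`.

```python
# Hypothetical single linear layer of size 768 x 768 (hidden size of deberta-v3-base).
d_in, d_out = 768, 768
r, lora_alpha = 8, 16  # values taken from the LoraConfig above

full_params = d_in * d_out        # weights in the frozen dense matrix
lora_params = r * (d_in + d_out)  # low-rank factors A (r x d_in) and B (d_out x r)
scaling = lora_alpha / r          # factor applied to the low-rank update in standard LoRA

print(full_params, lora_params, scaling)
```

For this layer the adapter adds roughly 2% of the original parameter count, which is why only the small LoRA weights need to be trained.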


In [None]:
# weights_only = True restricts unpickling to tensors; map_location lets this run on CPU-only machines
deberta_model.load_state_dict(torch.load(FINAL_WEIGHTS, map_location = 'cpu', weights_only = True))


<All keys matched successfully>

Define the preprocessing and prediction functions.

In [None]:
def preprocess(data):
    tokenized = tokenizer(data['text'], max_length = 200, truncation = True, padding = 'max_length')
    return tokenized
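The `max_length = 200`, `truncation = True`, and `padding = 'max_length'` arguments make every example exactly 200 tokens long. The sketch below imitates that behavior on plain lists of token IDs. It is a simplification: `pad_or_truncate` is a hypothetical helper, `pad_id = 0` is an assumed padding ID, and the real tokenizer also handles special tokens when truncating.

```python
def pad_or_truncate(token_ids, max_length=200, pad_id=0):
    """Return (input_ids, attention_mask), each with exactly max_length entries."""
    kept = token_ids[:max_length]          # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)         # padding needed to reach max_length
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 0 marks padding positions
    return input_ids, attention_mask

# A short sequence is padded, a long one is cut (max_length=5 keeps the output readable).
short_ids, short_mask = pad_or_truncate([5, 6, 7], max_length=5)
long_ids, long_mask = pad_or_truncate(list(range(1, 9)), max_length=5)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore the padded positions during inference.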

In [None]:
def predict_labels(inference_model, dataset):

    inference_model.eval()

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    inference_model.to(device)

    data_collator = DataCollatorWithPadding(tokenizer = tokenizer, return_tensors = 'pt')
    dataloader = DataLoader(dataset, batch_size = 64, collate_fn = data_collator, shuffle = False)

    predictions_list = []

    for batch in tqdm(dataloader):

        # Move batch to the appropriate device
        batch = {k: v.to(device) for k, v in batch.items()}

        with torch.no_grad():
            outputs = inference_model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'])

        predictions = outputs.logits.argmax(dim=-1)
        predictions_list.extend(predictions.cpu().numpy())

    return predictions_list

Load the saved split IDs and select the holdout and test subsets.

In [None]:
with open(BASIC_PATH+'data_splits.json', 'r') as file:
    split_ids = json.load(file)

In [None]:
# for 2nd level model
holdout_ids = split_ids['holdout_ids']
holdout_data = data_2022[data_2022['essay_id'].isin(holdout_ids)].copy()
holdout_data.reset_index(drop = True, inplace = True)

# for the final evaluation of the blending ensemble
test_ids = split_ids['test_ids']
test_data = data_2022[data_2022['essay_id'].isin(test_ids)].copy()
test_data.reset_index(drop = True, inplace = True)
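Because rows are selected by `essay_id`, all discourse elements of an essay land in the same split. A quick sanity check that no essay appears in both splits can be sketched as follows; the IDs here are hypothetical placeholders for the lists stored in `data_splits.json`.

```python
# Hypothetical essay IDs standing in for the lists stored in data_splits.json.
split_ids = {
    'holdout_ids': ['A1', 'B2', 'C3'],
    'test_ids': ['D4', 'E5'],
}

holdout_set = set(split_ids['holdout_ids'])
test_set = set(split_ids['test_ids'])

# An essay present in both splits would leak test information into the second-level model.
overlap = holdout_set & test_set
assert not overlap, f"essays in both splits: {sorted(overlap)}"
print(f"{len(holdout_set)} holdout essays, {len(test_set)} test essays, no overlap")
```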

In [None]:
holdout_dataset = Dataset.from_pandas(holdout_data[['text']])
test_dataset = Dataset.from_pandas(test_data[['text']])

In [None]:
holdout_dataset = holdout_dataset.map(preprocess, batched = True, remove_columns = ['text'])

Map:   0%|          | 0/5921 [00:00<?, ? examples/s]

In [None]:
test_dataset = test_dataset.map(preprocess, batched = True, remove_columns = ['text'])

Map:   0%|          | 0/7382 [00:00<?, ? examples/s]

Predict the labels for the holdout and test sets, then compare the predictions with the ground truth to check how well the model performs on data it was not trained on.

In [None]:
holdout_preds = predict_labels(deberta_model, holdout_dataset)

100%|██████████| 93/93 [01:38<00:00,  1.05s/it]


In [None]:
test_preds = predict_labels(deberta_model, test_dataset)

100%|██████████| 116/116 [01:59<00:00,  1.03s/it]
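The iteration counts in the two progress bars follow directly from the dataset sizes and the batch size of 64 used in `predict_labels`. A short check of that arithmetic:

```python
import math

batch_size = 64                 # value passed to the DataLoader in predict_labels
n_holdout, n_test = 5921, 7382  # row counts reported by the Map progress bars

# The last batch may be smaller than batch_size, hence the ceiling.
holdout_batches = math.ceil(n_holdout / batch_size)
test_batches = math.ceil(n_test / batch_size)

print(holdout_batches, test_batches)
```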


In [None]:
holdout_data['1st_level_deberta_preds'] = holdout_preds
test_data['1st_level_deberta_preds'] = test_preds

In [None]:
Counter(holdout_data['target'])

Counter({0: 3366, 2: 1033, 1: 1522})

In [None]:
Counter(holdout_data['1st_level_deberta_preds'])

Counter({0: 2291, 2: 1408, 1: 2222})
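Putting the two counters side by side as shares of the holdout set makes the shift easier to read. The sketch below reuses the counts printed above and the label mapping from earlier in the notebook.

```python
from collections import Counter

# Counts copied from the two Counter outputs above.
true_counts = Counter({0: 3366, 1: 1522, 2: 1033})
pred_counts = Counter({0: 2291, 1: 2222, 2: 1408})
id_to_label = {0: 'Adequate', 1: 'Effective', 2: 'Ineffective'}

total = sum(true_counts.values())
shares = {}
for class_id, name in id_to_label.items():
    true_share = true_counts[class_id] / total
    pred_share = pred_counts[class_id] / total
    shares[name] = (true_share, pred_share)
    print(f"{name:<12} true={true_share:.1%} predicted={pred_share:.1%}")
```

The model predicts Adequate less often than it occurs and predicts Effective and Ineffective more often, which is consistent with macro recall being higher than macro precision in the metrics that follow.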

In [None]:
print("Metrics for holdout set")
print(f"Precision: {precision_score(holdout_data['target'], holdout_data['1st_level_deberta_preds'], average = 'macro')}")
print(f"Recall: {recall_score(holdout_data['target'], holdout_data['1st_level_deberta_preds'], average = 'macro')}")
print(f"F1: {f1_score(holdout_data['target'], holdout_data['1st_level_deberta_preds'], average = 'macro')}")

Metrics for holdout set
Precision: 0.564309871078771
Recall: 0.6224592270507238
F1: 0.5742628326183216


In [None]:
print("Metrics for test set")
print(f"Precision: {precision_score(test_data['target'], test_data['1st_level_deberta_preds'], average = 'macro')}")
print(f"Recall: {recall_score(test_data['target'], test_data['1st_level_deberta_preds'], average = 'macro')}")
print(f"F1: {f1_score(test_data['target'], test_data['1st_level_deberta_preds'], average = 'macro')}")

Metrics for test set
Precision: 0.5695797110453841
Recall: 0.6276890227527977
F1: 0.5805985223494694
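With `average = 'macro'`, each metric is computed per class and then averaged without weighting by class size, so the macro F1 is the mean of the per-class F1 scores rather than the harmonic mean of the macro precision and recall. The following pure-Python sketch illustrates the computation on a toy example; `macro_scores` is a hypothetical helper, and treating an undefined ratio as 0.0 is a simplifying assumption.

```python
def macro_scores(y_true, y_pred):
    """Unweighted mean of per-class precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        n_pred = sum(p == c for p in y_pred)  # how often class c was predicted
        n_true = sum(t == c for t in y_true)  # how often class c actually occurs
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true if n_true else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy labels, not taken from the notebook's data.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(f"precision={macro_p:.4f} recall={macro_r:.4f} f1={macro_f1:.4f}")
```

Because every class counts equally, the smaller Effective and Ineffective classes influence these scores as much as the dominant Adequate class.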


Save the model predictions into a separate folder for future use in training the second-level model.

In [None]:
#holdout_data[['discourse_id', 'essay_id', 'target', '1st_level_deberta_preds']].to_csv(BASIC_PATH+SAVE_DATASETS_FOLDER+'holdout_1st_level_deberta_preds.csv', index = False)

In [None]:
#test_data[['discourse_id', 'essay_id', 'target', '1st_level_deberta_preds']].to_csv(BASIC_PATH+SAVE_DATASETS_FOLDER+'test_1st_level_deberta_preds.csv', index = False)