# HUHU@IberLEF2023 Task 2b (Regression)

Task: https://sites.google.com/view/huhuatiberlef23/huhu

This notebook contains the code to load and test several trained transformers for the task of hurtful humour detection (regression).

In particular, the models are:

* BERT Multilingual: ``bert-base-multilingual-cased`` and ``bert-base-multilingual-uncased``
* RoBERTa: ``roberta-base``
* BETO: ``dccuchile/bert-base-spanish-wwm-cased`` and ``dccuchile/bert-base-spanish-wwm-uncased``
* DistilBERT Multilingual: ``distilbert-base-multilingual-cased``

The transformers that define the best ensemble found in experimentation will be used to predict the score of the instances in the test split.

The predicted scores will be stored in an output file as per the requirements of the competition. Additional evaluation plots may also be generated.

# Setting up the environment

In [None]:
import torch

# Check GPU availability on Google Colab
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

use_cuda = torch.cuda.is_available()

In [None]:
# Install libraries
!pip install simpletransformers
!pip install datasets
!pip install ipywidgets
!pip install --upgrade huggingface_hub

In [None]:
# Define global variables

SEED = 42 # allow for reproducible experiments
WEIGHTED = True # use weighted ensemble (in favour of models with lower RMSE)

# Test split load

In [None]:
from huggingface_hub import notebook_login
# Notebook login via HF's token
notebook_login()

In [None]:
from datasets import load_dataset, logging
import pandas as pd

# Avoid warnings
logging.set_verbosity_error()

# Load test split
test = pd.DataFrame(load_dataset("huhu2023/test-huhu2023", split="test"))

In [None]:
# Function to rename the "tweet" field to "text" and keep only the "index" and "text" columns
def get_text_and_label(df, original_dataset=True):
  return df.rename(columns={"tweet": "text"})[["index", "text"]]

# Get treated dataframe for test split
test = get_text_and_label(test)

print("Test split size:", len(test.index))
test.head()

# Set up the working environment

In [None]:
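# Select the name of the experiment to be evaluated.
# Both values are joined onto PATH with os.path.join in a later cell, so they
# must be relative paths: a component starting with "/" makes os.path.join
# discard everything before it
EXP = "/path/to/task2b/outputs/experiment_name/"
TRANSFORMERS = "/path/to/task2b/outputs/trained_transformers/"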

In [None]:
# Load and mount the Drive helper
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import os

# Define the path to the experiment folder
PATH = "/content/drive/My Drive/HUHU-IberLEF2023/reg/outputs/"
EXP_PATH = os.path.join(PATH, EXP)
TRANSFORMERS_PATH = os.path.join(PATH, TRANSFORMERS)
print("Current working dir:", EXP_PATH)

# Create a folder for the test results (no error if it already exists)
OUTPUT = os.path.join(EXP_PATH, "test")
os.makedirs(OUTPUT, exist_ok=True)

# Models' load

In this section, the transformers to be tested are loaded. The implementation relies mainly on the ``simpletransformers`` Python library, which makes it possible to train and test transformers in a few steps.

For further information: https://simpletransformers.ai/

In [None]:
# Define transformers' initialization dictionary 
models = {
    "mbert-cased": {
        "model_type": "bert",
        "model_name": "bert-base-multilingual-cased"
    },
    "mbert-uncased": {
        "model_type": "bert",
        "model_name": "bert-base-multilingual-uncased"
    },
    "roberta": {
        "model_type": "roberta",
        "model_name": "roberta-base"
    },
    "beto-cased": {
        "model_type": "bert",
        "model_name": "dccuchile/bert-base-spanish-wwm-cased"
    },
    "beto-uncased": {
        "model_type": "bert",
        "model_name": "dccuchile/bert-base-spanish-wwm-uncased"
    },
    "distilbert-multi": {
        "model_type": "distilbert",
        "model_name": "distilbert-base-multilingual-cased"
    }
}

In [None]:
# Import the simpletransformers classification model class (also used for regression)
from simpletransformers.classification import ClassificationModel

# Replace each entry of the dictionary with its corresponding trained transformer,
# loaded from its own folder (CPU is used when no GPU is available)
for model, fields in models.items():
  models[model] = ClassificationModel(fields["model_type"], os.path.join(TRANSFORMERS_PATH, model), use_cuda=use_cuda)

# Best ensemble's definition

The ensemble that performed the best in the selected experiment is defined.

This will be the one used for the predictions to be performed on the test set.

In [None]:
import json

# Get the data relative to the best ensemble
best_ensemble = {}
with open(os.path.join(EXP_PATH, "best-ensemble.json")) as json_file:
    best_ensemble = json.load(json_file)

print("----- BEST ENSEMBLE -----")
for field in ["name", "models", "metrics"]:
  print(f"{field}:", best_ensemble.get(field))
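The cell above expects ``best-ensemble.json`` to expose at least the keys ``name``, ``models`` and ``metrics``. As a minimal sketch of what such a file might look like, the snippet below writes and reads back a made-up example. Only the three top-level keys come from this notebook; the ensemble name, the choice of models and the metric value are hypothetical, and the real file may contain further fields.

```python
import json
import os
import tempfile

# Hypothetical contents: only the three top-level keys are taken from the
# notebook; the name, the model keys and the metric value are made up
example_ensemble = {
    "name": "ensemble-example",
    "models": ["beto-cased", "mbert-cased"],
    "metrics": {"rmse": 0.9},
}

with tempfile.TemporaryDirectory() as exp_path:
    json_path = os.path.join(exp_path, "best-ensemble.json")
    with open(json_path, "w") as json_file:
        json.dump(example_ensemble, json_file)

    # Read the file back the same way the notebook does
    with open(json_path) as json_file:
        loaded = json.load(json_file)

for field in ["name", "models", "metrics"]:
    print(f"{field}:", loaded.get(field))
```

Each entry in ``models`` is assumed to match both a key of the ``models`` dictionary and the name of a sub-folder of the experiment folder, since the following cells use it for both lookups.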

# Best ensemble's predictions

In [None]:
# Load model evaluation JSON
model_evaluation = {}
for model in best_ensemble.get("models"):
  with open(os.path.join(EXP_PATH, model, "model-evaluation.json")) as json_file:
      model_evaluation[model] = json.load(json_file)

In [None]:
from sklearn.preprocessing import normalize

# Function which determines the ensemble prediction from its transformers'
# predictions: a weighted sum if `weighted` is set (the weights are expected
# to sum to 1), the plain average otherwise
def vote(predictions, weighted=False, weights=None):
  return sum(predictions * weights) if weighted else sum(predictions)/len(predictions)

test_predictions = list()

# Function to predict the score of the instances in a dataset split (validation
# ("val") or test ("test")) with the best ensemble. The results are appended to
# the global `test_predictions` list
def predict_ensemble(ensemble_name, dataset_name, dataset, weighted=False):
  ensemble_models = best_ensemble.get("models")

  # Define the list of weights if a weighted voting system must be used.
  # They do not depend on the instance, so they are computed only once
  weights = list()
  if weighted:
    # The weights are the inverse RMSE of each model in the ensemble,
    # L1-normalized so that they sum to 1 (lower RMSE, higher weight)
    rmse_list = [model_evaluation[model_name]["metrics"].get("rmse")
                      for model_name in ensemble_models]
    weights = normalize([[1/rmse for rmse in rmse_list]], norm="l1")[0]

  # Traverse each dataset instance
  for i in range(len(dataset.index)):
    # Get the raw output of each model in the ensemble for the instance at hand
    predictions = [model_evaluation[model_name].get(f"{dataset_name}_scores")[i]
                   for model_name in ensemble_models]

    # Append the computed score to the predictions of the ensemble
    test_predictions.append(vote(predictions, weighted, weights))
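
The weighting scheme used above can be illustrated with a small worked example. The sketch below reproduces the same idea in plain Python, without ``sklearn``: for positive values, L1 normalization amounts to dividing each inverse RMSE by their sum. The helper names, the RMSE values and the per-model scores are made up for illustration and are not results from the experiments.

```python
def inverse_rmse_weights(rmse_list):
    """Return weights proportional to 1/RMSE that sum to 1."""
    inverse = [1.0 / rmse for rmse in rmse_list]
    total = sum(inverse)
    return [value / total for value in inverse]


def weighted_vote(predictions, weights):
    """Return the weighted sum of the per-model predictions."""
    return sum(p * w for p, w in zip(predictions, weights))


# Made-up RMSEs of two models and their scores for a single tweet
rmse_list = [0.5, 1.0]
predictions = [1.0, 4.0]

weights = inverse_rmse_weights(rmse_list)
ensemble_score = weighted_vote(predictions, weights)
plain_average = sum(predictions) / len(predictions)

print("weights:", weights)
print("weighted score:", ensemble_score)
print("plain average:", plain_average)
```

Here the model with half the RMSE gets twice the weight (2/3 against 1/3), so the weighted score is pulled towards its prediction, whereas the unweighted vote is the midpoint of the two predictions.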

In [None]:
# Predicting the score of the test set's instances with each individual
# transformer that composes the best ensemble
for model_name in best_ensemble.get("models"):
  _, model_raw_outputs = models.get(model_name).predict(test["text"].tolist())
  model_evaluation[model_name]["test_scores"] = model_raw_outputs

# Calculating the test predictions of the best ensemble
predict_ensemble(best_ensemble.get("name"), "test", test, weighted=WEIGHTED)

# Show some predictions
n = 5
print(f"First {n} predictions:", test_predictions[:n])

# Save test predictions

A new pandas dataframe is created with the test predictions, which are then saved to an output CSV file as required by the competition.

In [None]:
# Create output dataframe for test predictions
test_output = test[["index"]].rename(columns={"index": "tweet_id"})
# Round each prediction to one decimal place
test_predictions = [round(pred, 1) for pred in test_predictions]
test_output["prejudice_degree"] = test_predictions
test_output.head(10)

In [None]:
# Save results in CSV file
test_output.to_csv(os.path.join(OUTPUT, "results.csv"), index=False)
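
For reference, the layout of the resulting file can be sketched with the standard ``csv`` module instead of pandas. The column names ``tweet_id`` and ``prejudice_degree`` come from the cell above; the identifiers and scores are hypothetical, and the file is written to an in-memory buffer rather than to the output folder.

```python
import csv
import io

# Hypothetical (tweet_id, raw ensemble score) pairs
rows = [(101, 2.34), (102, 0.46)]

output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["tweet_id", "prejudice_degree"])
for tweet_id, score in rows:
    # Same rounding as in the notebook: one decimal place
    writer.writerow([tweet_id, round(score, 1)])

csv_lines = output.getvalue().splitlines()
for line in csv_lines:
    print(line)
```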