[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mbertani/conformal-prediction/blob/main/02_Conformal_Prediction_NLP.ipynb)

The purpose of this notebook is to show how conformal prediction can be used to filter toxic or hateful text via conformal outlier detection. We calibrate on the non-toxic data only and then flag toxic text as outliers while controlling the type-1 error, that is, the rate at which non-toxic text is wrongly flagged.

This notebook uses Kennedy et al.'s (2020) [pre-trained RoBERTa-Base](https://huggingface.co/ucberkeley-dlab/hate-measure-roberta-base) to score text from the [publicly released dataset](https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech) described in Kennedy et al. (2020) and Sachdeva et al. (2022).

This notebook is inspired by the toxic-text outlier detection notebook that accompanies Angelopoulos and Bates' (2021) [A gentle introduction to conformal prediction and distribution-free uncertainty quantification](https://github.com/aangelopoulos/conformal-prediction/blob/main/notebooks/toxic-text-outlier-detection.ipynb).


References:\
Angelopoulos, A. N., & Bates, S. (2021). A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511.\
Kennedy, C. J., Bacon, G., Sahn, A., & von Vacano, C. (2020). Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application. arXiv preprint arXiv:2009.10277.\
Sachdeva, P., Barreto, R., Bacon, G., Sahn, A., von Vacano, C., & Kennedy, C. (2022). The Measuring Hate Speech Corpus: Leveraging Rasch Measurement Theory for Data Perspectivism. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 83–94, Marseille, France. European Language Resources Association.

# Libraries

In [None]:
#conda create -n tekna_nlp python=3.9
#!pip install jupyter
#!pip install -q transformers
#!pip install -q datasets
#!pip install -q huggingface_hub
#!pip install -q torch
#!pip install tf-keras==2.17.0

# Imports & Verbosities

In [1]:
import numpy as np
import datasets, transformers
from huggingface_hub import from_pretrained_keras
import tensorflow as tf
import pickle

transformers.logging.set_verbosity_error()
tf.get_logger().setLevel('ERROR')

# Globals

In [2]:
n_data_samples = 5000 # download this number of data samples from https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech

alpha = 0.1 # alpha is the desired type-1 error rate (10%): the threshold is calibrated so that the probability of falsely flagging non-toxic text as toxic does not exceed alpha
n = 200 # number of calibration points

model_path = "ucberkeley-dlab/hate-measure-roberta-base" #https://huggingface.co/ucberkeley-dlab/hate-measure-roberta-base
tokenizer_path = "roberta-base" #https://huggingface.co/FacebookAI/roberta-base

data_path = 'ucberkeley-dlab/measuring-hate-speech' #https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech

# Model & Tokenizer

In [3]:
model = from_pretrained_keras(model_path, verbose=0)
model.compile()

tokenizer = transformers.RobertaTokenizer.from_pretrained(tokenizer_path)


# Data & Tokenization

In [4]:
sub_df = datasets.load_dataset(data_path, 'default', split='train').to_pandas()[['text', 'hate_speech_score']].sample(n=n_data_samples)

tokens = tokenizer(sub_df['text'].values.tolist(), return_tensors='np', padding="max_length", max_length=247, truncation=True)

# Inference

In [5]:
y_pred = model.predict([tokens['input_ids'], tokens['attention_mask']])



In [6]:
with open('assets/NLP_y_pred_a=0.1_n_data_samples=5000_n=500.pickle', 'wb') as handle:
    pickle.dump(y_pred, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [6]:
with open('assets/NLP_y_pred_a=0.1_n_data_samples=5000_n=500.pickle', 'rb') as handle:
    y_pred = pickle.load(handle)

# Normalise data

In [7]:
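# Continuous scores are in the range [-8.34, 6.3]; min-max scale them to [0, 1]
preds = np.array([(float(x)+8.34)/(6.3+8.34) for x in y_pred.flatten().tolist()])
# Binarise the annotated scores: a normalised score above 0.5 rounds to 1 (toxic), otherwise 0 (non-toxic)
is_toxic = np.array([round((float(x)+8.34)/(6.3+8.34)) for x in sub_df['hate_speech_score'].values.tolist()])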
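To make the rescaling concrete, here is a minimal sketch of the same min-max mapping as a standalone function. The names `SCORE_MIN`, `SCORE_MAX`, `normalise`, and `raw_cutoff` are illustrative and not part of the notebook; the bounds are the ones used in the cell above.

```python
# Illustrative helpers (hypothetical names); bounds taken from the cell above.
SCORE_MIN, SCORE_MAX = -8.34, 6.3

def normalise(score: float) -> float:
    """Min-max scale a raw hate speech score to [0, 1]."""
    return (score - SCORE_MIN) / (SCORE_MAX - SCORE_MIN)

def raw_cutoff() -> float:
    """Raw score that maps to 0.5, where rounding switches the label from 0 to 1."""
    return SCORE_MIN + 0.5 * (SCORE_MAX - SCORE_MIN)

for raw in (SCORE_MIN, 0.0, SCORE_MAX):
    scaled = normalise(raw)
    print(f"raw={raw:+.2f} -> normalised={scaled:.4f}, label={round(scaled)}")
print(f"raw cutoff for the toxic label: {raw_cutoff():.2f}")
```

Under this binarisation the cutoff sits at a raw score of about -1.02, so a raw score of 0 is already labelled toxic.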

In [8]:
# Look at only the non-toxic data
nontoxic = is_toxic == 0

preds_nontoxic = preds[nontoxic]
preds_toxic = preds[np.invert(nontoxic)]

In [9]:
# Split the non-toxic data into calibration and validation sets
# Calibrating on non-toxic scores lets us control the type-1 error rate: the probability of flagging non-toxic text as toxic is kept at or below α
idx = np.array([1] * n + [0] * (preds_nontoxic.shape[0]-n)) > 0
cal_scores, val_scores = preds_nontoxic[idx], preds_nontoxic[np.invert(idx)]
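The split above takes the first `n` non-toxic predictions for calibration, which is a random subset here only because `sub_df` was sampled randomly. A sketch of an explicit shuffled split follows; `split_indices` and the seed are assumptions, not part of the notebook, and the sketch uses only the standard library so it runs without the model outputs.

```python
import random

def split_indices(n_total: int, n_cal: int, seed: int = 0):
    """Return (calibration, validation) index lists from a seeded shuffle."""
    if not 0 < n_cal < n_total:
        raise ValueError("n_cal must be between 1 and n_total - 1")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_cal], indices[n_cal:]

# Hypothetical sizes: 1000 non-toxic predictions, 200 of them for calibration.
cal_idx, val_idx = split_indices(n_total=1000, n_cal=200)
print(len(cal_idx), len(val_idx))
```

With the notebook's arrays this could be applied as `preds_nontoxic[cal_idx]` and `preds_nontoxic[val_idx]`, since NumPy arrays accept integer lists as indices.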

### Conformal outlier detection happens here

In [10]:
# Use the outlier detection method to get a threshold on the toxicities
# qhat is a quantile of the nonconformity scores of the calibration set
qhat = np.quantile(cal_scores, np.ceil((n+1)*(1-alpha))/n)

# Perform outlier detection on the in-distribution (ind, non-toxic) and out-of-distribution (ood, toxic) data
outlier_ind = val_scores > qhat # Non-toxic validation texts flagged as outliers. The goal is for this proportion to be no more than alpha on average
outlier_ood = preds_toxic > qhat # Toxic texts flagged as outliers. There is no guarantee on this proportion; it depends on how well the model's scores separate toxic from non-toxic text
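The thresholding step can be sketched on synthetic scores, without the model. The sketch below assumes the threshold is the k-th smallest calibration score with k = ceil((n + 1)(1 - alpha)), a close variant of the interpolated `np.quantile` call above, and it also shows the equivalent conformal p-value view. All function names and the uniform synthetic scores are illustrative assumptions.

```python
import math
import random

def conformal_threshold(cal_scores, alpha):
    """k-th smallest calibration score, with k = ceil((n + 1) * (1 - alpha))."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: nothing can be flagged
    return sorted(cal_scores)[k - 1]

def conformal_p_value(score, cal_scores):
    """(1 + number of calibration scores >= score) / (n + 1)."""
    n = len(cal_scores)
    return (1 + sum(c >= score for c in cal_scores)) / (n + 1)

rng = random.Random(0)
alpha_demo, n_demo = 0.1, 200
cal = [rng.random() for _ in range(n_demo)]        # synthetic inlier scores
test_inliers = [rng.random() for _ in range(5000)]  # fresh inliers, same distribution

qhat_demo = conformal_threshold(cal, alpha_demo)
flag_by_threshold = [s > qhat_demo for s in test_inliers]
flag_by_p_value = [conformal_p_value(s, cal) <= alpha_demo for s in test_inliers]
share_flagged = sum(flag_by_threshold) / len(test_inliers)

print(f"threshold = {qhat_demo:.4f}")
print(f"share of inliers flagged = {share_flagged:.4f}")
print(f"rules agree: {flag_by_threshold == flag_by_p_value}")
```

For n = 200 and alpha = 0.1 the rank is k = ceil(201 × 0.9) = 181, so the quantile level passed to `np.quantile` in the notebook is 181/200 = 0.905.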


In [11]:
# Calculate type-1 and type-2 errors
# Type-1 error (false positive): a non-toxic validation text is flagged as an outlier (score > qhat).
# Type-2 error (false negative): a toxic text is not flagged as an outlier (score <= qhat).

# There is a trade-off between the two errors. Lowering alpha raises the threshold, so fewer non-toxic texts are flagged (lower type-1 error) but more toxic texts are missed (higher type-2 error).

# Conformal calibration controls only the type-1 error, keeping it at or below alpha on average. The type-2 error carries no guarantee and depends on how well the model's scores separate toxic from non-toxic text.
type1 = outlier_ind.mean()
type2 = 1-outlier_ood.mean()
print(f"The type-1 error is {type1:.4f}, the type-2 error is {type2:.4f}, and the threshold is {qhat:.4f}.")

The type-1 error is 0.1110, the type-2 error is 0.8780, and the threshold is 0.7407.
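The observed type-1 error of 0.1110 is slightly above alpha = 0.1, which is not unexpected: the guarantee holds on average over calibration sets, not for each individual one. As a hedged sketch, assuming exchangeable, continuous scores and a threshold equal to the k-th smallest calibration score, the type-1 error for a fixed calibration set follows a Beta(l, n + 1 - l) distribution with l = floor((n + 1) * alpha), a result discussed by Angelopoulos and Bates (2021). The function name below is hypothetical, and n = 200 is the value set in Globals.

```python
import math

def type1_error_distribution(n: int, alpha: float):
    """Mean and standard deviation of the type-1 error over random calibration sets.

    Assumes the error follows Beta(l, n + 1 - l) with l = floor((n + 1) * alpha).
    """
    l = math.floor((n + 1) * alpha)
    a, b = l, n + 1 - l
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

mean, std = type1_error_distribution(n=200, alpha=0.1)
print(f"expected type-1 error = {mean:.4f}, standard deviation = {std:.4f}")
```

For n = 200 this gives a mean of 20/201, about 0.0995, and a standard deviation of about 0.021, so 0.1110 lies within one standard deviation of the target. The finite validation set adds further sampling noise on top of this.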


In [12]:
# Show non-toxic validation texts that the conformal threshold flagged as outliers (type-1 errors)
predictions_post_cf = sub_df['text'].values[nontoxic][n:][outlier_ind].tolist()

for item in np.random.choice(predictions_post_cf, size=10):
    print(item)

I, a Catholic and a Jesuit, am grateful for this Muslim holy season for challenging me to be a more dedicated child of God. URL
Trans rights are human rights
Ugh, it's so hard being a beautiful blonde woman.  *Tosses hair.*
The sex appeal of a pregnant woman. She is beautiful, congratulations dude!
I am curious as to how having a lot of 1st gen Indians will help you. ABCDs grow up in a different culture than people like us (immigrants from India) and I have often found that ABCDs have their own circle of friends and don't like to interact much with immigrants.
@Lowkz Only people born in Mexico are Mexicans fucking idiot.
oh that's cute. you didn't even read the thing you claim to support. there is nothing biologically inherent in women that would make them less adapt to do it work. thinking so does in fact make you 'crazy stupid'. sorry!
@Agri_EC This is a disgrace to an African country, look how apartheid is let to continue.. 😲
discorperted how exactly are they invading if they're see