[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mbertani/conformal-prediction/blob/main/02_Conformal_Prediction_NLP.ipynb)

The purpose of this notebook is to show how conformal prediction can be used to filter toxic or hateful text via conformal outlier detection. We calibrate on the non-toxic data only, and then identify the toxic outliers while controlling the type-1 error.

This notebook uses Kennedy et al.'s (2020) [pre-trained RoBERTa-Base](https://huggingface.co/ucberkeley-dlab/hate-measure-roberta-base) to score text from the [publicly released dataset](https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech) described in Kennedy et al. (2020) and Sachdeva et al. (2022).

This notebook is inspired by the [toxic text outlier detection notebook](https://github.com/aangelopoulos/conformal-prediction/blob/main/notebooks/toxic-text-outlier-detection.ipynb) that accompanies Angelopoulos and Bates' (2021) A gentle introduction to conformal prediction and distribution-free uncertainty quantification.


References:\
Angelopoulos, A. N., & Bates, S. (2021). A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511.\
Kennedy, C. J., Bacon, G., Sahn, A., & von Vacano, C. (2020). Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application. arXiv preprint arXiv:2009.10277.\
Sachdeva, P., Barreto, R., Bacon, G., Sahn, A., von Vacano, C., & Kennedy, C. (2022). The Measuring Hate Speech Corpus: Leveraging Rasch Measurement Theory for Data Perspectivism. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 83–94, Marseille, France. European Language Resources Association.

# Libraries

In [1]:
!pip install -q transformers
!pip install -q datasets
!pip install -q huggingface_hub


# Imports & Verbosities

In [31]:
import numpy as np
import datasets, transformers
from huggingface_hub import from_pretrained_keras
import tensorflow as tf

transformers.logging.set_verbosity_error()
tf.get_logger().setLevel('ERROR')

# Globals

In [29]:
n_data_samples = 5000 # number of rows to sample from https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech

alpha = 0.1 # alpha is the desired type-1 error rate (10%); 1-alpha is the target coverage on non-toxic text
n = 200 # n calibration points

model_path = "ucberkeley-dlab/hate-measure-roberta-base" # https://huggingface.co/ucberkeley-dlab/hate-measure-roberta-base
tokenizer_path = "roberta-base" # https://huggingface.co/FacebookAI/roberta-base

data_path = 'ucberkeley-dlab/measuring-hate-speech' # https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech

# Model & Tokenizer

In [32]:
model = from_pretrained_keras(model_path, verbose=0)
model.compile()

tokenizer = transformers.RobertaTokenizer.from_pretrained(tokenizer_path)

# Data & Tokenization

In [19]:
sub_df = datasets.load_dataset(data_path, 'default', split='train').to_pandas()[['text', 'hate_speech_score']].sample(n=n_data_samples)

tokens = tokenizer(sub_df['text'].values.tolist(), return_tensors='np', padding="max_length", max_length=247, truncation=True)

# Inference

In [6]:
y_pred = model.predict([tokens['input_ids'], tokens['attention_mask']])



# Normalise data

In [7]:
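# Continuous scores lie in the range [-8.34, 6.3]; min-max scale them to [0, 1]
preds = np.array([(float(x)+8.34)/(6.3+8.34) for x in y_pred.flatten().tolist()])
# Binarise the ground-truth scores: rounding the scaled value gives 1 (toxic) above 0.5 and 0 (non-toxic) otherwise
toxic = np.array([round((float(x)+8.34)/(6.3+8.34)) for x in sub_df['hate_speech_score'].values.tolist()])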
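The scaling and rounding above can be illustrated on a few hand-picked raw scores. This is a minimal sketch, not the notebook's own code: `SCORE_MIN`, `SCORE_MAX`, and `normalise` are hypothetical names, and the range endpoints are taken from the comment in the cell above. Rounding the scaled value is equivalent to thresholding it at 0.5, which corresponds to a raw score of about -1.02.

```python
import numpy as np

# Range endpoints as stated in the notebook's comment (assumed, not re-derived here)
SCORE_MIN, SCORE_MAX = -8.34, 6.3

def normalise(scores):
    """Min-max scale raw scores from [SCORE_MIN, SCORE_MAX] to [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    return (scores - SCORE_MIN) / (SCORE_MAX - SCORE_MIN)

# Hand-picked raw scores, for illustration only
raw = np.array([-8.34, -3.0, 0.0, 6.3])
scaled = normalise(raw)

# A scaled value above 0.5 is labelled toxic (1), otherwise non-toxic (0)
labels = (scaled > 0.5).astype(int)

# Raw score at which the label flips
raw_cutoff = 0.5 * (SCORE_MAX - SCORE_MIN) + SCORE_MIN

print(scaled, labels, raw_cutoff)
```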

In [8]:
# Look at only the non-toxic data
nontoxic = toxic == 0

preds_nontoxic = preds[nontoxic]
preds_toxic = preds[np.invert(nontoxic)]

In [9]:
# Split the non-toxic data into calibration (first n points) and validation sets
# Calibrating on non-toxic data controls the type-1 error: the probability of flagging non-toxic text as toxic is kept at or below α
idx = np.array([1] * n + [0] * (preds_nontoxic.shape[0]-n)) > 0
cal_scores, val_scores = preds_nontoxic[idx], preds_nontoxic[np.invert(idx)]
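The split above takes the first `n` non-toxic rows, which is reasonable here because the rows were already sampled at random. If the row order cannot be trusted, an explicit shuffle is one option. The sketch below is an assumption-laden alternative, not what the notebook does: `split_calibration` is a hypothetical helper that returns index arrays, so the same indices can also be used to look up the matching texts.

```python
import numpy as np

def split_calibration(n_total, n_cal, seed=0):
    """Return (cal_idx, val_idx): a seeded random partition of range(n_total)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)
    return perm[:n_cal], perm[n_cal:]

# Toy scores standing in for preds_nontoxic
scores = np.linspace(0.0, 1.0, 10)
cal_idx, val_idx = split_calibration(len(scores), n_cal=4)

cal_scores, val_scores = scores[cal_idx], scores[val_idx]
print(cal_idx, val_idx)
```

Keeping the indices rather than a boolean mask means a later lookup such as `texts[val_idx][flags]` stays aligned with `val_scores`.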

### Conformal outlier detection happens here

In [10]:
# Use the outlier detection method to get a threshold on the toxicities
# qhat is the ceil((n+1)*(1-alpha))/n quantile of the nonconformity scores of the calibration set
qhat = np.quantile(cal_scores, np.ceil((n+1)*(1-alpha))/n)

# Perform outlier detection on the in-distribution (non-toxic) and out-of-distribution (toxic) data
outlier_ind = val_scores > qhat # We want this to be no more than alpha on average
outlier_ood = preds_toxic > qhat # We want this to be as large as possible, but it doesn't have a guarantee
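The "no more than alpha on average" behaviour can be checked on synthetic data. The following is a minimal sketch under stated assumptions: the Beta-distributed scores are made up and only stand in for the model's scores on non-toxic text, and `conformal_threshold` is a hypothetical helper that picks the k-th smallest calibration score with k = ceil((n+1)(1-alpha)). That is slightly different from the linearly interpolated `np.quantile` call above, which returns a value at least as large.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """k-th smallest calibration score with k = ceil((n + 1) * (1 - alpha))."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:  # too few calibration points for this alpha: flag nothing
        return np.inf
    return np.sort(cal_scores)[k - 1]

rng = np.random.default_rng(0)
alpha, n, n_trials = 0.1, 200, 500

type1_rates = []
for _ in range(n_trials):
    # Fresh calibration and validation draws from the same inlier distribution
    cal = rng.beta(2, 5, size=n)
    val = rng.beta(2, 5, size=1000)
    qhat = conformal_threshold(cal, alpha)
    # Fraction of inliers wrongly flagged as outliers in this trial
    type1_rates.append((val > qhat).mean())

mean_type1 = float(np.mean(type1_rates))
print(f"mean type-1 error over {n_trials} trials: {mean_type1:.4f} (target <= {alpha})")
```

Individual trials can land above or below alpha; the guarantee concerns the average over calibration draws.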

In [11]:
# Calculate type-1 and type-2 errors
# Type-1 error (false positive): non-toxic text is flagged as an outlier, i.e. its score exceeds qhat.
# Type-2 error (false negative): toxic text is not flagged, i.e. its score is at or below qhat.

# There is a trade-off between the two. A smaller alpha raises the threshold, so fewer non-toxic texts are flagged (lower type-1 error) but more toxic texts slip through (higher type-2 error).

# Calibration keeps the type-1 error at or below alpha on average. The type-2 error has no such guarantee; it depends on how well the model separates toxic from non-toxic text.
type1 = outlier_ind.mean()
type2 = 1-outlier_ood.mean()
print(f"The type-1 error is {type1:.4f}, the type-2 error is {type2:.4f}, and the threshold is {qhat:.4f}.")

The type-1 error is 0.0995, the type-2 error is 0.0973, and the threshold is 0.5267.
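The trade-off between the two error types can be seen by sweeping alpha. This is a sketch on synthetic scores, not on the notebook's data: the two Beta distributions are assumed stand-ins for non-toxic and toxic scores, and `conformal_threshold` is a hypothetical helper using the k-th smallest calibration score rather than the interpolated quantile.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """k-th smallest calibration score with k = ceil((n + 1) * (1 - alpha))."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return np.sort(cal_scores)[k - 1]

rng = np.random.default_rng(0)
cal = rng.beta(2, 5, size=200)       # synthetic non-toxic calibration scores
val_in = rng.beta(2, 5, size=5000)   # synthetic non-toxic validation scores
val_out = rng.beta(5, 2, size=5000)  # synthetic toxic scores

alphas = [0.01, 0.05, 0.1, 0.2]
results = []
for a in alphas:
    qhat = conformal_threshold(cal, a)
    type1 = float((val_in > qhat).mean())    # non-toxic flagged as outlier
    type2 = float((val_out <= qhat).mean())  # toxic not flagged
    results.append((a, float(qhat), type1, type2))
    print(f"alpha={a:.2f}  qhat={qhat:.4f}  type-1={type1:.4f}  type-2={type2:.4f}")
```

As alpha grows the threshold drops, so the type-1 error rises while the type-2 error falls.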


In [14]:
# Show non-toxic validation texts that were flagged as outliers (type-1 errors)
predictions_post_cf = sub_df['text'].values[nontoxic][n:][outlier_ind].tolist()

for item in np.random.choice(predictions_post_cf, size=10):
    print(item)

Holy shit....see if I was there your boxers would be off and cos I'm a nasty slut I'd eat your ass while you jerked that huge dick and came on my face 😈😈
If u don't read ur Bible NOW, u will become a slave to charlatans like Rabbi John HAGEE.
Niggas really whippin these shits? Lmaooo URL
"Bury the rag deep in your face For now's the time for your tears". Another white man escapes the gavel of justice for killing a black person. Reparations are due.
So let me get this straight you just tell them you are an American citizen then they let you drive through with 6 illegal Mexicans in the trunk.
She even said her first and last sugar daddy. Obvi she's not into making it a career. If she keeps the child or not, it's clear her life has been changed by it or will be forever. How very Close minded of you to think that her only option is to "get pounded with dick".
all i know is, if you put your hands on a woman you are the ultimate trash - i don't give a fuck about the situation. YOU ARE A BITC