# Individual Assignment 

Oromia Sero - 2720857

The two chosen diagnostic tests are:
1. Invariance test (INV): checks whether the model's prediction stays the same when unimportant parts of the input are slightly changed. The perturbations are label-preserving, so the prediction is expected to remain unchanged.

 Such tests show how robust the model is to unexpected changes in the vocabulary of the data it is applied to. The more perturbations the model is invariant to, the better it copes with intentional or unintentional variation in a dataset.

 Some of these changes are intentional. In offensive language or hate speech detection, for instance, people insert deliberate typos or replace characters with special symbols so that offensive or profane words escape detection. In such cases it is crucial that the model can handle these variations in writing style.


2. Directional Expectation test (DIR): works like an invariance test, except that the label is expected to change in a certain way.

 Such tests show whether the model picks up slight changes that should flip the prediction, for example a change in sentiment from good to bad, or a statement containing a word the model would normally classify as hateful or offensive when the word is not used for that purpose. A good example is whether the model can tell ‘I hate Christians.’ apart from ‘I don’t hate Christians.’.

 These changes are therefore not label-preserving, and the model's prediction is expected to change.
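
The difference between the two test types can be illustrated with a minimal sketch. The classifier `toy_predict` below is a hypothetical keyword rule invented for this illustration (it is not the model used in this notebook), and `passes_inv` and `passes_dir` are likewise made-up helper names.

```python
# Toy stand-in for a real classifier (hypothetical, for illustration only):
# flags a sentence as offensive (1) when it contains "hate" without a negation.
def toy_predict(sentence: str) -> int:
    words = sentence.lower().replace("’", "'").rstrip(".!?").split()
    negated = any(w in {"don't", "not", "never"} for w in words)
    return int("hate" in words and not negated)


def passes_inv(original: str, perturbed: str) -> bool:
    """INV: a label-preserving change must leave the prediction unchanged."""
    return toy_predict(original) == toy_predict(perturbed)


def passes_dir(original: str, perturbed: str) -> bool:
    """DIR: a label-changing edit must change the prediction."""
    return toy_predict(original) != toy_predict(perturbed)


inv_ok = passes_inv("I hate Mondays.", "I hate Mondays!!!")    # punctuation only
inv_typo = passes_inv("I hate Mondays.", "I haet Mondays.")    # typo in the key word
dir_ok = passes_dir("I hate Christians.", "I don't hate Christians.")
print(inv_ok, inv_typo, dir_ok)  # True False True
```

The toy rule survives the punctuation change but fails the invariance check on the typo, which is the kind of weakness an INV test is meant to expose.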


# 1. Build The Model

I have chosen to implement these tests for the model built for the previous tasks of the assignment.

I will use CheckList's built-in tests to perform the chosen diagnostic tests. First, let's build that model.


In [None]:
!pip install checklist
!pip install simpletransformers 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from random import seed
from random import randrange
from collections import Counter
from sklearn.metrics import f1_score, precision_score, recall_score
import seaborn as sn

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/My Drive/Colab Notebooks/')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
train_df = pd.read_csv('/content/drive/My Drive/olid-train.csv')
test_df = pd.read_csv('/content/drive/My Drive/olid-test.csv')

In [None]:
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import logging

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Model configuration
model_args = ClassificationArgs()
model_args.num_train_epochs = 1
model_args.labels_list = [1, 0]
model_args.overwrite_output_dir = True

# The classification model: BERT (bert-base-cased) with the arguments above
model = ClassificationModel("bert", "bert-base-cased", args=model_args)

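Train the model on the 13,240 annotated tweets for offensive language detection provided with the assignment. The first 80% of the rows are used for training and the remaining 20% are held out for evaluation.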

In [None]:
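# Split the data: first 80% of the rows for training, last 20% for evaluation
split_idx = int(0.8 * len(train_df))
train_df_new, eval_df = train_df.iloc[:split_idx], train_df.iloc[split_idx:]

# Train the model (the first column is dropped)
model.train_model(train_df_new.iloc[:, 1:])

# Evaluate the model on the held-out rows
result, model_outputs, wrong_predictions = model.eval_model(eval_df.iloc[:, 1:])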

Next, create a function (`prediction_confidence`) that returns, for each sentence, the probabilities that it is not offensive and that it is offensive. CheckList's tests take a function that returns predictions and confidences wrapped as a tuple `(predictions, confidences)`, and these probabilities serve as the confidences.

In [6]:
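def prediction_confidence(input_list):
    # model.predict returns (predictions, raw_outputs); keep the 0/1 predictions
    predictions = model.predict(input_list)[0]
    print(predictions)
    # Map each hard label to a pseudo-probability for class 1: 0 -> 0.5, 1 -> 1.0
    p1 = np.array([(x + 1) / 2.0 for x in predictions]).reshape(-1, 1)
    p0 = 1 - p1
    # One row per input: [P(class 0), P(class 1)]
    return np.hstack((p0, p1))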
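
Because `prediction_confidence` starts from hard 0/1 labels, every row it returns is either `[0.5, 0.5]` or `[0.0, 1.0]`. If graded confidences are wanted, one option is to apply a softmax to the raw scores that `model.predict` returns as its second element. The sketch below shows only the softmax step on made-up scores. The helper name `softmax_rows` and the example values are assumptions, and which column belongs to which label would need to be checked against `labels_list`.

```python
import math


def softmax_rows(raw_scores):
    """Convert each row of raw scores (logits) into probabilities that sum to 1."""
    probabilities = []
    for row in raw_scores:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(score - shift) for score in row]
        total = sum(exps)
        probabilities.append([e / total for e in exps])
    return probabilities


# Made-up raw scores for two sentences (not real model output).
example_scores = [[2.0, 0.0], [-1.0, 1.0]]
probs = softmax_rows(example_scores)
print(probs)
```

The resulting rows could be converted with `np.array` before being handed to the wrapper, so that they have the same shape as the output of `prediction_confidence`.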

Finally, we wrap the function so that the predictions are added as the first element of the tuple.

In [7]:
from checklist.pred_wrapper import PredictorWrapper

wrapped_pp = PredictorWrapper.wrap_softmax(prediction_confidence)
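
To make the shape of the wrapped output concrete, here is a plain-Python sketch of what such a wrapper does conceptually: it takes the index of the largest probability in each row as the prediction and returns it together with the probabilities. `wrap_probabilities` and `fake_probabilities` are hypothetical names written for this illustration. This is not CheckList's own code.

```python
def wrap_probabilities(prob_fn):
    """Hypothetical stand-in for a softmax wrapper: returns (predictions, confidences)."""
    def wrapped(inputs):
        confidences = prob_fn(inputs)
        # Index of the largest probability in each row; ties go to the first index.
        predictions = [row.index(max(row)) for row in confidences]
        return predictions, confidences
    return wrapped


# Stand-in probability function mimicking the two row shapes that
# prediction_confidence can produce: [0.5, 0.5] for label 0, [0.0, 1.0] for label 1.
def fake_probabilities(inputs):
    return [[0.0, 1.0] if "bad" in text else [0.5, 0.5] for text in inputs]


preds, confs = wrap_probabilities(fake_probabilities)(["a bad example", "a fine example"])
print(preds)  # [1, 0]
print(confs)  # [[0.0, 1.0], [0.5, 0.5]]
```

The tie in `[0.5, 0.5]` resolves to class 0, so the hard labels survive the round trip. Assuming the test summaries print the class-1 confidence next to each sentence, this would also explain why the only values they show are 1.0 and 0.5.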

# Create Tests

In [8]:
from checklist.perturb import Perturb
from checklist.editor import Editor
from checklist.test_types import INV, DIR

editor = Editor()


## 1. Invariance Test (INV)

Label-preserving perturbations are applied to the dataset, and the model's predictions are expected to remain the same.
The following perturbations are used to see how the model performs when the sample data is changed:

*   Adding typos
*   Adding Punctuation
*   Changing names
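
As a rough picture of what a typo perturbation does, the sketch below swaps randomly chosen adjacent characters. This is consistent with the perturbed sentences in the test output, but the function `swap_adjacent_typos` is a hypothetical illustration and not the library's implementation.

```python
import random


def swap_adjacent_typos(text: str, typos: int = 1, seed: int = 0) -> str:
    """Introduce `typos` typos by swapping randomly chosen adjacent characters."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    chars = list(text)
    for _ in range(typos):
        if len(chars) < 2:
            break
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


original = "Mike is an awful person."
perturbed = swap_adjacent_typos(original, typos=3)
print(perturbed)
```

The perturbed sentence keeps exactly the same characters as the original, only in a slightly different order, so a human reader would still assign it the same label.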


In [9]:
# Applies the given perturbation to the data and runs CheckList's invariance test on the result
def inv_test(data: list, method: callable, wrapped_predict: callable):
    if method == Perturb.add_typos:
        # Add 7 typos per sentence
        t = Perturb.perturb(data, Perturb.add_typos, typos=7)
    else:
        t = Perturb.perturb(data, method)
    test = INV(**t)
    test.run(wrapped_predict)
    test.summary()
    test.visual_summary()


This sample data was taken from the Hugging Face `hate_speech_offensive` dataset; the link is provided in the references.

In [10]:
data = [
        '!!! RT @mayasolovely: As a woman you shouldn\'t complain about cleaning up your house. &amp; as a man you should always take the trash out...',
        'Lemmie eat a Oreo &amp; do these dishes." One oreo? Lol',
        '@__BrighterDays: I can not just sit up and HATE on another bitch .. I got too much shit going on!',
        'black bottle &amp; a bad bitch',
        '@ArizonasFinest6: Why the eggplant emoji doe?"y he say she looked like scream lmao',
        'Mike is an awful person, he should be cancelled for life.',
        "&amp; he's got no shit done"
]

import spacy

nlp = spacy.load("en_core_web_sm")
pdata = list(nlp.pipe(data))

The expected result labels are:


```
[0, 0, 1, 1, 0, 1, 1]
```

Now let's add typos to our dataset and run the test with CheckList. Examples of failed tests are shown below.

In [12]:
# Add typos and test invariance
inv_test(data, Perturb.add_typos, wrapped_pp)


INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.


Predicting 14 examples


  0%|          | 0/14 [00:00<?, ?it/s]

  0%|          | 0/2 [00:00<?, ?it/s]

[1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
Test cases:      7
Fails (rate):    3 (42.9%)

Example fails:
1.0 Mike is an awful person, he should be cancelled for life.
0.5 Mike i sana wfulp erosn,h e should be cancellde fro life.

----
1.0 black bottle &amp; a bad bitch
0.5 blcak botlt e&amp; a ba dbtchi

----
1.0 @__BrighterDays: I can not just sit up and HATE on another bitch .. I got too much shit going on!
0.5 @__BrighterDyas:  Ica nno tjust sit u pand HATE on another ibtch .. I got too much shti going on!

----


Now let's change names in our data to see if this will change anything. 

In [16]:

inv_test(pdata, Perturb.change_names, wrapped_pp)

INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.


Predicting 11 examples


  0%|          | 0/11 [00:00<?, ?it/s]

  0%|          | 0/2 [00:00<?, ?it/s]

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Test cases:      1
Fails (rate):    0 (0.0%)


Now let's see whether punctuation changes affect the model's predictions. For this perturbation, the data needs to be converted to spaCy `Doc` objects, which is why `pdata` is used.

In [15]:

inv_test(pdata, Perturb.punctuation, wrapped_pp)


INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.


Predicting 16 examples


  0%|          | 0/16 [00:00<?, ?it/s]

  0%|          | 0/2 [00:00<?, ?it/s]

[1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
Test cases:      7
Fails (rate):    0 (0.0%)



In the first test, the typos cause failures. For example:

```
Example fails:
1.0 Mike is an awful person, he should be cancelled for life.
0.5 Mike i sana wfulp erosn,h e should be cancellde fro life.
```

A likely reason for the failures is that the model and its training data cover too little spelling variation, so the misspelled words do not resemble anything the model saw during training. To mitigate this, we could train the model for more epochs; in addition, we could train on more corpus text covering different writing styles, which would of course require more resources.
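
The reported failure rate can be reproduced from the list of predictions printed during the typo test. The sketch below assumes that the 14 predictions are ordered as consecutive (original, perturbed) pairs; that ordering is an assumption, although it is consistent with the reported count and with the sentences listed as failures.

```python
# Predictions printed during the typo test (copied from the output above).
printed = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
expected_labels = [0, 0, 1, 1, 0, 1, 1]

# Assumption: predictions are ordered as consecutive (original, perturbed) pairs.
pairs = list(zip(printed[0::2], printed[1::2]))

# A test case fails when the prediction changes after the perturbation.
failures = [i for i, (orig, pert) in enumerate(pairs) if orig != pert]
fail_rate = 100 * len(failures) / len(pairs)
print(f"{len(failures)} of {len(pairs)} failed ({fail_rate:.1f}%)")  # 3 of 7 failed (42.9%)

# INV compares the model with itself, not with the expected labels, so a
# prediction that is wrong but stable still passes.
original_preds = [orig for orig, _ in pairs]
stable_but_wrong = [
    i for i, (pred, gold) in enumerate(zip(original_preds, expected_labels))
    if pred != gold and i not in failures
]
print(stable_but_wrong)  # [0]
```

Under this pairing, the first sentence is predicted as offensive both before and after the typos although its expected label is 0. The invariance test counts it as a pass, which is a reminder that a pass here measures stability rather than accuracy.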


In the second and third tests, name changes and punctuation have no effect on the model's predictions, and all test cases pass. This is to be expected: the model checks for offensive language, and the offensiveness of a statement usually does not change with punctuation, although the tone may. The same holds for name changes. The 0% failure rate cannot be taken as a measure of the model's performance, however, because a very small dataset was used; the name-change test in particular covered a single test case.


## 2. Directional Expectation Test (DIR)

Now we expect the output of the model to change after perturbation. For this test, we use negation. The pitfall is that, for offensive language or hate speech, CheckList's negation function rarely produces a change that reverses the direction of a statement, because it can only negate auxiliary verbs and remove some negation words. This usually does not affect the offensiveness of a sentence. For this reason, a simpler hate speech dataset is used to test the directional expectation, and CheckList perturbs it by negating the sentences.

In [17]:
data = [
        'I hate Gabonese.',
        'I hate Taoism.',
        'I don’t hate Indonesian.',
        'I hate Zoroastrianism.',
        'I don’t hate Macedonian.',
        'I don’t hate Agnosticism.',
        'I hate robots.',
        'I don’t hate Mexican.',  # comma added: without it this string merged with the next one
        'I don’t hate god',
        'I hate children.',
]
pdata = list(nlp.pipe(data))


In [20]:
from checklist.expect import Expect

def changed_pred(orig_pred, pred, orig_conf, conf, labels=None, meta=None):
    return pred != orig_pred


expect_fn = Expect.pairwise(changed_pred)

In [21]:
t = Perturb.perturb(pdata, Perturb.add_negation)


In [22]:
test = DIR(**t, expect=expect_fn)
test.run(wrapped_pp)
test.summary()

INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.


Predicting 10 examples


  0%|          | 0/10 [00:00<?, ?it/s]

  0%|          | 0/2 [00:00<?, ?it/s]

[0, 0, 0, 0, 1, 0, 1, 0, 1, 0]
Test cases:      5
Fails (rate):    2 (40.0%)

Example fails:
0.5 I hate Taoism.
0.5 I don't hate Taoism.

----
0.5 I hate Gabonese.
0.5 I don't hate Gabonese.

----


Here, even though a simple dataset was used, the model fails to pick up on some of the words and gives a sentence and its negation the same label.

```
Example:
0.5 I hate Taoism.
0.5 I don't hate Taoism.
```

These mistakes are likely due to limited vocabulary in the training dataset: the model performs poorly when words it has rarely or never seen appear at prediction time. Again, training on more varied datasets could help mitigate this. The model would then learn to pick up on words that are infrequent but crucial for specific cases, hate speech detection in this case.
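
The directional check can be replayed on the predictions printed during the DIR run. The sketch below uses a simplified two-argument version of `changed_pred` and assumes the ten predictions are consecutive (original, negated) pairs; that ordering is an assumption, though it is consistent with the reported result.

```python
def changed_pred(orig_pred: int, pred: int) -> bool:
    """Simplified directional expectation: the prediction must differ after negation."""
    return pred != orig_pred


# Predictions printed during the DIR run (copied from the output above).
printed = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0]

# Assumption: consecutive (original, negated) pairs, one pair per negated sentence.
pairs = list(zip(printed[0::2], printed[1::2]))
passed = [changed_pred(orig, neg) for orig, neg in pairs]
fails = passed.count(False)
print(passed)  # [False, False, True, True, True]
print(f"Fails: {fails} of {len(pairs)} ({100 * fails / len(pairs):.1f}%)")  # Fails: 2 of 5 (40.0%)
```

Under this pairing, both failing cases start from an original prediction of 0: the model did not flag the un-negated sentence as offensive in the first place, so the negation had nothing to flip.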


# References
1. https://huggingface.co/datasets/hate_speech_offensive/viewer/default/train
2. https://github.com/marcotcr/checklist