### Installation

In [None]:
!pip install checklist
!python -m spacy download en_core_web_sm
!pip install transformers


### Dependencies

In [None]:
import sys
import warnings

import numpy as np
import pandas as pd
import spacy
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

import checklist
from checklist.editor import Editor
from checklist.expect import Expect
from checklist.perturb import Perturb
from checklist.test_suite import TestSuite
from checklist.test_types import MFT, INV, DIR

warnings.filterwarnings("ignore")

# spaCy English pipeline; the name, location and punctuation perturbations operate on parsed Docs
nlp = spacy.load('en_core_web_sm')

### Mount Drive

In [None]:
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


### Load Test Set

In [None]:
dataset = pd.read_csv('drive/MyDrive/nlp_project/datasets/senti_test.csv')
data = list(nlp.pipe(dataset['text']))
data[0:5]

[@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right.,
 Reading my kindle2...  Love it... Lee childs is good read.,
 Ok, first assesment of the #kindle2 ...it fucking rocks!!!,
 @kenburbary You'll love your Kindle2. I've had mine for a few months and never looked back. The new big one is huge! No need for remorse! :),
 @mikefish  Fair enough. But i have the Kindle2 and I think it's perfect  :)]

### Models

#### Centralized BERT

In [None]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("drive/MyDrive/nlp_project/models/centralized_bert/",local_files_only=True)
pipe = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer, framework="pt", device=0)
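Both pipelines hard-code `device=0`, which assumes a CUDA GPU is attached to the runtime. A minimal sketch of choosing the `device` argument instead, assuming the `transformers` convention that `device=-1` selects the CPU and a non-negative integer selects a GPU. `pipeline_device` is a hypothetical helper, not part of the original notebook:

```python
def pipeline_device(cuda_available: bool, gpu_index: int = 0) -> int:
    """Hypothetical helper: pick a `device` value for transformers.pipeline().

    -1 selects the CPU; a non-negative integer selects that CUDA GPU.
    """
    return gpu_index if cuda_available else -1


# In the notebook this would be called as pipeline_device(torch.cuda.is_available())
# and passed as device=... to pipeline(...).
print(pipeline_device(True))   # 0
print(pipeline_device(False))  # -1
```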

#### Federated BERT

In [None]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("drive/MyDrive/nlp_project/models/federated_bert/",local_files_only=True)
federated_pipe = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer, framework="pt", device=0)

### Checklist

In [None]:
suite = TestSuite()
editor = Editor()

#### Minimum Functionality Test (MFT) - Negation

In [None]:
', '.join(editor.suggest('This is not {a:mask} {thing}.', thing=['book', 'movie', 'show', 'game'])[:30])

'easy, academic, ordinary, educational, average, enjoyable, entertaining, interesting, old, independent, good, art, exciting, original, ideal, innocent, excellent, adventure, amateur, awards, actual, introductory, engaging, obscure, amazing, bad, experimental, accessible, awful, great'

In [None]:
pos = ['good', 'enjoyable', 'exciting', 'excellent', 'amazing', 'great', 'engaging']
neg = ['bad', 'terrible', 'awful', 'horrible']

In [None]:
ret = editor.template('This is not {a:pos} {mask}.', pos=pos, labels=0, save=True, nsamples=200)
ret += editor.template('This is not {a:neg} {mask}.', neg=neg, labels=1, save=True, nsamples=200)

In [None]:
print(ret.data[0])
print(ret.data[201])

This is not an amazing option.
This is not a terrible analogy.
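`editor.template` expands the placeholder combinations, keeps a random sample of `nsamples` of them, and attaches the given label to every sentence; the `{a:...}` form prepends the matching indefinite article. The toy, stdlib-only sketch below mimics that expansion for the negation template so the labelling is easy to see. It is not CheckList's implementation: `expand_negations`, the noun list standing in for `{mask}` (which the library fills from language-model suggestions), and the vowel-letter article rule are all assumptions.

```python
import itertools
import random


def article(word: str) -> str:
    # Simplified rule (assumption): "an" before a vowel letter, otherwise "a".
    return "an" if word[:1].lower() in "aeiou" else "a"


def expand_negations(adjectives, nouns, label, nsamples=None, seed=0):
    """Toy stand-in for editor.template('This is not {a:adj} {noun}.', ...)."""
    sentences = [
        f"This is not {article(adj)} {adj} {noun}."
        for adj, noun in itertools.product(adjectives, nouns)
    ]
    if nsamples is not None and nsamples < len(sentences):
        sentences = random.Random(seed).sample(sentences, nsamples)
    # Every generated sentence carries the same expected label.
    return sentences, [label] * len(sentences)


pos = ['good', 'enjoyable', 'exciting', 'excellent', 'amazing', 'great', 'engaging']
neg = ['bad', 'terrible', 'awful', 'horrible']
nouns = ['option', 'analogy', 'movie']  # hypothetical stand-ins for {mask}

data, labels = expand_negations(pos, nouns, label=0)       # negated positive -> 0
more, more_labels = expand_negations(neg, nouns, label=1)  # negated negative -> 1
print(len(data), len(more))  # 21 12
print(data[12])              # This is not an amazing option.
print(more[4])               # This is not a terrible analogy.
```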


In [None]:
test = MFT(ret.data, labels=ret.labels, name='Simple negation',
           capability='Negation', description='Very simple negations.')

In [None]:
suite.add(test, 'simple negations: negative', 'Negation', 'Very simple negations of positive and negative statements')
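An MFT example fails when the model's predicted label differs from the expected label attached by `editor.template`. A minimal sketch of that bookkeeping, assuming predictions and labels are plain integer lists; `mft_failures` is a hypothetical helper, since CheckList does this internally during `suite.run`:

```python
from typing import List, Sequence, Tuple


def mft_failures(preds: Sequence[int], labels: Sequence[int]) -> Tuple[List[int], float]:
    """Hypothetical helper: indices of failing examples and the failure rate."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    failed = [i for i, (p, y) in enumerate(zip(preds, labels)) if p != y]
    rate = len(failed) / len(labels) if labels else 0.0
    return failed, rate


# 'This is not an amazing option.' is expected negative (0);
# 'This is not a terrible analogy.' is expected positive (1).
failed, rate = mft_failures(preds=[1, 1], labels=[0, 1])
print(failed, rate)  # [0] 0.5
```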

#### Invariance Tests (INV)

##### Perturbing Names

In [None]:
t = Perturb.perturb(data, Perturb.change_names)
test = INV(**t,name='Perturbing Names',capability='Robustness')

In [None]:
t.data[0]

['Reading my kindle2...  Love it... Lee childs is good read.',
 'Reading my kindle2...  Love it... Charles childs is good read.',
 'Reading my kindle2...  Love it... Juan childs is good read.',
 'Reading my kindle2...  Love it... Thomas childs is good read.',
 'Reading my kindle2...  Love it... Julian childs is good read.',
 'Reading my kindle2...  Love it... Stephen childs is good read.',
 'Reading my kindle2...  Love it... David childs is good read.',
 'Reading my kindle2...  Love it... Isaac childs is good read.',
 'Reading my kindle2...  Love it... Jason childs is good read.',
 'Reading my kindle2...  Love it... Jeffrey childs is good read.',
 'Reading my kindle2...  Love it... Nathaniel childs is good read.']
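Each element of `t.data` is a list whose first entry is the original tweet, followed by copies with the detected first name replaced (ten of them here). `Perturb.change_names` locates names with spaCy's named-entity recognizer; the sketch below skips NER and takes the name to replace as an argument, so `swap_entity` and its inputs are hypothetical:

```python
import re
from typing import List, Sequence


def swap_entity(text: str, entity: str, replacements: Sequence[str]) -> List[str]:
    """Hypothetical helper: [original, variant_1, ...] with whole-word `entity` swapped."""
    pattern = re.compile(r"\b" + re.escape(entity) + r"\b")
    # A function replacement avoids backslash-escape surprises in the new name.
    variants = [pattern.sub(lambda _m, new=new: new, text) for new in replacements if new != entity]
    return [text] + variants


case = swap_entity(
    "Reading my kindle2...  Love it... Lee childs is good read.",
    "Lee",
    ["Charles", "Juan", "Thomas"],
)
for line in case:
    print(line)
```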

In [None]:
suite.add(test,'Perturbing Names','Robustness','changing names')
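In an INV test the prediction on the original text is the reference: a case fails if a perturbed copy changes the prediction. The sketch below is label-based, with an optional `tolerance` on the probability of the original label; that tolerance is an assumption modelled loosely on the threshold CheckList's default INV expectation uses, so treat `inv_failures` (hypothetical) as an approximation of what `suite.run` reports, not its exact rule:

```python
from typing import List, Sequence, Tuple

# One prediction = (label, [P(negative), P(positive)])
Prediction = Tuple[int, Sequence[float]]


def inv_failures(cases: Sequence[Sequence[Prediction]], tolerance: float = 0.0) -> List[int]:
    """Hypothetical helper: indices of INV cases where a perturbation changes the prediction.

    cases[i][0] is the original example; cases[i][1:] are its perturbations.
    A perturbation passes if it keeps the original label, or (assumption) if the
    probability of the original label moves by no more than `tolerance`.
    """
    failed = []
    for i, case in enumerate(cases):
        orig_label, orig_probs = case[0]
        for label, probs in case[1:]:
            if label == orig_label:
                continue
            if abs(probs[orig_label] - orig_probs[orig_label]) <= tolerance:
                continue
            failed.append(i)
            break  # one changed perturbation is enough to fail the case
    return failed


cases = [
    [(1, [0.1, 0.9]), (1, [0.2, 0.8]), (1, [0.05, 0.95])],  # stable: passes
    [(0, [0.9, 0.1]), (1, [0.0, 1.0]), (1, [0.4, 0.6])],    # label flips: fails
]
print(inv_failures(cases))  # [1]
```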

In [None]:
ret = editor.template('{male} reads kindle. He likes to read.... loves it!', labels=1, save=True, nsamples=200)
ret += editor.template('{female} reads kindle. She likes to read.... loves it!', labels=1, save=True, nsamples=200)
ret += editor.template('{female1} reads kindle. {female2} likes to read.... loves it!', labels=1, save=True, nsamples=200)
ret += editor.template('{female1} reads kindle. {female2} hates to read.', labels=0, save=True, nsamples=200)

In [None]:
t = Perturb.perturb(list(nlp.pipe(list(ret.data))), Perturb.change_names)
test = INV(**t,name='Perturbing Additional Names',capability='Robustness')

In [None]:
t.data[0]

['Jack reads kindle. He likes to read.... loves it!',
 'Alex reads kindle. He likes to read.... loves it!',
 'Henry reads kindle. He likes to read.... loves it!',
 'Jordan reads kindle. He likes to read.... loves it!',
 'Joshua reads kindle. He likes to read.... loves it!',
 'Lucas reads kindle. He likes to read.... loves it!',
 'Connor reads kindle. He likes to read.... loves it!',
 'Austin reads kindle. He likes to read.... loves it!',
 'Isaac reads kindle. He likes to read.... loves it!',
 'Jack reads kindle. He likes to read.... loves it!',
 'Jason reads kindle. He likes to read.... loves it!']

In [None]:
suite.add(test,'Perturbing Additional Names','Robustness','changing more names')

##### Perturbing Locations

In [None]:
t = Perturb.perturb(data, Perturb.change_location)
test = INV(**t,name='Perturbing Locations',capability='Robustness')

In [None]:
t.data[0]

["glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in San Francisco wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in New York wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in Gilbert wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in New Orleans wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in Nashville-Davidson wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in Bakersfield wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in Fremont wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in Winston-Salem wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in Lincoln wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in Albuquerque wtf",
 "glad i didnt do Bay to Breakers today, it's 1000 freaking degrees in Charlotte wtf"]

In [None]:
suite.add(test,'Perturbing Locations','Robustness','changing locations')

##### Adding Typos

In [None]:
t = Perturb.perturb(dataset['text'], Perturb.add_typos)
test = INV(**t,name='Add Typos',capability='Robustness')

In [None]:
t.data[0]

['@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right.',
 '@stellargirl I loooooooovvvvvveee my Kidnle2. Not that the DX is cool, but the 2 is fantastic in its own right.']

In [None]:
suite.add(test,'Add Typos','Robustness','adding typos')

In [None]:
t = Perturb.perturb(dataset['text'], Perturb.add_typos,typos=2)
test = INV(**t,name='Add 2 Typos',capability='Robustness')

In [None]:
t.data[0]

['@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right.',
 '@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantsatic in its ownr ight.']
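Both outputs are consistent with typos made by swapping neighbouring characters (`Kindle2` to `Kidnle2`, `fantastic` to `fantsatic`, `own right` to `ownr ight`). A stdlib sketch of that idea, assuming one swap per requested typo at a random position; `swap_adjacent` is hypothetical and its random choices will not reproduce `Perturb.add_typos`:

```python
import random


def swap_adjacent(text: str, typos: int = 1, seed=None) -> str:
    """Hypothetical helper: add `typos` typos by swapping a character with its right neighbour."""
    if len(text) < 2:
        return text
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(typos):
        i = rng.randrange(len(chars) - 1)  # left character of the swapped pair
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


original = "the 2 is fantastic in its own right."
print(swap_adjacent(original, typos=2, seed=13))
```

Because only positions change, the perturbed text keeps the same characters and length as the original, which is what makes this a typo rather than a content change.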

In [None]:
suite.add(test,'Add 2 Typos','Robustness','adding 2 typos')

##### Punctuation

In [None]:
t = Perturb.perturb(data, Perturb.punctuation)
test = INV(**t,name='Punctuation',capability='Robustness')

In [None]:
t.data[0:2]

[['@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right.',
  '@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right'],
 ['Reading my kindle2...  Love it... Lee childs is good read.',
  'Reading my kindle2...  Love it... Lee childs is good read']]
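The pairs above differ only in the sentence-final period. A simplified sketch of that kind of perturbation, assuming the rule "strip trailing punctuation if present, otherwise append a period"; `toggle_final_punctuation` and `punctuation_pair` are hypothetical, and `Perturb.punctuation` (which works on spaCy `Doc`s) may produce other variants as well:

```python
import string
from typing import List


def toggle_final_punctuation(text: str) -> str:
    """Hypothetical helper: drop trailing punctuation if present, else add a period."""
    stripped = text.rstrip()
    without = stripped.rstrip(string.punctuation)
    if without != stripped:
        # Note: this also removes emoticons such as ':)' at the end of a tweet.
        return without.rstrip()
    return stripped + "."


def punctuation_pair(text: str) -> List[str]:
    """[original, perturbed], mirroring one element of t.data."""
    return [text, toggle_final_punctuation(text)]


print(punctuation_pair('Reading my kindle2...  Love it... Lee childs is good read.'))
```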

In [None]:
suite.add(test,'Punctuation','Robustness','strip and / or add punctuation')

##### Contractions

In [None]:
t = Perturb.perturb(dataset['text'], Perturb.contractions)
test = INV(t.data)

In [None]:
t.data[0]

["@kenburbary You'll love your Kindle2. I've had mine for a few months and never looked back. The new big one is huge! No need for remorse! :)",
 '@kenburbary You will love your Kindle2. I have had mine for a few months and never looked back. The new big one is huge! No need for remorse! :)']
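The pair above shows the expansion direction (`You'll` to `You will`, `I've` to `I have`). A minimal regex sketch of expansion only, assuming a deliberately tiny, hypothetical lookup table `CONTRACTIONS` and straight apostrophes; `Perturb.contractions` also contracts expanded forms and uses its own word list:

```python
import re

# Hypothetical, deliberately tiny table; a real one would cover many more forms.
CONTRACTIONS = {
    "you'll": "you will",
    "i've": "i have",
    "it's": "it is",
    "don't": "do not",
}

# Longest keys first so overlapping forms match greedily.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(CONTRACTIONS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Hypothetical helper: expand the contractions listed in CONTRACTIONS."""
    def repl(match):
        word = match.group(0)
        expansion = CONTRACTIONS[word.lower()]
        # Keep the capitalisation of the first letter ("You'll" -> "You will").
        if word[0].isupper():
            expansion = expansion[0].upper() + expansion[1:]
        return expansion

    return _PATTERN.sub(repl, text)


print(expand_contractions("@kenburbary You'll love your Kindle2. I've had mine for a few months"))
```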

In [None]:
suite.add(test, 'Contractions', 'Robustness', 'Contract or expand contractions, e.g. What is -> What\'s')

### Running Tests

#### Centralized BERT

In [None]:
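def pred_and_conf(data):
    # The pipeline returns only the top label and its score for each input
    raw_preds = pipe(data)
    # Map label strings to class ids (assumes the model's labels are "negative"/"positive")
    preds = np.array([0 if p["label"] == "negative" else 1 for p in raw_preds])
    # Rebuild the two-class probability matrix [P(negative), P(positive)] from the top score
    pp = np.array([[p["score"], 1 - p["score"]] if p["label"] == "negative" else [1 - p["score"], p["score"]] for p in raw_preds])
    return preds, pp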

In [None]:
pred_and_conf(['good','bad'])

(array([1, 0]),
 array([[0.09076554, 0.90923446],
        [0.9978283 , 0.0021717 ]]))
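The Federated BERT section defines `pred_and_conf` again, changing only the pipeline it calls. One way to avoid the duplication is a small factory; `make_pred_and_conf` is a hypothetical helper, shown with a stub (`fake_pipe`, also hypothetical) in place of a `transformers` pipeline. It returns plain lists, so in the notebook you would convert them with `np.array` as the original function does, and it keeps the notebook's assumption that the labels are the strings `'negative'` and `'positive'`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

RawPrediction = Dict[str, object]  # e.g. {"label": "negative", "score": 0.99}


def make_pred_and_conf(pipe_fn: Callable[[List[str]], List[RawPrediction]]):
    """Hypothetical factory: wrap a pipeline-like callable as a CheckList prediction function."""

    def pred_and_conf(data: Sequence[str]) -> Tuple[List[int], List[List[float]]]:
        preds, probs = [], []
        for p in pipe_fn(list(data)):
            score = float(p["score"])
            if p["label"] == "negative":
                preds.append(0)
                probs.append([score, 1.0 - score])  # [P(negative), P(positive)]
            else:
                preds.append(1)
                probs.append([1.0 - score, score])
        return preds, probs

    return pred_and_conf


def fake_pipe(texts: List[str]) -> List[RawPrediction]:
    # Hypothetical stub standing in for pipeline("sentiment-analysis", ...)
    return [
        {"label": "negative", "score": 0.99} if "bad" in t else {"label": "positive", "score": 0.9}
        for t in texts
    ]


centralized_fn = make_pred_and_conf(fake_pipe)
print(centralized_fn(["good", "bad"]))
```

With the real pipelines, `make_pred_and_conf(pipe)` and `make_pred_and_conf(federated_pipe)` would replace the two hand-written copies.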

In [None]:
suite.run(pred_and_conf)

Running simple negations: negative
Predicting 400 examples
Running Perturbing Names
Predicting 374 examples
Running Perturbing Additional Names
Predicting 6985 examples
Running Perturbing Locations
Predicting 308 examples
Running Add Typos
Predicting 996 examples
Running Add 2 Typos
Predicting 996 examples
Running Punctuation
Predicting 1152 examples
Running Contractions
Predicting 210 examples
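The example counts in this log follow from the test sizes reported by `suite.summary()`. Each INV case holds the original text plus its perturbations: 1 + 10 texts for the name and location tests (as in the `t.data[0]` samples above) and 1 + 1 for a typo test, while the negation MFT is simply 200 + 200 templated sentences. A quick arithmetic check using only numbers printed in this notebook:

```python
# name: (test cases from suite.summary(), texts per case, "Predicting N examples" from the log)
checks = {
    "Perturbing Names": (34, 1 + 10, 374),
    "Perturbing Additional Names": (635, 1 + 10, 6985),
    "Perturbing Locations": (28, 1 + 10, 308),
    "Add Typos": (498, 1 + 1, 996),
}

for name, (cases, per_case, logged) in checks.items():
    total = cases * per_case
    status = "ok" if total == logged else "mismatch"
    print(f"{name}: {cases} x {per_case} = {total} (log: {logged}) {status}")
```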


In [None]:
suite.summary()

Robustness

Perturbing Names
Test cases:      34
Fails (rate):    1 (2.9%)

Example fails:
0.1 Lawson to head Newedge Hong Kong http://bit.ly/xLQSD #business #china
1.0 Jesus to head Newedge Hong Kong http://bit.ly/xLQSD #business #china
0.6 Nathaniel to head Newedge Hong Kong http://bit.ly/xLQSD #business #china

----


Perturbing Additional Names
Test cases:      635
Fails (rate):    0 (0.0%)


Perturbing Locations
Test cases:      28
Fails (rate):    6 (21.4%)

Example fails:
0.2 myfoxdc Barrie Students Back from Trip to China: A Silver Spring high school's class trip to China has en.. http://tinyurl.com/nlhqba
0.8 myfoxdc Barrie Students Back from Trip to Ethiopia: A Silver Spring high school's class trip to Ethiopia has en.. http://tinyurl.com/nlhqba
0.7 myfoxdc Barrie Students Back from Trip to Sudan: A Silver Spring high school's class trip to Sudan has en.. http://tinyurl.com/nlhqba

----
0.8 Trouble in Iran, I see. Hmm. Iran. Iran so far away. #flockofseagullsweregeopoliticall

#### Federated BERT

In [None]:
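def pred_and_conf(data):
    # Same wrapper as for the centralized model, but calling the federated pipeline
    raw_preds = federated_pipe(data)
    # Map label strings to class ids (assumes the model's labels are "negative"/"positive")
    preds = np.array([0 if p["label"] == "negative" else 1 for p in raw_preds])
    # Rebuild the two-class probability matrix [P(negative), P(positive)] from the top score
    pp = np.array([[p["score"], 1 - p["score"]] if p["label"] == "negative" else [1 - p["score"], p["score"]] for p in raw_preds])
    return preds, pp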

In [None]:
pred_and_conf(['good','bad'])

(array([1, 0]),
 array([[0.1482026 , 0.8517974 ],
        [0.99288392, 0.00711608]]))

In [None]:
suite.run(pred_and_conf,overwrite=True)

Running simple negations: negative
Predicting 400 examples
Running Perturbing Names
Predicting 374 examples
Running Perturbing Additional Names
Predicting 6985 examples
Running Perturbing Locations
Predicting 308 examples
Running Add Typos
Predicting 996 examples
Running Add 2 Typos
Predicting 996 examples
Running Punctuation
Predicting 1152 examples
Running Contractions
Predicting 210 examples


In [None]:
suite.summary()

Robustness

Perturbing Names
Test cases:      34
Fails (rate):    1 (2.9%)

Example fails:
0.4 Lawson to head Newedge Hong Kong http://bit.ly/xLQSD #business #china
0.9 Jesus to head Newedge Hong Kong http://bit.ly/xLQSD #business #china
0.6 Timothy to head Newedge Hong Kong http://bit.ly/xLQSD #business #china

----


Perturbing Additional Names
Test cases:      635
Fails (rate):    0 (0.0%)


Perturbing Locations
Test cases:      28
Fails (rate):    5 (17.9%)

Example fails:
0.2 Heading to San Francisco
0.6 Heading to Chesapeake

----
0.2 is in San Francisco at Bay to Breakers.
0.6 is in Irvine at Bay to Breakers.
0.5 is in Fort Worth at Bay to Breakers.

----
0.5 Rocawear Heads to China, Building 300 Stores  - http://tinyurl.com/nofet3
0.6 Rocawear Heads to Morocco, Building 300 Stores  - http://tinyurl.com/nofet3

----


Add Typos
Test cases:      498
Fails (rate):    32 (6.4%)

Example fails:
0.9 San Francisco today.  Any suggestions?
0.4 San Francisco today.  An ysuggestions?

--