# SRL negation experiments

In this notebook I carry out experiments to test whether two Semantic Role Labelling (SRL) systems correctly identify the arguments of a predicate (agents, patients, instruments, locations and manner modifiers) in affirmative and negated sentences. This code is based on code provided by Pia Sommerauer.

In this notebook I load two models, the AllenNLP SRL model and the AllenNLP SRL BERT model, create a variety of test cases with CheckList, and evaluate both models on them. All test sentences are stored in the JSON file specified by the `test_sents_path` variable. The SRL predictions are stored in the JSON file specified by `srl_pred_path`, and the SRL BERT predictions in the file specified by `bert_pred_path`.

### Negation - invariance test
Each test pairs an affirmative template with a negated counterpart; the argument under test should receive the same role label in both versions.

* Agent: '`first_name` did `activity`' vs. '`first_name` did not do `activity`'
* Patient: '`name1` hit `name2` yesterday' vs. '`name1` didn't hit `name2` yesterday'
* Instrument: '`name1` killed `name2` with a `instrument`' vs. '`name1` shouldn't kill `name2` with a `instrument`'
* Location: '`name1` hit `name2` `location`' vs. '`name1` wouldn't hit `name2` `location`'
* Manner: '`name1` stopped the ball `manner`' vs. '`name1` could not stop the ball `manner`'
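The templates above are expanded by CheckList's `Editor.template`, which, as I understand it, fills each `{placeholder}` from the matching keyword list (or from a built-in lexicon such as `first_name`) and, when `nsamples` is given, returns a random sample of the possible combinations. The sketch below imitates that expansion with the standard library only, to show how one affirmative/negated pair is generated. `fill_template` is a hypothetical helper, not CheckList's implementation, and the short word lists are illustrative.

```python
import itertools
import random
import string


def fill_template(template, nsamples, seed=0, **lists):
    """Hypothetical helper: fill each {key} in `template` from lists[key] and return
    up to `nsamples` distinct sentences sampled from all combinations."""
    # placeholder names in order of appearance, without duplicates
    keys = list(dict.fromkeys(
        field for _, field, _, _ in string.Formatter().parse(template) if field))
    combos = list(itertools.product(*(lists[k] for k in keys)))
    rng = random.Random(seed)
    picked = rng.sample(combos, min(nsamples, len(combos)))
    return [template.format(**dict(zip(keys, combo))) for combo in picked]


names = ["Carolyn", "Keith"]
base = fill_template("{name1} hit {name2} yesterday.", 3, name1=names, name2=names)
negated = fill_template("{name1} {negation} hit {name2} yesterday.", 3,
                        name1=names, negation=["didn't", "would not"], name2=names)
print(base)
print(negated)
```

As in the notebook, the affirmative and negated templates are sampled independently, so a negated sentence is not necessarily the exact counterpart of an affirmative one.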


```python
from allennlp_models.pretrained import load_predictor
```

```python
import checklist
from checklist.editor import Editor
from checklist.perturb import Perturb
from checklist.test_types import MFT, INV, DIR
from checklist.expect import Expect
```

```python
from checklist.pred_wrapper import PredictorWrapper
```

```python
import json
import logging

# suppress all log messages below CRITICAL (e.g. the model-loading logs)
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)
```

```python
# helper functions used below: found_arg*, format_srl_verb*, predict_and_store and store_data
from utils_functions import *
```

### Load the models 

```python
# load the regular SRL model
srl_predictor = load_predictor('structured-prediction-srl')
# load the SRL BERT model
srlbert_predictor = load_predictor('structured-prediction-srl-bert')
```

```python
# functions that return one model prediction per sentence in a list
# (added by Pia, edited by Goya)

def predict_srl(data):
    return [srl_predictor.predict(d) for d in data]


def predict_srlbert(data):
    return [srlbert_predictor.predict(d) for d in data]

# wrap the functions so they also return confidence scores, as CheckList tests expect
predict_srl = PredictorWrapper.wrap_predict(predict_srl)
predict_srlbert = PredictorWrapper.wrap_predict(predict_srlbert)
```
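CheckList runs a test with a function that returns predictions together with confidence scores. As far as I can tell from the CheckList source, `PredictorWrapper.wrap_predict` adapts a function that only returns predictions by pairing each prediction with a confidence of 1. A minimal stand-alone sketch of that behaviour, in which `wrap_predict_sketch` and `dummy_predict` are hypothetical stand-ins (CheckList itself uses a NumPy array of ones rather than a list):

```python
def wrap_predict_sketch(predict_fn):
    """Hypothetical re-implementation: turn a function that returns only predictions
    into one that returns (predictions, confidences), with every confidence set to 1.0."""
    def wrapped(data):
        preds = predict_fn(data)
        return preds, [1.0] * len(preds)
    return wrapped


def dummy_predict(sentences):
    # hypothetical stand-in for calling srl_predictor.predict on each sentence
    return [{"words": s.split(), "verbs": []} for s in sentences]


wrapped_predict = wrap_predict_sketch(dummy_predict)
preds, confs = wrapped_predict(["Carolyn does the dishes", "Keith does the dishes"])
print(len(preds), confs)  # 2 [1.0, 1.0]
```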

### Define output file paths

```python
# lists that collect the test sentences and model predictions of all test cases
test_data = []
SRLBERT_predictions = []
SRL_predictions = []
```

```python
# paths to the output files
test_sents_path = './JSON_test_and_predict_files/test_data_negation.json'
bert_pred_path = './JSON_test_and_predict_files/BERT_predictions_negation.json'
srl_pred_path = './JSON_test_and_predict_files/SRL_predictions_negation.json'

# name of the capability tested in this notebook
capability = 'negation'
```

### Load CheckList expectations (functions defined in `utils_functions`)
Wrap the functions that check whether an argument is correctly identified as CheckList expectations.

```python
# Expect.single applies the check to each test case individually
expect_arg0_verb0 = Expect.single(found_arg0_verb0)
expect_arg0_verb1 = Expect.single(found_arg0_verb1)
expect_arg1_verb0 = Expect.single(found_arg1_verb0)
expect_arg1_verb1 = Expect.single(found_arg1_verb1)
expect_arg2_verb0 = Expect.single(found_arg2_verb0)
expect_arg2_verb1 = Expect.single(found_arg2_verb1)
expect_argloc_verb0 = Expect.single(found_argloc_verb0)
expect_argloc_verb1 = Expect.single(found_argloc_verb1)
expect_argmnr_verb0 = Expect.single(found_arg_manner_verb0)
expect_argmnr_verb1 = Expect.single(found_arg_manner_verb1)
```
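The `found_*` functions live in `utils_functions` and are not shown here. Purely as an illustration, a check like `found_arg1_verb0` could read the AllenNLP SRL output (a dict with `words` and one `verbs` entry per predicate, each carrying BIO `tags`), pick one predicate frame, and compare the words tagged with a role against the expected filler. `role_span` and `found_role` are hypothetical helpers; their signatures are assumptions, and the real functions passed to `Expect.single` receive CheckList's `(x, pred, conf, label, meta)` arguments instead. A `verb1` variant would presumably select the second frame (`frame_idx=1`), e.g. when the auxiliary of a negated sentence gets a frame of its own; that is also an assumption.

```python
def role_span(pred, frame_idx, role):
    """Return the words tagged with `role` (e.g. 'ARG1') in predicate frame `frame_idx`
    of an AllenNLP-style SRL prediction, or None if that frame does not exist."""
    frames = pred["verbs"]
    if frame_idx >= len(frames):
        return None
    tags = frames[frame_idx]["tags"]
    return " ".join(w for w, t in zip(pred["words"], tags) if t in (f"B-{role}", f"I-{role}"))


def found_role(pred, frame_idx, role, expected):
    """Hypothetical check: True only if exactly `expected` carries `role` in the chosen frame."""
    return role_span(pred, frame_idx, role) == expected


# hand-built predictions: `wrong` mirrors the failure printed for "Chris hit Norman yesterday.",
# `correct` is the labelling the patient test expects
words = ["Chris", "hit", "Norman", "yesterday", "."]
correct = {"words": words, "verbs": [{"verb": "hit", "tags": ["B-ARG0", "B-V", "B-ARG1", "B-ARGM-TMP", "O"]}]}
wrong = {"words": words, "verbs": [{"verb": "hit", "tags": ["B-ARG1", "B-V", "B-ARG2", "B-ARGM-TMP", "O"]}]}

print(found_role(correct, 0, "ARG1", "Norman"))  # True
print(found_role(wrong, 0, "ARG1", "Norman"))    # False: Norman was labelled ARG2
```

Because all words with the role are joined, a prediction with two spans of the same role (such as `[ARG0: Carolyn] [V: does] [ARG0: the dishes]`) also fails this kind of check.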

### Load wordlists to use in sample sentences

```python
# initialize the editor object used to fill the templates
editor = Editor()

# negation words
neg = ["did not", "would not", "should not", "could not", "does not", "doesn't", "didn't", "wouldn't", "shouldn't", "couldn't"]
# activities (present tense) and their base forms for the negated templates
activity = ['does the dishes', 'attends the party', 'prepares dinner', 'makes breakfast', 'hosts the event', 'takes the picture', 'watches tv all day']
neg_activity = ['do the dishes', 'attend the party', 'prepare dinner', 'make breakfast', 'host the event', 'take the picture', 'watch tv all day']

# verbs for the patient and location tests (past tense) and their base forms
patient_verbs = ['kissed', 'killed', 'hurt', 'touched', 'ignored', 'silenced', 'hit', 'greeted']
patient_neg_verbs = ['kiss', 'kill', 'hurt', 'touch', 'ignore', 'silence', 'hit', 'greet']
# UK female and male first names from CheckList's lexicons
english_firstname = editor.lexicons.female_from.United_Kingdom + editor.lexicons.male_from.United_Kingdom
# instruments
instrument = ['knife', 'stone', 'bottle', 'table', 'chair', 'fist', 'rollerblade', 'shoelace', 'discoball', 'fork', 'racket']
# locations
locations = ['in the kitchen', 'in the hallway', 'at the busstop', 'at university', 'on the street', 'in the supermarket', 'on the balcony', 'at the theatre', 'in the museum', 'on the roof']
# manner adverbs and the verbs they modify (past tense and base form)
manner_adv = ['gently', 'softly', 'powerfully', 'wisely', 'quickly', 'slowly', 'patiently', 'tactically', 'generously', 'blatantly', 'kindly']
manner_verbs = ['hit', 'kicked', 'stopped', 'touched', 'missed', 'smashed']
manner_verbs_neg = ['hit', 'kick', 'stop', 'touch', 'miss', 'smash']
```

## Tests
### Agent recognition invariance

```python
# create samples
testcase_name = 'agent_base'
t = editor.template("{first_name} {activity}", activity=activity, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_arg0_verb0, format_srl_verb0,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    4 (4.0%)

Example fails:
[ARG0: Carolyn] [V: does] [ARG0: the dishes]
----
[ARG0: Keith] [V: does] [ARG0: the dishes]
----
[ARGM-DIS: Dan] [V: does] [ARG1: the dishes]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
```


```python
# create samples
testcase_name = 'agent_negated'
t = editor.template("{first_name} {neg} {activity}", activity=neg_activity, neg=neg, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_arg0_verb1, format_srl_verb1,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    2 (2.0%)

Example fails:
[ARGM-DIS: Dan] did [ARGM-NEG: n't] [V: do] [ARG1: the dishes]
----
[ARGM-DIS: Andrea] did [ARGM-NEG: n't] [V: host] [ARG1: the event]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    1 (1.0%)

Example fails:
[ARGM-DIS: Andrea] did [ARGM-NEG: n't] [V: host] [ARG1: the event]
----
```
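The failure printouts follow the bracketed format of the `description` field in AllenNLP's SRL output: each labelled span is shown as `[ROLE: words]`, and tokens without a label (such as `did` in the negated sentences) are printed bare. The `format_srl_verb*` helpers are not shown; a minimal sketch that rebuilds such a string from the BIO tags of one frame, where `bio_to_brackets` is a hypothetical name:

```python
def bio_to_brackets(words, tags):
    """Render one SRL frame as '[ROLE: span] ...', leaving 'O' tokens unbracketed."""
    pieces = []
    role, span = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and tag[2:] == role:
            span.append(word)              # continue the open span
            continue
        if role is not None:               # any other tag closes the open span
            pieces.append(f"[{role}: {' '.join(span)}]")
            role, span = None, []
        if tag == "O":
            pieces.append(word)            # unlabelled token, printed bare
        else:
            role, span = tag[2:], [word]   # B- (or a stray I-) opens a new span
    if role is not None:
        pieces.append(f"[{role}: {' '.join(span)}]")
    return " ".join(pieces)


print(bio_to_brackets(
    ["Andrea", "did", "n't", "host", "the", "event"],
    ["B-ARGM-DIS", "O", "B-ARGM-NEG", "B-V", "B-ARG1", "I-ARG1"]))
# [ARGM-DIS: Andrea] did [ARGM-NEG: n't] [V: host] [ARG1: the event]
```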


### Patient recognition

```python
# create samples
testcase_name = 'patient_base'
t = editor.template("{first_name} {verb} {first} yesterday.", first=english_firstname, verb=patient_verbs, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_arg1_verb0, format_srl_verb0,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    3 (3.0%)

Example fails:
[ARG1: Chris] [V: hit] [ARG2: Norman] [ARGM-TMP: yesterday] .
----
[ARG0: Philip] [V: hurt] [ARGM-TMP: Judith] [ARGM-TMP: yesterday] .
----
[ARG1: Annie] [V: greeted] [ARG2: Pamela] [ARGM-TMP: yesterday] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
```


```python
# create samples
testcase_name = 'patient_negated'
t = editor.template("{first_name} {neg} {verb} {first}.", first=english_firstname, neg=neg, verb=patient_neg_verbs, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_arg1_verb1, format_srl_verb1,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    1 (1.0%)

Example fails:
[ARG0: Albert] [ARGM-MOD: would] [ARGM-NEG: n't] [V: kiss] [ARGM-EXT: Norman] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
```


### Instrument recognition

```python
# create samples
testcase_name = 'instrument_base'
t = editor.template("{first_name} killed {firstname} with a {instrument}.", instrument=instrument, firstname=english_firstname, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_arg2_verb0, format_srl_verb0,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    47 (47.0%)

Example fails:
[ARG0: Jim] [V: killed] [ARG1: Albert] [ARGM-MNR: with a knife] .
----
[ARG0: Sandra] [V: killed] [ARG1: Laura] [ARGM-MNR: with a bottle] .
----
[ARG0: Al] [V: killed] [ARG1: Sara] [ARGM-MNR: with a discoball] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    32 (32.0%)

Example fails:
[ARG0: Helen] [V: killed] Rose with a discoball .
----
[ARG0: Andrew] [V: killed] [ARG1: Tom] [ARGM-MNR: with a table] .
----
[ARG0: Melissa] [V: killed] [ARG1: Rose] with a table .
----
```


```python
# create samples
testcase_name = 'instrument_negated'
t = editor.template("{first_name} {neg} kill {firstname} with a {instrument}.", neg=neg, instrument=instrument, firstname=english_firstname, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_arg2_verb1, format_srl_verb1,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    55 (55.0%)

Example fails:
[ARG0: Jonathan] did [ARGM-NEG: n't] [V: kill] [ARG1: Margaret] [ARGM-MNR: with a fork] .
----
[ARG0: Deborah] did [ARGM-NEG: not] [V: kill] [ARG1: Hugh] [ARGM-MNR: with a fist] .
----
[ARG0: Diana] does [ARGM-NEG: not] [V: kill] [ARG1: Nigel] [ARGM-MNR: with a fist] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    3 (3.0%)

Example fails:
[ARG0: Matthew] [ARGM-MOD: should] [ARGM-NEG: not] [V: kill] [ARG1: Caroline] [ARGM-MNR: with a racket] .
----
[ARG0: Amy] [ARGM-MOD: could] [ARGM-NEG: not] [V: kill] [ARG1: Gordon] [ARGM-MNR: with a racket] .
----
[ARG0: Stephen] did [ARGM-NEG: n't] [V: kill] [ARG1: Patricia] [ARGM-MNR: with a racket] .
----
```


### Location 

```python
# create samples
testcase_name = 'location_base'
t = editor.template("{first_name} {verb} {firstname} {location}.", verb=patient_verbs, firstname=english_firstname, location=locations, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_argloc_verb0, format_srl_verb0,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    10 (10.0%)

Example fails:
[ARG0: Ellen] [V: touched] [ARG1: Adam] [ARG2: on the street] .
----
[ARG0: Colin] [V: touched] [ARG1: Bobby] [ARG2: on the roof] .
----
[ARG0: Patricia] [V: kissed] [ARGM-PRD: Charlotte on the balcony] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
```


```python
# create samples
testcase_name = 'location_negated'
t = editor.template("{first_name} {neg} {verb} {firstname} {location}.", neg=neg, verb=patient_neg_verbs, firstname=english_firstname, location=locations, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_argloc_verb1, format_srl_verb1,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    23 (23.0%)

Example fails:
[ARG0: Greg] does [ARGM-NEG: not] [V: touch] [ARG1: Kathleen] [ARG2: at the busstop] .
----
[ARG0: Virginia] [ARGM-MOD: would] [ARGM-NEG: n't] [V: ignore] [ARG1: Martin in the kitchen] .
----
[ARG0: Anna] [ARGM-MOD: would] [ARGM-NEG: n't] [V: ignore] [ARG1: Jonathan in the hallway] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
```


### Manner

```python
# create samples
testcase_name = 'manner_base'
t = editor.template("{first_name} {verb} the ball {manner}", manner=manner_adv, verb=manner_verbs, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_argmnr_verb0, format_srl_verb0,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    12 (12.0%)

Example fails:
[ARG0: Sara] [V: kicked] [ARG1: the ball] [ARGM-PRD: tactically]
----
[ARG0: Kim] [V: stopped] [ARG1: the ball tactically]
----
[ARG0: Patrick] [V: missed] [ARG1: the ball tactically]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
```


```python
# create samples
testcase_name = 'manner_negated'
t = editor.template("{first_name} {neg} {verb} the ball {manner}", neg=neg, manner=manner_adv, verb=manner_verbs_neg, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name,
                                                                    expect_argmnr_verb1, format_srl_verb1,
                                                                    predict_srl, predict_srlbert, test_data,
                                                                    SRL_predictions, SRLBERT_predictions)
```

```text
SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    8 (8.0%)

Example fails:
[ARGM-MNR: Kathy] [ARGM-MOD: should] [ARGM-NEG: not] [V: smash] [ARG1: the ball] [ARGM-MNR: kindly]
----
[ARG0: Matthew] does [ARGM-NEG: n't] [V: miss] [ARG1: the ball tactically]
----
[ARG0: Arthur] [ARGM-MOD: should] [ARGM-NEG: n't] [V: hit] [ARG1: the ball] tactically
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
```
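To compare the affirmative and negated conditions at a glance, the fail rates reported above (100 examples per test) can be tabulated and the change caused by negation computed. The numbers are copied from the outputs in this notebook; the code only does the arithmetic.

```python
# (affirmative, negated) fail rates in %, copied from the outputs above; 100 examples per test
fail_rates = {
    "SRL": {"agent": (4, 2), "patient": (3, 1), "instrument": (47, 55),
            "location": (10, 23), "manner": (12, 8)},
    "SRL BERT": {"agent": (0, 1), "patient": (0, 0), "instrument": (32, 3),
                 "location": (0, 0), "manner": (0, 0)},
}

# change in fail rate (percentage points) when the sentence is negated
changes = {model: {name: negated - base for name, (base, negated) in tests.items()}
           for model, tests in fail_rates.items()}

for model, tests in fail_rates.items():
    print(model)
    for name, (base, negated) in tests.items():
        print(f"  {name:<10} affirmative {base:>3}%  negated {negated:>3}%  change {changes[model][name]:+d}")
```

The largest shifts are in the location and instrument tests: the SRL model fails more often on negated location sentences (10% to 23%), whereas SRL BERT fails far less often on negated instrument sentences (32% to 3%).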


### Store all data to JSON

```python
# store the test sentences
store_data(test_sents_path, test_data, new_file=True)
# store the model predictions
store_data(bert_pred_path, SRLBERT_predictions, new_file=True)
store_data(srl_pred_path, SRL_predictions, new_file=True)
```
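`store_data` comes from `utils_functions` and is not shown. Assuming it writes a list of records to a JSON file, with `new_file=True` meaning the file is overwritten rather than extended, a minimal sketch could look like this. `store_data_sketch`, its append behaviour for `new_file=False`, and the creation of missing directories (the paths above point into `./JSON_test_and_predict_files/`) are all assumptions.

```python
import json
import os
import tempfile


def store_data_sketch(path, data, new_file=True):
    """Hypothetical: write the list `data` to `path` as JSON.
    new_file=True overwrites the file; otherwise `data` is appended to the stored list."""
    if not new_file and os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            data = json.load(f) + list(data)
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)  # create the output folder if needed
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


# usage with a temporary folder instead of the notebook's output paths
tmp_path = os.path.join(tempfile.mkdtemp(), "out", "test_data_negation.json")
store_data_sketch(tmp_path, [{"sentence": "Carolyn does the dishes"}])
store_data_sketch(tmp_path, [{"sentence": "Dan didn't do the dishes"}], new_file=False)
with open(tmp_path, encoding="utf-8") as f:
    stored = json.load(f)
print(len(stored))  # 2
```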