# SRL Agent recognition experiments

In this notebook I carry out experiments to test whether two Semantic Role Labelling (SRL) systems can correctly identify agents in sentences with varying structures. The code is based on code provided by Pia Sommerauer.

In this code I load two models, namely the AllenNLP SRL model and the AllenNLP SRL BERT model. I create a variety of test cases, for which I evaluate the performance of the two models. All the test sentences are stored in a JSON file specified through the `test_sents_path` variable. The SRL predictions are stored in the JSON file specified through `srl_pred_path`, and similarly the SRL BERT predictions are stored at the path `bert_pred_path`.

### Distance: Agent recognition
In this notebook, I evaluate whether the two models identify agents equally well when the agent is close to the predicate and when it is further away from it. I try both an active and a passive formulation:
*  Active, small distance: "John smashed the ball"
*  Active, large distance: "Frederick, after nearly falling down, finally missed the ball"
*  Passive, small distance: "The ball was kicked by Jennifer"
*  Passive, large distance: "The ball was kicked after a boring match by Evelyn"
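
To make the notion of distance concrete, the sketch below counts the tokens that separate the agent from the predicate in the four example sentences. The helper `tokens_between` is a hypothetical illustration based on whitespace tokenization; it is not part of this notebook or of `utils_functions`, and the SRL models tokenize differently.

```python
import string


def tokens_between(sentence, agent, verb):
    """Count the tokens that separate the agent from the predicate.

    Hypothetical helper: splits on whitespace and strips punctuation,
    which is cruder than the tokenization used by the SRL models.
    """
    tokens = [tok.strip(string.punctuation) for tok in sentence.split()]
    return abs(tokens.index(verb) - tokens.index(agent)) - 1


# (test case, sentence, agent, predicate)
examples = [
    ("active_small", "John smashed the ball", "John", "smashed"),
    ("active_large",
     "Frederick, after nearly falling down, finally missed the ball",
     "Frederick", "missed"),
    ("passive_small", "The ball was kicked by Jennifer", "Jennifer", "kicked"),
    ("passive_large",
     "The ball was kicked after a boring match by Evelyn",
     "Evelyn", "kicked"),
]

distances = {name: tokens_between(s, a, v) for name, s, a, v in examples}
for name, distance in distances.items():
    print(f"{name}: {distance} token(s) between agent and predicate")
```

In the passive sentences the count includes the word "by", so even the small-distance passive case has one intervening token.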

### Import libraries

In [1]:
from allennlp_models.pretrained import load_predictor

In [2]:
import checklist
from checklist.editor import Editor
from checklist.perturb import Perturb
from checklist.test_types import MFT, INV, DIR
from checklist.expect import Expect

In [3]:
from checklist.pred_wrapper import PredictorWrapper

In [4]:
import json
import logging
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

In [8]:
from utils_functions import *

### Load the AllenNLP models

In [None]:
# load the regular SRL model
srl_predictor = load_predictor('structured-prediction-srl')
# load the SRL BERT model
srlbert_predictor = load_predictor('structured-prediction-srl-bert')

In [6]:
# functions to create model predictions for a list of sentences
### added by pia, edited by Goya ###

def predict_srl(data):
    """Return the SRL model's prediction for each sentence in data."""
    return [srl_predictor.predict(d) for d in data]


def predict_srlbert(data):
    """Return the SRL BERT model's prediction for each sentence in data."""
    return [srlbert_predictor.predict(d) for d in data]

predict_srl = PredictorWrapper.wrap_predict(predict_srl)
predict_srlbert = PredictorWrapper.wrap_predict(predict_srlbert)
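
CheckList expects a prediction function to return both predictions and confidences, whereas `predict_srl` and `predict_srlbert` return predictions only. As far as I understand it, `PredictorWrapper.wrap_predict` closes that gap by attaching a constant confidence of 1 to every prediction. The stand-in below illustrates the idea without depending on CheckList or AllenNLP; the names `wrap_predict_sketch` and `fake_predict` are hypothetical, and the real wrapper may do more than this.

```python
def wrap_predict_sketch(predict_fn):
    """Return a function that yields (predictions, confidences).

    Simplified stand-in for the wrapping idea, not CheckList's actual code.
    """
    def wrapped(inputs):
        preds = predict_fn(inputs)
        confs = [1.0] * len(preds)  # no real confidence is available
        return preds, confs
    return wrapped


def fake_predict(sentences):
    # hypothetical stand-in for an SRL model: one dict per sentence
    return [{"words": s.split()} for s in sentences]


wrapped_predict = wrap_predict_sketch(fake_predict)
preds, confs = wrapped_predict(["John hit the ball"])
print(preds, confs)
```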

### Define output file paths

In [7]:
#create lists to store test sentences and model predictions in 
test_data = []
SRLBERT_predictions = []
SRL_predictions = []

In [5]:
#define paths to output files
test_sents_path = './JSON_test_and_predict_files/test_data_agent.json'
bert_pred_path = './JSON_test_and_predict_files/BERT_predictions_agent.json'
srl_pred_path = './JSON_test_and_predict_files/SRL_predictions_agent.json'

#set name of current capability
capability = 'agent_small_long_recognition'

### Load CheckList tests (functions defined in `utils_functions`)
Load the functions that test whether names are recognized as agents.

In [10]:
expect_arg0_verb0 = Expect.single(found_arg0_verb0)
expect_arg0_verb1 = Expect.single(found_arg0_verb1)
expect_byarg0_verb1 = Expect.single(found_byarg0_verb1)
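
The functions `found_arg0_verb0`, `found_arg0_verb1` and `found_byarg0_verb1` are defined in `utils_functions` and are not shown here. As a rough idea of what such a check could look like, the sketch below assumes a prediction in the AllenNLP SRL output shape (a `words` list plus one BIO `tags` list per verb) and a `meta` dict holding the sampled `first_name`. The names `get_arg_text` and `found_arg0_sketch` and the `verb_index` parameter are hypothetical; the actual implementations may differ.

```python
def get_arg_text(pred, verb_index, arg_label):
    """Join the words whose BIO tag carries arg_label for the chosen verb."""
    tags = pred["verbs"][verb_index]["tags"]
    words = [word for word, tag in zip(pred["words"], tags)
             if tag != "O" and tag.split("-", 1)[1] == arg_label]
    return " ".join(words)


def found_arg0_sketch(x, pred, conf, label=None, meta=None, verb_index=0):
    """Pass if the ARG0 span of the chosen verb is exactly the sampled name."""
    return get_arg_text(pred, verb_index, "ARG0") == meta["first_name"]


# hand-written predictions in the assumed output shape
good_pred = {
    "words": ["John", "hit", "the", "ball"],
    "verbs": [{"verb": "hit", "tags": ["B-ARG0", "B-V", "B-ARG1", "I-ARG1"]}],
}
bad_pred = {
    "words": ["Jack", "hit", "the", "ball"],
    "verbs": [{"verb": "hit", "tags": ["B-ARG2", "B-V", "B-ARG1", "I-ARG1"]}],
}

passes = found_arg0_sketch("John hit the ball", good_pred, 1.0,
                           meta={"first_name": "John"})
fails = found_arg0_sketch("Jack hit the ball", bad_pred, 1.0,
                          meta={"first_name": "Jack"})
print(passes, fails)
```

The `verb_index` parameter presumably mirrors the `verb0`/`verb1` suffix of the notebook's functions: when a sentence contains an auxiliary or an earlier verb, the predicate of interest is not the first verb the model reports.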

### Load wordlists to use in sample sentences

In [11]:
# initialize editor object
editor = Editor()
# verbs
verbs = ['hit', 'kicked', 'stopped', 'touched', 'missed', 'smashed']
# lists of sentence fillers to increase the distance between the agent and the predicate
# for active sentences
precedents = ['nearly falling down', 'missing the past three games', 'celebrating a perfect streak', 'suffering from a knee injury', 'appearing so fit']
# for passive sentences
ball_precedents = ['lying there for a while', 'a boring match', 'three nerve-wrecking minutes', 'some time']
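
The fillers lengthen the sentence without changing who performs the action. The sketch below shows how such word lists combine into test sentences, using plain `str.format` and `itertools.product` rather than CheckList's `Editor`. The `names` list is a hypothetical stand-in for CheckList's built-in `first_name` lexicon, and unlike `editor.template` with `nsamples`, the sketch enumerates every combination instead of sampling.

```python
from itertools import product

# shortened copies of the word lists, for illustration only
demo_verbs = ['hit', 'kicked']
demo_ball_precedents = ['a boring match', 'some time']
names = ['Frederick', 'Evelyn']  # hypothetical stand-in for {first_name}

template = "The ball was {verb} after {precedent} by {first_name}"

# one sentence per (name, filler, verb) combination
sentences = [
    template.format(first_name=name, precedent=precedent, verb=verb)
    for name, precedent, verb in product(names, demo_ball_precedents, demo_verbs)
]

for sentence in sentences:
    print(sentence)
```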

## Tests: Agent recognition - small and large predicate distance

### Active
Small distance

In [12]:
#create samples
testcase_name = 'active_small_distance'
t = editor.template("{first_name} {verb} the ball", verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_arg0_verb0, format_srl_verb0, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    8 (8.0%)

Example fails:
[ARG2: Jack] [V: hit] [ARG1: the ball]
----
Diane [V: hit] [ARG1: the ball]
----
[ARGM-DIS: Andrea] [V: hit] [ARG1: the ball]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


Large distance

In [13]:
#create samples
testcase_name = 'active_large_distance'
t = editor.template("{first_name}, after {filler}, finnaly {verb} the ball", verb=verbs, filler=precedents, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_arg0_verb1, format_srl_verb1, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    100 (100.0%)

Example fails:
[ARGM-TMP: Dave] , [ARGM-TMP: after missing the past three games] , [ARG0: finnaly] [V: kicked] [ARG1: the ball]
----
[ARG0: Joan] , [ARGM-TMP: after nearly falling down] , [ARG0: finnaly] [V: kicked] [ARG1: the ball]
----
[ARGM-DIS: Patricia] , [ARGM-TMP: after nearly falling down] , [ARG0: finnaly] [V: kicked] [ARG1: the ball]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    100 (100.0%)

Example fails:
[ARGM-DIS: Nancy] , [ARGM-TMP: after celebrating a perfect streak] , [ARG0: finnaly] [V: hit] [ARG1: the ball]
----
[ARGM-DIS: Emily] , [ARGM-TMP: after suffering from a knee injury] , [ARGM-MNR: finnaly] [V: smashed] [ARG1: the ball]
----
[ARGM-DIS: Chris] , [ARGM-TMP: after celebrating a perfect streak] , [ARG0: finnaly] [V: hit] [ARG1: the ball]
----


### Passive
Small distance

In [14]:
#create samples
testcase_name = 'passive_small_distance'
t = editor.template("The ball was {verb} by {first_name}", verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_byarg0_verb1, format_srl_verb1, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


Large distance

In [15]:
#create samples
testcase_name = 'passive_large_distance'
t = editor.template("The ball was {verb} after {precedent} by {first_name}", verb=verbs, precedent=ball_precedents, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_byarg0_verb1, format_srl_verb1, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    96 (96.0%)

Example fails:
[ARG1: The ball] was [V: hit] [ARGM-TMP: after some time] [ARG2: by Amanda]
----
[ARG1: The ball] was [V: missed] [ARGM-TMP: after three nerve - wrecking minutes by Alice]
----
[ARG1: The ball] was [V: stopped] [ARGM-TMP: after three nerve - wrecking minutes by Howard]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    29 (29.0%)

Example fails:
[ARG1: The ball] was [V: smashed] [ARGM-TMP: after a boring match by Katie]
----
[ARG1: The ball] was [V: smashed] [ARGM-TMP: after lying there for a while by Katherine]
----
[ARG1: The ball] was [V: stopped] [ARGM-TMP: after a boring match by Sally]
----


In [16]:
len(test_data)

400

### Store all data to JSON

In [17]:
#store the test sentences
store_data(test_sents_path, test_data, new_file=True)
#store the model predictions
store_data(bert_pred_path, SRLBERT_predictions, new_file=True)
store_data(srl_pred_path, SRL_predictions, new_file=True)
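
`store_data` is defined in `utils_functions` and is not shown in this notebook. One plausible reading of its `new_file` flag is sketched below: overwrite the file when the flag is set, otherwise extend the list already stored there. The name `store_data_sketch` and this behaviour are assumptions, not the actual implementation.

```python
import json
import os
import tempfile


def store_data_sketch(path, data, new_file=True):
    """Write data to a JSON file (assumed behaviour of the new_file flag).

    new_file=True overwrites the file; new_file=False appends to the
    list that the file already contains, if the file exists.
    """
    if not new_file and os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            data = json.load(f) + data
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


# demonstrate on a temporary file so no real output file is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "test_data_demo.json")
    store_data_sketch(demo_path, ["John hit the ball"], new_file=True)
    store_data_sketch(demo_path, ["The ball was hit by John"], new_file=False)
    with open(demo_path, encoding="utf-8") as f:
        stored = json.load(f)

print(stored)
```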