# SRL Manner recognition experiments

In this notebook I carry out experiments to test whether two Semantic Role Labelling (SRL) systems can correctly identify manner arguments (ARGM-MNR) in sentences with varying structures. This code is based on code provided by Pia Sommerauer.

In this notebook I load two models, namely the AllenNLP SRL model and the AllenNLP SRL BERT model. I create a variety of test cases, for which I evaluate the performance of both models. All test sentences are stored in the JSON file specified by the `test_sents_path` variable. The SRL predictions are stored in the JSON file specified by `srl_pred_path`, and the SRL BERT predictions in the file at `bert_pred_path`.

### Manner recognition
I test nine different sentence structures. In every template the verb is sampled from a list of six verbs; 'hit' is used in the examples below:
* `first_name` hit the ball `manner`: e.g. 'John hit the ball quickly'
* The `manner` manner in which `first_name` hit the ball was `opinion`: e.g. 'The powerful manner in which Steven hit the ball was impressive'
* `first_name` hit the ball in a `manner` manner: e.g. 'Kathy hit the ball in a soft manner'
* `manner` `first_name` hit the ball: e.g. 'Kindly Louise hit the ball'
* `manner`, `first_name` hit the ball: e.g. 'Wisely, Catherine hit the ball'
* Ever so `manner` `first_name` hit the ball: e.g. 'Ever so softly Edwin hit the ball'
* Ever so `manner`, `first_name` hit the ball: e.g. 'Ever so kindly, Edith hit the ball'
* `manner`, it was undeniable, `first_name` hit the ball: e.g. 'Powerfully, it was undeniable, Deborah hit the ball'
* Ever so `manner`, it was undeniable, `first_name` hit the ball: e.g. 'Ever so blatantly, it was undeniable, Bobby hit the ball'
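Each structure above is a template whose placeholders are filled from word lists. As I understand CheckList's `editor.template`, it fills every `{placeholder}` from the keyword lists passed to it (and `{first_name}` from its built-in name lexicon) and, with `nsamples=100`, keeps a random sample of the resulting combinations. The sketch below is a simplified, hypothetical stand-in (`fill_template` is not part of CheckList) that shows the idea: take the Cartesian product of the fill lists, optionally sample, and keep the fills as metadata so a checker can later look up which manner word was inserted.

```python
import itertools
import random


def fill_template(template, nsamples=None, seed=0, **fills):
    """Hypothetical stand-in for editor.template: expand all combinations
    of the fill lists, optionally keep a random sample, and return the
    sentences together with the fills used for each one (the 'meta')."""
    keys = list(fills)
    combos = list(itertools.product(*(fills[k] for k in keys)))
    if nsamples is not None and nsamples < len(combos):
        combos = random.Random(seed).sample(combos, nsamples)
    data, meta = [], []
    for values in combos:
        slots = dict(zip(keys, values))
        data.append(template.format(**slots))
        meta.append(slots)
    return data, meta


sents, meta = fill_template("{first_name} {verb} the ball {manner}",
                            first_name=["John", "Kathy"],
                            verb=["hit", "kicked"],
                            manner=["quickly", "softly"])
print(len(sents))  # 8
print(sents[0])    # John hit the ball quickly
print(meta[0])     # {'first_name': 'John', 'verb': 'hit', 'manner': 'quickly'}
```

Keeping the fills alongside each sentence is what `meta=True` is for in the cells below: the expectation functions need to know which word should be labelled as the manner argument.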



In [1]:
from allennlp_models.pretrained import load_predictor

In [2]:
import checklist
from checklist.editor import Editor
from checklist.perturb import Perturb
from checklist.test_types import MFT, INV, DIR
from checklist.expect import Expect

In [3]:
from checklist.pred_wrapper import PredictorWrapper

In [4]:
import json
import logging
# suppress non-critical log output (e.g. from AllenNLP)
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

In [5]:
# helpers used below: found_arg_* checks, format_srl_verb0/1, predict_and_store, store_data
from utils_functions import *

### Load the models 

In [None]:
# load the regular SRL model
srl_predictor = load_predictor('structured-prediction-srl')
# load the SRL BERT model
srlbert_predictor = load_predictor('structured-prediction-srl-bert')

In [7]:
#functions to create model predictions for a list containing sentences
### added by pia, edited by Goya ###

def predict_srl(data):
    pred = []
    for d in data:
        pred.append(srl_predictor.predict(d))
    return pred


def predict_srlbert(data):
    pred = []
    for d in data:
        pred.append(srlbert_predictor.predict(d))
    return pred

predict_srl = PredictorWrapper.wrap_predict(predict_srl)
predict_srlbert = PredictorWrapper.wrap_predict(predict_srlbert)
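CheckList expects a prediction function to return both predictions and confidences, while the SRL predictors only return predictions. As far as I understand, `PredictorWrapper.wrap_predict` bridges this by attaching a dummy confidence of 1 to every prediction (CheckList itself uses a NumPy array of ones). The sketch below reimplements that idea in plain Python; `wrap_predict` and `fake_predict` here are illustrative names, not the library's code. It also shows why rebinding `predict_srl` to its wrapped version is safe: the wrapper keeps a reference to the original function object, not to the name.

```python
def wrap_predict(f):
    """Sketch of the wrap_predict idea: turn a function returning only
    predictions into one returning (predictions, confidences)."""
    def wrapped(*args, **kwargs):
        preds = f(*args, **kwargs)
        return preds, [1.0] * len(preds)  # dummy confidence per prediction
    return wrapped


def fake_predict(sentences):
    # hypothetical stand-in for predict_srl: one prediction dict per sentence
    return [{"words": s.split()} for s in sentences]


fake_predict = wrap_predict(fake_predict)  # same rebinding pattern as above
preds, confs = fake_predict(["John hit the ball quickly"])
print(confs)              # [1.0]
print(preds[0]["words"])  # ['John', 'hit', 'the', 'ball', 'quickly']
```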

### Define output file paths

In [8]:
#create lists to store test sentences and model predictions in 
test_data = []
SRLBERT_predictions = []
SRL_predictions = []

In [9]:
#define paths to output files
test_sents_path = './JSON_test_and_predict_files/test_data_manner.json'
bert_pred_path = './JSON_test_and_predict_files/BERT_predictions_manner.json'
srl_pred_path = './JSON_test_and_predict_files/SRL_predictions_manner.json'

#set name of current capability
capability = 'manner_recognition'

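### Load CheckList expectations (functions defined in utils)
Wrap the evaluation functions from `utils_functions` as CheckList expectations that test whether the manner is correctly recognized. The `_verb0` variants check the first predicate in the sentence; the `_verb1` variants check the second predicate and are used for the 'it was undeniable' sentences, where `was` comes before the target verb.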

In [10]:
expect_argmnr_verb0 = Expect.single(found_arg_manner_verb0)
expect_mannerlong_verb0 = Expect.single(found_arg_mannerlong_verb0)
expect_inamanner_verb0 = Expect.single(found_arg_inamanner_verb0)
expect_eversomanner_verb0 = Expect.single(found_arg_eversomanner_verb0)
expect_eversomanner_verb1 = Expect.single(found_arg_eversomanner_verb1)
expect_argmnr_verb1 = Expect.single(found_arg_manner_verb1)
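The actual `found_arg_*` functions live in `utils_functions` and are not shown in this notebook. The sketch below is one plausible way to write such a check, assuming the AllenNLP SRL output format (a dict with `words` and a `verbs` list whose entries carry BIO `tags`) and the CheckList `Expect.single` signature `fn(x, pred, conf, label=None, meta=None)`. All function names here are hypothetical. The check requires the manner phrase to be the one and only ARGM-MNR span of the chosen predicate; this is my assumption, chosen because it is consistent with failures reported below such as `[ARGM-MNR: generously Catherine]` and `[ARGM-MNR: Alex] ... [ARGM-MNR: in a powerful manner]`. For the 'Ever so' templates the expected phrase would be `"Ever so " + meta["manner"]`.

```python
def argument_spans(words, tags, label):
    """Return the text of every span tagged with `label` in a BIO sequence."""
    spans, current = [], None
    for word, tag in zip(words, tags):
        if tag == "B-" + label:
            if current:
                spans.append(current)
            current = [word]
        elif tag == "I-" + label and current is not None:
            current.append(word)
        else:
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [" ".join(span) for span in spans]


def found_manner(pred, phrase, verb_index=0):
    """Hypothetical check: `phrase` must be the only ARGM-MNR span of
    predicate number `verb_index` (0 = first predicate, 1 = second)."""
    if verb_index >= len(pred["verbs"]):
        return False  # the model found fewer predicates than expected
    tags = pred["verbs"][verb_index]["tags"]
    spans = argument_spans(pred["words"], tags, "ARGM-MNR")
    return [s.lower() for s in spans] == [phrase.lower()]


def found_arg_manner_verb0_sketch(x, pred, conf, label=None, meta=None):
    # with meta=True, the template fills for this sentence arrive in `meta`
    return found_manner(pred, meta["manner"], verb_index=0)


# AllenNLP-style prediction for "John hit the ball quickly"
pred = {
    "words": ["John", "hit", "the", "ball", "quickly"],
    "verbs": [{"verb": "hit",
               "tags": ["B-ARG0", "B-V", "B-ARG1", "I-ARG1", "B-ARGM-MNR"]}],
}
print(found_arg_manner_verb0_sketch("John hit the ball quickly", pred, None,
                                    meta={"manner": "quickly"}))  # True
```

In the notebook such a function would be wrapped exactly like the cells above, e.g. `Expect.single(found_arg_manner_verb0_sketch)`.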

In [11]:
editor = Editor()

### Define word lists used in the sample sentences

In [12]:
#lists of manner words to test
manner_adv = ['gently', 'softly', 'powerfully', 'wisely', 'quickly', 'slowly', 'patiently', 'tactically', 'generously', 'blatantly', 'kindly']
manner_adj = ['gentle', 'soft', 'powerful', 'wise', 'quick', 'slow', 'patient', 'tactical', 'generous', 'blatant', 'kind']
opinion = ['impressive', 'fascinating', 'mindblowing', 'sensational', 'boring', 'uninteresting', 'interesting']
#list of verbs
verbs = ['hit', 'kicked', 'stopped', 'touched', 'missed', 'smashed']

## Tests: Manner recognition

In [13]:
#create samples
testcase_name = 'final_position'
t = editor.template("{first_name} {verb} the ball {manner}", manner=manner_adv, verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_argmnr_verb0, format_srl_verb0, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    8 (8.0%)

Example fails:
Lauren [V: touched] [ARG1: the ball] tactically
----
[ARG0: Ian] [V: missed] [ARG1: the ball tactically]
----
[ARG0: Rebecca] [V: stopped] [ARG1: the ball tactically]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
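The failure examples are printed in a bracketed format, one bracket per labelled span of the checked predicate, with unlabelled tokens left bare. The sketch below converts a BIO tag sequence into that format, which makes it easier to see what went wrong: in the second SRL failure above, 'tactically' was tagged as a continuation of ARG1 instead of opening an ARGM-MNR span. The tag sequences used here are my reconstruction from the printed strings, not stored model output, and `bio_to_description` is a hypothetical helper, not AllenNLP's own code.

```python
def bio_to_description(words, tags):
    """Render BIO-tagged tokens in the bracketed style of the failure
    examples, e.g. '[ARG0: Ian] [V: missed] [ARG1: the ball]'."""
    chunks = []             # finished pieces of the output string
    label, span = None, []  # currently open labelled span, if any
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and tag[2:] == label:
            span.append(word)  # continue the open span
            continue
        if label is not None:  # close the previous span
            chunks.append(f"[{label}: {' '.join(span)}]")
        if tag == "O":
            label, span = None, []
            chunks.append(word)  # token outside any argument
        else:
            label, span = tag[2:], [word]  # open a new span
    if label is not None:
        chunks.append(f"[{label}: {' '.join(span)}]")
    return " ".join(chunks)


print(bio_to_description(["Ian", "missed", "the", "ball", "tactically"],
                         ["B-ARG0", "B-V", "B-ARG1", "I-ARG1", "I-ARG1"]))
# [ARG0: Ian] [V: missed] [ARG1: the ball tactically]
print(bio_to_description(["Lauren", "touched", "the", "ball", "tactically"],
                         ["O", "B-V", "B-ARG1", "I-ARG1", "O"]))
# Lauren [V: touched] [ARG1: the ball] tactically
```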


In [14]:
#create samples
testcase_name = 'the_manner_in_which'
t = editor.template("The {manner} manner in which {first_name} {verb} the ball was {opinion}", manner=manner_adj, verb=verbs, opinion=opinion, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_mannerlong_verb0, format_srl_verb0, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    13 (13.0%)

Example fails:
[ARGM-LOC: The gentle manner] [R-ARGM-LOC: in which] [ARG0: Francis] [V: kicked] [ARG1: the ball] was uninteresting
----
[ARGM-MNR: The quick manner] [R-ARGM-MNR: in which] [ARGM-MNR: Harriet] [V: hit] [ARG1: the ball] was uninteresting
----
The gentle manner [R-ARGM-LOC: in which] [ARG0: Pamela] [V: missed] the ball was sensational
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


In [15]:
#create samples
testcase_name = 'in_a_manner'
t = editor.template("{first_name} {verb} the ball in a {manner} manner ", manner=manner_adj, verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_inamanner_verb0, format_srl_verb0, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    1 (1.0%)

Example fails:
[ARGM-MNR: Alex] [V: touched] [ARG1: the ball] [ARGM-MNR: in a powerful manner]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


In [16]:
#create samples
testcase_name = 'first_position'
t = editor.template("{manner} {first_name} {verb} the ball", manner=manner_adv, verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_argmnr_verb0, format_srl_verb0, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    21 (21.0%)

Example fails:
[ARGM-MNR: generously Catherine] [V: smashed] [ARG1: the ball]
----
[ARGM-MNR: kindly Anna] [V: touched] [ARG1: the ball]
----
[ARG0: blatantly Ruth] [V: stopped] [ARG1: the ball]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    19 (19.0%)

Example fails:
[ARG0: tactically Grace] [V: smashed] [ARG1: the ball]
----
[ARGM-ADV: generously] [ARG0: Patricia] [V: smashed] [ARG1: the ball]
----
[ARGM-ADV: generously] [ARG0: Kathleen] [V: smashed] [ARG1: the ball]
----


In [17]:
#create samples
testcase_name = 'first_position_comma'
t = editor.template("{manner}, {first_name} {verb} the ball", manner=manner_adv, verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_argmnr_verb0, format_srl_verb0, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    25 (25.0%)

Example fails:
[ARGM-ADV: powerfully] , [ARG0: Arthur] [V: smashed] [ARG1: the ball]
----
[ARGM-DIS: tactically] , [ARG0: Ian] [V: missed] [ARG1: the ball]
----
[ARGM-ADV: wisely] , [ARG0: Catherine] [V: kicked] [ARG1: the ball]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    6 (6.0%)

Example fails:
[ARGM-ADV: generously] , [ARG0: Thomas] [V: kicked] [ARG1: the ball]
----
[ARGM-ADV: generously] , [ARG0: Frank] [V: missed] [ARG1: the ball]
----
[ARGM-ADV: generously] , [ARG0: Bobby] [V: hit] [ARG1: the ball]
----


In [18]:
#create samples
testcase_name = 'ever_so_manner'
t = editor.template("Ever so {manner} {first_name} {verb} the ball", manner=manner_adv, verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_eversomanner_verb0, format_srl_verb0, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    74 (74.0%)

Example fails:
Ever so [ARGM-MNR: blatantly] [ARG0: Tony] [V: touched] [ARG1: the ball]
----
Ever so [ARGM-MNR: blatantly] [ARG0: Michelle] [V: stopped] [ARG1: the ball]
----
[ARGM-DIS: Ever] [ARGM-DIS: so] [ARG0: tactically Matt] [V: missed] [ARG1: the ball]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    10 (10.0%)

Example fails:
[ARGM-ADV: Ever so tactically] [ARG0: Greg] [V: missed] [ARG1: the ball]
----
[ARG0: Ever so powerfully Frank] [V: hit] [ARG1: the ball]
----
[ARGM-ADV: Ever so tactically] [ARG0: Matt] [V: missed] [ARG1: the ball]
----


In [19]:
#create samples
testcase_name = 'ever_so_manner_comma'
t = editor.template("Ever so {manner}, {first_name} {verb} the ball", manner=manner_adv, verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_eversomanner_verb0, format_srl_verb0, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    72 (72.0%)

Example fails:
[ARGM-MNR: Ever] [R-ARG0: so slowly] , [ARG0: Donna] [V: missed] [ARG1: the ball]
----
[ARGM-TMP: Ever] [R-ARG0: so softly] , [ARG0: Francis] [V: smashed] [ARG1: the ball]
----
[R-ARG0: Ever so kindly] , [ARG0: Edith] [V: stopped] [ARG1: the ball]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    3 (3.0%)

Example fails:
[R-ARG0: Ever so generously] , [ARG0: David] [V: kicked] [ARG1: the ball]
----
[R-ARG0: Ever so tactically] , [ARG0: Philip] [V: kicked] [ARG1: the ball]
----
[ARGM-ADV: Ever so tactically] , [ARG0: Maria] [V: kicked] [ARG1: the ball]
----


In [20]:
#create samples
testcase_name = 'first_position_long_distance'
t = editor.template("{manner}, it was undeniable, {first_name} {verb} the ball", manner=manner_adv, verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_argmnr_verb1, format_srl_verb1, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    100 (100.0%)

Example fails:
tactically , it was undeniable , [ARGM-MNR: Andrea] [V: hit] [ARG1: the ball]
----
softly , it was undeniable , [ARG0: Ralph] [V: stopped] [ARG1: the ball]
----
generously , it was undeniable , [ARG0: Mark] [V: stopped] [ARG1: the ball]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    83 (83.0%)

Example fails:
patiently , it was undeniable , [ARG0: Christine] [V: kicked] [ARG1: the ball]
----
patiently , it was undeniable , [ARG0: Jason] [V: smashed] [ARG1: the ball]
----
softly , it was undeniable , [ARG0: Mary] [V: hit] [ARG1: the ball]
----


In [21]:
#create samples
testcase_name = 'ever_so_manner_long_distance'
t = editor.template("Ever so {manner}, it was undeniable, {first_name} {verb} the ball", manner=manner_adv, verb=verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(t, capability, testcase_name, \
                                                                    expect_eversomanner_verb1, format_srl_verb1, \
                                                                    predict_srl, predict_srlbert, test_data, \
                                                                    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    98 (98.0%)

Example fails:
Ever so tactically , it was undeniable , [ARG0: Ben] [V: missed] [ARG1: the ball]
----
Ever so tactically , it was undeniable , [ARG0: Jean] [V: touched] [ARG1: the ball]
----
Ever so kindly , it was undeniable , [ARG0: Benjamin] [V: kicked] [ARG1: the ball]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    51 (51.0%)

Example fails:
Ever so quickly , it was undeniable , [ARG0: Sara] [V: kicked] [ARG1: the ball]
----
Ever so tactically , it was undeniable , [ARG0: Jeff] [V: kicked] [ARG1: the ball]
----
Ever so patiently , it was undeniable , [ARG0: Kathryn] [V: touched] [ARG1: the ball]
----
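To compare the two models at a glance, the fail counts reported above can be collected into a small table. The numbers are copied by hand from the CheckList summaries in this notebook (100 test cases each); because the templates are randomly sampled, they would need updating if the cells are re-run.

```python
# Fail counts (SRL, SRL BERT) copied from the summaries above.
results = {
    "final_position": (8, 0),
    "the_manner_in_which": (13, 0),
    "in_a_manner": (1, 0),
    "first_position": (21, 19),
    "first_position_comma": (25, 6),
    "ever_so_manner": (74, 10),
    "ever_so_manner_comma": (72, 3),
    "first_position_long_distance": (100, 83),
    "ever_so_manner_long_distance": (98, 51),
}
N_CASES = 100  # test cases per template

print(f"{'test case':<32}{'SRL':>6}{'BERT':>6}")
for name, (srl, bert) in results.items():
    print(f"{name:<32}{srl / N_CASES:>6.0%}{bert / N_CASES:>6.0%}")

total = N_CASES * len(results)
srl_total = sum(srl for srl, _ in results.values())
bert_total = sum(bert for _, bert in results.values())
print(f"overall: SRL {srl_total / total:.1%}, BERT {bert_total / total:.1%}")
# overall: SRL 45.8%, BERT 19.1%
```

The table makes the pattern in the outputs explicit: SRL BERT fails no more often than the SRL model on any template, but both degrade sharply when the manner adverb is separated from its verb by 'it was undeniable'.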


### Store all data to JSON

In [22]:
#store the test sentences
store_data(test_sents_path, test_data, new_file=True)
#store the model predictions
store_data(bert_pred_path, SRLBERT_predictions, new_file=True)
store_data(srl_pred_path, SRL_predictions, new_file=True)
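`store_data` is defined in `utils_functions` and its implementation is not shown here. The sketch below is a hypothetical version of what such a helper might do, assuming `new_file=True` means "overwrite the file" and `new_file=False` means "append to the list already stored". Since `test_data` and the prediction lists accumulate across all test cells, writing them once at the end with `new_file=True` would store the complete run.

```python
import json
import os
import tempfile


def store_data_sketch(path, data, new_file=False):
    """Hypothetical stand-in for store_data: write the list `data` to
    `path` as JSON, replacing the file or appending to its stored list."""
    if not new_file and os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            data = json.load(f) + data  # append to what is already there
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_data_manner.json")
    store_data_sketch(path, [{"sentence": "John hit the ball quickly"}], new_file=True)
    store_data_sketch(path, [{"sentence": "Kathy hit the ball in a soft manner"}])
    with open(path, encoding="utf-8") as f:
        stored = json.load(f)
    store_data_sketch(path, [{"sentence": "Kindly Louise hit the ball"}], new_file=True)
    with open(path, encoding="utf-8") as f:
        overwritten = json.load(f)

print(len(stored), len(overwritten))  # 2 1
```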