# SRL Patient recognition experiments

In this notebook I carry out experiments to test whether two Semantic Role Labelling (SRL) systems can correctly identify patients in sentences with varying structures. This code was based on code provided by Pia Sommerauer.

In this code I load two models: the AllenNLP SRL model and the AllenNLP SRL BERT model. I create a variety of test cases and evaluate both models on them. All test sentences are stored in the JSON file specified by the `test_sents_path` variable. The SRL predictions are stored in the JSON file specified by `srl_pred_path`, and the SRL BERT predictions in the file specified by `bert_pred_path`.

### Recognizing locations as agents
In the news, location names such as countries or capitals are often used as agents, referring to the political power of that country rather than to the place itself. This might be difficult for an SRL system that relies on Named Entity labels marking these names as locations. As the two models I test do not use this information directly, I expect them to handle these cases correctly.
* Location as location: 'The deal was made in Ukraine'
* Location as agent: 'Ukraine made a deal with Russia'


In [3]:
from allennlp_models.pretrained import load_predictor

In [4]:
import checklist
from checklist.editor import Editor
from checklist.perturb import Perturb
from checklist.test_types import MFT, INV, DIR
from checklist.expect import Expect

In [5]:
from checklist.pred_wrapper import PredictorWrapper

In [6]:
import json
import logging
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

In [7]:
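from utils_functions import (found_location_arg0_verb0, found_location_arg0_verb1,
                             found_location_argloc_verb1, format_srl_verb0,
                             format_srl_verb1, predict_and_store, store_data)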

### Load the model

In [None]:
# load the regular SRL model
srl_predictor = load_predictor('structured-prediction-srl')
# load the SRL BERT model
srlbert_predictor = load_predictor('structured-prediction-srl-bert')

In [9]:
# functions to create model predictions for a list of sentences
### added by pia, edited by Goya ###

def predict_srl(data):
    # run the regular SRL model on every sentence in the list
    return [srl_predictor.predict(d) for d in data]


def predict_srlbert(data):
    # run the SRL BERT model on every sentence in the list
    return [srlbert_predictor.predict(d) for d in data]

predict_srl = PredictorWrapper.wrap_predict(predict_srl)
predict_srlbert = PredictorWrapper.wrap_predict(predict_srlbert)
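CheckList test runs expect a prediction function to return confidence scores alongside the predictions. As far as I understand, `PredictorWrapper.wrap_predict` adapts a predictions-only function by pairing every prediction with a constant confidence of 1. The sketch below mimics that idea with a hypothetical `wrap_predict_sketch` and a dummy predictor; it is an illustration under that assumption, not the CheckList implementation.

```python
def wrap_predict_sketch(predict_fn):
    """Hypothetical stand-in for the wrapping step (assumption, not CheckList code)."""
    def wrapped(inputs):
        preds = predict_fn(inputs)
        # Assumed behaviour: one constant confidence per prediction.
        confs = [1.0] * len(preds)
        return preds, confs
    return wrapped


def dummy_predict(sentences):
    # Placeholder for a real SRL predictor: only tokenizes on whitespace.
    return [{"words": s.split()} for s in sentences]


wrapped_predict = wrap_predict_sketch(dummy_predict)
preds, confs = wrapped_predict(["Ukraine made a deal with Russia",
                                "The deal was made in Ukraine"])
print(len(preds), confs)
```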

### Define output file paths

In [25]:
#create lists to store test sentences and model predictions in 
test_data = []
SRLBERT_predictions = []
SRL_predictions = []

In [10]:
#define paths to output files
test_sents_path = './JSON_test_and_predict_files/test_data_location.json'
bert_pred_path = './JSON_test_and_predict_files/BERT_predictions_location.json'
srl_pred_path = './JSON_test_and_predict_files/SRL_predictions_location.json'

#set name of current capability
capability = 'location_recognition'

### Load CheckList tests (functions defined in utils)
Wrap the functions from utils that test whether the arguments are correctly classified, so they can be used as CheckList expectations.

In [2]:
expect_arg0_verb0 = Expect.single(found_location_arg0_verb0)
expect_arg0_verb1 = Expect.single(found_location_arg0_verb1)
expect_argloc_verb1 = Expect.single(found_location_argloc_verb1)
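The `found_location_*` functions live in `utils_functions` and are not shown in this notebook. As a rough sketch of what such a check could look like, the hypothetical `found_arg0_sketch` below follows the `(x, pred, conf, label, meta)` argument order that I believe `Expect.single` passes to its function, and assumes AllenNLP-style output with a `words` list and per-verb BIO `tags`. The `meta["country"]` key and the tag sequences are assumptions made for illustration.

```python
def found_arg0_sketch(x, pred, conf, label=None, meta=None, verb_index=0):
    """Hypothetical check: is the full location name labelled ARG0 of the chosen verb?"""
    location_tokens = meta["country"].split()
    verbs = pred["verbs"]
    if verb_index >= len(verbs):
        # The model found fewer predicates than expected.
        return False
    tags = verbs[verb_index]["tags"]
    arg0_tokens = [w for w, t in zip(pred["words"], tags)
                   if t in ("B-ARG0", "I-ARG0")]
    return arg0_tokens == location_tokens


# Passing case: the whole name is ARG0.
ok_pred = {"words": ["Ukraine", "made", "the", "deal"],
           "verbs": [{"verb": "made",
                      "tags": ["B-ARG0", "B-V", "B-ARG1", "I-ARG1"]}]}
# Failing case modelled on the 'Equatorial Guinea' failure in this notebook:
# only part of the name is labelled ARG0.
bad_pred = {"words": ["Equatorial", "Guinea", "made", "the", "deal"],
            "verbs": [{"verb": "made",
                       "tags": ["B-ARGM-MNR", "B-ARG0", "B-V", "B-ARG1", "I-ARG1"]}]}

ok_result = found_arg0_sketch("Ukraine made the deal", ok_pred, 1.0,
                              meta={"country": "Ukraine"})
bad_result = found_arg0_sketch("Equatorial Guinea made the deal", bad_pred, 1.0,
                               meta={"country": "Equatorial Guinea"})
print(ok_result, bad_result)
```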

In [33]:
# initialize editor object
editor = Editor()
countries = [country for country in editor.lexicons.country if ('-' not in country and ',' not in country)]
cities = [city for city in editor.lexicons.city if ('-' not in city and ',' not in city)]
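The filter above drops names containing a hyphen or comma but keeps multi-word names. A small sketch on a hand-picked sample list (the sample names are my own illustration, not necessarily entries of the CheckList lexicon) shows the effect:

```python
def keep_simple_names(names):
    # Same condition as in the cell above: no hyphen and no comma in the name.
    return [n for n in names if "-" not in n and "," not in n]


# Hypothetical sample, chosen to cover each case.
sample = ["Ukraine", "Guinea-Bissau", "Korea, Republic of", "Sierra Leone"]
filtered = keep_simple_names(sample)
print(filtered)
```

Multi-word names such as 'Sierra Leone' pass the filter, so the templates below also test whether the models keep such names together in one argument span.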

## Tests
### Country as location

In [34]:
#create samples
testcase_name = 'country_location'
t = editor.template("The deal was made in {country}", country=countries, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_argloc_verb1, format_srl_verb1,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    1 (1.0%)

Example fails:
[ARG1: The deal] was [V: made] [ARGM-MNR: in Sierra Leone]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
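
The example fails are printed in a bracketed span notation. The `format_srl_*` helpers that produce it are defined in utils and not shown here (AllenNLP predictions also carry a ready-made `description` string per verb). As a minimal sketch of how BIO tags map onto this notation, the hypothetical `bio_to_brackets` below rebuilds the failing example; the tag sequence is my reconstruction from the printed output, not stored model output.

```python
def bio_to_brackets(words, tags):
    """Hypothetical helper: render words with BIO tags as '[LABEL: span]' text."""
    groups = []  # list of (label or None, [tokens])
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and groups and groups[-1][0] == tag[2:]:
            # Continue the span that is currently open.
            groups[-1][1].append(word)
        elif tag == "O":
            # Token outside any argument span.
            groups.append((None, [word]))
        else:
            # A B- tag (or a stray I- tag) opens a new span.
            groups.append((tag[2:], [word]))
    return " ".join(
        f"[{label}: {' '.join(tokens)}]" if label else tokens[0]
        for label, tokens in groups
    )


words = "The deal was made in Sierra Leone".split()
tags = ["B-ARG1", "I-ARG1", "O", "B-V",
        "B-ARGM-MNR", "I-ARGM-MNR", "I-ARGM-MNR"]
rendered = bio_to_brackets(words, tags)
print(rendered)
```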


### Country as agent

In [35]:
#create samples
testcase_name = 'country_agent'
t = editor.template("{country} made the deal", country=countries, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_arg0_verb0, format_srl_verb0,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    1 (1.0%)

Example fails:
[ARGM-MNR: Equatorial] [ARG0: Guinea] [V: made] [ARG1: the deal]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


### City as agent

In [36]:
#create samples
testcase_name = 'city_agent'
t = editor.template("{city} made the deal", city=cities, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_arg0_verb0, format_srl_verb0,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


### City as location, country as agent

In [37]:
#create samples
testcase_name = 'city_location_country_agent'
t = editor.template("In {city} the deal with the president was made by {country}", city=cities, country=countries, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_arg0_verb1, format_srl_verb1,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    1 (1.0%)

Example fails:
[ARGM-LOC: In Fort Wayne] [ARG1: the deal with the president] was [V: made] [ARGM-MNR: by Sierra Leone]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


### Store all data to JSON

In [39]:
#store the test sentences
store_data(test_sents_path, test_data, new_file=True)
#store the model predictions
store_data(bert_pred_path, SRLBERT_predictions, new_file=True)
store_data(srl_pred_path, SRL_predictions, new_file=True)
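
`store_data` is defined in utils and not shown in this notebook. A minimal sketch of what such a helper might do is given below as the hypothetical `store_data_sketch`. I assume here that `new_file=True` overwrites the file and `new_file=False` appends to an existing JSON list; the real helper may behave differently.

```python
import json
import os
import tempfile


def store_data_sketch(path, data, new_file=True):
    """Hypothetical stand-in for `store_data`; the real helper may differ."""
    records = list(data)
    if not new_file and os.path.exists(path):
        # Assumed append mode: keep stored records and add the new ones.
        with open(path, encoding="utf-8") as f:
            records = json.load(f) + records
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)
    return len(records)


# Usage in a temporary directory, so no real output files are touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "out", "demo.json")
    first = store_data_sketch(demo_path, [{"sentence": "Ukraine made the deal"}])
    second = store_data_sketch(demo_path,
                               [{"sentence": "The deal was made in Ukraine"}],
                               new_file=False)
    with open(demo_path, encoding="utf-8") as f:
        stored = json.load(f)
print(first, second, len(stored))
```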