# SRL negation experiments

In this notebook I carry out experiments to test whether two Semantic Role Labelling (SRL) systems can correctly identify patients in sentences with varying structures. This code was based on code provided by Pia Sommerauer.

In this code I load two models: the AllenNLP SRL model and the AllenNLP SRL BERT model. I create a variety of test cases, for which I evaluate the performance of both models. All test sentences are stored in a JSON file specified through the `test_sents_path` variable. The SRL predictions are stored in the JSON file specified through `srl_pred_path`, and the SRL BERT predictions are stored in the JSON file specified through `bert_pred_path`.
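
As a minimal sketch of the storage layout described above, the snippet below writes test sentences and predictions to JSON files and reads them back. The file names, the dictionary layout, and the `save_json`/`load_json` helpers are assumptions made for illustration; the helpers actually used in this notebook come from `utils_functions` and may look different.

```python
import json
import os
import tempfile


def save_json(obj, path):
    """Write a JSON-serialisable object to `path` (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2, ensure_ascii=False)


def load_json(path):
    """Read a JSON file back into Python objects (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical paths; a temporary directory keeps the sketch self-contained.
out_dir = tempfile.mkdtemp()
test_sents_path = os.path.join(out_dir, "test_sents.json")
srl_pred_path = os.path.join(out_dir, "srl_preds.json")

# Assumed layout: one entry per test, holding [original, negated] pairs.
test_sents = {
    "negation_patient": [
        ["Mary hit John yesterday", "Mary didn't hit John yesterday"],
    ]
}
save_json(test_sents, test_sents_path)

# Placeholder predictions (not real model output), keyed the same way.
srl_preds = {"negation_patient": [["John", "John"]]}
save_json(srl_preds, srl_pred_path)

reloaded = load_json(test_sents_path)
```

Keeping sentences and predictions under the same test name makes it straightforward to line them up again when the two models are compared.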

### Negation - invariance test
* Agent: '`first_name` did `activity`' vs. '`first_name` did not do `activity`'
* Patient: '`name1` hit `name2` yesterday' vs. '`name1` didn't hit `name2` yesterday'
* Instrument: '`name1` killed `name2` with a `instrument`' vs. '`name1` shouldn't kill `name2` with a `instrument`'
* Location: '`name1` hit `name2` `location`' vs. '`name1` wouldn't hit `name2` `location`'
* Manner: '`name1` stopped the ball `manner`' vs. '`name1` could not stop the ball `manner`'
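
The template pairs listed above can be expanded into concrete sentence pairs. The sketch below does this with plain Python string formatting rather than CheckList's `Editor`, so it runs without extra dependencies. The `fill_pairs` helper and the example names are hypothetical and only illustrate the idea.

```python
from itertools import product


def fill_pairs(original, negated, **slots):
    """Fill both templates with every combination of slot values.

    Returns a list of (original_sentence, negated_sentence) tuples.
    Hypothetical helper, not part of CheckList.
    """
    keys = list(slots)
    pairs = []
    for values in product(*(slots[k] for k in keys)):
        mapping = dict(zip(keys, values))
        pairs.append((original.format(**mapping), negated.format(**mapping)))
    return pairs


# Patient template from the list above, with made-up names.
pairs = fill_pairs(
    "{name1} hit {name2} yesterday",
    "{name1} didn't hit {name2} yesterday",
    name1=["Mary", "Tom"],
    name2=["John"],
)

for original_sentence, negated_sentence in pairs:
    print(original_sentence, "|", negated_sentence)
```

In an invariance test the label of the tested role is expected to be the same for both sentences of a pair, since only the negation is added.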


In [1]:
from allennlp_models.pretrained import load_predictor

In [2]:
import checklist
from checklist.editor import Editor
from checklist.perturb import Perturb
from checklist.test_types import MFT, INV, DIR
from checklist.expect import Expect

In [3]:
from checklist.pred_wrapper import PredictorWrapper

In [4]:
import json
import logging
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

In [5]:
from utils_functions import *

### Load the models 

In [6]:
# load the regular SRL model
srl_predictor = load_predictor('structured-prediction-srl')
# load the SRL BERT model
srlbert_predictor = load_predictor('structured-prediction-srl-bert')
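
To my understanding, calling `predict(sentence=...)` on either predictor returns a dictionary with a `words` list and a `verbs` list, where each verb frame carries BIO `tags` aligned with `words`. The sketch below shows how a role span could be read from such a dictionary and compared across an original and a negated sentence. The two prediction dictionaries are hand-written for illustration, not real model output, and `role_span` is a hypothetical helper.

```python
def role_span(prediction, verb, role):
    """Return the words tagged with `role` in the frame whose predicate is `verb`.

    Returns an empty string if the frame has no such role, and None if no
    frame for `verb` exists. Hypothetical helper for illustration.
    """
    words = prediction["words"]
    wanted = (f"B-{role}", f"I-{role}")
    for frame in prediction["verbs"]:
        if frame["verb"] != verb:
            continue
        return " ".join(w for w, t in zip(words, frame["tags"]) if t in wanted)
    return None


# Hand-written dictionaries in the assumed output layout (not real predictions).
original = {
    "words": ["Mary", "hit", "John", "yesterday"],
    "verbs": [
        {"verb": "hit", "tags": ["B-ARG0", "B-V", "B-ARG1", "B-ARGM-TMP"]},
    ],
}
negated = {
    "words": ["Mary", "did", "n't", "hit", "John", "yesterday"],
    "verbs": [
        {"verb": "did", "tags": ["O", "B-V", "O", "O", "O", "O"]},
        {"verb": "hit",
         "tags": ["B-ARG0", "O", "B-ARGM-NEG", "B-V", "B-ARG1", "B-ARGM-TMP"]},
    ],
}

patient_original = role_span(original, "hit", "ARG1")
patient_negated = role_span(negated, "hit", "ARG1")

# The invariance test passes when negation leaves the patient span unchanged.
passed = patient_original == patient_negated
```

Selecting the frame by predicate matters for negated sentences, because the auxiliary can receive its own frame in which all other tokens are tagged `O`.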

2022-03-28 12:51:07,709 - INFO - allennlp.common.params - model_class = None
2022-03-28 12:51:07,710 - INFO - allennlp.common.params - registered_predictor_name = None
2022-03-28 12:51:07,711 - INFO - allennlp.common.params - display_name = Visual Entailment - NLVR2
2022-03-28 12:51:07,712 - INFO - allennlp.common.params - task_id = nlvr2
2022-03-28 12:51:07,712 - INFO - allennlp.common.params - model_usage.archive_file = vilbert-nlvr2-2021.06.01.tar.gz
2022-03-28 12:51:07,713 - INFO - allennlp.common.params - model_usage.training_config = vilbert_nlvr2_pretrained.jsonnet
2022-03-28 12:51:07,713 - INFO - allennlp.common.params - model_usage.install_instructions = pip install allennlp>=2.5.1 allennlp-models>=2.5.1
2022-03-28 12:51:07,714 - INFO - allennlp.common.params - model_usage.overrides = None
2022-03-28 12:51:07,715 -

2022-03-28 12:51:07,816 - INFO - allennlp.common.params - intended_use.primary_users = None
2022-03-28 12:51:07,817 - INFO - allennlp.common.params - intended_use.out_of_scope_use_cases = None
2022-03-28 12:51:07,818 - INFO - allennlp.common.params - factors.relevant_factors = None
2022-03-28 12:51:07,819 - INFO - allennlp.common.params - factors.evaluation_factors = None
2022-03-28 12:51:07,821 - INFO - allennlp.common.params - metrics.model_performance_measures = Accuracy
2022-03-28 12:51:07,823 - INFO - allennlp.common.params - metrics.decision_thresholds = None
2022-03-28 12:51:07,824 - INFO - allennlp.common.params - metrics.variation_approaches = None
2022-03-28 12:51:07,825 - INFO - allennlp.common.params - evaluation_data.dataset.name = Stanford Natural Language Inference (SNLI) dev set
2022-03-28 12:51:07,827 - INFO - allennlp.common.params - evaluation_data.dataset.processed_url = https://allennlp.s3.amazonaws.com/datasets/snli/snli_1.0_test.jsonl
2022-03-28 12:51:07,827 - IN

2022-03-28 12:51:07,976 - INFO - allennlp.common.params - model_usage.training_config = structured_prediction/bert_base_srl.jsonnet
2022-03-28 12:51:07,977 - INFO - allennlp.common.params - model_usage.install_instructions = pip install allennlp==2.1.0 allennlp-models==2.1.0
2022-03-28 12:51:07,977 - INFO - allennlp.common.params - model_usage.overrides = None
2022-03-28 12:51:07,978 - INFO - allennlp.common.params - model_details.description = An implementation of a BERT based model (Shi et al, 2019) with some modifications (no additional parameters apart from a linear classification layer), which is currently the state of the art single model for English PropBank SRL (Newswire sentences). It achieves 86.49 test F1 on the Ontonotes 5.0 dataset.
2022-03-28 12:51:07,979 - INFO - allennlp.common.params - model_details.short_description = A BERT based model (Shi et al, 2019) with some modifications (no additional parameters apart from a linear classification layer)
2022-03-28 12:51:07,979

2022-03-28 12:51:08,095 - INFO - allennlp.common.params - evaluation_data.dataset.url = https://github.com/gabrielStanovsky/oie-benchmark
2022-03-28 12:51:08,096 - INFO - allennlp.common.params - evaluation_data.motivation = None
2022-03-28 12:51:08,097 - INFO - allennlp.common.params - evaluation_data.preprocessing = None
2022-03-28 12:51:08,098 - INFO - allennlp.common.params - training_data.dataset.name = All Words Open IE
2022-03-28 12:51:08,100 - INFO - allennlp.common.params - training_data.dataset.url = https://github.com/gabrielStanovsky/supervised-oie/tree/master/data
2022-03-28 12:51:08,101 - INFO - allennlp.common.params - training_data.motivation = None
2022-03-28 12:51:08,103 - INFO - allennlp.common.params - training_data.preprocessing = None
2022-03-28 12:51:08,105 - INFO - allennlp.common.params - quantitative_analyses.unitary_results = None
2022-03-28 12:51:08,105 - INFO - allennlp.common.params - quantitative_analyses.intersectional_results = None
2022-03-28 12:51:08,

2022-03-28 12:51:08,241 - INFO - allennlp.common.params - model_details.contributed_by = Jacob Morrison
2022-03-28 12:51:08,241 - INFO - allennlp.common.params - model_details.date = 2021-05-07
2022-03-28 12:51:08,241 - INFO - allennlp.common.params - model_details.version = 2
2022-03-28 12:51:08,242 - INFO - allennlp.common.params - model_details.model_type = ViLBERT based on BERT large
2022-03-28 12:51:08,243 - INFO - allennlp.common.params - model_details.paper.citation = 
@inproceedings{Lu2019ViLBERTPT,
title={ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks},
author={Jiasen Lu and Dhruv Batra and D. Parikh and Stefan Lee},
booktitle={NeurIPS},
year={2019}
}
2022-03-28 12:51:08,243 - INFO - allennlp.common.params - model_details.paper.title = ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
2022-03-28 12:51:08,244 - INFO - allennlp.common.params - model_details.paper.url = https://api.se

2022-03-28 12:51:08,343 - INFO - allennlp.common.params - metrics.decision_thresholds = None
2022-03-28 12:51:08,344 - INFO - allennlp.common.params - metrics.variation_approaches = None
2022-03-28 12:51:08,345 - INFO - allennlp.common.params - evaluation_data.dataset.name = On Measuring and Mitigating Biased Gender-Occupation Inferences SNLI Dataset
2022-03-28 12:51:08,345 - INFO - allennlp.common.params - evaluation_data.dataset.processed_url = https://storage.googleapis.com/allennlp-public-models/binary-gender-bias-mitigated-snli-dataset.jsonl
2022-03-28 12:51:08,346 - INFO - allennlp.common.params - evaluation_data.dataset.url = https://github.com/sunipa/On-Measuring-and-Mitigating-Biased-Inferences-of-Word-Embeddings
2022-03-28 12:51:08,347 - INFO - allennlp.common.params - evaluation_data.motivation = None
2022-03-28 12:51:08,347 - INFO - allennlp.common.params - evaluation_data.preprocessing = None
2022-03-28 12:51:08,349 - INFO - allennlp.common.params - training_data.dataset.n

2022-03-28 12:51:08,472 - INFO - allennlp.common.params - id = rc-naqanet
2022-03-28 12:51:08,473 - INFO - allennlp.common.params - registered_model_name = naqanet
2022-03-28 12:51:08,473 - INFO - allennlp.common.params - model_class = None
2022-03-28 12:51:08,474 - INFO - allennlp.common.params - registered_predictor_name = None
2022-03-28 12:51:08,475 - INFO - allennlp.common.params - display_name = Numerically Augmented QA Net
2022-03-28 12:51:08,475 - INFO - allennlp.common.params - task_id = rc
2022-03-28 12:51:08,476 - INFO - allennlp.common.params - model_usage.archive_file = naqanet-2021.02.26.tar.gz
2022-03-28 12:51:08,477 - INFO - allennlp.common.params - model_usage.training_config = rc/naqanet.jsonnet
2022-03-28 12:51:08,478 - INFO - allennlp.common.params - model_usage.install_instructions = pip install allennlp==2.1.0 allennlp-models==2.1.0
2022-03-28 12:51:08,478 - INFO - allennlp.common.params - model_usage.overrides = None
2022-03-28 12:51:08,479 - INFO - allennlp.comm

2022-03-28 12:51:08,568 - INFO - allennlp.common.params - model_details.paper.title = Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples
2022-03-28 12:51:08,569 - INFO - allennlp.common.params - model_details.paper.url = https://api.semanticscholar.org/CorpusID:21712653
2022-03-28 12:51:08,569 - INFO - allennlp.common.params - model_details.license = None
2022-03-28 12:51:08,570 - INFO - allennlp.common.params - model_details.contact = allennlp-contact@allenai.org
2022-03-28 12:51:08,571 - INFO - allennlp.common.params - intended_use.primary_uses = None
2022-03-28 12:51:08,572 - INFO - allennlp.common.params - intended_use.primary_users = None
2022-03-28 12:51:08,572 - INFO - allennlp.common.params - intended_use.out_of_scope_use_cases = None
2022-03-28 12:51:08,574 - INFO - allennlp.common.params - factors.relevant_factors = None
2022-03-28 12:51:08,574 - INFO - allennlp.common.params - factors.evaluation_factors = None
2022-03-28 12:51:08,576 - INFO 

2022-03-28 12:51:08,661 - INFO - allennlp.common.params - quantitative_analyses.intersectional_results = None
2022-03-28 12:51:08,662 - INFO - allennlp.common.params - model_ethical_considerations.ethical_considerations = None
2022-03-28 12:51:08,664 - INFO - allennlp.common.params - model_caveats_and_recommendations.caveats_and_recommendations = None
2022-03-28 12:51:08,711 - INFO - allennlp.common.params - id = lm-masked-language-model
2022-03-28 12:51:08,712 - INFO - allennlp.common.params - registered_model_name = masked_language_model
2022-03-28 12:51:08,712 - INFO - allennlp.common.params - model_class = None
2022-03-28 12:51:08,713 - INFO - allennlp.common.params - registered_predictor_name = None
2022-03-28 12:51:08,714 - INFO - allennlp.common.params - display_name = BERT-based Masked Language Model
2022-03-28 12:51:08,715 - INFO - allennlp.common.params - task_id = masked-language-modeling
2022-03-28 12:51:08,716 - INFO - allennlp.common.params - model_usage.archive_file = be

2022-03-28 12:51:08,811 - INFO - allennlp.common.params - model_details.paper.title = RoBERTa: A Robustly Optimized BERT Pretraining Approach
2022-03-28 12:51:08,812 - INFO - allennlp.common.params - model_details.paper.url = https://api.semanticscholar.org/CorpusID:198953378
2022-03-28 12:51:08,813 - INFO - allennlp.common.params - model_details.license = None
2022-03-28 12:51:08,813 - INFO - allennlp.common.params - model_details.contact = allennlp-contact@allenai.org
2022-03-28 12:51:08,814 - INFO - allennlp.common.params - intended_use.primary_uses = None
2022-03-28 12:51:08,815 - INFO - allennlp.common.params - intended_use.primary_users = None
2022-03-28 12:51:08,816 - INFO - allennlp.common.params - intended_use.out_of_scope_use_cases = None
2022-03-28 12:51:08,817 - INFO - allennlp.common.params - factors.relevant_factors = None
2022-03-28 12:51:08,818 - INFO - allennlp.common.params - factors.evaluation_factors = None
2022-03-28 12:51:08,820 - INFO - allennlp.common.params - m

2022-03-28 12:51:08,906 - INFO - allennlp.common.params - training_data.preprocessing = None
2022-03-28 12:51:08,907 - INFO - allennlp.common.params - quantitative_analyses.unitary_results = None
2022-03-28 12:51:08,908 - INFO - allennlp.common.params - quantitative_analyses.intersectional_results = None
2022-03-28 12:51:08,909 - INFO - allennlp.common.params - model_ethical_considerations.ethical_considerations = None
2022-03-28 12:51:08,910 - INFO - allennlp.common.params - model_caveats_and_recommendations.caveats_and_recommendations = None
2022-03-28 12:51:08,950 - INFO - allennlp.common.params - id = rc-bidaf
2022-03-28 12:51:08,952 - INFO - allennlp.common.params - registered_model_name = bidaf
2022-03-28 12:51:08,952 - INFO - allennlp.common.params - model_class = None
2022-03-28 12:51:08,953 - INFO - allennlp.common.params - registered_predictor_name = None
2022-03-28 12:51:08,953 - INFO - allennlp.common.params - display_name = BiDAF
2022-03-28 12:51:08,955 - INFO - allennlp.c

2022-03-28 12:51:09,052 - INFO - allennlp.common.params - model_details.paper.title = Bidirectional Attention Flow for Machine Comprehension
2022-03-28 12:51:09,053 - INFO - allennlp.common.params - model_details.paper.url = https://api.semanticscholar.org/CorpusID:8535316
2022-03-28 12:51:09,054 - INFO - allennlp.common.params - model_details.license = None
2022-03-28 12:51:09,054 - INFO - allennlp.common.params - model_details.contact = allennlp-contact@allenai.org
2022-03-28 12:51:09,056 - INFO - allennlp.common.params - intended_use.primary_uses = None
2022-03-28 12:51:09,056 - INFO - allennlp.common.params - intended_use.primary_users = None
2022-03-28 12:51:09,057 - INFO - allennlp.common.params - intended_use.out_of_scope_use_cases = None
2022-03-28 12:51:09,058 - INFO - allennlp.common.params - factors.relevant_factors = None
2022-03-28 12:51:09,059 - INFO - allennlp.common.params - factors.evaluation_factors = None
2022-03-28 12:51:09,060 - INFO - allennlp.common.params - metr

2022-03-28 12:51:09,144 - INFO - allennlp.common.params - evaluation_data.preprocessing = None
2022-03-28 12:51:09,146 - INFO - allennlp.common.params - training_data.dataset.name = Stanford Natural Language Inference (SNLI) train set
2022-03-28 12:51:09,146 - INFO - allennlp.common.params - training_data.dataset.processed_url = https://allennlp.s3.amazonaws.com/datasets/snli/snli_1.0_train.jsonl
2022-03-28 12:51:09,147 - INFO - allennlp.common.params - training_data.dataset.url = https://nlp.stanford.edu/projects/snli/
2022-03-28 12:51:09,147 - INFO - allennlp.common.params - training_data.motivation = None
2022-03-28 12:51:09,148 - INFO - allennlp.common.params - training_data.preprocessing = None
2022-03-28 12:51:09,149 - INFO - allennlp.common.params - quantitative_analyses.unitary_results = Net Neutral: 0.613096454815352, Fraction Neutral: 0.6704967487937075, Threshold:0.5: 0.6637061892722586, Threshold:0.7: 0.49490217463150243
2022-03-28 12:51:09,149 - INFO - allennlp.common.para

2022-03-28 12:51:09,287 - INFO - allennlp.common.params - task_id = semparse-nlvr
2022-03-28 12:51:09,288 - INFO - allennlp.common.params - model_usage.archive_file = https://allennlp.s3.amazonaws.com/models/nlvr-erm-model-2020.02.10-rule-vocabulary-updated.tar.gz
2022-03-28 12:51:09,289 - INFO - allennlp.common.params - model_usage.training_config = None
2022-03-28 12:51:09,289 - INFO - allennlp.common.params - model_usage.install_instructions = pip install allennlp==1.0.0 allennlp-models==1.0.0
2022-03-28 12:51:09,290 - INFO - allennlp.common.params - model_usage.overrides = None
2022-03-28 12:51:09,291 - INFO - allennlp.common.params - model_details.description = The model is a semantic parser trained on Cornell NLVR.
2022-03-28 12:51:09,292 - INFO - allennlp.common.params - model_details.short_description = The model is a semantic parser trained on Cornell NLVR.
2022-03-28 12:51:09,293 - INFO - allennlp.common.params - model_details.developed_by = Dasigi et al
2022-03-28 12:51:09,2

2022-03-28 12:51:09,400 - INFO - allennlp.common.params - evaluation_data.dataset.processed_url = https://allennlp.s3.amazonaws.com/datasets/multinli/multinli_1.0_dev_mismatched.jsonl
2022-03-28 12:51:09,401 - INFO - allennlp.common.params - evaluation_data.dataset.url = https://cims.nyu.edu/~sbowman/multinli/
2022-03-28 12:51:09,402 - INFO - allennlp.common.params - evaluation_data.motivation = None
2022-03-28 12:51:09,402 - INFO - allennlp.common.params - evaluation_data.preprocessing = None
2022-03-28 12:51:09,404 - INFO - allennlp.common.params - training_data.dataset.name = Multi-genre Natural Language Inference (MultiNLI) train set
2022-03-28 12:51:09,405 - INFO - allennlp.common.params - training_data.dataset.processed_url = https://allennlp.s3.amazonaws.com/datasets/multinli/multinli_1.0_train.jsonl
2022-03-28 12:51:09,405 - INFO - allennlp.common.params - training_data.dataset.url = https://cims.nyu.edu/~sbowman/multinli/
2022-03-28 12:51:09,406 - INFO - allennlp.common.params

2022-03-28 12:51:09,548 - INFO - allennlp.common.params - model_usage.training_config = None
2022-03-28 12:51:09,549 - INFO - allennlp.common.params - model_usage.install_instructions = The model is available at https://github.com/anthonywchen/MOCHA.
2022-03-28 12:51:09,551 - INFO - allennlp.common.params - model_usage.overrides = None
2022-03-28 12:51:09,553 - INFO - allennlp.common.params - model_details.description = LERC is a BERT model that is trained to mimic human judgement scores on candidate answers in the MOCHA dataset. LERC outputs scores that range from 1 to 5, however, to stay consistent with metrics such as BLEU and ROUGE, we normalize the output of LERC to be between 0 and 1 in this demo.
2022-03-28 12:51:09,554 - INFO - allennlp.common.params - model_details.short_description = A BERT model that scores candidate answers from 0 to 1.
2022-03-28 12:51:09,555 - INFO - allennlp.common.params - model_details.developed_by = Chen et al
2022-03-28 12:51:09,556 - INFO - allennlp

2022-03-28 12:51:09,672 - INFO - allennlp.common.params - metrics.decision_thresholds = None
2022-03-28 12:51:09,672 - INFO - allennlp.common.params - metrics.variation_approaches = None
2022-03-28 12:51:09,674 - INFO - allennlp.common.params - evaluation_data.dataset.name = PIQA (validation set)
2022-03-28 12:51:09,675 - INFO - allennlp.common.params - evaluation_data.dataset.notes = Please download the data from the url provided.
2022-03-28 12:51:09,676 - INFO - allennlp.common.params - evaluation_data.dataset.url = https://yonatanbisk.com/piqa/
2022-03-28 12:51:09,676 - INFO - allennlp.common.params - evaluation_data.motivation = None
2022-03-28 12:51:09,677 - INFO - allennlp.common.params - evaluation_data.preprocessing = None
2022-03-28 12:51:09,679 - INFO - allennlp.common.params - training_data.dataset.name = PIQA (train set)
2022-03-28 12:51:09,680 - INFO - allennlp.common.params - training_data.dataset.notes = Please download the data from the url provided.
2022-03-28 12:51:09

2022-03-28 12:51:09,839 - INFO - allennlp.common.params - model_details.developed_by = Lample et al
2022-03-28 12:51:09,840 - INFO - allennlp.common.params - model_details.contributed_by = None
2022-03-28 12:51:09,841 - INFO - allennlp.common.params - model_details.date = 2020-06-24
2022-03-28 12:51:09,842 - INFO - allennlp.common.params - model_details.version = 1
2022-03-28 12:51:09,843 - INFO - allennlp.common.params - model_details.model_type = BiLSTM
2022-03-28 12:51:09,843 - INFO - allennlp.common.params - model_details.paper.citation = 
@article{Lample2016NeuralAF,
title={Neural Architectures for Named Entity Recognition},
author={Guillaume Lample and Miguel Ballesteros and Sandeep Subramanian and K. Kawakami and Chris Dyer},
journal={ArXiv},
year={2016},
volume={abs/1603.01360}}

2022-03-28 12:51:09,846 - INFO - allennlp.common.params - model_details.paper.title = Neural Architectures for Named Entity Recognition
2022-03-28 12:51:09,847 - INFO - allennlp.common.params - model_d

2022-03-28 12:51:09,952 - INFO - allennlp.common.params - evaluation_data.preprocessing = None
2022-03-28 12:51:09,953 - INFO - allennlp.common.params - training_data.dataset.name = CNN/DailyMail
2022-03-28 12:51:09,955 - INFO - allennlp.common.params - training_data.dataset.notes = Please download the data from the url provided.
2022-03-28 12:51:09,956 - INFO - allennlp.common.params - training_data.dataset.url = https://github.com/abisee/cnn-dailymail
2022-03-28 12:51:09,957 - INFO - allennlp.common.params - training_data.motivation = None
2022-03-28 12:51:09,957 - INFO - allennlp.common.params - training_data.preprocessing = None
2022-03-28 12:51:09,959 - INFO - allennlp.common.params - quantitative_analyses.unitary_results = None
2022-03-28 12:51:09,960 - INFO - allennlp.common.params - quantitative_analyses.intersectional_results = None
2022-03-28 12:51:09,961 - INFO - allennlp.common.params - model_ethical_considerations.ethical_considerations = None
2022-03-28 12:51:09,962 - INF

2022-03-28 12:51:29,456 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.1.attention.output.LayerNorm.bias
2022-03-28 12:51:29,456 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.1.attention.output.LayerNorm.weight
2022-03-28 12:51:29,457 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.1.attention.output.dense.bias
2022-03-28 12:51:29,458 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.1.attention.output.dense.weight
2022-03-28 12:51:29,459 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.1.attention.self.key.bias
2022-03-28 12:51:29,461 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.1.attention.self.key.weight
2022-03-28 12:51:29,462 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.1.attention.self.query.bias
2022-03-28 12:51:29,462 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.1.attention.self.query.weight
2022-03-28 12:51:29,463 - INFO - allennlp.nn.initial

2022-03-28 12:51:29,512 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.3.attention.self.query.weight
2022-03-28 12:51:29,513 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.3.attention.self.value.bias
2022-03-28 12:51:29,515 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.3.attention.self.value.weight
2022-03-28 12:51:29,515 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.3.intermediate.dense.bias
2022-03-28 12:51:29,516 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.3.intermediate.dense.weight
2022-03-28 12:51:29,517 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.3.output.LayerNorm.bias
2022-03-28 12:51:29,517 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.3.output.LayerNorm.weight
2022-03-28 12:51:29,518 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.3.output.dense.bias
2022-03-28 12:51:29,519 - INFO - allennlp.nn.initializers -    bert_model.encoder.la

2022-03-28 12:51:29,580 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.7.output.dense.bias
2022-03-28 12:51:29,581 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.7.output.dense.weight
2022-03-28 12:51:29,582 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.8.attention.output.LayerNorm.bias
2022-03-28 12:51:29,583 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.8.attention.output.LayerNorm.weight
2022-03-28 12:51:29,584 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.8.attention.output.dense.bias
2022-03-28 12:51:29,584 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.8.attention.output.dense.weight
2022-03-28 12:51:29,587 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.8.attention.self.key.bias
2022-03-28 12:51:29,588 - INFO - allennlp.nn.initializers -    bert_model.encoder.layer.8.attention.self.key.weight
2022-03-28 12:51:29,588 - INFO - allennlp.nn.initializers -    bert_

In [7]:
# functions that create model predictions for a list of sentences
### added by pia, edited by Goya ###

def predict_srl(data):
    pred = []
    for d in data:
        pred.append(srl_predictor.predict(d))
    return pred


def predict_srlbert(data):
    pred = []
    for d in data:
        pred.append(srlbert_predictor.predict(d))
    return pred

predict_srl = PredictorWrapper.wrap_predict(predict_srl)
predict_srlbert = PredictorWrapper.wrap_predict(predict_srlbert)
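
As far as I understand CheckList, `PredictorWrapper.wrap_predict` turns a function that returns only predictions into one that returns a `(predictions, confidences)` pair, with the confidences filled with ones. The sketch below mimics that behaviour in plain Python so it can be run without the models. `wrap_predict_sketch` and `dummy_predict` are made-up names for illustration and are not part of CheckList or AllenNLP.

```python
def wrap_predict_sketch(predict_fn):
    """Stand-in for the wrapping step: attach a dummy confidence to each prediction."""
    def pred_and_conf(inputs):
        preds = predict_fn(inputs)
        # SRL output has no single confidence score, so use a constant placeholder
        confs = [1.0] * len(preds)
        return preds, confs
    return pred_and_conf


def dummy_predict(sentences):
    # hypothetical predictor: one dict per sentence, loosely shaped like SRL output
    return [{"words": s.split()} for s in sentences]


wrapped = wrap_predict_sketch(dummy_predict)
preds, confs = wrapped(["Dan does the dishes"])
print(preds, confs)
```

The wrapped function keeps the one-prediction-per-sentence contract of `predict_srl` and `predict_srlbert`, which is what the test runner relies on.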

### Define output file paths

In [8]:
# create lists in which to store the test sentences and model predictions
test_data = []
SRLBERT_predictions = []
SRL_predictions = []

In [9]:
#define paths to output files
test_sents_path = './JSON_test_and_predict_files/test_data_negation.json'
bert_pred_path = './JSON_test_and_predict_files/BERT_predictions_negation.json'
srl_pred_path = './JSON_test_and_predict_files/SRL_predictions_negation.json'

#set name of current capability
capability = 'negation'

### Load CheckList tests (functions defined in utils)
Load the expectation functions that test whether the arguments are correctly identified.

In [10]:
expect_arg0_verb0 = Expect.single(found_arg0_verb0)
expect_arg0_verb1 = Expect.single(found_arg0_verb1)
expect_arg1_verb0 = Expect.single(found_arg1_verb0)
expect_arg1_verb1 = Expect.single(found_arg1_verb1)
expect_arg2_verb0 = Expect.single(found_arg2_verb0)
expect_arg2_verb1 = Expect.single(found_arg2_verb1)
expect_argloc_verb0 = Expect.single(found_argloc_verb0)
expect_argloc_verb1 = Expect.single(found_argloc_verb1)
expect_argmnr_verb0 = Expect.single(found_arg_manner_verb0)
expect_argmnr_verb1 = Expect.single(found_arg_manner_verb1)
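
The `found_*` functions live in `utils_functions` and are not shown in this notebook. The sketch below is only my guess at the general shape of such a function: `Expect.single` takes a function with the signature `(x, pred, conf, label=None, meta=None)`, and AllenNLP SRL predictions are dicts with a `words` list and a `verbs` list holding BIO tags per frame. The names `get_arg_tokens` and `found_arg1_sketch`, and the assumption that the patient name is stored under `meta["first"]`, are hypothetical.

```python
def get_arg_tokens(pred, role, verb_index=0):
    """Collect the words tagged with `role` in the frame at `verb_index`."""
    frame = pred["verbs"][verb_index]
    tokens = []
    for word, tag in zip(pred["words"], frame["tags"]):
        # BIO tags look like 'B-ARG1' or 'I-ARGM-TMP'; drop the 'B-'/'I-' prefix
        label = tag.split("-", 1)[1] if "-" in tag else tag
        if label == role:
            tokens.append(word)
    return tokens


def found_arg1_sketch(x, pred, conf, label=None, meta=None):
    """Pass if the second name in the template was labelled ARG1 (hypothetical check)."""
    return get_arg_tokens(pred, "ARG1", verb_index=0) == [meta["first"]]


# hand-written prediction in the AllenNLP SRL output layout
example_pred = {
    "words": ["Chris", "hit", "Norman", "yesterday", "."],
    "verbs": [{"verb": "hit",
               "tags": ["B-ARG0", "B-V", "B-ARG1", "B-ARGM-TMP", "O"]}],
}
example_meta = {"first_name": "Chris", "verb": "hit", "first": "Norman"}
passed = found_arg1_sketch("Chris hit Norman yesterday.", example_pred, 1.0,
                           meta=example_meta)
print(passed)
```

The `verb_index` parameter reflects the `verb0`/`verb1` naming used above: in the negated sentences the auxiliary comes first, so the main verb is presumably the second frame.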

### Load wordlists to use in sample sentences

In [11]:
# initialize editor object
editor = Editor()

#negation words
neg = ["did not", "would not", "should not", "could not", "does not", "doesn't", "didn't", "wouldn't", "shouldn't", "couldn't"]
#activities
activity = ['does the dishes', 'attends the party', 'prepares dinner', 'makes breakfast', 'hosts the event', 'takes the picture', 'watches tv all day']
neg_activity = ['do the dishes', 'attend the party', 'prepare dinner', 'make breakfast', 'host the event', 'take the picture', 'watch tv all day']

# a list of verbs to use in the test cases
patient_verbs = ['kissed', 'killed', 'hurt', 'touched', 'ignored', 'silenced', 'hit', 'greeted']
patient_neg_verbs = ['kiss', 'kill', 'hurt', 'touch', 'ignore', 'silence', 'hit', 'greet']
#names
english_firstname = editor.lexicons.female_from.United_Kingdom + editor.lexicons.male_from.United_Kingdom
#instruments
instrument = ['knife', 'stone', 'bottle', 'table', 'chair', 'fist', 'rollerblade', 'shoelace', 'discoball', 'fork', 'racket']
#locations
locations = ['in the kitchen', 'in the hallway', 'at the busstop', 'at university', 'on the street', 'in the supermarket', 'on the balcony', 'at the theatre', 'in the museum', 'on the roof']
#lists of manner words to test
manner_adv = ['gently', 'softly', 'powerfully', 'wisely', 'quickly', 'slowly', 'patiently', 'tactically', 'generously', 'blatantly', 'kindly']
manner_verbs = ['hit', 'kicked', 'stopped', 'touched', 'missed', 'smashed']
manner_verbs_neg = ['hit', 'kick', 'stop', 'touch', 'miss', 'smash']
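
To make clear how these word lists end up in test sentences, here is a rough plain-Python sketch of template filling: every placeholder is replaced by an item from its list, and the fill-ins are kept alongside each sentence (which is what `meta=True` is for). This is not CheckList's actual sampling algorithm; `fill_template` and the tiny word lists are stand-ins I made up.

```python
import itertools
import random


def fill_template(template, nsamples, seed=0, **lexicons):
    """Sample up to `nsamples` distinct fill-ins and format the template with them."""
    keys = list(lexicons)
    combos = list(itertools.product(*(lexicons[k] for k in keys)))
    rng = random.Random(seed)
    chosen = rng.sample(combos, min(nsamples, len(combos)))
    data, meta = [], []
    for combo in chosen:
        fills = dict(zip(keys, combo))
        data.append(template.format(**fills))
        meta.append(fills)  # keep the fill-ins so a test can look up the expected argument
    return data, meta


names = ["Dan", "Andrea"]            # tiny stand-in for the name lexicon
neg_words = ["did not", "didn't"]
activities = ["do the dishes", "host the event"]

data, meta = fill_template("{first_name} {neg} {activity}", nsamples=3,
                           first_name=names, neg=neg_words, activity=activities)
for sentence in data:
    print(sentence)
```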

## Tests
### Agent recognition invariance

In [12]:
#create samples
testcase_name = 'agent_base'
t = editor.template("{first_name} {activity}", activity=activity, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_arg0_verb0, format_srl_verb0,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions,
)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    4 (4.0%)

Example fails:
[ARG0: Carolyn] [V: does] [ARG0: the dishes]
----
[ARG0: Keith] [V: does] [ARG0: the dishes]
----
[ARGM-DIS: Dan] [V: does] [ARG1: the dishes]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


In [13]:
#create samples
testcase_name = 'agent_negated'
t = editor.template("{first_name} {neg} {activity}", activity=neg_activity, neg=neg, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_arg0_verb1, format_srl_verb1,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions,
)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    2 (2.0%)

Example fails:
[ARGM-DIS: Dan] did [ARGM-NEG: n't] [V: do] [ARG1: the dishes]
----
[ARGM-DIS: Andrea] did [ARGM-NEG: n't] [V: host] [ARG1: the event]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    1 (1.0%)

Example fails:
[ARGM-DIS: Andrea] did [ARGM-NEG: n't] [V: host] [ARG1: the event]
----


### Patient recognition

In [14]:
#create samples
testcase_name = 'patient_base'
t = editor.template("{first_name} {verb} {first} yesterday.", first=english_firstname, verb=patient_verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_arg1_verb0, format_srl_verb0,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions,
)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    3 (3.0%)

Example fails:
[ARG1: Chris] [V: hit] [ARG2: Norman] [ARGM-TMP: yesterday] .
----
[ARG0: Philip] [V: hurt] [ARGM-TMP: Judith] [ARGM-TMP: yesterday] .
----
[ARG1: Annie] [V: greeted] [ARG2: Pamela] [ARGM-TMP: yesterday] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


In [15]:
#create samples
testcase_name = 'patient_negated'
t = editor.template("{first_name} {neg} {verb} {first}.", first=english_firstname, neg=neg, verb=patient_neg_verbs, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_arg1_verb1, format_srl_verb1,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions,
)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    1 (1.0%)

Example fails:
[ARG0: Albert] [ARGM-MOD: would] [ARGM-NEG: n't] [V: kiss] [ARGM-EXT: Norman] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


### Instrument recognition

In [16]:
#create samples
testcase_name = 'instrument_base'
t = editor.template("{first_name} killed {firstname} with a {instrument}.", instrument=instrument, firstname=english_firstname, meta=True, nsamples=100)

#make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_arg2_verb0, format_srl_verb0,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions,
)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    47 (47.0%)

Example fails:
[ARG0: Jim] [V: killed] [ARG1: Albert] [ARGM-MNR: with a knife] .
----
[ARG0: Sandra] [V: killed] [ARG1: Laura] [ARGM-MNR: with a bottle] .
----
[ARG0: Al] [V: killed] [ARG1: Sara] [ARGM-MNR: with a discoball] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    32 (32.0%)

Example fails:
[ARG0: Helen] [V: killed] Rose with a discoball .
----
[ARG0: Andrew] [V: killed] [ARG1: Tom] [ARGM-MNR: with a table] .
----
[ARG0: Melissa] [V: killed] [ARG1: Rose] with a table .
----
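The failing examples above show the instrument phrase tagged as `ARGM-MNR`, or left untagged, rather than as `ARG2`. The check behind a name like `expect_arg2_verb0` is defined in `utils_functions` and not shown here, so the following is only a minimal sketch of how such a check could work. It assumes the prediction follows the AllenNLP SRL output layout (a `words` list plus one BIO `tags` list per entry in `verbs`) and that the `verb0` suffix refers to the first predicate in that list. The helper names `strip_bio` and `role_of_phrase` are hypothetical.

```python
from typing import Dict, List, Optional


def strip_bio(tag: str) -> str:
    """Turn 'B-ARG2' or 'I-ARG2' into 'ARG2'; leave 'O' unchanged."""
    return tag.split("-", 1)[1] if "-" in tag else tag


def role_of_phrase(pred: Dict, phrase: List[str], verb_index: int = 0) -> Optional[str]:
    """Return the single role that covers `phrase` for the chosen predicate.

    Returns None when the predicate or phrase is missing, when the phrase is
    untagged, or when its tokens carry more than one label.
    """
    words = pred["words"]
    verbs = pred["verbs"]
    if verb_index >= len(verbs):
        return None
    tags = verbs[verb_index]["tags"]
    n = len(phrase)
    for start in range(len(words) - n + 1):
        if words[start:start + n] == phrase:
            roles = {strip_bio(t) for t in tags[start:start + n]}
            if len(roles) != 1:
                return None
            role = roles.pop()
            return None if role == "O" else role
    return None


# Hand-written prediction mirroring the first failing SRL example above
# (assumed layout, not actual model output).
example = {
    "words": ["Jim", "killed", "Albert", "with", "a", "knife", "."],
    "verbs": [{
        "verb": "killed",
        "tags": ["B-ARG0", "B-V", "B-ARG1",
                 "B-ARGM-MNR", "I-ARGM-MNR", "I-ARGM-MNR", "O"],
    }],
}

observed = role_of_phrase(example, ["with", "a", "knife"])
passed = observed == "ARG2"  # the test expects the instrument to be ARG2
print(observed, passed)
```

Checking the whole phrase rather than a single token also catches cases where only part of the instrument phrase receives a label.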


In [17]:
# create samples
testcase_name = 'instrument_negated'
t = editor.template("{first_name} {neg} kill {firstname} with a {instrument}.", neg=neg,
                    instrument=instrument, firstname=english_firstname, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_arg2_verb1, format_srl_verb1,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    55 (55.0%)

Example fails:
[ARG0: Jonathan] did [ARGM-NEG: n't] [V: kill] [ARG1: Margaret] [ARGM-MNR: with a fork] .
----
[ARG0: Deborah] did [ARGM-NEG: not] [V: kill] [ARG1: Hugh] [ARGM-MNR: with a fist] .
----
[ARG0: Diana] does [ARGM-NEG: not] [V: kill] [ARG1: Nigel] [ARGM-MNR: with a fist] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    3 (3.0%)

Example fails:
[ARG0: Matthew] [ARGM-MOD: should] [ARGM-NEG: not] [V: kill] [ARG1: Caroline] [ARGM-MNR: with a racket] .
----
[ARG0: Amy] [ARGM-MOD: could] [ARGM-NEG: not] [V: kill] [ARG1: Gordon] [ARGM-MNR: with a racket] .
----
[ARG0: Stephen] did [ARGM-NEG: n't] [V: kill] [ARG1: Patricia] [ARGM-MNR: with a racket] .
----


### Location

In [18]:
# create samples
testcase_name = 'location_base'
t = editor.template("{first_name} {verb} {firstname} {location}.", verb=patient_verbs,
                    firstname=english_firstname, location=locations, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_argloc_verb0, format_srl_verb0,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    10 (10.0%)

Example fails:
[ARG0: Ellen] [V: touched] [ARG1: Adam] [ARG2: on the street] .
----
[ARG0: Colin] [V: touched] [ARG1: Bobby] [ARG2: on the roof] .
----
[ARG0: Patricia] [V: kissed] [ARGM-PRD: Charlotte on the balcony] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


In [19]:
# create samples
testcase_name = 'location_negated'
t = editor.template("{first_name} {neg} {verb} {firstname} {location}.", neg=neg, verb=patient_neg_verbs,
                    firstname=english_firstname, location=locations, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_argloc_verb1, format_srl_verb1,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    23 (23.0%)

Example fails:
[ARG0: Greg] does [ARGM-NEG: not] [V: touch] [ARG1: Kathleen] [ARG2: at the busstop] .
----
[ARG0: Virginia] [ARGM-MOD: would] [ARGM-NEG: n't] [V: ignore] [ARG1: Martin in the kitchen] .
----
[ARG0: Anna] [ARGM-MOD: would] [ARGM-NEG: n't] [V: ignore] [ARG1: Jonathan in the hallway] .
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
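In the SRL failures above, the location phrase is either labelled `ARG2` or absorbed into the `ARG1` span ("Martin in the kitchen"). To inspect such bracketed descriptions programmatically, they can be split into (label, text) pairs. The snippet below is a small sketch with a hypothetical helper `parse_description`. It assumes that the label the location test looks for is `ARGM-LOC`, which is inferred from the name `expect_argloc_verb1` and not confirmed by the code shown here.

```python
import re

# Matches one bracketed span such as "[ARG1: Martin in the kitchen]".
BRACKET = re.compile(r"\[([^\]:]+): ([^\]]+)\]")


def parse_description(description):
    """Split a bracketed SRL description into (label, text) pairs."""
    return BRACKET.findall(description)


# Description copied from the second failing SRL example above.
desc = ("[ARG0: Virginia] [ARGM-MOD: would] [ARGM-NEG: n't] "
        "[V: ignore] [ARG1: Martin in the kitchen] .")

spans = parse_description(desc)
# Assumed target label for the location tests.
has_location = any(label == "ARGM-LOC" for label, _ in spans)

for label, text in spans:
    print(f"{label:<9} {text}")
print("location span found:", has_location)
```

Tokens outside any bracket, such as the final period, are ignored by this parser.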


### Manner

In [20]:
# create samples
testcase_name = 'manner_base'
t = editor.template("{first_name} {verb} the ball {manner}", manner=manner_adv,
                    verb=manner_verbs, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_argmnr_verb0, format_srl_verb0,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    12 (12.0%)

Example fails:
[ARG0: Sara] [V: kicked] [ARG1: the ball] [ARGM-PRD: tactically]
----
[ARG0: Kim] [V: stopped] [ARG1: the ball tactically]
----
[ARG0: Patrick] [V: missed] [ARG1: the ball tactically]
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)


In [21]:
# create samples
testcase_name = 'manner_negated'
t = editor.template("{first_name} {neg} {verb} the ball {manner}", neg=neg, manner=manner_adv,
                    verb=manner_verbs_neg, meta=True, nsamples=100)

# make and store predictions for the two models
test_data, SRL_predictions, SRLBERT_predictions = predict_and_store(
    t, capability, testcase_name,
    expect_argmnr_verb1, format_srl_verb1,
    predict_srl, predict_srlbert, test_data,
    SRL_predictions, SRLBERT_predictions)

SRL
Predicting 100 examples
Test cases:      100
Fails (rate):    8 (8.0%)

Example fails:
[ARGM-MNR: Kathy] [ARGM-MOD: should] [ARGM-NEG: not] [V: smash] [ARG1: the ball] [ARGM-MNR: kindly]
----
[ARG0: Matthew] does [ARGM-NEG: n't] [V: miss] [ARG1: the ball tactically]
----
[ARG0: Arthur] [ARGM-MOD: should] [ARGM-NEG: n't] [V: hit] [ARG1: the ball] tactically
----
SRL BERT
Predicting 100 examples
Test cases:      100
Fails (rate):    0 (0.0%)
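Because each role is tested once with a base template and once with a negated template, the fail counts printed above can be put side by side to see how much negation changes each model's behaviour. The sketch below simply copies the counts for the instrument, location and manner tests from the outputs in this notebook. The base and negated samples are drawn independently, and the negated tests use their own verb lists (`patient_neg_verbs`, `manner_verbs_neg`), so small differences may reflect sampling rather than negation.

```python
# Fail counts out of 100 test cases, copied from the cell outputs above
# as (base, negated) pairs.
fails = {
    "instrument": {"SRL": (47, 55), "SRL BERT": (32, 3)},
    "location": {"SRL": (10, 23), "SRL BERT": (0, 0)},
    "manner": {"SRL": (12, 8), "SRL BERT": (0, 0)},
}


def negation_delta(counts):
    """Return negated fails minus base fails (positive = worse under negation)."""
    base, negated = counts
    return negated - base


deltas = {
    role: {model: negation_delta(counts) for model, counts in per_model.items()}
    for role, per_model in fails.items()
}

for role, per_model in deltas.items():
    for model, delta in per_model.items():
        print(f"{role:<11}{model:<9}{delta:+d} fails under negation")
```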


### Store all data to JSON

In [22]:
# store the test sentences
store_data(test_sents_path, test_data, new_file=True)
# store the model predictions
store_data(bert_pred_path, SRLBERT_predictions, new_file=True)
store_data(srl_pred_path, SRL_predictions, new_file=True)
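The `store_data` helper comes from `utils_functions` and its implementation is not shown in this notebook. As a rough illustration only, a helper with the same call shape could look like the sketch below. It assumes that the data is a JSON-serialisable dict keyed by test case name, that `new_file=True` overwrites the target file, and that `new_file=False` merges into an existing file. The name `store_data_sketch` is hypothetical and chosen to avoid shadowing the real function.

```python
import json
import os
import tempfile


def store_data_sketch(path, data, new_file=True):
    """Write `data` to `path` as JSON.

    Assumed behaviour: overwrite when new_file is True, otherwise merge
    `data` into whatever dict is already stored at `path`.
    """
    if not new_file and os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = json.load(f)
        existing.update(data)
        data = existing
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


# Usage with a temporary file and made-up entries in the assumed layout.
demo_path = os.path.join(tempfile.mkdtemp(), "test_sents.json")
store_data_sketch(demo_path, {"manner_base": ["Sara kicked the ball tactically"]})
store_data_sketch(
    demo_path,
    {"manner_negated": ["Arthur shouldn't hit the ball tactically"]},
    new_file=False,
)

with open(demo_path, encoding="utf-8") as f:
    reloaded = json.load(f)
print(sorted(reloaded))
```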