Supplementary Material for the Paper "Validation on Machine Reading Comprehension Software Without Annotated Labels: A Property-Based Method"

This is the supplementary material for the ESEC/FSE'21 research paper "Validation on Machine Reading Comprehension Software Without Annotated Labels: A Property-Based Method".

It contains the test case generation tool, the experimental replication package, and the detailed experimental results for the paper.


Test Case Generation Tool

We implement a Python library that generates validation input sets according to a given one of the seven proposed MRs (metamorphic relations). Each sample in an input set is a pair of inputs: an eligible source input and its corresponding follow-up input.

All the code for this tool is stored in the tool directory.

Usage

  • First, prepare the dependencies for this tool: run pip install -r requirements.txt to set up the experiment environment.
  • Import the library with from MT4MRC import MRs, and instantiate a handler with handler = MRs().
  • Load a raw dataset (labels are not needed), from which the tool will pick eligible source inputs. (The current version only supports the BoolQ datasets used in our experiments; it can be extended to other datasets by handling their data formats and fields, as sketched after the example below.)
  • Iterate over all the samples in the dataset and use the handler to produce the corresponding follow-up case under the given MR.
  • Export the generated test cases.

An example that obtains eligible source inputs and follow-up inputs from the BoolQ test set with MR1-1:

from MT4MRC import MRs
handler = MRs()
file_path = "boolq_test/test.jsonl"
with open(file_path, "r", encoding="utf-8") as f:
    lines = f.readlines()
inputsets = []
for line in lines:
    cases = handler.generate(data=line, mr="1_1")
    if cases is not None: # None means the sample in 'line' is not an eligible source input
        inputsets.append((cases[0], cases[1]))
        # cases[0] and cases[1] are the source input and follow-up input, respectively
with open("source.jsonl", 'w', encoding='utf-8') as f_source, \
     open("follow-up.jsonl", 'w', encoding='utf-8') as f_followup:
    for case in inputsets: # dump generated inputs into jsonl files
        print(case[0], file=f_source)
        print(case[1], file=f_followup)
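
As noted in the usage steps, the tool currently parses BoolQ-style jsonl records, whose fields are question, passage, title, and answer. Below is a minimal sketch of converting another yes/no QA dataset into this format; the foreign field names (query, context, label) and file names are hypothetical placeholders, not part of the tool:

import json

def to_boolq_record(record):
    # Map a foreign yes/no QA record onto the BoolQ jsonl fields that
    # the tool expects; the tool does not require labels, so "answer"
    # may be left as a placeholder.
    return json.dumps({
        "question": record["query"],
        "passage": record["context"],
        "title": record.get("title", ""),
        "answer": record.get("label", True),
    })

with open("other_dataset.jsonl", "r", encoding="utf-8") as f_in, \
     open("other_as_boolq.jsonl", "w", encoding="utf-8") as f_out:
    for line in f_in:
        print(to_boolq_record(json.loads(line)), file=f_out)

The converted file can then be read line by line and passed to handler.generate, exactly as in the example above.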

Experimental Replication Package

We provide the code to replicate our experiments, including scripts to build and train the four objective models and to validate the trained models with a generated test case set.

All the code for replication is stored in the replicate directory.

Train Objective Models

Since the four objective models have different architectures and training paradigms, we provide four independent scripts to build and train each objective model. Their usage is as follows:

# RNN
python train_rnn.py --data_dir /path/to/dir_with_train.jsonl --output_dir /path/to/save_model

# BERT
python train_boolq_bert.py \
    --model_type bert --model_name_or_path bert-base-cased \
    --do_train --do_eval --do_lower_case \
    --data_file /path/to/dir_with_train.jsonl \
    --max_seq_length 256 --learning_rate 1e-5 --num_train_epochs 1000 --logging_steps 500 \
    --per_gpu_eval_batch_size=8 --per_gpu_train_batch_size=8 \
    --output_dir /path/to/save_model --tbname boolq_bert

# RoBERTa
python train_boolq_roberta.py \
    --model_type roberta --model_name_or_path roberta-large \
    --do_train --do_eval --do_lower_case \
    --data_file /path/to/dir_with_train.jsonl \
    --max_seq_length 256 --learning_rate 1e-5 --num_train_epochs 1000 --logging_steps 500 \
    --per_gpu_eval_batch_size=8 --per_gpu_train_batch_size=8 \
    --output_dir /path/to/save_model --tbname boolq_roberta

# T5
python train_t5.py --data_dir /path/to/dir_with_train.jsonl --output_dir /path/to/save_model

Validate a Model with a Given Input Set

We also provide four independent scripts to validate the corresponding objective models. Their usage is as follows:

# RNN
python eval_rnn.py --mr MRID --data_dir /path/to/dir_with_source&followup.jsonl --model_dir /path/to/saved_model

# BERT
python eval_boolq_bert.py \
    --mr MRID \
    --model_type bert --model_name_or_path bert-base-cased \
    --do_eval --do_lower_case \
    --data_file /path/to/dir_with_source&followup.jsonl \
    --per_gpu_eval_batch_size=8 \
    --output_dir /path/to/saved_model

# RoBERTa
python eval_boolq_roberta.py \
    --mr MRID \
    --model_type roberta --model_name_or_path roberta-large \
    --do_eval --do_lower_case \
    --data_file /path/to/dir_with_source&followup.jsonl \
    --per_gpu_eval_batch_size=8 \
    --output_dir /path/to/saved_model

# T5
python eval_t5.py --mr MRID --data_dir /path/to/data --model_dir /path/to/saved_model

An example that evaluates T5 on the BoolQ dev set:

  • Run python eval_t5.py --mr 1-1 --data_dir boolq_val/MR1-1/T5 --model_dir /model/T5.
  • The script outputs 0.5470459518599562, i.e., a violation rate of 54.70%.
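
For intuition, the reported number is the violation rate: the fraction of (source, follow-up) input pairs whose two predictions break the relation expected by the MR. Below is a minimal sketch for an equivalence-type MR, where the two answers are expected to agree; the prediction lists are hypothetical stand-ins for the eval scripts' outputs:

def violation_rate(src_preds, fol_preds):
    # A pair violates an equivalence-type MR when the model's answers
    # to the source input and the follow-up input differ.
    assert len(src_preds) == len(fol_preds)
    violations = sum(s != f for s, f in zip(src_preds, fol_preds))
    return violations / len(src_preds)

# e.g., violation_rate(["yes", "no", "yes"], ["yes", "yes", "no"]) == 2 / 3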

Detailed Experimental Results

Due to space limitations, the paper does not include all the detailed results for RQ2 and RQ4. Here we release the results for all four objective models, i.e., RNN, BERT, RoBERTa, and T5.

These detailed results are stored in the figure directory.

File Structure of figure

figure
    RQ2_full.png: the results of RQ2 on all four objective models.
    RQ4_full.csv: the results of RQ4 on all four objective models.

If you find our paper useful, please cite it as:

@inproceedings{fse21-MT4MRC,
  author    = {Chen, Songqiang and Jin, Shuo and Xie, Xiaoyuan},
  editor    = {Spinellis, Diomidis and Gousios, Georgios and Chechik, Marsha and Penta, Massimiliano Di},
  title     = {Validation on Machine Reading Comprehension Software without Annotated Labels: A Property-Based Method},
  booktitle = {29th {ACM} Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, {ESEC/FSE} 2021, Athens, Greece, August 23-28, 2021},
  pages     = {590--602},
  publisher = {{ACM}},
  year      = {2021},
  doi       = {10.1145/3468264.3468569}
}
