# A Rationale-Centric Framework for Human-in-the-loop Machine Learning (ACL 2022)

This repository is associated with the paper *A Rationale-Centric Framework for Human-in-the-loop Machine Learning*, accepted to the main conference of ACL 2022.

*(Overview figure of the framework.)*

## Usage

### Dependencies

Tested with Python 3.6; requires the following packages, which are available via pip:

### Top-level directory layout

```
.
├── datasets                   # IMDb datasets, human-labelled rationales, counterfactual examples (Hovy et al.)
├── AL_results                 # Experimental outputs of baseline active learning
├── DP_results                 # Experimental outputs of baseline duplication
├── RR_results                 # Experimental outputs of baseline random replacement
├── MR_results                 # Experimental outputs of baseline missing rationales
├── FR_results                 # Experimental outputs of baseline false rationales
├── SF_results                 # Experimental outputs of our approach static semi-factuals
├── full_results               # Experimental outputs of baseline training with the full training set
├── Hybrid_results             # Experimental outputs of our approach dynamic human-intervened correction
└── README.md
```

### Preliminaries

1. To run the code, you need to add some code to `trainer.py` (as shown below) in the installed `transformers` package (on my machine, `~/Anaconda3/Lib/site-packages/transformers/trainer.py`):

   *(Screenshot of the code snippet to add to `trainer.py`.)*

2. Randomly sample a certain number (25 in our experiments) of positives and negatives, storing the selected keys in `AL_results/AL_step0_IMDb_trainer_{seed}_{num_instances_each_class}/keys.txt`.

3. Please run the `step0` scripts first, then go to `IMDb_AL_example_selection_step1.ipynb` to extract another 50 examples from the unlabelled pool via uncertainty sampling (a sketch of the idea is shown after this list). After that, you can run the `step1` scripts.
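The notebook contains the actual selection code; the sketch below only illustrates least-confidence uncertainty sampling over an unlabelled pool. The variable names, the shape of `pool_probs`, and the `keys.txt` usage are assumptions made for illustration, not the notebook's API.

```python
# Minimal sketch of least-confidence uncertainty sampling.
# Assumes `pool_probs` is an (N, num_classes) array of class probabilities
# produced by the current model for N unlabelled examples; this and the
# keys.txt format are illustrative assumptions, not the notebook's code.
import numpy as np

def select_uncertain(pool_keys, pool_probs, k=50):
    """Return the k pool keys whose predicted-class probability is lowest."""
    confidence = pool_probs.max(axis=1)   # confidence of the predicted class
    order = np.argsort(confidence)        # least confident first
    return [pool_keys[i] for i in order[:k]]

# Example usage (dummy probabilities standing in for real model outputs):
# chosen = select_uncertain(pool_keys, pool_probs, k=50)
# with open("keys.txt", "w") as f:
#     f.write("\n".join(map(str, chosen)))
```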

### Generate static semi-factual augmented examples by replacing non-rationales

See `static_semi_factual_generation.ipynb`.
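The notebook implements the actual generation pipeline; the sketch below only illustrates the underlying idea of replacing non-rationale tokens while leaving the human-labelled rationales, and hence the label, untouched. The rationale-mask format and the `get_replacement` substitution source are illustrative assumptions.

```python
# Minimal sketch of static semi-factual generation: non-rationale tokens
# are replaced while rationale tokens are kept, so the label is preserved.
# `get_replacement` is a hypothetical stand-in for whatever substitution
# source the notebook uses; the mask format (1 = rationale, 0 = non-rationale)
# is also an assumption.
import random

def semi_factual(tokens, rationale_mask, get_replacement, replace_prob=0.3):
    augmented = []
    for tok, is_rationale in zip(tokens, rationale_mask):
        if not is_rationale and random.random() < replace_prob:
            augmented.append(get_replacement(tok))
        else:
            augmented.append(tok)
    return augmented

# tokens = "the movie was absolutely wonderful".split()
# mask   = [0, 0, 0, 0, 1]          # "wonderful" is the human rationale
# print(" ".join(semi_factual(tokens, mask, lambda t: "[REPLACED]")))
```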

### Generate false rationales augmented data

Run `IMDb_step1_generate_false_rationales_position.py`, then see `IMDb_generate_false_rationales_examples.ipynb`.
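The script produces the positions of false rationales, and the notebook then builds augmented examples from them. The sketch below only illustrates that second step under assumed formats: it takes a list of flagged token positions and replaces the tokens there. The position format and the replacement token are assumptions, not the repository's conventions.

```python
# Hedged sketch: given token positions flagged as "false rationales"
# (words the model relies on that are not human rationales), build an
# augmented copy with those positions replaced. The position format and
# the replacement token are illustrative assumptions.
def augment_false_rationales(tokens, false_positions, replacement="[UNK]"):
    out = list(tokens)
    for pos in false_positions:
        if 0 <= pos < len(out):
            out[pos] = replacement
    return out

# tokens = "I watched this film last night".split()
# print(" ".join(augment_false_rationales(tokens, false_positions=[4, 5])))
```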

### Generate missing rationales augmented data

Run `IMDb_step1_generate_missing_rationales_examples.py`.
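As a heavily hedged illustration of the idea only (consult the script for the actual construction), one plausible form of missing-rationales augmentation is to foreground the human rationales the model ignored, e.g. by keeping rationale tokens and dropping non-rationale context:

```python
# Hedged sketch, not the script's implementation: build an example that
# keeps only the human-labelled rationale tokens so the model is pushed
# to rely on them. The mask format (1 = rationale) is an assumption.
def keep_rationales_only(tokens, rationale_mask):
    return [tok for tok, keep in zip(tokens, rationale_mask) if keep]

# tokens = "the plot dragged but the acting was superb".split()
# mask   = [0, 0, 0, 0, 0, 1, 1, 1]
# print(" ".join(keep_rationales_only(tokens, mask)))   # "acting was superb"
```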

### In-domain and OOD tests

For the Yelp and Amazon OOD tests, please go to `OOD_Testing_Amazon` and `OOD_Testing_Yelp`; for in-domain and other OOD testing, please use `In-domain_OOD_all.py`. Because of size limits, we do not host the OOD data on GitHub; you can freely find and download those datasets online, or contact jinghui.lu@ucdconnect.ie or lujinghui1@sensetime.com.
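Once the OOD datasets are obtained, evaluation amounts to running the fine-tuned classifier over each test set and reporting accuracy. Below is a minimal sketch using the Hugging Face `pipeline` API; the checkpoint path, CSV layout, and label mapping are assumptions, not the repository's conventions.

```python
# Hedged sketch of an OOD accuracy check with a fine-tuned checkpoint.
# The checkpoint path, the CSV columns ("text", "label"), and the default
# "LABEL_i" naming are assumptions, not taken from this repository.
import pandas as pd
from transformers import pipeline

clf = pipeline("text-classification", model="path/to/finetuned-checkpoint")

df = pd.read_csv("ood_yelp_test.csv")                 # assumed columns: text, label
preds = clf(df["text"].tolist(), truncation=True)
pred_labels = [int(p["label"].split("_")[-1]) for p in preds]  # e.g. "LABEL_1" -> 1

accuracy = sum(int(p == y) for p, y in zip(pred_labels, df["label"])) / len(df)
print(f"OOD accuracy: {accuracy:.4f}")
```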
