This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Fast ACUTE in Mephisto (#3297)
* Refactor existing crowdsourcing end-to-end tasks

* Update import

* Add new files

* Pass in model config directly

* Sample model config

* Update RunScriptConfigs

* Param tweaks

* Clarify task directory var

* Clarify var

* Fix JSON

* Various fixes

* Dump Fast ACUTE test

* Revisions

* Work on unit test

* Try to use data regressions

* Finish prototype

* Various fixes

* Move

* Fix file issue

* Fix tests

* Fix tests

* Fix tests

* Temp tweak

* Fix turn annotations static test

* Temp raise import errors

* Call analysis

* Check analysis inputs

* Various fixes

* Various fixes

* Minor

* Pytest fixtures

* Fix fixture

* Fix Mephisto version

* Bump reqs again

* Partial work on tests

* Clean up fast acute tests

* Add more tests

* Add remaining tests

* Comment out some functions for now

* Don't yield in superclass

* Try to make fast ACUTE code work

* Fix tests

* Temp test to understand why tests aren't working on CI

* Revert "Temp test to understand why tests aren't working on CI"

This reverts commit b51680f.

* Temporarily block the Q-function runs

* Modify fixture

* Run setup/teardown once per function

* Revert "Run setup/teardown once per function"

This reverts commit 9732bd7.

* Now just disable base fast acute

* Revert "Now just disable base fast acute"

This reverts commit 2a3500e.

* Just give time for a worker to be registered

* Waiting for longer before retrying

* Is it about alphabetical order?

* Another rename

* Add back in setup/teardown for chat demo

* Lint

* Remove old ACUTE code

* Revert temp crowdsourcing changes

* Typo

* Cleanup

* Fix dir

* More cleanup

* Tweaks

* Rename variant

* Lint

* PR changes

* TODO for future flags

* Tweak

* More nuanced waiting

* Get example scripts to work

* Fix import

* Add back in dependency

* Defaults fix

* Path tweak

* Analysis tweaks

* Move blueprints to their own file

* Black

* Convenience message

* Don't remove old ACUTE-Eval in this PR

* README notes
EricMichaelSmith committed Dec 12, 2020
1 parent 32e10b4 commit e661a15
Showing 55 changed files with 8,027 additions and 1,354 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -172,7 +172,7 @@ commands:
command: |
cd ..
git clone git@github.com:facebookresearch/Mephisto.git Mephisto
-cd Mephisto; git checkout 8f315bfe42ba9643164d3d3d61d7d22609b3ab10 -b stable
+cd Mephisto; git checkout v0.3.0 -b stable
pip install -r requirements.txt
python setup.py develop
# `echo` so that ENTER will be pressed at the prompt
932 changes: 465 additions & 467 deletions parlai/crowdsourcing/tasks/acute_eval/webapp/package-lock.json

Large diffs are not rendered by default.

101 changes: 101 additions & 0 deletions parlai/crowdsourcing/tasks/fast_acute/README.md
@@ -0,0 +1,101 @@
# Fast ACUTE

**NOTE**: This code is a nearly feature-complete version of the code in [`parlai.mturk.tasks.acute_eval`](https://github.com/facebookresearch/ParlAI/tree/master/parlai/mturk/tasks/acute_eval), which will be deprecated soon. Two things differ in this version: it cannot yet run ACUTE-Evals on ParlAI tasks (datasets), and the analysis script renders conversations in HTML slightly differently. Use the old version of this task if you need those features.

The scripts in this directory will allow you to run all the steps of [ACUTE-Eval](https://github.com/facebookresearch/ParlAI/tree/master/parlai/crowdsourcing/tasks/acute_eval) with one simple command. Two types of Fast ACUTE can be run:
1. The base version (`run.py`), which includes having models chat with each other (known as "self-chats")
1. A variant that skips self-chats (`run_no_self_chat.py`)

Both types are discussed below.

## How to run Fast ACUTE if you need to produce model self-chats

### 1. Choose the self-chat task

First, determine which ParlAI task the models will run self-chat on. This task must already be set up for self-chat, i.e. it must define the worlds used for conducting self-chat (typically invoked with the `parlai self_chat` command).

### 2. Create a file that specifies model configurations

Create a JSON file of the ParlAI parameters used for running self-chat on all models: see `task_config/model_config.json` for an example file. The parameters are any that you would need to specify to run self-chat on the command line.
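
As a rough illustration, such a file could be generated from Python as sketched below. The layout (one entry per model name) and the keys `model_file` and `beam_size` are assumptions chosen for the example; `task_config/model_config.json` remains the authoritative reference for the real format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical entries: the keys inside each model's dict stand in for
# whatever flags you would pass on the command line to run self-chat
# for that model. They are illustrative, not a required schema.
model_config = {
    "model1": {"model_file": "/path/to/model1/model", "beam_size": 10},
    "model2": {"model_file": "/path/to/model2/model", "beam_size": 1},
}


def write_model_config(config: dict, path: Path) -> None:
    """Write the per-model self-chat parameters to a JSON file."""
    path.write_text(json.dumps(config, indent=4))


# Write to a temporary directory so the sketch has no side effects.
config_path = Path(tempfile.mkdtemp()) / "model_config.json"
write_model_config(model_config, config_path)
print(config_path.read_text())
```

The resulting path is what would be passed as `mephisto.blueprint.config_path` in the launch command.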

### 3. Define settings for running onboarding

Create a JSON file of onboarding settings used to make sure that crowdsourcing workers perform necessary quality checks. See `task_config/onboarding.json` for an example file, and see the [ACUTE-Eval README](https://github.com/facebookresearch/ParlAI/blob/master/parlai/crowdsourcing/tasks/acute_eval/README.md) for more details.

### 4. Run Fast ACUTEs

Now that you've set up everything, launch Fast ACUTEs in the sandbox with a command like the following:
```
python parlai/crowdsourcing/tasks/fast_acute/run.py \
mephisto.blueprint.config_path=${PATH_TO_MODEL_CONFIG_JSON} \
mephisto.blueprint.models=\'model1,model2,model3\' \
mephisto.blueprint.num_self_chats=100 \
mephisto.blueprint.root_dir=${DIR_TO_SAVE_IN} \
mephisto.blueprint.onboarding_path=${PATH_TO_ONBOARDING_JSON} \
mephisto.blueprint.task=${SELF_CHAT_TASK}
```

You can also specify running Fast ACUTEs between only specific model pairs, with a syntax like `mephisto.blueprint.model_pairs=model1:model2`. In this case, the `mephisto.blueprint.models` flag is not used.
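
To make the two flags concrete, the sketch below enumerates every unordered pair from a list of models and formats pairs in the `model1:model2` style. It assumes that `mephisto.blueprint.models` leads to all pairwise matchups and that several pairs would be joined with commas; the README only shows a single pair, so treat both points as assumptions. The helper names are hypothetical.

```python
from itertools import combinations


def all_model_pairs(models):
    """Return every unordered pair of distinct models."""
    return list(combinations(models, 2))


def format_model_pairs(pairs):
    """Format pairs in the `model1:model2` style shown in the README.

    Assumption: multiple pairs are comma-separated. The README only
    documents the single-pair form.
    """
    return ",".join(f"{first}:{second}" for first, second in pairs)


models = ["model1", "model2", "model3"]
pairs = all_model_pairs(models)

# Three models give three possible matchups.
print(len(pairs))
# Restrict the evaluation to one matchup only.
print(format_model_pairs(pairs[:1]))  # model1:model2
```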

When you are ready to run a **live** ACUTE-Eval, add `mephisto.provider.requester_name=${REQUESTER_NAME} mephisto/architect=heroku` to this command, where `${REQUESTER_NAME}` is the MTurk requester name that you specified when setting up Mephisto.

Fast ACUTE operates in three phases:

#### 4a. Self-chat

First, the script attempts to run self-chat with all models; each model's self-chats are saved in a path in `${ROOT_DIR}/self_chats/` that is unique to the model and self-chat task. If a corresponding self-chat file already exists, self-chat is not re-run for that model and task.
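
The skip-if-exists behavior can be pictured with the sketch below. It is not the code in `run.py`: the file-naming scheme, the helper names, and the `run_fn` callback are all hypothetical, and `convai2` is used only as an example task name.

```python
import tempfile
from pathlib import Path


def self_chat_path(root_dir: Path, model: str, task: str) -> Path:
    """Build a path that is unique per (model, task).

    Hypothetical naming scheme; the real script may name files differently.
    """
    safe_task = task.replace(":", "_")
    return root_dir / "self_chats" / f"{model}.{safe_task}.jsonl"


def get_or_run_self_chat(root_dir: Path, model: str, task: str, run_fn):
    """Run self-chat only if no saved file exists; return (path, ran)."""
    path = self_chat_path(root_dir, model, task)
    if path.exists():
        return path, False  # reuse the existing self-chats
    path.parent.mkdir(parents=True, exist_ok=True)
    run_fn(path)  # stand-in for actually running self-chat
    return path, True


root = Path(tempfile.mkdtemp())
calls = []


def fake_run(path: Path) -> None:
    """Pretend to run self-chat by writing a placeholder file."""
    calls.append(path)
    path.write_text("{}\n")


first_path, first_ran = get_or_run_self_chat(root, "model1", "convai2", fake_run)
second_path, second_ran = get_or_run_self_chat(root, "model1", "convai2", fake_run)
print(first_ran, second_ran)  # True False
```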

#### 4b. ACUTE-Eval

The script will then prepare each conversation-pairs file and save it in `${ROOT_DIR}/pairings_files/`, with a unique string according to which two self-chat files were used to create it. It will then run ACUTE-Eval with appropriate arguments.

#### 4c. Analysis

After finishing ACUTE-Eval, the script will analyze the results, and save files with information such as the win/loss rate and significance scores. Tables of results are saved to `${ROOT_DIR}/acute_results/<date>/`.

Generated result files include the following:
1. A CSV file of the win rates of all model pairs, as is typically shown when displaying ACUTE-Eval results in papers. These can be viewed by running a command like `cat acute_eval_<timestamp>.grid.csv | column -t -s, | less -S`.
2. A CSV file of the statistical significances of results, given by the *p*-values of the win rates of model pairs. View these with `cat acute_eval_<timestamp>.significance.csv | column -t -s, | less -S`.
3. HTML files of nicely visualized conversations.
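
To give a feel for what a win rate and its *p*-value mean, here is a minimal sketch of an exact two-sided binomial test against a 50% win rate. This is an assumption made for illustration: `analysis.py` may compute significance with a different test, and the function names below are hypothetical.

```python
from math import comb


def win_rate(wins: int, total: int) -> float:
    """Fraction of comparisons won by the first model of a pair."""
    return wins / total


def two_sided_binomial_p(wins: int, total: int) -> float:
    """Exact two-sided binomial p-value under the null of a 50% win rate.

    Sums the probability of every outcome that is no more likely than the
    observed one. With p = 0.5, each outcome's probability is proportional
    to comb(total, k), so the comparison can be done on exact integers.
    """
    observed = comb(total, wins)
    tail = sum(
        comb(total, k) for k in range(total + 1) if comb(total, k) <= observed
    )
    return tail / 2**total


# Example: model1 preferred in 60 of 100 pairwise comparisons.
print(win_rate(60, 100))
print(two_sided_binomial_p(60, 100))
```

A small *p*-value suggests that the preference between the two models is unlikely to be due to chance alone.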

**NOTE**: Analysis can be run on its own by calling `analysis.py`, specifying the ACUTE-Eval `run_id` and the `root_dir` that you used when running Fast ACUTE:
```
python parlai/crowdsourcing/tasks/fast_acute/analysis.py \
--root-dir ${FAST_ACUTE_ROOT_DIR} \
--run-id ${RUN_ID}
```
Use `--outdir` to save analysis results in a custom folder.


## How to run Fast ACUTE if you already have model self-chats

### 1. Create a file that specifies model configurations

Save a JSON file containing paths to all model self-chat files. The file should have the following structure:
```
{
"model1": {
"log_path": "/path/to/model1/selfchats.jsonl",
"is_selfchat": true
},
"model2": {
"log_path": "/path/to/model2/selfchats.jsonl",
"is_selfchat": true
}
}
```

See `task_config/self_chats/` for examples of what these self-chat files should look like.
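
Before launching, it can help to sanity-check the JSON file against the structure shown above. The helper below is a hypothetical convenience, not part of ParlAI; it only checks the two keys that appear in the example structure.

```python
import json

# The two keys shown in the example structure above.
REQUIRED_KEYS = {"log_path", "is_selfchat"}


def validate_self_chat_config(config: dict) -> list:
    """Return a list of problems found; an empty list means none found."""
    problems = []
    for model, entry in config.items():
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"{model}: missing keys {sorted(missing)}")
        elif not isinstance(entry["is_selfchat"], bool):
            problems.append(f"{model}: is_selfchat must be true or false")
    return problems


good_config = json.loads(
    """
    {
        "model1": {"log_path": "/path/to/model1/selfchats.jsonl", "is_selfchat": true},
        "model2": {"log_path": "/path/to/model2/selfchats.jsonl", "is_selfchat": true}
    }
    """
)
bad_config = {"model3": {"log_path": "/path/to/model3/selfchats.jsonl"}}

print(validate_self_chat_config(good_config))
print(validate_self_chat_config(bad_config))
```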

### 2. Build ACUTE-Eval pairs and run

Launch Fast ACUTEs with a command like the following:
```
python parlai/crowdsourcing/tasks/fast_acute/run_no_self_chat.py \
mephisto.blueprint.config_path=${PATH_TO_MODEL_SELFCHAT_JSON} \
mephisto.blueprint.models=\'model1,model2,model3\' \
mephisto.blueprint.num_self_chats=100 \
mephisto.blueprint.root_dir=${DIR_TO_SAVE_IN} \
mephisto.blueprint.onboarding_path=${PATH_TO_ONBOARDING_JSON}
```
Here, the `mephisto.blueprint.task` parameter is not needed because we are not running self-chats.
5 changes: 5 additions & 0 deletions parlai/crowdsourcing/tasks/fast_acute/__init__.py
@@ -0,0 +1,5 @@
#!/usr/bin/env python3

# Copyright (c) Facebook, Inc. and its affiliates.
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
