Multiple bugs for evaluating selfplay #9

HMJiangGatech · 2020-06-03T05:17:40Z

In README, Section 5 Scoring:

airdialogue score --pred_data ./data/out_dir/dev_selfplay_out.txt \
                  --true_data ./data/airdialogue/tokenized/dev.selfplay.eval.data \
                  --true_kb ./data/airdialogue/tokenized/dev.selfplay.eval.kb \
                  --task selfplay \
                  --output ./data/out_dir/dev_selfplay.json

The command loads the tokenized true_data and true_kb.
However, according to
https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L240-L256
the evaluator actually needs JSON files.
Maybe change it to

                  --true_data ./data/airdialogue/json/dev_data.json \
                  --true_kb ./data/airdialogue/json/dev_kb.json \

?
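A quick way to tell the two kinds of file apart before running the scorer is to check whether the first few lines parse as JSON. The sketch below is only an illustration: `looks_like_json_lines` is a hypothetical helper, not part of airdialogue, and it assumes the JSON files hold one JSON object per line.

```python
import json


def looks_like_json_lines(path, max_lines=5):
    """Return True if the first few non-empty lines each parse as a JSON object.

    Hypothetical helper; assumes one JSON object per line in the JSON files.
    """
    checked = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                return False  # e.g. a tokenized text line
            if not isinstance(obj, dict):
                return False
            checked += 1
            if checked >= max_lines:
                break
    return checked > 0
```

A tokenized file would fail this check on its first line, which makes the mismatch visible before the evaluator raises a less obvious error.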

After fixing the previous bug, another one appears:

https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L247

It processes pred_json_obj['action'] using action_obj_to_str. This step, however, has already been done when generating dev_selfplay_out.txt.

Maybe remove action_obj_to_str?
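An alternative to removing the call outright would be to convert only when the action is not already a string. This is a minimal sketch, not the evaluator's code: `action_to_str` is a hypothetical wrapper, and `convert` stands in for whatever conversion function the evaluator uses (its real signature is not shown in this thread).

```python
def action_to_str(action, convert):
    """Return the action as a string, converting only if needed.

    `convert` is a placeholder for the evaluator's own conversion
    function; it is called only for non-string actions.
    """
    if isinstance(action, str):
        return action  # already converted when the predictions were written
    return convert(action)
```

This would keep the evaluator working with both already-converted and raw action objects.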

After that, another one appears:
https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L252

pred_json_obj is not compatible with json_obj_to_tokens, because pred_json_obj does not have the key dialogue. Instead, pred_json_obj has a key called utterance.

I can get the program to run by replacing that line with

pred_raw_text = pred_json_obj['utterance'].replace('<t1> ','').replace('<t2> ','').split(' ')

However, I think that may not be the optimal solution.
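A slightly more robust variant of the same workaround is sketched below. It is only an illustration: `utterance_to_tokens` and `TURN_MARKERS` are hypothetical names, and the sketch assumes that `<t1>` and `<t2>` are the only turn markers in the utterance string.

```python
# Assumed set of turn markers, taken from the workaround above.
TURN_MARKERS = {"<t1>", "<t2>"}


def utterance_to_tokens(utterance):
    """Split an utterance on whitespace and drop the turn markers."""
    return [tok for tok in utterance.split() if tok not in TURN_MARKERS]
```

Unlike chained `replace` calls followed by `split(' ')`, this also drops a marker that is not followed by a space and does not produce empty tokens when there are repeated spaces.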

josephch405 · 2020-06-08T17:42:57Z

This is addressed in a working version of a PR at the Airdialogue repository. I will close this once both are merged in with the README updates. At the moment we're basically doing what you mention in question 3, transforming between utterances and dialogues, but on the Airdialogue CLI side.

josephch405 · 2020-06-17T21:03:16Z

Addressed in #10

josephch405 closed this as completed Jun 17, 2020
