Conversation
parlai/agents/hugging_face/dict.py
Outdated
super()._define_special_tokens(opt)
if not opt["add_special_tokens"]:
    # the original pad token '<|endoftext|>' has another usage as the global history end token.
    self.tokenizer.add_special_tokens({"pad_token": DIALOGPT_PAD_TOKEN})
you could also just make the mask in model.forward do nothing if NULL_IDX == END_IDX, idk.
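A rough sketch of that alternative, assuming the mask is built by comparing input ids against the pad index as in the GPT-2 agent's forward; the helper name below is illustrative, not actual ParlAI code:

def build_attention_mask(input_ids, null_idx, end_idx):
    # Sketch of the suggestion above: when the pad id and the end id collide
    # (DialoGPT without added special tokens), skip pad masking entirely so
    # <|endoftext|> is not hidden from attention.
    if null_idx == end_idx:
        return None
    return input_ids != null_idx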
Yeah, what you suggest also solves my problem. I'm just wondering: would adding the pad token here regardless be closer to fixing the root cause, or would the changes here have any corner cases?
I mean, even simpler is to just set NULL_IDX = -1 and not add any special tokens. It already says bs=1 is mandatory when not using add_special_tokens, and all we really need is for the <|endoftext|> token to not be masked out.
I think we need a test.
@@ -26,6 +26,12 @@ class DialoGPTDecoder(GPT2Decoder):
    This decoder is initialized with the pretrained model from Hugging Face.
    """

    def __init__(self, opt, dict):
        super().__init__(opt, dict)
        if opt.get('batchsize', 1) == 1 and self.END_IDX == self.NULL_IDX:
when is the latter condition not going to be true? if you are inheriting from this model but changing things?
When -bs 1 --add_special_token True? Basically I only want to override the NULL_IDX if it's the same as END_IDX.
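Pieced together from the two hunks quoted in this review and the patch description below, the override would read roughly as follows. This is only a sketch: the final patch may spell the condition differently or route the value through a local null_idx plus a warn_once/comment.

from parlai.agents.hugging_face.gpt2 import GPT2Decoder

class DialoGPTDecoder(GPT2Decoder):
    def __init__(self, opt, dict):
        super().__init__(opt, dict)
        if (
            opt.get('batchsize', 1) == 1
            and not opt.get('add_special_tokens', True)
            and self.END_IDX == self.NULL_IDX
        ):
            # <|endoftext|> doubles as pad and end token here; pointing NULL_IDX
            # at -1 keeps the attention mask from hiding the end token
            self.NULL_IDX = -1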
tests/nightly/gpu/test_dialogpt.py
Outdated
""" | ||
Ensures dialogpt provides the same generation results regardless of batchsize. | ||
""" | ||
for batchsize in [2, 2, 4, 2]: |
- I do not understand why batch size 2 is repeated several times.
- Since you have 4 utterances, I think it is not a bad idea to test with a batch size that results in the last batch being smaller than the batch size (for example 3).
Ah, this is me testing generation consistency with randomized initialization. The PR is a work in progress.
lgtm
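For reference, a batch-consistency check along the lines discussed above might be structured like the sketch below. The generate_responses helper is a stand-in so the example runs on its own; the real nightly test would push the utterances through the DialoGPT agent at each batch size. Batch size 3 is included so the last batch of the four utterances is smaller than the batch size, per the review comment.

import unittest

def generate_responses(utterances, batchsize):
    # Stand-in for the real agent call; the actual test would batch the
    # utterances through DialoGPT and collect its generations instead.
    return ['response to: ' + u for u in utterances]

class TestBatchsizeConsistency(unittest.TestCase):
    def test_generation_matches_across_batchsizes(self):
        utterances = ['hi', 'how are you?', 'tell me a joke', 'bye']
        reference = generate_responses(utterances, batchsize=1)
        for batchsize in [2, 3, 4]:
            self.assertEqual(generate_responses(utterances, batchsize), reference)

if __name__ == '__main__':
    unittest.main()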
):
    # get around the dual usage of end_idx, which would otherwise mask the end token during the forward pass
    null_idx = -1
    warn_once("WARNING: null_idx is set to -1 otherwise null_idx = end_idx")
Do we need the warning? IDTS, no?
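A minimal illustration of why null_idx = -1 sidesteps the problem, assuming the mask at gpt2.py#L93 is built by comparing input ids against the pad index; the ids below are illustrative, with 50256 being GPT-2's <|endoftext|>:

import torch

input_ids = torch.tensor([[50256, 17250, 703, 389, 345, 50256]])  # illustrative ids only
mask_if_null_is_end = input_ids != 50256  # NULL_IDX == END_IDX: end-token positions get masked out
mask_if_null_is_minus_one = input_ids != -1  # NULL_IDX == -1: matches no real id, nothing is masked
print(mask_if_null_is_end)        # False at the <|endoftext|> positions
print(mask_if_null_is_minus_one)  # all True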
tests/nightly/gpu/test_gpt2.py
Outdated
class TestGpt2(unittest.TestCase):
    warn_once(
just a comment, not a warning plz
Patch description
Problem description: Unable to reproduce the model generation of DialoGPT under -bs 1 after this PR landed (Fix error with using GPT2 + DistributedDataParallel #3207), even though that PR does not intend to change anything for -bs 1.
It turns out that in DialoGPT the pad token and the global history end token share the same <|endoftext|> (thus NULL_IDX = END_IDX), which, after that PR landed, causes the attention mask to mask away the end token (https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/hugging_face/gpt2.py#L93).
In this PR, I reset NULL_IDX within the DialoGPTModel if NULL_IDX == END_IDX and bs == 1 and add_special_tokens=False.
The test examples for reproducibility of DialoGPTModel generation before and after (https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/hugging_face/gpt2.py#L93) are listed below, with each being an (input, output) pair.
Testing steps
CI
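For a local run, the nightly GPU tests touched here can be invoked directly with pytest (assuming a GPU machine and the usual ParlAI test setup):

pytest tests/nightly/gpu/test_dialogpt.py tests/nightly/gpu/test_gpt2.py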
Logs
Other information
Data tests (if applicable)
If you added a new teacher, you will be asked to run python tests/datatests/test_new_tasks.py. Please paste this log here.