
fix dialogpt dual usage of END_IDX #3256

Merged · 8 commits merged into master from dialogptdebug on Dec 16, 2020

Conversation

@jxmsML (Contributor) commented Nov 10, 2020

Patch description

Fixes DialoGPT's dual usage of END_IDX: with add_special_tokens off, the <|endoftext|> token serves as both the end-of-sequence token and the pad token, so the padding mask can hide real end tokens and generation results come to depend on batch size. Adds a test built around:

    test_cases = [
        ("What a nice weather!", "I'm in the UK and it's raining here."),
        ("Nice to meet you!", "Hello! I'm from the future!"),
    ]

with each entry being an (input, output) pair.
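For context, the root of the issue is that the GPT-2 tokenizer DialoGPT ships with defines an end token but no dedicated pad token, so reusing the end token for padding makes NULL_IDX == END_IDX downstream. A minimal sketch, assuming the public Hugging Face microsoft/DialoGPT-small checkpoint:

    from transformers import GPT2Tokenizer

    # The stock tokenizer has an eos token but no pad token; any fallback that
    # pads with the eos token makes the pad index equal the end index.
    tok = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-small")
    print(tok.eos_token)  # '<|endoftext|>'
    print(tok.pad_token)  # None -- no dedicated pad token out of the box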

Testing steps
CI


@jxmsML changed the title from "fix dialogpt dual usage of endoftext" to "fix dialogpt dual usage of END_IDX" on Nov 10, 2020
    super()._define_special_tokens(opt)
    if not opt["add_special_tokens"]:
        # The original pad token '<|endoftext|>' has another usage as the
        # global history end token, so register a distinct pad token instead.
        self.tokenizer.add_special_tokens({"pad_token": DIALOGPT_PAD_TOKEN})
Contributor:
You could also just make the mask in model.forward do nothing when NULL_IDX == END_IDX, idk.
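A hypothetical sketch of that alternative (helper name and signature assumed, not the actual ParlAI forward): skip padding masking entirely when the two indices collide, which is only safe at batchsize 1, where no padding exists.

    import torch

    def build_attention_mask(tokens, null_idx, end_idx):
        # Hypothetical helper: when pad and end share an id, masking pads would
        # also hide genuine end tokens, so attend to every position instead.
        # Only safe at bs=1, where there is nothing padded to hide.
        if null_idx == end_idx:
            return torch.ones_like(tokens, dtype=torch.bool)
        return tokens != null_idx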

@jxmsML (Contributor Author) replied Nov 10, 2020:

Yeah, what you suggest also solves my problem. I'm just wondering whether adding the pad token here would be closer to fixing the root cause, and whether the mask change would have any corner cases?

Contributor:

I mean, even simpler is to just set NULL_IDX = -1 and not add any special tokens. It already says bs=1 is mandatory when not using add_special_tokens, and all we really need is for the <|endoftext|> token to not be masked out.
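This is the approach the PR ends up taking. A small illustration of why it works (token ids are illustrative; 50256 is GPT-2's <|endoftext|> id):

    import torch

    END_IDX = 50256  # GPT-2's <|endoftext|> id
    tokens = torch.tensor([[15496, 11, 995, END_IDX]])  # ends in <|endoftext|>

    # If NULL_IDX == END_IDX, the padding mask wrongly hides the real end token:
    print(tokens != END_IDX)  # tensor([[ True,  True,  True, False]])

    # With NULL_IDX = -1 nothing is masked, since -1 never occurs as a token id:
    print(tokens != -1)       # tensor([[ True,  True,  True,  True]])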

@stephenroller (Contributor) commented:
I think we need a test

@jxmsML marked this pull request as draft on November 11, 2020 15:46
@@ -26,6 +26,12 @@ class DialoGPTDecoder(GPT2Decoder):
    This decoder is initialized with the pretrained model from Hugging Face.
    """

    def __init__(self, opt, dict):
        super().__init__(opt, dict)
        if opt.get('batchsize', 1) == 1 and self.END_IDX == self.NULL_IDX:
Contributor:
When is the latter condition not going to be true? Only if you're inheriting from this model and changing things?

@jxmsML (Contributor Author) replied:

When running with -bs 1 --add_special_token True. Basically, I only want to override NULL_IDX when it's the same as END_IDX.

"""
Ensures dialogpt provides the same generation results regardless of batchsize.
"""
for batchsize in [2, 2, 4, 2]:
Contributor:
  • I don't understand why batch size 2 is repeated several times.
  • Since you have 4 utterances, it wouldn't be a bad idea to also test a batch size that leaves the last batch smaller than the batch size (for example, 3); see the sketch below.
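A sketch of what that suggestion could look like (generate_responses is a hypothetical greedy-decoding helper, not the actual test in this PR):

    import unittest

    class TestDialogptBatchsize(unittest.TestCase):
        def test_batchsize_consistency(self):
            # Four utterances, so batchsize 3 leaves a final batch of one and
            # exercises the uneven-last-batch case suggested above.
            inputs = ['hi', 'hello there', 'how are you?', 'goodbye']
            reference = generate_responses(inputs, batchsize=1)
            for batchsize in (2, 3, 4):
                self.assertEqual(
                    generate_responses(inputs, batchsize=batchsize), reference
                )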

@jxmsML (Contributor Author) replied:

Ah, this is me testing generation consistency under randomized initialization. The PR is a work in progress.

@jxmsML marked this pull request as ready for review on December 8, 2020 22:50
@stephenroller (Contributor) left a review:

lgtm

    ):
        # Get around the dual usage of END_IDX, which would otherwise mask the
        # end token during the forward pass.
        null_idx = -1
        warn_once("WARNING: null_idx is set to -1, since otherwise null_idx == end_idx")
Contributor:
Do we need the warning? I don't think so, no?



    class TestGpt2(unittest.TestCase):
        warn_once(
Contributor:
just a comment, not a warning plz

@stephenroller merged commit a2adbe1 into master on Dec 16, 2020
@stephenroller deleted the dialogptdebug branch on December 16, 2020 01:33