Conversation
parlai/agents/hugging_face/dict.py
Outdated
super()._define_special_tokens(opt)
if not opt["add_special_tokens"]:
    # the original pad token '<|endoftext|>' has another usage as the global history end token.
    self.tokenizer.add_special_tokens({"pad_token": DIALOGPT_PAD_TOKEN})
you could also just make the mask in model.forward do nothing if NULL_IDX == END_IDX, idk.
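A rough sketch of that alternative, assuming the mask is built by comparing input ids against the pad index as in the GPT-2 agent's forward; the helper name below is illustrative, not actual ParlAI code:

def build_attention_mask(input_ids, null_idx, end_idx):
    # Sketch of the suggestion above: when the pad id and the end id collide
    # (DialoGPT without added special tokens), skip pad masking entirely so
    # <|endoftext|> is not hidden from attention.
    if null_idx == end_idx:
        return None
    return input_ids != null_idx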
Yeah, what you suggest also solves my problem. I'm just wondering: would adding the pad token here regardless be closer to fixing the root cause, or would the changes here have any corner cases?
I mean, even simpler is to just set NULL_IDX = -1 and not add any special tokens. It already says bs=1 is mandatory when not using add_special_tokens, and all we really need is for the <|endoftext|> token to not be masked out.
I think we need a test.
@@ -26,6 +26,12 @@ class DialoGPTDecoder(GPT2Decoder):
    This decoder is initialized with the pretrained model from Hugging Face.
    """

    def __init__(self, opt, dict):
        super().__init__(opt, dict)
        if opt.get('batchsize', 1) == 1 and self.END_IDX == self.NULL_IDX:
when is the latter condition not going to be true? if you are inheriting from this model but changing things?
When -bs 1 --add_special_token True? Basically I only want to override the NULL_IDX if it's the same as END_IDX.
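Pieced together from the two hunks quoted in this review and the patch description below, the override would read roughly as follows. This is only a sketch: the final patch may spell the condition differently or route the value through a local null_idx plus a warn_once/comment.

from parlai.agents.hugging_face.gpt2 import GPT2Decoder

class DialoGPTDecoder(GPT2Decoder):
    def __init__(self, opt, dict):
        super().__init__(opt, dict)
        if (
            opt.get('batchsize', 1) == 1
            and not opt.get('add_special_tokens', True)
            and self.END_IDX == self.NULL_IDX
        ):
            # <|endoftext|> doubles as pad and end token here; pointing NULL_IDX
            # at -1 keeps the attention mask from hiding the end token
            self.NULL_IDX = -1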
tests/nightly/gpu/test_dialogpt.py
Outdated
""" | ||
Ensures dialogpt provides the same generation results regardless of batchsize. | ||
""" | ||
for batchsize in [2, 2, 4, 2]: |
- I do not understand why batch size 2 is repeated several times.
- Since you have 4 utterances, I think it is not a bad idea to test with a batch size that results in the last batch being smaller than the batch size (for example 3).
Ah, this is me testing generation consistency with randomized initialization. The PR is a work in progress.
lgtm
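For reference, a batch-consistency check along the lines discussed above might be structured like the sketch below. The generate_responses helper is a stand-in so the example runs on its own; the real nightly test would push the utterances through the DialoGPT agent at each batch size. Batch size 3 is included so the last batch of the four utterances is smaller than the batch size, per the review comment.

import unittest

def generate_responses(utterances, batchsize):
    # Stand-in for the real agent call; the actual test would batch the
    # utterances through DialoGPT and collect its generations instead.
    return ['response to: ' + u for u in utterances]

class TestBatchsizeConsistency(unittest.TestCase):
    def test_generation_matches_across_batchsizes(self):
        utterances = ['hi', 'how are you?', 'tell me a joke', 'bye']
        reference = generate_responses(utterances, batchsize=1)
        for batchsize in [2, 3, 4]:
            self.assertEqual(generate_responses(utterances, batchsize), reference)

if __name__ == '__main__':
    unittest.main()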
):
    # get around the dual usage of end_idx, which would otherwise mask the end token during the forward pass
    null_idx = -1
    warn_once("WARNING: null_idx is set to -1 otherwise null_idx = end_idx")
Do we need the warning? IDTS, no?
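A minimal illustration of why null_idx = -1 sidesteps the problem, assuming the mask at gpt2.py#L93 is built by comparing input ids against the pad index; the ids below are illustrative, with 50256 being GPT-2's <|endoftext|>:

import torch

input_ids = torch.tensor([[50256, 17250, 703, 389, 345, 50256]])  # illustrative ids only
mask_if_null_is_end = input_ids != 50256  # NULL_IDX == END_IDX: end-token positions get masked out
mask_if_null_is_minus_one = input_ids != -1  # NULL_IDX == -1: matches no real id, nothing is masked
print(mask_if_null_is_end)        # False at the <|endoftext|> positions
print(mask_if_null_is_minus_one)  # all True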
tests/nightly/gpu/test_gpt2.py
Outdated
class TestGpt2(unittest.TestCase):
    warn_once(
just a comment, not a warning plz
Patch description
Problem description: Unable to reproduce the model generation of DialoGPT under -bs 1 after this PR landed (Fix error with using GPT2 + DistributedDataParallel #3207), even though that PR does not intend to change anything for -bs 1.
It turns out that in DialoGPT the pad token and the global history end token share the same <|endoftext|> (thus NULL_IDX = END_IDX), which, after that PR landed, causes the attention mask to mask away the end token (https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/hugging_face/gpt2.py#L93).
In this PR, I reset NULL_IDX within the DialoGPTModel if NULL_IDX == END_IDX and bs == 1 and add_special_tokens=False.
The test examples for reproducibility of DialoGPTModel generation before and after (https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/hugging_face/gpt2.py#L93) are listed below, with each being an (input, output) pair.
Testing steps
CI
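For a local run, the nightly GPU tests touched here can be invoked directly with pytest (assuming a GPU machine and the usual ParlAI test setup):

pytest tests/nightly/gpu/test_dialogpt.py tests/nightly/gpu/test_gpt2.py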
Logs
Other information
Data tests (if applicable)
If you added a new teacher, you will be asked to run python tests/datatests/test_new_tasks.py. Please paste this log here.