GPT's (and GPT-2's) architecture #11

ZheMann · 2019-07-13T18:05:45Z

@rkfg This issue does not specifically concern this repository, but perhaps you could give some more insight into GPT's architecture. The papers state that GPT (and GPT-2) are multi-layer, decoder-only Transformers. At a high level, I can understand that an encoder+decoder architecture is useful for sequence-to-sequence applications but less attractive for language modeling tasks, so it seems logical that OpenAI decided to stick with a multi-layer decoder only. However, during the training/fine-tuning stage of GPT, tokens are still encoded and eventually decoded within these decoder layers, right?

I'm not sure whether my question is clear, but it basically comes down to this: the GPT paper states that they use a decoder-only Transformer, but I cannot find their reasoning for this decision. Why not simply use the regular encoder-decoder Transformer architecture, for example?

Thanks in advance!
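
To make the "decoder-only" part of the question concrete: as far as I understand it, what is kept from the original Transformer decoder block is the masked (causal) self-attention, while the attention over encoder outputs is dropped because there is no encoder. Below is a minimal pure-Python sketch of that causal masking only. The function names (`causal_mask`, `masked_attention_weights`) and the toy all-zero scores are made up for illustration and are not taken from the GPT code.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention_weights(scores):
    """Row-wise softmax over raw scores, with future positions set to -inf."""
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        row = [scores[i][j] if mask[i][j] else float("-inf") for j in range(n)]
        weights.append(softmax(row))
    return weights

# Hypothetical toy scores for a 4-token sequence (all equal).
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_attention_weights(scores)
for row in weights:
    print([round(w, 2) for w in row])
```

Each row puts weight only on the current and earlier positions, so every token is still "encoded" into a hidden state, but never with information from tokens to its right. That is what lets the same stack be trained on next-token prediction.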

ZheMann changed the title ~~GPT-2's architecture~~ GPT's (and GPT-2's) architecture Jul 13, 2019
