
lm_head and v_head, why re-initialize and why dropout? #43

Closed
@clam004

Description

First off, thank you for building this! Three questions regarding the two heads of the policy model:

  1. Why re-initialize the weights of the language model head in
class GPT2HeadWithValueModel,

     self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

when a trained lm_head already exists in GPT2LMHeadModel? (A sketch of the reuse I would have expected follows below.)
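
For context, here is a minimal sketch of the alternative I had in mind: reuse the trained projection from GPT2LMHeadModel rather than a fresh nn.Linear. The from_pretrained call and the explicit weight copy are my assumptions for illustration, not code from this repo:

    import torch.nn as nn
    from transformers import GPT2LMHeadModel

    # Load the standard model, whose lm_head is already trained
    # (and, for GPT-2, tied to the input embeddings).
    pretrained = GPT2LMHeadModel.from_pretrained("gpt2")
    config = pretrained.config

    # Fresh head as in GPT2HeadWithValueModel.__init__ -- random weights:
    lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    # The reuse I would have expected: copy the trained projection over.
    lm_head.weight.data.copy_(pretrained.lm_head.weight.data)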

  2. Why does the model still speak coherently before training, even though the lm_head weights were randomly initialized? (See the check sketched after the samples below.)

Sample generations from 01-gpt2-with-value-head.ipynb:

My most favourite movie is Captain America: Civil War, which moved into the
My least favourite movie is Jon Favreau's Log Horizon, complete with psychedelic
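
One way to probe this (my own check; the trl.gpt2 import path and from_pretrained behavior are assumptions): compare the loaded model's lm_head against the pretrained GPT2LMHeadModel head. If they match, the randomly initialized head was overwritten when the checkpoint was loaded, rather than left random:

    import torch
    from transformers import GPT2LMHeadModel
    from trl.gpt2 import GPT2HeadWithValueModel  # import path is my assumption

    ref = GPT2LMHeadModel.from_pretrained("gpt2")
    model = GPT2HeadWithValueModel.from_pretrained("gpt2")

    # True here would mean the randomly initialized lm_head was replaced
    # by the checkpoint's trained weights during from_pretrained.
    print(torch.equal(model.lm_head.weight, ref.lm_head.weight))
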
  3. Why use dropout on your value? The value is not like a full layer of a neural network, where you don't want the model to rely too heavily on any one activation; the value is the one and only signal you get from that layer, so why drop it out? (A reconstruction of the head follows the printout below.)
  (v_head): ValueHead(
    (summary): Linear(in_features=768, out_features=1, bias=True)
    (activation): Identity()
    (first_dropout): Dropout(p=0.1, inplace=False)
    (last_dropout): Identity()
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
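
Written out as code, the printed head corresponds to something like the following. This is my reconstruction from the printout above, not the repo's exact source, and the order of operations in forward (dropout before the linear projection, as in transformers' SequenceSummary) is an assumption:

    import torch.nn as nn

    class ValueHead(nn.Module):
        """Reconstruction of the printed module; forward order is an assumption."""

        def __init__(self, hidden_size=768):
            super().__init__()
            self.summary = nn.Linear(hidden_size, 1)   # scalar value per position
            self.activation = nn.Identity()
            self.first_dropout = nn.Dropout(p=0.1)     # the dropout in question
            self.last_dropout = nn.Identity()
            self.flatten = nn.Flatten(start_dim=1, end_dim=-1)

        def forward(self, hidden_states):
            # If this order is right, dropout zeroes input features of the
            # hidden state before the projection, not the value scalar itself.
            output = self.first_dropout(hidden_states)
            output = self.summary(output)
            output = self.activation(output)
            output = self.last_dropout(output)
            return output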

Thanks again!
