Why the activation function is tanh in BertPooler #782

xinliweiyuan · 2019-07-12T01:23:47Z

I found that the activation function in the BertPooler layer is tanh, but the BERT paper never mentions tanh. The paper says the GELU activation function is used.

So why is there a tanh here? I would appreciate an explanation. Thanks.

class BertPooler(nn.Module):
    def __init__(self, config):
        super(BertPooler, self).__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output
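
For readers following along, the forward pass above amounts to `tanh(W @ h_0 + b)`, where `h_0` is the hidden state of the first token. Below is a minimal, framework-free sketch of that computation. The helper name `pool_first_token`, the toy sizes, and the random weights are illustrative assumptions and not part of the library; the weight layout (output rows by input columns) follows the convention of `nn.Linear`.

```python
import math
import random


def pool_first_token(hidden_states, weight, bias):
    """Apply tanh(W @ h_0 + b) to the first token of each sequence.

    hidden_states: nested list shaped [batch][seq_len][hidden_size]
    weight: nested list shaped [hidden_size][hidden_size] (out x in)
    bias: list of length hidden_size
    """
    pooled = []
    for sequence in hidden_states:
        first_token = sequence[0]  # equivalent to hidden_states[:, 0]
        row = []
        for w_row, b in zip(weight, bias):
            pre_activation = sum(w * x for w, x in zip(w_row, first_token)) + b
            row.append(math.tanh(pre_activation))  # squashes into [-1, 1]
        pooled.append(row)
    return pooled


# Toy inputs (hypothetical sizes, random values for illustration only).
random.seed(0)
batch, seq_len, hidden = 2, 4, 3
hidden_states = [
    [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(seq_len)]
    for _ in range(batch)
]
weight = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(hidden)]
bias = [0.0] * hidden

pooled_output = pool_first_token(hidden_states, weight, bias)
```

The result has one vector of length `hidden` per sequence, every entry is bounded by the tanh, and only the first token of each sequence influences the output.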


thomwolf · 2019-07-12T06:30:06Z

Because that's what the BERT authors do in the official TF code:
https://github.com/google-research/bert/blob/bee6030e31e42a9394ac567da170a89a98d2062f/modeling.py#L231

tonyduan · 2020-06-10T01:49:02Z

Just wanted to point out, for future reference, that the motivation has been explained by the original BERT authors in [this GitHub issue].

thomwolf closed this as completed Jul 13, 2019

NielsRogge mentioned this issue Nov 25, 2020

Documentation and source for RobertaClassificationHead #8776

Closed
