
TAPAS tanh activation on the pooling layer #14543

Closed
xhluca opened this issue Nov 27, 2021 · 2 comments
@xhluca
Contributor

xhluca commented Nov 27, 2021

I noticed the following in the TAPAS pooling layer:

# Copied from transformers.models.bert.modeling_bert.BertPooler
class TapasPooler(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output

I'm curious about the use of nn.Tanh(). I wasn't able to find more information about that activation in the paper. Is it possible to know where it comes from? Thanks!

@NielsRogge
Contributor

Hi,

The TAPAS authors borrowed this from the original BERT implementation, whose authors decided to apply a tanh layer on top of the first token's hidden state.

The BERT author explains why he did that here.
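To see the effect concretely, here is a minimal, self-contained sketch of the same dense-plus-tanh pooling pattern (the Pooler class name and the hidden size of 768 are stand-ins for illustration, not the actual TAPAS/BERT code). The point is that the pooled vector is squashed into (-1, 1):

import torch
from torch import nn

class Pooler(nn.Module):
    # Same pattern as BertPooler / TapasPooler: a dense layer plus tanh
    # applied to the hidden state of the first ([CLS]) token.
    def __init__(self, hidden_size):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        first_token_tensor = hidden_states[:, 0]  # (batch, hidden_size)
        return self.activation(self.dense(first_token_tensor))

hidden_states = torch.randn(2, 16, 768)  # (batch, seq_len, hidden_size)
pooled = Pooler(768)(hidden_states)
print(pooled.shape)              # torch.Size([2, 768])
print(pooled.abs().max() < 1.0)  # tensor(True): tanh bounds the pooled output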

@xhluca
Contributor Author

xhluca commented Nov 27, 2021

Ah thanks, you are right. They indeed use tanh in the code: https://github.com/google-research/tapas/blob/f3d9f068e6eedb252883049b582516a1294ff951/tapas/models/bert/modeling.py#L269-L277

Wish it was mentioned in the appendix of the TAPAS paper 🤷 Thanks for clarifying!

xhluca closed this as completed Nov 27, 2021