Documentation and source for RobertaClassificationHead
#8776
The docstring for `RobertaForSequenceClassification` says that the classification head is "a linear layer on top of the pooled output". Looking at the code, this does not seem correct. Here, the RoBERTa output is fed into an instance of the class `RobertaClassificationHead`, which feeds the pooled output into a feedforward network with one hidden layer and a tanh activation. So this is more than just a simple linear layer.

I have two questions:

1. Is the docstring incorrect, or am I misreading the code?
2. Why is this implemented differently from BERT, where the classification head really is just a linear layer?

I would be glad if someone could shed light on this.
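To illustrate the structure of the head, a minimal check (this assumes the `transformers` library is installed and uses the `roberta-base` checkpoint purely as an example):

```python
from transformers import RobertaForSequenceClassification

# Loading a sequence-classification model attaches the classification head
# (freshly initialized if the checkpoint does not contain one).
model = RobertaForSequenceClassification.from_pretrained("roberta-base")

# This prints a RobertaClassificationHead containing a dense layer,
# dropout, and the output projection, i.e. more than one linear layer.
# (The tanh is applied in forward(), so it does not show up in the repr.)
print(model.classifier)
```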
Comments
Actually, the final hidden representation that gets classified is that of the `<s>` token (RoBERTa's equivalent of BERT's `[CLS]` token): `RobertaClassificationHead` operates on the first token's hidden state and does not use the pooler output of `RobertaModel` at all.

For the second question, actually BERT does the same, it is just implemented differently. In BERT, the dense layer with tanh activation is part of `BertPooler` inside `BertModel`, and `BertForSequenceClassification` then applies only dropout and a single linear layer to that pooled output. In RoBERTa, the dense-plus-tanh layer and the final linear layer are bundled together in `RobertaClassificationHead`.

So your confusion probably comes from the different ways in which this is implemented in BERT vs RoBERTa, and the meaning of "pooled output".
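To make the comparison concrete, here is a condensed sketch of the two code paths (paraphrased rather than the verbatim library source; layer names follow `modeling_bert.py` and `modeling_roberta.py`):

```python
import torch
import torch.nn as nn


class BertPoolerSketch(nn.Module):
    """BERT: dense + tanh live inside BertModel as the 'pooler'."""

    def __init__(self, hidden_size):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # "Pooling" = taking the hidden state of the first token ([CLS]).
        first_token = hidden_states[:, 0]
        return self.activation(self.dense(first_token))
        # BertForSequenceClassification then adds only dropout and a
        # single nn.Linear(hidden_size, num_labels) on top of this.


class RobertaClassificationHeadSketch(nn.Module):
    """RoBERTa: dense + tanh + linear are bundled in the head itself,
    which reads the first token (<s>) and ignores the model's pooler."""

    def __init__(self, hidden_size, num_labels, dropout=0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, features):
        x = features[:, 0, :]  # hidden state of <s>, the [CLS] equivalent
        x = self.dropout(x)
        x = torch.tanh(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)
```

In both models the first token's representation goes through dense + tanh before the final projection; the only real difference is whether those layers live in the base model (BERT) or in the task head (RoBERTa).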
Thank you very much for the explanation @NielsRogge! So this makes it consistent within the HuggingFace transformers library. But do you know the origin of it (now I am interested for both models)? Why is the additional dense layer with tanh activation there at all, rather than just a linear classifier?
Interesting question! Turns out this has already been asked before here and the answer by the author is here.
Thank you again @NielsRogge!