
Add classifier dropout in ALBERT #2679

Merged: 1 commit into huggingface:master on Jan 30, 2020

Conversation

@peteriz (Contributor) commented on Jan 30, 2020

As described in the original paper, the authors separated the dropout rates of the transformer cells and the classifier; moreover, in V2 all dropouts are 0 (except, again, for the classifier).

The current implementation does not support this, and the models do not train well (I can't reproduce the GLUE benchmark results using the V2 models). I manually updated these values and got the V2 models converging.

This issue was raised in #2337 and also mentioned in google-research/albert#23

I added a separate parameter in the config file and updated the sequence classification head.

Please also update the configuration of the ALBERT V2 models (base, large, xlarge, xxlarge) in your repository, specifically their attention and hidden dropout rates (see https://tfhub.dev/google/albert_base/3, https://tfhub.dev/google/albert_large/3, https://tfhub.dev/google/albert_xlarge/3 and https://tfhub.dev/google/albert_xxlarge/3).
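
For reference, a minimal sketch of the change described above, assuming the new config field is named `classifier_dropout_prob` (the surrounding classes are heavily simplified for illustration; the merged code lives in `src/transformers/configuration_albert.py` and `src/transformers/modeling_albert.py`):

```python
import torch.nn as nn


class AlbertConfig:
    """Simplified config; only the fields relevant to this change are shown."""

    def __init__(
        self,
        hidden_size=768,
        num_labels=2,
        hidden_dropout_prob=0.0,           # 0 in the V2 configs
        attention_probs_dropout_prob=0.0,  # 0 in the V2 configs
        classifier_dropout_prob=0.1,       # new field: the classifier keeps dropout
    ):
        self.hidden_size = hidden_size
        self.num_labels = num_labels
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.classifier_dropout_prob = classifier_dropout_prob


class AlbertForSequenceClassification(nn.Module):
    def __init__(self, config):
        super().__init__()
        # ... ALBERT transformer body omitted ...
        # The classification head now draws its dropout rate from the new
        # classifier_dropout_prob field instead of reusing hidden_dropout_prob.
        self.dropout = nn.Dropout(config.classifier_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
```

With this split, the V2 checkpoints can keep the hidden and attention dropouts at 0 while still regularizing the classification head.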

@codecov-io commented on Jan 30, 2020

Codecov Report

Merging #2679 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2679      +/-   ##
==========================================
+ Coverage   74.59%   74.59%   +<.01%     
==========================================
  Files          89       89              
  Lines       14971    14972       +1     
==========================================
+ Hits        11168    11169       +1     
  Misses       3803     3803
| Impacted Files | Coverage Δ |
|---|---|
| src/transformers/modeling_albert.py | 79.14% <100%> (ø) ⬆️ |
| src/transformers/configuration_albert.py | 100% <100%> (ø) ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83446a8...12c7809. Read the comment docs.

@LysandreJik (Member) commented:
That's fantastic, thank you for taking the time to do this, @peteriz!

LysandreJik merged commit a538149 into huggingface:master on Jan 30, 2020
@LysandreJik (Member) commented:
The configuration files were updated. The GELU activation function was also changed to "gelu_new", which matches the activation used in the google-research repository.

Original gelu vs. gelu_new:
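
A sketch of the two variants for context (the original `gelu` uses the exact Gaussian error function form, while `gelu_new` is the tanh approximation used in the google-research ALBERT code; function names here follow the transformers activation naming):

```python
import math
import torch


def gelu(x):
    # Exact GELU: x * Phi(x), computed with the Gaussian error function.
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))


def gelu_new(x):
    # Tanh approximation of GELU (the "gelu_new" variant), matching the
    # activation used in the google-research ALBERT implementation.
    return 0.5 * x * (1.0 + torch.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))
    ))
```

The two functions are numerically very close, but models pretrained with one variant converge best when fine-tuned with the same one, hence the config update.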
