As described in the original paper, the authors separated the dropout rates of the transformer cells from that of the classifier; moreover, in V2 all dropouts are 0 (except for the classifier, again).
The current implementation does not support this, and the models do not train well (I can't reproduce the GLUE benchmark results with the V2 models). After manually updating these values, the V2 models converge.
This issue was raised in #2337 and also mentioned in google-research/albert#23
I added a separate parameter to the config file and updated the sequence classification head.
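For illustration, the separation this introduces might look like the following in the model config (the key name `classifier_dropout_prob` and the 0.1 value are assumptions for illustration; the zero transformer dropouts follow the V2 release):

```python
# Hypothetical ALBERT V2 config values. "classifier_dropout_prob" is the
# proposed separate parameter; the two existing keys are set to 0 as in V2.
albert_v2_config = {
    "attention_probs_dropout_prob": 0.0,  # V2: no dropout in the attention layers
    "hidden_dropout_prob": 0.0,           # V2: no dropout in the hidden layers
    "classifier_dropout_prob": 0.1,       # dropout kept only on the classification head
}
```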
Please also update the configuration of the ALBERT V2 models (base, large, xlarge, xxlarge) in your repository, specifically their attention and hidden dropout rates (see https://tfhub.dev/google/albert_base/3, https://tfhub.dev/google/albert_large/3, https://tfhub.dev/google/albert_xlarge/3 and https://tfhub.dev/google/albert_xxlarge/3).