Add DeeBERT (entropy-based early exiting for *BERT) #5477
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #5477      +/-   ##
==========================================
+ Coverage   77.83%   78.25%   +0.41%
==========================================
  Files         141      141
  Lines       24634    24634
==========================================
+ Hits        19175    19278     +103
+ Misses       5459     5356     -103

Continue to review the full report at Codecov.
Thanks for your contribution!

Some high-level comments:
- How many iterations do your tests run? You may want to reduce `max_iteration` a little to make them faster.
- `/examples/deebert/src/` seems to be better than `/examples/deebert/deebert/`.
examples/deebert/README.md
Outdated
## Installation

First, install [pytorch](https://pytorch.org/) and the [transformers library](https://github.com/huggingface/transformers/blob/master/README.md).
This is not necessary.
examples/deebert/README.md
Outdated
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.204",
pages = "2246--2251",
abstract = "Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. However, they are also notorious for being slow in inference, which makes them difficult to deploy in real-time applications. We propose a simple but effective method, DeeBERT, to accelerate BERT inference. Our approach allows samples to exit earlier without passing through the entire model. Experiments show that DeeBERT is able to save up to {\textasciitilde}40{\%} inference time with minimal degradation in model quality. Further analyses show different behaviors in the BERT transformer layers and also reveal their redundancy. Our work provides new ideas to efficiently apply deep transformer-based models to downstream tasks. Code is available at https://github.com/castorini/DeeBERT.",
You may want to exclude the abstract.
examples/deebert/requirements.txt
Outdated
@@ -0,0 +1,5 @@
boto3
Why do we need boto3?
examples/deebert/requirements.txt
Outdated
tensorboard
tensorboardX
scikit-learn
seqeval
examples/deebert/requirements.txt
Outdated
@@ -0,0 +1,5 @@
boto3
tensorboard
tensorboardX
tensorboardX can be replaced with PyTorch's built-in tensorboard, so there's no need for this dependency!
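A minimal sketch of the swap, assuming torch >= 1.1; the log directory and tag below are illustrative:

```python
# Instead of the external package:
#   from tensorboardX import SummaryWriter
# use the writer bundled with PyTorch:
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/deebert")  # illustrative log dir
writer.add_scalar("train/loss", 0.42, global_step=1)
writer.close()
```

The two writers share the same API for the common calls (`add_scalar`, `add_histogram`, ...), so this is a drop-in replacement.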
I think all of these are already specified in the top-level requirements for the examples folder:
https://github.com/huggingface/transformers/blob/master/examples/requirements.txt
So there's no need to add them here (except boto3, but it may not be needed here anyway).
Yes! @stefan-it w.r.t. #5477 (comment)
examples/deebert/run_glue_deebert.py
Outdated
model = model_class.from_pretrained(checkpoint)
if args.model_type == "bert":
    model.bert.encoder.set_early_exit_entropy(args.early_exit_entropy)
else:
Same here.
examples/deebert/README.md
Outdated
There are a few other packages to install. Assuming we are in `transformers/examples/deebert`, simply run

```
pip install -r requirements.txt
```
See the comment on requirements.txt.
# We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
# ourselves in which case we just need to make it broadcastable to all heads.
if attention_mask.dim() == 3:
We now have a util function, so you can refactor this code:

extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device)
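As a hedged sketch of that refactor, assuming the model inherits from `ModuleUtilsMixin` (which `BertPreTrainedModel` does), the manual broadcasting collapses into one call:

```python
# Before: manual broadcasting of 2D/3D masks to all attention heads
# if attention_mask.dim() == 3:
#     extended_attention_mask = attention_mask[:, None, :, :]
# elif attention_mask.dim() == 2:
#     extended_attention_mask = attention_mask[:, None, None, :]

# After: the util handles both shapes plus the additive -10000.0 masking
extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(
    attention_mask, input_shape, device
)
```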
# attention_probs has shape bsz x n_heads x N x N
# input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
# and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
if head_mask is not None:
We now have a util function, so you can refactor this code:

head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
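Putting both utils together, a sketch of how the model's forward pass could start; the signature is assumed from context, not copied from the PR:

```python
def forward(self, input_ids, attention_mask=None, head_mask=None):
    input_shape = input_ids.size()
    device = input_ids.device

    # Broadcast a [batch, seq] or [batch, seq, seq] mask to all heads
    extended_attention_mask = self.get_extended_attention_mask(
        attention_mask, input_shape, device
    )
    # Normalize head_mask to per-layer masks; yields [None] * num_layers
    # when head_mask is None
    head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
    ...
```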
#### train_deebert.sh

This is for fine-tuning DeeBERT models.

#### eval_deebert.sh

This is for evaluating each exit layer for fine-tuned DeeBERT models.

#### entropy_eval.sh

This is for evaluating fine-tuned DeeBERT models, given a number of different early exit entropy thresholds.
You may want to elaborate a little on which variable is which in these scripts?

Btw: it would be awesome to see a token classification example 😅
Update test for Deebert
Hi @JetRunner, thanks for the review! I have updated according to your suggestions.

Two checks fail, but they don't seem relevant to my commits.
Thanks @ji-xin! It looks much better now!

Please wait for the final approval from @LysandreJik.
Cool, very clean!
class DeeBertEncoder(nn.Module):
    def __init__(self, config):
        super(DeeBertEncoder, self).__init__()
(nit) We don't need to specify these arguments since we're not Python 2 compatible:

- super(DeeBertEncoder, self).__init__()
+ super().__init__()
def get_input_embeddings(self):
    return self.embeddings.word_embeddings

def set_input_embeddings(self, value):
    self.embeddings.word_embeddings = value
Those should already be defined in the `DeeBertModel`?
@LysandreJik Thanks for the comments; I've updated accordingly!
Add DeeBERT (entropy-based early exiting for *BERT).
Paper: https://www.aclweb.org/anthology/2020.acl-main.204/
Based on the original repository: https://github.com/castorini/DeeBERT
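For readers skimming the thread, a minimal sketch of the paper's entropy-based exit criterion; the function names here are illustrative, not the PR's actual code:

```python
import torch

def prediction_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax distribution over class logits."""
    probs = torch.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=-1)

# Each transformer layer gets an off-ramp classifier; at inference time a
# sample exits at the first layer whose prediction entropy falls below the
# threshold configured via set_early_exit_entropy in the PR.
def should_exit(logits: torch.Tensor, entropy_threshold: float) -> bool:
    return prediction_entropy(logits).item() < entropy_threshold
```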