Refactor Code samples; Test code samples #5036
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5036 +/- ##
==========================================
+ Coverage 79.08% 79.30% +0.22%
==========================================
Files 138 138
Lines 24078 24265 +187
==========================================
+ Hits 19041 19243 +202
+ Misses 5037 5022 -15
Continue to review full report at Codecov.
This is amazing! This way we won't make as many mistakes while copy-pasting code when introducing those task-specific models :-)
Yes, I like this idea a lot! It will be easier to add more TensorFlow examples as well!
src/transformers/file_utils.py
PYTORCH_MULTIPLE_CHOICE_CODE_SAMPLE_DOCSTRING = r"""
    Examples::

        from transformers import BertTokenizer, BertForMultipleChoice
This should be « {tokenizer_class}, {model_class} »
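For context, a templated version of that sample might look roughly like this; a minimal sketch, assuming a {checkpoint} placeholder alongside the class placeholders (the exact template in file_utils.py may differ):

    # Sketch: {tokenizer_class}, {model_class} and {checkpoint} are filled in
    # per model, so a single shared sample serves every architecture.
    PYTORCH_MULTIPLE_CHOICE_CODE_SAMPLE_DOCSTRING = r"""
        Examples::

            from transformers import {tokenizer_class}, {model_class}
            import torch

            tokenizer = {tokenizer_class}.from_pretrained('{checkpoint}')
            model = {model_class}.from_pretrained('{checkpoint}')
    """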
src/transformers/file_utils.py
        from transformers import BertTokenizer, BertForMultipleChoice
        import torch

        tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
Also change here and below
src/transformers/file_utils.py
        choice1 = "It is eaten while held in the hand."
        labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct (according to Wikipedia ;)), batch size 1

        encoding = tokenizer.batch_encode_plus([[prompt, choice0], [prompt, choice1]], return_tensors='pt', pad_to_max_length=True)
And we probably want to remove batch_encode_plus here hahah
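For reference, batch_encode_plus was on its way out in favor of calling the tokenizer directly. A minimal sketch of how the sample could read instead, assuming the tokenizer __call__ API with padding=True (the prompt and choice0 strings are illustrative; only choice1 appears above):

    from transformers import BertTokenizer
    import torch

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

    prompt = "In Italy, pizza served in formal settings is presented unsliced."  # illustrative
    choice0 = "It is eaten with a fork and a knife."                             # illustrative
    choice1 = "It is eaten while held in the hand."

    # Calling the tokenizer directly replaces batch_encode_plus, and
    # padding=True replaces the deprecated pad_to_max_length=True.
    encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors='pt', padding=True)
    labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct, batch size 1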
Force-pushed from 18bd438 to 929682e.
Force-pushed from 929682e to 574575a.
@@ -633,7 +610,7 @@ def call(
      mc_token_ids = inputs[6] if len(inputs) > 6 else mc_token_ids
      output_attentions = inputs[7] if len(inputs) > 7 else output_attentions
      assert len(inputs) <= 8, "Too many inputs."
-  elif isinstance(inputs, dict):
+  elif isinstance(inputs, (dict, BatchEncoding)):
Non-cosmetic change
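The change matters because BatchEncoding, the object the tokenizers return, behaves like a dict but subclasses collections.UserDict rather than dict, so a plain isinstance(inputs, dict) check rejects it. A self-contained sketch, with BatchEncodingLike standing in for transformers.BatchEncoding:

    from collections import UserDict

    class BatchEncodingLike(UserDict):
        """Stand-in for transformers.BatchEncoding, which subclasses UserDict."""

    enc = BatchEncodingLike({"input_ids": [[101, 7592, 102]]})

    print(isinstance(enc, dict))                       # False: UserDict does not subclass dict
    print(isinstance(enc, (dict, BatchEncodingLike)))  # True: the widened check accepts it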
-  self.roberta = TFBertMainLayer(config, name="roberta")
+  self.roberta = TFRobertaMainLayer(config, name="roberta")
Non-cosmetic change
@@ -863,7 +843,7 @@ def call(
      labels = inputs[4] if len(inputs) > 4 else labels
      output_attentions = inputs[5] if len(inputs) > 5 else output_attentions
      assert len(inputs) <= 6, "Too many inputs."
-  elif isinstance(inputs, dict):
+  elif isinstance(inputs, (BatchEncoding, dict)):
Non-cosmetic change
-  assert len(inputs) <= 10, "Too many inputs."
+  output_hidden_states = inputs[10] if len(inputs) > 10 else output_hidden_states
+  labels = inputs[11] if len(inputs) > 11 else labels
+  assert len(inputs) <= 11, "Too many inputs."
Non-cosmetic change
-  assert len(inputs) <= 10, "Too many inputs."
+  output_hidden_states = inputs.get("output_hidden_states", output_hidden_states)
+  labels = inputs.get("labels", labels)
+  assert len(inputs) <= 12, "Too many inputs."
Non-cosmetic change
-  config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
+  config.num_labels, kernel_initializer=get_initializer(config.init_std), name="classifier"
Non-cosmetic change
if isinstance(inputs, (tuple, list)):
    labels = inputs[11] if len(inputs) > 11 else labels
    if len(inputs) > 11:
        inputs = inputs[:11]
elif isinstance(inputs, (dict, BatchEncoding)):
    labels = inputs.pop("labels", labels)
Non-cosmetic change
-  outputs = (logits,) + transformer_outputs[2:]  # add hidden states and attention if they are here
+  outputs = (logits,) + transformer_outputs[1:]  # add hidden states and attention if they are here
Non-cosmetic change
Amazing work!
I just highlighted the examples I saw without doctest syntax, just to make sure it was on purpose.
Also there is one nit in quicktour.rst
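For readers skimming the thread: "doctest syntax" means prefixing each statement in a sample with >>> so the documentation test runner can execute it and compare any expected output. A generic sketch (not taken from this PR):

    def double(x):
        """Return twice the input.

        Example::

            >>> double(21)
            42
        """
        return 2 * x

    if __name__ == "__main__":
        import doctest
        doctest.testmod()  # executes the >>> examples and checks the expected output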
@@ -978,36 +978,38 @@ def generate(

    Examples::

        from transformers import AutoTokenizer, AutoModelForCausalLM
Same here
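Filled in, a templated generate() sample might read roughly like this; a sketch in which the 'gpt2' checkpoint and max_length value are illustrative choices, not taken from the PR:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained('gpt2')    # checkpoint illustrative
    model = AutoModelForCausalLM.from_pretrained('gpt2')

    # Encode a prompt and let the model continue it.
    inputs = tokenizer("Today is a beautiful day and", return_tensors='pt')
    outputs = model.generate(inputs['input_ids'], max_length=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))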
Can't manage to have a clean doctest with such a big example
Huge work!
Amazing!
* Refactor code samples
* Test docstrings
* Style
* Tokenization examples
* Run rest of tests
* First step to testing source docs
* Style and BART comment
* Test the remainder of the code samples
* Style
* let to const
* Formatting fixes
* Ready for merge
* Fix fixture + Style
* Fix last tests
* Update docs/source/quicktour.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Addressing @sgugger's comments + Fix MobileBERT in TF

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This PR refactors the code samples so the same sample no longer has to be copy/pasted across classes, with the model/tokenizer classes and checkpoint names substituted in per model instead.
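A minimal sketch of the mechanism, with hypothetical names (the real helper lives in src/transformers/file_utils.py and may differ in name and signature): a decorator formats the shared template with each model's classes and checkpoint, then appends the result to the method's docstring.

    # add_code_sample_docstring is a hypothetical name for illustration only.
    def add_code_sample_docstring(template, **fields):
        def decorator(fn):
            # Fill the placeholders and attach the sample to the docstring.
            fn.__doc__ = (fn.__doc__ or "") + template.format(**fields)
            return fn
        return decorator

    SAMPLE = r"""
        Examples::

            from transformers import {tokenizer_class}, {model_class}

            tokenizer = {tokenizer_class}.from_pretrained('{checkpoint}')
            model = {model_class}.from_pretrained('{checkpoint}')
    """

    class BertForMultipleChoice:
        @add_code_sample_docstring(
            SAMPLE,
            tokenizer_class="BertTokenizer",
            model_class="BertForMultipleChoice",
            checkpoint="bert-base-uncased",
        )
        def forward(self, input_ids=None):
            """Run a forward pass."""

With this in place, each model only declares its class names and checkpoint; the sample itself is written, and can be tested, exactly once.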