Flax Masked Language Modeling training example #8728
Conversation
# Model forward
# TODO: Remove this conversion by replacing the collator
model_inputs = {var_name: tensor.numpy() for var_name, tensor in model_inputs.items()}
loss, optimizer = training_step(optimizer, model_inputs)
This is really cool!
Can you also easily have a learning rate schedule?
Yep, sure. I didn't put it in at first, focusing on making things clear and almost a no-brainer 😄. Will look at it soon 👍
Now included 👍
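For reference, a schedule in the JAX ecosystem can stay very small. The sketch below uses optax with made-up step counts and a made-up peak rate, so every name and number here is an assumption rather than what this PR actually wires up:

```python
import optax

# Assumed values for illustration; in practice, derive them from the dataset
# size, batch size and num_train_epochs.
num_train_steps = 10_000
warmup_steps = 1_000
peak_lr = 5e-5

# Linear warmup to the peak rate, then linear decay to zero.
schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(init_value=0.0, end_value=peak_lr, transition_steps=warmup_steps),
        optax.linear_schedule(init_value=peak_lr, end_value=0.0, transition_steps=num_train_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)

# Passing the schedule as `learning_rate` makes the optimizer re-evaluate it every step.
tx = optax.adamw(learning_rate=schedule, weight_decay=0.01)
```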
for epoch in track(range(int(training_args.num_train_epochs)), description="Training..."):
    samples_idx = np.random.choice(len(tokenized_datasets["train"]), (training_args.train_batch_size,))
    samples = [tokenized_datasets["train"][idx.item()] for idx in samples_idx]
    model_inputs = data_collator(samples)
What's your first impression of having a FLAX Trainer with a similar API to the PT Trainer at some point, @sgugger?
Doesn't look like it's going to be too hard to build.
Argh, you actually took the one example that is a bit flaky (we merged DataCollatorForWholeWordMasking a bit too fast, and the data preprocessing part of this script needs to be completely rewritten as it works for BERT only for now). Could you do the same with the run_mlm script instead? This one won't change :-)
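To make the Trainer idea concrete, here is a purely hypothetical sketch of what such an API could look like; no class like this exists in the PR or in the library at this point, and all names are assumptions:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class FlaxTrainer:
    """Hypothetical Flax counterpart to the PyTorch Trainer; purely illustrative."""
    model: Any
    args: Any                               # would mirror TrainingArguments
    data_collator: Optional[Callable] = None
    train_dataset: Any = None
    eval_dataset: Any = None

    def train(self):
        # Would own the epoch/step loop, the sampling and collation shown above,
        # and the jitted training_step, just like Trainer.train() does in PyTorch.
        raise NotImplementedError

    def evaluate(self):
        raise NotImplementedError
```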
Force-pushed from 3bb9f4f to 7ffb79a
Here is the full list of checkpoints on the hub that can be fine-tuned by this script:
https://huggingface.co/models?filter=masked-lm
"""
# You can also adapt this script on your own masked language modeling task. Pointers for this are left as comments.
Suggested change:
- # You can also adapt this script on your own masked language modeling task. Pointers for this are left as comments.
+ # You can also adapt this script to your own masked language modeling task. Pointers for this are left as comments.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Fine-tuning the library models for masked language modeling (BERT, ALBERT, RoBERTa...) with whole word masking on a
I find it quite useful to have a comment at the top of a binary with an example command line that lets users run the code directly. What do you think of this?
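Something along the lines of the sketch below, for instance; the flags shown are illustrative and not necessarily the script's actual arguments:

```python
"""
Fine-tuning the library models for masked language modeling with whole word masking.

Example (arguments are illustrative):
    python run_mlm_flax.py \
        --model_name_or_path bert-base-cased \
        --dataset_name wikitext \
        --dataset_config_name wikitext-2-raw-v1 \
        --max_seq_length 128 \
        --output_dir ./bert-mlm-flax
"""
```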
@dataclass
class ModelArguments:
    """
    Arguments pertaining to which model/config/tokenizer we are going to fine-tune, or train from scratch.
It seems we will never fine-tune with this code, right? At least it looks like the model is always FlaxBertForMaskedLM, which has a pre-training objective.
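For illustration, the script could branch on `model_name_or_path` roughly like the sketch below; the constructor calls are assumptions about the Flax model API rather than what the PR currently does:

```python
from transformers import BertConfig, FlaxBertForMaskedLM

# `model_args` would be the script's parsed ModelArguments instance (assumed here).
if model_args.model_name_or_path is not None:
    # Start from pretrained weights: an actual fine-tuning setup.
    model = FlaxBertForMaskedLM.from_pretrained(model_args.model_name_or_path)
else:
    # Fresh randomly initialized weights: training from scratch with the MLM objective.
    model = FlaxBertForMaskedLM(BertConfig())
```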
if self.train_file is not None:
    extension = self.train_file.split(".")[-1]
    assert extension in ["csv", "json", "txt"], "`train_file` should be a csv, a json or a txt file."
if self.validation_file is not None:
    extension = self.validation_file.split(".")[-1]
    assert extension in ["csv", "json", "txt"], "`validation_file` should be a csv, a json or a txt file."
Maybe create some inner function check_file_extension to avoid code duplication?
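Something like the sketch below, for example; the helper name comes from the suggestion, the message is taken from the existing asserts, and the calls would live inside `__post_init__`:

```python
def check_file_extension(path: str, arg_name: str) -> None:
    # Shared validation for both data files, replacing the duplicated asserts.
    extension = path.split(".")[-1]
    assert extension in ["csv", "json", "txt"], f"`{arg_name}` should be a csv, a json or a txt file."

if self.train_file is not None:
    check_file_extension(self.train_file, "train_file")
if self.validation_file is not None:
    check_file_extension(self.validation_file, "validation_file")
```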
# return -jnp.mean(jnp.sum(one_hot(labels, config.vocab_size) * logits, axis=-1), axis=-1)
#

def cross_entropy(logits, targets, label_smoothing=0.0):
Maybe factor this function out of training_step to make it easier to read?
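For concreteness, a standalone version could look like the sketch below; the label-smoothing formulation is a common one and not necessarily the exact math this PR uses:

```python
import jax.numpy as jnp
from jax.nn import log_softmax, one_hot

def cross_entropy(logits, targets, label_smoothing=0.0):
    vocab_size = logits.shape[-1]
    confidence = 1.0 - label_smoothing
    low_confidence = label_smoothing / max(vocab_size - 1, 1)

    # Smooth the one-hot targets: `confidence` mass on the gold token,
    # the remaining mass spread uniformly over the other tokens.
    hard_targets = one_hot(targets, vocab_size)
    soft_targets = hard_targets * confidence + (1.0 - hard_targets) * low_confidence

    # Token-level negative log-likelihood against the smoothed targets, averaged over tokens.
    loss = -jnp.sum(soft_targets * log_softmax(logits, axis=-1), axis=-1)
    return loss.mean()
```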
Force-pushed from ffc1f34 to 7dd4a85
Include a training example running with the Flax/JAX framework. (cc @avital @marcvanzee)
TODOs:
- bfloat16 (see the sketch below)
- float16
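One rough sketch of what bfloat16 support could mean on the JAX side, assuming it is done by casting parameters (the eventual implementation may instead expose a dtype flag on the model):

```python
import jax
import jax.numpy as jnp

def to_bf16(params):
    # Cast every floating-point leaf of the parameter pytree to bfloat16,
    # leaving integer/bool leaves untouched.
    return jax.tree_util.tree_map(
        lambda p: p.astype(jnp.bfloat16) if jnp.issubdtype(p.dtype, jnp.floating) else p,
        params,
    )
```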