Add Flax image captioning example #14864
Conversation
Great work @ydshieh!
The examples look nice. I left a few comments below; in particular, the initialization logic looks quite complex. It would be nice to keep it simple so the script is easier to read.
Maybe just don't support training from scratch if it's not useful and adds a lot of code. It's important to keep the script simple so users can understand it and modify it to their needs.
Thanks for your PR!
I think this example tries to do too much in one file and should be made easier to read. Examples are not supposed to do everything for every possible model, and here there are way too many lines just for the model creation, which makes it very hard to read. Users would benefit more from simpler code that they can understand and customize, IMO.
Thank you for the comments. I will make the example much simpler by only supporting loading a pre-trained vision encoder and text model. About casting the JAX array to a NumPy array: there is a significant slowdown (at least in the image examples) when using a JAX array as indices for accessing dataset elements.
Force-pushed from 807c92f to 650fb4a.
Hi @patil-suraj @sgugger, I simplified the config/model initialization parts (only supporting loading a pretrained encoder & decoder).

For @patil-suraj, about using `jax.numpy` vs `numpy` for indexing:
- For one line, indexing takes 30 seconds (for selecting elements).
- For another line (taking 256 elements, with image data): `jax.numpy`: 0.45 seconds / `numpy`: 0.10–0.15 seconds.

A single training step (global batch size: 256 images) takes < 0.5 seconds on TPU. Given these significant differences in processing speed, I think it is worth keeping `numpy` for indexing.
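The fast path described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `data_loader` and the toy `train_dataset` are hypothetical stand-ins for the real `datasets.Dataset` split, but the trade-off (index with a plain `np.ndarray`, never a `jnp.ndarray`) is the same.

```python
import numpy as np

# Toy stand-in for a datasets.Dataset split (hypothetical; the real script
# indexes an Arrow-backed dataset, where the same trade-off applies).
train_dataset = [{"pixel_values": np.zeros((3, 2, 2)), "labels": i} for i in range(1024)]

def data_loader(dataset, batch_size, seed=0):
    """Yield shuffled batches using plain NumPy indices (the fast path)."""
    perm = np.random.default_rng(seed).permutation(len(dataset))  # np.ndarray, not jnp.ndarray
    for start in range(0, len(dataset) - batch_size + 1, batch_size):
        yield [dataset[int(i)] for i in perm[start : start + batch_size]]

batches = list(data_loader(train_dataset, batch_size=256))
```

With 1024 examples and a global batch size of 256, this yields four full batches per epoch, each built with cheap host-side indexing.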
```python
# Replicate the train state on each device
state = state.replicate()

if training_args.do_train:
```
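For context, replicating the train state amounts to stacking every leaf of the parameter pytree along a new leading device axis. The sketch below (hypothetical `params` dict and a hand-rolled `replicate` helper) mirrors what `flax.jax_utils.replicate` does under the hood; it is an illustration, not the library's implementation.

```python
import jax
import jax.numpy as jnp

# Hypothetical parameter pytree standing in for the TrainState contents.
params = {"w": jnp.ones((2, 3)), "b": jnp.zeros((3,))}

def replicate(tree):
    """Stack every leaf n_devices times along a new leading axis."""
    n = jax.local_device_count()
    return jax.tree_util.tree_map(lambda x: jnp.stack([x] * n), tree)

replicated = replicate(params)
```

After replication each leaf has shape `(n_devices, *original_shape)`, ready for `jax.pmap`-ed train steps.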
I think we can reduce the lines quite significantly here by making use of something like `f"Num {dataset.keys()}"`.
Hi, the `keys()` method is available for `datasets.dataset_dict.DatasetDict` (the one with the different splits), and it gives `dict_keys(['train', 'validation', 'test'])`.

After taking the splits, like `train_dataset = dataset['train']`, it becomes `datasets.arrow_dataset.Dataset`, and there is no `keys()` method anymore. It's not clear to me how to use it at this place.
This looks already really nice. I think we should try to make the notebook a bit easier (maybe at the expense of not covering every edge case) and try to shorten some of the code a bit.
Co-authored-by: Suraj Patil <surajp815@gmail.com>
It's looking good now! I left a few more comments. It would be nice if we could just use `from_pretrained` instead of `from_encoder_decoder_pretrained`, as @patrickvonplaten suggested.
```python
def decay_mask_fn(params):
    flat_params = traverse_util.flatten_dict(params)
    layer_norm_params = [
        (name, "scale") for name in ["self_attn_layer_norm", "layernorm_embedding", "final_layer_norm"]
    ]
    flat_mask = {path: (path[-1] != "bias" and path[-2:] not in layer_norm_params) for path in flat_params}
    return traverse_util.unflatten_dict(flat_mask)
```
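The mask above can be exercised without Flax installed. The sketch below re-implements the flatten step with plain dicts on a toy parameter tree (the parameter names are hypothetical, mimicking a BART-style decoder) to show which leaves receive weight decay; in practice such a mask function is typically passed as the `mask` argument of `optax.adamw`.

```python
def flatten_dict(d, prefix=()):
    """Plain-dict stand-in for flax.traverse_util.flatten_dict."""
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            out.update(flatten_dict(v, prefix + (k,)))
        else:
            out[prefix + (k,)] = v
    return out

# Toy parameter tree (hypothetical names).
params = {
    "fc1": {"kernel": 1.0, "bias": 0.0},
    "self_attn_layer_norm": {"scale": 1.0, "bias": 0.0},
}

layer_norm_params = [
    (name, "scale") for name in ["self_attn_layer_norm", "layernorm_embedding", "final_layer_norm"]
]
# True = apply weight decay; False = exempt (biases and LayerNorm scales).
flat_mask = {
    path: (path[-1] != "bias" and path[-2:] not in layer_norm_params)
    for path in flatten_dict(params)
}
```

Only `("fc1", "kernel")` ends up marked for decay; all biases and the LayerNorm scale are excluded.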
What is the default decoder model? Is it BART or GPT-2?
I used GPT-2 for my image captioning training (there is no default value in the argument).
Co-authored-by: Suraj Patil <surajp815@gmail.com>
…m/ydshieh/transformers into add_flax_example_image_captioning
Hi, thanks for the reviews :-)
LGTM! Great work @ydshieh and thanks a lot for being patient with the review :)
@patrickvonplaten do you want to take another look?
""" | ||
Create a VisionEncoderDecoderModel instance from pretrained encoder/decoder models. | ||
|
||
The cross-attention will be randomly initialized. | ||
""" |
Nice!
```python
# We use `numpy.ndarray` to interact with `datasets.Dataset`, since using `jax.numpy.array` to index into a
# dataset is significantly slower. Using a JAX array in the 1st place is only to keep JAX's PRNG generation
# mechanism, which works differently from NumPy/SciPy.
```
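A minimal sketch of that pattern: keep JAX's functional, splittable PRNG for reproducible shuffling, but cast the resulting index array to NumPy once before touching the dataset (the dataset size of 8 here is arbitrary, for illustration).

```python
import numpy as np
import jax

rng = jax.random.PRNGKey(42)
rng, input_rng = jax.random.split(rng)        # JAX's splittable PRNG mechanism
perm = jax.random.permutation(input_rng, 8)   # jnp.ndarray (device array)
perm = np.asarray(perm)                       # one host transfer; cheap NumPy indexing afterwards
```

The single `np.asarray` call replaces many per-element device-to-host transfers that indexing with the raw `jnp.ndarray` would trigger.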
Nice comment!
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
```python
model.config.decoder_start_token_id = decoder_start_token_id
model.config.pad_token_id = pad_token_id

feature_extractor = AutoFeatureExtractor.from_pretrained(model_args.encoder_model_name_or_path)
```
(nit) Is there no `...Processor` class? It would be nice to save with a processor class so that it can be loaded with `AutoProcessor`. But I'm fine with doing it in a follow-up PR.
We could probably add a generic processor class that can accept any tokenizer and feature extractor; we have one such processor for the VisionTextDualEncoder model: https://github.com/huggingface/transformers/blob/master/src/transformers/models/vision_text_dual_encoder/processing_vision_text_dual_encoder.py
Not very familiar with the Processor class, but it seems to me that for composite models, e.g. (Vision or Text) Encoder-Decoder models (not standalone ones like BART or Marian), the encoder & decoder's feature extractors and/or tokenizers are not packed into a single class at this moment.
We can discuss whether it is good to create something like `VisionEncoderDecoderProcessor`, `EncoderDecoderProcessor`, etc.
Good for merge from my side. We could combine the tokenizer and feature extractor into a processor class, but I'm happy to do that in a follow-up PR.
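As a follow-up sketch, a generic processor in the spirit of `VisionTextDualEncoderProcessor` could simply delegate to whichever feature extractor and tokenizer it wraps. The class name `VisionTextProcessor` and the toy callables below are hypothetical, for illustration only.

```python
class VisionTextProcessor:
    """Hypothetical generic processor: wraps any feature extractor (images)
    and tokenizer (text) behind a single callable, mirroring the delegation
    pattern of VisionTextDualEncoderProcessor."""

    def __init__(self, feature_extractor, tokenizer):
        self.feature_extractor = feature_extractor
        self.tokenizer = tokenizer

    def __call__(self, images=None, text=None, **kwargs):
        if images is None and text is None:
            raise ValueError("You need to specify either `images` or `text`.")
        encoding = {}
        if images is not None:
            encoding.update(self.feature_extractor(images, **kwargs))
        if text is not None:
            encoding.update(self.tokenizer(text, **kwargs))
        return encoding

# Toy stand-ins for a real feature extractor / tokenizer.
fake_feature_extractor = lambda images, **kw: {"pixel_values": [[0.0] * 4 for _ in images]}
fake_tokenizer = lambda text, **kw: {"input_ids": [[1, 2, 3] for _ in text]}

processor = VisionTextProcessor(fake_feature_extractor, fake_tokenizer)
batch = processor(images=["img0", "img1"], text=["a cat", "a dog"])
```

The merged `batch` dict carries both the image features and the token ids, which is the single-object interface `AutoProcessor` users would expect.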
* add image captioning example
* update README
* fix style & quality
* simplify
* apply review suggestions
* Apply suggestions from code review (Co-authored-by: Suraj Patil <surajp815@gmail.com>)
* Apply suggestions from code review (Co-authored-by: Suraj Patil <surajp815@gmail.com>)
* Apply review suggestions
* add comments about using np instead of jax array
* remove unused lines
* add model creation script
* only support from_pretrained
* fix style
* fix
* not use cache_dir when creating model
* fix tokenizer creation
* update README
* fix quality
* apply suggestion
* simplify some blocks
* Update examples/flax/image-captioning/README.md
* Update examples/flax/image-captioning/run_image_captioning_flax.py (Co-authored-by: Suraj Patil <surajp815@gmail.com>)
* apply suggestion

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
What does this PR do?
Add `run_image_captioning_flax.py` (modified from `run_summarization_flax.py`).

Who can review?
Examples: @patil-suraj + cc @patrickvonplaten @NielsRogge @sgugger for info