Add TF ViT MAE #16255

Merged
merged 65 commits into from Mar 29, 2022

Conversation

sayakpaul
Member

@sayakpaul sayakpaul commented Mar 18, 2022

This PR adds the MAE [1] model in TensorFlow. It was developed by @ariG23498 and myself.

Fun facts about this PR:

  • Probably the third pure vision model in TensorFlow in transformers.

References:

[1] Masked Autoencoders Are Scalable Vision Learners

Update

The PR is now ready for review. @gante @Rocketknight1 @sgugger

sayakpaul and others added 8 commits March 12, 2022 00:08
* partially ported pt methods and classes of vit mae to tensorflow.

* ported TFViTMAEIntermediate and TFViTMAEOutput.

* chore: addresses PR feedback.

* added TFViTMAEModel and started TFViTMAEDecoder.

* add: initial implementation of tf vit mae.

* fix: model output type.

* fix: a bunch of inconsistencies but need to investigate tf.repeat().

* chore: pr feedback.

* fix: gather error.

* chore: resorted to the original pt vit_mae model.

* fix: fix gather error partially (real this time).

* feat: adding tf vit mae; model initializing but weight porting is flawed.

* partial fix: fixing the parameter names

Fixed the parameter names of dropout layers, decoder_embed and the decoder_layers. This helps with the proper cross-loading of the weights. With proper cross loading the debug test does not pass. My intuition is to dive a little deeper into the computations of the model

* chore: applied make style.

Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
@sayakpaul sayakpaul marked this pull request as draft March 18, 2022 15:51
@ydshieh
Collaborator

ydshieh commented Mar 18, 2022

I just have a quick look and left a comment.

It feels strange that there are style changes like omega = 1.0 / 10000 ** omega  # (D/2,). Do you have an older version of black?

I thought you already updated the version (during your work on TFConvNext), via

pip install -e .[quality]

Maybe you were in a different virtual Python environment while working on this PR?
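For illustration, the kind of diff in question probably looks like this (a guess, assuming the change comes from black 22.x, which removes the spaces around ** when both operands are simple):

# newer black (>= 22.1) hugs the power operator for simple operands
omega = 1.0 / 10000**omega  # (D/2,)

# older black keeps the spaces, producing the style diff seen in this PR
omega = 1.0 / 10000 ** omega  # (D/2,)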

@sayakpaul
Member Author

I just have a quick look and left a comment.

It feels strange that there are style changes like omega = 1.0 / 10000 ** omega  # (D/2,). Do you have an older version of black?

I thought you already updated the version (during your work on TFConvNext), via

pip install -e .[quality]

Maybe you were in a different virtual Python environment while working on this PR?

So, first I deactivate the current Python virtual environment and then run the installation and run make style?

I think I should fetch upstream before that and rebase.

@ydshieh
Collaborator

ydshieh commented Mar 18, 2022

So, first I deactivate the current Python virtual environment and then run the installation and run make style?

Yeah, if your current venv is specific to your other work/projects, and you don't want to change its installed packages.
Maybe it would be better if you created a new virtual environment, say venv-transformers-pr, and switched to it.

I think I should fetch upstream before that and rebase.

You can try it. I always try to have a (very) recent commit from master when working on a new PR. Hope the rebase is smooth in your case here.

Comment on lines 226 to 227
torch.zeros(1, self.num_patches + 1, config.hidden_size),
requires_grad=False,
Member Author

@sayakpaul sayakpaul Mar 19, 2022


@ydshieh I am unable to do the rebasing as mentioned in #16255 (comment) as @ariG23498 is the repo owner.

Here's what I did before committing the changes:

  • I cloned https://github.com/huggingface/transformers in a separate directory.
  • Initialized a new Python virtual environment.
  • cd'd to this cloned transformers and ran pip install -e .[quality].
  • After that, from the forked transformers directory (on the tf-mae branch), I ran make style while in the new virtual environment mentioned above.

Let me know if these steps are good to go.

Collaborator


I am also new to this use case. It seems logical and the effects are there, so I think it is OK.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 19, 2022

The documentation is not available anymore as the PR was closed or merged.

@sayakpaul
Member Author

sayakpaul commented Mar 26, 2022

@gante @ydshieh

Updates:

The repo consistency check tells us that the copied components have inconsistencies. Sorry if this sounds foolish, but we could not actually figure out what those inconsistencies are. Class name changes and other relevant changes in the comments do not count here, I hope. To make sure the copying was indeed right, we used a comparator tool to verify it. Here are the individual links to the comparison results for the components that are said to have copy inconsistencies:

Note that we did run make fix-copies within the environment described in #16255 (comment) (refer to 565ec4c), and it replaced ViTMAEConfig with ViTConfig, which is wrong, I guess.

We have also followed the copy-comment format from the PT script to ensure they are well aligned.

What else are we missing here?

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

@ariG23498 @sayakpaul

Probably my previous comment didn't explain things clearly enough.

Let's take an example with:

# Copied from transformers.models.vit.modeling_tf_vit.TFViTSelfAttention with TFViT->TFViTMAE
class TFViTMAESelfAttention(tf.keras.layers.Layer):
    def __init__(self, config: ViTMAEConfig, **kwargs):

This # Copied from check verifies that the block is a copy of TFViTSelfAttention after replacing TFViT with TFViTMAE (in memory).
However, the ViT file uses ViTConfig while this block uses ViTMAEConfig (note: there is no TF prefix here). The instruction with TFViT->TFViTMAE in the # Copied from comment won't rewrite ViTConfig (from the ViT file), so during the check it compares ViTConfig against ViTMAEConfig and reports a difference.

In order to fix the issue, there are 2 potential options:

  1. In # Copied from, change with TFViT->TFViTMAE to with ViT->ViTMAE

    • This will only work if the 2 blocks in ViT and ViTMAE are indeed the same after this replace!
  2. In # Copied from, change with TFViT->TFViTMAE to with TFViT->TFViTMAE, ViTConfig->ViTMAEConfig

    • This is more verbose, but sometimes not really necessary

So I would suggest trying option 1 first. If there are remaining places, we can use option 2 for those.
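As a rough illustration (just a sketch; the class body must still match the ViT source exactly after the replacement is applied), option 1 applied to the example above would look like:

# Copied from transformers.models.vit.modeling_tf_vit.TFViTSelfAttention with ViT->ViTMAE
class TFViTMAESelfAttention(tf.keras.layers.Layer):
    def __init__(self, config: ViTMAEConfig, **kwargs):
        ...

Roughly speaking, the ViT->ViTMAE replacement also covers TFViT->TFViTMAE and ViTConfig->ViTMAEConfig as substrings, so a single instruction handles both the layer names and the config type.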

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

For the remaining # copied from issue,

# in ViT, layernorm is applied before self-attention

to

# in ViTMAE, layernorm is applied before self-attention

should work.
(This is also what has been done in PyTorch ViTMAE.)

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

It turns out that there is still

# in ViT, layernorm is also applied after self-attention

to be changed ...I didn't check the whole block previously, sorry.

I don't always pull the latest commit from this PR. In general, it is easier to detect these issues if you run make fix-copies locally and see what has been changed; that will give you an idea of where things should be fixed.

@ariG23498
Contributor

Thanks for the prompt feedback @ydshieh

It was my bad to not check the entire code block.

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

Don't worry.

And just a potential correction (not really important): my previous comment "it would be easier to detect ..." might not be true for this particular line: I don't think there will be a visible diff on this line after running make fix-copies.

@ariG23498
Contributor

All green ✅

Thanks @ydshieh for your prompt and valuable feedback!

# source: https://discuss.pytorch.org/t/random-seed-that-spans-across-devices/19735
torch.manual_seed(2)
# make random mask reproducible across the PT and TF model
np.random.seed(2)
Collaborator


For this change, it would be great to hear from @NielsRogge too.

Previously we only saw the issue caused by CPU/GPU PRNGs (for the PyTorch ViTMAE).
But now we also see the issue caused by the different frameworks.

I think using numpy is a good choice to address this issue. However, let's wait for Niels' opinion.

Member Author

@sayakpaul sayakpaul Mar 26, 2022


It was discussed with @NielsRogge beforehand, though.

But I don't think this PR is contingent on this change, as it was all discussed and reviewed beforehand. I might be wrong though.

But now we also see the issue caused by the different frameworks.

Additionally, could you elaborate on the above?

Collaborator

@ydshieh ydshieh Mar 26, 2022


I know we discussed on Slack. As far as I know, Niels knows we are going to use the new noise argument and he is happy with this change.

However, regarding using numpy instead of torch (and updating the expected slice), I don't remember if Niels is aware of this change and expressed his opinion. I might be wrong (?).

Regarding "But now we also see the issue caused by the different frameworks.": it is just that we decided to use numpy because the PRNGs from torch and tf will give different results even with the same seed, and we don't want to import torch in the TF (non PT-TF cross) tests.
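A rough sketch of the idea (shapes, the input layout, and the dummy pixel values are illustrative; noise is the model argument discussed above, and facebook/vit-mae-base is the checkpoint mentioned later in this thread):

import numpy as np
import tensorflow as tf
import torch
from transformers import TFViTMAEModel, ViTMAEModel

pt_model = ViTMAEModel.from_pretrained("facebook/vit-mae-base")
tf_model = TFViTMAEModel.from_pretrained("facebook/vit-mae-base")

# generate the mask noise with numpy, so neither framework's PRNG is involved
np.random.seed(2)
num_patches = (224 // 16) ** 2  # 196 patches for a 224x224 image with 16x16 patches
noise = np.random.uniform(size=(1, num_patches)).astype("float32")
pixel_values = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")

# the same numpy noise drives the random masking in both models,
# so the masks (and therefore the outputs) are comparable across frameworks
pt_outputs = pt_model(torch.from_numpy(pixel_values), noise=torch.from_numpy(noise))
tf_outputs = tf_model(tf.constant(pixel_values), noise=tf.constant(noise))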

Member Author


Thanks for clarifying!

because the PRNGs from torch and tf will give different results even with the same seed,

I think you probably meant that the TF and PT PRNGs won't produce the same results even if they are seeded with the same number?

Collaborator


Yes, that's what I meant.

Collaborator

@ydshieh ydshieh Mar 27, 2022


Once Niels is OK with using NumPy, I think it is better to have the comment like

# make random mask reproducible across different frameworks (PyTorch, TensorFlow, etc.), as well as across different accelerators (CPU, GPU, etc.).
# note that the same seed on CPU and on GPU doesn’t mean they spew the same random number sequences,
# as they both have fairly different PRNGs (for efficiency reasons).
# source (for PyTorch): https://discuss.pytorch.org/t/random-seed-that-spans-across-devices/19735

(just at this place and the corresponding place in test_modeling_tf_vit_mae.py should be fine, no need to apply to other places in this PR)

Member Author


Much more comprehensive. Thanks for suggesting 👌

Collaborator


So Niels is OK with this change; I will just let you update the comment, @sayakpaul. Thank you!

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

About "shape_list() is a mess for graph mode. We have a plan to get rid of it, and just use .shape everywhere"

#16255 (comment)

I was a bit worried it would break some tests at this moment, so I tested with .shape. Indeed, the test test_save_load gives an error at the line model.save_pretrained(tmpdirname, saved_model=True) - when calling autograph_handler inside. See the full log below.

Full error log

            with tempfile.TemporaryDirectory() as tmpdirname:
>               model.save_pretrained(tmpdirname, saved_model=True)

tests\vit_mae\test_modeling_tf_vit_mae.py:616: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src\transformers\modeling_tf_utils.py:1418: in save_pretrained
    self.save(saved_model_dir, include_optimizer=False, signatures=self.serving)
..\..\..\..\miniconda3\envs\py39\lib\site-packages\keras\utils\traceback_utils.py:67: in error_handler
    raise e.with_traceback(filtered_tb) from None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = ({'pixel_values': <tf.Tensor 'pixel_values:0' shape=(None, None, None, None) dtype=float32>},)
kwargs = {}

    def autograph_handler(*args, **kwargs):
      """Calls a converted version of original_func."""
      # TODO(mdan): Push this block higher in tf.function's call stack.
      try:
        return autograph.converted_call(
            original_func,
            args,
            kwargs,
            options=autograph.ConversionOptions(
                recursive=True,
                optional_features=autograph_options,
                user_requested=True,
            ))
      except Exception as e:  # pylint:disable=broad-except
        if hasattr(e, "ag_error_metadata"):
>         raise e.ag_error_metadata.to_exception(e)
E         ValueError: in user code:
E         
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 728, in serving  *
E                 return self.call(inputs)
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\modeling_tf_utils.py", line 816, in run_call_with_unpacked_inputs  *
E                 return func(self, **unpacked_inputs)
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 1074, in call  *
E                 loss = self.forward_loss(pixel_values, logits, mask)
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 1007, in forward_loss  *
E                 target = self.patchify(imgs)
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 975, in patchify  *
E                 imgs = tf.cond(
E         
E             ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.

..\..\..\..\miniconda3\envs\py39\lib\site-packages\tensorflow\python\framework\func_graph.py:1129: ValueError

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

Changing shape_list to .shape will also cause a test to fail:

def test_xla_mode(self):
    config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    for model_class in self.all_model_classes:
        inputs = self._prepare_for_class(inputs_dict, model_class)
        model = model_class(config)

        @tf.function(experimental_compile=True)
        def run_in_graph_mode():
            return model(inputs)

        outputs = run_in_graph_mode()
        self.assertIsNotNone(outputs)

with a similar error:

E             in user code:
E             
E                 File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\modeling_tf_utils.py", line 816, in run_call_with_unpacked_inputs  *
E                     return func(self, **unpacked_inputs)
E                 File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 1074, in call  *
E                     loss = self.forward_loss(pixel_values, logits, mask)
E                 File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 1007, in forward_loss  *
E                     target = self.patchify(imgs)
E                 File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 980, in patchify  *
E                     tf.debugging.assert_equal(imgs.shape[1], imgs.shape[2])
E             
E                 ValueError: None values not supported.

(This test is currently only run for a few core NLP models.)

(This test will still fail with shape_list here, but the error seems to come from other lines rather than from shape_list itself.)

@sayakpaul
Member Author

Thanks for investigating it, @ydshieh. Is there anything we can do in this PR to mitigate the problem?

@ydshieh
Collaborator

ydshieh commented Mar 27, 2022

Thanks for investigating it, @ydshieh. Is there anything we can do in this PR to mitigate the problem?

Let's see what gante thinks.

@sayakpaul
Member Author

sayakpaul commented Mar 27, 2022

I was a bit worried it would break some tests at this moment, so I tested with .shape. Indeed, the test test_save_load gives an error at the line model.save_pretrained(tmpdirname, saved_model=True) - when calling autograph_handler inside. See the full log below.

This is why I used shape_list() there. The root cause is that inside patchify(), with .shape, the batch size will be None in graph mode and tf.random.uniform() will error out on that. I could not think of any other workaround to mitigate the problem.
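A minimal sketch of the static vs. dynamic shape difference (shapes and the function name are illustrative, not the actual patchify() code):

import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec(shape=(None, 3, 224, 224), dtype=tf.float32)])
def make_noise(pixel_values):
    num_patches = (224 // 16) ** 2
    # static shape: the batch dim is None when tracing with a dynamic signature,
    # so tf.random.uniform(shape=(None, num_patches)) would raise "None values not supported"
    static_batch = pixel_values.shape[0]  # -> None
    # dynamic shape: resolved at run time to a concrete scalar tensor
    dynamic_batch = tf.shape(pixel_values)[0]
    return tf.random.uniform(shape=(dynamic_batch, num_patches))

Roughly speaking, shape_list() mixes the two: it keeps static dims where they are known and falls back to tf.shape() where they are None, which is why it survives graph-mode paths such as save_pretrained(..., saved_model=True).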

For the second one (#16255 (comment)) I am not too sure since you mentioned that the test is only applied for some core NLP models.

@ydshieh
Collaborator

ydshieh commented Mar 27, 2022

For the second one (#16255 (comment)) I am not too sure since you mentioned that the test is only applied for some core NLP models.

My comments above are not saying that we should change to .shape and fix all the failing tests.

It is more a question (to gante) of whether we should make this decision and change in this PR, considering some tests will fail.
(And the second comment is included mainly so that we (HF) don't forget there are some tests we would ideally want to run for all models, but that currently run only for a few models. It might be better for us (HF) not to make a big decision too quickly just because all tests are green for a particular model, when it would still fail some important tests that are not currently run due to some limitation.)

@sayakpaul
Member Author

sayakpaul commented Mar 27, 2022

Now I understand. Appreciate the clarification.

@gante
Member

gante commented Mar 29, 2022

@sayakpaul vit-mae-base TF weights are on the hub, the others will soon follow :) I think you can make the final changes now (remove the from_pt), to then merge the PR 💪

@sayakpaul
Member Author

@gante thank you! Changes made.

@gante
Member

gante commented Mar 29, 2022

@gante thank you! Changes made.

Can you confirm that the tests run with RUN_SLOW=1? Will merge after CI gets to green and I get your confirmation 🚀

@gante
Member

gante commented Mar 29, 2022

btw ignore the failure in "Add model like runner / Add new model like template tests (pull_request)", it's being looked after

@sayakpaul
Member Author

@gante thank you! Changes made.

Can you confirm that the tests run with RUN_SLOW=1? Will merge after CI gets to green and I get your confirmation 🚀

Yes, I did run the tests before pushing the changes.

@sayakpaul
Member Author

@gante over to you to take the reins.

@gante
Member

gante commented Mar 29, 2022

CI green, slow tests were run, all TF weights on the hub 👉 merging.

Great work @sayakpaul @ariG23498 💪

@gante gante merged commit 5b40a37 into huggingface:main Mar 29, 2022