Add TF ViT MAE #16255

Merged
merged 65 commits into from Mar 29, 2022

Conversation

sayakpaul
Member

@sayakpaul sayakpaul commented Mar 18, 2022

This PR adds the MAE [1] model in TensorFlow. It was developed by @ariG23498 and myself.

Fun facts about this PR:

  • Probably the third pure vision model in TensorFlow in transformers.

References:

[1] Masked Autoencoders Are Scalable Vision Learners

Update

The PR is now ready for review. @gante @Rocketknight1 @sgugger

sayakpaul and others added 8 commits March 12, 2022 00:08
* partially ported pt methods and classes of vit mae to tensorflow.

* ported TFViTMAEIntermediate and TFViTMAEOutput.

* chore: addresses PR feedback.

* added TFViTMAEModel and started TFViTMAEDecoder.

* add: initial implementation of tf vit mae.

* fix: model output type.

* fix: a bunch of inconsistencies but need to investigate tf.repeat().

* chore: pr feedback.

* fix: gather error.

* chore: resorted to the original pt vit_mae model.

* fix: fix gather error partially (real this time).

* feat: adding tf vit mae; model initializing but weight porting is flawed.

* partial fix: fixing the parameter names

Fixed the parameter names of dropout layers, decoder_embed and the decoder_layers. This helps with the proper cross-loading of the weights. With proper cross loading the debug test does not pass. My intuition is to dive a little deeper into the computations of the model

* chore: applied make style.

Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
@sayakpaul sayakpaul marked this pull request as draft March 18, 2022 15:51
@ydshieh
Collaborator

ydshieh commented Mar 18, 2022

I just have a quick look and left a comment.

It feels strange that there are style changes like omega = 1.0 / 10000 ** omega  # (D/2,). Do you have an older version of black?

I thought you already updated the version (during your work on TFConvNext), via

pip install -e .[quality]

Maybe you were in a different virtual Python environment while working on this PR?
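For illustration, the kind of diff in question probably looks like this (a guess, assuming the change comes from black 22.x, which removes the spaces around ** when both operands are simple):

# newer black (>= 22.1) hugs the power operator for simple operands
omega = 1.0 / 10000**omega  # (D/2,)

# older black keeps the spaces, producing the style diff seen in this PR
omega = 1.0 / 10000 ** omega  # (D/2,)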

@sayakpaul
Member Author

I just have a quick look and left a comment.

It feels strange that there are style changes like omega = 1.0 / 10000 ** omega  # (D/2,). Do you have an older version of black?

I thought you already updated the version (during your work on TFConvNext), via

pip install -e .[quality]

Maybe you were in a different virtual Python environment while working on this PR?

So, first I deactivate the current Python virtual environment and then run the installation and run make style?

I think I should fetch upstream before that and rebase.

@ydshieh
Collaborator

ydshieh commented Mar 18, 2022

So, first I deactivate the current Python virtual environment and then run the installation and run make style?

Yeah, if your current venv is specific to your other work/projects, and you don't want to change its installed packages.
Maybe it would be better if you created a new virtual environment, say venv-transformers-pr, and switched to it.

I think I should fetch upstream before that and rebase.

You can try it. I always try to have a (very) recent commit from master when working on a new PR. Hope the rebase is smooth in your case here.

Comment on lines 226 to 227
torch.zeros(1, self.num_patches + 1, config.hidden_size),
requires_grad=False,
Member Author

@sayakpaul sayakpaul Mar 19, 2022


@ydshieh I am unable to do the rebasing as mentioned in #16255 (comment) as @ariG23498 is the repo owner.

Here's what I did before committing the changes:

  • I cloned https://github.com/huggingface/transformers in a separate directory.
  • Initialized a new Python virtual environment.
  • cd'd to this cloned transformers and ran pip install -e .[quality].
  • After that, from the forked transformers directory (on the tf-mae branch), I ran make style while in the new virtual environment mentioned above.

Let me know if these steps are good to go.

Collaborator


I am also new to this use case. It seems logical and the effects are there, so I think it is OK.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 19, 2022

The documentation is not available anymore as the PR was closed or merged.

@sayakpaul
Member Author

sayakpaul commented Mar 26, 2022

@gante @ydshieh

Updates:

The repo consistency check tells us that the copied components have inconsistencies. Sorry if this sounds foolish, but we could not actually figure out what those inconsistencies are. Class name changes and other relevant changes in the comments do not count here, I hope. To make sure the copying was indeed right, we used a comparator tool to verify it. Here are the individual links to the comparison results for the components that are said to have copy inconsistencies:

Note that we did run make fix-copies within the environment described in #16255 (comment) (refer to 565ec4c), and it replaced ViTMAEConfig with ViTConfig, which is wrong, I guess.

We have also followed the copy-comment format from the PT script to ensure they are well aligned.

What else are we missing here?

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

@ariG23498 @sayakpaul

Probably my previous comment didn't explain things clearly enough.

Let's take an example with:

# Copied from transformers.models.vit.modeling_tf_vit.TFViTSelfAttention with TFViT->TFViTMAE
class TFViTMAESelfAttention(tf.keras.layers.Layer):
    def __init__(self, config: ViTMAEConfig, **kwargs):

This # Copied from check verifies that the block is a copy of TFViTSelfAttention after replacing TFViT with TFViTMAE (in memory).
However, the ViT file uses ViTConfig while this block uses ViTMAEConfig (note: there is no TF prefix here). The instruction with TFViT->TFViTMAE in the # Copied from comment won't rewrite ViTConfig (from the ViT file), so during the check it compares ViTConfig against ViTMAEConfig and reports a difference.

In order to fix the issue, there are 2 potential options:

  1. In # Copied from, change with TFViT->TFViTMAE to with ViT->ViTMAE

    • This will only work if the 2 blocks in ViT and ViTMAE are indeed the same after this replace!
  2. In # Copied from, change with TFViT->TFViTMAE to with TFViT->TFViTMAE, ViTConfig->ViTMAEConfig

    • This is more verbose, but sometimes not really necessary

So I would suggest trying option 1 first. If there are remaining places, we can use option 2 for those.
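As a rough illustration (just a sketch; the class body must still match the ViT source exactly after the replacement is applied), option 1 applied to the example above would look like:

# Copied from transformers.models.vit.modeling_tf_vit.TFViTSelfAttention with ViT->ViTMAE
class TFViTMAESelfAttention(tf.keras.layers.Layer):
    def __init__(self, config: ViTMAEConfig, **kwargs):
        ...

Roughly speaking, the ViT->ViTMAE replacement also covers TFViT->TFViTMAE and ViTConfig->ViTMAEConfig as substrings, so a single instruction handles both the layer names and the config type.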

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

For the remaining # copied from issue,

# in ViT, layernorm is applied before self-attention

to

# in ViTMAE, layernorm is applied before self-attention

should work.
(This is also what has been done in PyTorch ViTMAE.)

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

It turns out that there is still

# in ViT, layernorm is also applied after self-attention

to be changed ...I didn't check the whole block previously, sorry.

I don't always pull the latest commit from this PR. In general, it is easier to detect these issues if you run make fix-copies locally and see what has been changed; that will give you an idea of where things should be fixed.

@ariG23498
Contributor

Thanks for the prompt feedback @ydshieh

It was my bad to not check the entire code block.

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

Don't worry.

And just a potential correction (not really important): my previous comment "it would be easier to detect ..." might not be true for this particular line: I don't think there will be a visible diff on this line after running make fix-copies.

@ariG23498
Contributor

All green ✅

Thanks @ydshieh for your prompt and valuable feedback!

# source: https://discuss.pytorch.org/t/random-seed-that-spans-across-devices/19735
torch.manual_seed(2)
# make random mask reproducible across the PT and TF model
np.random.seed(2)
Collaborator


For this change, it would be great to hear from @NielsRogge too.

Previously we only saw the issue caused by CPU/GPU PRNGs (for the PyTorch ViTMAE).
But now we also see the issue caused by the different frameworks.

I think using numpy is a good choice to address this issue. However, let's wait for Niels' opinion.

Member Author

@sayakpaul sayakpaul Mar 26, 2022


It was discussed with @NielsRogge beforehand, though.

But I don't think this PR is contingent on this change, as it was all discussed and reviewed beforehand. I might be wrong though.

But now we also see the issue caused by the different frameworks.

Additionally, could you elaborate on the above?

Collaborator

@ydshieh ydshieh Mar 26, 2022


I know we discussed on Slack. As far as I know, Niels knows we are going to use the new noise argument and he is happy with this change.

However, regarding using numpy instead of torch (and updating the expected slice), I don't remember if Niels is aware of this change and expressed his opinion. I might be wrong (?).

Regarding "But now we also see the issue caused by the different frameworks.": it is just that we decided to use numpy because the PRNGs from torch and tf will give different results even with the same seed, and we don't want to import torch in the TF (non PT-TF cross) tests.
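A rough sketch of the idea (shapes, the input layout, and the dummy pixel values are illustrative; noise is the model argument discussed above, and facebook/vit-mae-base is the checkpoint mentioned later in this thread):

import numpy as np
import tensorflow as tf
import torch
from transformers import TFViTMAEModel, ViTMAEModel

pt_model = ViTMAEModel.from_pretrained("facebook/vit-mae-base")
tf_model = TFViTMAEModel.from_pretrained("facebook/vit-mae-base")

# generate the mask noise with numpy, so neither framework's PRNG is involved
np.random.seed(2)
num_patches = (224 // 16) ** 2  # 196 patches for a 224x224 image with 16x16 patches
noise = np.random.uniform(size=(1, num_patches)).astype("float32")
pixel_values = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")

# the same numpy noise drives the random masking in both models,
# so the masks (and therefore the outputs) are comparable across frameworks
pt_outputs = pt_model(torch.from_numpy(pixel_values), noise=torch.from_numpy(noise))
tf_outputs = tf_model(tf.constant(pixel_values), noise=tf.constant(noise))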

Member Author


Thanks for clarifying!

because the PRNGs from torch and tf will give different results even with the same seed,

I think you probably meant that the TF and PT PRNGs won't produce the same results even if they are seeded with the same number?

Collaborator


Yes, that's what I meant.

Collaborator

@ydshieh ydshieh Mar 27, 2022


Once Niels is OK with using NumPy, I think it is better to have the comment like

# make random mask reproducible across different frameworks (PyTorch, TensorFlow, etc.), as well as across different accelerators (CPU, GPU, etc.).
# note that the same seed on CPU and on GPU doesn’t mean they spew the same random number sequences,
# as they both have fairly different PRNGs (for efficiency reasons).
# source (for PyTorch): https://discuss.pytorch.org/t/random-seed-that-spans-across-devices/19735

(just at this place and the corresponding place in test_modeling_tf_vit_mae.py should be fine, no need to apply to other places in this PR)

Member Author


Much more comprehensive. Thanks for suggesting 👌

Collaborator


So Niels is OK with this change; I will just let you update the comment, @sayakpaul. Thank you!

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

About "shape_list() is a mess for graph mode. We have a plan to get rid of it, and just use .shape everywhere"

#16255 (comment)

I was a bit worried it would break some tests at this moment, so I tested with .shape. Indeed, the test test_save_load gives an error at the line model.save_pretrained(tmpdirname, saved_model=True) - when calling autograph_handler inside. See the full log below.

Full error log

            with tempfile.TemporaryDirectory() as tmpdirname:
>               model.save_pretrained(tmpdirname, saved_model=True)

tests\vit_mae\test_modeling_tf_vit_mae.py:616: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src\transformers\modeling_tf_utils.py:1418: in save_pretrained
    self.save(saved_model_dir, include_optimizer=False, signatures=self.serving)
..\..\..\..\miniconda3\envs\py39\lib\site-packages\keras\utils\traceback_utils.py:67: in error_handler
    raise e.with_traceback(filtered_tb) from None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = ({'pixel_values': <tf.Tensor 'pixel_values:0' shape=(None, None, None, None) dtype=float32>},)
kwargs = {}

    def autograph_handler(*args, **kwargs):
      """Calls a converted version of original_func."""
      # TODO(mdan): Push this block higher in tf.function's call stack.
      try:
        return autograph.converted_call(
            original_func,
            args,
            kwargs,
            options=autograph.ConversionOptions(
                recursive=True,
                optional_features=autograph_options,
                user_requested=True,
            ))
      except Exception as e:  # pylint:disable=broad-except
        if hasattr(e, "ag_error_metadata"):
>         raise e.ag_error_metadata.to_exception(e)
E         ValueError: in user code:
E         
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 728, in serving  *
E                 return self.call(inputs)
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\modeling_tf_utils.py", line 816, in run_call_with_unpacked_inputs  *
E                 return func(self, **unpacked_inputs)
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 1074, in call  *
E                 loss = self.forward_loss(pixel_values, logits, mask)
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 1007, in forward_loss  *
E                 target = self.patchify(imgs)
E             File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 975, in patchify  *
E                 imgs = tf.cond(
E         
E             ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.

..\..\..\..\miniconda3\envs\py39\lib\site-packages\tensorflow\python\framework\func_graph.py:1129: ValueError

@ydshieh
Collaborator

ydshieh commented Mar 26, 2022

Changing shape_list to .shape will also cause a test to fail:

def test_xla_mode(self):
    config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    for model_class in self.all_model_classes:
        inputs = self._prepare_for_class(inputs_dict, model_class)
        model = model_class(config)

        @tf.function(experimental_compile=True)
        def run_in_graph_mode():
            return model(inputs)

        outputs = run_in_graph_mode()
        self.assertIsNotNone(outputs)

with a similar error:

E             in user code:
E             
E                 File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\modeling_tf_utils.py", line 816, in run_call_with_unpacked_inputs  *
E                     return func(self, **unpacked_inputs)
E                 File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 1074, in call  *
E                     loss = self.forward_loss(pixel_values, logits, mask)
E                 File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 1007, in forward_loss  *
E                     target = self.patchify(imgs)
E                 File "C:\Users\33611\Desktop\Projects\transformers-huggingface\transformers\src\transformers\models\vit_mae\modeling_tf_vit_mae.py", line 980, in patchify  *
E                     tf.debugging.assert_equal(imgs.shape[1], imgs.shape[2])
E             
E                 ValueError: None values not supported.

(This test is currently only run for a few core NLP models.)

(This test will still fail with shape_list here, but the error seems to come from other lines rather than from shape_list itself.)

@sayakpaul
Member Author

Thanks for investigating it, @ydshieh. Is there anything we can do in this PR to mitigate the problem?

@ydshieh
Collaborator

ydshieh commented Mar 27, 2022

Thanks for investigating it, @ydshieh. Is there anything we can do in this PR to mitigate the problem?

Let's see what gante thinks.

@sayakpaul
Member Author

sayakpaul commented Mar 27, 2022

I was a bit worried it would break some tests at this moment, so I tested with .shape. Indeed, the test test_save_load gives an error at the line model.save_pretrained(tmpdirname, saved_model=True) - when calling autograph_handler inside. See the full log below.

This is why I used shape_list() there. The root cause is that inside patchify(), with .shape, the batch size will be None in graph mode and tf.random.uniform() will error out on that. I could not think of any other workaround to mitigate the problem.
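A minimal sketch of the static vs. dynamic shape difference (shapes and the function name are illustrative, not the actual patchify() code):

import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec(shape=(None, 3, 224, 224), dtype=tf.float32)])
def make_noise(pixel_values):
    num_patches = (224 // 16) ** 2
    # static shape: the batch dim is None when tracing with a dynamic signature,
    # so tf.random.uniform(shape=(None, num_patches)) would raise "None values not supported"
    static_batch = pixel_values.shape[0]  # -> None
    # dynamic shape: resolved at run time to a concrete scalar tensor
    dynamic_batch = tf.shape(pixel_values)[0]
    return tf.random.uniform(shape=(dynamic_batch, num_patches))

Roughly speaking, shape_list() mixes the two: it keeps static dims where they are known and falls back to tf.shape() where they are None, which is why it survives graph-mode paths such as save_pretrained(..., saved_model=True).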

For the second one (#16255 (comment)) I am not too sure since you mentioned that the test is only applied for some core NLP models.

@ydshieh
Collaborator

ydshieh commented Mar 27, 2022

For the second one (#16255 (comment)) I am not too sure since you mentioned that the test is only applied for some core NLP models.

My comments above are not saying that we should change to .shape and fix all the failing tests.

It is more a question (to gante) of whether we should make this decision and change in this PR, considering some tests will fail.
(And the second comment is included mainly so that we (HF) don't forget there are some tests we would ideally want to run for all models, but that currently run only for a few models. It might be better for us (HF) not to make a big decision too quickly just because all tests are green for a particular model, when it would still fail some important tests that are not currently run due to some limitation.)

@sayakpaul
Member Author

sayakpaul commented Mar 27, 2022

Now I understand. Appreciate the clarification.

@gante
Member

gante commented Mar 29, 2022

@sayakpaul vit-mae-base TF weights are on the hub, the others will soon follow :) I think you can make the final changes now (remove the from_pt), to then merge the PR 💪

@sayakpaul
Member Author

@gante thank you! Changes made.

@gante
Member

gante commented Mar 29, 2022

@gante thank you! Changes made.

Can you confirm that the tests run with RUN_SLOW=1? Will merge after CI gets to green and I get your confirmation 🚀

@gante
Member

gante commented Mar 29, 2022

btw ignore the failure in "Add model like runner / Add new model like template tests (pull_request)", it's being looked after

@sayakpaul
Member Author

@gante thank you! Changes made.

Can you confirm that the tests run with RUN_SLOW=1? Will merge after CI gets to green and I get your confirmation 🚀

Yes, I did run the tests before pushing the changes.

@sayakpaul
Member Author

@gante over to you to take the reins.

@gante
Member

gante commented Mar 29, 2022

CI green, slow tests were run, all TF weights on the hub 👉 merging.

Great work @sayakpaul @ariG23498 💪

@gante gante merged commit 5b40a37 into huggingface:main Mar 29, 2022