
[metaformers] handling different normalizations + layer repetition #345

Merged
merged 3 commits into main from hierachical_models_improvement on Jul 14, 2022

Conversation

@blefaudeux blefaudeux (Contributor) commented Jul 1, 2022

What does this PR do?

A few changes on the way to proper EfficientFormer support. This is limited to the factory side of the repo; there are no changes to the actual parts.

  • Make it possible to repeat layers in the config generator helper (see the sketch after this list)
  • Make it possible to define a different MLP per layer
  • Make it possible to skip the layernorm (will be extended to support other normalizations)
    related to EfficientFormer #330
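A rough sketch of what the repetition plus per-layer feedforward/normalization options could look like on the user side; the expand_repeated_layers helper and the dict keys below are hypothetical, not the actual xformers config-generator API:

```python
# Hypothetical sketch: expand_repeated_layers and the dict keys are
# illustrative only, not the actual xformers config-generator API.
import copy
from typing import Any, Dict, List


def expand_repeated_layers(layer_specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn [{..., "repeat": N}, ...] into a flat list of per-layer configs."""
    expanded: List[Dict[str, Any]] = []
    for spec in layer_specs:
        spec = dict(spec)
        repeat = spec.pop("repeat", 1)
        expanded.extend(copy.deepcopy(spec) for _ in range(repeat))
    return expanded


layer_specs = [
    # conv-style feedforward and no normalization for the early stages
    {"feedforward": "conv_mlp", "normalization": "skip", "repeat": 4},
    # standard MLP + LayerNorm for the later stages
    {"feedforward": "mlp", "normalization": "layernorm", "repeat": 2},
]

print(len(expand_repeated_layers(layer_specs)))  # 6 per-layer configs
```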

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 1, 2022
@blefaudeux blefaudeux changed the title from "handling different normalizations + layer repetition" to "[DRAFT] handling different normalizations + layer repetition" Jul 1, 2022
@blefaudeux blefaudeux marked this pull request as draft July 1, 2022 21:07

from examples.microViT import Classifier, VisionTransformer
blefaudeux (Contributor, Author):

min/micro was a follow-up from Karpathy's minGPT; it does not really apply here, so I figured that cifar_ViT was probably more transparent?

@@ -131,34 +146,16 @@ def forward(self, x):
torch.cuda.manual_seed_all(42)
torch.manual_seed(42)

train_transforms = transforms.Compose(
blefaudeux (Contributor, Author):

the lightning-bolts datamodule already does all of that

@@ -205,33 +203,15 @@ def test_step(self, batch, _):
NUM_WORKERS = 4
GPUS = 1

train_transforms = transforms.Compose(
blefaudeux (Contributor, Author):

same as above: these are actually the default transforms in the lightning-bolts datamodule, so not useful here
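For reference, a minimal sketch of leaning on the bolts datamodule directly (assuming pl_bolts is installed; the data_dir, batch_size and num_workers values are placeholders):

```python
# Minimal sketch (assumes pl_bolts is installed): the datamodule already ships
# with its default CIFAR-10 transforms, so no manual transforms.Compose is needed.
from pl_bolts.datamodules import CIFAR10DataModule

dm = CIFAR10DataModule(data_dir="data", batch_size=256, num_workers=4)
# handed to Lightning as usual, e.g. trainer.fit(model, datamodule=dm)
```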

outputs = wrap(inputs=[x, x, x])

assert id(outputs[0]) == id(outputs[1])

# Check the BW pass
blefaudeux (Contributor, Author):

better code coverage, and a good idea in any case I believe
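As a generic illustration of what such a backward-pass check usually looks like (not the exact test added in this PR):

```python
import torch

# Smoke-test the backward pass: run a scalar loss through .backward() and
# check that gradients were actually populated and are finite.
x = torch.randn(2, 8, 16, requires_grad=True)
layer = torch.nn.Linear(16, 16)

out = layer(x)
out.sum().backward()

assert x.grad is not None and torch.isfinite(x.grad).all()
```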

@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from 22f45db to 41b8516 Compare July 1, 2022 21:25
@@ -18,7 +18,7 @@
from collections import namedtuple


class LayerNormStyle(str, Enum):
class ResidualNormStyle(str, Enum):
blefaudeux (Contributor, Author):

some of the new transformer variants for vision (MetaFormer, EfficientFormer, ...) alter the actual normalization, which can be something other than LayerNorm. RMSNorm is also used in NLP

class NormalizationType(str, Enum):
LayerNorm = "layernorm"
Skip = "skip"
# TODO: BatchNorm = "batchnorm"
blefaudeux (Contributor, Author):

this is probably to be done in another PR; it requires deferred init or some clever way to get the embedding size at that point, so for now we just handle the "no normalization" path
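A hedged sketch of the idea (not the actual xformers/components code): the "skip" path can dispatch to nn.Identity, which needs no embedding size at construction time, whereas BatchNorm would, hence the TODO above:

```python
from typing import Optional

import torch.nn as nn


def build_norm(normalization: str, dim: Optional[int] = None) -> nn.Module:
    # "skip" maps to a no-op, so no dimension is required at construction time
    if normalization == "skip":
        return nn.Identity()
    if normalization == "layernorm":
        assert dim is not None, "LayerNorm needs the embedding dimension"
        return nn.LayerNorm(dim)
    raise ValueError(f"Unsupported normalization: {normalization}")
```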

@blefaudeux blefaudeux changed the title from "[DRAFT] handling different normalizations + layer repetition" to "[metaformers] handling different normalizations + layer repetition" Jul 2, 2022
@blefaudeux blefaudeux marked this pull request as ready for review July 2, 2022 11:53
@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from 41b8516 to e5f22e4 Compare July 2, 2022 11:59
@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from e5f22e4 to ebc4f6f Compare July 3, 2022 13:31
@dianaml0 (Contributor) left a comment

LGTM!

@@ -190,6 +195,7 @@ def __init__(
multi_head_config_cross: Dict[str, Any],
position_encoding_config: Optional[Dict[str, Any]] = None,
layer_norm_style: str = "post",
dianaml0 (Contributor):
Might the distinction between the two variable names be confusing? layer_norm_style and normalization

blefaudeux (Contributor, Author):

ahh good point, you're right, it's more of a "residual path style" I think. Do you think I can rename that? It would break all the existing configs; it's a user-facing change unfortunately

dianaml0 (Contributor):

Ah true, it may not be great to break that. I guess if the difference is well documented it should be okay?

blefaudeux (Contributor, Author):

we can also catch the old name for some time and fix it with a warning, then remove this failsafe in a few releases?

dianaml0 (Contributor):

Sorry, I missed this comment; that sounds good!
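An illustrative sketch of the "accept the old name for a while, warn, then drop it" idea discussed above; the helper and argument names are made up here, not the actual xformers API:

```python
import warnings
from typing import Any, Dict


def remap_deprecated_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Accept the legacy `layer_norm_style` kwarg, warn, and forward it to the
    # new name; this shim can be dropped after a few releases.
    if "layer_norm_style" in kwargs and "residual_norm_style" not in kwargs:
        warnings.warn(
            "`layer_norm_style` is deprecated, please use `residual_norm_style`",
            DeprecationWarning,
            stacklevel=2,
        )
        kwargs["residual_norm_style"] = kwargs.pop("layer_norm_style")
    return kwargs
```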

@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from 68d3a84 to 78f8b7b Compare July 8, 2022 19:01
@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from 78f8b7b to 87cd5a4 Compare July 8, 2022 19:15
@codecov-commenter

Codecov Report

Merging #345 (87cd5a4) into main (6c003f1) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #345      +/-   ##
==========================================
+ Coverage   93.91%   93.95%   +0.03%     
==========================================
  Files          70       70              
  Lines        3961     3984      +23     
==========================================
+ Hits         3720     3743      +23     
  Misses        241      241              
Flag Coverage Δ
Python 93.95% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
xformers/components/__init__.py 100.00% <100.00%> (ø)
xformers/components/residual.py 98.73% <100.00%> (+0.18%) ⬆️
xformers/factory/block_configs.py 90.81% <100.00%> (+0.19%) ⬆️
xformers/factory/block_factory.py 97.03% <100.00%> (ø)
xformers/factory/model_factory.py 98.16% <100.00%> (ø)
xformers/helpers/hierarchical_configs.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c003f1...87cd5a4.

@blefaudeux blefaudeux merged commit 3a7b713 into main Jul 14, 2022
@matthias-weissenbacher

Potential bug report.
It works fine on device "cuda:0" but gives an error when run on "cuda:1":

    RuntimeError: CUDA error: an illegal memory access was encountered
    at File "/lib/python3.8/site-packages/xformers/triton/softmax.py", line 200, in _softmax_dispatch
        return torch.softmax(x, dim=-1)

We installed xformers 0.0.11 from PyPI and triton 2.0.0 (hash: 5b04331dd2efdd23f4475823761fa975de60a514) from source.
We also tried xformers 0.0.12.dev0 (hash: 3a7b713) and got the same error.

@blefaudeux
Contributor Author


Hi @matthias-weissenbacher, I just saw this report buried in my emails, does that still happen? Could you tell me more about the GPUs on 0 and 1, are they the same?

@blefaudeux
Contributor Author

also, could you report what you get with CUDA_LAUNCH_BLOCKING=1 {your command}? It probably does not really fail on torch.softmax() but earlier
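For instance (not the original repro, just the shape of a minimal one, assuming two visible GPUs): setting CUDA_LAUNCH_BLOCKING=1 before CUDA is initialized makes the failing kernel surface at its real call site instead of a later synchronization point:

```python
import os

# Must be set before CUDA is initialized so kernel launches are synchronous.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

x = torch.randn(8, 128, 128, device="cuda:1")
print(torch.softmax(x, dim=-1).sum())
```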
