Layer norm operator #2379
@wschin can you submit a PR?
Coming here from onnx/keras-onnx#557, I'm keen to see this implemented as it's used in SOTA EfficientNet models.

In order to propose a new operator/function, the following is needed (a numpy sketch of the intended semantics follows this list):

1. If the operator can be composed from other ONNX operators, then it should be a function and not an operator (we already have such a function in ONNX: MeanVarianceNormalization).
2. If the operator can be split into new primitives, propose those primitives instead and make the operator a function.
3. The proposal should be based on a model. This helps us understand the usage and confirm that it solves an actual problem. If the model is private or IP and can't be shared, the operator doesn't belong in the standard and should be implemented as a custom op.
4. The operator needs to be implemented by at least one well-known framework. This helps us understand the actual behavior of the operator and its usage.

Operator signature and behavior:

- If the operator is available in numpy, prefer numpy semantics.
- If the operator is available in more than one framework, make sure that your design is general and covers those frameworks.
- Prefer attributes over inputs.
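Following the numpy-semantics guidance above, here is a minimal numpy sketch of the layer-normalization behavior being requested. The function name, the default axis, and the epsilon value are illustrative assumptions, not a settled ONNX signature.

```python
import numpy as np

def layer_norm_reference(x, scale, bias, axis=-1, epsilon=1e-5):
    """Illustrative reference for layer normalization with numpy semantics.

    Normalizes `x` over `axis`, then applies a learned elementwise
    scale and bias. `epsilon` guards against division by zero.
    """
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon) * scale + bias

# Example: normalize the hidden dimension of a (batch, seq, hidden) tensor.
x = np.random.randn(2, 4, 8).astype(np.float32)
scale = np.ones(8, dtype=np.float32)
bias = np.zeros(8, dtype=np.float32)
y = layer_norm_reference(x, scale, bias)
```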
Regarding the feature request in cuDNN: @wschin, any news, or is any help needed?
Did you find a good (fast) solution for layer normalization in ONNX?
I think your suggestion is a composition like MeanVarianceNormalization + Mul + Add.
Big vote for "Please, please implement layer normalization" The idea using "MeanVarianceNormalization + Mul + Add" is missing an important piece: MeanVarianceNormalization misses the epsilon used in BatchNormalization to prevent division by zero. Unfortunately "no variance" is not an unlikely corner case. To make it an unlikely corner case without the epsilon existing in MeanVarianceNormalization I only see the option to add some noise to the input data of MeanVarianceNormalization operation - not nice. By the way ... using "BatchNormalization" in some combination with some pre and post operators is also not feasible - as far as I see it - because of the difference of BatchNormalization and LayerNormalization concerning the learnable parameters:
|
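To make the point above concrete, here is a hedged sketch of composing an epsilon-safe layer normalization from existing ONNX primitives, assuming opset 13 (where ReduceMean takes `axes` as an attribute and negative axes are allowed). All tensor names and shapes are illustrative.

```python
import onnx
from onnx import helper, TensorProto

# Decompose LayerNorm into existing primitives. Unlike the
# MeanVarianceNormalization + Mul + Add idea, the epsilon term is explicit,
# so a zero-variance input does not divide by zero.
nodes = [
    helper.make_node("Constant", [], ["eps"],
                     value=helper.make_tensor("eps_t", TensorProto.FLOAT, [], [1e-5])),
    helper.make_node("ReduceMean", ["X"], ["mean"], axes=[-1]),
    helper.make_node("Sub", ["X", "mean"], ["centered"]),
    helper.make_node("Mul", ["centered", "centered"], ["sq"]),
    helper.make_node("ReduceMean", ["sq"], ["var"], axes=[-1]),
    helper.make_node("Add", ["var", "eps"], ["var_eps"]),  # the piece MVN lacks
    helper.make_node("Sqrt", ["var_eps"], ["std"]),
    helper.make_node("Div", ["centered", "std"], ["normed"]),
    helper.make_node("Mul", ["normed", "scale"], ["scaled"]),  # learnable scale
    helper.make_node("Add", ["scaled", "bias"], ["Y"]),        # learnable bias
]

graph = helper.make_graph(
    nodes, "layer_norm_composed",
    inputs=[
        helper.make_tensor_value_info("X", TensorProto.FLOAT, [2, 4, 8]),
        helper.make_tensor_value_info("scale", TensorProto.FLOAT, [8]),
        helper.make_tensor_value_info("bias", TensorProto.FLOAT, [8]),
    ],
    outputs=[helper.make_tensor_value_info("Y", TensorProto.FLOAT, [2, 4, 8])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
```

The downside of this composition is exactly what motivates a dedicated op or function: a runtime sees ten small nodes instead of one fusable LayerNorm.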
Some news: there now seems to be a LayerNorm test, at least in onnxruntime.
@ebarsoum, would you know the process for proposing a new standard ONNX op?
LayerNorm is a very important operator in BERT (one of its computation bottlenecks). Maybe we should add it as a FunctionProto to get a more meaningful BERT representation and allow runtimes to easily provide an optimized kernel for it.
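A hedged sketch of what that could look like, assuming onnx.helper.make_function (available in recent onnx releases) and reusing the `nodes` decomposition from the sketch above; the domain name is an illustrative assumption, not an official ONNX domain.

```python
from onnx import helper

# Package the primitive-op decomposition as a FunctionProto, so a runtime
# can either expand LayerNorm into primitives or pattern-match the whole
# function to a single fused kernel.
layer_norm_fn = helper.make_function(
    domain="custom.experimental",   # illustrative, not an official domain
    fname="LayerNorm",
    inputs=["X", "scale", "bias"],
    outputs=["Y"],
    nodes=nodes,  # the epsilon-aware decomposition from the sketch above
    opset_imports=[helper.make_opsetid("", 13)],
)
```

A runtime that recognizes the function can substitute one optimized kernel; one that doesn't can fall back to executing the primitive nodes, so the model stays portable either way.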