
Layer norm operator #2379

Closed
wschin opened this issue Sep 30, 2019 · 10 comments · Fixed by #4076

wschin (Contributor) commented Sep 30, 2019

LayerNorm is a very important operator in BERT (one of its computation bottlenecks). Maybe we should add it as a FunctionProto to give BERT a more meaningful representation and allow runtimes to easily write an optimized kernel for it.
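For reference, a minimal NumPy sketch of the semantics being discussed (the function name, the axis=-1 default, and epsilon=1e-5 are illustrative assumptions, not a spec):

```python
import numpy as np

def layer_norm_ref(x, gamma, beta, axis=-1, epsilon=1e-5):
    """Reference layer normalization: normalize over one axis,
    then apply a learnable elementwise scale (gamma) and bias (beta)."""
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    return x_hat * gamma + beta

# Example: one BERT-like hidden vector per token.
x = np.random.randn(2, 4, 8).astype(np.float32)   # (batch, seq, hidden)
gamma = np.ones(8, dtype=np.float32)
beta = np.zeros(8, dtype=np.float32)
print(layer_norm_ref(x, gamma, beta).shape)        # (2, 4, 8)
```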

@WilliamTambellini commented:

cf:
oneapi-src/oneDNN#511

prasanthpul added the "operator" (Issues related to ONNX operators) label Oct 21, 2019
ebarsoum (Contributor) commented Nov 7, 2019

@wschin can you submit a PR?

@marcusturewicz commented:

Coming here from onnx/keras-onnx#557, I'm keen to see this implemented as it's used in SOTA EfficientNet models.

In order to propose a new operator/function, the following is needed:

1. If the operator can be composed from other ONNX operators, then it should be a function and not an operator (we already have such a function in ONNX: MeanVarianceNormalization).
It sounds like this can be implemented as a function. Is that correct?

2. If the operator can be split into new primitives, propose those primitives instead and make the operator a function.
Depends on the above.

3. Based on a model. This will help us understand the usage and confirm that it solves an actual problem. If the model is private or IP that can't be shared, the operator doesn't belong in the standard and should be implemented as a custom op.
Original paper
It's used in tf.keras.applications.EfficientNet

4. The operator needs to be implemented by at least one (well-known) framework. This helps us understand the actual behavior of the operator and its usage.
tf.keras
PyTorch
MXNet

Operator signature and behavior:

If the operator is available in numpy, prefer numpy semantics.
Don't believe it's available in numpy.

If the operator is available in more than one framework, make sure that your design is general and covers those frameworks.
Implementations look similar - would probably follow the Keras one.

Prefer attributes over inputs.
Sure.

@WilliamTambellini commented:

The feature request in cuDNN:
https://forums.developer.nvidia.com/t/feature-request-cudnn-layer-normalization/73292

@wschin any news / help needed?

enpasos commented Jan 7, 2022

Did you find a good (fast) solution for layer normalization in ONNX?

enpasos commented Jan 7, 2022

(Quoting @marcusturewicz's checklist above.)

I think your suggestion is a composition like MeanVarianceNormalization + Mul + Add:
[image: proposed ONNX graph]
with an attribute set on the MeanVarianceNormalization node:
[image: MeanVarianceNormalization node attributes]
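For illustration, a sketch of that MeanVarianceNormalization + Mul + Add composition built with onnx.helper (the axes=[2] choice, the hidden size of 8, and all tensor names are assumptions for a last-axis layer norm over a rank-3 input; they are not taken from the screenshots):

```python
import onnx
from onnx import helper, TensorProto

# Assumed (batch, seq, hidden) input; normalize over the hidden axis (axis 2).
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, ["batch", "seq", 8])
scale = helper.make_tensor_value_info("scale", TensorProto.FLOAT, [8])
bias = helper.make_tensor_value_info("bias", TensorProto.FLOAT, [8])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, ["batch", "seq", 8])

nodes = [
    # MeanVarianceNormalization brings each hidden vector to zero mean / unit variance.
    helper.make_node("MeanVarianceNormalization", ["X"], ["mvn"], axes=[2]),
    helper.make_node("Mul", ["mvn", "scale"], ["scaled"]),  # elementwise gamma
    helper.make_node("Add", ["scaled", "bias"], ["Y"]),     # elementwise beta
]

graph = helper.make_graph(nodes, "layer_norm_as_mvn", [X, scale, bias], [Y])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
```

Note that MeanVarianceNormalization only has an axes attribute and no epsilon, which is exactly the concern raised in the next comment.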

enpasos commented Jan 14, 2022

Big vote for "Please, please implement layer normalization"

The idea of using "MeanVarianceNormalization + Mul + Add" is missing an important piece: MeanVarianceNormalization lacks the epsilon that BatchNormalization uses to prevent division by zero. Unfortunately, "no variance" is not an unlikely corner case. Without an epsilon in MeanVarianceNormalization, the only option I see for making it unlikely is to add some noise to the input of the MeanVarianceNormalization operation - not nice.
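A tiny NumPy illustration of why the epsilon matters (the values are arbitrary):

```python
import numpy as np

x = np.array([[1.0, 1.0, 1.0, 1.0]])  # constant row -> variance is exactly 0
mean = x.mean(axis=-1, keepdims=True)
var = x.var(axis=-1, keepdims=True)

with np.errstate(divide="ignore", invalid="ignore"):
    print((x - mean) / np.sqrt(var))        # MVN-style, no epsilon: [[nan nan nan nan]]
print((x - mean) / np.sqrt(var + 1e-5))     # with epsilon: [[0. 0. 0. 0.]]
```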

By the way, using BatchNormalization with some pre and post operators is also not feasible - as far as I can see - because BatchNormalization and LayerNormalization differ in their learnable parameters:

[image: LayerNorm definition from the PyTorch docs - y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta, where gamma and beta have shape normalized_shape]
(from https://pytorch.org/docs/1.10.1/generated/torch.nn.LayerNorm.html?highlight=layer%20normalization)
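To make the parameter-shape difference concrete, a small PyTorch check (the sizes are arbitrary examples):

```python
import torch

bn = torch.nn.BatchNorm2d(64)         # gamma/beta are per channel
ln = torch.nn.LayerNorm([64, 8, 8])   # gamma/beta cover the whole normalized shape

print(bn.weight.shape, bn.bias.shape)  # torch.Size([64]) torch.Size([64])
print(ln.weight.shape, ln.bias.shape)  # torch.Size([64, 8, 8]) torch.Size([64, 8, 8])
```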

enpasos commented Jan 15, 2022

My workaround for inference that passes my unit tests:
[image: ONNX graph of the workaround]
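The screenshot of that workaround isn't preserved here. For reference, one common way to spell out an inference-only layer norm with an explicit epsilon in plain ONNX primitives looks roughly like the node list below (opset 13, where ReduceMean still takes axes as an attribute; every name and the epsilon value are assumptions, not the graph from the screenshot):

```python
from onnx import helper

# "eps", "gamma" and "beta" are assumed to be initializers of the surrounding graph.
nodes = [
    helper.make_node("ReduceMean", ["X"], ["mean"], axes=[-1]),   # per-row mean
    helper.make_node("Sub", ["X", "mean"], ["centered"]),
    helper.make_node("Mul", ["centered", "centered"], ["sq"]),
    helper.make_node("ReduceMean", ["sq"], ["var"], axes=[-1]),   # per-row variance
    helper.make_node("Add", ["var", "eps"], ["var_eps"]),         # epsilon for stability
    helper.make_node("Sqrt", ["var_eps"], ["std"]),
    helper.make_node("Div", ["centered", "std"], ["normed"]),
    helper.make_node("Mul", ["normed", "gamma"], ["scaled"]),
    helper.make_node("Add", ["scaled", "beta"], ["Y"]),
]
```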

@WilliamTambellini commented:

Some news: there now seems to be a LayerNorm test, at least in onnxruntime:
https://github.com/WilliamTambellini/onnxruntime/blob/master/onnxruntime/test/contrib_ops/layer_norm_test.cc
but perhaps only implemented for CUDA or ROCm,
and an op test:
https://github.com/WilliamTambellini/onnxruntime/blob/master/onnxruntime/test/contrib_ops/layer_norm_op_test.cc
TBC.

@WilliamTambellini commented:

@ebarsoum would you know the process for proposing a new standard ONNX op?
