Layer norm operator #2379
@wschin can you submit a PR?
Coming here from onnx/keras-onnx#557, I'm keen to see this implemented as it's used in SOTA EfficientNet models.

In order to propose a new operator/function, the following is needed (a numpy sketch of the intended semantics follows this list):

1. If the operator can be composed from other ONNX operators, then it should be a function and not an operator (we already have such a function in ONNX: MeanVarianceNormalization).
2. If the operator can be split into new primitives, propose those primitives instead and make the operator a function.
3. The proposal should be based on a model. This helps us understand the usage and confirm that it solves an actual problem. If the model is private or IP and can't be shared, the operator doesn't belong in the standard and should be implemented as a custom op.
4. The operator needs to be implemented by at least one well-known framework. This helps us understand the actual behavior of the operator and its usage.

Operator signature and behavior:

- If the operator is available in numpy, prefer numpy semantics.
- If the operator is available in more than one framework, make sure that your design is general and covers those frameworks.
- Prefer attributes over inputs.
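Following the numpy-semantics guidance above, here is a minimal numpy sketch of the layer-normalization behavior being requested. The function name, the default axis, and the epsilon value are illustrative assumptions, not a settled ONNX signature.

```python
import numpy as np

def layer_norm_reference(x, scale, bias, axis=-1, epsilon=1e-5):
    """Illustrative reference for layer normalization with numpy semantics.

    Normalizes `x` over `axis`, then applies a learned elementwise
    scale and bias. `epsilon` guards against division by zero.
    """
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon) * scale + bias

# Example: normalize the hidden dimension of a (batch, seq, hidden) tensor.
x = np.random.randn(2, 4, 8).astype(np.float32)
scale = np.ones(8, dtype=np.float32)
bias = np.zeros(8, dtype=np.float32)
y = layer_norm_reference(x, scale, bias)
```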
Regarding the feature request in cuDNN: @wschin, any news, or is any help needed?
Did you find a good (fast) solution for layer normalization in ONNX?
I think your suggestion is a composition like MeanVarianceNormalization + Mul + Add.
Big vote for "Please, please implement layer normalization" The idea using "MeanVarianceNormalization + Mul + Add" is missing an important piece: MeanVarianceNormalization misses the epsilon used in BatchNormalization to prevent division by zero. Unfortunately "no variance" is not an unlikely corner case. To make it an unlikely corner case without the epsilon existing in MeanVarianceNormalization I only see the option to add some noise to the input data of MeanVarianceNormalization operation - not nice. By the way ... using "BatchNormalization" in some combination with some pre and post operators is also not feasible - as far as I see it - because of the difference of BatchNormalization and LayerNormalization concerning the learnable parameters:
|
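To make the point above concrete, here is a hedged sketch of composing an epsilon-safe layer normalization from existing ONNX primitives, assuming opset 13 (where ReduceMean takes `axes` as an attribute and negative axes are allowed). All tensor names and shapes are illustrative.

```python
import onnx
from onnx import helper, TensorProto

# Decompose LayerNorm into existing primitives. Unlike the
# MeanVarianceNormalization + Mul + Add idea, the epsilon term is explicit,
# so a zero-variance input does not divide by zero.
nodes = [
    helper.make_node("Constant", [], ["eps"],
                     value=helper.make_tensor("eps_t", TensorProto.FLOAT, [], [1e-5])),
    helper.make_node("ReduceMean", ["X"], ["mean"], axes=[-1]),
    helper.make_node("Sub", ["X", "mean"], ["centered"]),
    helper.make_node("Mul", ["centered", "centered"], ["sq"]),
    helper.make_node("ReduceMean", ["sq"], ["var"], axes=[-1]),
    helper.make_node("Add", ["var", "eps"], ["var_eps"]),  # the piece MVN lacks
    helper.make_node("Sqrt", ["var_eps"], ["std"]),
    helper.make_node("Div", ["centered", "std"], ["normed"]),
    helper.make_node("Mul", ["normed", "scale"], ["scaled"]),  # learnable scale
    helper.make_node("Add", ["scaled", "bias"], ["Y"]),        # learnable bias
]

graph = helper.make_graph(
    nodes, "layer_norm_composed",
    inputs=[
        helper.make_tensor_value_info("X", TensorProto.FLOAT, [2, 4, 8]),
        helper.make_tensor_value_info("scale", TensorProto.FLOAT, [8]),
        helper.make_tensor_value_info("bias", TensorProto.FLOAT, [8]),
    ],
    outputs=[helper.make_tensor_value_info("Y", TensorProto.FLOAT, [2, 4, 8])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
```

The downside of this composition is exactly what motivates a dedicated op or function: a runtime sees ten small nodes instead of one fusable LayerNorm.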
Some news: there now seems to be a LayerNorm test, at least in onnxruntime.
@ebarsoum, would you know the process for proposing a new standard ONNX op?
LayerNorm is a very important operator in BERT (one of its computation bottlenecks). Maybe we should add it as a FunctionProto to get a more meaningful BERT representation and allow runtimes to easily provide an optimized kernel for it.
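A hedged sketch of what that could look like, assuming onnx.helper.make_function (available in recent onnx releases) and reusing the `nodes` decomposition from the sketch above; the domain name is an illustrative assumption, not an official ONNX domain.

```python
from onnx import helper

# Package the primitive-op decomposition as a FunctionProto, so a runtime
# can either expand LayerNorm into primitives or pattern-match the whole
# function to a single fused kernel.
layer_norm_fn = helper.make_function(
    domain="custom.experimental",   # illustrative, not an official domain
    fname="LayerNorm",
    inputs=["X", "scale", "bias"],
    outputs=["Y"],
    nodes=nodes,  # the epsilon-aware decomposition from the sketch above
    opset_imports=[helper.make_opsetid("", 13)],
)
```

A runtime that recognizes the function can substitute one optimized kernel; one that doesn't can fall back to executing the primitive nodes, so the model stays portable either way.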