
Implement GELU as function op #5277

Merged Jul 10, 2023

Changes from 10 commits

Commits (26)
12bbb48
Added implementation for GELU
pranshupant May 11, 2023
0cf8a6c
adding unit test for gelu
pranshupant May 30, 2023
2733669
updated attribute name from appox to approximate + added trivial auto…
pranshupant May 30, 2023
9ccb7ff
updated doc and unit test
pranshupant May 31, 2023
c4503e9
Adding generated doc files
pranshupant May 31, 2023
f23be94
adding test data files
pranshupant May 31, 2023
c7e5dde
update to test name and added reference op implementation
pranshupant Jun 20, 2023
0f62240
updates based on PR feedback
pranshupant Jun 21, 2023
ab54319
fixed linting issues and test failures
pranshupant Jul 5, 2023
ad40911
Disabled GELU ORT tests for gelu (opset 20)
pranshupant Jul 5, 2023
fc730d1
Fixed C++ linting issues
pranshupant Jul 5, 2023
eef822d
Merge branch 'main' into main
gramalingam Jul 7, 2023
8f43c38
Added implementation for GELU
pranshupant May 11, 2023
7851925
adding unit test for gelu
pranshupant May 30, 2023
9316001
updated attribute name from appox to approximate + added trivial auto…
pranshupant May 30, 2023
283a4fd
updated doc and unit test
pranshupant May 31, 2023
58f8280
Adding generated doc files
pranshupant May 31, 2023
2bb3236
adding test data files
pranshupant May 31, 2023
fd95e25
update to test name and added reference op implementation
pranshupant Jun 20, 2023
b2f86d5
updates based on PR feedback
pranshupant Jun 21, 2023
a47c3c1
fixed linting issues and test failures
pranshupant Jul 5, 2023
e9eeb1d
Disabled GELU ORT tests for gelu (opset 20)
pranshupant Jul 5, 2023
ba95b2f
Fixed C++ linting issues
pranshupant Jul 5, 2023
2d00e99
Merge branch 'main' of github.com:pranshupant/onnx into main
pranshupant Jul 8, 2023
00b977f
Updated Changelog to account for #5390
pranshupant Jul 8, 2023
0c6ffa4
Merge branch 'main' into main
gramalingam Jul 10, 2023
42 changes: 42 additions & 0 deletions docs/Changelog.md
@@ -23881,6 +23881,48 @@ This version of the operator has been available since version 19 of the default
</dl>

## Version 20 of the default ONNX operator set
### <a name="Gelu-20"></a>**Gelu-20**</a>

Gelu takes one input data (Tensor<T>) and produces one
output data (Tensor<T>) where the Gaussian error linear unit function,
$y = 0.5 * x * (1 + \mathrm{erf}(x/\sqrt{2}))$, is applied to the tensor elementwise.
If the attribute "approximate" is set to "tanh", the tanh approximation,
$y = 0.5 * x * (1 + \tanh(\sqrt{2/\pi} * (x + 0.044715 * x^3)))$, is used instead
and applied to the tensor elementwise.


#### Version

This version of the operator has been available since version 20 of the default ONNX operator set.

#### Attributes

<dl>
<dt><tt>approximate</tt> : string (default is none)</dt>
<dd>Gelu approximation algorithm: `"tanh"`, `"none"` (default). `"none"`: do not use approximation. `"tanh"`: use tanh approximation.</dd>
</dl>

#### Inputs

<dl>
<dt><tt>X</tt> (differentiable) : T</dt>
<dd>Input tensor</dd>
</dl>

#### Outputs

<dl>
<dt><tt>Y</tt> (differentiable) : T</dt>
<dd>Output tensor</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)</dt>
<dd>Constrain input and output types to float tensors.</dd>
</dl>
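
For reference, a minimal NumPy sketch of the two formulas above (an illustration mirroring the reference computation used in the node tests; the `gelu` helper name is ours, not part of the spec):

```python
import math

import numpy as np


def gelu(x: np.ndarray, approximate: str = "none") -> np.ndarray:
    # "tanh": 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    if approximate == "tanh":
        return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
    # default ("none"): 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1 + np.vectorize(math.erf)(x / np.sqrt(2)))


x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
print(gelu(x))                      # ~[-0.1587, 0., 0.8413]
print(gelu(x, approximate="tanh"))  # ~[-0.1588, 0., 0.8412]
```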

### <a name="ConstantOfShape-20"></a>**ConstantOfShape-20**</a>

Generate a tensor with given value and shape.
96 changes: 96 additions & 0 deletions docs/Operators.md
@@ -170,6 +170,7 @@ For an operator input/output's differentiability, it can be differentiable,
|<a href="#Clip">Clip</a>|<a href="Changelog.md#Clip-13">13</a>, <a href="Changelog.md#Clip-12">12</a>, <a href="Changelog.md#Clip-11">11</a>, <a href="Changelog.md#Clip-6">6</a>, <a href="Changelog.md#Clip-1">1</a>|13|
|<a href="#DynamicQuantizeLinear">DynamicQuantizeLinear</a>|<a href="Changelog.md#DynamicQuantizeLinear-11">11</a>|11|
|<a href="#Elu">Elu</a>|<a href="Changelog.md#Elu-6">6</a>, <a href="Changelog.md#Elu-1">1</a>|18|
|<a href="#Gelu">Gelu</a>|<a href="Changelog.md#Gelu-20">20</a>|20|
|<a href="#GreaterOrEqual">GreaterOrEqual</a>|<a href="Changelog.md#GreaterOrEqual-16">16</a>, <a href="Changelog.md#GreaterOrEqual-12">12</a>|16|
|<a href="#GroupNormalization">GroupNormalization</a>|<a href="Changelog.md#GroupNormalization-18">18</a>|18|
|<a href="#HammingWindow">HammingWindow</a>|<a href="Changelog.md#HammingWindow-17">17</a>|17|
@@ -9410,6 +9411,101 @@ expect(
</details>


### <a name="Gelu"></a><a name="gelu">**Gelu**</a>

Gelu takes one input data (Tensor<T>) and produces one
output data (Tensor<T>) where the Gaussian error linear unit function,
$y = 0.5 * x * (1 + \mathrm{erf}(x/\sqrt{2}))$, is applied to the tensor elementwise.
If the attribute "approximate" is set to "tanh", the tanh approximation,
$y = 0.5 * x * (1 + \tanh(\sqrt{2/\pi} * (x + 0.044715 * x^3)))$, is used instead
and applied to the tensor elementwise.


#### Version

This version of the operator has been available since version 20 of the default ONNX operator set.

#### Attributes

<dl>
<dt><tt>approximate</tt> : string (default is none)</dt>
<dd>Gelu approximation algorithm: `"tanh"`, `"none"` (default). `"none"`: do not use approximation. `"tanh"`: use tanh approximation.</dd>
</dl>

#### Inputs

<dl>
<dt><tt>X</tt> (differentiable) : T</dt>
<dd>Input tensor</dd>
</dl>

#### Outputs

<dl>
<dt><tt>Y</tt> (differentiable) : T</dt>
<dd>Output tensor</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)</dt>
<dd>Constrain input and output types to float tensors.</dd>
</dl>


#### Examples

<details>
<summary>gelu_default</summary>

```python
node = onnx.helper.make_node("Gelu", inputs=["x"], outputs=["y"])

x = np.array([-1, 0, 1]).astype(np.float32)
# expected output [-0.15865526, 0., 0.84134474]
y = (0.5 * x * (1 + np.vectorize(math.erf)(x / np.sqrt(2)))).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_gelu_default_1")

x = np.random.randn(3, 4, 5).astype(np.float32)
# expected output computed elementwise with the same erf-based formula
y = (0.5 * x * (1 + np.vectorize(math.erf)(x / np.sqrt(2)))).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_gelu_default_2")
```

</details>


<details>
<summary>gelu_tanh</summary>

```python
node = onnx.helper.make_node(
    "Gelu", inputs=["x"], outputs=["y"], approximate="tanh"
)

x = np.array([-1, 0, 1]).astype(np.float32)
# expected output [-0.158808, 0., 0.841192]
y = (
    0.5
    * x
    * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))
).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_gelu_tanh_1")

x = np.random.randn(3, 4, 5).astype(np.float32)
# expected output computed elementwise with the same tanh approximation
y = (
    0.5
    * x
    * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))
).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_gelu_tanh_2")
```

</details>
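
Since this PR defines Gelu as a function op, the registered schema can be decomposed into existing primitive ops. A hedged sketch of inspecting that schema (assumes a recent onnx build that registers Gelu-20 and ships `onnx.printer`; whether the body is exposed statically via `function_body` or built per node from the `approximate` attribute is an assumption here, not something shown in this diff):

```python
import onnx

# Look up the Gelu schema at opset 20 of the default domain.
schema = onnx.defs.get_schema("Gelu", 20)
print("has_function:", schema.has_function)
print("has_context_dependent_function:", schema.has_context_dependent_function)

if schema.has_function:
    # function_body is a FunctionProto expressing Gelu in terms of primitive ops.
    print(onnx.printer.to_text(schema.function_body))
```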


### <a name="Gemm"></a><a name="gemm">**Gemm**</a>

General Matrix multiplication:
52 changes: 51 additions & 1 deletion docs/TestCoverage.md
@@ -6,7 +6,7 @@
* [Overall Test Coverage](#overall-test-coverage)
# Node Test Coverage
## Summary
Node tests have covered 173/186 (93.01%, 5 generators excluded) common operators.
Node tests have covered 174/187 (93.05%, 5 generators excluded) common operators.

Node tests have covered 0/0 (N/A) experimental operators.

@@ -6241,6 +6241,56 @@ expect(
</details>


### Gelu
There are 2 test cases, listed as follows:
<details>
<summary>gelu_default</summary>

```python
node = onnx.helper.make_node("Gelu", inputs=["x"], outputs=["y"])

x = np.array([-1, 0, 1]).astype(np.float32)
# expected output [-0.15865526, 0., 0.84134474]
y = (0.5 * x * (1 + np.vectorize(math.erf)(x / np.sqrt(2)))).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_gelu_default_1")

x = np.random.randn(3, 4, 5).astype(np.float32)
# expected output computed elementwise with the same erf-based formula
y = (0.5 * x * (1 + np.vectorize(math.erf)(x / np.sqrt(2)))).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_gelu_default_2")
```

</details>
<details>
<summary>gelu_tanh</summary>

```python
node = onnx.helper.make_node(
    "Gelu", inputs=["x"], outputs=["y"], approximate="tanh"
)

x = np.array([-1, 0, 1]).astype(np.float32)
# expected output [-0.158808, 0., 0.841192]
y = (
    0.5
    * x
    * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))
).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_gelu_tanh_1")

x = np.random.randn(3, 4, 5).astype(np.float32)
# expected output computed elementwise with the same tanh approximation
y = (
    0.5
    * x
    * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))
).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_gelu_tanh_2")
```

</details>


### Gemm
There are 11 test cases, listed as follows:
<details>
51 changes: 51 additions & 0 deletions onnx/backend/test/case/node/gelu.py
@@ -0,0 +1,51 @@
# Copyright (c) ONNX Project Contributors
#
# SPDX-License-Identifier: Apache-2.0

import math

import numpy as np

import onnx
from onnx.backend.test.case.base import Base
from onnx.backend.test.case.node import expect


class Gelu(Base):
    @staticmethod
    def export_gelu_tanh() -> None:
        node = onnx.helper.make_node(
            "Gelu", inputs=["x"], outputs=["y"], approximate="tanh"
        )

        x = np.array([-1, 0, 1]).astype(np.float32)
        # expected output [-0.158808, 0., 0.841192]
        y = (
            0.5
            * x
            * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))
        ).astype(np.float32)
        expect(node, inputs=[x], outputs=[y], name="test_gelu_tanh_1")

        x = np.random.randn(3, 4, 5).astype(np.float32)
        # expected output computed elementwise with the same tanh approximation
        y = (
            0.5
            * x
            * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))
        ).astype(np.float32)
        expect(node, inputs=[x], outputs=[y], name="test_gelu_tanh_2")

    @staticmethod
    def export_gelu_default() -> None:
        node = onnx.helper.make_node("Gelu", inputs=["x"], outputs=["y"])

        x = np.array([-1, 0, 1]).astype(np.float32)
        # expected output [-0.15865526, 0., 0.84134474]
        y = (0.5 * x * (1 + np.vectorize(math.erf)(x / np.sqrt(2)))).astype(np.float32)
        expect(node, inputs=[x], outputs=[y], name="test_gelu_default_1")

        x = np.random.randn(3, 4, 5).astype(np.float32)
        # expected output computed elementwise with the same erf-based formula
        y = (0.5 * x * (1 + np.vectorize(math.erf)(x / np.sqrt(2)))).astype(np.float32)
        expect(node, inputs=[x], outputs=[y], name="test_gelu_default_2")
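
A hedged usage sketch, not part of the PR (assumes a recent onnx with `onnx.reference.ReferenceEvaluator` and opset 20 support): build a one-node Gelu model and check the reference evaluator against the same tanh formula used above.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
from onnx.reference import ReferenceEvaluator

# One-node model using the tanh approximation of Gelu (opset 20).
node = helper.make_node("Gelu", ["x"], ["y"], approximate="tanh")
graph = helper.make_graph(
    [node],
    "gelu_check",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [3])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [3])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])

x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
(y,) = ReferenceEvaluator(model).run(None, {"x": x})

expected = 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
np.testing.assert_allclose(y, expected.astype(np.float32), rtol=1e-5)
```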
Binary test data files for the new Gelu node tests were added in this PR but are not shown.