Implement copysign #46396

Closed
wants to merge 38 commits into from

Conversation

@ejguan (Contributor) commented Oct 15, 2020

Related #38349
Stack from ghstack:

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign)

- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex

`c = np.copysign(a, b)`

|  a |  b |  c | a.grad |
|:--:|:--:|:--:|:------:|
| -1 | -1 | -1 |    1   |
| -0 | -1 | -0 |    0   |
|  0 | -1 | -0 |    0   |
|  1 | -1 | -1 |   -1   |
| -1 | -0 | -1 |    1   |
| -0 | -0 | -0 |    0   |
|  0 | -0 | -0 |    0   |
|  1 | -0 | -1 |   -1   |
| -1 |  0 |  1 |   -1   |
| -0 |  0 |  0 |    0   |
|  0 |  0 |  0 |    0   |
|  1 |  0 |  1 |    1   |
| -1 |  1 |  1 |   -1   |
| -0 |  1 |  0 |    0   |
|  0 |  1 |  0 |    0   |
|  1 |  1 |  1 |    1   |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.
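
For reference, a minimal NumPy sketch (not part of this PR's code) reproducing the `c` column of the table; note that `np.copysign` keys off the sign bit of `b`, so `b = -0` behaves like a negative `b`:

```python
import numpy as np

a = np.array([-1.0, -0.0, 0.0, 1.0])
for b in (-1.0, -0.0, 0.0, 1.0):
    # np.copysign copies the sign *bit* of b onto the magnitude of a,
    # so b = -0.0 produces negative outputs just like b = -1.0.
    print(f"b = {b:+.1f} -> {np.copysign(a, b)}")
```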

TODO:

- [x] test (cpu/gpu)
- [x] doc
- [x] ~kernel_vec~

Differential Revision: [D24401366](https://our.internmc.facebook.com/intern/diff/D24401366)

[ghstack-poisoned]
@ejguan ejguan marked this pull request as draft October 15, 2020 15:20
@facebook-github-bot (Contributor) commented Oct 15, 2020

💊 CI failures summary and remediations

As of commit 310bc5e (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI.

@ejguan changed the title from "Implement copysign" to "[WIP] Implement copysign" on Oct 15, 2020
@dr-ci dr-ci bot commented Oct 15, 2020

💊 CI failures summary and remediations

As of commit e3e5eed (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

2 failures confirmed as flaky and can be ignored:

- pytorch_libtorch_linux_xenial_cuda11_0_cudnn8_py3_gcc7_build
- pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build

This comment was automatically generated by Dr. CI.

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign) 
- No in-place function
- No method
- Optional output
- Available: bool, int, short, long, float, double, half
- Not available: byte, char, float/double complex

TODO:
- [ ] test
- [ ] doc
- [ ] kernel_vec

[ghstack-poisoned]
ejguan added a commit that referenced this pull request Oct 15, 2020
ghstack-source-id: ad138297abbe2e65bfb661c8446356e003ab152c
Pull Request resolved: #46396
```cpp
@@ -49,6 +49,11 @@ void copy_range(variable_list& out, IndexRange range, at::ArrayRef<Tensor> t) {
  std::copy(t.begin(), t.end(), out.begin() + range.first);
}

Tensor copysign_tensor_backward(Tensor grad, Tensor self, Tensor other) {
  auto result = grad * self.sign() * other.sign();
```
Collaborator

I am not sure that you need the `self.sign()` here.
It will fail gradcheck when you add it to the list in `common_methods_invocations.py` if the formula is wrong. But I think you will need to change this.
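
For example, a minimal gradcheck sketch along those lines (assuming the `torch.copysign` added by this PR; inputs kept away from `a = 0`, where the function is non-differentiable):

```python
import torch
from torch.autograd import gradcheck

# Double-precision inputs with |a| bounded away from 0 so the finite-difference
# check never straddles the non-differentiable point a = 0.
a = torch.randn(10, dtype=torch.double)
a = ((a.abs() + 0.1) * a.sign()).requires_grad_()
b = torch.randn(10, dtype=torch.double)

# gradcheck compares the analytical backward formula against numerical gradients.
assert gradcheck(torch.copysign, (a, b))
```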

Contributor Author


For instance:

- `a = tensor(-1.)`
- `b = tensor(1.)`
- `c = torch.copysign(a, b) = tensor(1.)`

The derivative with respect to `a` is -1 rather than `b.sign() = 1`. Any thoughts on that?
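
A quick NumPy finite-difference check of this example (a sketch, not part of the PR):

```python
import numpy as np

# At a = -1, b = 1, copysign flips the sign of a, so d copysign(a, b) / da
# should be -1 (i.e. sign(a) * sign(b)), not b.sign() = 1.
a, b, eps = -1.0, 1.0, 1e-6
fd = (np.copysign(a + eps, b) - np.copysign(a - eps, b)) / (2 * eps)
print(fd)  # ~ -1.0
```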

Collaborator


Ah right, the derivative is -1 when the sign flips and 1 otherwise, so you need both! Agreed.

Also, there is a corner case at 0 here, where `sign()` returns 0. What does `copysign()` do in that case? Is the backward formula good for that case as well?

@albanD (Collaborator) commented Oct 15, 2020

From reading your table of outputs, is the following correct?

```
copysign(a, b) = {
     abs(a) if b >= 0
    -abs(a) if b <  0
}
```

If so, it makes the gradient easier to derive at the special points.
In particular, for b >= 0 and a = 0, the function is convex, so we select the min-norm subgradient: 0.
And for b < 0 and a = 0, the function is concave, so we select the min-norm supergradient: 0.

So basically all the "?" entries in your table above should be 0, since they all correspond to the case a = 0.
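
A small NumPy check of that closed form (a sketch, not part of the PR). One wrinkle: `np.copysign` branches on the sign bit of `b`, so `b = -0` falls on the negative branch; writing the condition with `signbit` matches it exactly, including the sign of zero outputs:

```python
import numpy as np

def copysign_ref(a, b):
    # abs(a) when the sign bit of b is clear, -abs(a) when it is set;
    # using signbit (rather than b >= 0) makes b = -0.0 take the negative branch.
    return np.abs(a) if not np.signbit(b) else -np.abs(a)

for a in (-1.0, -0.0, 0.0, 1.0):
    for b in (-1.0, -0.0, 0.0, 1.0):
        out, ref = np.copysign(a, b), copysign_ref(a, b)
        assert out == ref and np.signbit(out) == np.signbit(ref)
```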

Related #38349

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign) 
- No in-place function
- No method
- Optional output
- Available: bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: byte, char, float/double complex

`c = np.copysign(a, b)`
|  a |  b |  c | a.grad |
|:--:|:--:|:--:|:----:|
| -1 | -1 | -1 |   1  |
| -0 | -1 | -0 |   1?  |
|  0 | -1 | -0 |  -1?  |
|  1 | -1 | -1 |  -1  |
| -1 | -0 |  1 |  -1  |
| -0 | -0 |  0 |  -1?  |
|  0 | -0 |  0 |   1?  |
|  1 | -0 |  1 |   1  |
| -1 |  0 |  1 |  -1  |
| -0 |  0 |  0 |  -1?  |
|  0 |  0 |  0 |   1?  |
|  1 |  0 |  1 |   1  |
| -1 |  1 |  1 |  -1  |
| -0 |  1 |  0 |  -1?  |
|  0 |  1 |  0 |   1?  |
|  1 |  1 |  1 |   1  |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [ ] test
- [ ] doc
- [x] ~kernel_vec~

[ghstack-poisoned]
ejguan added a commit that referenced this pull request Oct 15, 2020
ghstack-source-id: 0378575506b0ec7464690dbb1b3ca4f7e8569224
Pull Request resolved: #46396
ejguan added a commit that referenced this pull request Oct 16, 2020
ghstack-source-id: 2937708298c613db16a5b4fc2dfbd9cd45423fec
Pull Request resolved: #46396
Related #38349

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign) 
- No in-place function
- No method
- Optional output
- Available: bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: byte, char, float/double complex

`c = np.copysign(a, b)`
|  a |  b |  c | a.grad |
|:--:|:--:|:--:|:----:|
| -1 | -1 | -1 |   1  |
| -0 | -1 | -0 |   0  |
|  0 | -1 | -0 |  0  |
|  1 | -1 | -1 |  -1  |
| -1 | -0 |  1 |  -1  |
| -0 | -0 |  0 |  0  |
|  0 | -0 |  0 |   0  |
|  1 | -0 |  1 |   1  |
| -1 |  0 |  1 |  -1  |
| -0 |  0 |  0 |  0  |
|  0 |  0 |  0 |   0  |
|  1 |  0 |  1 |   1  |
| -1 |  1 |  1 |  -1  |
| -0 |  1 |  0 |  0  |
|  0 |  1 |  0 |   0  |
|  1 |  1 |  1 |   1  |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [x] test
- [ ] doc
- [x] ~kernel_vec~

[ghstack-poisoned]
ejguan added a commit that referenced this pull request Oct 16, 2020
ghstack-source-id: 5f477aa5d724aeaa1606d1aa690cf4778f433cec
Pull Request resolved: #46396
Related #38349

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign) 
- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex

`c = np.copysign(a, b)`
|  a |  b |  c | a.grad |
|:--:|:--:|:--:|:----:|
| -1 | -1 | -1 |   1  |
| -0 | -1 | -0 |   0  |
|  0 | -1 | -0 |  0  |
|  1 | -1 | -1 |  -1  |
| -1 | -0 |  1 |  -1  |
| -0 | -0 |  0 |  0  |
|  0 | -0 |  0 |   0  |
|  1 | -0 |  1 |   1  |
| -1 |  0 |  1 |  -1  |
| -0 |  0 |  0 |  0  |
|  0 |  0 |  0 |   0  |
|  1 |  0 |  1 |   1  |
| -1 |  1 |  1 |  -1  |
| -0 |  1 |  0 |  0  |
|  0 |  1 |  0 |   0  |
|  1 |  1 |  1 |   1  |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [x] test
- [x] doc
- [x] ~kernel_vec~

[ghstack-poisoned]
ejguan added a commit that referenced this pull request Oct 16, 2020
ghstack-source-id: edccfde9095eab988ba254c001b784021fa6fe87
Pull Request resolved: #46396
ejguan added a commit that referenced this pull request Oct 19, 2020
ghstack-source-id: 0d5fc918521ebe7ac4e537eb9b13a2e3c18c2091
Pull Request resolved: #46396
Related #38349

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign) 
- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex

`c = np.copysign(a, b)`
|  a |  b |  c | a.grad |
|:--:|:--:|:--:|:----:|
| -1 | -1 | -1 |   1  |
| -0 | -1 | -0 |   0  |
|  0 | -1 | -0 |  0  |
|  1 | -1 | -1 |  -1  |
| -1 | -0 |  1 |  -1  |
| -0 | -0 |  0 |  0  |
|  0 | -0 |  0 |   0  |
|  1 | -0 |  1 |   1  |
| -1 |  0 |  1 |  -1  |
| -0 |  0 |  0 |  0  |
|  0 |  0 |  0 |   0  |
|  1 |  0 |  1 |   1  |
| -1 |  1 |  1 |  -1  |
| -0 |  1 |  0 |  0  |
|  0 |  1 |  0 |   0  |
|  1 |  1 |  1 |   1  |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [x] test
- [x] doc
- [x] ~kernel_vec~
- [ ] torch.copysign(Number input, Tensor other)

[ghstack-poisoned]
ejguan added a commit that referenced this pull request Oct 29, 2020
ghstack-source-id: c1a5bac31d691283301d70fa185e689f923b090a
Pull Request resolved: #46396
@mruberry (Collaborator) left a comment

Nice work, @ejguan. I just made a few comments on the test_torch.py test. This is looking really good. @zou3519, you'll shepherd this the rest of the way, right?

@mruberry (Collaborator) commented Oct 30, 2020

> I removed tests with explicit expected outputs, because I think it's enough to compare the results between PyTorch and NumPy with random cases and special cases (0.0/-0.0/inf/-inf/nan).

Sounds great.

> And, I did not use a cross product of dtypes because I need to enumerate all types (float/integer/boolean) to check type promotion and the result for both arguments of copysign. But all the special cases (0.0/-0.0/inf/-inf/nan) only require floating-point input as the second argument of copysign.

Just test whether the second dtype is a float type and only perform that part of the test if it is.
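
For illustration, a rough sketch of a test with that shape (hypothetical helper name, not the code in this PR; assumes the `torch.copysign` added here):

```python
import numpy as np
import torch

def compare_with_numpy(a, b):
    # Reference comparison against NumPy; equal_nan so nan outputs still match.
    expected = np.copysign(a.numpy(), b.numpy())
    actual = torch.copysign(a, b).numpy()
    np.testing.assert_allclose(actual, expected, equal_nan=True)

a = torch.randn(100)
b = torch.randn(100)
compare_with_numpy(a, b)

# Special values (0.0/-0.0/inf/-inf/nan) only make sense as the second argument
# when its dtype is floating point, so gate that part of the test on the dtype.
if b.dtype.is_floating_point:
    specials = torch.tensor([0.0, -0.0, float("inf"), float("-inf"), float("nan")])
    compare_with_numpy(torch.randn(5), specials)
```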

@ejguan (Contributor, Author) commented Oct 30, 2020

> Nice work, @ejguan. I just made a few comments on the test_torch.py test. This is looking really good. @zou3519, you'll shepherd this the rest of the way, right?

Thank you so much, Mike, for reviewing the code and giving suggestions. @mruberry


[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign) 
- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex

`c = np.copysign(a, b)`
|  a |  b |  c | a.grad |
|:--:|:--:|:--:|:----:|
| -1 | -1 | -1 |   1  |
| -0 | -1 | -0 |   0  |
|  0 | -1 | -0 |  0  |
|  1 | -1 | -1 |  -1  |
| -1 | -0 |  -1 |  1  |
| -0 | -0 | -0 |  0  |
|  0 | -0 | -0 |   0  |
|  1 | -0 |  -1 |   -1  |
| -1 |  0 |  1 |  -1  |
| -0 |  0 |  0 |  0  |
|  0 |  0 |  0 |   0  |
|  1 |  0 |  1 |   1  |
| -1 |  1 |  1 |  -1  |
| -0 |  1 |  0 |  0  |
|  0 |  1 |  0 |   0  |
|  1 |  1 |  1 |   1  |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [x] test (cpu/gpu)
- [x] doc
- [x] ~kernel_vec~

Differential Revision: [D24401366](https://our.internmc.facebook.com/intern/diff/D24401366)

[ghstack-poisoned]
ejguan added a commit that referenced this pull request Nov 2, 2020
ghstack-source-id: 39d7768cbfba6c9c4eb5a090d0182b2423d9f990
Pull Request resolved: #46396
@zou3519 (Contributor) left a comment

lgtm


ejguan added a commit that referenced this pull request Nov 2, 2020
ghstack-source-id: caeb4463c61473183ac336f7d6eecb4fd05483c0
Pull Request resolved: #46396

ejguan added a commit that referenced this pull request Nov 3, 2020
ghstack-source-id: fdf7fbb840449a34cf23b09735eba05a1235fb19
Pull Request resolved: #46396
@ejguan (Contributor, Author) commented Nov 3, 2020

Updated for the following two reasons:

1. `torch.bfloat16` cannot hold `-nan`
2. `torch.half` cannot hold `-nan` on CUDA
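
A small probe for these two points (a sketch, not the PR's test code; `keeps_negative_nan` is a hypothetical helper, and behavior may differ by device/backend):

```python
import torch

def keeps_negative_nan(dtype, device="cpu"):
    # Build -nan in float32 (negation flips the sign bit on IEEE-754 hardware),
    # cast it to `dtype`, and check whether the sign bit survives the round trip.
    neg_nan = -torch.tensor(float("nan"), device=device)
    return bool(torch.signbit(neg_nan.to(dtype)).item())

for dt in (torch.float32, torch.float16, torch.bfloat16):
    print(dt, keeps_negative_nan(dt))
```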

@facebook-github-bot (Contributor)

@ejguan merged this pull request in f1ac63d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants