[AutoDiff] Defines remaining derivatives for tgmath functions. #28108
Conversation
For some context on this PR, please refer to #27953.

Hi @dan-zheng, could you please trigger the CI? I get a compiler crash locally on this PR :'(

@swift-ci Please test tensorflow
I believe stdlib compilation is fixed in f50beec! Explanation: currently, functions marked with `@differentiating` may return a tuple labeled either `(value:, differential:)` or `(value:, pullback:)`. The `differential:` label registers a forward-mode derivative function (JVP); the `pullback:` label registers a reverse-mode derivative function (VJP). Currently, only reverse-mode differentiation is officially supported; forward-mode differentiation is off by default because it is not yet at feature parity with reverse-mode differentiation. Thus, this PR should use the `pullback:` label. Sorry for the lack of documentation! Eventually, when `@differentiable(linear)` functions and transposition are fully implemented, the `@differentiating` attribute may be changed to register only differential-returning derivative functions. The differentiable programming manifesto describes the ideal final design for differentiable programming, so it doesn't mention the current `pullback:`-based registration. Our custom differentiation tutorial shows differentiation examples that work today, including derivative registration with the `@differentiating` attribute.
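To make the JVP/VJP distinction concrete, here is a minimal, illustrative sketch (in Python rather than Swift, with hypothetical names `vjp_sin` and `jvp_sin`) of the two derivative-function shapes:

```python
import math

def vjp_sin(x):
    # Reverse-mode derivative (VJP): return the value together with a
    # pullback closure mapping an output cotangent v to an input cotangent.
    return math.sin(x), lambda v: v * math.cos(x)

def jvp_sin(x):
    # Forward-mode derivative (JVP): return the value together with a
    # differential closure mapping an input tangent dx forward.
    return math.sin(x), lambda dx: dx * math.cos(x)
```

The `pullback:`-labeled tuple in Swift corresponds to the `vjp_sin` shape; the `differential:` label corresponds to `jvp_sin`.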
Thank you @dan-zheng for the fix and explanation. Indeed, it was not clear to me when to use the `pullback:` label versus the `differential:` label. This way we should be able to register derivatives for the trigonometric, hyperbolic, and error functions using the `@differentiating` attribute.
Oh .. I hit the problem of registering derivatives in a different file:

/usr/local/ML/s4tf/compiler/swift/stdlib/public/Platform/tgmath.swift.gyb:576:2: error: derivative not in the same file as the original function
@differentiating(log1p)
~^~~~~~~~~~~~~~~~~~~~~~

So this is not possible ... yet :).
Force-pushed: a042779 to e5f58c7
Yes. This limitation will be lifted soon - there's ongoing work to support retroactive (including cross-module) derivative registration.
@swift-ci Please test tensorflow

@swift-ci Please test tensorflow macOS
@marcrasi: could you please help check whether the registered derivatives are mathematically correct? I think all of the newly differentiable functions have discontinuities (e.g. floor), but perhaps we can pick a sensible implementation or follow TensorFlow's precedent.
- `ceil`, `floor`, `round`, `trunc`: these are discontinuous at various points (integer values, or `x + 0.5` values for `round`). TensorFlow defines these to return a zero gradient: `CeilGrad`, `FloorGrad`, `RoundGrad`, `_TruncateDivGrad`.
- `remainder`, `fmod` (remainder from truncated division): I'm not sure `{ v in (v, v) }` is a correct pullback for these functions. See Math StackExchange results for "derivative of remainder".
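The zero-gradient convention is easy to sanity-check numerically: away from their jump points, these four functions are locally constant, so the true derivative is exactly zero. An illustrative Python sketch (function choices and sample points are mine):

```python
import math

def numeric_derivative(f, x, eps=1e-6):
    # Central finite difference; exact here because the functions are
    # locally constant away from their discontinuities.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# 0.25 offsets keep us away from the jump points of all four functions
# (integers for ceil/floor/trunc, half-integers for round).
for f in (math.floor, math.ceil, math.trunc, round):
    assert numeric_derivative(f, 3.25) == 0.0
    assert numeric_derivative(f, -3.25) == 0.0
```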
I have no opinion about what to do at discontinuities because I don't know what the best practices for that are. If we're doing something that pytorch does, seems good to me.
The pullback should be

{ v in (v, -v * (x / y).rounded(.toNearestOrEven)) }

because remainder(x, y) = x - y * (x / y).rounded(.toNearestOrEven) (https://developer.apple.com/documentation/swift/double/2884269-remainder).
I think the reason PyTorch doesn't have this is that they only define the gradient with respect to the first argument.
It could do something different at discontinuities. But I don't know anything about best practices for what to do at discontinuities, so we might as well leave it as the simplest thing for now.
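This formula can be sanity-checked with finite differences; a small Python sketch (Python's `math.remainder` also rounds the quotient to nearest, ties to even, matching Swift's `remainder`):

```python
import math

# remainder(x, y) = x - y * nearbyint(x / y), and the rounded quotient is
# locally constant away from ties, so d/dy remainder(x, y) = -nearbyint(x / y).
# Python's round() also rounds half to even.
x, y, eps = 7.3, 2.0, 1e-6
numeric = (math.remainder(x, y + eps) - math.remainder(x, y - eps)) / (2 * eps)
analytic = -round(x / y)
assert abs(numeric - analytic) < 1e-3  # both close to -4 here

# The derivative with respect to x is simply 1:
numeric_x = (math.remainder(x + eps, y) - math.remainder(x - eps, y)) / (2 * eps)
assert abs(numeric_x - 1.0) < 1e-3
```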
Addressed in fc402e074639c6f42878817fe7b42f9a4f613bc5
{ v in (v, -v * (x / y).rounded(.towardZero)) }

(https://developer.apple.com/documentation/swift/double/2885641-truncatingremainder)
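The analogous check for the truncated-division remainder (Python's `math.fmod` truncates the quotient toward zero, matching Swift's `truncatingRemainder`):

```python
import math

# fmod(x, y) = x - y * trunc(x / y), so away from discontinuities
# d/dy fmod(x, y) = -trunc(x / y).
x, y, eps = 7.3, 2.0, 1e-6
numeric = (math.fmod(x, y + eps) - math.fmod(x, y - eps)) / (2 * eps)
analytic = -math.trunc(x / y)
assert abs(numeric - analytic) < 1e-3  # both close to -3 here

# The two rounding modes genuinely differ for these inputs:
# trunc(7.3 / 2.0) == 3 while round-to-nearest gives 4.
assert math.trunc(x / y) == 3 and round(x / y) == 4
```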
Addressed in fc402e074639c6f42878817fe7b42f9a4f613bc5
test/stdlib/tgmath.swift.gyb
Outdated
Since all the functions that you have added are piecewise linear between discontinuities, finite differences give nearly exact results (as long as you don't cross a discontinuity).
Some tests with finite differences would make me a lot more confident that all the edge cases and signs are handled properly. Something like this:
func checkGradient(_ f: @differentiable (T, T) -> T, _ x: T, _ y: T) {
  let eps: T = 0.1
  let grad = gradient(at: x, y, in: f)
  expectEqualWithTolerance((f(x + eps, y) - f(x, y)) / eps, grad.0)
  expectEqualWithTolerance((f(x, y + eps) - f(x, y)) / eps, grad.1)
}
for x in -10...10 {
  for y in -10...10 {
    guard y != 0 else { continue }
    // Check at half-integers to avoid crossing discontinuities.
    checkGradient(remainder, x + 0.5, y + 0.5)
    checkGradient(fmod, x + 0.5, y + 0.5)
  }
}
Actually half-integers are not good enough to avoid discontinuities. e.g. 0.5 divides 0.5.
This loop body should work better:
guard y != 0 && abs(remainder(x, y)) > 0 else { continue }
checkGradient(remainder, x, y)
checkGradient(fmod, x, y)
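Translated into a runnable Python sketch over the same grid. Note that the extra guard skipping exact rounding ties is an addition of mine: at integer x and y the quotient can land exactly on a half-integer (e.g. 7 / 2 = 3.5), where even a tiny one-sided step crosses a jump of `remainder`.

```python
import math

def num_grads(f, x, y, eps=1e-6):
    # One-sided finite differences for both partial derivatives.
    return ((f(x + eps, y) - f(x, y)) / eps,
            (f(x, y + eps) - f(x, y)) / eps)

for xi in range(-10, 11):
    for yi in range(-10, 11):
        x, y = float(xi), float(yi)
        # Guard from the review: skip y == 0 and points where the
        # remainder is exactly zero (x a multiple of y).
        if y == 0 or abs(math.remainder(x, y)) == 0:
            continue
        # Extra guard (assumption of this sketch): skip exact rounding
        # ties, where x / y sits on a half-integer.
        if (x / y) % 1 == 0.5:
            continue
        # remainder: pullback is (1, -nearbyint(x / y)).
        dx, dy = num_grads(math.remainder, x, y)
        assert abs(dx - 1.0) < 1e-4 and abs(dy + round(x / y)) < 1e-4
        # fmod: pullback is (1, -trunc(x / y)).
        dx, dy = num_grads(math.fmod, x, y)
        assert abs(dx - 1.0) < 1e-4 and abs(dy + math.trunc(x / y)) < 1e-4
```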
Addressed in fc402e074639c6f42878817fe7b42f9a4f613bc5
Hi @marcrasi, thank you for the review and suggestions. I'll address them and push a new commit shortly.
Force-pushed: e5f58c7 to fc402e0
I pushed fc402e074639c6f42878817fe7b42f9a4f613bc5 to address the issues with the derivatives of `remainder` and `fmod`. I had to rebase the branch, sorry for that :/
test/stdlib/tgmath.swift.gyb
Outdated
I found out that, for the combinations of x and y that don't satisfy this condition, computing the derivative of remainder from the definition of the derivative gives unexpected results (I guess because the function is discontinuous there), hence I exclude them.
test/stdlib/tgmath.swift.gyb
Outdated
style nit: in `) where T == T.TangentVector {`, the `{` should be moved to the next line, with indentation level 0, to give visual separation between the argument list and the function body
done in 7ce2e4c
test/stdlib/tgmath.swift.gyb
Outdated
Is the rounding necessary? Can you make this work by making it more tolerant instead of by rounding?
More tolerance seems like a safer test than rounding, because rounding could mask, for example, a situation where the gradient is supposed to be 1.8 but we calculate it as 2. (Based on my understanding, the gradients will actually always be integers, so this hypothetical situation will never happen. But tests are good for catching cases where we have misunderstood something, so it would be good to have a test that could catch such a situation.)
Given the eps value I was using (0.001), I needed a tolerance of around 3700 for the check to pass, which seemed too high to me, hence I opted for rounding; but you make a good point about keeping the ability to catch future errors here.
If I increase eps to 0.1, the test passes with a tolerance of 32.
done in 7ce2e4c
Force-pushed: 6dcf239 to 04dca63
Hi @dan-zheng, could you please trigger CI on this PR?

@swift-ci Please test tensorflow

I am having a look at the tests that fail.
The following math functions are now differentiable: `remainder`, `fmod`, `ceil`, `floor`, `round`, `trunc`. As well, this PR uses the `@differentiating` attribute instead of `@differentiable` for derivative registration. NOTE: For the time being this exposes a compiler crash that might (or might not) be related to [TF-429](https://bugs.swift.org/browse/TF-429). Resolves [TF-812](https://bugs.swift.org/browse/TF-812)
Change functions declared with the `@differentiating` attribute to return a tuple with the `pullback:` label instead of the `differential:` label. The `pullback:` label indicates that the `@differentiating` function is a reverse-mode derivative function (VJP), not a forward-mode derivative function (JVP). Eventually, when `@differentiable(linear)` functions and transposition are fully implemented, the `@differentiating` attribute may be changed to register only differential-returning derivative functions.
As well, extend the test strategy to double-check derivatives.
As well, address formatting remarks.
Reduce eps from 0.1 to 0.01, but we need to adjust the tolerance to 192 ulps.
Force-pushed: 7ce2e4c to 3dc7ee0
@swift-ci Please test tensorflow

3 similar comments

@swift-ci Please test tensorflow

@swift-ci Please test tensorflow

@swift-ci Please test tensorflow
Checks passed :) ... would you agree that we could merge this now?

Yes, seems good to me!
The following math functions are now differentiable:
`remainder`, `fmod`, `ceil`, `floor`, `round`, `trunc`
As well, this PR uses the `@differentiating` attribute instead of the `@differentiable` attribute for derivative registration.
NOTE: For the time being this exposes a compiler crash that might (or might not) be related to TF-429.
Resolves TF-812