Conversation

@vguerra
Contributor

@vguerra vguerra commented Nov 6, 2019

The following math functions are now differentiable:

  • remainder
  • fmod
  • ceil
  • floor
  • round
  • trunc

This PR also uses the @differentiating attribute instead of @differentiable
for derivative registration.

NOTE: For the time being this exposes a compiler crash that may or may not
be related to TF-429.

Resolves TF-812
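
A minimal usage sketch of what this enables (hedged: the values in the comments are what the registered derivatives should produce, not output from this PR's tests, and Double is assumed):

let dFloor = gradient(at: 2.7) { x in floor(x) }
// floor is flat between integers, so dFloor should be 0.
let (dx, dy) = gradient(at: 5.0, 3.0) { x, y in remainder(x, y) }
// remainder(x, y) = x - y * (x / y).rounded(.toNearestOrEven),
// so at (5, 3) dx should be 1 and dy should be -2.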

@vguerra
Contributor Author

vguerra commented Nov 6, 2019

For some context on this PR, please refer to #27953

@vguerra
Contributor Author

vguerra commented Nov 6, 2019

Hi @dan-zheng, could you please trigger the CI? I get a compiler crash locally on this PR :'(

@dan-zheng dan-zheng added the tensorflow label Nov 6, 2019
@dan-zheng
Contributor

@swift-ci Please test tensorflow

@dan-zheng
Contributor

I believe stdlib compilation is fixed in f50beec!


Explanation: currently, functions marked with @differentiating must return a tuple with one of the following label schemes:

  • (value: ..., differential: ...)
  • (value: ..., pullback: ...)

The differential: label indicates that the @differentiating function is a forward-mode derivative function, or JVP. It returns a differential function, hence the label.

The pullback: label indicates that the @differentiating function is a reverse-mode derivative function, or VJP. It returns a pullback function.

Currently, only reverse-mode differentiation is officially supported; forward-mode differentiation is off-by-default because it is not yet at feature parity with reverse-mode differentiation. Thus, this PR should use the @differentiating attribute to register pullback-returning reverse-mode derivatives. The compiler expects to find reverse-mode derivatives and will generate them if they do not exist, hence the CI failure (TF-429).
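
For illustration, here is a minimal sketch of a pullback-returning registration, using a hypothetical square function rather than one of this PR's math functions:

func square(_ x: Float) -> Float {
  return x * x
}

@differentiating(square)
func _vjpSquare(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
  // Returning a pullback marks this as a reverse-mode derivative (VJP).
  return (value: square(x), pullback: { v in 2 * x * v })
}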


Sorry for the lack of documentation!

Eventually, when @differentiable(linear) functions and transposition are fully implemented, the @differentiating attribute may be changed to register only differential-returning derivative functions.

The differentiable programming manifesto describes the ideal final design for differentiable programming, so it doesn't mention the pullback: label, which may be removed.

Our custom differentiation tutorial shows differentiation examples that work today, including derivative registration with the pullback: label, and more.

@vguerra
Contributor Author

vguerra commented Nov 7, 2019

Thank you @dan-zheng for the fix and explanation. It was not clear to me when to use the pullback label; I should have looked at the differentiation tutorial. Thanks!

This way we should be able to register derivatives for the trigonometric, hyperbolic, and error functions using the @differentiating attribute as well, which reduces the amount of custom code in the tgmath.swift.gyb file. I am just making sure tests pass locally and will push a new commit addressing that.
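
For example, a hedged sketch of what such a registration could look like for sin (hypothetical, not the exact code generated from the gyb template):

@differentiating(sin)
func _vjpSin(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
  // d/dx sin(x) = cos(x), so the pullback scales the incoming cotangent by cos(x).
  return (value: sin(x), pullback: { v in v * cos(x) })
}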

@vguerra
Contributor Author

vguerra commented Nov 7, 2019

Oh... I hit the problem of registering derivatives in a different file:

/usr/local/ML/s4tf/compiler/swift/stdlib/public/Platform/tgmath.swift.gyb:576:2: error: derivative not in the same file as the original function
@differentiating(log1p)
~^~~~~~~~~~~~~~~~~~~~~~

So this is not possible ... yet :).

@vguerra vguerra force-pushed the differentiating-math branch from a042779 to e5f58c7 Compare November 7, 2019 15:14
@dan-zheng
Contributor

Oh... I hit the problem of registering derivatives in a different file:

/usr/local/ML/s4tf/compiler/swift/stdlib/public/Platform/tgmath.swift.gyb:576:2: error: derivative not in the same file as the original function
@differentiating(log1p)
~^~~~~~~~~~~~~~~~~~~~~~

So this is not possible ... yet :).

Yes. This limitation will be lifted soon - there's ongoing work to support retroactive (including cross-module) derivative registration.

@dan-zheng
Contributor

@swift-ci Please test tensorflow

@dan-zheng
Contributor

@swift-ci Please test tensorflow macOS

Contributor

@dan-zheng dan-zheng left a comment


@marcrasi: could you please help check whether the registered derivatives are ~mathematically correct? I think all of the newly differentiable functions have discontinuities (e.g. floor) but perhaps we can pick a sensible implementation or follow TensorFlow's precedent.

@dan-zheng dan-zheng requested a review from marcrasi November 8, 2019 05:47
@vguerra
Contributor Author

vguerra commented Nov 8, 2019

Just a side note: I based the implementation of the derivatives on what PyTorch does here:

but indeed I am not sure I understood the remainder and floor derivatives; I will have a look at the links you sent, and I am also interested in what @marcrasi has to say :).
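
For reference, a hedged sketch of the zero-derivative convention PyTorch uses for the flat, step-like functions (not this PR's exact code):

@differentiating(floor)
func _vjpFloor(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
  // floor is constant between its integer discontinuities, so its derivative
  // is 0 almost everywhere; ceil, round, and trunc follow the same convention.
  return (value: floor(x), pullback: { _ in 0 })
}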


@marcrasi marcrasi left a comment


I have no opinion about what to do at discontinuities because I don't know what the best practices for that are. If we're doing something that pytorch does, seems good to me.


The pullback should be

{ v in (v, -v * (x / y).rounded(.toNearestOrEven)) }

because remainder(x, y) = x - y * (x / y).rounded(.toNearestOrEven) (https://developer.apple.com/documentation/swift/double/2884269-remainder).

I think the reason pytorch doesn't have this is that they're only defining the gradient wrt the first argument.

It could do something different at discontinuities. But I don't know anything about best practices for what to do at discontinuities, so we might as well leave it as the simplest thing for now.
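
Putting that together, a hedged sketch of the full remainder VJP (not necessarily this PR's exact code):

@differentiating(remainder)
func _vjpRemainder(
  _ x: Float, _ y: Float
) -> (value: Float, pullback: (Float) -> (Float, Float)) {
  // remainder(x, y) = x - y * (x / y).rounded(.toNearestOrEven), so away from
  // discontinuities d/dx = 1 and d/dy = -(x / y).rounded(.toNearestOrEven).
  let q = (x / y).rounded(.toNearestOrEven)
  return (value: x - y * q, pullback: { v in (v, -v * q) })
}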

Contributor Author


Addressed in fc402e074639c6f42878817fe7b42f9a4f613bc5

Contributor Author


Addressed in fc402e074639c6f42878817fe7b42f9a4f613bc5


Since all the functions that you have added are piecewise linear between discontinuities, finite differences give nearly exact results (as long as you don't cross a discontinuity).

Some tests with finite differences would make me a lot more confident that all the edge cases and signs are handled properly. Something like this:

func checkGradient(_ f: @differentiable (T, T) -> T, _ x: T, _ y: T) {
  let eps: T = 0.1
  let grad = gradient(at: x, y, in: f)
  // Compare forward finite differences against the registered gradient.
  expectEqualWithTolerance((f(x + eps, y) - f(x, y)) / eps, grad.0)
  expectEqualWithTolerance((f(x, y + eps) - f(x, y)) / eps, grad.1)
}

for x in -10...10 {
  for y in -10...10 {
    guard y != 0 else { continue }
    // Check at half-integers to avoid crossing discontinuities.
    checkGradient(remainder, T(x) + 0.5, T(y) + 0.5)
    checkGradient(fmod, T(x) + 0.5, T(y) + 0.5)
  }
}


Actually half-integers are not good enough to avoid discontinuities. e.g. 0.5 divides 0.5.

This loop body should work better:

guard y != 0 && abs(remainder(x, y)) > 0 else { continue }
checkGradient(remainder, x, y)
checkGradient(fmod, x, y)

Contributor Author


Addressed in fc402e074639c6f42878817fe7b42f9a4f613bc5

@vguerra
Contributor Author

vguerra commented Nov 10, 2019

Hi @marcrasi, thank you for the review and suggestions; I'll address them and push a new commit shortly.

@vguerra vguerra force-pushed the differentiating-math branch from e5f58c7 to fc402e0 Compare November 13, 2019 16:12
@vguerra
Contributor Author

vguerra commented Nov 13, 2019

I pushed fc402e074639c6f42878817fe7b42f9a4f613bc5 to address the issues with the derivatives of remainder and fmod, plus the way we test them. All of this is based on suggestions from @marcrasi (thanks, Marc!).

I had to rebase the branch, sorry for that :/

Comment on lines +296 to +298
Contributor Author


I found out that for the combinations of x and y that don't satisfy this condition, computing the derivative of remainder using the definition of the derivative would give unexpected results (I guess because the function is discontinuous there), hence I exclude them.


style nit: `) where T == T.TangentVector {` (or at least the `{`) should be moved to the next line, at indentation level 0, to give visual separation between the argument list and the function body
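
For example, a hedged sketch of the suggested layout (hypothetical signature, not the PR's exact declaration):

func checkGradient<T: BinaryFloatingPoint & Differentiable>(
  _ f: @differentiable (T, T) -> T, _ x: T, _ y: T
) where T == T.TangentVector {
  // The `) where ... {` line sits at indentation level 0, visually separating
  // the argument list from the function body.
}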

Contributor Author


done in 7ce2e4c


Is the rounding necessary? Can you make this work by making it more tolerant instead of by rounding?

More tolerance seems like a safer test than rounding, because rounding could mask for example a situation where the gradient is supposed to be 1.8 but we calculate it as 2. (Based on my understanding, the gradients will actually always be integers, so this hypothetical situation will never actually happen. But tests are good for catching cases where we have misunderstood something, so it would be good to have a test that could catch such a situation.)
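
A hedged illustration of that point with hypothetical numbers (not taken from the PR's tests):

let estimate = 1.8   // finite-difference estimate of the gradient
let expected = 2.0   // value produced by the registered derivative
let passesWithRounding = estimate.rounded() == expected     // true: hides the gap
let passesWithTolerance = abs(estimate - expected) <= 0.05  // false: surfaces it
print(passesWithRounding, passesWithTolerance)              // true false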

Contributor Author


Given the eps value I was using (0.001), I needed a tolerance of around 3700 for the check to pass, which seemed high to me, hence I opted for rounding. But you make a good point about being able to catch future errors here.

If I increase eps to 0.1, the test passes with a tolerance of 32.

done in 7ce2e4c

@vguerra
Contributor Author

vguerra commented Nov 17, 2019

Hi @dan-zheng, could you please trigger CI on this PR?

@dan-zheng
Contributor

@swift-ci Please test tensorflow

@vguerra
Contributor Author

vguerra commented Nov 17, 2019

I am having a look at the tests that fail.

vguerra and others added 5 commits November 18, 2019 10:50
The following math functions are now differentiable:

* `remainder`
* `fmod`
* `ceil`
* `floor`
* `round`
* `trunc`

As well, this PR makes use of the @differentiating attribute instead of
@differentiable for derivative registration.

NOTE: For the time being this exposes a compiler crash that might ( or not )
be related to [TF-429](https://bugs.swift.org/browse/TF-429).

Resolves [TF-812](https://bugs.swift.org/browse/TF-812)
Change functions declared with the `@differentiating` attribute to return
a tuple with the `pullback:` label instead of the `differential:` label.

The `pullback:` label indicates that the `@differentiating` function
is a reverse-mode derivative function (VJP), not a forward-mode derivative
function (JVP).

Eventually, when `@differentiable(linear)` functions and transposition
are fully implemented, `@differentiating` attribute may be changed to
only register differential-returning derivative functions.
As well, extend the test strategy to double-check derivatives.
As well, addressing formatting remarks.
From 0.1 to 0.01 but we need to adjust ulps to 192.
@vguerra vguerra force-pushed the differentiating-math branch from 7ce2e4c to 3dc7ee0 Compare November 18, 2019 21:06
@vguerra
Contributor Author

vguerra commented Nov 18, 2019

tgmath.swift.gyb tests are now passing locally for me; could you please trigger CI once again? Thanks!

@marcrasi

@swift-ci Please test tensorflow

3 similar comments
@marcrasi

@swift-ci Please test tensorflow

@marcrasi

@swift-ci Please test tensorflow

@marcrasi

@swift-ci Please test tensorflow

@vguerra
Contributor Author

vguerra commented Nov 20, 2019

Checks passed :) ... would you agree that we could merge this now?

@marcrasi

Yes, seems good to me!

@marcrasi marcrasi merged commit 9c79811 into swiftlang:tensorflow Nov 20, 2019