Conversation

@vguerra
Contributor

@vguerra vguerra commented Nov 6, 2019

The following math functions are now differentiable:

  • remainder
  • fmod
  • ceil
  • floor
  • round
  • trunc

This PR also uses the @differentiating attribute instead of @differentiable
for derivative registration.

NOTE: For the time being this exposes a compiler crash that may or may not
be related to TF-429.

Resolves TF-812
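
A minimal usage sketch of what this enables (hedged: the values in the comments are what the registered derivatives should produce, not output from this PR's tests, and Double is assumed):

let dFloor = gradient(at: 2.7) { x in floor(x) }
// floor is flat between integers, so dFloor should be 0.
let (dx, dy) = gradient(at: 5.0, 3.0) { x, y in remainder(x, y) }
// remainder(x, y) = x - y * (x / y).rounded(.toNearestOrEven),
// so at (5, 3) dx should be 1 and dy should be -2.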

@vguerra
Contributor Author

vguerra commented Nov 6, 2019

For some context on this PR, please refer to #27953

@vguerra
Contributor Author

vguerra commented Nov 6, 2019

Hi @dan-zheng, could you please trigger the CI? I get a compiler crash locally on this PR :'(

@dan-zheng dan-zheng added the tensorflow label Nov 6, 2019
@dan-zheng
Contributor

@swift-ci Please test tensorflow

@dan-zheng
Contributor

I believe stdlib compilation is fixed in f50beec!


Explanation: currently, functions marked with @differentiating must return a tuple with one of the following label schemes:

  • (value: ..., differential: ...)
  • (value: ..., pullback: ...)

The differential: label indicates that the @differentiating function is a forward-mode derivative function, or JVP. It returns a differential function, hence the label.

The pullback: label indicates that the @differentiating function is a reverse-mode derivative function, or VJP. It returns a pullback function.

Currently, only reverse-mode differentiation is officially supported; forward-mode differentiation is off-by-default because it is not yet at feature parity with reverse-mode differentiation. Thus, this PR should use the @differentiating attribute to register pullback-returning reverse-mode derivatives. The compiler expects to find reverse-mode derivatives and will generate them if they do not exist, hence the CI failure (TF-429).
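
For illustration, here is a minimal sketch of a pullback-returning registration, using a hypothetical square function rather than one of this PR's math functions:

func square(_ x: Float) -> Float {
  return x * x
}

@differentiating(square)
func _vjpSquare(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
  // Returning a pullback marks this as a reverse-mode derivative (VJP).
  return (value: square(x), pullback: { v in 2 * x * v })
}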


Sorry for the lack of documentation!

Eventually, when @differentiable(linear) functions and transposition are fully implemented, the @differentiating attribute may be changed to register only differential-returning derivative functions.

The differentiable programming manifesto describes the ideal final design for differentiable programming, so it doesn't mention the pullback: label, which may be removed.

Our custom differentiation tutorial shows differentiation examples that work today, including derivative registration with the pullback: label, and more.

@vguerra
Contributor Author

vguerra commented Nov 7, 2019

Thank you @dan-zheng for the fix and explanation. It was not clear to me when to use the pullback label; I should have looked at the differentiation tutorial. Thanks!

This way we should be able to register derivatives for the trigonometric, hyperbolic, and error functions using the @differentiating attribute as well, which reduces the amount of custom code in the tgmath.swift.gyb file. I am just making sure tests pass locally and will push a new commit addressing that.
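
For example, a hedged sketch of what such a registration could look like for sin (hypothetical, not the exact code generated from the gyb template):

@differentiating(sin)
func _vjpSin(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
  // d/dx sin(x) = cos(x), so the pullback scales the incoming cotangent by cos(x).
  return (value: sin(x), pullback: { v in v * cos(x) })
}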

@vguerra
Contributor Author

vguerra commented Nov 7, 2019

Oh... I hit the problem of registering derivatives in a different file:

/usr/local/ML/s4tf/compiler/swift/stdlib/public/Platform/tgmath.swift.gyb:576:2: error: derivative not in the same file as the original function
@differentiating(log1p)
~^~~~~~~~~~~~~~~~~~~~~~

So this is not possible ... yet :).

@vguerra vguerra force-pushed the differentiating-math branch from a042779 to e5f58c7 Compare November 7, 2019 15:14
@dan-zheng
Contributor

Oh... I hit the problem of registering derivatives in a different file:

/usr/local/ML/s4tf/compiler/swift/stdlib/public/Platform/tgmath.swift.gyb:576:2: error: derivative not in the same file as the original function
@differentiating(log1p)
~^~~~~~~~~~~~~~~~~~~~~~

So this is not possible ... yet :).

Yes. This limitation will be lifted soon - there's ongoing work to support retroactive (including cross-module) derivative registration.

@dan-zheng
Contributor

@swift-ci Please test tensorflow

@dan-zheng
Contributor

@swift-ci Please test tensorflow macOS

Contributor

@dan-zheng dan-zheng left a comment


@marcrasi: could you please help check whether the registered derivatives are ~mathematically correct? I think all of the newly differentiable functions have discontinuities (e.g. floor) but perhaps we can pick a sensible implementation or follow TensorFlow's precedent.

@dan-zheng dan-zheng requested a review from marcrasi November 8, 2019 05:47
@vguerra
Contributor Author

vguerra commented Nov 8, 2019

Just a side note: I based the implementation of the derivatives on what PyTorch does here:

but indeed I am not sure I understood the remainder and floor derivatives; I will have a look at the links you sent, and I am also interested in what @marcrasi has to say :).
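
For reference, a hedged sketch of the zero-derivative convention PyTorch uses for the flat, step-like functions (not this PR's exact code):

@differentiating(floor)
func _vjpFloor(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
  // floor is constant between its integer discontinuities, so its derivative
  // is 0 almost everywhere; ceil, round, and trunc follow the same convention.
  return (value: floor(x), pullback: { _ in 0 })
}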


@marcrasi marcrasi left a comment


I have no opinion about what to do at discontinuities because I don't know what the best practices for that are. If we're doing something that pytorch does, seems good to me.


The pullback should be

{ v in (v, -v * (x / y).rounded(.toNearestOrEven)) }

because remainder(x, y) = x - y * (x / y).rounded(.toNearestOrEven) (https://developer.apple.com/documentation/swift/double/2884269-remainder).

I think the reason pytorch doesn't have this is that they're only defining the gradient wrt the first argument.

It could do something different at discontinuities. But I don't know anything about best practices for what to do at discontinuities, so we might as well leave it as the simplest thing for now.
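
Putting that together, a hedged sketch of the full remainder VJP (not necessarily this PR's exact code):

@differentiating(remainder)
func _vjpRemainder(
  _ x: Float, _ y: Float
) -> (value: Float, pullback: (Float) -> (Float, Float)) {
  // remainder(x, y) = x - y * (x / y).rounded(.toNearestOrEven), so away from
  // discontinuities d/dx = 1 and d/dy = -(x / y).rounded(.toNearestOrEven).
  let q = (x / y).rounded(.toNearestOrEven)
  return (value: x - y * q, pullback: { v in (v, -v * q) })
}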

Contributor Author


Addressed in fc402e074639c6f42878817fe7b42f9a4f613bc5

Contributor Author


Addressed in fc402e074639c6f42878817fe7b42f9a4f613bc5


Since all the functions that you have added are piecewise linear between discontinuities, finite differences give nearly exact results (as long as you don't cross a discontinuity).

Some tests with finite differences would make me a lot more confident that all the edge cases and signs are handled properly. Something like this:

func checkGradient(_ f: @differentiable (T, T) -> T, _ x: T, _ y: T) {
  let eps: T = 0.1
  let grad = gradient(at: x, y, in: f)
  // Compare forward finite differences against the registered gradient.
  expectEqualWithTolerance((f(x + eps, y) - f(x, y)) / eps, grad.0)
  expectEqualWithTolerance((f(x, y + eps) - f(x, y)) / eps, grad.1)
}

for x in -10...10 {
  for y in -10...10 {
    guard y != 0 else { continue }
    // Check at half-integers to avoid crossing discontinuities.
    checkGradient(remainder, T(x) + 0.5, T(y) + 0.5)
    checkGradient(fmod, T(x) + 0.5, T(y) + 0.5)
  }
}


Actually half-integers are not good enough to avoid discontinuities. e.g. 0.5 divides 0.5.

This loop body should work better:

guard y != 0 && abs(remainder(x, y)) > 0 else { continue }
checkGradient(remainder, x, y)
checkGradient(fmod, x, y)

Contributor Author


Addressed in fc402e074639c6f42878817fe7b42f9a4f613bc5

@vguerra
Contributor Author

vguerra commented Nov 10, 2019

Hi @marcrasi, thank you for the review and suggestions; I'll address them and push a new commit shortly.

@vguerra vguerra force-pushed the differentiating-math branch from e5f58c7 to fc402e0 Compare November 13, 2019 16:12
@vguerra
Contributor Author

vguerra commented Nov 13, 2019

I pushed fc402e074639c6f42878817fe7b42f9a4f613bc5 to address the issues with the derivatives of remainder and fmod, plus the way we test them. All of this is based on suggestions from @marcrasi (thanks, Marc!).

I had to rebase the branch, sorry for that :/

Comment on lines +296 to +298
Contributor Author


I found out that for the combinations of x and y that don't satisfy this condition, computing the derivative of remainder using the definition of the derivative would give unexpected results (I guess because the function is discontinuous there), hence I exclude them.


style nit: `) where T == T.TangentVector {` (or at least the `{`) should be moved to the next line, at indentation level 0, to give visual separation between the argument list and the function body
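
For example, a hedged sketch of the suggested layout (hypothetical signature, not the PR's exact declaration):

func checkGradient<T: BinaryFloatingPoint & Differentiable>(
  _ f: @differentiable (T, T) -> T, _ x: T, _ y: T
) where T == T.TangentVector {
  // The `) where ... {` line sits at indentation level 0, visually separating
  // the argument list from the function body.
}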

Contributor Author


done in 7ce2e4c


Is the rounding necessary? Can you make this work by making it more tolerant instead of by rounding?

More tolerance seems like a safer test than rounding, because rounding could mask for example a situation where the gradient is supposed to be 1.8 but we calculate it as 2. (Based on my understanding, the gradients will actually always be integers, so this hypothetical situation will never actually happen. But tests are good for catching cases where we have misunderstood something, so it would be good to have a test that could catch such a situation.)
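
A hedged illustration of that point with hypothetical numbers (not taken from the PR's tests):

let estimate = 1.8   // finite-difference estimate of the gradient
let expected = 2.0   // value produced by the registered derivative
let passesWithRounding = estimate.rounded() == expected     // true: hides the gap
let passesWithTolerance = abs(estimate - expected) <= 0.05  // false: surfaces it
print(passesWithRounding, passesWithTolerance)              // true false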

Contributor Author


Given the eps value I was using (0.001), I needed a tolerance of around 3700 for the check to pass, which seemed high to me, hence I opted for rounding. But you make a good point about being able to catch future errors here.

If I increase eps to 0.1, the test passes with a tolerance of 32.

done in 7ce2e4c

@vguerra
Contributor Author

vguerra commented Nov 17, 2019

Hi @dan-zheng, could you please trigger CI on this PR?

@dan-zheng
Contributor

@swift-ci Please test tensorflow

@vguerra
Contributor Author

vguerra commented Nov 17, 2019

I am having a look at the tests that fail.

vguerra and others added 5 commits November 18, 2019 10:50
The following math functions are now differentiable:

* `remainder`
* `fmod`
* `ceil`
* `floor`
* `round`
* `trunc`

As well, this PR makes use of the @differentiating attribute instead of
@differentiable for derivative registration.

NOTE: For the time being this exposes a compiler crash that might ( or not )
be related to [TF-429](https://bugs.swift.org/browse/TF-429).

Resolves [TF-812](https://bugs.swift.org/browse/TF-812)
Change functions declared with the `@differentiating` attribute to return
a tuple with the `pullback:` label instead of the `differential:` label.

The `pullback:` label indicates that the `@differentiating` function
is a reverse-mode derivative function (VJP), not a forward-mode derivative
function (JVP).

Eventually, when `@differentiable(linear)` functions and transposition
are fully implemented, `@differentiating` attribute may be changed to
only register differential-returning derivative functions.
As well, extend the test strategy to double-check derivatives.
As well, addressing formatting remarks.
From 0.1 to 0.01 but we need to adjust ulps to 192.
@vguerra vguerra force-pushed the differentiating-math branch from 7ce2e4c to 3dc7ee0 Compare November 18, 2019 21:06
@vguerra
Contributor Author

vguerra commented Nov 18, 2019

tgmath.swift.gyb tests are now passing locally for me; could you please trigger CI once again? Thanks!

@marcrasi

@swift-ci Please test tensorflow

3 similar comments
@marcrasi

@swift-ci Please test tensorflow

@marcrasi

@swift-ci Please test tensorflow

@marcrasi

@swift-ci Please test tensorflow

@vguerra
Contributor Author

vguerra commented Nov 20, 2019

Checks passed :) ... would you agree that we could merge this now?

@marcrasi

Yes, seems good to me!

@marcrasi marcrasi merged commit 9c79811 into swiftlang:tensorflow Nov 20, 2019