Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backprop optimization: leveraging function symmetries #117

Closed
rsokl opened this issue Feb 27, 2019 · 1 comment
Closed

Backprop optimization: leveraging function symmetries #117

rsokl opened this issue Feb 27, 2019 · 1 comment

Comments

@rsokl
Copy link
Owner

rsokl commented Feb 27, 2019

I wish I could label this with "good first issue", but this issue is a bit meatier than the label would suggest. That being said, this has the potential to be a really fun issue for an eager developer to take on. It:

  • will speeds some things up, which is about the most exciting thing you can do in mygrad 😜
  • requires a simple, but potentially elegant revision of Operation
  • entails writing some slick automated tests

If there is anyone, or multiple people, who would like to participate in this, please let me know. I think it would be a great learning experience. I will happily provide guidance ranging from: "very hands on", to: "give general insights and review", depending on your needs/preferences.

Obviously if there is not any interest, I will end up taking this on myself in a few weeks or so. However, if you do want to take this on but at a later date, just let me know.

Math

f(x, y) is symmetric if f(x, y) = f(y, x). Thus the following is true for the derivatives for a symmetric function f:

image (1)

Suppose we want to compute the total derivative of a symmetric function with identical inputs. I.e.:

image (2)

given the relationship deduced above, this can be reduced to a single partial derivative.

image (3)

Obviously, this reduction extends trivially to symmetric functions of N inputs, where the factor of 2 becomes a factor of N.

Current State of MyGrad

Presently, MyGrad will always compute its derivatives in long-form (equation 2), even in the instance that it is dealing with a symmetric function that may receive identical inputs.

An exception to this is EinSum, which implements its own backprop so that common optimized sum-reduction cases like einsum("..., ...", x, x) don't drag during backprop; it implements the logic of equation 3 when it has a symmetric reduction case and identical inputs.

Proposal

Operation should have a symmetries attribute that allows individual operations identify symmetry relationships among its inputs. This would mean that those operations with symmetries would check for identical inputs (as enforced by is, not ==), and would compute the total derivative using the reduced form (equation 3) where possible.

The outcome of this is some nice, simple optimizations so that users can freely write things like logaddexp(x, x) without incurring redundant computations during backprop.

@rsokl
Copy link
Owner Author

rsokl commented Feb 16, 2021

It is unclear how useful this is in general. Preliminary timings for special casing of (x * x).backward() did not indicate any appreciable speedup.

It might be good to support (x/x).backward() special casing because that would save a substantial amount of compute as well as potential numerical stability issues. Although, it is also potentially problematic for backprop through x / x to differ from x / +x in any appreciable way.

Closing this for now

@rsokl rsokl closed this as completed Feb 16, 2021
@rsokl rsokl unpinned this issue Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant