
Custom Gradients #19302

Merged
merged 18 commits into from
Mar 22, 2024

Conversation

abhaskumarsinha
Contributor

With a little bit of tweaking of the syntax, it is now possible to have custom gradients for all three backends: JAX, PyTorch, and TensorFlow.

Apologies, I'm not sure about this: I added the unit tests, but I don't know how to run them.

Example of syntax:

Here's the rough notebook testing it:

I believe letting users add custom functions together with their own gradient definitions would enable them to implement complex layers and operations even when those aren't available in the underlying frameworks. It could also help them work with custom data structures such as complex numbers, non-metric spaces, or probabilistic spaces.

Example Syntax:

import keras
import torch

@keras.ops.custom_gradient
def fun(x):
    z = x * x

    def grad(*args, upstream=None):
        if upstream is None:
            # tf.custom_gradient convention: upstream is passed positionally.
            (upstream,) = args
        return upstream * x * 10

    return z, grad

# PyTorch backend: the input must track gradients.
x = torch.tensor([2.0], requires_grad=True)
z = fun(x)
z.sum().backward()
print(x.grad)

The only difference is that the grad function accepts two arguments (one of them the keyword upstream) in the case of PyTorch, and one positional argument in the case of TensorFlow or JAX. So the additional

def grad(*args, upstream=None):
    if upstream is None:
        (upstream,) = args

doesn't hurt much.

Thank You.

Added a support for @custom_gradient decorator for PyTorch users.
Added documentation for ops.custom_gradient and edited the syntax of log1pexp(x) example to demonstrate new syntax.
Syntax Error in the example
Updated core_test.py with PyTorch Test
Corrected the syntax in test case.
Correcting docs

google-cla bot commented Mar 13, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Member

@fchollet fchollet left a comment

Thanks for the PR. Very nice work; I would not have anticipated it to be feasible like this!

raise NotImplementedError(
    "`custom_gradient` is not supported with torch backend"
)

class custom_gradient:
Member

To keep the API consistent across backends, we may want to make this a function, def custom_gradient(fun), that instantiates CustomGradientFunction and calls it later when the function is called?

Contributor Author

@fchollet Yes. The @custom_gradient in torch/core.py acts as a decorator, i.e. custom_gradient(fun) as you said. It takes the function fun and returns a torch.autograd.Function instance (that is, a function whose gradient formula is now known to the PyTorch framework).

Additionally, the CustomGradientFunction class has forward() and backward() methods holding our required forward and backward definitions, following the usual way custom functions are defined in PyTorch: https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html

The goal was to unify the syntax and the method of creating new functions with custom gradients across all backends. Since PyTorch doesn't support the decorator method of defining new gradients, I had to write a custom one for that.

For testing purposes, the second Colab notebook uses my fork of Keras (because it has custom gradients implemented) and tests each framework one by one. Just make sure to restart the Colab instance before switching to another framework as the backend.
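
For illustration, here is a minimal sketch of that function-style shape, a plain def custom_gradient(fun) wrapping torch.autograd.Function. This is my own simplified sketch, not the code merged in keras/backend/torch/core.py (the merged implementation also forwards extra positional arguments and kwargs):

import torch


def custom_gradient(fun):
    """Decorator sketch: route `fun` through a torch.autograd.Function."""

    class CustomGradientFunction(torch.autograd.Function):
        @staticmethod
        def forward(ctx, *args):
            # The wrapped function returns (output, grad_fn); keep the
            # gradient closure on the context for the backward pass.
            output, grad_fn = fun(*args)
            ctx.grad_fn = grad_fn
            return output

        @staticmethod
        def backward(ctx, upstream):
            # Pass the upstream gradient as a keyword, matching the
            # unified `def grad(*args, upstream=None)` convention above.
            grads = ctx.grad_fn(upstream=upstream)
            return grads if isinstance(grads, tuple) else (grads,)

    def wrapped(*args):
        return CustomGradientFunction.apply(*args)

    return wrapped

Calling custom_gradient(fun) directly, or using it as @custom_gradient, then behaves the same way as on the other backends.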


class CustomGradientFunction(torch.autograd.Function):
    """
    Autograd function for custom gradients.
Member

One-line summary should be on the first line (after """)

@fchollet
Member

Apologies, I'm not sure about this: I added the unit tests, but I don't know how to run them.

Run pytest keras/ to run all unit tests. You can also specify the path to a single test file or directory.

@codecov-commenter

codecov-commenter commented Mar 13, 2024

Codecov Report

Attention: Patch coverage is 76.92308% with 6 lines in your changes missing coverage. Please review.

Project coverage is 75.85%. Comparing base (c8700f4) to head (2534ca1).
Report is 125 commits behind head on master.

Files Patch % Lines
keras/backend/torch/core.py 76.92% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19302      +/-   ##
==========================================
- Coverage   80.14%   75.85%   -4.29%     
==========================================
  Files         341      367      +26     
  Lines       36163    40433    +4270     
  Branches     7116     7864     +748     
==========================================
+ Hits        28982    30671    +1689     
- Misses       5578     8065    +2487     
- Partials     1603     1697      +94     
Flag Coverage Δ
keras 75.71% <76.92%> (-4.29%) ⬇️
keras-jax 60.12% <30.76%> (-2.93%) ⬇️
keras-numpy 54.39% <30.76%> (-2.69%) ⬇️
keras-tensorflow 61.29% <30.76%> (-3.37%) ⬇️
keras-torch 60.41% <76.92%> (-3.46%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Added one-line summary of the class in line `454`
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Mar 14, 2024
@fchollet fchollet added the keras-team-review-pending Pending review by a Keras team member. label Mar 18, 2024
@fchollet
Member

@james77777778 what do you think about this functionality and the new upstream argument?

@james77777778
Contributor

james77777778 commented Mar 21, 2024

@james77777778 what do you think about this functionality and the new upstream argument?

This functionality is crucial for the following QLoRA-like technique. The new upstream argument is a bit tricky but I think it is worth the effort for the backend-agnostic feature.

I have tested the code and it works well. The training results in torch are consistent with the other backends.
(using my fork: https://github.com/james77777778/keras/blob/gradient/benchmark.py)

@@ -661,10 +662,19 @@ def custom_gradient(f):
def log1pexp(x):
    e = ops.exp(x)

    def grad(upstream):
    def grad(*args, upstream = None):
Member

upstream=None

        return ops.multiply(upstream, 1.0 - 1.0 / ops.add(1, e))

    return ops.log(1 + e), grad

Note that the grad function that returns gradient computations
Member

To make this clearer, please provide two separate code examples, one for JAX/TF and one for torch.

Contributor Author

To make this clearer, please provide two separate code examples, one for JAX/TF and one for torch.

Added one JAX/TF-specific backend example, one PyTorch-specific example, and one for all three together (backend-invariant).
Can you check whether my edits to the PR are visible to you?
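
For reference, a rough sketch of what those backend-specific examples can look like (a paraphrase, not necessarily the exact docstring text that was merged); each usage snippet assumes the corresponding backend is active:

from keras import ops

@ops.custom_gradient
def log1pexp(x):
    e = ops.exp(x)

    def grad(*args, upstream=None):
        if upstream is None:
            # TensorFlow/JAX pass the upstream gradient positionally.
            (upstream,) = args
        return ops.multiply(upstream, 1.0 - 1.0 / ops.add(1, e))

    return ops.log(1 + e), grad

# JAX / TensorFlow backends:
x = ops.convert_to_tensor(100.0)
y = log1pexp(x)

# PyTorch backend: the input must track gradients.
import torch

x = torch.tensor(100.0, requires_grad=True)
y = log1pexp(x)
y.backward()
print(x.grad)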

@@ -501,15 +501,20 @@ def test_is_tensor(self):
        self.assertFalse(ops.is_tensor([1, 2, 3]))

    @pytest.mark.skipif(
        backend.backend() not in ("tensorflow", "jax"),
        backend.backend() not in ("tensorflow", "jax", "pytorch"),
Member

It's "torch"

    @staticmethod
    def backward(ctx, grad_output):
        """
        Backward pass computation.
Member

Move this line to after """

Contributor Author

I've made the edits, although this part is not of much interest to the user. It just works as a wrapper that unifies the syntax across backends.

    @staticmethod
    def forward(ctx, forward_fn, *args, **kwargs):
        """
        Forward pass computation.
Member

Move this line to after """


class CustomGradientFunction(torch.autograd.Function):
    """
    CustomGradientFunction is a PyTorch autograd function enabling custom forward and backward passes for gradient computation.
Member

Shorten line (break it up into a few lines)

elif backend.backend() == "torch":
    import torch

    x = torch.tensor(100.0, requires_grad = True)  # x = ops.convert_to_tensor(100.0) is NOT supported Yet!
Contributor Author

Hello @james77777778 @fchollet, I just found out why the PyTorch unit tests for @custom_gradient were failing all along.

The problem is that if one wants to compute or work with gradients of a variable in torch, defining it with a plain torch.tensor(100.0) doesn't work. It needs the additional argument, torch.tensor(100.0, requires_grad=True), to have the .grad attribute available later on.

Keras' ops.convert_to_tensor() would likewise need to set this requires_grad argument when defining a tensor in PyTorch in order to calculate the gradient.

This is why I had to use PyTorch directly here to define the tensor with torch.tensor, rather than ops.convert_to_tensor, to make it work.

Member

You can instead do:

x = keras.Variable(100.0).value

This will work with all backends.
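
For example, the backend-specific snippet above could become (assuming log1pexp is the decorated example function discussed earlier):

import keras

# Backend-agnostic replacement for torch.tensor(100.0, requires_grad=True):
x = keras.Variable(100.0).value
y = log1pexp(x)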

@fchollet
Member

I'll merge the PR and fix remaining issues in post. Thank you for the contribution!

@fchollet fchollet merged commit ddba5d8 into keras-team:master Mar 22, 2024
5 of 6 checks passed
PR Queue automation moved this from Assigned Reviewer to Merged Mar 22, 2024
Labels
keras-team-review-pending Pending review by a Keras team member. size:M
Projects
PR Queue
Merged

5 participants