Refactor get analytical jacobian #54049
Conversation
💊 CI failures summary (Dr. CI): As of commit 844893f, 💚 looks good so far! There are no failures yet.
The goal of this is to factor out the core logic of getting the analytical Jacobian, which effectively computes `f(grad_out) = grad_out^T J = grad_input`. This allows us to test a lot of logic that was not possible before, because we can now replace `f` with whatever we want in order to simulate potential issues that gradcheck is designed to catch.

Edit: I realize a lot of the things this PR was originally aiming to allow are actually possible with hooks, hence the tests have already been added in an earlier PR in the stack. But this is still slightly useful for reducing code duplication when adding the new fast gradcheck code (more details below).

After this change, `get_analytical_jacobian` is only responsible for gathering a list of rows that are later combined into a single Jacobian tensor. This means we don't have to perform any checks for correctness of the dtypes/sizes at this step.

We factor out that logic into a separate function, `combine_jacobian_rows`, which handles the list-of-rows -> single-Tensor step for each Jacobian, and the error checking it entails. (This allows this code to be shared between the fast/slow versions.)
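To make the `f(grad_out) = grad_out^T J = grad_input` idea concrete, here is a minimal sketch of the rows-then-combine split described above (not the actual gradcheck code; `vjp_row`, `get_jacobian_rows`, and `combine_rows` are hypothetical names used only for illustration):

```python
import torch

def vjp_row(output, inp, grad_out):
    # One backward call computes f(grad_out) = grad_out^T J = grad_input,
    # i.e. a single row of the Jacobian.
    (row,) = torch.autograd.grad(output, inp, grad_out, retain_graph=True)
    return row.reshape(-1)

def get_jacobian_rows(output, inp):
    # Gather rows one at a time by feeding one-hot grad_outputs; no
    # dtype/size checking happens at this stage.
    rows = []
    for i in range(output.numel()):
        grad_out = torch.zeros_like(output).reshape(-1)
        grad_out[i] = 1.0
        rows.append(vjp_row(output, inp, grad_out.reshape(output.shape)))
    return rows

def combine_rows(rows):
    # Analogue of `combine_jacobian_rows`: list of rows -> single Tensor,
    # plus the consistency checking that step entails.
    assert all(r.numel() == rows[0].numel() for r in rows), "inconsistent row sizes"
    return torch.stack(rows)

x = torch.randn(3, requires_grad=True)
y = x.sin()
J = combine_rows(get_jacobian_rows(y, x))
print(torch.allclose(J, torch.diag(x.cos())))  # True: d sin(x)/dx = diag(cos(x))
```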
Refactor looks correct; I had a few questions.
torch/autograd/gradcheck.py
Outdated
jacobians_rows = get_analytical_jacobian(fn, output.clone(), grad_out_scale)
jacobians_rows_reentrant = get_analytical_jacobian(fn, output.clone(), grad_out_scale)
I never really understood the reentrant check. What is it actually testing?
The reason why I am asking is because previously, we would compute row 0 of the first jacobian, then row 0 of the second jacobian, then row 1 of the first jacobian, then row 1 of the second jacobian, etc. This PR changes it so that we compute the full first jacobian followed by the full second jacobian. Does this affect the reentrant check?
I don't really know why it's named the 'reentrant' check; AFAICT it's just computing something twice and checking whether the results are the same, to see if it's deterministic. A better name IMO would be 'determinism check', not 'reentrant check'. Maybe @albanD knows more.
Offline, @albanD said this is re-entrant in the sense that it is calling the same function twice. This is different from when we call autograd from within autograd, which we also refer to as "re-entrant". So yes, it looks like a case of bad naming.
Also, I realized that the code does explain this in an error message:
pytorch/torch/autograd/gradcheck.py
Lines 472 to 475 in 87989a6
error_msg = "Backward" + error_str + " is not reentrant, i.e., running backward with same \ | |
input and grad_output multiple times gives different values, \ | |
although analytical gradient matches numerical gradient. \ | |
The tolerance for nondeterminism was {}.".format(nondet_tol) |
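For intuition, here is a minimal sketch of what such a check could look like, assuming a hypothetical `check_reentrant` helper rather than gradcheck's actual implementation: run the same backward twice with identical inputs and compare against `nondet_tol`:

```python
import torch

def check_reentrant(output, inp, nondet_tol=0.0):
    # "Reentrant" here only means "calling the same function twice":
    # run backward twice with the same input and grad_output and compare.
    grad_out = torch.ones_like(output)
    (g1,) = torch.autograd.grad(output, inp, grad_out, retain_graph=True)
    (g2,) = torch.autograd.grad(output, inp, grad_out, retain_graph=True)
    if (g1 - g2).abs().max() > nondet_tol:
        raise RuntimeError(
            "Backward is not reentrant: running backward with same input "
            "and grad_output multiple times gives different values "
            "(tolerance for nondeterminism was {}).".format(nondet_tol))

x = torch.randn(4, requires_grad=True)
check_reentrant((x * 2).sum(), x)  # deterministic backward: passes
```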
torch/autograd/gradcheck.py
Outdated
# NB: we can't combine the rows into a single jacobian tensor because fn(v) for
# different v may return tensors with different number of elements
Is this because we expect fn(v) to return tensors with the same number of elements if the gradient formula is correct, but we are not making that assumption because the formula can be wrong?
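(For illustration, a tiny hypothetical stand-in for `fn` shows what the comment guards against: with a wrong gradient formula, different `v` can yield rows of different sizes, so naively stacking them fails. Being able to substitute `f` like this is exactly what the refactor enables.)

```python
import torch

def buggy_vjp(v):
    # Hypothetical stand-in for f(grad_out): a wrong gradient "formula"
    # that returns rows of different sizes depending on v.
    return torch.randn(2) if v[0] == 1 else torch.randn(3)

rows = [buggy_vjp(v) for v in torch.eye(3)]
try:
    torch.stack(rows)  # naive combining fails on mismatched row sizes
except RuntimeError as e:
    print("caught:", e)  # stack expects each tensor to be equal size
```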
This change is similar to #54049 in that it helps us factor out some code that can be used in both fast and slow versions of gradcheck.
- `compute_gradient` and `compute_numerical_jacobian_cols` have fewer responsibilities:
  - `compute_numerical_jacobian_cols` essentially only handles the complexity of complex derivatives
  - `compute_gradient` handles only finite differencing (and doesn't worry about different layouts and indexing into the input tensor)
- we have two stages again, where we first compute the columns separately, then combine them (see the sketch below)
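As a rough illustration of that columns-then-combine split, here is a minimal sketch with a hypothetical `numerical_jacobian_col` helper (the real gradcheck code additionally handles layouts, indexing, complex derivatives, etc.):

```python
import torch

def numerical_jacobian_col(fn, x, idx, eps=1e-6):
    # Central finite difference on one input entry -> one Jacobian column;
    # this is the part `compute_gradient` is described as handling.
    flat = x.reshape(-1)
    orig = flat[idx].item()
    flat[idx] = orig + eps
    f_plus = fn(x).clone()
    flat[idx] = orig - eps
    f_minus = fn(x).clone()
    flat[idx] = orig
    return (f_plus - f_minus).reshape(-1) / (2 * eps)

x = torch.tensor([0.5, 1.0, 2.0], dtype=torch.float64)
cols = [numerical_jacobian_col(torch.sin, x, i) for i in range(x.numel())]
J = torch.stack(cols, dim=1)  # second stage: combine columns into the Jacobian
print(torch.allclose(J, torch.diag(x.cos())))  # True
```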
@soulitzer merged this pull request in df70e2f.
Summary: Pull Request resolved: #54479

This change is similar to #54049 in that it helps us factor out some code that can be used in both fast and slow versions of gradcheck.
- `compute_gradient` and `compute_numerical_jacobian_cols` have fewer responsibilities:
  - `compute_numerical_jacobian_cols` essentially only handles the complexity of complex derivatives
  - `compute_gradient` handles only finite differencing (and doesn't worry about different layouts and indexing into the input tensor)
- we have two stages again, where we first compute the columns separately, then combine them

Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27728727
Pulled By: soulitzer
fbshipit-source-id: fad3d5c1a91882621039beae3d0ecf633c19c28c
Stack from ghstack:

For release notes: `torch.autograd.gradcheck.get_analytical_jacobian` (not part of the public API) is being deprecated.

Differential Revision: D27307240