Refactor get analytical jacobian #54049

Closed
wants to merge 14 commits

Conversation

soulitzer
Contributor

@soulitzer soulitzer commented Mar 16, 2021

Stack from ghstack:

For release notes:

torch.autograd.gradcheck.get_analytical_jacobian (not part of the public API) is being deprecated.

--- end --

The goal of this PR is to factor out the core logic of computing the analytical Jacobian, which effectively evaluates f(grad_out) = grad_out^T J = grad_input. This allows us to test a lot of logic that was not testable before, because we can now replace f with whatever we want in order to simulate potential issues that gradcheck is designed to catch.

Edit: I realize that a lot of what this PR was originally aiming to allow is actually possible with hooks, so those tests have already been added in an earlier PR in the stack. But this PR is still useful for reducing code duplication when adding the new fast gradcheck code (more details below).

After this change, get_analytical_jacobian is only responsible for gathering a list of rows that are later combined into a single Jacobian tensor. This means we don't have to perform any checks for the correctness of dtypes/sizes at this step.

We factor that logic out into a separate function, combine_jacobian_rows, which handles the list-of-rows -> single-tensor step for each Jacobian and the error checking it entails. (This allows the code to be shared between the fast and slow versions.)

Differential Revision: D27307240
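
For illustration, here is a minimal, self-contained sketch of the split described above (the helper names `analytical_jacobian_rows` and `combine_rows` are made up for this example and are not gradcheck's actual internals): each backward call with a one-hot grad_output v yields v^T J, i.e. one row of the Jacobian, and the rows are only checked and stacked in a separate step.

```python
import torch

def analytical_jacobian_rows(out, inp):
    # One backward pass per one-hot grad_output v: autograd computes v^T J,
    # i.e. a single row of the Jacobian of `out` w.r.t. `inp`.
    flat_out = out.reshape(-1)
    rows = []
    for i in range(flat_out.numel()):
        v = torch.zeros_like(flat_out)
        v[i] = 1.0
        (row,) = torch.autograd.grad(flat_out, inp, grad_outputs=v, retain_graph=True)
        rows.append(row.reshape(-1))
    return rows  # size/dtype checks are deliberately deferred

def combine_rows(rows):
    # Separate stage: stack the collected rows into a single Jacobian tensor.
    # This is where consistency checks (sizes, dtypes) would live.
    return torch.stack(rows)

x = torch.randn(3, requires_grad=True)
y = x ** 2  # elementwise, so the Jacobian is diag(2 * x)
J = combine_rows(analytical_jacobian_rows(y, x))
assert torch.allclose(J, torch.diag(2 * x))
```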

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 16, 2021

💊 CI failures summary and remediations

As of commit 844893f (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

soulitzer added a commit that referenced this pull request Mar 16, 2021
ghstack-source-id: ef5731f910d79422b46a41a77d9633e796529f58
Pull Request resolved: #54049
@soulitzer soulitzer changed the title Refactor get analytical jacobian (WIP) Refactor get analytical jacobian Mar 16, 2021
soulitzer added a commit that referenced this pull request Mar 16, 2021
ghstack-source-id: 6180a546d579ce64d7801e9c60d80b25882574e6
Pull Request resolved: #54049
@soulitzer soulitzer changed the title (WIP) Refactor get analytical jacobian Refactor get analytical jacobian Mar 19, 2021
@soulitzer soulitzer requested a review from zou3519 March 19, 2021 21:21
Contributor

@zou3519 zou3519 left a comment


refactor looks correct, I had a few questions

Comment on lines 202 to 203
jacobians_rows = get_analytical_jacobian(fn, output.clone(), grad_out_scale)
jacobians_rows_reentrant = get_analytical_jacobian(fn, output.clone(), grad_out_scale)
Contributor


I never really understood what the reentrant check is testing. What is it testing?

The reason why I am asking is because previously, we would compute row 0 of the first jacobian, then row 0 of the second jacobian, then row 1 of the first jacobian, then row 1 of the second jacobian, etc. This PR changes it so that we compute the full first jacobian followed by the full second jacobian. Does this affect the reentrant check?

Contributor Author


I don't really know why it's named the 'reentrant' check. AFAICT it's just computing something twice and checking whether the results are the same, i.e. checking determinism. IMO a better name would be 'determinism check' rather than 'reentrant check'. Maybe @albanD knows more.

Contributor


Offline @albanD said this is "re-entrant" in the sense that it is calling the same function twice. This is different from when we call autograd from within autograd, which we also refer to as "re-entrant". So yes, it looks like a case of bad naming.

Contributor


Also I realized that the code does explain this in an error message:

error_msg = "Backward" + error_str + " is not reentrant, i.e., running backward with same \
input and grad_output multiple times gives different values, \
although analytical gradient matches numerical gradient. \
The tolerance for nondeterminism was {}.".format(nondet_tol)
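
For a hedged illustration of what that amounts to (the helper below is hypothetical, not gradcheck's actual code): run the same backward computation twice with identical inputs and grad_outputs, and require the two results to agree within `nondet_tol`.

```python
import torch

def backward_is_reentrant(fn, x, nondet_tol=0.0):
    # Run the identical backward computation twice; "reentrant" here only
    # means the two runs agree, i.e. this is really a determinism check.
    def one_pass():
        y = fn(x)
        (g,) = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y))
        return g
    g1, g2 = one_pass(), one_pass()
    return (g1 - g2).abs().max().item() <= nondet_tol

x = torch.randn(4, requires_grad=True)
assert backward_is_reentrant(torch.exp, x)
```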

Comment on lines 232 to 233
# NB: we can't combine the rows into a single jacobian tensor because fn(v) for
# different v may return tensors with different number of elements
Contributor


Is this because we expect fn(v) to return tensors with the same number of elements if the gradient formula is correct but are not making that assumption because it can be wrong?
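
For illustration, a hedged sketch of the combine step under discussion (the helper below is hypothetical and may differ from the PR's actual `combine_jacobian_rows`): because rows are validated and stacked only after all the backward calls, a wrong gradient formula that yields differently sized (or missing) grads for different grad_outputs surfaces here as an explicit error.

```python
import torch

def combine_jacobian_rows(rows, inp, out_numel):
    # Hypothetical sketch: validate and stack the rows only at this stage.
    jacobian = inp.new_zeros(out_numel, inp.numel())
    for i, row in enumerate(rows):
        if row is None:
            continue  # this grad_output produced no gradient for `inp`
        if row.numel() != inp.numel():
            raise RuntimeError(
                f"row {i} has {row.numel()} elements, expected {inp.numel()}; "
                "the gradient formula is likely incorrect")
        jacobian[i] = row.reshape(-1)
    return jacobian

rows = [torch.tensor([2.0, 0.0, 0.0]), torch.tensor([0.0, 4.0, 0.0]), None]
inp = torch.zeros(3)
J = combine_jacobian_rows(rows, inp, out_numel=3)  # shape (3, 3)
```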

torch/autograd/gradcheck.py: three additional review comments (outdated, resolved)
soulitzer added a commit that referenced this pull request Mar 25, 2021
This change is similar to #54049 in that it helps us factor out some code that can be used in both the fast and slow versions of gradcheck.
 - `compute_gradient` and `compute_numerical_jacobian_cols` have fewer responsibilities:
   - `compute_numerical_jacobian_cols` essentially only handles the complexity of complex derivatives
   - `compute_gradient` handles only finite differencing (and doesn't worry about different layouts or indexing into the input tensor); see the sketch below
 - we again have two stages, where we first compute the columns separately, then combine them
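
As a hedged sketch of the finite-differencing stage mentioned above (the helper names are illustrative, not the actual gradcheck functions): perturb one input element by +/- eps, take the centered difference to obtain one Jacobian column, and combine the columns in a second stage.

```python
import torch

def numerical_jacobian_col(fn, x, idx, eps=1e-6):
    # Central finite difference for a single column: perturb one input
    # element by +/- eps and take the centered difference of the outputs.
    with torch.no_grad():
        x_plus, x_minus = x.clone(), x.clone()
        x_plus.reshape(-1)[idx] += eps
        x_minus.reshape(-1)[idx] -= eps
        return (fn(x_plus) - fn(x_minus)).reshape(-1) / (2 * eps)

x = torch.randn(3, dtype=torch.double)
cols = [numerical_jacobian_col(torch.sin, x, i) for i in range(x.numel())]
numerical_jacobian = torch.stack(cols, dim=1)  # second stage: combine the columns
assert torch.allclose(numerical_jacobian, torch.diag(torch.cos(x)), atol=1e-4)
```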
@facebook-github-bot
Contributor

@soulitzer merged this pull request in df70e2f.

@facebook-github-bot facebook-github-bot deleted the gh/soulitzer/3/head branch March 30, 2021 14:17
@soulitzer soulitzer added the `module: bc-breaking` label Mar 31, 2021
@soulitzer soulitzer added the `module: deprecation` label and removed the `module: bc-breaking` label Apr 9, 2021
facebook-github-bot pushed a commit that referenced this pull request Apr 13, 2021
Summary:
Pull Request resolved: #54479

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D27728727

Pulled By: soulitzer

fbshipit-source-id: fad3d5c1a91882621039beae3d0ecf633c19c28c
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021