OpInfo tests fail gradcheck for ops with TensorList inputs. #51996

Closed
heitorschueroff opened this issue Feb 9, 2021 · 17 comments
Labels
module: autograd - Related to torch.autograd, and the autograd engine in general
module: linear algebra - Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul
module: tests - Issues related to tests (not the torch.testing module)
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@heitorschueroff
Contributor

heitorschueroff commented Feb 9, 2021

For ops like torch.stack and torch.linalg.multi_dot that take a Tensor[] as input as opposed to a single Tensor, OpInfo testing does not work properly.

If the sample_inputs_func returns a tuple of tensors per SampleInput, the tensors are passed to the op as individual positional arguments. If instead each SampleInput holds a list of tensors, then gradcheck fails with the following error:

FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_multi_dot_cpu_float64 - ValueError: gradcheck expects at least one input tensor to require gradient, but none of the them have requires_grad=True.

The current workaround is to create a lambda as follows:

op=lambda *args, out=None: torch.linalg.multi_dot([*args], out=out),

However, this is changing the actual op which can cause problems in the future, for instance, when comparing the results of the op against NumPy.
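For reference, a minimal standalone sketch of the two behaviors described above (the shapes and names here are illustrative, not taken from the test suite):

```python
import torch
from torch.autograd import gradcheck

a = torch.randn(3, 4, dtype=torch.float64, requires_grad=True)
b = torch.randn(4, 5, dtype=torch.float64, requires_grad=True)

# gradcheck splats the inputs tuple into positional arguments, so wrapping
# the op in a lambda that rebuilds the list works:
gradcheck(lambda x, y: torch.linalg.multi_dot([x, y]), (a, b))

# Passing the list itself as a single input fails: gradcheck does not look
# inside the list, so it finds no tensor with requires_grad=True.
# gradcheck(torch.linalg.multi_dot, ([a, b],))  # raises ValueError
```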

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer @jianyuh @mruberry @heitorschueroff @walterddr @IvanYashchuk @VitalyFedyunin

@mrshenli mrshenli added module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Feb 9, 2021
@mruberry mruberry added the module: autograd Related to torch.autograd, and the autograd engine in general label Feb 10, 2021
@mruberry
Collaborator

cc @albanD and @Lilyjjo

Let's focus exclusively on the autograd side of this issue for the moment, as how we address it may affect the fix for the jit tests.

@albanD I think this is a bug in {grad, gradgrad}check where they don't handle tuple inputs properly?

Before @anjali411 implemented its OpInfo, torch.stack, for example, had a custom autograd test (see 8fb5f16), but I don't think it used gradcheck. When given a tuple of tensors as input, gradcheck will try to splat them when calling the function, but functions like torch.stack don't accept a variable number of tensor arguments.

One way to fix this might be to add a "grad_input_function" to gradcheck. This idea has come up elsewhere (#50837 cc @IvanYashchuk), too.

@albanD, seems like you're in favor of that solution on the other issue, so let's start by trying it out as a fix for this, too?

@albanD
Collaborator

albanD commented Feb 10, 2021

One way to fix this might be to add a "grad_input_function" to gradcheck. This idea has come up elsewhere (#50837 cc @IvanYashchuk), too.

The proposal there was not to modify gradcheck but opinfo right? And have a pre-processing function before calling the user function.

@albanD I think this is a bug in {grad, gradgrad}check where they don't handle tuple inputs properly?

This is explicitly not supported. The doc says: "func (function) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors": tuple of Tensors is not supported as input to the function.
So I think the "fix" is to wrap the user function in another function that does the massaging of the arguments.
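A minimal sketch of such a wrapper, with hypothetical names (not the eventual OpInfo implementation):

```python
import torch
from torch.autograd import gradcheck

def wrap_tensorlist_op(op):
    # Hypothetical pre-processing wrapper: gradcheck passes each tensor as a
    # separate positional argument, so repack them into the single list the
    # op actually expects before calling it.
    def wrapped(*tensors):
        return op(list(tensors))
    return wrapped

a = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
b = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
gradcheck(wrap_tensorlist_op(torch.stack), (a, b))  # passes
```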

@mruberry
Collaborator

One way to fix this might be to add a "grad_input_function" to gradcheck. This idea has come up elsewhere (#50837 cc @IvanYashchuk), too.

The proposal there was not to modify gradcheck but opinfo right? And have a pre-processing function before calling the user function.

You are correct.

This is explicitly not supported. The doc says: "func (function) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors": tuple of Tensors is not supported as input to the function.
So I think the "fix" is to wrap the user function in another function that does the massaging of the arguments.

Yes, I think this is conceptually what the input function would do.

@mruberry mruberry added the module: tests Issues related to tests (not the torch.testing module) label Feb 10, 2021
@anjali411
Contributor

I don't think we need to make any updates in OpInfo. It's easy enough to handle this case by just passing a lambda function in op. For example, check out tests for stack, vstack etc.

@mruberry
Collaborator

We could continue to wrap ops in lambdas, that's true, but that's also what we want to avoid. Wrapping ops in lambdas is going to bite us later when we go to compare with NumPy references, for example. Basically we want to only wrap the op in a lambda in some contexts (like gradcheck).

@mruberry
Collaborator

Offline discussion w/@albanD: let's continue to investigate wrapping the op in a lambda before gradchecking. Maybe subsume the current output_func post-process, which currently gets wrapped with the op, too.

@zou3519
Contributor

zou3519 commented Feb 12, 2021

Couldn't gradcheck flatten the list of tensors, do things with the tensors, and then reconstruct the list of tensors when it comes time to call the function?

@albanD
Collaborator

albanD commented Feb 17, 2021

The question is how do you differentiate between a single argument that is a tuple of tensors vs. a tuple containing multiple Tensors that should be passed as separate arguments to the function?

@zou3519
Contributor

zou3519 commented Feb 17, 2021

The question is how do you differentiate between a single argument that is a tuple of tensors vs. a tuple containing multiple Tensors that should be passed as separate arguments to the function?

I am not suggesting that we would do this (I'm just curious), but to go down this rabbit hole: we'd have to modify the signature of gradcheck so that instead of taking inputs: Union[torch.Tensor, Sequence[torch.Tensor]] (that is where the ambiguity comes from), it would take a Tuple of inputs to be passed to the function.

So if a user wanted to pass in multiple tensors, they would do gradcheck(..., inputs=(a, b, c)).
If a user wanted to pass in a single argument that is a list of tensors, they would do gradcheck(..., inputs=([a, b, c],)).

This would make the handling of functions with TensorLists nicer, especially if we think we'll add more TensorList operations to the API.
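Under that convention the ambiguity disappears, because tensor discovery only ever needs one level of unpacking. A sketch (hypothetical helper, not the current gradcheck code):

```python
import torch

def iter_input_tensors(inputs):
    # Each element of `inputs` is exactly one argument to the function.
    # Tensors nested one level deep (a TensorList argument) are still
    # found, with no ambiguity about whether to splat.
    for arg in inputs:
        if isinstance(arg, torch.Tensor):
            yield arg
        elif isinstance(arg, (list, tuple)):
            yield from (t for t in arg if isinstance(t, torch.Tensor))

# inputs=(a, b, c)    -> three tensor arguments
# inputs=([a, b, c],) -> one TensorList argument
```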

@heitorschueroff
Contributor Author

I think @zou3519's suggestion is the cleanest approach from an API perspective. Otherwise, wrapping the function in a lambda that converts the flattened list of tensors back into the correct arguments to the op is not so straightforward. What if we had an op that takes 2 TensorLists? We'd have to keep track of counts to reconstruct the input arguments from the flattened list of tensors.

@albanD
Collaborator

albanD commented Feb 17, 2021

It is cleaner but is very BC-breaking :/
We would need to go through a full cycle of deprecation for this.

And yes, if it takes multiple TensorLists, you will need to do slicing, which can still be done automatically but is indeed a bit more complex.
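A rough sketch of that automatic flattening and slicing (all names here are hypothetical):

```python
def flatten_inputs(args):
    # Flatten tensors and TensorLists into one flat tuple for gradcheck,
    # recording each argument's position (or slice) so the original
    # structure can be rebuilt when calling the op.
    flat, spec = [], []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            spec.append((len(flat), len(flat) + len(arg)))
            flat.extend(arg)
        else:
            spec.append(len(flat))
            flat.append(arg)
    return tuple(flat), spec

def unflatten_inputs(flat, spec):
    # Rebuild the original arguments, slicing TensorLists back out.
    return [list(flat[s[0]:s[1]]) if isinstance(s, tuple) else flat[s]
            for s in spec]

# Example: args = ([a, b], c, [d]) -> flat = (a, b, c, d),
# spec = [(0, 2), 2, (3, 4)]
```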

@zou3519
Contributor

zou3519 commented Feb 17, 2021

It is cleaner but is very BC-breaking :/
We would need to go through a full cycle of deprecation for this.

We could have our internal gradcheck do this first (that's why we have an internal gradcheck :D)

@anjali411
Contributor

It is cleaner but is very BC-breaking :/
We would need to go through a full cycle of deprecation for this.

We could have our internal gradcheck do this first (that's why we have an internal gradcheck :D)

internal gradcheck??? which one is that? I thought there is just the one (the one in gradcheck.py)

@zou3519
Contributor

zou3519 commented Feb 18, 2021

internal gradcheck??? which one is that? I thought there is just the one (the one in gradcheck.py)

It is here:

```python
def gradcheck(fn, inputs, **kwargs):
    # Wrapper around gradcheck that enables certain keys by default.
    # Use this testing-internal gradcheck instead of autograd.gradcheck so that
    # new features like vmap and forward-mode AD are tested by default. We
    # create this wrapper because we'd like to keep new checks disabled by
    # default for the public-facing API to avoid breaking user code.
    #
    # All PyTorch devs doing testing should use this wrapper instead of
    # autograd.gradcheck.
    keys_enabled_by_default = ("check_batched_grad",)
    for key in keys_enabled_by_default:
        kwargs[key] = kwargs.get(key, True)
    return torch.autograd.gradcheck(fn, inputs, **kwargs)
```

Here's the original issue for context: #49409

@imaginary-person
Contributor

imaginary-person commented Mar 5, 2021

The failure cause of the following tests seems similar to that of test_fn_grad_linalg_multi_dot_cpu_float64:

TestCommonCUDA.test_variant_consistency_jit_pow_cuda_complex128 
TestCommonCUDA.test_variant_consistency_jit_pow_cuda_complex64
TestCommonCPU.test_variant_consistency_jit_pow_cpu_complex128 
TestCommonCPU.test_variant_consistency_jit_pow_cpu_complex64

Failure cause

```
RuntimeError: expected ) but found 'ident' here:
  File "<string>", line 3

def the_method(i0):
    return torch.pow(i0, (3.14+3.14j))
                                   ~ <--- HERE
```
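For context, a minimal sketch of that TorchScript limitation at the time (illustrative only; #52881 later added complex literal support):

```python
import torch

def f(x):
    return torch.pow(x, 3.14 + 3.14j)

# Before #52881, scripting this function raised the parse error quoted above,
# because the TorchScript frontend could not parse complex literals:
# torch.jit.script(f)
```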

@mruberry
Collaborator

mruberry commented Mar 5, 2021

The failure cause of the following tests seems similar to that of test_fn_grad_linalg_multi_dot_cpu_float64:

That looks like the jit's failure to understand complex literals. I have a PR disabling those tests for dtypes other than float32, btw. We get too much noise and very little signal from them for other dtypes.

@ezyang
Contributor

ezyang commented Mar 5, 2021

btw complex literal support is coming very soon, #52881

heitorschueroff added a commit that referenced this issue Mar 5, 2021
This PR adds a workaround to OpInfo tests for ops that take TensorList inputs, which were failing gradcheck (see #51996). It also updates the *stack ops to pass a list of Tensors as input to SampleInput to demonstrate that the tests now pass without the need for defining a lambda in op.

Differential Revision: [D26860788](https://our.internmc.facebook.com/intern/diff/D26860788)
@heitorschueroff heitorschueroff changed the title OpInfo does not work well with operators that take Tensor[] as input. OpInfo tests fail gradcheck for ops with TensorList inputs. Mar 12, 2021