OpInfo tests fail gradcheck for ops with TensorList inputs. #51996

Closed
heitorschueroff opened this issue Feb 9, 2021 · 17 comments
Labels
module: autograd - Related to torch.autograd, and the autograd engine in general
module: linear algebra - Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul
module: tests - Issues related to tests (not the torch.testing module)
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@heitorschueroff
Contributor

heitorschueroff commented Feb 9, 2021

For ops like torch.stack and torch.linalg.multi_dot that take a Tensor[] as input as opposed to a single Tensor, OpInfo testing does not work properly.

If the sample_inputs_func returns a tuple of tensors per SampleInput, the tensors are passed to the op as individual positional arguments. If instead each SampleInput holds a list of tensors, then gradcheck fails with the following error:

FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_multi_dot_cpu_float64 - ValueError: gradcheck expects at least one input tensor to require gradient, but none of the them have requires_grad=True.

The current workaround is to create a lambda as follows:

op=lambda *args, out=None: torch.linalg.multi_dot([*args], out=out),

However, this is changing the actual op which can cause problems in the future, for instance, when comparing the results of the op against NumPy.
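For reference, a minimal standalone sketch of the two behaviors described above (the shapes and names here are illustrative, not taken from the test suite):

```python
import torch
from torch.autograd import gradcheck

a = torch.randn(3, 4, dtype=torch.float64, requires_grad=True)
b = torch.randn(4, 5, dtype=torch.float64, requires_grad=True)

# gradcheck splats the inputs tuple into positional arguments, so wrapping
# the op in a lambda that rebuilds the list works:
gradcheck(lambda x, y: torch.linalg.multi_dot([x, y]), (a, b))

# Passing the list itself as a single input fails: gradcheck does not look
# inside the list, so it finds no tensor with requires_grad=True.
# gradcheck(torch.linalg.multi_dot, ([a, b],))  # raises ValueError
```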

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer @jianyuh @mruberry @heitorschueroff @walterddr @IvanYashchuk @VitalyFedyunin

@mrshenli mrshenli added module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Feb 9, 2021
@mruberry mruberry added the module: autograd Related to torch.autograd, and the autograd engine in general label Feb 10, 2021
@mruberry
Collaborator

cc @albanD and @Lilyjjo

Let's focus exclusively on the autograd side of this issue for the moment, as how we address it may affect the fix for the jit tests.

@albanD I think this is a bug in {grad, gradgrad}check where they don't handle tuple inputs properly?

Before @anjali411 implemented its OpInfo, torch.stack, for example, had a custom autograd test (see 8fb5f16), but I don't think it used gradcheck. When given a tuple of tensors as input, gradcheck will try to splat them when calling the function, but functions like torch.stack don't accept a variable number of tensor arguments.

One way to fix this might be to add a "grad_input_function" to gradcheck. This idea has come up elsewhere (#50837 cc @IvanYashchuk), too.

@albanD, seems like you're in favor of that solution on the other issue, so let's start by trying it out as a fix for this, too?

@albanD
Collaborator

albanD commented Feb 10, 2021

One way to fix this might be to add a "grad_input_function" to gradcheck. This idea has come up elsewhere (#50837 cc @IvanYashchuk), too.

The proposal there was not to modify gradcheck but opinfo right? And have a pre-processing function before calling the user function.

@albanD I think this is a bug in {grad, gradgrad}check where they don't handle tuple inputs properly?

This is explicitly not supported. The doc says: "func (function) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors": tuple of Tensors is not supported as input to the function.
So I think the "fix" is to wrap the user function in another function that does the massaging of the arguments.
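A minimal sketch of such a wrapper, with hypothetical names (not the eventual OpInfo implementation):

```python
import torch
from torch.autograd import gradcheck

def wrap_tensorlist_op(op):
    # Hypothetical pre-processing wrapper: gradcheck passes each tensor as a
    # separate positional argument, so repack them into the single list the
    # op actually expects before calling it.
    def wrapped(*tensors):
        return op(list(tensors))
    return wrapped

a = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
b = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
gradcheck(wrap_tensorlist_op(torch.stack), (a, b))  # passes
```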

@mruberry
Collaborator

One way to fix this might be to add a "grad_input_function" to gradcheck. This idea has come up elsewhere (#50837 cc @IvanYashchuk), too.

The proposal there was not to modify gradcheck but opinfo right? And have a pre-processing function before calling the user function.

You are correct.

This is explicitly not supported. The doc says: "func (function) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors": tuple of Tensors is not supported as input to the function.
So I think the "fix" is to wrap the user function in another function that does the massaging of the arguments.

Yes, I think this is conceptually what the input function would do.

@mruberry mruberry added the module: tests Issues related to tests (not the torch.testing module) label Feb 10, 2021
@anjali411
Contributor

I don't think we need to make any updates in OpInfo. It's easy enough to handle this case by just passing a lambda function in op. For example, check out tests for stack, vstack etc.

@mruberry
Collaborator

We could continue to wrap ops in lambdas, that's true, but that's also what we want to avoid. Wrapping ops in lambdas is going to bite us later when we go to compare with NumPy references, for example. Basically we want to only wrap the op in a lambda in some contexts (like gradcheck).

@mruberry
Collaborator

Offline discussion w/@albanD: let's continue to investigate wrapping the op in a lambda before gradchecking. Maybe subsume the current output_func post-process, which currently gets wrapped with the op, too.

@zou3519
Contributor

zou3519 commented Feb 12, 2021

Couldn't gradcheck flatten the list of tensors, do things with the tensors, and then reconstruct the list of tensors when it comes time to call the function?

@albanD
Collaborator

albanD commented Feb 17, 2021

The question is how do you differentiate between a single argument that is a tuple of tensors vs. a tuple containing multiple Tensors that should be passed as separate arguments to the function?

@zou3519
Contributor

zou3519 commented Feb 17, 2021

The question is how do you differentiate between a single argument that is a tuple of tensors vs. a tuple containing multiple Tensors that should be passed as separate arguments to the function?

I am not suggesting that we would do this (I'm just curious), but to go down this rabbit hole: we'd have to modify the signature of gradcheck so that instead of taking inputs: Union[torch.Tensor, Sequence[torch.Tensor]] (that is where the ambiguity comes from), it would take a Tuple of inputs to be passed to the function.

So if a user wanted to pass in multiple tensors, they would do gradcheck(..., inputs=(a, b, c)).
If a user wanted to pass in a single argument that is a list of tensors, they would do gradcheck(..., inputs=([a, b, c],)).

This would make the handling of functions with TensorLists nicer, especially if we think we'll add more TensorList operations to the API.
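Under that convention the ambiguity disappears, because tensor discovery only ever needs one level of unpacking. A sketch (hypothetical helper, not the current gradcheck code):

```python
import torch

def iter_input_tensors(inputs):
    # Each element of `inputs` is exactly one argument to the function.
    # Tensors nested one level deep (a TensorList argument) are still
    # found, with no ambiguity about whether to splat.
    for arg in inputs:
        if isinstance(arg, torch.Tensor):
            yield arg
        elif isinstance(arg, (list, tuple)):
            yield from (t for t in arg if isinstance(t, torch.Tensor))

# inputs=(a, b, c)    -> three tensor arguments
# inputs=([a, b, c],) -> one TensorList argument
```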

@heitorschueroff
Contributor Author

I think @zou3519's suggestion is the cleanest approach from an API perspective. Otherwise, wrapping the function in a lambda that converts the flattened list of tensors back into the correct arguments to the op is not so straightforward. What if we had an op that takes 2 TensorLists? We'd have to keep track of counts to reconstruct the input arguments from the flattened list of tensors.

@albanD
Collaborator

albanD commented Feb 17, 2021

It is cleaner but is very BC-breaking :/
We would need to go through a full cycle of deprecation for this.

And yes, if it takes multiple TensorLists, you will need to do slicing, which can still be done automatically but is indeed a bit more complex.
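A rough sketch of that automatic flattening and slicing (all names here are hypothetical):

```python
def flatten_inputs(args):
    # Flatten tensors and TensorLists into one flat tuple for gradcheck,
    # recording each argument's position (or slice) so the original
    # structure can be rebuilt when calling the op.
    flat, spec = [], []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            spec.append((len(flat), len(flat) + len(arg)))
            flat.extend(arg)
        else:
            spec.append(len(flat))
            flat.append(arg)
    return tuple(flat), spec

def unflatten_inputs(flat, spec):
    # Rebuild the original arguments, slicing TensorLists back out.
    return [list(flat[s[0]:s[1]]) if isinstance(s, tuple) else flat[s]
            for s in spec]

# Example: args = ([a, b], c, [d]) -> flat = (a, b, c, d),
# spec = [(0, 2), 2, (3, 4)]
```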

@zou3519
Contributor

zou3519 commented Feb 17, 2021

It is cleaner but is very BC-breaking :/
We would need to go through a full cycle of deprecation for this.

We could have our internal gradcheck do this first (that's why we have an internal gradcheck :D)

@anjali411
Contributor

It is cleaner but is very BC-breaking :/
We would need to go through a full cycle of deprecation for this.

We could have our internal gradcheck do this first (that's why we have an internal gradcheck :D)

internal gradcheck??? which one is that? I thought there is just the one (the one in gradcheck.py)

@zou3519
Contributor

zou3519 commented Feb 18, 2021

internal gradcheck??? which one is that? I thought there is just the one (the one in gradcheck.py)

It is here:

```python
def gradcheck(fn, inputs, **kwargs):
    # Wrapper around gradcheck that enables certain keys by default.
    # Use this testing-internal gradcheck instead of autograd.gradcheck so that
    # new features like vmap and forward-mode AD are tested by default. We
    # create this wrapper because we'd like to keep new checks disabled by
    # default for the public-facing API to avoid breaking user code.
    #
    # All PyTorch devs doing testing should use this wrapper instead of
    # autograd.gradcheck.
    keys_enabled_by_default = ("check_batched_grad",)
    for key in keys_enabled_by_default:
        kwargs[key] = kwargs.get(key, True)
    return torch.autograd.gradcheck(fn, inputs, **kwargs)
```

Here's the original issue for context: #49409

@imaginary-person
Contributor

imaginary-person commented Mar 5, 2021

The failure cause of the following tests seems similar to that of test_fn_grad_linalg_multi_dot_cpu_float64:

TestCommonCUDA.test_variant_consistency_jit_pow_cuda_complex128 
TestCommonCUDA.test_variant_consistency_jit_pow_cuda_complex64
TestCommonCPU.test_variant_consistency_jit_pow_cpu_complex128 
TestCommonCPU.test_variant_consistency_jit_pow_cpu_complex64

Failure cause

```
RuntimeError: expected ) but found 'ident' here:
  File "<string>", line 3

def the_method(i0):
    return torch.pow(i0, (3.14+3.14j))
                                   ~ <--- HERE
```
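For context, a minimal sketch of that TorchScript limitation at the time (illustrative only; #52881 later added complex literal support):

```python
import torch

def f(x):
    return torch.pow(x, 3.14 + 3.14j)

# Before #52881, scripting this function raised the parse error quoted above,
# because the TorchScript frontend could not parse complex literals:
# torch.jit.script(f)
```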

@mruberry
Collaborator

mruberry commented Mar 5, 2021

The failure cause of the following tests seems similar to that of test_fn_grad_linalg_multi_dot_cpu_float64:

That looks like the jit's failure to understand complex literals. I have a PR disabling those tests for dtypes other than float32, btw. We get too much noise and very little signal from them for other dtypes.

@ezyang
Contributor

ezyang commented Mar 5, 2021

btw complex literal support is coming very soon, #52881

heitorschueroff added a commit that referenced this issue Mar 5, 2021
This PR adds a workaround to OpInfo tests for ops that take TensorList inputs, which were failing gradcheck (see #51996). It also updates the *stack ops to pass a list of Tensors as input to SampleInput to demonstrate that the tests now pass without the need for defining a lambda in op.

Differential Revision: [D26860788](https://our.internmc.facebook.com/intern/diff/D26860788)
@heitorschueroff heitorschueroff changed the title OpInfo does not work well with operators that take Tensor[] as input. OpInfo tests fail gradcheck for ops with TensorList inputs. Mar 12, 2021