
Conversation


@mruberry mruberry commented Apr 18, 2022

This PR makes the following improvements:

  • moves the custom skip list for test_normalize_operator_exhaustive in test_fx_experimental to the typical OpInfo skip architecture; the skips were updated to xfails, which identified some operators that no longer fail the test (the skip vs. xfail distinction is illustrated in the sketch after this list)
  • removes tests in test_jit.py that were redundant with OpInfo-based testing
  • improves test_dtypes so its error messages are clearer and it makes test_nondifferentiable redundant; the latter test has been removed
  • removes OpInfo.supports_complex_autograd() in favor of a more accurate and general check for whether a particular dtype is in the operator's backward dtypes
  • improves gradchecks to verify that an operator does not support grad when it claims not to
  • improves gradchecks to test the gradient of every input tensor that requires gradient
  • removes the concept of "default test dtypes"
  • removes excessive and mostly redundant out= testing for elementwise unary operators
  • removes the OpInfo metadata for whether an op supports nuanced "safe casting" to out= behavior
  • converts numerous skips to xfails
  • fixes the metadata of numerous OpInfos based on the new checks
  • moves jit-specific utilities from common_methods_invocations.py to jit_programming_utils.py
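
The skip-to-xfail conversion in the first bullet relies on standard unittest semantics; the following is a minimal standalone sketch using plain unittest (not PyTorch's actual OpInfo/DecorateInfo machinery) of why xfails surface tests that have started passing:

import unittest

class Demo(unittest.TestCase):
    @unittest.skip("Skipped!")       # a skip keeps "passing" even after the bug is fixed
    def test_skipped(self):
        self.assertEqual(1 + 1, 3)

    @unittest.expectedFailure        # an xfail reports "unexpected success" once the
    def test_xfailed(self):          # underlying bug is fixed, which is how stale
        self.assertEqual(1 + 1, 3)   # entries were identified

if __name__ == "__main__":
    unittest.main()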


facebook-github-bot commented Apr 18, 2022


💊 CI failures summary and remediations

As of commit 06f225f (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-04-18T18:03:20.4598872Z The PR is introduc...m to confirm whether this change is wanted or not.
2022-04-18T18:03:20.4583713Z processing existing schema:  text(__torch__.torch.classes.profiling.SourceRef _0) -> (str _0)
2022-04-18T18:03:20.4585062Z processing existing schema:  count(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-04-18T18:03:20.4586781Z processing existing schema:  duration_ns(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-04-18T18:03:20.4588595Z processing existing schema:  source(__torch__.torch.classes.profiling.SourceStats _0) -> (__torch__.torch.classes.profiling.SourceRef _0)
2022-04-18T18:03:20.4591016Z processing existing schema:  line_map(__torch__.torch.classes.profiling.SourceStats _0) -> (Dict(int, __torch__.torch.classes.profiling.InstructionStats) _0)
2022-04-18T18:03:20.4591950Z processing existing schema:  __init__(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-04-18T18:03:20.4593706Z processing existing schema:  enable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-04-18T18:03:20.4595012Z processing existing schema:  disable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-04-18T18:03:20.4597606Z processing existing schema:  _dump_stats(__torch__.torch.classes.profiling._ScriptProfile _0) -> (__torch__.torch.classes.profiling.SourceStats[] _0)
2022-04-18T18:03:20.4598487Z processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (NoneType _0)
2022-04-18T18:03:20.4598872Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 
2022-04-18T18:03:20.4598882Z 
2022-04-18T18:03:20.4598987Z Broken ops: [
2022-04-18T18:03:20.4599467Z 	aten::_validate_sparse_compressed_tensor_args(Tensor crow_indices, Tensor col_indices, Tensor values, int[] size, int layout) -> ()
2022-04-18T18:03:20.4599547Z ]
2022-04-18T18:03:20.5577356Z + cleanup
2022-04-18T18:03:20.5577469Z + retcode=1
2022-04-18T18:03:20.5577575Z + set +x
2022-04-18T18:03:20.5610791Z ##[error]Process completed with exit code 1.
2022-04-18T18:03:20.5650090Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-04-18T18:03:20.5650167Z with:

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@mruberry mruberry requested a review from ngimel April 18, 2022 00:19

@ngimel ngimel left a comment


Those are cool improvements and simplifications!

S = 5


def add_nn_functional_test(name, self_size, args, variant_name='', check_ad=(), skipTestIf=(),
Collaborator

is this now done as part of OpInfo testing?

Collaborator Author

It has been for some time

claimed_but_unsupported_backward = claimed_backward & unsupported_backward

# Partially supporting a dtype is not an error, but we print a warning
if (len(partially_supported_forward) + len(partially_supported_backward)) > 0:
Collaborator

or, partially supporting dtype is no longer an error? nice!

Collaborator Author

Yep! And the error (and warning) messages are now much clearer


# Partially supporting a dtype is not an error, but we print a warning
if (len(partially_supported_forward) + len(partially_supported_backward)) > 0:
msg = "Some dtypes for {0} on device type {1} are only partially supported!\n".format(
Collaborator

are supported only for some sample inputs

op.name, device_type
)
if len(partially_supported_forward) > 0:
msg = msg + "The following dtypes only worked on some samples during forward: {0}.\n".format(
Collaborator

sample inputs
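
For context, the snippets above distinguish dtypes that are claimed but never work (an error) from dtypes that work on only some sample inputs (a warning). A minimal sketch of that set arithmetic, with illustrative placeholder sets rather than the actual test_dtypes variables:

import torch

# Illustrative outcome sets gathered while running an op's sample inputs;
# the names mirror the reviewed code, but the values here are made up.
claimed_forward = {torch.float32, torch.float64, torch.bfloat16}
partially_supported_forward = {torch.bfloat16}   # worked on only some samples
partially_supported_backward = set()

claimed_backward = {torch.float32, torch.float64}
unsupported_backward = {torch.float64}           # never produced a gradient

# Claiming a dtype that never works is an error...
claimed_but_unsupported_backward = claimed_backward & unsupported_backward

# ...while partial support only produces a warning.
if (len(partially_supported_forward) + len(partially_supported_backward)) > 0:
    print("Some dtypes are only partially supported:",
          partially_supported_forward | partially_supported_backward)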


# Creates gradcheck inputs by identifying tensors requiring grad
all_args = None
if is_iterable_of_tensors(sample.input):
Collaborator

do we still have sample.inputs that are iterables rather than single tensors?

Collaborator Author

Yes

if is_iterable_of_tensors(sample.input):
all_args = chain(sample.input, sample.args, sample.kwargs.values())
else:
all_args = tuple(chain((sample.input,), sample.args, sample.kwargs.values()))
Collaborator

do you need tuple here?

Collaborator Author

No
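
For context on this thread, here is a minimal sketch of the argument-gathering pattern under discussion, using a simplified stand-in for SampleInput (gather_gradcheck_inputs and its parameters are illustrative helpers, not the actual test-suite code):

import torch
from itertools import chain

def gather_gradcheck_inputs(sample_input, sample_args=(), sample_kwargs=None):
    # sample_input may be a single tensor or an iterable of tensors,
    # mirroring the two branches in the reviewed code above
    sample_kwargs = sample_kwargs or {}
    if isinstance(sample_input, (list, tuple)):
        all_args = chain(sample_input, sample_args, sample_kwargs.values())
    else:
        all_args = chain((sample_input,), sample_args, sample_kwargs.values())
    # gradcheck only differentiates w.r.t. tensors that require grad
    return tuple(a for a in all_args
                 if isinstance(a, torch.Tensor) and a.requires_grad)

x = torch.randn(3, requires_grad=True)
w = torch.randn(3, requires_grad=True)
print(gather_gradcheck_inputs(x, (w, 2.0)))  # -> (x, w)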

self.assertEqual(output, expected.to(output.dtype))

@ops(unary_ufuncs, dtypes=OpDTypes.supported)
def test_out_arg_all_dtypes(self, device, dtype, op):
Collaborator

is this checked in test_out? Not for all dtypes, I'm sure, but that's probably fine

Collaborator Author

We don't check safe casting to a higher dtype in test_out yet, no
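
For illustration, this is the kind of "safe casting to a higher dtype" through out= being discussed; a small standalone example, not taken from the test suite:

import torch

x = torch.tensor([1.5, -2.25], dtype=torch.float32)
out = torch.empty(2, dtype=torch.float64)   # higher-precision out= tensor

torch.abs(x, out=out)            # allowed: float32 results cast safely up to float64

bad_out = torch.empty(2, dtype=torch.int32)
try:
    torch.abs(x, out=bad_out)    # rejected: float can't be safely cast down to int
except RuntimeError as err:
    print("rejected:", err)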

@@ -9560,14 +9530,20 @@ def generate_std_var_kwargs(t: torch.Tensor, **kwargs):
supports_forward_ad=True,
supports_fwgrad_bwgrad=True,
skips=(
# Float did not match double
DecorateInfo(unittest.expectedFailure, 'TestGradients', 'test_fn_grad'),
Collaborator

so gradients are all wrong here?

Collaborator Author

As best the test suite can tell: yes

Collaborator

Nice. Is there a high-priority issue about this, or do we not care about cov (or whatever this function is)?
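
For reference, the "Float did not match double" style of failure comes from gradcheck comparing analytical gradients against finite-difference estimates; a minimal standalone gradcheck call (using torch.sin as a stand-in, unrelated to the op in this hunk):

import torch
from torch.autograd import gradcheck

# gradcheck compares analytical gradients against numerical (finite-difference)
# estimates, so inputs are usually double precision to keep the estimate tight.
x = torch.randn(4, dtype=torch.double, requires_grad=True)
assert gradcheck(torch.sin, (x,), eps=1e-6, atol=1e-4)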

supports_forward_ad=True,
supports_fwgrad_bwgrad=True,
sample_inputs_func=sample_inputs_polygamma,
skips=(
# Redundant tests
DecorateInfo(unittest.skip("Skipped!"), 'TestGradients'),
DecorateInfo(unittest.skip("Skipped!"), 'TestJit'),
DecorateInfo(unittest.skip("Skipped!"), 'TestNormalizeOperators'),
Collaborator

Normalize is probably not a redundant test?

Collaborator Author

Fair point

DecorateInfo(unittest.expectedFailure, 'TestCudaFuserOpInfo', 'test_nvfuser_correctness', dtypes=(torch.bfloat16,)),
DecorateInfo(unittest.skip("Works on some configs"), 'TestNNCOpInfo',
'test_nnc_correctness', dtypes=(torch.bfloat16,)),
DecorateInfo(unittest.skip("Works on some configs"), 'TestCudaFuserOpInfo',
Collaborator

haha, probably on configs that are not supported by nvfuser

@mruberry
Collaborator Author

@pytorchbot merge this please

@github-actions
Contributor

Hey @mruberry.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

torch.complex64: 1e-2}),),
skips=(
# Failing with wrong imaginary sign on at least some Windows jobs
DecorateInfo(unittest.skip("Skipped!"), 'TestUnaryUfuncs', 'test_reference_numerics_normal',
Collaborator

This is a year-old issue (#52299); it's weird that it's still not fixed. Are our Windows jobs using some old CUDA version?

facebook-github-bot pushed a commit that referenced this pull request Apr 19, 2022
Summary:
This PR makes the improvements described in the PR description above.

Pull Request resolved: #75951
Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/de949a0e59f458a77c213d270106a0c9fea6c484

Reviewed By: seemethere

Differential Revision: D35751451

fbshipit-source-id: 3bb18586388d7b815a3e4d999769d05082f6d4eb
malfet pushed a commit that referenced this pull request Apr 20, 2022
This PR makes the improvements described in the PR description above.
Pull Request resolved: #75951
Approved by: https://github.com/ngimel

(cherry picked from commit de949a0)
pytorchmergebot pushed a commit that referenced this pull request May 18, 2022
This re-enables the test_reference_numerics_extremal tests for atan and atanh.
The tests are still not run with complex128, as that dtype is still failing.
These tests were previously disabled in this PR:
#75951

There is no open issue for these tests.

Signed-off-by: Arindam Roy <rarindam@gmail.com>

Pull Request resolved: #77669
Approved by: https://github.com/mruberry
@mruberry mruberry deleted the opinfo_arch branch May 19, 2022 19:00
facebook-github-bot pushed a commit that referenced this pull request May 20, 2022
Summary:
This re-enables the test_reference_numerics_extremal tests for atan and atanh.
The tests are still not run with complex128, as that dtype is still failing.
These tests were previously disabled in this PR:
#75951

There is no open issue for these tests.

Signed-off-by: Arindam Roy <rarindam@gmail.com>

Pull Request resolved: #77669
Approved by: https://github.com/mruberry

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/1af47a3a3e55d2dda094a12a26e603a1c7b5e009

Reviewed By: seemethere, b0noI

Differential Revision: D36493970

fbshipit-source-id: 389653eaea304212d1a0a821c3023fcd2e296deb