add support for sparse tensors in torch.testing.assert_close #58844

Closed
pmeier wants to merge 31 commits

Conversation

pmeier
Collaborator

@pmeier pmeier commented May 24, 2021

This adds support for sparse tensors the same way torch.testing._internal.common_utils.TestCase.assertEqual does:

    if x.is_sparse:
        if x.size() != y.size():
            debug_msg_sparse = ("Attempted to compare equality of tensors with different sizes: "
                                f"Expected: {x.size()}; Actual: {y.size()}.")
            super().assertTrue(False, msg=self._get_assert_msg(msg=msg, debug_msg=debug_msg_sparse))
        x = x.coalesce()
        y = y.coalesce()
        indices_result, debug_msg_indices = self._compareTensors(x._indices(), y._indices(),
                                                                 rtol=rtol, atol=atol,
                                                                 equal_nan=equal_nan, exact_dtype=exact_dtype,
                                                                 exact_device=exact_device)
        if not indices_result:
            assert debug_msg_indices is not None
            debug_msg = "Sparse tensor indices failed to compare as equal! " + debug_msg_indices
        super().assertTrue(indices_result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
        values_result, debug_msg_values = self._compareTensors(x._values(), y._values(),
                                                               rtol=rtol, atol=atol,
                                                               equal_nan=equal_nan, exact_dtype=exact_dtype,
                                                               exact_device=exact_device)
        if not values_result:
            assert debug_msg_values is not None
            debug_msg = "Sparse tensor values failed to compare as equal! " + debug_msg_values
        super().assertTrue(values_result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))

  • Tensors are coalesced before comparison.
  • Indices and values are compared individually.
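
To make the two points above concrete, here is a minimal plain-Python sketch. It is not the code in this PR: a sparse COO tensor is modelled as a list of (index, value) pairs, and coalesce and assert_sparse_close are hypothetical helper names. The tolerance check assumes the usual |actual - expected| <= atol + rtol * |expected| rule.

```python
def coalesce(entries):
    """Sum the values of duplicate indices and sort by index (hypothetical helper)."""
    summed = {}
    for index, value in entries:
        summed[index] = summed.get(index, 0.0) + value
    indices = sorted(summed)
    values = [summed[index] for index in indices]
    return indices, values


def assert_sparse_close(actual, expected, rtol=1e-5, atol=1e-8):
    """Coalesce both operands, then compare indices and values separately."""
    actual_indices, actual_values = coalesce(actual)
    expected_indices, expected_values = coalesce(expected)

    # Indices are integers, so they have to match exactly.
    if actual_indices != expected_indices:
        raise AssertionError(
            f"Sparse indices differ: {actual_indices} != {expected_indices}"
        )

    # Values are compared element-wise within the given tolerances.
    for index, a, e in zip(actual_indices, actual_values, expected_values):
        if abs(a - e) > atol + rtol * abs(e):
            raise AssertionError(f"Sparse values differ at index {index}: {a} != {e}")


# The duplicate entry at (0, 1) is summed before the comparison takes place.
uncoalesced = [((0, 1), 1.0), ((0, 1), 2.0), ((2, 0), 4.0)]
coalesced = [((2, 0), 4.0), ((0, 1), 3.0)]
assert_sparse_close(uncoalesced, coalesced)
```

Comparing indices before values keeps the error message specific: a structural mismatch is reported as such rather than as a value mismatch.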

@facebook-github-bot
Contributor

facebook-github-bot commented May 24, 2021

💊 CI failures summary and remediations

As of commit fc903d5 (more details on the Dr. CI page and at hud.pytorch.org/pr/58844):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)


@codecov

codecov bot commented May 24, 2021

Codecov Report

Merging #58844 (fc903d5) into master (15dc320) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #58844   +/-   ##
=======================================
  Coverage   76.23%   76.24%           
=======================================
  Files        2054     2054           
  Lines      205033   205075   +42     
=======================================
+ Hits       156309   156350   +41     
- Misses      48724    48725    +1     

Collaborator

@pearu pearu left a comment

I have added some comments and made suggestions regarding testing uncoalesced sparse tensors.

The PR should implement support for sparse csr tensors as well. As a hint, sparse and sparse csr tensors have the following differences:

  • sparse has two strided members (indices and values), while sparse csr has three: crow_indices, col_indices, and values
  • sparse csr is always a 2D tensor, while sparse represents an N-D tensor
  • the coalesce concept exists only for sparse, not for sparse csr. Hence, the _indices and _values methods are defined only for the sparse layout.

In all other aspects, sparse and sparse csr are similar.
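
As an illustration of the member differences listed above, here is a small plain-Python sketch that turns the COO members of a 2D matrix into the three CSR members. The helper name coo_to_csr is hypothetical and the input is assumed to be coalesced (no duplicate indices); this is not how PyTorch performs the conversion.

```python
def coo_to_csr(n_rows, row_indices, col_indices, values):
    """Build (crow_indices, col_indices, values) from coalesced 2D COO members."""
    # CSR stores entries row by row, so order them by (row, column).
    order = sorted(
        range(len(values)), key=lambda k: (row_indices[k], col_indices[k])
    )

    # crow_indices[r] is the offset of the first stored entry of row r;
    # the last element equals the total number of stored entries.
    crow_indices = [0] * (n_rows + 1)
    for row in row_indices:
        crow_indices[row + 1] += 1
    for row in range(n_rows):
        crow_indices[row + 1] += crow_indices[row]

    csr_cols = [col_indices[k] for k in order]
    csr_values = [values[k] for k in order]
    return crow_indices, csr_cols, csr_values


# The 3x3 matrix [[0, 1, 0], [0, 0, 0], [4, 0, 5]] given as unsorted COO members.
crow, col, val = coo_to_csr(3, [2, 0, 2], [2, 1, 0], [5.0, 1.0, 4.0])
print(crow, col, val)  # [0, 1, 1, 3] [1, 0, 2] [1.0, 4.0, 5.0]
```

A comparison for the CSR layout would therefore check three members (crow_indices, col_indices, values) instead of two, with no coalescing step.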

Collaborator Author

@pmeier pmeier left a comment

@pearu Do you think we should add an example for sparse comparisons to our docstring? If yes, for what use case?

@pmeier pmeier requested review from pearu and mruberry June 4, 2021 06:40
Collaborator

@pearu pearu left a comment

LGTM. I only have some nits regarding the CSR documentation and enforcing equality checks for COO and CSR indices. Thanks, @pmeier!

Collaborator

@pearu pearu left a comment

I think the concern about what happens when a new sparse format is introduced is real.
Currently, the checks may give false-positive results when, say, one of the operands (actual or expected) is a strided tensor and the other is an instance of a new sparse format.
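
One way to guard against that case, sketched here in plain Python, is to validate the layouts up front and refuse anything unknown. Strings stand in for torch.layout values, and KNOWN_LAYOUTS and check_layouts are hypothetical names, not part of this PR.

```python
# Layouts the comparison knows how to handle (strings stand in for torch.layout).
KNOWN_LAYOUTS = {"strided", "sparse_coo", "sparse_csr"}


def check_layouts(actual_layout, expected_layout):
    """Reject unknown layouts and layout mismatches before comparing any values."""
    for name, layout in (("actual", actual_layout), ("expected", expected_layout)):
        if layout not in KNOWN_LAYOUTS:
            # A new sparse format fails loudly instead of falling through
            # to a comparison path that was never written for it.
            raise ValueError(f"Unsupported layout for {name}: {layout}")
    if actual_layout != expected_layout:
        raise AssertionError(
            f"Layouts do not match: {actual_layout} != {expected_layout}"
        )


check_layouts("sparse_csr", "sparse_csr")  # passes silently
```

With an explicit allow-list, adding a new format requires a deliberate change to the comparison code rather than silently reusing the strided path.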

@pmeier pmeier requested a review from pearu June 14, 2021 10:37
Collaborator

@pearu pearu left a comment

LGTM.

I am uncertain if we should use "uncoalesced" or "non-coalesced".

Collaborator

@mruberry mruberry left a comment

Cool!

@facebook-github-bot
Contributor

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Collaborator

This hit some internal test failures on the pytorch_core-buck job which appear relevant:

caffe2/test:testing - test_mismatching_values_msg (test_testing.TestAssertsSparseCSR)
caffe2/test:testing - test_mismatching_crow_indices_msg (test_testing.TestAssertsSparseCSR)
caffe2/test:testing - test_matching (test_testing.TestAssertsSparseCSR)
caffe2/test:testing - test_mismatching_col_indices_msg (test_testing.TestAssertsSparseCSR)

NotImplementedError: Could not run 'aten::crow_indices' with arguments from the 'SparseCsrCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process

Looks like that job isn't built with sparse support; fyi @cpuhrsch

One simple option to unblock this PR is to decorate tests using the sparse CSR format with

@skipIf(IS_FBCODE or IS_SANDCASTLE,  "Not all sandcastle jobs support CSR testing")

I'm not sure why test_sparse_csr.py isn't run in this build; it appears to be filtered somewhere. cc @pearu, any ideas?
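
For reference, the skip decorator suggested above could be applied as in the following sketch. It uses only the standard unittest module; IS_FBCODE and IS_SANDCASTLE are redefined here from made-up environment variables as stand-ins for the flags in the PyTorch test utilities, and the test body is a placeholder.

```python
import os
import unittest

# Stand-ins for the real flags; the environment variable names are invented
# for this example and are not the ones PyTorch reads.
IS_FBCODE = os.getenv("EXAMPLE_IS_FBCODE") == "1"
IS_SANDCASTLE = os.getenv("EXAMPLE_IS_SANDCASTLE") == "1"


class TestAssertsSparseCSR(unittest.TestCase):
    @unittest.skipIf(
        IS_FBCODE or IS_SANDCASTLE,
        "Not all sandcastle jobs support CSR testing",
    )
    def test_matching(self):
        # Placeholder body: the real test would compare sparse CSR tensors.
        crow_indices = [0, 1, 1, 3]
        self.assertEqual(crow_indices, [0, 1, 1, 3])
```

When either flag is set, the test is reported as skipped with the given reason instead of failing with the NotImplementedError shown above.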

@pearu
Collaborator

pearu commented Jun 21, 2021

@mruberry can you share the cmake "Summary" part from the build log? Perhaps this could provide hints on how to reproduce this externally.

@mruberry
Collaborator

It's a buck build so there's a lot of buck targets to scrape through for how this is actually built, but it looks like this test is being filtered directly by the test command, which is opt-in:

buck test --flagfile fbsource//fbcode/mode/opt //caffe2/test:ao //caffe2/test:autograd //caffe2/test:complex //caffe2/test:cuda //caffe2/test:dataloader //caffe2/test:datapipe //caffe2/test:distributions //caffe2/test:fbonly //caffe2/test:function_schema //caffe2/test:futures //caffe2/test:fx //caffe2/test:fx_const_fold //caffe2/test:fx_dce_pass //caffe2/test:fx_experimental //caffe2/test:jit //caffe2/test:kernel_launch_checks //caffe2/test:linalg //caffe2/test:mkldnn //caffe2/test:mobile //caffe2/test:multiprocessing //caffe2/test:nn //caffe2/test:optim //caffe2/test:others //caffe2/test:package //caffe2/test:profiler //caffe2/test:pruning //caffe2/test:quantization //caffe2/test:quantization_fx //caffe2/test:serialization //caffe2/test:sparse //caffe2/test:static_runtime //caffe2/test:tensorboard //caffe2/test:tensorexpr //caffe2/test:test_bundled_images //caffe2/test:test_bundled_inputs //caffe2/test:test_fx_experimental //caffe2/test:test_mobile_optimizer //caffe2/test:test_package_lib //caffe2/test:testing //caffe2/test:throughput_benchmark //caffe2/test:torch //caffe2/test:torch_cuda //caffe2/test:type //caffe2/test:utils //caffe2/test:xnnpack_integration 

@pearu
Collaborator

pearu commented Jun 21, 2021

@mruberry, should TestAssertsSparseCSR live under test/test_sparse_csr.py? Do the TestAssertsSparseCOO tests run and pass?

@mruberry
Collaborator

test_testing.py seems like the right place for it. Yes, the COO-related tests appear to be passing:

✓ Pass: caffe2/test:testing - test_mismatching_is_coalesced (test_testing.TestAssertsSparseCOO) (1.167)

@facebook-github-bot
Contributor

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mruberry merged this pull request in 6ea2267.

Labels
cla signed, Merged, module: testing (issues related to the torch.testing module, not tests), open source