pmeier
Collaborator

@pmeier pmeier commented May 24, 2021

This adds support for sparse tensors in the same way torch.testing._internal.common_utils.TestCase.assertEqual does:

```python
if x.is_sparse:
    if x.size() != y.size():
        debug_msg_sparse = ("Attempted to compare equality of tensors with different sizes: "
                            f"Expected: {x.size()}; Actual: {y.size()}.")
        super().assertTrue(False, msg=self._get_assert_msg(msg=msg, debug_msg=debug_msg_sparse))

    x = x.coalesce()
    y = y.coalesce()
    indices_result, debug_msg_indices = self._compareTensors(x._indices(), y._indices(),
                                                             rtol=rtol, atol=atol,
                                                             equal_nan=equal_nan, exact_dtype=exact_dtype,
                                                             exact_device=exact_device)
    if not indices_result:
        assert debug_msg_indices is not None
        debug_msg = "Sparse tensor indices failed to compare as equal! " + debug_msg_indices
    super().assertTrue(indices_result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))

    values_result, debug_msg_values = self._compareTensors(x._values(), y._values(),
                                                           rtol=rtol, atol=atol,
                                                           equal_nan=equal_nan, exact_dtype=exact_dtype,
                                                           exact_device=exact_device)
    if not values_result:
        assert debug_msg_values is not None
        debug_msg = "Sparse tensor values failed to compare as equal! " + debug_msg_values
    super().assertTrue(values_result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
```

  • Tensors are coalesced before comparison.
  • Indices and values are compared individually.
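
To illustrate why coalescing first matters, here is a minimal pure-Python sketch of the coalesce-then-compare steps. It is not the torch implementation: it models a COO tensor as a list of index tuples plus a list of values (torch stores the indices as a `(sparse_dim, nnz)` tensor), `coalesce` and `sparse_equal` are hypothetical helpers, and values are compared exactly for brevity, whereas the real check applies `rtol`/`atol`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Index = Tuple[int, ...]


def coalesce(indices: Sequence[Index], values: Sequence[float]) -> Tuple[List[Index], List[float]]:
    """Sum the values stored at duplicate indices and sort the indices lexicographically."""
    summed: Dict[Index, float] = defaultdict(float)
    for index, value in zip(indices, values):
        summed[tuple(index)] += value
    ordered = sorted(summed)
    return ordered, [summed[index] for index in ordered]


def sparse_equal(actual, expected, actual_size: Index, expected_size: Index) -> bool:
    """Mirror the size check and coalesce-then-compare steps, using exact equality."""
    if actual_size != expected_size:
        return False
    actual_indices, actual_values = coalesce(*actual)
    expected_indices, expected_values = coalesce(*expected)
    # Indices and values are compared in two separate steps.
    return actual_indices == expected_indices and actual_values == expected_values


# Uncoalesced input: (0, 1) is stored twice and the entries are out of order.
actual = ([(1, 0), (0, 1), (0, 1)], [4.0, 1.0, 2.0])
expected = ([(0, 1), (1, 0)], [3.0, 4.0])
print(sparse_equal(actual, expected, (2, 2), (2, 2)))  # True
```

Without the coalescing step, the two inputs would differ in both their number of stored entries and their order, even though they represent the same 2x2 matrix.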

@facebook-github-bot
Contributor

facebook-github-bot commented May 24, 2021

💊 CI failures summary and remediations

As of commit fc903d5 (more details on the Dr. CI page and at hud.pytorch.org/pr/58844):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

This comment was automatically generated by Dr. CI.

@codecov

codecov bot commented May 24, 2021

Codecov Report

Merging #58844 (fc903d5) into master (15dc320) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #58844   +/-   ##
=======================================
  Coverage   76.23%   76.24%           
=======================================
  Files        2054     2054           
  Lines      205033   205075   +42     
=======================================
+ Hits       156309   156350   +41     
- Misses      48724    48725    +1     

Collaborator

@pearu pearu left a comment

I have added some comments and made suggestions regarding testing uncoalesced sparse tensors.

The PR should implement support for sparse csr tensors as well. As a hint, sparse and sparse csr tensors have the following differences:

  • sparse has two strided members (indices and values) while sparse csr has three: crow_indices, col_indices and values
  • sparse csr is always a 2D tensor while sparse represents an N-D tensor
  • the coalesce concept exists only for sparse, not for sparse csr. Hence, the _indices and _values methods are defined only for the sparse layout.

In all other aspects, sparse and sparse csr are similar.
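
To make the three-member CSR layout concrete, here is a pure-Python sketch that converts a coalesced 2D COO tensor into crow_indices, col_indices and values. `coo_to_csr` is a hypothetical helper for illustration, not a torch API; it relies on coalesced COO indices being sorted row-major with no duplicates, which is why it rejects any other input.

```python
from typing import List, Sequence, Tuple


def coo_to_csr(
    indices: Sequence[Tuple[int, int]], values: Sequence[float], num_rows: int
) -> Tuple[List[int], List[int], List[float]]:
    """Convert a coalesced 2D COO tensor to its (crow_indices, col_indices, values) members."""
    pairs = [tuple(index) for index in indices]
    if pairs != sorted(set(pairs)):
        raise ValueError("expected coalesced COO indices (sorted, no duplicates)")
    crow_indices = [0] * (num_rows + 1)
    col_indices: List[int] = []
    csr_values: List[float] = []
    for (row, col), value in zip(pairs, values):
        crow_indices[row + 1] += 1  # count the entries in each row
        col_indices.append(col)
        csr_values.append(value)
    # Prefix sum: row r occupies col_indices[crow_indices[r]:crow_indices[r + 1]].
    for row in range(num_rows):
        crow_indices[row + 1] += crow_indices[row]
    return crow_indices, col_indices, csr_values


# 3x3 matrix [[0, 1, 0], [0, 0, 0], [2, 0, 3]]
print(coo_to_csr([(0, 1), (2, 0), (2, 2)], [1.0, 2.0, 3.0], num_rows=3))
# ([0, 1, 1, 3], [1, 0, 2], [1.0, 2.0, 3.0])
```

Since there is no uncoalesced CSR form, a comparison can work on these three members directly, which is the split reflected in the later test names test_mismatching_crow_indices_msg, test_mismatching_col_indices_msg and test_mismatching_values_msg.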

Collaborator Author

@pmeier pmeier left a comment

@pearu Do you think we should add an example for sparse comparisons to our docstring? If yes, for what use case?

@pmeier pmeier requested review from pearu and mruberry June 4, 2021 06:40
Collaborator

@pearu pearu left a comment

LGTM. I have only some nits regarding the CSR documentation bit and enforcing equality checks for COO and CSR indices. Thanks, @pmeier!
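
One reading of the nit about enforcing equality checks for indices is that indices, being integer positions, should be compared exactly while only the values get `rtol`/`atol`. A hypothetical pure-Python sketch of that split (indices passed as flat sequences for brevity; the default tolerances are the float32 defaults of torch.testing.assert_close, and the closeness formula is the one torch.allclose documents):

```python
from typing import Optional, Sequence, Tuple


def compare_sparse_members(
    actual_indices: Sequence[int],
    expected_indices: Sequence[int],
    actual_values: Sequence[float],
    expected_values: Sequence[float],
    *,
    rtol: float = 1.3e-6,
    atol: float = 1e-5,
) -> Tuple[bool, Optional[str]]:
    # Indices are integer positions: any difference means a different sparsity
    # pattern, so they are compared exactly, without rtol/atol.
    if list(actual_indices) != list(expected_indices):
        return False, "Sparse tensor indices failed to compare as equal!"
    # Values get the usual closeness check |actual - expected| <= atol + rtol * |expected|.
    # Writing it as "not (... <= ...)" makes NaN fail, since NaN comparisons are False.
    if len(actual_values) != len(expected_values) or any(
        not (abs(a - e) <= atol + rtol * abs(e)) for a, e in zip(actual_values, expected_values)
    ):
        return False, "Sparse tensor values failed to compare as equal!"
    return True, None
```

A tiny value difference such as `2.0` vs `2.0 + 1e-7` passes, while an off-by-one index fails regardless of the tolerances.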

Collaborator

@pearu pearu left a comment

I think the concern about what happens when a new sparse format is introduced is real.
Currently, the checks may give false-positive results when, say, one of the operands (actual or expected) is a strided tensor and the other is an instance of a new sparse format.
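
One way to guard against that, sketched in pure Python under the assumption that layouts can be represented as strings (the real code would work with torch.layout objects such as torch.strided, torch.sparse_coo and torch.sparse_csr): require both operands to share a layout, and refuse any layout the comparison does not explicitly know about instead of falling through to the strided path. `SUPPORTED_LAYOUTS`, `check_layouts` and the layout name `"sparse_future"` are hypothetical.

```python
SUPPORTED_LAYOUTS = {"strided", "sparse_coo", "sparse_csr"}


def check_layouts(actual_layout: str, expected_layout: str) -> str:
    """Return the shared layout, or raise instead of silently comparing as strided."""
    if actual_layout != expected_layout:
        raise AssertionError(
            f"The layouts of the inputs differ: {actual_layout} != {expected_layout}"
        )
    if actual_layout not in SUPPORTED_LAYOUTS:
        # An unknown (e.g. newly added) sparse format fails loudly here rather than
        # being compared with logic written for another layout.
        raise NotImplementedError(f"Comparing tensors with layout {actual_layout} is not supported.")
    return actual_layout


print(check_layouts("sparse_csr", "sparse_csr"))  # sparse_csr
```

With this ordering, a strided tensor compared against a tensor in a new sparse format raises on the layout mismatch, and two tensors in the new format raise until support is added explicitly.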

@pmeier pmeier requested a review from pearu June 14, 2021 10:37
Collaborator

@pearu pearu left a comment

LGTM.

I am uncertain if we should use "uncoalesced" or "non-coalesced".

Collaborator

@mruberry mruberry left a comment

Cool!

@facebook-github-bot
Contributor

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Collaborator

This hit some internal test failures on the pytorch_core-buck job which appear relevant:

caffe2/test:testing - test_mismatching_values_msg (test_testing.TestAssertsSparseCSR)
caffe2/test:testing - test_mismatching_crow_indices_msg (test_testing.TestAssertsSparseCSR)
caffe2/test:testing - test_matching (test_testing.TestAssertsSparseCSR)
caffe2/test:testing - test_mismatching_col_indices_msg (test_testing.TestAssertsSparseCSR)

NotImplementedError: Could not run 'aten::crow_indices' with arguments from the 'SparseCsrCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process

Looks like that job isn't built with sparse support; fyi @cpuhrsch

One simple option to unblock this PR is to decorate tests using the sparse CSR format with

```python
@skipIf(IS_FBCODE or IS_SANDCASTLE, "Not all sandcastle jobs support CSR testing")
```

I'm not sure why test_sparse_csr.py isn't run in this build, but it appears to be filtered out somewhere. cc @pearu - any ideas?
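
A minimal sketch of the skip suggested above, applied at class level so that every test in TestAssertsSparseCSR is skipped at once. It uses the standard library's unittest.skipIf; the IS_FBCODE and IS_SANDCASTLE values here are stand-ins read from hypothetical environment variables (in PyTorch the flags come from torch.testing._internal.common_utils), and the test body is a placeholder.

```python
import os
import unittest

# Hypothetical stand-ins for the flags defined in torch.testing._internal.common_utils.
IS_FBCODE = os.environ.get("HYPOTHETICAL_IS_FBCODE") == "1"
IS_SANDCASTLE = os.environ.get("HYPOTHETICAL_IS_SANDCASTLE") == "1"


# Decorating the class skips every test method in it when the condition is true.
@unittest.skipIf(IS_FBCODE or IS_SANDCASTLE, "Not all sandcastle jobs support CSR testing")
class TestAssertsSparseCSR(unittest.TestCase):
    def test_matching(self):
        # Placeholder: the real test builds two sparse CSR tensors and compares them.
        self.assertEqual([0, 1, 1, 3], [0, 1, 1, 3])
```

Decorating the class rather than each method means new CSR tests added later are covered automatically.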

@pearu
Collaborator

pearu commented Jun 21, 2021

@mruberry can you share the cmake "Summary" part from the build log? Perhaps it could provide hints on how to reproduce this externally.

@mruberry
Collaborator

It's a Buck build, so there are a lot of Buck targets to scrape through to see how this is actually built, but it looks like this test is being filtered directly by the test command, which is opt-in:

```shell
buck test --flagfile fbsource//fbcode/mode/opt //caffe2/test:ao //caffe2/test:autograd //caffe2/test:complex //caffe2/test:cuda //caffe2/test:dataloader //caffe2/test:datapipe //caffe2/test:distributions //caffe2/test:fbonly //caffe2/test:function_schema //caffe2/test:futures //caffe2/test:fx //caffe2/test:fx_const_fold //caffe2/test:fx_dce_pass //caffe2/test:fx_experimental //caffe2/test:jit //caffe2/test:kernel_launch_checks //caffe2/test:linalg //caffe2/test:mkldnn //caffe2/test:mobile //caffe2/test:multiprocessing //caffe2/test:nn //caffe2/test:optim //caffe2/test:others //caffe2/test:package //caffe2/test:profiler //caffe2/test:pruning //caffe2/test:quantization //caffe2/test:quantization_fx //caffe2/test:serialization //caffe2/test:sparse //caffe2/test:static_runtime //caffe2/test:tensorboard //caffe2/test:tensorexpr //caffe2/test:test_bundled_images //caffe2/test:test_bundled_inputs //caffe2/test:test_fx_experimental //caffe2/test:test_mobile_optimizer //caffe2/test:test_package_lib //caffe2/test:testing //caffe2/test:throughput_benchmark //caffe2/test:torch //caffe2/test:torch_cuda //caffe2/test:type //caffe2/test:utils //caffe2/test:xnnpack_integration
```

@pearu
Collaborator

pearu commented Jun 21, 2021

@mruberry, should TestAssertsSparseCSR live in test/test_sparse_csr.py? Do the TestAssertsSparseCOO tests run and pass?

@mruberry
Collaborator

test_testing.py seems like the right place for it. Yes, the COO-related tests appear to be passing:

✓ Pass: caffe2/test:testing - test_mismatching_is_coalesced (test_testing.TestAssertsSparseCOO) (1.167)

@facebook-github-bot
Contributor

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mruberry merged this pull request in 6ea2267.

Labels: cla signed, Merged, module: testing (Issues related to the torch.testing module, not tests), open source
7 participants