
Conversation

pmeier
Collaborator

@pmeier pmeier commented Apr 12, 2021

Stack from ghstack:

Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

Differential Revision: D27903814

Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.
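
For illustration, a minimal usage sketch of what this enables (assuming torch.testing.assert_close is the entry point this stack works toward; the example is editorial, not part of the PR):

import numpy as np
import torch.testing

# Both inputs are cast to tensors internally, so the existing tensor
# comparison machinery handles ndarrays and scalars alike.
torch.testing.assert_close(np.array([1.0, 2.0]), np.array([1.0, 2.0]))  # ndarray vs. ndarray
torch.testing.assert_close(2.0, 2.0)                                    # scalar vs. scalar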

[ghstack-poisoned]
@facebook-github-bot
Contributor

facebook-github-bot commented Apr 12, 2021

💊 CI failures summary and remediations

As of commit 30b74a8 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test1 (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

RuntimeError: CUDA error: device-side assert triggered
Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "C:\Users\circleci\project\build\win_tmp\build\torch\cuda\__init__.py", line 444, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
C:/Users/circleci/project/aten/src/ATen/native/cuda/TensorCompare.cu:68: block: [0,0,0], thread: [0,0,0] Assertion `input[0] != c10::complex<float>(0, 0)` failed.
Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "C:\Users\circleci\project\build\win_tmp\build\torch\cuda\__init__.py", line 444, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
ok (60.558s)
  test_gather_bool (__main__.TestCuda) ... ok (0.006s)
  test_get_device_index (__main__.TestCuda) ... ok (0.006s)
  test_get_set_rng_state_all (__main__.TestCuda) ... skip (0.003s)
  test_grad_scaling_accumulation (__main__.TestCuda) ... ok (0.016s)
  test_grad_scaling_autocast (__main__.TestCuda) ... ok (0.049s)
  test_grad_scaling_clipping (__main__.TestCuda) ... ok (0.047s)
  test_grad_scaling_clipping_separate_unscale (__main__.TestCuda) ... test_cuda.py:2216: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
  torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm, error_if_nonfinite=False)
ok (0.034s)

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_macos_10_13_py3_lite_interpreter_build_test (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun) ❄️

Apr 23 21:36:49 unknown file: Failure
Apr 23 21:36:45 /Applications/Xcode-12.GM.seed.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: /Users/distiller/project/torch/lib/libCaffe2_perfkernels_avx512.a(common_avx512.cc.o) has no symbols
Apr 23 21:36:45 warning: /Applications/Xcode-12.GM.seed.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: archive library: /Users/distiller/project/torch/lib/libCaffe2_perfkernels_avx512.a the table of contents is empty (no object file members in the library define global symbols)
Apr 23 21:36:47 + popd
Apr 23 21:36:47 ~/project
Apr 23 21:36:47 + /Users/distiller/project/../cpp-build/caffe2/build/bin/test_lite_interpreter_runtime
Apr 23 21:36:48 Note: Google Test filter = *-*_CUDA:*_MultiCUDA
Apr 23 21:36:48 [==========] Running 2 tests from 1 test case.
Apr 23 21:36:48 [----------] Global test environment set-up.
Apr 23 21:36:48 [----------] 2 tests from RunTimeTest
Apr 23 21:36:48 [ RUN      ] RunTimeTest.LoadAndForward
Apr 23 21:36:49 unknown file: Failure
Apr 23 21:36:49 C++ exception with description "open file failed, file path: /Users/distiller/project/test/cpp/lite_interpreter_runtime/sequence.ptl
Apr 23 21:36:49 Exception raised from FileAdapter at /Users/distiller/project/caffe2/serialize/file_adapter.cc:11 (most recent call first):
Apr 23 21:36:49 frame #0: decltype(std::__1::forward<c10::(anonymous namespace)::GetFetchStackTrace()::$_0&>(fp)()) std::__1::__invoke<c10::(anonymous namespace)::GetFetchStackTrace()::$_0&>(c10::(anonymous namespace)::GetFetchStackTrace()::$_0&) + 54 (0x1087db746 in libc10.dylib)
Apr 23 21:36:49 frame #1: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > std::__1::__invoke_void_return_wrapper<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::__call<c10::(anonymous namespace)::GetFetchStackTrace()::$_0&>(c10::(anonymous namespace)::GetFetchStackTrace()::$_0&) + 54 (0x1087db6e6 in libc10.dylib)
Apr 23 21:36:49 frame #2: std::__1::__function::__alloc_func<c10::(anonymous namespace)::GetFetchStackTrace()::$_0, std::__1::allocator<c10::(anonymous namespace)::GetFetchStackTrace()::$_0>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > ()>::operator()() + 54 (0x1087db6a6 in libc10.dylib)
Apr 23 21:36:49 frame #3: std::__1::__function::__func<c10::(anonymous namespace)::GetFetchStackTrace()::$_0, std::__1::allocator<c10::(anonymous namespace)::GetFetchStackTrace()::$_0>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > ()>::operator()() + 45 (0x1087da3dd in libc10.dylib)
Apr 23 21:36:49 frame #4: std::__1::__function::__value_func<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > ()>::operator()() const + 75 (0x1087e152b in libc10.dylib)
Apr 23 21:36:49 frame #5: std::__1::function<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > ()>::operator()() const + 35 (0x1087d14d3 in libc10.dylib)
Apr 23 21:36:49 frame #6: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 78 (0x1087d1a4e in libc10.dylib)
Apr 23 21:36:49 frame #7: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 130 (0x1087cd9a2 in libc10.dylib)

This comment was automatically generated by Dr. CI.

pmeier added a commit that referenced this pull request Apr 12, 2021
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

ghstack-source-id: 741591d
Pull Request resolved: #55786
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

[ghstack-poisoned]
pmeier added 4 commits April 14, 2021 17:54
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

[ghstack-poisoned]
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

[ghstack-poisoned]
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

[ghstack-poisoned]
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

[ghstack-poisoned]
@pmeier pmeier requested review from mattip and mruberry April 15, 2021 08:40
@mattip
Contributor

mattip commented Apr 15, 2021

Are there corner cases where casting an ndarray or a scalar to a Tensor is not the right thing to do? For instance, comparing a complex-valued scalar to an ndarray succeeds, but casting them to Tensors and then comparing them fails?

@pmeier
Collaborator Author

pmeier commented Apr 16, 2021

Before we continue our implementation review cycle, we should probably define what kind of inputs we allow.

My idea is to allow all "array-and-scalar-like" inputs. This includes tensors as well as every other object from which a tensor can be created with torch.tensor() (see my comments about check_device and check_dtype at the end). This enables us to use our internal tensor comparison functionality for everything.

I would also allow sequences and mappings of the former types ("container of array-and-scalar-likes") for both inputs simultaneously. In this case the checks are performed elementwise.

I think this is a straightforward UI in all but one case:

assert_close(torch.tensor([[1.0, 2.0], [3.0, 4.0]]), [[1.0, 2.0], [4.0, 4.0]]) # compare 2d tensors
assert_close([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [4.0, 4.0]]) # compare sequence of 1d tensors

In the first case the error message would point to index (1, 0), whereas in the second case it would point to index 0 with the failure happening at index 1. The information is basically the same, but in the second case it is presented in a more convoluted way.

Still, IMO the special case of comparing two multi-level sequences of scalars does not outweigh the benefits of this more general approach.


All of this requires check_device and check_dtype to be flexible:

  • If we only deal with torch.Tensor's, we can be strict and set check_device = check_dtype = True, as is the default now.
  • If we additionally encounter an array-like, we should set check_device = False, but keep check_dtype = True, to avoid failures when comparing a CUDA tensor to the array.
  • If we additionally encounter a scalar, we should relax both checks to check_device = check_dtype = False, since, in addition to CUDA tensors, the dtype torch.tensor() chooses is flexible and should not be checked if we internally construct the tensor from the inputs.

I would simply set the default value to None and, in case it is not explicitly passed, use the logic above to determine the default value.
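
To make this concrete, a minimal sketch of the proposed default resolution (the helper name is hypothetical, not the actual implementation):

import numbers

import torch

def resolve_check_defaults(actual, expected, check_device=None, check_dtype=None):
    inputs = (actual, expected)
    only_tensors = all(isinstance(i, torch.Tensor) for i in inputs)
    any_scalar = any(isinstance(i, numbers.Number) for i in inputs)
    if check_device is None:
        # Strict device check only if both inputs already are tensors.
        check_device = only_tensors
    if check_dtype is None:
        # torch.tensor() chooses a dtype for scalars, so don't check it then.
        check_dtype = not any_scalar
    return check_device, check_dtype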

@mruberry
Collaborator

mruberry commented Apr 18, 2021

Before we continue our implementation review cycle, we should probably define what kind of inputs we allow.

My idea is to allow all "array-and-scalar-like" inputs. This includes tensors as well as every other object from which a tensor can be created with torch.tensor() (see my comments about check_device and check_dtype at the end). This enables us to use our internal tensor comparison functionality for everything.

I would also allow sequences and mappings of the former types ("container of array-and-scalar-likes") for both inputs simultaneously. In this case the checks are performed elementwise.

I think this is a straightforward UI in all but one case:

assert_close(torch.tensor([[1.0, 2.0], [3.0, 4.0]]), [[1.0, 2.0], [4.0, 4.0]]) # compare 2d tensors
assert_close([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [4.0, 4.0]]) # compare sequence of 1d tensors

In the first case the error message would point to index (1, 0), whereas in the second case it would point to index 0 with the failure happening at index 1. The information is basically the same, but in the second case it is presented in a more convoluted way.

Still, IMO the special case of comparing two multi-level sequences of scalars does not outweigh the benefits of this more general approach.

All of this requires check_device and check_dtype to be flexible:

  • If we only deal with torch.Tensor's, we can be strict and set check_device = check_dtype = True, as is the default now.
  • If we additionally encounter an array-like, we should set check_device = False, but keep check_dtype = True, to avoid failures when comparing a CUDA tensor to the array.
  • If we additionally encounter a scalar, we should relax both checks to check_device = check_dtype = False, since, in addition to CUDA tensors, the dtype torch.tensor() chooses is flexible and should not be checked if we internally construct the tensor from the inputs.

I would simply set the default value to None and, in case it is not explicitly passed, use the logic above to determine the default value.

From offline discussion:

  • let's require types match, except for containers, where it's OK for sequences to compare with other sequences and mappables to be compared with other mappables (for example, a list and a tuple with the same elements would still be equal under this definition, as would a dict and an ordered dict with the same keys and values)
  • converting to tensor in advance sounds good, but we should be sure that error messages for scalars and 0D tensors (AKA "scalar tensors") reflect the user's expectations and don't use language like "tensors failed to compare equal at index 0", because the user may not have compared tensors, and scalars and 0D tensors don't support indexing

Note this means that [1, 2] != torch.tensor((1, 2)) and that 2 != complex(2, 0).

@pmeier asks (offline): Why is it so horrible if we support Python types not matching? Like, why can't [1, 2] == torch.tensor((1, 2,))?

This just goes to readability and reviewability. This is only useful if the expected object differs from the actual object, and it will require test reviewers to understand what's actually being returned by the operator independent of the test. It's much easier for reviewers to check the expected object and know it's actually what the function must return, not something "like" what the function is returning.
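
For illustration, a hedged sketch of that type rule (the helper name is hypothetical):

import collections.abc

def types_compatible(actual, expected):
    # Exact type match, except that any sequence may pair with any sequence
    # and any mapping with any mapping.
    if isinstance(actual, collections.abc.Sequence) and isinstance(expected, collections.abc.Sequence):
        return True  # e.g. a list and a tuple with the same elements
    if isinstance(actual, collections.abc.Mapping) and isinstance(expected, collections.abc.Mapping):
        return True  # e.g. a dict and an OrderedDict with the same items
    return type(actual) is type(expected)

Under this rule [1, 2] vs. torch.tensor((1, 2)) is rejected before any values are compared, consistent with the note above.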

Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

[ghstack-poisoned]
@pmeier
Collaborator Author

pmeier commented Apr 19, 2021

@mruberry Should be all done except for

error messages for scalars and 0D tensors (AKA "scalar tensors") reflect the user's expectations and don't use language like "tensors failed to compare equal at index 0", because the user may not have compared tensors, and scalars and 0D tensors don't support indexing

I agree that we should do that, but let's wait on that until we've figured out what to do with #55890.

@mruberry
Collaborator

@mruberry Should be all done except for

error messages for scalars and 0D tensors (AKA "scalar tensors") reflect the user's expectations and don't use language like "tensors failed to compare equal at index 0", because the user may not have compared tensors, and scalars and 0D tensors don't support indexing

I agree that we should do that, but let's wait on that until we've figured out what to do with #55890.

Sounds good; I'll take a look later today

Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

[ghstack-poisoned]
@mruberry
Collaborator

This is a tricky stack to parse.

So let me just ask a few questions and then maybe we can revisit this code after this stack. What will happen with the following tests:

np.array((1, 2)) == torch.tensor((1, 2))
np.array((1, 2)) == np.array((1, 2))
complex(2, 0) == 2
2. == 2
[1, 2] == torch.tensor((1, 2))
2. == torch.tensor((2.,))
2. == np.array((2.,))

@pmeier
Collaborator Author

pmeier commented Apr 20, 2021

@mruberry Assuming == means torch.testing.assert_(equal|close):

  • np.array((1, 2)) == torch.tensor((1, 2)): Fail, because the inputs have a different type and we require type equality.
  • np.array((1, 2)) == np.array((1, 2)): Pass.
  • complex(2, 0) == 2: Fail, because we don't support complex tensors right now and torch.as_tensor(complex(2, 0)).dtype == torch.complex64. If we add complex support, it would only pass if check_dtype=False is explicitly passed.
  • 2. == 2: Fail, because the inputs have a different type and we require type equality. This is the only case where I think we could relax the strictness a bit.
  • [1, 2] == torch.tensor((1, 2)): Fail, because the inputs have a different type and we require type equality.
  • 2. == torch.tensor((2.,)): Fail, because the inputs have a different type and we require type equality.
  • 2. == np.array((2.,)): Fail, because the inputs have a different type and we require type equality.
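
Expressed as runnable checks (illustrative only; assumes the entry point and the default strictness described above):

import numpy as np
import torch
import torch.testing

torch.testing.assert_close(np.array((1, 2)), np.array((1, 2)))  # passes: same type
try:
    torch.testing.assert_close([1, 2], torch.tensor((1, 2)))    # fails: type mismatch
except Exception as exc:  # the concrete error type depends on the implementation
    print(type(exc).__name__)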

@mruberry mruberry self-requested a review April 20, 2021 08:52
Collaborator

@mruberry mruberry left a comment


Cool!

pmeier added 4 commits April 20, 2021 13:50
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

[ghstack-poisoned]
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

Differential Revision: [D27903814](https://our.internmc.facebook.com/intern/diff/D27903814)

[ghstack-poisoned]
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

Differential Revision: [D27903814](https://our.internmc.facebook.com/intern/diff/D27903814)

[ghstack-poisoned]
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

Differential Revision: [D27903814](https://our.internmc.facebook.com/intern/diff/D27903814)

[ghstack-poisoned]
@facebook-github-bot
Contributor

@mruberry merged this pull request in 27148db.

@facebook-github-bot facebook-github-bot deleted the gh/pmeier/7/head branch April 28, 2021 14:17
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#55786

Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors by casting the inputs. The approach can easily be extended if we want to support other input types, as long as they can be cast to a tensor.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27903814

Pulled By: mruberry

fbshipit-source-id: fe3d063d0c9513cbd8b3408a2023e94c490c817e