Update error messages that use LAPACK error codes #63864

IvanYashchuk · 2021-08-24T14:49:47Z

This PR updates the batchCheckErrors and singleCheckErrors functions so that each error message is defined only once.
The batchCheckErrors function now reuses singleCheckErrors.

Fixes #63220, fixes #59779

cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano

facebook-github-bot · 2021-08-24T14:49:57Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/63864
📄 Preview docs built from this PR

💊 CI failures summary and remediations

As of commit 332e172 (more details on the Dr. CI page):

1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_linux_xenial_py3_6_gcc5_4_build (1/1)

Step: "Build"

Aug 26 10:52:38 CMake Error at CMakeLists.txt:1 (cmake_minimum_required):

Aug 26 10:52:37 invalid command 'bad_argument'
Aug 26 10:52:37 + retcode=1
Aug 26 10:52:37 + set -e
Aug 26 10:52:37 + return 1
Aug 26 10:52:37 + [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *libtorch* ]]
Aug 26 10:52:37 + [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *ppc64le* ]]
Aug 26 10:52:37 + [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *clang* ]]
Aug 26 10:52:37 + WERROR=1
Aug 26 10:52:37 + python setup.py bdist_wheel
Aug 26 10:52:38 Building wheel torch-1.10.0a0+git34e9ea3
Aug 26 10:52:38 CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
Aug 26 10:52:38   CMake 3.10 or higher is required.  You are running version 3.5.1
Aug 26 10:52:38 
Aug 26 10:52:38 
Aug 26 10:52:38 -- Configuring incomplete, errors occurred!
Aug 26 10:52:38 -- Building version 1.10.0a0+git34e9ea3
Aug 26 10:52:38 cmake -DBUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-build -DBUILD_PYTHON=True -DBUILD_STATIC_RUNTIME_BENCHMARK=ON -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/var/lib/jenkins/workspace/torch -DCMAKE_PREFIX_PATH=/opt/conda/lib/python3.6/site-packages;/opt/conda -DNUMPY_INCLUDE_DIR=/opt/conda/lib/python3.6/site-packages/numpy/core/include -DOPENSSL_ROOT_DIR=/opt/openssl -DPYTHON_EXECUTABLE=/opt/conda/bin/python -DPYTHON_INCLUDE_DIR=/opt/conda/include/python3.6m -DPYTHON_LIBRARY=/opt/conda/lib/libpython3.6m.so.1.0 -DTORCH_BUILD_VERSION=1.10.0a0+git34e9ea3 -DUSE_GLOO_WITH_OPENSSL=ON -DUSE_LLVM=/opt/llvm -DUSE_NUMPY=True -DWERROR=1 /var/lib/jenkins/workspace
Aug 26 10:52:38 + cleanup
Aug 26 10:52:38 + retcode=1
Aug 26 10:52:38 + set +x
Aug 26 10:52:38 =================== sccache compilation log ===================

This comment was automatically generated by Dr. CI.

lezcano

This is such a great clean-up, thank you!
Just left a suggestion that could help readability.

lezcano · 2021-08-24T15:14:02Z

aten/src/ATen/native/LinearAlgebraUtils.h

+static inline void singleCheckErrors(int64_t info, const char* name, int64_t batch_id=-1) {
+  std::stringstream batch_ss;
+  if (batch_id >= 0) {
+    batch_ss << ": For batch " << batch_id;


Suggested change

-    batch_ss << ": For batch " << batch_id;
+    batch_ss << ": (Batch element " << batch_id << ") ";

Then all the messages could start with a capital letter (rather than with ": " followed by a lowercase letter).

xwang233

LGTM, thanks for the fix!

codecov · 2021-08-24T18:43:11Z

Codecov Report

Merging #63864 (0bd907f) into master (c545b09) will decrease coverage by 0.10%.
The diff coverage is n/a.

❗ Current head 0bd907f differs from pull request most recent head 824cd1b. Consider uploading reports for the commit 824cd1b to get more accurate results

@@            Coverage Diff             @@
##           master   #63864      +/-   ##
==========================================
- Coverage   67.13%   67.02%   -0.11%     
==========================================
  Files         691      691              
  Lines       90496    90531      +35     
==========================================
- Hits        60754    60678      -76     
- Misses      29742    29853     +111

mruberry · 2021-08-25T14:04:59Z

Unfortunately, it looks like the test failures are real.

IvanYashchuk · 2021-08-25T14:08:18Z

824cd1b needs to update the tests as well...

Update expected error messages in tests

mruberry · 2021-08-26T04:07:32Z

aten/src/ATen/native/LinearAlgebraUtils.h

+  auto batch_string = batch_ss.str();
+  if (info < 0) {
+    TORCH_CHECK(false, name, batch_string,
+        ": Argument ", -info, " has illegal value.");


More precisely, would this be "Item" instead of "Argument"?

(This particular check probably needs a comment, too.)

Last question here: is there a test exercising this code path?

This should be changed to use TORCH_INTERNAL_ASSERT. A negative info would indicate that the implementation calling LAPACK is wrong; it should never be negative in normal program execution.
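To illustrate the distinction discussed in this thread, here is a minimal standalone sketch. It does not use PyTorch: `InternalError` and `check_lapack_info` are hypothetical names, with `InternalError` standing in for what TORCH_INTERNAL_ASSERT would raise and `std::runtime_error` standing in for a user-facing TORCH_CHECK failure. The message texts are placeholders. It relies only on the LAPACK convention that a negative info reports an illegal argument and a positive info reports a routine-specific failure.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an internal assertion failure: it signals a bug
// in the code that calls LAPACK, not a problem with the user's input.
struct InternalError : std::logic_error {
  using std::logic_error::logic_error;
};

// Hypothetical helper, not PyTorch's implementation.
inline void check_lapack_info(int64_t info, const std::string& name) {
  if (info < 0) {
    // LAPACK convention: info = -i means the i-th argument was illegal.
    // The wrapper chose those arguments, so this is an internal error.
    throw InternalError(name + ": Argument " + std::to_string(-info) +
                        " has illegal value. Please report this as a bug.");
  }
  if (info > 0) {
    // Positive info is a routine-specific numerical failure that can be
    // triggered by valid calls on difficult inputs, so it is user-facing.
    throw std::runtime_error(name + ": The computation failed (info = " +
                             std::to_string(info) + ").");
  }
  // info == 0: success, nothing to report.
}
```

With this split, a caller that catches `std::runtime_error` to handle numerical failures will not accidentally swallow a wrapper bug, because `InternalError` derives from `std::logic_error`.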

mruberry · 2021-08-26T04:12:36Z

aten/src/ATen/native/LinearAlgebraUtils.h

+ * has been successful (info = 0) or not, and report in case of the latter.
+ */
+static inline void singleCheckErrors(int64_t info, const char* name, int64_t batch_id=-1) {
+  std::stringstream batch_ss;


Because error message construction has bitten us in the past, would it be faster to do the following?

std::string batch_str;
if (batch_id >= 0) {
  batch_str = ": (Batch element " + std::to_string(batch_id) + ")";
}

I don't know which one is faster; I'll use your suggestion.
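For comparison, the two ways of building the batch prefix discussed here can be sketched side by side. The function names `batch_prefix_stream` and `batch_prefix_string` are hypothetical and exist only for this illustration. No claim is made about which variant is faster; that would need a benchmark.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Variant 1: stream-based construction, as in the original diff.
inline std::string batch_prefix_stream(int64_t batch_id) {
  std::stringstream ss;
  if (batch_id >= 0) {
    ss << ": (Batch element " << batch_id << ")";
  }
  return ss.str();
}

// Variant 2: plain std::string concatenation. The integer must be converted
// with std::to_string; adding it directly to a string literal would be
// pointer arithmetic rather than concatenation.
inline std::string batch_prefix_string(int64_t batch_id) {
  std::string s;
  if (batch_id >= 0) {
    s = ": (Batch element " + std::to_string(batch_id) + ")";
  }
  return s;
}
```

Both variants return an empty string for a negative `batch_id`, which matches the default `batch_id=-1` in the signature from the diff.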

mruberry · 2021-08-26T04:15:44Z

aten/src/ATen/native/LinearAlgebraUtils.h

+    TORCH_CHECK(false, name, batch_string,
+        ": Argument ", -info, " has illegal value.");
+  } else if (info > 0) {
+    if (strstr(name, "inv")) {


How perf painful is all this string matching vs. using an enum?

I think strstr here is fine, because we only enter this branch when we are about to raise an exception.

I think that, compared to other overheads (dispatcher + input checks), this strstr has a negligible cost, since the strings being searched are small. The fastest way would most certainly be to use a separate function for each error message.
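The enum alternative raised in this thread could look roughly like the sketch below. Everything here is hypothetical: `LinalgOp`, `describe_by_name`, and `describe_by_enum` are not PyTorch names, and the returned texts are placeholders rather than the real error messages. Only the `strstr(name, "inv")` test is taken from the diff.

```cpp
#include <cstring>
#include <string>

// Hypothetical operation tag; the PR itself keeps passing the name as const char*.
enum class LinalgOp { Inverse, Cholesky, Other };

// String-matching dispatch in the spirit of the strstr(name, "inv") check.
inline std::string describe_by_name(const char* name) {
  if (std::strstr(name, "inv") != nullptr) {
    return "input matrix is singular";
  }
  if (std::strstr(name, "cholesky") != nullptr) {
    return "input matrix is not positive-definite";
  }
  return "algorithm failed";
}

// Enum dispatch: no substring search, and a new enumerator that lacks a
// case can be flagged by compiler warnings instead of silently falling through.
inline std::string describe_by_enum(LinalgOp op) {
  switch (op) {
    case LinalgOp::Inverse:
      return "input matrix is singular";
    case LinalgOp::Cholesky:
      return "input matrix is not positive-definite";
    case LinalgOp::Other:
      break;
  }
  return "algorithm failed";
}
```

As noted in the replies, either dispatch runs only on the failure path, so the choice is more about maintainability than speed.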

mruberry · 2021-08-26T04:17:24Z

aten/src/ATen/native/LinearAlgebraUtils.h

 /*
 * Given a vector of int64_t infos, obtained after a batch operations,
 * this function checks if the computation over all these batches has been
 * successful (info = 0) or not, and report in case of the latter.
 */
-static inline void batchCheckErrors(std::vector<int64_t>& infos, const char* name, bool allow_singular=false) {
+static inline void batchCheckErrors(const std::vector<int64_t>& infos, const char* name) {


Nice const addition
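A minimal sketch of how a batch check can reuse the single check, as the PR description states, is shown below. It is a simplified standalone stand-in, not the code from LinearAlgebraUtils.h: `single_check` and `batch_check` are hypothetical names, `std::runtime_error` replaces the Torch macros, and the message text is a placeholder.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for singleCheckErrors: throws if info is nonzero and
// includes the batch index in the message when one is given.
inline void single_check(int64_t info, const char* name, int64_t batch_id = -1) {
  if (info == 0) {
    return;  // success
  }
  std::string prefix = name;
  if (batch_id >= 0) {
    prefix += ": (Batch element " + std::to_string(batch_id) + ")";
  }
  throw std::runtime_error(prefix + ": LAPACK returned info = " +
                           std::to_string(info));
}

// Simplified stand-in for batchCheckErrors: the messages live only in
// single_check, and this loop just forwards each info with its index.
// Taking the vector by const reference documents that it is not modified.
inline void batch_check(const std::vector<int64_t>& infos, const char* name) {
  for (std::size_t i = 0; i < infos.size(); ++i) {
    single_check(infos[i], name, static_cast<int64_t>(i));
  }
}
```

In this sketch the first failing batch element is the one reported, because the loop throws as soon as it sees a nonzero info.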

mruberry · 2021-08-26T04:17:41Z

aten/src/ATen/native/LinearAlgebraUtils.h

- * Given a info int, obtained after a single operation, this function check if the computation
- * has been successful (info = 0) or not, and report in case of the latter.
- */
-static inline void singleCheckErrors(int64_t info, const char* name, bool allow_singular=false) {


Cool removal of these old error messages with their deprecated macros

aten/src/ATen/native/BatchLinearAlgebra.cpp

mruberry

Overall looks really good @IvanYashchuk, just a few comments/suggestions inline

IvanYashchuk · 2021-08-26T19:20:54Z

I updated the pull request.
@mruberry, there is one CI failure that is not related.

IvanYashchuk · 2021-08-31T10:19:17Z

@mruberry, could you please take another look at this PR?

facebook-github-bot · 2021-08-31T16:50:14Z

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mruberry

Awesome! Nice consolidation, @IvanYashchuk

facebook-github-bot · 2021-09-07T07:07:03Z

@mruberry merged this pull request in 32fbeb1.

…#940)

Summary: Pull Request resolved: pytorch#940

pytorch/pytorch#63864 changed the wording of an error that we parse, so we need to update this. This adds a new `_handle_numerical_errors` helper together with a unit test to make sure we catch other upstream changes of the same kind in the future.

The proper solution to this is to use more specific error codes as suggested in pytorch/pytorch#64785.

Reviewed By: sdaulton
Differential Revision: D30860296
fbshipit-source-id: d402aaeeba0f3e09592ba5f28c60cb619ecde979

IvanYashchuk added 2 commits August 24, 2021 09:16

Update error messages for positive lapack error codes

efd29ed

Add svd error test case

0bd907f

IvanYashchuk added the module: linear algebra label Aug 24, 2021

IvanYashchuk requested review from lezcano and mruberry August 24, 2021 14:49

IvanYashchuk requested a review from nikitaved as a code owner August 24, 2021 14:49

facebook-github-bot added the cla signed label Aug 24, 2021

pytorchbot added the open source label Aug 24, 2021

lezcano approved these changes Aug 24, 2021

xwang233 approved these changes Aug 24, 2021

Start with capital letters

f38edaa

Update expected error messages in tests

IvanYashchuk force-pushed the fix-lapack-errors branch from 1a4351e to f38edaa August 25, 2021 15:04

mruberry reviewed Aug 26, 2021

mruberry reviewed Aug 26, 2021

IvanYashchuk added 4 commits August 26, 2021 04:58

Fix tests

907481d

Use TORCH_INTERNAL_ASSERT instead of TORCH_CHECK for info<0

c883544

Use string instead of stringstream

da13050

File an issue and link in a TODO comment

332e172

mruberry approved these changes Aug 31, 2021

facebook-github-bot closed this in 32fbeb1 Sep 7, 2021

facebook-github-bot added the Merged label Sep 7, 2021

IvanYashchuk deleted the fix-lapack-errors branch September 7, 2021 08:31

sdaulton mentioned this pull request Sep 9, 2021

[Bug] Different nan handling under GPU and CPU cornellius-gp/gpytorch#1747

Open

dme65 mentioned this pull request Sep 9, 2021

Fix Cholesky error handling pyro-ppl/pyro#2927

Merged

lezcano mentioned this pull request Sep 10, 2021

Feature request: Use LinAlgError instead of raising RuntimeError for linalg operations #64785

Closed

Balandat mentioned this pull request Sep 10, 2021

Update error handling to new pytorch linear algebra messages pytorch/botorch#940

Closed
