
Update error messages that use LAPACK error codes #63864

Closed
wants to merge 7 commits into from


IvanYashchuk
Collaborator

@IvanYashchuk IvanYashchuk commented Aug 24, 2021

This PR updates the batchCheckErrors and singleCheckErrors functions so that each error message is defined only once.
The batchCheckErrors function now reuses singleCheckErrors.

Fixes #63220, fixes #59779
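A minimal, self-contained sketch of the consolidation described above, assuming `std::runtime_error` in place of `TORCH_CHECK`. The `fail` helper and the wording of the positive-info message are hypothetical; this is not PyTorch's actual code, only an illustration of the "batched check forwards to the single check" structure.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for TORCH_CHECK(false, ...): throws with the message.
[[noreturn]] inline void fail(const std::string& msg) {
  throw std::runtime_error(msg);
}

// Single source of truth for the error messages. A non-negative batch_id
// inserts a "(Batch element k)" marker after the operator name.
inline void singleCheckErrors(int64_t info, const char* name, int64_t batch_id = -1) {
  std::string batch_str;
  if (batch_id >= 0) {
    batch_str = ": (Batch element " + std::to_string(batch_id) + ")";
  }
  if (info < 0) {
    fail(std::string(name) + batch_str + ": Argument " + std::to_string(-info) +
         " has illegal value.");
  } else if (info > 0) {
    // Hypothetical generic message; the real code picks an op-specific one.
    fail(std::string(name) + batch_str + ": The algorithm failed (info = " +
         std::to_string(info) + ").");
  }
}

// The batched variant defines no messages of its own: it forwards each
// info value, together with its batch index, to singleCheckErrors.
inline void batchCheckErrors(const std::vector<int64_t>& infos, const char* name) {
  for (size_t i = 0; i < infos.size(); ++i) {
    singleCheckErrors(infos[i], name, static_cast<int64_t>(i));
  }
}
```

With this layout, changing a message in `singleCheckErrors` automatically changes it for the batched path as well.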

cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano

@IvanYashchuk IvanYashchuk added the module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul label Aug 24, 2021
@facebook-github-bot
Contributor

facebook-github-bot commented Aug 24, 2021


💊 CI failures summary and remediations

As of commit 332e172 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/1)

Step: "Build"

Aug 26 10:52:37 invalid command 'bad_argument'
Aug 26 10:52:37 + retcode=1
Aug 26 10:52:37 + set -e
Aug 26 10:52:37 + return 1
Aug 26 10:52:37 + [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *libtorch* ]]
Aug 26 10:52:37 + [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *ppc64le* ]]
Aug 26 10:52:37 + [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *clang* ]]
Aug 26 10:52:37 + WERROR=1
Aug 26 10:52:37 + python setup.py bdist_wheel
Aug 26 10:52:38 Building wheel torch-1.10.0a0+git34e9ea3
Aug 26 10:52:38 CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
Aug 26 10:52:38   CMake 3.10 or higher is required.  You are running version 3.5.1
Aug 26 10:52:38 
Aug 26 10:52:38 
Aug 26 10:52:38 -- Configuring incomplete, errors occurred!
Aug 26 10:52:38 -- Building version 1.10.0a0+git34e9ea3
Aug 26 10:52:38 cmake -DBUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-build -DBUILD_PYTHON=True -DBUILD_STATIC_RUNTIME_BENCHMARK=ON -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/var/lib/jenkins/workspace/torch -DCMAKE_PREFIX_PATH=/opt/conda/lib/python3.6/site-packages;/opt/conda -DNUMPY_INCLUDE_DIR=/opt/conda/lib/python3.6/site-packages/numpy/core/include -DOPENSSL_ROOT_DIR=/opt/openssl -DPYTHON_EXECUTABLE=/opt/conda/bin/python -DPYTHON_INCLUDE_DIR=/opt/conda/include/python3.6m -DPYTHON_LIBRARY=/opt/conda/lib/libpython3.6m.so.1.0 -DTORCH_BUILD_VERSION=1.10.0a0+git34e9ea3 -DUSE_GLOO_WITH_OPENSSL=ON -DUSE_LLVM=/opt/llvm -DUSE_NUMPY=True -DWERROR=1 /var/lib/jenkins/workspace
Aug 26 10:52:38 + cleanup
Aug 26 10:52:38 + retcode=1
Aug 26 10:52:38 + set +x
Aug 26 10:52:38 =================== sccache compilation log ===================

This comment was automatically generated by Dr. CI.

Collaborator

@lezcano lezcano left a comment

This is such a great clean-up, thank you!
Just left a suggestion that could help readability.

static inline void singleCheckErrors(int64_t info, const char* name, int64_t batch_id=-1) {
std::stringstream batch_ss;
if (batch_id >= 0) {
batch_ss << ": For batch " << batch_id;
Collaborator

@lezcano lezcano Aug 24, 2021

Suggested change
batch_ss << ": For batch " << batch_id;
batch_ss << ": (Batch element " << batch_id << ") ";

Collaborator

Then all the messages could start with a capital letter (rather than with ": " followed by a lowercase letter).
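A sketch of the message layout suggested here, where the batch marker is followed directly by a message that starts with a capital letter. `formatLapackError` is a hypothetical helper written for illustration, not part of PyTorch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper illustrating the suggested layout:
//   "<name>: (Batch element <k>) <Message starting with a capital letter>"
// For non-batched calls (batch_id < 0) the batch marker is simply omitted.
inline std::string formatLapackError(const std::string& name, int64_t batch_id,
                                     const std::string& message) {
  std::string prefix = name + ": ";
  if (batch_id >= 0) {
    prefix += "(Batch element " + std::to_string(batch_id) + ") ";
  }
  return prefix + message;
}
```

Because the marker ends with ") ", every message body can be written as a normal sentence regardless of whether a batch index is present.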

Collaborator

@xwang233 xwang233 left a comment

LGTM, thanks for the fix!

@codecov

codecov bot commented Aug 24, 2021

Codecov Report

Merging #63864 (0bd907f) into master (c545b09) will decrease coverage by 0.10%.
The diff coverage is n/a.

❗ Current head 0bd907f differs from pull request most recent head 824cd1b. Consider uploading reports for the commit 824cd1b to get more accurate results

@@            Coverage Diff             @@
##           master   #63864      +/-   ##
==========================================
- Coverage   67.13%   67.02%   -0.11%     
==========================================
  Files         691      691              
  Lines       90496    90531      +35     
==========================================
- Hits        60754    60678      -76     
- Misses      29742    29853     +111     

@mruberry
Collaborator

Unfortunately, it looks like the test failures are real.

@IvanYashchuk
Collaborator Author

824cd1b needs to update the tests as well...

Update expected error messages in tests
auto batch_string = batch_ss.str();
if (info < 0) {
TORCH_CHECK(false, name, batch_string,
": Argument ", -info, " has illegal value.");
Collaborator

Would "Item" be more precise than "Argument" here?

Collaborator

(This particular check probably needs a comment, too)

Collaborator

Last question here: is there a test exercising this code path?

Collaborator Author

This should be changed to use TORCH_INTERNAL_ASSERT: a negative info would indicate that the implementation calling LAPACK is wrong, so it should never be negative during normal program execution.
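LAPACK reports status through `INFO`: 0 means success, `-i` means the i-th argument had an illegal value, and a positive value is an operation-specific failure (for example a singular matrix). A sketch of the split proposed here, assuming a hypothetical `InternalError` type as a stand-in for what `TORCH_INTERNAL_ASSERT` reports and `std::runtime_error` as a stand-in for a user-facing `TORCH_CHECK` failure; `checkInfo` is likewise hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a TORCH_INTERNAL_ASSERT failure (a bug in our code).
struct InternalError : std::logic_error {
  using std::logic_error::logic_error;
};

inline void checkInfo(int64_t info, const char* name) {
  if (info < 0) {
    // The LAPACK arguments are chosen by the wrapper, not by the user,
    // so an illegal argument means the calling implementation is wrong.
    throw InternalError(std::string(name) + ": LAPACK argument " +
                        std::to_string(-info) + " has an illegal value; this is a bug.");
  }
  if (info > 0) {
    // A positive info describes the input (e.g. a singular matrix),
    // so it is reported as an ordinary user-facing error.
    throw std::runtime_error(std::string(name) + ": The operation failed (info = " +
                             std::to_string(info) + ").");
  }
}
```

Keeping the two cases in different exception types lets tests and callers tell "your input was bad" apart from "the library has a bug".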

* has been successful (info = 0) or not, and reports in case of the latter.
*/
static inline void singleCheckErrors(int64_t info, const char* name, int64_t batch_id=-1) {
std::stringstream batch_ss;
Collaborator

Because error message construction has bitten us in the past, would it be faster to:

std::string batch_str{""};
if (batch_id >= 0) {
  batch_str = ": (Batch element " + std::to_string(batch_id) + ")";
}

?

Collaborator Author

I don't know which one is faster; I'll use your suggestion.
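A compilable sketch of the two ways of building the batch prefix discussed in this thread. In C++, adding an integer to a string literal is pointer arithmetic, so the plain-string variant must convert `batch_id` with `std::to_string` first. The function names are hypothetical; no timing was measured in the thread, and none is claimed here.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Variant 1: std::stringstream, as in the original patch.
inline std::string batchPrefixStream(int64_t batch_id) {
  std::stringstream ss;
  if (batch_id >= 0) {
    ss << ": (Batch element " << batch_id << ")";
  }
  return ss.str();
}

// Variant 2: plain std::string. `": (Batch element " + batch_id` would
// offset the char pointer, so the integer goes through std::to_string.
inline std::string batchPrefixString(int64_t batch_id) {
  std::string batch_str;
  if (batch_id >= 0) {
    batch_str = ": (Batch element " + std::to_string(batch_id) + ")";
  }
  return batch_str;
}
```

Both produce the same text; the `std::string` variant avoids constructing a stream object in the common non-batched case.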

TORCH_CHECK(false, name, batch_string,
": Argument ", -info, " has illegal value.");
} else if (info > 0) {
if (strstr(name, "inv")) {
Collaborator

How painful, performance-wise, is all this string matching compared to using an enum?

Collaborator

I think strstr is fine here, because we only enter this branch when we are about to raise an exception.

Collaborator Author

I think that, compared to other overheads (dispatcher + input checks), this strstr has a negligible cost, since the strings being searched are small. The fastest approach would almost certainly be a separate function for each error message.
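A side-by-side sketch of the two options debated here: picking the message by substring-matching the operator name (as the patch does with `strstr`) versus passing a hypothetical `LapackOp` enum from each caller. The enum, both function names, and the message texts are illustrative assumptions, not PyTorch code.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Variant 1: choose the message by matching the operator name.
// Only reached once an error is certain, so its cost is off the hot path.
inline std::string messageByName(const char* name) {
  if (std::strstr(name, "inv")) {
    return "The input matrix is singular.";
  }
  if (std::strstr(name, "svd") || std::strstr(name, "eig")) {
    return "The algorithm failed to converge.";
  }
  return "The operation failed.";
}

// Variant 2: a hypothetical enum passed by each caller instead of its name.
enum class LapackOp { Inverse, Svd, Eigh, Other };

inline std::string messageByOp(LapackOp op) {
  switch (op) {
    case LapackOp::Inverse:
      return "The input matrix is singular.";
    case LapackOp::Svd:
    case LapackOp::Eigh:
      return "The algorithm failed to converge.";
    case LapackOp::Other:
      break;
  }
  return "The operation failed.";
}
```

The enum trades a few substring searches on the error path for an extra parameter at every call site; since both run only when an exception is about to be thrown, the difference is mostly about maintainability.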

/*
* Given a vector of int64_t infos, obtained after a batched operation,
* this function checks if the computation over all these batches has been
* successful (info = 0) or not, and reports in case of the latter.
*/
static inline void batchCheckErrors(std::vector<int64_t>& infos, const char* name, bool allow_singular=false) {
static inline void batchCheckErrors(const std::vector<int64_t>& infos, const char* name) {
Collaborator

Nice const addition

* Given an info int, obtained after a single operation, this function checks if the computation
* has been successful (info = 0) or not, and reports in case of the latter.
*/
static inline void singleCheckErrors(int64_t info, const char* name, bool allow_singular=false) {
Collaborator

Cool removal of these old error messages with their deprecated macros

Collaborator

@mruberry mruberry left a comment

Overall looks really good @IvanYashchuk, just a few comments/suggestions inline

@IvanYashchuk
Collaborator Author

I updated the pull request.
@mruberry, there is one unrelated CI failure.

@IvanYashchuk
Collaborator Author

@mruberry, could you please take another look at this PR?

@facebook-github-bot
Contributor

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Collaborator

@mruberry mruberry left a comment

Awesome! Nice consolidation, @IvanYashchuk

@facebook-github-bot
Contributor

@mruberry merged this pull request in 32fbeb1.

@IvanYashchuk IvanYashchuk deleted the fix-lapack-errors branch September 7, 2021 08:31
Balandat added a commit to Balandat/botorch that referenced this pull request Sep 10, 2021
…#940)

Summary:
Pull Request resolved: pytorch#940

pytorch/pytorch#63864 changed the wording of an error that we parse, so we need to update this. This adds a new `_handle_numerical_errors` helper together with a unit test to make sure we catch other upstream changes of the same kind in the future.

The proper solution to this is to use more specific error codes as suggested in pytorch/pytorch#64785.

Reviewed By: sdaulton

Differential Revision: D30860296

fbshipit-source-id: d402aaeeba0f3e09592ba5f28c60cb619ecde979
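The breakage above came from matching on error text. A sketch, in C++ to match the rest of this thread, of why a dedicated error type with a reason code (the direction suggested in pytorch/pytorch#64785) is sturdier than message parsing. `LinearAlgebraError`, its `Reason` enum, and both helper functions are hypothetical; this is neither PyTorch's nor BoTorch's API.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical error type carrying a machine-readable reason, so callers
// can react to it without parsing the human-readable message.
struct LinearAlgebraError : std::runtime_error {
  enum class Reason { Singular, NotConverged };
  LinearAlgebraError(Reason r, const std::string& msg)
      : std::runtime_error(msg), reason(r) {}
  Reason reason;
};

// Fragile: silently stops working as soon as the upstream wording changes.
inline bool isSingularByMessage(const std::runtime_error& e) {
  return std::string(e.what()).find("singular") != std::string::npos;
}

// Robust: depends only on the error type and its reason code.
inline bool isSingularByType(const std::exception& e) {
  const auto* la = dynamic_cast<const LinearAlgebraError*>(&e);
  return la != nullptr && la->reason == LinearAlgebraError::Reason::Singular;
}
```

If the message is reworded (say, from "singular" to "not invertible"), `isSingularByMessage` quietly returns false, while `isSingularByType` keeps working.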
Balandat added a commit to Balandat/botorch that referenced this pull request Sep 10, 2021
facebook-github-bot pushed a commit to pytorch/botorch that referenced this pull request Sep 11, 2021
saitcakmak pushed a commit to saitcakmak/botorch that referenced this pull request Dec 7, 2021
Labels
cla signed Merged module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul open source
Development

Successfully merging this pull request may close these issues.

SVD error message is not correct
The error message for batched inputs to torch.linalg.eigh is wrong
6 participants