-
Notifications
You must be signed in to change notification settings - Fork 22.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update error messages that use LAPACK error codes #63864
Conversation
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 332e172 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages: pytorch_linux_xenial_py3_6_gcc5_4_build (1/1)Step: "Build" (full log | diagnosis details | 🔁 rerun)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is such a great clean-up, thank you!
Just left a suggestion that could help readability.
static inline void singleCheckErrors(int64_t info, const char* name, int64_t batch_id=-1) { | ||
std::stringstream batch_ss; | ||
if (batch_id >= 0) { | ||
batch_ss << ": For batch " << batch_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batch_ss << ": For batch " << batch_id; | |
batch_ss << ": (Batch element " << batch_id << ") "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, all the messages could start with capital letter (rather than with a ": " + lower case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the fix!
Codecov Report
@@ Coverage Diff @@
## master #63864 +/- ##
==========================================
- Coverage 67.13% 67.02% -0.11%
==========================================
Files 691 691
Lines 90496 90531 +35
==========================================
- Hits 60754 60678 -76
- Misses 29742 29853 +111 |
Unfortunately it looks like the test failures are real |
824cd1b needs to update the tests as well... |
Update expected error messages in tests
1a4351e
to
f38edaa
Compare
auto batch_string = batch_ss.str(); | ||
if (info < 0) { | ||
TORCH_CHECK(false, name, batch_string, | ||
": Argument ", -info, " has illegal value."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More precisely would this be "Item" instead of "Argument"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This particular check probably needs a comment, too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last question here: is there a test exercising this code path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be changed to use TORCH_INTERNAL_ASSERT
, negative info
would indicate that the implementation calling LAPACK is wrong, it should never be negative in normal program execution.
* has been successful (info = 0) or not, and report in case of the latter. | ||
*/ | ||
static inline void singleCheckErrors(int64_t info, const char* name, int64_t batch_id=-1) { | ||
std::stringstream batch_ss; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because error message construction has bit us in the past would it be faster to:
std::string batch_str{""};
if (batch_id >= 0) {
batch_str = ": (Batch element " + batch_id + ")";
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know which one is faster, I'll use your suggestion.
TORCH_CHECK(false, name, batch_string, | ||
": Argument ", -info, " has illegal value."); | ||
} else if (info > 0) { | ||
if (strstr(name, "inv")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How perf painful is all this string matching vs. using an enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think strstr
here is fine, because we just enter this if whenever we are about to raise an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think compared to other overheads (dispatcher + input checks), this strstr
has a negligible cost, the strings to search the substring in are small. Most certainly the fastest way would be to use a separate function for each error message.
/* | ||
* Given a vector of int64_t infos, obtained after a batch operations, | ||
* this function checks if the computation over all these batches has been | ||
* successful (info = 0) or not, and report in case of the latter. | ||
*/ | ||
static inline void batchCheckErrors(std::vector<int64_t>& infos, const char* name, bool allow_singular=false) { | ||
static inline void batchCheckErrors(const std::vector<int64_t>& infos, const char* name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice const addition
* Given a info int, obtained after a single operation, this function check if the computation | ||
* has been successful (info = 0) or not, and report in case of the latter. | ||
*/ | ||
static inline void singleCheckErrors(int64_t info, const char* name, bool allow_singular=false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool removal of these old error messages with their deprecated macros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks really good @IvanYashchuk, just a few comments/suggestions inline
I updated the pull request. |
@mruberry, could you please take another look at this PR? |
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Nice consolidation, @IvanYashchuk
…#940) Summary: Pull Request resolved: pytorch#940 pytorch/pytorch#63864 changed the wording of an error that we parse, so we need to update this. This adds a new `_handle_numerical_errors` helper together with a unit test to make sure we catch other upstream changes of the same kind in the future. The proper solution to this is to use more specific error codes as suggested in pytorch/pytorch#64785. Reviewed By: sdaulton Differential Revision: D30860296 fbshipit-source-id: d402aaeeba0f3e09592ba5f28c60cb619ecde979
…#940) Summary: Pull Request resolved: pytorch#940 pytorch/pytorch#63864 changed the wording of an error that we parse, so we need to update this. This adds a new `_handle_numerical_errors` helper together with a unit test to make sure we catch other upstream changes of the same kind in the future. The proper solution to this is to use more specific error codes as suggested in pytorch/pytorch#64785. Reviewed By: sdaulton Differential Revision: D30860296 fbshipit-source-id: 28adab90bdd4cd5e555db440c7012db1967b148d
Summary: Pull Request resolved: #940 pytorch/pytorch#63864 changed the wording of an error that we parse, so we need to update this. This adds a new `_handle_numerical_errors` helper together with a unit test to make sure we catch other upstream changes of the same kind in the future. The proper solution to this is to use more specific error codes as suggested in pytorch/pytorch#64785. Reviewed By: sdaulton Differential Revision: D30860296 fbshipit-source-id: 869da96119dde2bedf12273a138af900ddc5eea2
…#940) Summary: Pull Request resolved: pytorch#940 pytorch/pytorch#63864 changed the wording of an error that we parse, so we need to update this. This adds a new `_handle_numerical_errors` helper together with a unit test to make sure we catch other upstream changes of the same kind in the future. The proper solution to this is to use more specific error codes as suggested in pytorch/pytorch#64785. Reviewed By: sdaulton Differential Revision: D30860296 fbshipit-source-id: 869da96119dde2bedf12273a138af900ddc5eea2
This PR updates the
batchCheckErrors
andsingleCheckErrors
functions so that the error messages are defined only once.batchCheckErrors
function reusessingleCheckErrors
now.Fixes #63220, fixes #59779
cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano