New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the "internal" error type #3123
Conversation
Previews, as seen when this build job started (4cd80ef): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, modulo any possible changes to the error scope API which are pretty much orthogonal.
spec/index.bs
Outdated
[=shader-creation error|shader creation=] of | ||
|descriptor|.{{GPUComputePipelineDescriptor/compute}}.{{GPUProgrammableStage/module}}, or during | ||
the [=pipeline-creation error|pipeline creation=] of |pipeline|: | ||
1. [$generate an operation error$] and make |pipeline| [=invalid=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should also be something in createShaderModule, I think along the lines of "if there's an uncategorized shader error, make the GPUShaderModule invalid without generating any error"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... that would cause create*Pipeline()
to fail validation, unless we had a special case "invalid due to uncategorized error" state. Alternatively, we could check for the errors in the shader modules prior to pipeline validation, but my impression was that we wanted to treat validation errors as "highest priority", and only start looking at whether or not other errors were generated when we knew validation was successful.
Previews, as seen when this build job started (1cc2145): |
After editors discussion on Monday, we determined that there wasn't a solid need for a separate "operation" and "out-of-memory" error type, as the situations in which they could arise were operation specific and didn't overlap. As a result, it's cleaner to simply extend the existing "out-of-memory" errors to encompass the types of uncategorized errors described in the WGSL spec and rename that error type to "internal" to reflect it's new scope. This PR has been updated accordingly. In the future should we see a need to differentiate OOM errors from other types of internal errors explicitly, we can add enums, flags, or other metadata to the |
Notably it didn't seem to make sense to have a generic "internal" error as well as an "out-of-memory" error. It would make more sense to have "pipeline-failure" and "out-of-memory" or just "internal". We went with just "internal" for now because it's easy to expose the difference later as a field of GPUInternalError. |
I think it's useful to preserve out-of-memory whenever we can tell that that's a reasonable possibility. |
@kdashg out of memory errors can only come from createBuffer, createTexture, and createQuerySet. If you capture an internal error from one of those calls then you know it's an out of memory condition. Do you think that doesn't work well? |
Previews, as seen when this build job started (b7d46dd): |
Previews, as seen when this build job started (6f71ca9): |
Updated to add a Also, adding that created just enough clutter in that section that I felt the urge to restructure it for clarity. So now there's one subsection that just describes the different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % nit
system or implementation-specific reason even when all validation requirements have been satisfied. | ||
For example, the operation may exceede the capabilities of the implementation in a way not easily | ||
captured by the [=supported limits=]. The same operation may succeed on other devices or under | ||
difference circumstances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should say that internal errors can only come from certain operations described in the spec, not any arbitrary operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, and also added an algorithm for generating an internal error for future use (and while I was at it cleaned up some of the algorithms to use the new timeline styles.)
Haven't actually added any instances of an operation generating an internal error yet, because this CL is already messy enough as is and I'd like those to be more easily reviewed.
Out of curiosity, what's the expected behavior for an operation that we don't explicitly say can generate internal error experiencing an internal error? Is that just forced to be a lost context every time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize we didn't currently specify that out of memory can come from createBuffer/Texture/QuerySet. That'll be nice to have written down.
Yeah, the behavior I have in mind is:
- Implementation does any heroics it wants to try to do to make the error go away (like waiting for the queue to finish and then deleting any temporary resources) and tries again
- If it still fails, lose the device
Editors meeting: After discussion about usage and addressing concerns raised by @kdashg and others, decided to break up "internal" and "out-of-memory" errors into separate types once again. One motivating example for this change was the idea of developers who may want to capture an entire frame in an OOM error scope and use it as a signal to take some general action to reduce memory use, such as flushing caches. Combining internal and OOM errors would make that more difficult. Change has already been implemented in the PR. |
Previews, as seen when this build job started (573f678): |
Fixes #2308 This error type captures non-validation errors that occur at a system or implementation level. (Such as shaders failing because they are too complex.)
Co-authored-by: Kai Ninomiya <kainino@chromium.org>
Co-authored-by: Kai Ninomiya <kainino@chromium.org>
Previews, as seen when this build job started (94bf9a4): |
This approach was agreed on in today's meeting with the understanding that it satisfies @kdashg's previously voiced requirement (easily differentiating OOM errors). Merging. |
FixesIssue: #2308 (EDIT(@kainino0x): still need to keep that issue open about exposing non-uncategorized errors)This error type captures non-validation errors that occur at a system
or implementation level. (Such as shaders failing because they are
too complex.) Initially only pipeline creation may produce operation
errors.