ErrorHandling.md: tryCreate* #197
Conversation
Not completely convinced of this direction. Once memory's filled up, any tiny allocation can cause an allocation failure. In some apps it won't be the big texture or buffer allocation, but some other smaller allocation which provokes out-of-memory.
Upon further thought I think some way of listening for a certain kind of error, like "OUT_OF_MEMORY", and being able to recover from it is a more robust approach.
However I recognize that one would not necessarily easily be able to tell which objects failed allocation and attempt their re-allocation after deleting other objects (perhaps purging a tile / texture cache).
Adding Promise-based versions of just these two APIs seems pretty asymmetric. Also, I don't think we should add Promise-based versions of all of the allocation APIs.
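As a sketch of the "listen for a certain kind of error" idea: the shape below is entirely hypothetical (none of these names are in the spec); it only illustrates how an app might register a recovery handler, e.g. one that purges a tile/texture cache, for a specific error kind.

```typescript
// Hypothetical error kinds and dispatcher; illustrative only, not WebGPU API.
type GpuErrorKind = "OUT_OF_MEMORY" | "VALIDATION";

class ErrorDispatcher {
  private handlers = new Map<GpuErrorKind, () => void>();

  // App registers a recovery handler for a given error kind.
  on(kind: GpuErrorKind, handler: () => void): void {
    this.handlers.set(kind, handler);
  }

  // Returns true if a handler ran (the app may then retry allocations),
  // false if the error was unhandled (treat as fatal / log it).
  dispatch(kind: GpuErrorKind): boolean {
    const handler = this.handlers.get(kind);
    if (handler) {
      handler();
      return true;
    }
    return false;
  }
}
```

Usage would look like `dispatcher.on("OUT_OF_MEMORY", () => textureCache.purge())`, leaving the re-allocation of the specific failed objects to the app, which is exactly the difficulty noted above.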
Generally, I agree with what you're saying. However, I don't think #196 is the right mechanism through which to handle it: I'd like to keep it very specifically targeted at telemetry-style error logging. Aside from maybe the asynchronous nature of:

```typescript
class BufferManager {
  device: GPUDevice;
  // hypothetical max heap which is ordered on a numeric "niceness" key
  buffers: MaxHeap<number, GPUBuffer> = new MaxHeap<number, GPUBuffer>();

  async alloc(nice: number, desc: GPUBufferDescriptor): Promise<GPUBuffer | null> {
    let buffer: GPUBuffer | null = null;
    while (true) {
      buffer = await this.device.tryCreateBuffer(desc);
      if (buffer) { break; }
      // Evict the least important ("nicest") buffer and retry.
      const [nicestNice, nicestBuffer] = this.buffers.pop();
      if (nicestNice <= 0) { throw "OOM"; }
      if (nicestNice <= nice) { return null; }
      nicestBuffer.destroy();
      nicestBuffer.destroyed = true;
    }
    buffer.destroyed = false;
    this.buffers.push(nice, buffer);
    return buffer;
  }
}
```

It's not as ideal as what could be done with a memory heap, but it's something.
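The `MaxHeap` used above is hypothetical; no such type exists in TypeScript's standard library. A minimal sketch of one, keyed on a numeric priority so `pop()` always returns the entry with the largest key, could look like:

```typescript
// Minimal binary max-heap of [key, value] pairs; pop() returns the pair
// with the largest key. Illustrative sketch, not production code.
class MaxHeap<K extends number, V> {
  private items: Array<[K, V]> = [];

  push(key: K, value: V): void {
    this.items.push([key, value]);
    // Bubble the new entry up until the heap property holds.
    let i = this.items.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.items[parent][0] >= this.items[i][0]) break;
      [this.items[parent], this.items[i]] = [this.items[i], this.items[parent]];
      i = parent;
    }
  }

  pop(): [K, V] {
    const top = this.items[0];
    const last = this.items.pop()!;
    if (this.items.length > 0) {
      this.items[0] = last;
      // Sift the moved entry down until the heap property holds.
      let i = 0;
      for (;;) {
        const l = 2 * i + 1, r = 2 * i + 2;
        let largest = i;
        if (l < this.items.length && this.items[l][0] > this.items[largest][0]) largest = l;
        if (r < this.items.length && this.items[r][0] > this.items[largest][0]) largest = r;
        if (largest === i) break;
        [this.items[i], this.items[largest]] = [this.items[largest], this.items[i]];
        i = largest;
      }
    }
    return top;
  }

  get size(): number { return this.items.length; }
}
```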
I don't think there are any other APIs that do large allocations. I think(?) it's safe to consider OOM on e.g. sampler allocation to be catastrophic (i.e. device loss - but we could be more forgiving than that).
Commented on the other PR, but to continue the conversation here: out-of-memory is the main kind of error that applications will likely try to recover from. Does it seem like a good idea to make the recovery from this error robust? That implies that it should be possible to recover from allocation failures of all types of GPU objects.
I think what you're suggesting is that if an app sees OOM on a small (e.g. sampler) allocation, then it should be able to free up a large texture and try again. Maybe that is true. I have two concerns: (1) if the driver really gave us OOM for that tiny allocation, its memory is probably in pretty bad condition, and I'm not sure we should trust it to still be in a usable state; (2) is it even likely that freeing a large allocation will fix the small allocation? Only if the driver's OOM error was really out-of-GPU-memory and not some other strange case (like running out of slots in an internal table). I don't think it would be good to put applications in a state where they progressively (slowly) kill all their texture resources because of a sampler allocation failure, when really something more subtle was going on.
design/ErrorHandling.md
Outdated
> E.g. create a large buffer, create a bind group from that, create a command buffer from that, then choose whether to submit based on whether the buffer was successfully allocated.
> - If yes, `tryCreateBuffer` must return `GPUBuffer` and error log entries must not be generated when creating objects from invalid objects.
>   (Only log errors on queue.submit and other device/queue level operations.)
> - If no, `tryCreateBuffer` should return `Promise<GPUBuffer>` and error log entries should be generated when creating objects from invalid objects.
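The "create eagerly, decide at submit time" flow quoted above can be sketched with a mocked device (the mock's methods only imitate the proposed semantics; they are not the real WebGPU interface):

```typescript
// Mock of the proposed semantics: tryCreateBuffer resolves to null on OOM,
// and creating objects from a null/invalid buffer yields an invalid object
// without generating an error log entry (contagious nullability).
interface MockBuffer { valid: boolean; }
interface MockBindGroup { valid: boolean; }

class MockDevice {
  constructor(private oom: boolean) {}

  async tryCreateBuffer(byteLength: number): Promise<MockBuffer | null> {
    return this.oom ? null : { valid: true };
  }

  createBindGroup(buffer: MockBuffer | null): MockBindGroup {
    // Cheap to call even with a failed buffer; result is just invalid.
    return { valid: buffer !== null && buffer.valid };
  }
}

// Build everything up front, then only "submit" if the big allocation
// actually succeeded; on failure the caller can purge caches and retry.
async function buildAndMaybeSubmit(device: MockDevice): Promise<boolean> {
  const buffer = await device.tryCreateBuffer(16 * 1024 * 1024);
  const bindGroup = device.createBindGroup(buffer);
  if (buffer === null) {
    return false; // skipped submit; recoverable
  }
  return bindGroup.valid; // would call queue.submit(...) here
}
```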
This seems to be a much simpler approach, even if it requires some extra asynchronicity from the application.
Looks good. Like @kainino0x said, texture and buffer OOM would primarily be caused by running out of GPU memory, which is recoverable. Other types of OOM could be caused by running out of CPU memory, space in hardware tables, etc., and don't seem like they could be handled in any useful way by applications.
I have mixed feelings about this change. It looks like the API is getting split between CIN (contagious internal nullability) and promises now, and it makes it somewhat inelegant.
If we are talking about a case where the user creates a resource that they want to handle a failure for, wouldn't a promise be semi-equivalent to creating it as CIN and then checking the status explicitly before submitting any work based on it?
Note: this PR also removes the object status query. The rationale for this is that the query was only useful for detecting validation errors and device loss:
- Validation errors should not be detected this way, because they (hopefully) indicate programming errors, so they are surfaced through the error log instead (#196).
- Device loss applies to all objects on a device, and is handled through a separate mechanism (#198), so it's not useful to separately know that an individual object on the device is invalid.
> - Should `tryCreate*` (and `createReady*Pipeline`) resolve to invalid objects on validation failure and device loss, instead of rejecting?
>   This simplifies things slightly by avoiding giving useless extra info to the app via "reject" (which they would ignore/noop anyway).
>   Instead these Promises would never reject.
> - Tentatively resolved: yes.
If there are any further comments on my proposed resolve/reject semantics of tryCreate/createReady, this thread would be a good place.
Yeah, pretty much. However, I think having a version that returns a Promise is a simpler API than adding a status flag. I think changing it to a status flag would only be functionally different if the status flag were synchronous.
Hopefully those bits will make sense.
Note that these errors are already defined at https://heycam.github.io/webidl/#idl-DOMException-error-names

```js
const gpuAdapter = await gpu.requestAdapter({ /* options */ });
const gpuDevice = await gpuAdapter.requestDevice({ /* options */ });
try {
  const buffer = gpuDevice.requestBuffer({ /* options */ });
  // TODO: Play with buffer.
} catch (error) {
  if (error.name === "NotSupportedError") {
    console.log(error.message);
    // ...
  } else if (error.name === "UnknownError") {
    // ...
  } else {
    // ...
  }
  // TODO: Try creating another buffer
}
```
I'm slightly concerned about greater usage of
Compared with the weight of allocating/freeing an actual GPU buffer or texture resource, I'm not particularly concerned about the weight of the Promise. The WASM problem is not really solved yet (for fallible allocations). Right now we can't even pass objects between threads while a WASM thread is blocking, so we have a lot to figure out. (My nested event loop proposal would solve both.) I don't have time to think about it right now, but hopefully we can discuss it in the next meeting.
Discussed at the 11 Feb 2019 WebGPU Meeting
```webidl
partial interface GPUDevice {
    GPUObjectStatusQuery getObjectStatus(StatusableObject object);
    Promise<GPUBuffer?> tryCreateBuffer(GPUBufferDescriptor descriptor);
};
```
Why is it nullable? Shouldn't the API just reject the promise?
I talked about this in my last comment, just above.
It could reject, but I currently have a strong preference for distinguishing it via returning null.
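A caller-side sketch of that distinction, using a mocked device (the mock only imitates the proposed `Promise<GPUBuffer?>` semantics and is not the real API): `null` means recoverable OOM worth retrying after freeing memory, while validation failures resolve to an invalid object and are reported via the error log instead.

```typescript
type MockBuffer = { valid: boolean };

// Mock of the proposed tryCreateBuffer semantics: the Promise never
// rejects; it resolves to null on OOM, and to an invalid object on
// validation error.
class MockDevice {
  constructor(private outcome: "ok" | "oom" | "validation") {}

  async tryCreateBuffer(): Promise<MockBuffer | null> {
    if (this.outcome === "oom") return null;
    return { valid: this.outcome === "ok" };
  }
}

async function allocateWithRetry(
  device: MockDevice,
  freeSomeMemory: () => boolean, // returns false when nothing is left to free
): Promise<MockBuffer | null> {
  for (;;) {
    const buffer = await device.tryCreateBuffer();
    if (buffer !== null) {
      // May still be invalid (validation error): that's a programming
      // error surfaced via the error log, not something to retry here.
      return buffer;
    }
    if (!freeSomeMemory()) return null; // truly out of memory
  }
}
```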
Closing this for now because it will interact with error-checking-for-testing.
FTR: Supplanted by #215
Provides a simpler API for fallible allocation of GPUBuffers and GPUTextures.
EDIT: Note this removes the object status query too.