
Conversation

@kainino0x (Contributor) commented on Jan 31, 2019

Provides a simpler API for fallible allocation of GPUBuffers and GPUTextures.

EDIT: Note this removes the object status query too.

@kenrussell (Member) left a comment

Not completely convinced of this direction. Once memory's filled up, any tiny allocation can cause an allocation failure. In some apps it won't be the big texture or buffer allocation, but some other smaller allocation which provokes out-of-memory.

Upon further thought I think some way of listening for a certain kind of error, like "OUT_OF_MEMORY", and being able to recover from it is a more robust approach.

However, I recognize that it would not necessarily be easy to tell which objects failed allocation and to re-attempt their allocation after deleting other objects (perhaps by purging a tile/texture cache).

Adding Promise-based versions of just these two APIs seems pretty asymmetric. Also, I don't think we should add Promise-based versions of all of the allocation APIs.
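(For illustration only: a minimal sketch of the "listen for OUT_OF_MEMORY and recover" style described above. `onOutOfMemory`, `tileCache`, and `retryAllocation` are all hypothetical names, not part of any proposal here.)

```ts
// Hypothetical: a device-level listener that reports which object failed to allocate.
device.onOutOfMemory = (failed: GPUBuffer | GPUTexture) => {
  // Free something recoverable first, e.g. purge a tile/texture cache...
  tileCache.purge();
  // ...then retry the failed allocation. Knowing *which* object failed, and
  // re-creating everything built on top of it, is the hard part noted above.
  retryAllocation(failed);
};
```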

@kainino0x (Contributor, Author) commented on Feb 1, 2019

Generally, I agree with what you're saying. However, I don't think #196 is the right mechanism through which to handle it: I'd like to keep it very specifically targeted at telemetry-style error logging.

Aside from maybe the asynchronous nature of tryCreate*, I think it more or less provides the primitives an app would need to do this. Here's a sketch of something an app could do:

```ts
// Sketch only: MaxHeap is a hypothetical max-heap ordered on its number key,
// and tryCreateBuffer is the API proposed in this PR.
type ManagedBuffer = GPUBuffer & { destroyed?: boolean };

class BufferManager {
  device: GPUDevice;
  // Keyed on "nice": higher values are evicted first under memory pressure.
  buffers: MaxHeap<number, ManagedBuffer> = new MaxHeap<number, ManagedBuffer>();

  async alloc(nice: number, desc: GPUBufferDescriptor): Promise<ManagedBuffer | null> {
    let buffer: ManagedBuffer | null = null;
    while (true) {
      buffer = await this.device.tryCreateBuffer(desc);
      if (buffer) { break; }

      // Allocation failed: evict the most evictable buffer we hold and retry.
      const [nicestNice, nicestBuffer] = this.buffers.pop();
      if (nicestNice <= 0) { throw "OOM"; }  // nothing evictable remains
      if (nicestNice <= nice) {
        this.buffers.push(nicestNice, nicestBuffer);  // keep it; don't evict peers
        return null;                                  // caller isn't "nicer" than anything we hold
      }
      nicestBuffer.destroy();
      nicestBuffer.destroyed = true;  // app-side flag so users know to re-request it
    }
    buffer.destroyed = false;
    this.buffers.push(nice, buffer);
    return buffer;
  }
}
```

It's not as ideal as what could be done with a memory heap, but it's something.
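(A hypothetical usage of the sketch above, assuming a constructor that takes the device; a higher `nice` value means the buffer is more willingly evicted under memory pressure, and the descriptor contents are only indicative.)

```ts
const manager = new BufferManager(device);

// A purgeable cache buffer: high "nice", so it is evicted first when memory is tight.
const cacheBuffer = await manager.alloc(10, { size: 64 << 20, usage: GPUBufferUsage.COPY_DST });
if (cacheBuffer === null) {
  // Nothing "nicer" was left to evict; regenerate this data on demand later instead.
}

// An essential buffer: nice 0, never sacrificed; alloc() throws "OOM" rather than give up on it.
const vertexBuffer = await manager.alloc(0, { size: 1024, usage: GPUBufferUsage.VERTEX });
```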

@kainino0x (Contributor, Author) commented:

I don't think there are any other APIs that do large allocations. I think(?) it's safe to consider OOM on e.g. sampler allocation to be catastrophic (i.e. device loss, though we could be more forgiving than that).

@kainino0x requested a review from litherum on February 1, 2019 03:12
@kenrussell (Member) commented:

Commented on the other PR, but to continue the conversation here: out-of-memory is the main kind of error that applications will likely try to recover from. Does it seem like a good idea to make the recovery from this error robust? That implies that it should be possible to recover from allocation failures of all types of GPU objects.

@kainino0x (Contributor, Author) commented:

I think what you're suggesting is that if an app sees OOM on a small (e.g. sampler) allocation, then it should be able to free up a large texture and try again. Maybe that is true. I have two concerns:

1. If the driver really gave us OOM for that tiny allocation, its memory is probably in pretty bad condition, and I'm not sure we should trust it to still be in a usable state.
2. Is it even likely that freeing a large allocation will fix the small allocation? Only if the driver's OOM error was really out-of-GPU-memory and not some other strange case (like running out of slots in an internal table). I don't think it would be good to put applications in a state where they progressively (slowly) kill all their texture resources because of a sampler allocation failure, when really something more subtle was going on.

E.g., should an app be able to create a large buffer, create a bind group from that, create a command buffer from that, and then choose whether to submit based on whether the buffer was successfully allocated? (A sketch of this pattern follows the list below.)
- If yes, `tryCreateBuffer` must return `GPUBuffer` and error log entries must not be generated when creating objects from invalid objects.
(Only log errors on queue.submit and other device/queue level operations.)
- If no, `tryCreateBuffer` should return `Promise<GPUBuffer>` and error log entries should be generated when creating objects from invalid objects.
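(For illustration, a rough sketch of the pattern in question under the "if yes" branch, where `tryCreateBuffer` is synchronous; `largeDescriptor`, `queue`, `descriptorUsing`, `encodeCommandsUsing`, and `bufferAllocated` are hypothetical names standing in for whatever the app would actually use.)

```ts
// Build everything eagerly, even though the big allocation may have failed internally.
const bigBuffer = device.tryCreateBuffer(largeDescriptor);       // synchronous in the "yes" case
const bindGroup = device.createBindGroup(descriptorUsing(bigBuffer));
const commands = encodeCommandsUsing(device, bindGroup);

// Only now decide whether to submit, once the app learns (by some mechanism,
// e.g. a status query or a later promise) whether bigBuffer actually allocated.
if (await bufferAllocated(bigBuffer)) {
  queue.submit([commands]);                                      // queue obtained from the device
} else {
  // Skip the submit; no error log entries were generated by the creations above.
}
```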
A Contributor commented inline:
This seems to be a much simpler approach, even if it requires some extra asynchronicity from the application.

@Kangz (Contributor) commented on Feb 1, 2019

Looks good. Like @kainino0x said, texture and buffer OOM would primarily be caused by running out of GPU memory, which is recoverable. Other types of OOM could be because we ran out of CPU memory, space in hardware tables, etc., and don't seem like they could be handled in any useful way by applications.

@kvark (Contributor) left a comment

I have mixed feelings about this change. It looks like the API is getting split between CIN (contagious internal nullability) and promises now, which makes it somewhat inelegant.
If we are talking about a case where the user creates a resource that they want to handle a failure for, wouldn't a promise be semi-equivalent to creating it as CIN and then checking the status explicitly before submitting any work based on it?

@kainino0x (Contributor, Author) commented:

Note: this PR also removes the object status query. The rationale is that the query was only useful for detecting validation errors and device loss:

- Validation errors should not be detected this way, because they (hopefully) indicate programming errors, so they are surfaced through the error log instead (#196).

- Device loss applies to all objects on a device and is handled through a separate mechanism (#198), so it is not useful to separately know that an individual object on the device is invalid.

@kainino0x force-pushed the errors2-trycreate branch 2 times, most recently from 438e93d to a591726 on February 1, 2019 22:05
- Should `tryCreate*` (and `createReady*Pipeline`) resolve to invalid objects on validation failure and device loss, instead of rejecting?
This simplifies things slightly by avoiding giving useless extra info to the app via "reject" (which they would ignore/noop anyway).
Instead these Promises would never reject.
- Tentatively resolved: yes.
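(To make the tentatively-resolved semantics concrete, a minimal sketch assuming the nullable-promise shape proposed in this PR.)

```ts
// Under the tentative resolution, tryCreateBuffer never rejects:
//   out of memory                    -> resolves to null
//   validation error or device loss  -> resolves to an invalid (but non-null) object
const buffer = await device.tryCreateBuffer(descriptor);
if (buffer === null) {
  // Allocation failure: free other resources and retry, or degrade quality.
} else {
  // buffer may still be invalid (validation failure / device loss); any resulting
  // errors surface through the error log (#196) or device-loss handling (#198) instead.
}
```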
@kainino0x (Contributor, Author) commented inline:

If there are any further comments on my proposed resolve/reject semantics of tryCreate/createReady, this thread would be a good place.

@kainino0x (Contributor, Author) commented on Feb 1, 2019

> If we are talking about a case where the user creates a resource that they want to handle a failure for, wouldn't a promise be semi-equivalent to creating it as CIN and then checking the status explicitly before submitting any work based on it?

Yeah, pretty much. However I think having a version that returns Promise is a simpler API than adding a fallible flag to the creation descriptor (or having a separate entry point for fallible allocations), plus adding the concept of GPUObjectStatus and device.getObjectStatus -> Promise<GPUObjectStatus>, like we had before.

I think changing it to a status flag would only be functionally different if the status flag were synchronous (i.e. GPUBuffer.allocationSucceeded).
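(A rough sketch of the shapes being compared; `allocationSucceeded` is purely hypothetical, and `getObjectStatus` is the query this PR removes.)

```ts
// Shape in this PR: a separate, Promise-returning, fallible entry point.
const a = await device.tryCreateBuffer(desc);        // resolves to null on allocation failure
if (a === null) { /* recover */ }

// Roughly equivalent previous shape: normal creation plus the (removed) status query.
const b = device.createBuffer(desc);
const status = await device.getObjectStatus(b);      // also asynchronous, so not really different
if (status /* indicates out-of-memory */) { /* recover */ }

// Only a synchronous flag would be functionally different from the Promise version:
const c = device.createBuffer(desc);
if (!c.allocationSucceeded) { /* recover */ }        // hypothetical synchronous flag
```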

@beaufortfrancois (Contributor) commented on Feb 4, 2019

Hopefully these bits will make sense.

  1. Following the promises pattern in WebGPU, I'd rename tryCreateBuffer to requestBuffer.

  2. I think promises should not resolve with null objects. They should reject when resource allocation runs out of memory, there are validation errors, or the device is lost, with an explicit error, like Web Bluetooth does for instance: https://webbluetoothcg.github.io/web-bluetooth/#error-handling

Note that these errors are already defined at https://heycam.github.io/webidl/#idl-DOMException-error-names

```js
const gpuAdapter = await gpu.requestAdapter({ /* options */ });
const gpuDevice = await gpuAdapter.requestDevice({ /* options */ });

try {
  // `await` so that a rejection is caught below.
  const buffer = await gpuDevice.requestBuffer({ /* options */ });
  // TODO: Play with buffer.
} catch (error) {
  if (error.name === "NotSupportedError") {
    console.log(error.message);
  } else if (error.name === "UnknownError") {
    // ...
  } else {
    // ...
  }
  // TODO: Try creating another buffer
}
```

@grovesNL (Contributor) commented on Feb 4, 2019

I'm slightly concerned about greater usage of Promise in the API here and in #184 because of:

  • Additional garbage collection requirements because promises can't be reused. Although this really depends on the frequency at which these functions are called.
  • Control flow complications when used with WebAssembly (for example, the workaround in Emterpreter for dealing with that when compiling to WebAssembly from C++).

@kainino0x (Contributor, Author) commented:

Compared with the weight of allocating/freeing an actual GPU buffer or texture resource, I'm not particularly concerned about the weight of the promise.

The WASM problem is not really solved yet (for fallible allocations). Right now we can't even pass objects between threads while a WASM thread is blocking, so we have a lot to figure out. (My nested event loop proposal would solve both.) I don't have time to think about it right now, but hopefully we can discuss it in the next meeting.

@kainino0x (Contributor, Author) commented:

Re @beaufortfrancois

  1. Minor objection: I don't think the semantics match; the failure cases of request* vs. tryCreate* are semantically completely different. So I think it's better for them to have different names (but I would be OK renaming them to match). If there are conventions in the rest of the web platform, we might prefer to follow them.

  2. I don't really like bunching all of the "errors" up into the "reject" case. It means applications can easily accidentally take "reject" to mean "failed to allocate" instead of checking the error type (illustrated in the sketch below). As it is, the reject cases of tryCreate* and createReady* are exactly identical, which I think is preferable. (Also, in the case of tryCreate*, failure to allocate is NOT an error, which is also a good reason for it not to be a "reject". AFAICT, Promises in IDL are expected to reject with Error/Exception types, which isn't really honest here.)
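(A sketch of the footgun in point 2: `requestBuffer` stands in for the hypothetical reject-on-any-error variant, and `evictCachesAndRetry` is a hypothetical recovery helper.)

```ts
let buffer;

// Reject-on-everything variant: easy to misread any rejection as "out of memory".
try {
  buffer = await device.requestBuffer(desc);
} catch (e) {
  evictCachesAndRetry();   // wrong reaction if e was actually a validation error or device loss
}

// Null-on-OOM variant: failure to allocate is not an "error", so it cannot be conflated.
buffer = await device.tryCreateBuffer(desc);
if (buffer === null) {
  evictCachesAndRetry();   // unambiguously "failed to allocate"
}
```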

@grorg (Contributor) commented on Feb 11, 2019

Discussed at the 11 Feb 2019 WebGPU Meeting

```webidl
partial interface GPUDevice {
    GPUObjectStatusQuery getObjectStatus(StatusableObject object);
    Promise<GPUBuffer?> tryCreateBuffer(GPUBufferDescriptor descriptor);
    // ...
};
```
A reviewer commented inline:

Why is it nullable? Shouldn't the API just reject the promise?

@kainino0x (Contributor, Author) replied:

I talked about this in my last comment, just above.

It could reject, but I currently have a strong preference to distinguish it via returning null.

@kainino0x (Contributor, Author) commented:

Closing this for now because it will interact with error-checking-for-testing.

@kainino0x closed this on Feb 20, 2019
@kainino0x deleted the errors2-trycreate branch on February 26, 2019
@kainino0x (Contributor, Author) commented:

FTR: Supplanted by #215

ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this pull request on Sep 6, 2022