gpuweb · kainino0x · Jan 30, 2019 · dmikis · Feb 11, 2019 · kainino0x
diff --git a/design/ErrorHandling.md b/design/ErrorHandling.md
@@ -7,7 +7,7 @@ There are a number of cases that developers or applications need error handling
 
  - *Debugging*: Getting errors synchronously during development, to break in to the debugger.
  - *Telemetry*: Collecting error logs in deployment, for bug reporting and telemetry.
- - *Recovery*: Recovering from recoverable errors (like out-of-memory on resource creation).
+ - *Fallible Allocation*: Recovering from recoverable errors (like out-of-memory on resource creation).
  - *Fallback*: Tearing down the application and falling back, e.g. to WebGL, 2D Canvas, or static content.
 
 Meanwhile, error handling should not make the API clunky to use.
@@ -96,61 +96,38 @@ The effect of an error depends on the type of a call:
 
 In each case, an error is logged to the error log.
 
-## *Recovery*: Recoverable Errors
+## *Fallible Allocation*: Out-of-memory in object creation
 
-Recoverable errors are produced only by object creation.
-The status of an object can be retrieved asynchronously (see next section).
-
-```
-enum GPUObjectStatus {
-    // The object is valid.
-    "valid",
-    // The object is invalid due to a non-fatal allocation failure.
-    // The application can use this as a signal to scale back resource usage, if possible.
-    "out-of-memory",
-    // The object is invalid for another, unrecoverable reason.
-    "invalid",
-};
-```
-
-(Note that object creation failures always send an error to the error log, regardless of the object type or the type of failure.)
-
-If an application uses recoverable allocation, the implementation will still generate error log entries:
-a `"recoverable-out-of-memory"` error for the object creation, and `"validation-error"`s for any subsequent uses of the invalid object.
-The application may need to understand whether such error log entries were part of a recovered allocation (e.g. to avoid sending telemetry for those errors).
-To facilitate this filtering, the handle to the invalid object (including "expando" JavaScript properties) is attached to the error log entry (see above).
-
-### Recoverable errors in object creation
-
-A recoverable error is exposed as a `GPUObjectStatusQuery`.
-
-```
-// (Exact form/type subject to change.)
-typedef Promise<GPUObjectStatus> GPUObjectStatusQuery;
-
-typedef (GPUBuffer or GPUTexture) StatusableObject;
+Fallible allocations are exposed as Promise-returning versions of the `createBuffer`/`createTexture` entry points.
 
+```webidl
 partial interface GPUDevice {
-    GPUObjectStatusQuery getObjectStatus(StatusableObject object);
+    Promise<GPUBuffer?> tryCreateBuffer(GPUBufferDescriptor descriptor);
+    Promise<GPUTexture?> tryCreateTexture(GPUTextureDescriptor descriptor);
 };
 ```
 
-A concrete example: When creating a buffer, the following logic applies:
+If an application wants out-of-memory to be fatal for an allocation, it uses `createBuffer`/`createTexture`.
+Just as with the creation of any smaller object, an out-of-memory condition is treated as a fatal error: the device is lost.
+
+The `tryCreate*` entry points return Promises.
+ - If resource allocation succeeds, the Promise resolves to a *valid* object.
+   (Though device loss can still make the object invalid at any time.)
+ - If resource allocation runs out of memory, the Promise resolves to `null`.
+ - If there is a validation error, the Promise resolves to an invalid object (and produces a log entry).
+ - If the device is lost, the Promise resolves to an invalid object.
+(The Promise never rejects, because the application should not need to handle the validation-error and device-loss cases any further.)
 
- - `createBuffer` returns a `GPUBuffer` object `buffer` immediately.
- - A `GPUObjectStatusQuery` can be obtained by calling `device.getObjectStatus(buffer)`.
-   At a later time, that query resolves to a `GPUObjectStatus` that is one of:
-    - Creation succeeded (`"valid"`).
-    - Creation encountered a recoverable error (`"out-of-memory"`).
-      (The application can then choose to retry a smaller allocation of a *new* `GPUBuffer`.)
-    - Creation encountered another type of error out of the control of the application (`"invalid"`).
+### Relation to `createReady*Pipeline`
 
-Regardless of any recovery efforts the application makes, if creation fails,
-the resulting object is invalid (and subject to error propagation).
+`createReady*Pipeline` should be symmetrical with `tryCreate*`, so it is also described here.
 
-Checking the status of a `GPUBuffer` or `GPUTexture` is **not** required.
-It is only necessary if an application wishes to recover from recoverable errors such as out of memory.
-(If it does, it is responsible for avoiding using the invalid object.)
+While not error-handling related, the `createReady*Pipeline` entry points also return Promises.
+ - If creation succeeds, the Promise resolves to a *valid* object.
+   (Though device loss can still make the object invalid at any time.)
+ - If there is a validation error, the Promise resolves to an invalid object (and produces a log entry).
+ - If the device is lost, the Promise resolves to an invalid object.
+(The Promise never rejects, because the application should not need to handle the validation-error and device-loss cases any further.)
 
 ## Open Questions and Considerations
 
@@ -169,7 +146,10 @@ It is only necessary if an application wishes to recover from recoverable errors
    - <mesh3.indexBuffer> is invalid because it got an unsupported usage flag (0x89)
    ```
 
- - The exact shape of `GPUObjectStatusQuery` (currently `Promise<GPUObjectStatus>`) may piggy-back on the decision taken for `GPUFence`.
+ - How do applications handle the case where they've allocated a lot of optional memory, but want to make another required allocation (which could fail due to OOM)?
+   How do they know when to free an optional allocation first?
+    - For now, applications wanting to handle this kind of case must always use fallible allocations.
+    - (We will likely improve this with a `GPUResourceHeap`, once we figure out what that looks like.)
 
 ## Resolved Questions
 
@@ -184,3 +164,21 @@ It is only necessary if an application wishes to recover from recoverable errors
  - Should an object creation error immediately log an error to the error log?
    Or should it only log if the error propagates to a device-level operation?
     - Tentatively resolved: errors should be logged immediately.
+
+ - Should applications be able to intentionally create graphs of potentially-invalid objects, and recover from this late?
+   E.g. create a large buffer, create a bind group from that, create a command buffer from that, then choose whether to submit based on whether the buffer was successfully allocated.
+    - If yes, `tryCreateBuffer` must return `GPUBuffer` and error log entries must not be generated when creating objects from invalid objects.
+      (Only log errors on queue.submit and other device/queue level operations.)
+    - If no, `tryCreateBuffer` should return `Promise<GPUBuffer?>` and error log entries should be generated when creating objects from invalid objects.
+    - Tentatively resolved: no.
+
+ - Should there be an API to query object status?
+    - Tentatively resolved: no. (API was removed in #197.)
+       - This query was only useful for detecting `createPipeline` completion and `createBuffer`/`createTexture` OOM (now both done via Promise).
+       - Validation errors should not be detected, because they indicate programming errors (hopefully), so they are surfaced through the error log instead.
+       - Device loss applies to all objects on a device, and is handled through a separate mechanism, so it's not useful to separately know that an individual object on the device is invalid.
+
+ - Should `tryCreate*` (and `createReady*Pipeline`) resolve to invalid objects on validation failure and device loss, instead of rejecting?
+   This simplifies things slightly by not passing useless extra information to the app via rejection (which the app would ignore anyway).
+   Instead these Promises would never reject.
+    - Tentatively resolved: yes.
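
A minimal TypeScript sketch of how an application might use the proposed `tryCreateBuffer` to scale back after an out-of-memory result, following the Fallible Allocation rules in the patch above. `DeviceLike`, `BufferLike`, `candidateSizes`, `allocateWithFallback`, and `makeMockDevice` are hypothetical names for illustration only; the mock stands in for a real `GPUDevice` and models just the resolve-to-`null` case. The size-halving strategy is an assumption, not something the design prescribes.

```typescript
// Hypothetical minimal shapes; the real GPUBufferDescriptor/GPUBuffer have more members.
interface BufferDescriptorLike { size: number; usage: number; }
interface BufferLike { readonly size: number; }
interface DeviceLike {
  // Mirrors the proposed signature: resolves to null on out-of-memory, never rejects.
  tryCreateBuffer(descriptor: BufferDescriptorLike): Promise<BufferLike | null>;
}

// Sizes to attempt, halving from `preferred` down to (and including) `minimum`.
function candidateSizes(preferred: number, minimum: number): number[] {
  const lowest = Math.max(1, minimum); // guard against a non-terminating loop at size 0
  const sizes: number[] = [];
  for (let size = preferred; size >= lowest; size = Math.floor(size / 2)) {
    sizes.push(size);
  }
  return sizes;
}

// Try each candidate size in turn; return the first non-null buffer, or null if all fail.
async function allocateWithFallback(
  device: DeviceLike,
  usage: number,
  preferred: number,
  minimum: number,
): Promise<BufferLike | null> {
  for (const size of candidateSizes(preferred, minimum)) {
    const buffer = await device.tryCreateBuffer({ size, usage });
    if (buffer !== null) {
      // Not null means "not out of memory"; the object may still be invalid
      // after a validation error or device loss.
      return buffer;
    }
  }
  return null;
}

// Mock device (assumption): any request larger than `capacity` bytes reports OOM.
function makeMockDevice(capacity: number): DeviceLike {
  return {
    tryCreateBuffer: async ({ size }) => (size <= capacity ? { size } : null),
  };
}
```

With a mock capacity of 300 bytes, a request for 1024 bytes with a 128-byte minimum tries 1024, 512, then 256, and resolves to the 256-byte buffer. A non-null result only rules out out-of-memory: per the design, validation errors and device loss still resolve to an invalid object and are reported through the error log and device-loss mechanisms.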
diff --git a/design/sketch.webidl b/design/sketch.webidl
@@ -660,6 +660,9 @@ interface GPUDevice {
     GPUTexture createTexture(GPUTextureDescriptor descriptor);
     GPUSampler createSampler(GPUSamplerDescriptor descriptor);
 
+    Promise<GPUBuffer?> tryCreateBuffer(GPUBufferDescriptor descriptor);
+    Promise<GPUTexture?> tryCreateTexture(GPUTextureDescriptor descriptor);
+
     GPUBindGroupLayout createBindGroupLayout(GPUBindGroupLayoutDescriptor descriptor);
     GPUPipelineLayout createPipelineLayout(GPUPipelineLayoutDescriptor descriptor);
     GPUBindGroup createBindGroup(GPUBindGroupDescriptor descriptor);
@@ -674,7 +677,6 @@ interface GPUDevice {
     GPUQueue getQueue();
 
     attribute GPULogCallback onLog;
-    GPUObjectStatusQuery getObjectStatus(GPUStatusableObject statusableObject);
 };
 
 dictionary GPUDeviceDescriptor {