94 changes: 46 additions & 48 deletions design/ErrorHandling.md
@@ -7,7 +7,7 @@ There are a number of cases that developers or applications need error handling

 - *Debugging*: Getting errors synchronously during development, to break into the debugger.
 - *Telemetry*: Collecting error logs in deployment, for bug reporting and telemetry.
-- *Recovery*: Recovering from recoverable errors (like out-of-memory on resource creation).
+- *Fallible Allocation*: Recovering from recoverable errors (like out-of-memory on resource creation).
 - *Fallback*: Tearing down the application and falling back, e.g. to WebGL, 2D Canvas, or static content.

 Meanwhile, error handling should not make the API clunky to use.
@@ -96,61 +96,38 @@ The effect of an error depends on the type of a call:

 In each case, an error is logged to the error log.

-## *Recovery*: Recoverable Errors
+## *Fallible Allocation*: Out-of-memory in object creation

-Recoverable errors are produced only by object creation.
-The status of an object can be retrieved asynchronously (see next section).
-
-```
-enum GPUObjectStatus {
-    // The object is valid.
-    "valid",
-    // The object is invalid due to a non-fatal allocation failure.
-    // The application can use this as a signal to scale back resource usage, if possible.
-    "out-of-memory",
-    // The object is invalid for another, unrecoverable reason.
-    "invalid",
-};
-```
-
-(Note that object creation failures always send an error to the error log, regardless of the object type or the type of failure.)
-
-If an application uses recoverable allocation, the implementation will still generate error log entries:
-a `"recoverable-out-of-memory"` error for the object creation, and `"validation-error"`s for any subsequent uses of the invalid object.
-The application may need to understand whether such error log entries were part of a recovered allocation (e.g. to avoid sending telemetry for those errors).
-To facilitate this filtering, the handle to the invalid object (including "expando" JavaScript properties) is attached to the error log entry (see above).
-
-### Recoverable errors in object creation
-
-A recoverable error is exposed as a `GPUObjectStatusQuery`.
-
-```
-// (Exact form/type subject to change.)
-typedef Promise<GPUObjectStatus> GPUObjectStatusQuery;
-
-typedef (GPUBuffer or GPUTexture) StatusableObject;
+Recoverable fallible allocations are exposed as Promise versions of the `createBuffer`/`createTexture` entry points.

+```webidl
 partial interface GPUDevice {
-    GPUObjectStatusQuery getObjectStatus(StatusableObject object);
+    Promise<GPUBuffer?> tryCreateBuffer(GPUBufferDescriptor descriptor);
+    Promise<GPUTexture?> tryCreateTexture(GPUTextureDescriptor descriptor);
 };
 ```

-A concrete example: When creating a buffer, the following logic applies:
+If an application wants to allocate with fatal out-of-memory, it uses `createBuffer`/`createTexture`.
+Just as with the creation of any smaller object, an out-of-memory condition will be treated as a fatal error: the device is lost.
+
+The `tryCreate*` entry points return Promises.
+- If resource allocation succeeds, the Promise resolves to a *valid* object.
+  (Though device loss can still make the object invalid at any time.)
+- If resource allocation runs out of memory, the Promise resolves to `null`.
+- If there is a validation error, the Promise resolves to an invalid object (and produces a log entry).
+- If the device is lost, the Promise resolves to an invalid object.
+(The Promise never rejects, as the application should not need to further handle those cases.)

-- `createBuffer` returns a `GPUBuffer` object `buffer` immediately.
-- A `GPUObjectStatusQuery` can be obtained by calling `device.getObjectStatus(buffer)`.
-  At a later time, that query resolves to a `GPUObjectStatus` that is one of:
-    - Creation succeeded (`"valid"`).
-    - Creation encountered a recoverable error (`"out-of-memory"`).
-      (The application can then choose to retry a smaller allocation of a *new* `GPUBuffer`.)
-    - Creation encountered another type of error out of the control of the application (`"invalid"`).
+### Relation to `createReady*Pipeline`

-Regardless of any recovery efforts the application makes, if creation fails,
-the resulting object is invalid (and subject to error propagation).
+`createReady*Pipeline` should be symmetrical with `tryCreate*`, so it is also described here.

-Checking the status of a `GPUBuffer` or `GPUTexture` is **not** required.
-It is only necessary if an application wishes to recover from recoverable errors such as out of memory.
-(If it does, it is responsible for avoiding using the invalid object.)
+While not error-handling related, the `createReady*Pipeline` entry points also return Promises.
+- If creation succeeds, the Promise resolves to a *valid* object.
+  (Though device loss can still make the object invalid at any time.)
+- If there is a validation error, the Promise resolves to an invalid object (and produces a log entry).
+- If the device is lost, the Promise resolves to an invalid object.
+(The Promise never rejects, as the application should not need to further handle those cases.)

 ## Open Questions and Considerations


Review thread on the added `tryCreateBuffer` line:

Comment: Why is it nullable? Shouldn't the API just reject the promise?

Reply (Contributor Author): I talked about this in my last comment, just above.
It could reject, but I currently have a strong preference for distinguishing it by returning null.
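A minimal TypeScript sketch of how an application might consume the proposed `tryCreateBuffer` under the resolution rules above: `null` signals out-of-memory, so the application retries with a smaller allocation of a *new* buffer, while valid and invalid objects are passed through without further handling. `BufferDescriptor`, `FallibleBuffer`, `FallibleDevice`, `MockDevice`, and `allocateWithFallback` are hypothetical names introduced for illustration; the mock only models a fixed memory budget and is not a WebGPU implementation. Halving the size on each failure is likewise an assumed policy, not something this proposal specifies.

```typescript
// Hypothetical stand-ins for the proposal's types; NOT a real WebGPU API.
interface BufferDescriptor {
  size: number;
}

interface FallibleBuffer {
  readonly size: number;
  // false models an "invalid object" (validation error or device loss).
  readonly valid: boolean;
}

interface FallibleDevice {
  // Models the proposed tryCreateBuffer: resolves to null on out-of-memory
  // and never rejects.
  tryCreateBuffer(descriptor: BufferDescriptor): Promise<FallibleBuffer | null>;
}

// Tries the preferred size first, halving it after each out-of-memory result
// until it drops below minimumSize. (Halving is an illustrative policy.)
async function allocateWithFallback(
  device: FallibleDevice,
  preferredSize: number,
  minimumSize: number,
): Promise<FallibleBuffer | null> {
  if (minimumSize < 1) {
    throw new RangeError("minimumSize must be at least 1");
  }
  for (let size = preferredSize; size >= minimumSize; size = Math.floor(size / 2)) {
    const buffer = await device.tryCreateBuffer({ size });
    if (buffer !== null) {
      // Valid, or invalid due to a validation error or device loss. Neither
      // case needs handling here: validation errors are already in the error
      // log, and device loss is handled through a separate mechanism.
      return buffer;
    }
    // null: out of memory. Retry with a *new*, smaller buffer.
  }
  // Even the minimum size did not fit; the caller can fall back further.
  return null;
}

// Toy device with a fixed memory budget, used only to exercise the sketch.
class MockDevice implements FallibleDevice {
  private remaining: number;

  constructor(budget: number) {
    this.remaining = budget;
  }

  async tryCreateBuffer(descriptor: BufferDescriptor): Promise<FallibleBuffer | null> {
    if (descriptor.size > this.remaining) {
      return null; // Simulated out-of-memory.
    }
    this.remaining -= descriptor.size;
    return { size: descriptor.size, valid: true };
  }
}
```

For example, `allocateWithFallback(new MockDevice(1024), 4096, 256)` gets `null` at 4096 and 2048 bytes and then resolves to a 1024-byte buffer. Note that no `try`/`catch` is needed, since the proposed Promise never rejects.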
@@ -169,7 +146,10 @@ It is only necessary if an application wishes to recover from recoverable errors
 - <mesh3.indexBuffer> is invalid because it got an unsupported usage flag (0x89)
 ```

-- The exact shape of `GPUObjectStatusQuery` (currently `Promise<GPUObjectStatus>`) may piggy-back on the decision taken for `GPUFence`.
+- How do applications handle the case where they've allocated a lot of optional memory, but want to make another required allocation (which could fail due to OOM)?
+  How do they know when to free an optional allocation first?
+    - For now, applications wanting to handle this kind of case must always use fallible allocations.
+    - (We will likely improve this with a `GPUResourceHeap`, once we figure out what that looks like.)

 ## Resolved Questions

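The interim answer to the optional-versus-required allocation question above ("always use fallible allocations") can be sketched as follows: optional allocations are tracked, and when a required `tryCreateBuffer` resolves to `null`, the application destroys optional buffers one at a time and retries. `ManagedBuffer`, `FallibleDevice`, `AllocationTracker`, and `MockDevice` are hypothetical. The sketch assumes that destroying a buffer makes its memory available to the next allocation right away; the mock models that, but a real implementation need not guarantee it. Evicting the newest optional buffer first is an arbitrary illustrative policy.

```typescript
// Hypothetical stand-ins; NOT a real WebGPU API.
interface ManagedBuffer {
  readonly size: number;
  destroy(): void;
}

interface FallibleDevice {
  // Models the proposed tryCreateBuffer: null means out-of-memory.
  tryCreateBuffer(descriptor: { size: number }): Promise<ManagedBuffer | null>;
}

class AllocationTracker {
  private readonly device: FallibleDevice;
  private readonly optional: ManagedBuffer[] = [];

  constructor(device: FallibleDevice) {
    this.device = device;
  }

  get optionalCount(): number {
    return this.optional.length;
  }

  // Optional memory (e.g. a cache): on out-of-memory, simply go without it.
  async allocateOptional(size: number): Promise<ManagedBuffer | null> {
    const buffer = await this.device.tryCreateBuffer({ size });
    if (buffer !== null) {
      this.optional.push(buffer);
    }
    return buffer;
  }

  // Required memory: free optional buffers (newest first) until it fits.
  async allocateRequired(size: number): Promise<ManagedBuffer | null> {
    for (;;) {
      const buffer = await this.device.tryCreateBuffer({ size });
      if (buffer !== null) {
        return buffer;
      }
      const victim = this.optional.pop();
      if (victim === undefined) {
        return null; // Nothing left to free; the caller must fall back.
      }
      victim.destroy();
    }
  }
}

// Toy device: destroying a buffer returns its size to the budget immediately.
class MockDevice implements FallibleDevice {
  private remaining: number;

  constructor(budget: number) {
    this.remaining = budget;
  }

  async tryCreateBuffer(descriptor: { size: number }): Promise<ManagedBuffer | null> {
    if (descriptor.size > this.remaining) {
      return null; // Simulated out-of-memory.
    }
    this.remaining -= descriptor.size;
    let destroyed = false;
    return {
      size: descriptor.size,
      destroy: () => {
        if (!destroyed) {
          destroyed = true;
          this.remaining += descriptor.size;
        }
      },
    };
  }
}
```

With a 1000-byte mock budget and two 400-byte optional buffers, a required 500-byte allocation first resolves to `null`, evicts one optional buffer, and then succeeds on the retry.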
@@ -184,3 +164,21 @@ It is only necessary if an application wishes to recover from recoverable errors
 - Should an object creation error immediately log an error to the error log?
   Or should it only log if the error propagates to a device-level operation?
     - Tentatively resolved: errors should be logged immediately.
+
+- Should applications be able to intentionally create graphs of potentially-invalid objects, and recover from this late?
+  E.g. create a large buffer, create a bind group from that, create a command buffer from that, then choose whether to submit based on whether the buffer was successfully allocated.
+    - If yes, `tryCreateBuffer` must return `GPUBuffer` and error log entries must not be generated when creating objects from invalid objects.
+      (Only log errors on queue.submit and other device/queue level operations.)
+    - If no, `tryCreateBuffer` should return `Promise<GPUBuffer?>` and error log entries should be generated when creating objects from invalid objects.
+    - Tentatively resolved: no.
+
+- Should there be an API to query object status?
+    - Tentatively resolved: no. (API was removed in #197.)
+    - This query was only useful for detecting createPipeline completion and createBuffer/Texture OOM (now both done via Promise).
+    - Validation errors should not be detected, because they indicate programming errors (hopefully), so they are surfaced through the error log instead.
+    - Device loss applies to all objects on a device, and is handled through a separate mechanism, so it's not useful to separately know that an individual object on the device is invalid.
+
+- Should `tryCreate*` (and `createReady*Pipeline`) resolve to invalid objects on validation failure and device loss, instead of rejecting?
+  This simplifies things slightly by avoiding giving useless extra info to the app via "reject" (which they would ignore/noop anyway).
+  Instead these Promises would never reject.
+    - Tentatively resolved: yes.

Comment (Contributor Author) on the "Tentatively resolved: yes." line:
If there are any further comments on my proposed resolve/reject semantics of tryCreate/createReady, this thread would be a good place.
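Under the resolve/reject semantics proposed here, neither `tryCreate*` nor `createReady*Pipeline` ever rejects, so an application can wait for several pipelines with `Promise.all` and no rejection handling; a pipeline that fails validation simply arrives as an invalid object whose error is already in the error log. The sketch assumes a hypothetical `createReadyRenderPipeline` on a mock device, a hypothetical `simulateValidationError` switch, and a `valid` flag exposed only so the example can be checked (the proposal deliberately has no object-status query).

```typescript
// Hypothetical stand-ins; NOT a real WebGPU API.
interface PipelineDescriptor {
  label: string;
  // Toy switch that makes the mock report a validation error.
  simulateValidationError?: boolean;
}

interface Pipeline {
  readonly label: string;
  // Exposed only for checking the example; the proposal has no status query.
  readonly valid: boolean;
}

class MockDevice {
  readonly errorLog: string[] = [];

  // Models the proposed createReady*Pipeline: on validation failure it logs
  // an error and resolves to an invalid object instead of rejecting.
  async createReadyRenderPipeline(descriptor: PipelineDescriptor): Promise<Pipeline> {
    if (descriptor.simulateValidationError === true) {
      this.errorLog.push(`validation error in pipeline "${descriptor.label}"`);
      return { label: descriptor.label, valid: false };
    }
    return { label: descriptor.label, valid: true };
  }
}

// No try/catch or .catch(): these Promises are specified never to reject,
// so Promise.all always resolves with one entry per descriptor.
async function loadPipelines(
  device: MockDevice,
  descriptors: PipelineDescriptor[],
): Promise<Pipeline[]> {
  return Promise.all(descriptors.map((d) => device.createReadyRenderPipeline(d)));
}
```

The design choice shows up in the caller: because an invalid result is an ordinary resolved value, using it later just produces further validation errors in the log rather than requiring a separate error path.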

4 changes: 3 additions & 1 deletion design/sketch.webidl
@@ -660,6 +660,9 @@ interface GPUDevice {
     GPUTexture createTexture(GPUTextureDescriptor descriptor);
     GPUSampler createSampler(GPUSamplerDescriptor descriptor);

+    Promise<GPUBuffer?> tryCreateBuffer(GPUBufferDescriptor descriptor);
+    Promise<GPUTexture?> tryCreateTexture(GPUTextureDescriptor descriptor);
+
     GPUBindGroupLayout createBindGroupLayout(GPUBindGroupLayoutDescriptor descriptor);
     GPUPipelineLayout createPipelineLayout(GPUPipelineLayoutDescriptor descriptor);
     GPUBindGroup createBindGroup(GPUBindGroupDescriptor descriptor);
@@ -674,7 +677,6 @@ interface GPUDevice {
     GPUQueue getQueue();

     attribute GPULogCallback onLog;
-    GPUObjectStatusQuery getObjectStatus(GPUStatusableObject statusableObject);
 };

 dictionary GPUDeviceDescriptor {