
Conversation

@kainino0x (Contributor) commented on Jan 31, 2019

Provides a simpler API for fallible allocation of GPUBuffers and GPUTextures.

EDIT: Note this removes the object status query too.

@kenrussell (Member) left a comment

Not completely convinced of this direction. Once memory's filled up, any tiny allocation can cause an allocation failure. In some apps it won't be the big texture or buffer allocation, but some other smaller allocation which provokes out-of-memory.

Upon further thought I think some way of listening for a certain kind of error, like "OUT_OF_MEMORY", and being able to recover from it is a more robust approach.

However, I recognize that it would not necessarily be easy to tell which objects failed allocation and to re-attempt their allocation after deleting other objects (perhaps by purging a tile/texture cache).

Adding Promise-based versions of just these two APIs seems pretty asymmetric. Also, I don't think we should add Promise-based versions of all of the allocation APIs.
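(For illustration only: a minimal sketch of the "listen for OUT_OF_MEMORY and recover" style described above. `onOutOfMemory`, `tileCache`, and `retryAllocation` are all hypothetical names, not part of any proposal here.)

```ts
// Hypothetical: a device-level listener that reports which object failed to allocate.
device.onOutOfMemory = (failed: GPUBuffer | GPUTexture) => {
  // Free something recoverable first, e.g. purge a tile/texture cache...
  tileCache.purge();
  // ...then retry the failed allocation. Knowing *which* object failed, and
  // re-creating everything built on top of it, is the hard part noted above.
  retryAllocation(failed);
};
```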

@kainino0x (Contributor, Author) commented on Feb 1, 2019

Generally, I agree with what you're saying. However, I don't think #196 is the right mechanism through which to handle it: I'd like to keep it very specifically targeted at telemetry-style error logging.

Aside from maybe the asynchronous nature of tryCreate*, I think it more or less provides the primitives an app would need to do this. Here's a sketch of something an app could do:

```ts
// Sketch only: MaxHeap is a hypothetical max-heap ordered on its number key,
// and tryCreateBuffer is the API proposed in this PR.
type ManagedBuffer = GPUBuffer & { destroyed?: boolean };

class BufferManager {
  device: GPUDevice;
  // Keyed on "nice": higher values are evicted first under memory pressure.
  buffers: MaxHeap<number, ManagedBuffer> = new MaxHeap<number, ManagedBuffer>();

  async alloc(nice: number, desc: GPUBufferDescriptor): Promise<ManagedBuffer | null> {
    let buffer: ManagedBuffer | null = null;
    while (true) {
      buffer = await this.device.tryCreateBuffer(desc);
      if (buffer) { break; }

      // Allocation failed: evict the most evictable buffer we hold and retry.
      const [nicestNice, nicestBuffer] = this.buffers.pop();
      if (nicestNice <= 0) { throw "OOM"; }  // nothing evictable remains
      if (nicestNice <= nice) {
        this.buffers.push(nicestNice, nicestBuffer);  // keep it; don't evict peers
        return null;                                  // caller isn't "nicer" than anything we hold
      }
      nicestBuffer.destroy();
      nicestBuffer.destroyed = true;  // app-side flag so users know to re-request it
    }
    buffer.destroyed = false;
    this.buffers.push(nice, buffer);
    return buffer;
  }
}
```

It's not as ideal as what could be done with a memory heap, but it's something.
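(A hypothetical usage of the sketch above, assuming a constructor that takes the device; a higher `nice` value means the buffer is more willingly evicted under memory pressure, and the descriptor contents are only indicative.)

```ts
const manager = new BufferManager(device);

// A purgeable cache buffer: high "nice", so it is evicted first when memory is tight.
const cacheBuffer = await manager.alloc(10, { size: 64 << 20, usage: GPUBufferUsage.COPY_DST });
if (cacheBuffer === null) {
  // Nothing "nicer" was left to evict; regenerate this data on demand later instead.
}

// An essential buffer: nice 0, never sacrificed; alloc() throws "OOM" rather than give up on it.
const vertexBuffer = await manager.alloc(0, { size: 1024, usage: GPUBufferUsage.VERTEX });
```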

@kainino0x (Contributor, Author) commented:

I don't think there are any other APIs that do large allocations. I think(?) it's safe to consider OOM on e.g. sampler allocation to be catastrophic (i.e. device loss, though we could be more forgiving than that).

@kainino0x requested a review from litherum on February 1, 2019 03:12
@kenrussell (Member) commented:

Commented on the other PR, but to continue the conversation here: out-of-memory is the main kind of error that applications will likely try to recover from. Does it seem like a good idea to make the recovery from this error robust? That implies that it should be possible to recover from allocation failures of all types of GPU objects.

@kainino0x (Contributor, Author) commented:

I think what you're suggesting is that if an app sees OOM on a small (e.g. sampler) allocation, then it should be able to free up a large texture and try again. Maybe that is true. I have two concerns:

1. If the driver really gave us OOM for that tiny allocation, its memory is probably in pretty bad condition, and I'm not sure we should trust it to still be in a usable state.
2. Is it even likely that freeing a large allocation will fix the small allocation? Only if the driver's OOM error was really out-of-GPU-memory and not some other strange case (like running out of slots in an internal table). I don't think it would be good to put applications in a state where they progressively (slowly) kill all their texture resources because of a sampler allocation failure, when really something more subtle was going on.

E.g., should an app be able to create a large buffer, create a bind group from that, create a command buffer from that, and then choose whether to submit based on whether the buffer was successfully allocated? (A sketch of this pattern follows the list below.)
- If yes, `tryCreateBuffer` must return `GPUBuffer` and error log entries must not be generated when creating objects from invalid objects.
(Only log errors on queue.submit and other device/queue level operations.)
- If no, `tryCreateBuffer` should return `Promise<GPUBuffer>` and error log entries should be generated when creating objects from invalid objects.
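(For illustration, a rough sketch of the pattern in question under the "if yes" branch, where `tryCreateBuffer` is synchronous; `largeDescriptor`, `queue`, `descriptorUsing`, `encodeCommandsUsing`, and `bufferAllocated` are hypothetical names standing in for whatever the app would actually use.)

```ts
// Build everything eagerly, even though the big allocation may have failed internally.
const bigBuffer = device.tryCreateBuffer(largeDescriptor);       // synchronous in the "yes" case
const bindGroup = device.createBindGroup(descriptorUsing(bigBuffer));
const commands = encodeCommandsUsing(device, bindGroup);

// Only now decide whether to submit, once the app learns (by some mechanism,
// e.g. a status query or a later promise) whether bigBuffer actually allocated.
if (await bufferAllocated(bigBuffer)) {
  queue.submit([commands]);                                      // queue obtained from the device
} else {
  // Skip the submit; no error log entries were generated by the creations above.
}
```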
A Contributor commented inline:
This seems to be a much simpler approach, even if it requires some extra asynchronicity from the application.

@Kangz (Contributor) commented on Feb 1, 2019

Looks good. Like @kainino0x said, texture and buffer OOM would primarily be caused by running out of GPU memory, which is recoverable. Other types of OOM could be because we ran out of CPU memory, space in hardware tables, etc., and don't seem like they could be handled in any useful way by applications.

@kvark (Contributor) left a comment

I have mixed feelings about this change. It looks like the API is getting split between CIN (contagious internal nullability) and promises now, which makes it somewhat inelegant.
If we are talking about a case where the user creates a resource that they want to handle a failure for, wouldn't a promise be semi-equivalent to creating it as CIN and then checking the status explicitly before submitting any work based on it?

@kainino0x (Contributor, Author) commented:

Note: this PR also removes the object status query. The rationale is that the query was only useful for detecting validation errors and device loss:

- Validation errors should not be detected this way, because they (hopefully) indicate programming errors, so they are surfaced through the error log instead (#196).

- Device loss applies to all objects on a device and is handled through a separate mechanism (#198), so it is not useful to separately know that an individual object on the device is invalid.

@kainino0x force-pushed the errors2-trycreate branch 2 times, most recently from 438e93d to a591726 on February 1, 2019 22:05
- Should `tryCreate*` (and `createReady*Pipeline`) resolve to invalid objects on validation failure and device loss, instead of rejecting?
This simplifies things slightly by avoiding giving useless extra info to the app via "reject" (which they would ignore/noop anyway).
Instead these Promises would never reject.
- Tentatively resolved: yes.
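(To make the tentatively-resolved semantics concrete, a minimal sketch assuming the nullable-promise shape proposed in this PR.)

```ts
// Under the tentative resolution, tryCreateBuffer never rejects:
//   out of memory                    -> resolves to null
//   validation error or device loss  -> resolves to an invalid (but non-null) object
const buffer = await device.tryCreateBuffer(descriptor);
if (buffer === null) {
  // Allocation failure: free other resources and retry, or degrade quality.
} else {
  // buffer may still be invalid (validation failure / device loss); any resulting
  // errors surface through the error log (#196) or device-loss handling (#198) instead.
}
```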
@kainino0x (Contributor, Author) commented inline:

If there are any further comments on my proposed resolve/reject semantics of tryCreate/createReady, this thread would be a good place.

@kainino0x (Contributor, Author) commented on Feb 1, 2019

> If we are talking about a case where the user creates a resource that they want to handle a failure for, wouldn't a promise be semi-equivalent to creating it as CIN and then checking the status explicitly before submitting any work based on it?

Yeah, pretty much. However I think having a version that returns Promise is a simpler API than adding a fallible flag to the creation descriptor (or having a separate entry point for fallible allocations), plus adding the concept of GPUObjectStatus and device.getObjectStatus -> Promise<GPUObjectStatus>, like we had before.

I think changing it to a status flag would only be functionally different if the status flag were synchronous (i.e. GPUBuffer.allocationSucceeded).
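(A rough sketch of the shapes being compared; `allocationSucceeded` is purely hypothetical, and `getObjectStatus` is the query this PR removes.)

```ts
// Shape in this PR: a separate, Promise-returning, fallible entry point.
const a = await device.tryCreateBuffer(desc);        // resolves to null on allocation failure
if (a === null) { /* recover */ }

// Roughly equivalent previous shape: normal creation plus the (removed) status query.
const b = device.createBuffer(desc);
const status = await device.getObjectStatus(b);      // also asynchronous, so not really different
if (status /* indicates out-of-memory */) { /* recover */ }

// Only a synchronous flag would be functionally different from the Promise version:
const c = device.createBuffer(desc);
if (!c.allocationSucceeded) { /* recover */ }        // hypothetical synchronous flag
```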

@beaufortfrancois (Contributor) commented on Feb 4, 2019

Hopefully these bits will make sense.

  1. Following the promises pattern in WebGPU, I'd rename tryCreateBuffer to requestBuffer.

  2. I think promises should not resolve with null objects. They should reject when resource allocation runs out of memory, there are validation errors, or the device is lost, with an explicit error, like Web Bluetooth does for instance: https://webbluetoothcg.github.io/web-bluetooth/#error-handling

Note that these errors are already defined at https://heycam.github.io/webidl/#idl-DOMException-error-names

```js
const gpuAdapter = await gpu.requestAdapter({ /* options */ });
const gpuDevice = await gpuAdapter.requestDevice({ /* options */ });

try {
  // `await` so that a rejection is caught below.
  const buffer = await gpuDevice.requestBuffer({ /* options */ });
  // TODO: Play with buffer.
} catch (error) {
  if (error.name === "NotSupportedError") {
    console.log(error.message);
  } else if (error.name === "UnknownError") {
    // ...
  } else {
    // ...
  }
  // TODO: Try creating another buffer
}
```

@grovesNL (Contributor) commented on Feb 4, 2019

I'm slightly concerned about greater usage of Promise in the API here and in #184 because of:

  • Additional garbage collection requirements because promises can't be reused. Although this really depends on the frequency at which these functions are called.
  • Control flow complications when used with WebAssembly (for example, the workaround in Emterpreter for dealing with that when compiling to WebAssembly from C++).

@kainino0x (Contributor, Author) commented:

Compared with the weight of allocating/freeing an actual GPU buffer or texture resource, I'm not particularly concerned about the weight of the promise.

The WASM problem is not really solved yet (for fallible allocations). Right now we can't even pass objects between threads while a WASM thread is blocking, so we have a lot to figure out. (My nested event loop proposal would solve both.) I don't have time to think about it right now, but hopefully we can discuss it in the next meeting.

@kainino0x (Contributor, Author) commented:

Re @beaufortfrancois

  1. Minor objection: I don't think the semantics match; the failure cases of request* vs. tryCreate* are semantically completely different. So I think it's better for them to have different names (but I would be OK renaming them to match). If there are conventions in the rest of the web platform, we might prefer to follow them.

  2. I don't really like bunching all of the "errors" up into the "reject" case. It means applications can easily accidentally take "reject" to mean "failed to allocate" instead of checking the error type (illustrated in the sketch below). As it is, the reject cases of tryCreate* and createReady* are exactly identical, which I think is preferable. (Also, in the case of tryCreate*, failure to allocate is NOT an error, which is also a good reason for it not to be a "reject". AFAICT, Promises in IDL are expected to reject with Error/Exception types, which isn't really honest here.)
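(A sketch of the footgun in point 2: `requestBuffer` stands in for the hypothetical reject-on-any-error variant, and `evictCachesAndRetry` is a hypothetical recovery helper.)

```ts
let buffer;

// Reject-on-everything variant: easy to misread any rejection as "out of memory".
try {
  buffer = await device.requestBuffer(desc);
} catch (e) {
  evictCachesAndRetry();   // wrong reaction if e was actually a validation error or device loss
}

// Null-on-OOM variant: failure to allocate is not an "error", so it cannot be conflated.
buffer = await device.tryCreateBuffer(desc);
if (buffer === null) {
  evictCachesAndRetry();   // unambiguously "failed to allocate"
}
```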

@grorg (Contributor) commented on Feb 11, 2019

Discussed at the 11 Feb 2019 WebGPU Meeting

```webidl
partial interface GPUDevice {
    GPUObjectStatusQuery getObjectStatus(StatusableObject object);
    Promise<GPUBuffer?> tryCreateBuffer(GPUBufferDescriptor descriptor);
    // ...
};
```
A reviewer commented inline:

Why is it nullable? Shouldn't the API just reject the promise?

@kainino0x (Contributor, Author) replied:

I talked about this in my last comment, just above.

It could reject, but I currently have a strong preference to distinguish it via returning null.

@kainino0x (Contributor, Author) commented:

Closing this for now because it will interact with error-checking-for-testing.

@kainino0x closed this on Feb 20, 2019
@kainino0x deleted the errors2-trycreate branch on February 26, 2019
@kainino0x (Contributor, Author) commented:

FTR: Supplanted by #215

ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this pull request on Sep 6, 2022