Conversation

@zoddicus
Contributor

The current value causes issues with machines that have more than 128GB of RAM. This increase takes the allocation to the highest precisely representable multiple of 8, which is ~8,192TB.

I am aware of no commercially available machine that supports anything near this much RAM, so this should be good for the foreseeable future. If there is a device in the future with this much RAM, hopefully Number will have been re-specified to something like a float128 by then.

Fixes #4488
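For reference, a minimal sketch of how such a constant can be derived. The assumption (mine, not stated in this thread) is that kMaxSafeMultipleOf8 is the largest multiple of 8 that is still a safe integer in a JavaScript Number:

```typescript
// Largest multiple of 8 exactly representable by a JavaScript Number
// (a float64): 2**53 - 8. Assumption: this matches the CTS constant
// kMaxSafeMultipleOf8.
const kMaxSafeMultipleOf8 = Number.MAX_SAFE_INTEGER - 7; // 2**53 - 8

// As a byte count this is just under 8 PiB, i.e. roughly 8,192 TiB.
console.log(kMaxSafeMultipleOf8 / 2 ** 40); // slightly below 8192
```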


Requirements for PR author:

  • All missing test coverage is tracked with "TODO" or .unimplemented().
  • New helpers are /** documented */ and new helper files are found in helper_index.txt.
  • Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)
  • Tests have been run with compatibility mode validation enabled and behave as expected. (If not passing, explain above.)

Requirements for reviewer sign-off:

  • Tests are properly located in the test tree.
  • Test descriptions allow a reader to "read only the test plans and evaluate coverage completeness", and accurately reflect the test code.
  • Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
  • Helpers and types promote readability and maintainability.

When landing this PR, be sure to make any necessary issue status updates.

@zoddicus zoddicus requested review from dj2 and kainino0x October 28, 2025 18:40
@zoddicus zoddicus self-assigned this Oct 28, 2025
The current version of the OOM allocation test causes issues with
machines that have more than 128GB of RAM, since one of the values it
tries to test is 128GB. This change removes that lower value, leaving
the higher value, kMaxSafeMultipleOf8, which is ~8,192TB.

I am aware of no commercially available machine that supports anything near
this much RAM, so this should be good for the foreseeable future. If
there is a device in the future with this much RAM, hopefully Number
will have been re-specified to something like a float128 by then.

Fixes gpuweb#4488
Contributor Author

@zoddicus zoddicus left a comment


PTAAL

@zoddicus zoddicus requested a review from jrprice October 28, 2025 19:40
@zoddicus zoddicus merged commit 8a39e1d into gpuweb:main Oct 28, 2025
1 check passed
@zoddicus zoddicus deleted the updateOOMValue branch October 28, 2025 20:07
return oom
  ? [
      kMaxSafeMultipleOf8,
      0x20_0000_0000, // 128 GB
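As a sanity check on the hex constant being discussed (assuming "128 GB" here means binary units, i.e. 2**30-byte gigabytes):

```typescript
// 0x20_0000_0000 = 2 * 16**9 = 137_438_953_472 bytes = 128 * 2**30 bytes.
const oldLowerOomSize = 0x20_0000_0000;
console.log(oldLowerOomSize === 128 * 2 ** 30); // true
```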
Collaborator


Just FYI, I think the reason to test both kMaxSafeMultipleOf8 and 128 GB was that we expected kMaxSafeMultipleOf8 to hit some hardcoded limit and 128 GB to hit an actual OOM. Unfortunately we did expect that to be possibly unreliable, so it was one of those test cases I call "aspirational".

Contributor Author


Do you think it would be useful to introduce a new value for the 'actual OOM' limit, maybe 4TB or some other value, with a note that it may need to be raised in the future? Or is it likely that we are already in a space where it will be hard to distinguish between a hardcoded limit and a true OOM?


I did a brief investigation into current operating system RAM limits: Windows 11 supports between 128GB and 6TB of RAM depending on the license/version, and I believe the current memory-paging scheme of the Linux kernel has a theoretical ceiling of 4PB of RAM.

So we are already kind of in a world where I can build a workstation with more than the maximum amount of RAM a 'common' OS supports. So I am not sure how much having a separate test value would give us, and I really don't like the idea of trying to do some sort of OS inspection to figure out an appropriate value.
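For a rough sense of the headroom, the figures above can be compared against the test value. Everything here is hedged: the OS limits are taken from the discussion, and kMaxSafeMultipleOf8 is assumed to be 2**53 - 8:

```typescript
const kMaxSafeMultipleOf8 = 2 ** 53 - 8; // ~8,192 TiB (assumed definition)

// Figures from the discussion above, treated as binary units.
const windows11MaxRamBytes = 6 * 2 ** 40;         // 6 TB (largest Windows 11 SKU)
const linuxTheoreticalCeilingBytes = 4 * 2 ** 50; // 4 PB (kernel paging ceiling)

// Even the Linux theoretical ceiling (4,096 TiB) is only about half of the
// ~8,192 TiB test allocation, so the single remaining value still exceeds
// any RAM size a common OS can address today.
console.log(linuxTheoreticalCeilingBytes < kMaxSafeMultipleOf8); // true
```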

Collaborator

@kainino0x kainino0x Oct 30, 2025


We should probably just leave it up to individual implementations to test this if they want to, since there's no way to know the hardcoded OOM threshold. Also that way the test could have some backdoor to see whether an OOM is real (came from the underlying API) or hardcoded.



Successfully merging this pull request may close these issues.

OOM allocation in src/webgpu/api/operation/buffers/map_oom.spec.ts too small
