Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

65535 word shader literal limit #5390

Closed
eadwu opened this issue Mar 15, 2024 · 13 comments · Fixed by #5503
Closed

65535 word shader literal limit #5390

eadwu opened this issue Mar 15, 2024 · 13 comments · Fixed by #5503
Labels
area: naga back-end Outputs of naga shader conversion lang: SPIR-V Vulkan's Shading Language naga Shader Translator type: bug Something isn't working

Comments

@eadwu
Copy link

eadwu commented Mar 15, 2024

VALIDATION [VUID-VkShaderModuleCreateInfo-pCode-08737 (0xa5625282)]
        Validation Error: [ VUID-VkShaderModuleCreateInfo-pCode-08737 ] | MessageID = 0xa5625282 | vkCreateShaderModule(): pCreateInfo->pCode is not valid SPIR-V: Literal string is longer than 65535 words: 72543 words long. The Vulkan spec states: If pCode is a pointer to SPIR-V code, pCode must adhere to the validation rules described by the Validation Rules within a Module section of the SPIR-V Environment appendix (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkShaderModuleCreateInfo-pCode-08737)

I tried Ctrl+F'ing https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html for 65535 and cannot find any mention of this limit. Also sounds pretty weird if external shaders can be more than 65535 words but not a string literal (or is this just OOM preventation type thing).

Does not impact --release, so is some debug warning. Another possible solution is toggling this warning off since it isn't inherently harmful.

@teoxoy
Copy link
Member

teoxoy commented Mar 15, 2024

I can't find this validation error in the SPIR-V Environment appendix neither.

@teoxoy teoxoy added the api: vulkan Issues with Vulkan label Mar 15, 2024
@grovesNL
Copy link
Collaborator

Looks like that's from:

https://github.com/KhronosGroup/SPIRV-Tools/blob/e39cabca20f6621b8fdde565516a1de47507930d/source/binary.cpp#L609-L614

The SPIR-V specification also only seems to define a minimum limit of 65,535 UTF-8 encoded characters, so we probably have to truncate at that point unless there's a way to query a higher limit from Vulkan somehow:

image

Each instruction also uses a 16-bit wordcount so I guess there's no real way to bypass a higher upper bound of u16::MAX even if it was technically supported by Vulkan/drivers.

Out of curiosity, why are you trying to pass a large literal there? Is it a generated type name that uses nested generics or something?

@eadwu
Copy link
Author

eadwu commented Mar 15, 2024

It is an autogenerated compute shader, so it just happened when I had a big enough input, also I didn't run into it in --release mode (which I was working on since other dependencies were failing on debug build until recently).

I don't think it is truncated right now since all my tests pass.

This doesn't seem to get hit when I do SPIRV passthrough (using Naga to validate and convert to SPIRV before using SPIRV-Tools crate to validate).

@eadwu
Copy link
Author

eadwu commented Mar 15, 2024

Comes from Vulkan Validation layers. Turns out there's a knob for it WGPU_VALIDATION so I've disabled that, though looks to be disabling all validation now.

Looks like there's now a memory leak? It just stalls and takes more memory at that point unless I add WGPU_DEBUG=0 as well.

It stalls on

libnvidia-glvkspirv.so.550.54.14
libnvidia-eglcore.so.550.54.14
#15 0x00005555562e47bf in wgpu_hal::vulkan::device::{impl#4}::create_compute_pipeline (self=0x55555b2e03a0, desc=0x7ffffffe7bc8)
    at src/vulkan/device.rs:1894
#16 0x0000555555e54280 in wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>::create_compute_pipeline<wgpu_hal::vulkan::Api> (
    self=0x7ffffffe81f8, desc=0x7ffffffe8a40, implicit_context=..., hub=0x555557567510)
    at /home/framework/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/resource.rs:2602
#17 0x00005555560c324e in wgpu_core::global::Global<wgpu_core::identity::IdentityManagerFactory>::device_create_compute_pipeline<wgpu_core::identity::IdentityManagerFactory, wgpu_hal::vulkan::Api> (self=0x555557567430, device_id=..., desc=0x7ffffffe8a40, id_in=(), implicit_pipeline_ids=...)
    at /home/framework/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/global.rs:1705
#18 0x0000555555fefe23 in wgpu::backend::wgpu_core::{impl#7}::device_create_compute_pipeline (self=0x555557567430, device=0x7ffffffe8ec8, 
    device_data=0x55555b2dd130, desc=0x7ffffffe9368) at src/backend/wgpu_core.rs:1186
#19 0x0000555556005c34 in wgpu::context::{impl#5}::device_create_compute_pipeline<wgpu::backend::wgpu_core::ContextWgpuCore> (self=0x555557567430, 
    device=0x7fffffff8e48, device_data=..., desc=0x7ffffffe9368) at src/context.rs:2275
#20 0x0000555556010e9b in wgpu::Device::create_compute_pipeline (self=0x7fffffff8e28, desc=0x7ffffffe9368) at src/lib.rs:2494

Don't know if it NVIDIA related since it is the host memory that is leaking.

@teoxoy
Copy link
Member

teoxoy commented Mar 28, 2024

Embedding the source shader was added in gfx-rs/naga#2379.

I think we need to break the source shader into multiple OpStrings and then point OpSourceContinued to them.

cc @wicast

@teoxoy teoxoy added type: bug Something isn't working area: naga back-end Outputs of naga shader conversion naga Shader Translator lang: SPIR-V Vulkan's Shading Language and removed api: vulkan Issues with Vulkan labels Mar 28, 2024
@jimblandy
Copy link
Member

Trying to figure out if this impacts Firefox (not that that's the only important consideration, just for my own sake):

naga/src/back/spv/mod.rs says:

        if cfg!(debug_assertions) {
            flags |= WriterFlags::DEBUG;
        }

The DEBUG flag is what determines whether we include source code in the SPIR-V output. So I think any time Rust code is built with debug assertions, Firefox could be affected by this.

@wicast
Copy link
Contributor

wicast commented Apr 3, 2024

@eadwu could you try this branch first? i will make a pr later.
https://github.com/wicast/wgpu/tree/naga-fix-source-limit

@eadwu
Copy link
Author

eadwu commented Apr 3, 2024

Yep that resolves the issue for me.

@wicast
Copy link
Contributor

wicast commented Apr 5, 2024

Hi @teoxoy,
I have added a new test for large source. But CJK don't work well with rspirv. So I did several other tests. Here is what happening.

  1. CJK will likely be broken with OpSourceContinued, this should be good according to spec(the same thing happen with glslc), which makes the continued string not a vaild utf8.

  2. Both naga and glslc produce this invaild continued string works fine with spirv-dis.
    图片

  3. Renderdoc it self doesn't work well with CJK, even without continued souce. But continued source doesn't affect debugging function.

I think this is a bug in rspirv. Here is my branch: https://github.com/wicast/wgpu/tree/naga-fix-source-limit

@teoxoy
Copy link
Member

teoxoy commented Apr 5, 2024

It could be that it's not valid to chunk the source string like this since the resulting byte slices are not guaranteed to be valid utf-8.

We should probably split the string at valid boundaries. str.floor_char_boundary() sounds like the perfect fn for this but it's not stable yet; maybe we can just copy-paste it until it becomes stable.

@wicast
Copy link
Contributor

wicast commented Apr 5, 2024

It could be that it's not valid to chunk the source string like this since the resulting byte slices are not guaranteed to be valid utf-8.

Split by boundry is better I agree.
But I think the invalid utf8 in OpSourceContinued is still a valid spirv as long as all OpSource* concat together is a valid utf8 string.
That's how glslc does, and spirv-dis disambles it without error, rspirv needs to treat this as well. For the moment, I just need a method to by passrspirv.

maybe we can just copy-paste it until it becomes stable

I don't get it, what is the subject being copied?

@teoxoy
Copy link
Member

teoxoy commented Apr 5, 2024

It could be that it's not valid to chunk the source string like this since the resulting byte slices are not guaranteed to be valid utf-8.

Split by boundry is better I agree. But I think the invalid utf8 in OpSourceContinued is still a valid spirv as long as all OpSource* concat together is a valid utf8 string. That's how glslc does, and spirv-dis disambles it without error, rspirv needs to treat this as well. For the moment, I just need a method to by passrspirv.

I don't think that's the case. The SPIR-V spec says that the argument of OpSourceContinued must be a Literal. And string literals are always UTF-8 encoded.

maybe we can just copy-paste it until it becomes stable

I don't get it, what is the subject being copied?

I was referring to the src of str.floor_char_boundary() since we can't use it yet.

@wicast
Copy link
Contributor

wicast commented Apr 5, 2024

It could be that it's not valid to chunk the source string like this since the resulting byte slices are not guaranteed to be valid utf-8.

Split by boundry is better I agree. But I think the invalid utf8 in OpSourceContinued is still a valid spirv as long as all OpSource* concat together is a valid utf8 string. That's how glslc does, and spirv-dis disambles it without error, rspirv needs to treat this as well. For the moment, I just need a method to by passrspirv.

I don't think that's the case. The SPIR-V spec says that the argument of OpSourceContinued must be a Literal. And string literals are always UTF-8 encoded.

maybe we can just copy-paste it until it becomes stable

I don't get it, what is the subject being copied?

I was referring to the src of str.floor_char_boundary() since we can't use it yet.

good idea, l will do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: naga back-end Outputs of naga shader conversion lang: SPIR-V Vulkan's Shading Language naga Shader Translator type: bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants