`wgpu_hal` tries to call `glFlushMappedBufferRange` on buffer without setting `GL_MAP_FLUSH_EXPLICIT_BIT` #6137

jimblandy · 2024-08-20T20:52:22Z

Simple programs using the gles backend can get errors like this:

[... ERROR wgpu_hal::gles] GLES: [API/Error] ID 2 : GL_INVALID_OPERATION in glFlushMappedBufferRange(GL_MAP_FLUSH_EXPLICIT_BIT not set)

In GLES 3.2, calling glFlushMappedBufferRange is only allowed if the buffer was previously mapped with the GL_MAP_FLUSH_EXPLICIT flag set. doc But wgpu_hal as driven by wgpu_core violates this rule.

In wgpu_hal, the code in <gles::Device as Device>::create_buffer sets GL_MAP_FLUSH_EXPLICIT_BIT only if BufferUses::MAP_WRITE is set. But when wgpu_core::device::map_buffer is presented with a HostMap::Read mapping of a buffer with uninitialized ranges, it calls wgpu_hal::Device::flush_mapped_ranges, which calls glFlushMappedBufferRange even though the buffer was created only with BufferUses::MAP_READ.

The text was updated successfully, but these errors were encountered:

jimblandy · 2024-08-20T20:54:42Z

Here's a dumb test case that shows the problem.

4 files changed, 180 insertions(+)
tests/tests/bounds_checks/buffer.rs           | 168 ++++++++++++++++++++++++++
tests/tests/bounds_checks/buffer_storage.wgsl |  10 ++
tests/tests/bounds_checks/mod.rs              |   1 +
tests/tests/root.rs                           |   1 +

new file   tests/tests/bounds_checks/buffer.rs
@@ -0,0 +1,168 @@
+use wgpu::{BufferDescriptor, BufferUsages};
+use wgpu_test::{gpu_test, valid, GpuTestConfiguration, TestParameters};
+
+#[gpu_test]
+static BUFFER_STORAGE: GpuTestConfiguration =
+    GpuTestConfiguration::new()
+    .parameters(
+        TestParameters::default()
+            .test_features_limits(),
+    )
+    .run_async(buffer_storage);
+
+#[allow(unused_variables)]
+async fn buffer_storage(ctx: wgpu_test::TestingContext) {
+    // We want to bind a window in the middle of the storage buffer.
+    // Sometimes `min_storage_buffer_offset_alignment` is 256, so the
+    // shortest length that will let us have regions at the start and
+    // end of the buffer that are allegedly not visible to the shader
+    // is 3 * 256.
+    //
+    // But this length is in `u32` elements, so that's 3 * 64.
+    const TOTAL_LEN: usize = 3 * 64;
+    const WINDOW_START: usize = 64;
+    const WINDOW_LEN: usize = 64;
+
+    // Create the buffer for copying compute shader output to the CPU.
+    let readback = ctx.device.create_buffer(&BufferDescriptor {
+        label: Some("readback"),
+        size: u32_bytes(TOTAL_LEN),
+        usage: BufferUsages::MAP_READ | BufferUsages::COPY_DST,
+        mapped_at_creation: false,
+    });
+
+    valid(&ctx.device, || {
+        let label = Some("buffer_storage");
+        let bind_group_layout =
+            ctx.device
+                .create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
+                    label,
+                    entries: &[
+                        wgpu::BindGroupLayoutEntry {
+                            binding: 0,
+                            visibility: wgpu::ShaderStages::COMPUTE,
+                            ty: wgpu::BindingType::Buffer {
+                                ty: wgpu::BufferBindingType::Storage { read_only: true },
+                                has_dynamic_offset: false,
+                                min_binding_size: Some(wgpu::BufferSize::new(4).unwrap()),
+                            },
+                            count: None,
+                        },
+                        wgpu::BindGroupLayoutEntry {
+                            binding: 1,
+                            visibility: wgpu::ShaderStages::COMPUTE,
+                            ty: wgpu::BindingType::Buffer {
+                                ty: wgpu::BufferBindingType::Storage { read_only: false },
+                                has_dynamic_offset: false,
+                                min_binding_size: Some(wgpu::BufferSize::new(4).unwrap()),
+                            },
+                            count: None,
+                        },
+                    ],
+                });
+
+        let pipeline_layout = ctx
+            .device
+            .create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
+                label,
+                bind_group_layouts: &[&bind_group_layout],
+                push_constant_ranges: &[],
+            });
+
+        let module = ctx
+            .device
+            .create_shader_module(wgpu::include_wgsl!("buffer_storage.wgsl"));
+
+        let pipeline = ctx
+            .device
+            .create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
+                label,
+                layout: Some(&pipeline_layout),
+                module: &module,
+                entry_point: Some("push_boundaries"),
+                compilation_options: <_>::default(),
+                cache: None,
+            });
+
+        // Create the buffer the shader reads from. We'll only bind
+        // the middle third of this.
+        let input = ctx.device.create_buffer(&BufferDescriptor {
+            label: Some("input"),
+            size: u32_bytes(TOTAL_LEN),
+            usage: BufferUsages::STORAGE,
+            mapped_at_creation: true,
+        });
+
+        let mut view = input.slice(..).get_mapped_range_mut();
+        let words: &mut [u32] = bytemuck::cast_slice_mut(&mut view);
+        words[..].fill(42);
+        words[WINDOW_START..WINDOW_START + WINDOW_LEN].fill(0);
+        drop(view);
+        input.unmap();
+
+        // Create the buffer to which the compute shader copies the
+        // bound portion of `input`.
+        let output = ctx.device.create_buffer(&BufferDescriptor {
+            label: Some("output"),
+            size: u32_bytes(WINDOW_LEN),
+            usage: BufferUsages::STORAGE | BufferUsages::COPY_SRC,
+            mapped_at_creation: false,
+        });
+
+        let bind_group = ctx.device.create_bind_group(&wgpu::BindGroupDescriptor {
+            label,
+            layout: &bind_group_layout,
+            entries: &[
+                wgpu::BindGroupEntry {
+                    binding: 0,
+                    resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding {
+                        buffer: &input,
+                        offset: u32_bytes(WINDOW_START),
+                        size: Some(wgpu::BufferSize::new(u32_bytes(WINDOW_LEN)).unwrap()),
+                    }),
+                },
+                wgpu::BindGroupEntry {
+                    binding: 1,
+                    resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding {
+                        buffer: &output,
+                        offset: 0,
+                        size: None,
+                    }),
+                },
+            ],
+        });
+
+        let mut encoder = ctx
+            .device
+            .create_command_encoder(&wgpu::CommandEncoderDescriptor { label });
+        {
+            let mut pass = encoder.begin_compute_pass(&wgpu::ComputePassDescriptor {
+                label,
+                timestamp_writes: None,
+            });
+
+            pass.set_pipeline(&pipeline);
+            pass.set_bind_group(0, &bind_group, &[]);
+            pass.dispatch_workgroups(1, 1, 1);
+        }
+        encoder.copy_buffer_to_buffer(&output, 0, &readback, 0, u32_bytes(WINDOW_LEN));
+        ctx.queue.submit([encoder.finish()]);
+        readback
+            .slice(..)
+            .map_async(wgpu::MapMode::Read, Result::unwrap);
+
+        log::info!("I love you");
+    });
+
+    ctx.device.poll(wgpu::Maintain::Wait);
+    /*
+    let view = readback.slice(..).get_mapped_range();
+    let words: &[u32] = bytemuck::cast_slice(&view);
+    log::info!("words: {words:?}");
+    */
+}
+
+/// The size of `n` `u32` values, in bytes.
+const fn u32_bytes(n: usize) -> wgt::BufferAddress {
+    (n * std::mem::size_of::<u32>())  as wgt::BufferAddress
+}
new file   tests/tests/bounds_checks/buffer_storage.wgsl
@@ -0,0 +1,10 @@
+@group(0) @binding(0)
+var<storage, read> input: array<u32>;
+
+@group(0) @binding(1)
+var<storage, read_write> output: array<u32>;
+
+@compute @workgroup_size(64)
+fn push_boundaries(@builtin(local_invocation_index) index: u32) {
+  output[index] = input[index];
+}
new file   tests/tests/bounds_checks/mod.rs
@@ -0,0 +1 @@
+mod buffer;
modified   tests/tests/root.rs
@@ -11,6 +11,7 @@ mod regression {
 mod bgra8unorm_storage;
 mod bind_group_layout_dedup;
 mod bind_groups;
+mod bounds_checks;
 mod buffer;
 mod buffer_copy;
 mod buffer_usages;

jimblandy · 2024-08-26T17:29:56Z

We believe the fix is to enable the GL_MAP_FLUSH_EXPLICIT_BIT flag unconditionally.

teoxoy · 2024-08-27T09:34:52Z

This is blocking #5714 since one of the tests added in that PR is running into this.

jimblandy assigned teoxoy Aug 26, 2024

teoxoy mentioned this issue Aug 28, 2024

[wgpu-core] when mapping buffers for reading, mark buffers as initialized only when they have MAP_WRITE usage #6178

Merged

ErichDonGubler closed this as completed in #6178 Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`wgpu_hal` tries to call `glFlushMappedBufferRange` on buffer without setting `GL_MAP_FLUSH_EXPLICIT_BIT` #6137

`wgpu_hal` tries to call `glFlushMappedBufferRange` on buffer without setting `GL_MAP_FLUSH_EXPLICIT_BIT` #6137

jimblandy commented Aug 20, 2024

jimblandy commented Aug 20, 2024

jimblandy commented Aug 26, 2024

teoxoy commented Aug 27, 2024

wgpu_hal tries to call glFlushMappedBufferRange on buffer without setting GL_MAP_FLUSH_EXPLICIT_BIT #6137

wgpu_hal tries to call glFlushMappedBufferRange on buffer without setting GL_MAP_FLUSH_EXPLICIT_BIT #6137

Comments

jimblandy commented Aug 20, 2024

jimblandy commented Aug 20, 2024

jimblandy commented Aug 26, 2024

teoxoy commented Aug 27, 2024

`wgpu_hal` tries to call `glFlushMappedBufferRange` on buffer without setting `GL_MAP_FLUSH_EXPLICIT_BIT` #6137

`wgpu_hal` tries to call `glFlushMappedBufferRange` on buffer without setting `GL_MAP_FLUSH_EXPLICIT_BIT` #6137