feat: Full SPIR-V interpreter — compute + particles + debugger + 84% coverage#172
Merged
…r support Complete the uniform/storage buffer path by implementing missing opcodes: OpVectorTimesScalar, OpDot, OpMatrixTimesVector, OpMatrixTimesScalar, OpMatrixTimesMatrix, OpTranspose, OpVectorShuffle, OpCopyObject. Add integer division (SDiv, UDiv, SMod, UMod, SRem), float mod (FMod, FRem), signed negate (SNegate), type conversions (ConvertFToS, SConvert, UConvert, FConvert), all comparison ops (unsigned and signed), logical ops, bitwise ops, shift ops, and OpUndef. Fixes pre-existing TestSPIRVUniformBufferMultiMember failure (VectorTimesScalar). Tests: 12 new, 50.3% coverage on shader/ package.
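The remainder opcodes listed above differ only in which operand determines the sign of the result: per the SPIR-V specification, OpSRem and OpFRem follow the dividend (Operand 1), while OpSMod and OpFMod follow the divisor (Operand 2). The following is a minimal sketch of that distinction in Go; the function names are hypothetical and not taken from this PR, and the zero-divisor case (undefined in SPIR-V, a panic for Go integers) is deliberately left to the caller.

```go
package main

import "math"

// sRem mirrors OpSRem: the result takes the sign of the dividend,
// which is what Go's % operator already does.
func sRem(a, b int32) int32 { return a % b }

// sMod mirrors OpSMod: the result takes the sign of the divisor.
func sMod(a, b int32) int32 {
	r := a % b
	if r != 0 && (r < 0) != (b < 0) {
		r += b
	}
	return r
}

// fRem mirrors OpFRem (sign of the dividend); math.Mod behaves this way.
func fRem(a, b float32) float32 {
	return float32(math.Mod(float64(a), float64(b)))
}

// fMod mirrors OpFMod (sign of the divisor).
func fMod(a, b float32) float32 {
	r := fRem(a, b)
	if r != 0 && (r < 0) != (b < 0) {
		r += b
	}
	return r
}
```

For example, sRem(-7, 3) yields -1 while sMod(-7, 3) yields 2.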
Implement OpSampledImage, OpImageSampleImplicitLod, OpImageSampleExplicitLod, OpImageFetch, and OpImageQuerySize. Add SampledImageValue type for combining texture and sampler references. Support nearest-neighbor and bilinear (4-tap) filtering with three wrap modes (repeat, clamp-to-edge, mirrored repeat). Parse OpTypeImage, OpTypeSampler, OpTypeSampledImage in module parser. Tests: 10 new (including full SPIR-V integration test with textureSample), 53.9% coverage on shader/ package.
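As an illustration of the filtering described above, the sketch below shows one common way to implement the three wrap modes and a 4-tap bilinear fetch. It assumes a single-channel, row-major float texture and normalized coordinates; WrapMode, wrapCoord and sampleBilinear are hypothetical names, not the identifiers used in this PR.

```go
package main

import "math"

type WrapMode int

const (
	WrapRepeat WrapMode = iota
	WrapClampToEdge
	WrapMirroredRepeat
)

// wrapCoord maps an arbitrary integer texel index into [0, size).
func wrapCoord(i, size int, mode WrapMode) int {
	switch mode {
	case WrapRepeat:
		i %= size
		if i < 0 {
			i += size
		}
		return i
	case WrapMirroredRepeat:
		period := 2 * size
		i %= period
		if i < 0 {
			i += period
		}
		if i >= size {
			i = period - 1 - i // reflect the second half of the period
		}
		return i
	default: // WrapClampToEdge
		if i < 0 {
			return 0
		}
		if i >= size {
			return size - 1
		}
		return i
	}
}

// sampleBilinear blends the four texels surrounding (u, v).
func sampleBilinear(tex []float32, w, h int, u, v float32, mode WrapMode) float32 {
	// Texel centres sit at half-integer positions, hence the -0.5 shift.
	x := float64(u)*float64(w) - 0.5
	y := float64(v)*float64(h) - 0.5
	x0, y0 := int(math.Floor(x)), int(math.Floor(y))
	fx := float32(x - math.Floor(x))
	fy := float32(y - math.Floor(y))
	fetch := func(xi, yi int) float32 {
		return tex[wrapCoord(yi, h, mode)*w+wrapCoord(xi, w, mode)]
	}
	top := fetch(x0, y0)*(1-fx) + fetch(x0+1, y0)*fx
	bottom := fetch(x0, y0+1)*(1-fx) + fetch(x0+1, y0+1)*fx
	return top*(1-fy) + bottom*fy
}
```

Nearest-neighbor filtering would use the same wrapCoord helper with a single fetch at the floor of the unshifted coordinate.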
Implement OpExtInst dispatch for the GLSL.std.450 extended instruction set. Covers 30+ intrinsics: trig (sin/cos/tan/asin/acos/atan/atan2), exponential (pow/exp/log/exp2/log2/sqrt/inverseSqrt), rounding (round/floor/ceil/fract/trunc), min/max/clamp, interpolation (mix/step/smoothstep), geometric (length/distance/normalize/cross/reflect), and integer ops (sabs/ssign/smin/smax/umin/umax). Tests: 12 new (23 unary + 6 binary GLSL via full SPIR-V integration, vector length, normalize, cross, reflect, integer ops). 54.7% coverage on shader/ package.
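Two of the intrinsics named above have compact definitions in the GLSL.std.450 specification: Reflect is I - 2 * dot(N, I) * N, and SmoothStep is a Hermite interpolation of the clamped ratio (x - edge0) / (edge1 - edge0). The sketch below follows those definitions. The Vec3 type and function names are hypothetical, and the handling of edge0 == edge1 (undefined in the specification) is an assumption chosen here to avoid a division by zero, not necessarily what the PR does.

```go
package main

type Vec3 [3]float32

func dot(a, b Vec3) float32 {
	return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
}

// reflect computes I - 2*dot(N, I)*N; N is expected to be normalized.
func reflect(i, n Vec3) Vec3 {
	d := 2 * dot(n, i)
	return Vec3{i[0] - d*n[0], i[1] - d*n[1], i[2] - d*n[2]}
}

// smoothstep returns t*t*(3 - 2*t) with t = clamp((x-edge0)/(edge1-edge0), 0, 1).
func smoothstep(edge0, edge1, x float32) float32 {
	if edge0 == edge1 {
		// Assumption: degrade to a step function instead of dividing by zero.
		if x < edge0 {
			return 0
		}
		return 1
	}
	t := (x - edge0) / (edge1 - edge0)
	if t < 0 {
		t = 0
	} else if t > 1 {
		t = 1
	}
	return t * t * (3 - 2*t)
}
```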
…calls, switch)

Implement structured control flow for the SPIR-V interpreter:
- OpPhi with predecessor block tracking for SSA resolution
- OpLoopMerge + backward branch detection for loop execution
- OpSwitch for multi-way branching
- OpFunctionCall with push/pop execution frames (child interpreter)
- OpKill for fragment shader discard
- Max iteration guard (100K) to prevent infinite loops
- Max call depth guard (64) to prevent stack overflow

Tests: 5 new (loop sum 1..N with phi, function call, switch dispatch, infinite loop guard, phi predecessor resolution). 60.0% coverage.
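To make the OpPhi and iteration-guard items concrete, here is a small sketch of a block-level execution loop that tracks the predecessor block and aborts after a fixed number of steps. Everything in it (the phi and block types, run, loopSum, counting block visits rather than instructions) is a hypothetical simplification for illustration, not the PR's actual data model.

```go
package main

import "errors"

// maxIterations mirrors the 100K guard from the commit message; counting
// block visits (not instructions) is an assumption of this sketch.
const maxIterations = 100_000

var errIterationLimit = errors.New("iteration limit exceeded")

// phi picks one incoming value depending on the predecessor block.
type phi struct {
	result   int         // ID that receives the selected value
	incoming map[int]int // predecessor block ID -> source value ID
}

// block is a hypothetical basic block: phis, a body, and a terminator.
type block struct {
	phis []phi
	body func(vals map[int]int32)     // non-phi instructions, may be nil
	next func(vals map[int]int32) int // next block ID, or -1 to return
}

func run(blocks map[int]*block, entry int, vals map[int]int32) error {
	prev, cur := -1, entry
	for steps := 0; cur >= 0; steps++ {
		if steps >= maxIterations {
			return errIterationLimit
		}
		b := blocks[cur]
		// All phis of a block read their inputs before any of them writes.
		picked := make([]int32, len(b.phis))
		for i, p := range b.phis {
			picked[i] = vals[p.incoming[prev]]
		}
		for i, p := range b.phis {
			vals[p.result] = picked[i]
		}
		if b.body != nil {
			b.body(vals)
		}
		prev, cur = cur, b.next(vals)
	}
	return nil
}

// loopSum builds "sum = 1 + 2 + ... + n" as entry(0) -> header(1) <-> body(2).
func loopSum(n int32) (int32, error) {
	const idZero, idOne, idI, idSum, idNextI, idNextSum = 1, 2, 3, 4, 5, 6
	vals := map[int]int32{idZero: 0, idOne: 1}
	blocks := map[int]*block{
		0: {next: func(map[int]int32) int { return 1 }},
		1: {
			phis: []phi{
				{result: idI, incoming: map[int]int{0: idOne, 2: idNextI}},
				{result: idSum, incoming: map[int]int{0: idZero, 2: idNextSum}},
			},
			next: func(v map[int]int32) int {
				if v[idI] <= n {
					return 2
				}
				return -1
			},
		},
		2: {
			body: func(v map[int]int32) {
				v[idNextSum] = v[idSum] + v[idI]
				v[idNextI] = v[idI] + 1
			},
			next: func(map[int]int32) int { return 1 },
		},
	}
	err := run(blocks, 0, vals)
	return vals[idSum], err
}
```

The two-pass phi resolution matters when one phi's input is another phi's result in the same block; a block that branches to itself unconditionally ends with errIterationLimit instead of hanging.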
Implement compute shader execution for the software backend:
- ExecuteCompute: single invocation execution with compute builtins
- DispatchCompute: full workgroup dispatch (sequential, single-threaded)
- BufferPointer: direct read/write to raw storage buffer bytes via access chains, fixing the copy-on-read problem for storage buffers
- Compute builtins: GlobalInvocationId, LocalInvocationId, WorkgroupId, NumWorkgroups, WorkgroupSize, LocalInvocationIndex
- Atomic ops: OpAtomicIAdd, OpAtomicISub, OpAtomicExchange, OpAtomicCompareExchange, OpAtomicIIncrement, OpAtomicIDecrement, OpAtomicSMin/UMin/SMax/UMax, OpAtomicLoad, OpAtomicStore
- OpControlBarrier/OpMemoryBarrier: no-op in single-threaded interpreter
- Workgroup shared memory allocation per workgroup
- GetWorkgroupSize from OpExecutionMode LocalSize

Tests: 9 new (array sum compute, builtins, multi-workgroup dispatch, atomic ops, CAS, wrong model error, workgroup size query, uvec3 conversion, triangle regression). 60.8% coverage on shader/ package.
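The compute builtins listed above are related by two standard formulas: GlobalInvocationId = WorkgroupId * WorkgroupSize + LocalInvocationId (per component), and LocalInvocationIndex = z * sizeX * sizeY + y * sizeX + x. A sequential dispatch can be sketched as below; the type and function names are hypothetical stand-ins, and the callback takes the place of running the shader entry point.

```go
package main

type UVec3 [3]uint32

// ComputeBuiltins holds the per-invocation values a compute shader can read.
type ComputeBuiltins struct {
	NumWorkgroups        UVec3
	WorkgroupSize        UVec3
	WorkgroupID          UVec3
	LocalInvocationID    UVec3
	GlobalInvocationID   UVec3
	LocalInvocationIndex uint32
}

// dispatch visits every workgroup sequentially on the calling goroutine.
func dispatch(groups, size UVec3, invoke func(ComputeBuiltins)) {
	for gz := uint32(0); gz < groups[2]; gz++ {
		for gy := uint32(0); gy < groups[1]; gy++ {
			for gx := uint32(0); gx < groups[0]; gx++ {
				runWorkgroup(groups, size, UVec3{gx, gy, gz}, invoke)
			}
		}
	}
}

// runWorkgroup visits every invocation inside one workgroup.
func runWorkgroup(groups, size, wg UVec3, invoke func(ComputeBuiltins)) {
	for lz := uint32(0); lz < size[2]; lz++ {
		for ly := uint32(0); ly < size[1]; ly++ {
			for lx := uint32(0); lx < size[0]; lx++ {
				invoke(ComputeBuiltins{
					NumWorkgroups:     groups,
					WorkgroupSize:     size,
					WorkgroupID:       wg,
					LocalInvocationID: UVec3{lx, ly, lz},
					GlobalInvocationID: UVec3{
						wg[0]*size[0] + lx,
						wg[1]*size[1] + ly,
						wg[2]*size[2] + lz,
					},
					LocalInvocationIndex: lz*size[0]*size[1] + ly*size[0] + lx,
				})
			}
		}
	}
}
```

Because invocations run one after another in this model, barriers have nothing to synchronize, which is consistent with treating OpControlBarrier and OpMemoryBarrier as no-ops.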
Add comprehensive coverage tests exercising all type conversion helpers, composite operations, buffer serialization, and previously uncovered interpreter opcodes (FMod, FRem, SNegate, SDiv, UDiv, SMod, UMod, SRem, all bitwise/shift/logical ops, Select, all comparison variants). Parse OpConstantTrue and OpConstantFalse in the module parser. Fix gofmt issues in opcodes.go and uniform_test.go. Tests: 20 new, 73.1% coverage on shader/ package.
Add debugging capabilities to the SPIR-V interpreter: breakpoints, single-stepping, variable watches, and JSON trace output. Zero overhead when debug is nil (benchmarks confirm identical allocs/op).
- DebugContext struct with OnInstruction, OnBreakpoint, OnError callbacks
- ExecuteWithDebug method (Execute/ExecuteWithContext delegate to it)
- JSON-lines trace output via pre-allocated traceEntry struct
- Watch variables trigger stepping when SSA values change
- 18 tests covering breakpoints, trace, watch, abort, step, zero-overhead
- 3 benchmarks proving nil-debug path matches Execute baseline
- Fix traceEntry variable shadowing its own type name in run()
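The zero-overhead claim rests on a familiar pattern: the hot loop performs a single nil check on the debug pointer, and all callback work is nested behind it. The sketch below illustrates that pattern only. The DebugContext fields shown here use guessed signatures, and execute is a hypothetical stand-in for the interpreter loop, so none of this should be read as the PR's actual API.

```go
package main

// DebugContext is a simplified stand-in; the callback signatures are assumptions.
type DebugContext struct {
	Breakpoints   map[int]bool
	OnInstruction func(pc int, opcode string)
	OnBreakpoint  func(pc int) (keepGoing bool)
}

// execute walks the instruction list and returns how many were executed.
func execute(ops []string, dbg *DebugContext) int {
	executed := 0
	for pc, op := range ops {
		if dbg != nil { // the only cost paid on the non-debug path
			if dbg.OnInstruction != nil {
				dbg.OnInstruction(pc, op)
			}
			if dbg.Breakpoints[pc] && dbg.OnBreakpoint != nil && !dbg.OnBreakpoint(pc) {
				return executed // the debugger asked to abort
			}
		}
		executed++ // stand-in for real opcode execution
	}
	return executed
}
```

With this shape, a non-debug entry point can simply call the debug-aware one with a nil context, which matches the delegation described in the list above.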
…revention tests) Atomics (min/max/load/store), math edge cases (smoothstep div-by-zero, pow negative base, atan2 quadrants), workgroup memory allocation, vector shuffle variants, texture sampling boundaries, division by zero, comparison opcodes, OpKill/OpUnreachable, type conversion identity ops. Each test targets a specific regression class, not coverage padding.
Wire SPIR-V interpreter compute execution into software HAL:
- CreateComputePipeline with ShaderModule + entryPoint
- ComputePassEncoder.Dispatch executes SPIR-V via interpreter
- OpExecutionMode LocalSize parsing for workgroup size
- Naga integration tests: WGSL → SPIR-V → compute → verify output
- Software-test example: end-to-end compute on software backend
- unconvert: remove unnecessary type conversions (Uint32/Int32/Float32 are type aliases, not distinct types)
- revive: rename BuiltInGlobalInvocationId -> BuiltInGlobalInvocationID, BuiltInLocalInvocationId -> BuiltInLocalInvocationID, BuiltInWorkgroupId -> BuiltInWorkgroupID, BuiltInSampleId -> BuiltInSampleID
- unparam: remove always-zero lo parameter from clampInt
- unused: remove unused buildLoopSumSPIRV function
- nestif: add nolint:nestif with justification for debug context checks
- maintidx: add nolint:maintidx on executeGLSLExtInst
- goconst: use existing constants (glslExtSetName, opNameLoad, etc.) in debug.go and interpreter.go
- staticcheck: fix ineffective break in bufferAccessChain default case
- gocritic: use switch instead of if-else in vectorShuffle, coord -= expr and coord++ instead of coord = coord - expr
- whitespace: remove leading newline in DispatchCompute
Slice-based values (replace map[uint32]Value), Pointer pool (32 pre-alloc), optimized compositeConstruct fast path for vectors. Vertex: 18→8 allocs (-56%), 2848→712 B (-75%), 3400→1100 ns (-68%). Fragment: 11→4 allocs (-64%), 1096→384 B (-65%), 1375→490 ns (-64%).
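The slice-based value store works because every result ID in a SPIR-V module is smaller than the Bound word in the module header, so a slice of that length can be indexed directly by ID with no hashing and no per-insert allocation. The sketch below shows the idea under simplifying assumptions: values are reduced to a single uint32, and slot, valueTable and its methods are hypothetical names.

```go
package main

// slot holds one SSA value; the real interpreter stores richer values.
type slot struct {
	set  bool
	bits uint32
}

// valueTable replaces map[uint32]Value with a slice indexed by result ID.
type valueTable struct {
	slots []slot
}

// newValueTable allocates once, using the module's ID bound as the length.
func newValueTable(bound uint32) *valueTable {
	return &valueTable{slots: make([]slot, bound)}
}

func (t *valueTable) Set(id, bits uint32) {
	t.slots[id] = slot{set: true, bits: bits}
}

func (t *valueTable) Get(id uint32) (uint32, bool) {
	s := t.slots[id]
	return s.bits, s.set
}

// Reset clears the table for the next invocation without reallocating.
func (t *valueTable) Reset() {
	for i := range t.slots {
		t.slots[i] = slot{}
	}
}
```

Reusing one table across invocations via Reset is one plausible way to arrive at allocation reductions like those reported above, though the PR's exact mechanism is not shown here.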
…ncing, triangle strip

Remove the resource guard that rejected shaders using Uniform/UniformConstant/StorageBuffer variables. Populate the interpreter's ExecutionContext with bind group resources (buffers, textures, samplers) following the compute Dispatch pattern.

New capabilities:
- Instanced rendering: Draw(vertexCount, instanceCount, ...) loops over instances, advancing instance-rate vertex buffers per instance
- TriangleStrip topology: converts strip vertices to triangle list with correct winding order alternation
- SPIR-V vertex shader with mixed inputs: @builtin(vertex_index), @builtin(instance_index), and @location(N) from vertex buffers
- Per-vertex output attributes: vertex shader @location outputs are collected and interpolated via DrawTrianglesInterpolated
- buildExecutionContext wires bind group buffers/textures/samplers into the shader ExecutionContext for both render paths

Target: GOGPU_GRAPHICS_API=software gogpu/examples/particles renders visible animated particles (compute + instanced render pipeline).
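The winding alternation mentioned for TriangleStrip can be sketched as follows. In a strip, every odd-numbered triangle would face the opposite way if its vertices were taken in order, so the last two indices are swapped for odd triangles. This follows the ordering used in the Vulkan specification, (i, i+1, i+2) for even i and (i, i+2, i+1) for odd i. The function name is hypothetical and the PR may use an equivalent rotation.

```go
package main

// stripToList expands a triangle strip of vertexCount vertices into a
// triangle-list index buffer with consistent winding.
func stripToList(vertexCount int) []uint32 {
	if vertexCount < 3 {
		return nil
	}
	out := make([]uint32, 0, (vertexCount-2)*3)
	for i := 0; i+2 < vertexCount; i++ {
		a, b, c := uint32(i), uint32(i+1), uint32(i+2)
		if i%2 == 1 {
			b, c = c, b // flip odd triangles to keep the winding order
		}
		out = append(out, a, b, c)
	}
	return out
}
```

A four-vertex strip therefore becomes the two triangles (0, 1, 2) and (1, 3, 2).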
Three bugs in the SPIR-V interpreter caused a black screen for any shader
that modifies struct members in local variables (e.g. p.pos += vel * dt):
1. OpAccessChain on function-local composites created disconnected
Pointer copies — OpStore to p.pos discarded the write instead of
updating the parent struct. Fixed by introducing SubPointer, which
maintains a reference to the root Pointer and writes back through
the parent on OpStore.
2. typeByteSize for structs used a naive i*4 offset fallback instead of
MemberDecorate Offset decorations. This produced wrong RuntimeArray
element stride (12 instead of 16 for Particle{vec2,vec2}), causing
storage buffer writes to corrupt adjacent elements.
3. zeroValueForVar did not handle TypeStruct — function-local struct
variables were initialized to Uint32(0) instead of a zero-initialized
Array of member values, so SubPointer stores into the struct had no
composite to navigate.
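Bugs 1 and 2 can be illustrated with a short sketch. The first part models a SubPointer that keeps a reference to its root and stores through it, instead of writing into a detached copy. The second part contrasts a struct size derived from Offset decorations with the naive i*4 fallback. For Particle{vec2, vec2} with offsets 0 and 8, this yields 16 versus 12, which is one way the strides quoted above can arise. All types and function names here are hypothetical simplifications (members are plain float slices and trailing padding is ignored), not the PR's implementation.

```go
package main

// Pointer owns a function-local struct, modelled as a list of members.
type Pointer struct {
	Members [][]float32
}

// SubPointer refers to one member of a root Pointer rather than copying it.
type SubPointer struct {
	Root   *Pointer
	Member int
}

// Load returns a copy of the member, as OpLoad yields a value.
func (s SubPointer) Load() []float32 {
	return append([]float32(nil), s.Root.Members[s.Member]...)
}

// Store writes through to the parent so the struct sees the update.
func (s SubPointer) Store(v []float32) {
	s.Root.Members[s.Member] = append([]float32(nil), v...)
}

// Member describes one struct member: its Offset decoration and byte size.
type Member struct {
	Offset, Size uint32
}

// structByteSize uses the decorated offsets: the furthest member end wins.
func structByteSize(members []Member) uint32 {
	var size uint32
	for _, m := range members {
		if end := m.Offset + m.Size; end > size {
			size = end
		}
	}
	return size
}

// naiveStructByteSize reproduces the buggy i*4 offset fallback.
func naiveStructByteSize(members []Member) uint32 {
	var size uint32
	for i, m := range members {
		if end := uint32(i)*4 + m.Size; end > size {
			size = end
		}
	}
	return size
}
```

With a stride of 12 instead of 16, element 1 of the particle array would begin 4 bytes inside element 0, which is the adjacent-element corruption described in bug 2.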
The vertex draw path (executeVertexDraw) was writing raster pipeline output in RGBA order to a BGRA framebuffer, causing an R/B channel swap (particles rendered blue instead of orange). The SPIR-V draw path already had the correct BGRA swap but applied it unconditionally.

Changes:
- Extract writeRasterToTarget(): single point for RGBA-to-framebuffer conversion, conditionally swaps R/B based on target texture format
- Add isBGRA() helper for format detection
- Add raster.FragmentShaderFunc type and DrawTrianglesWithFragmentShader method for per-pixel SPIR-V fragment shader execution
- Add buildFragmentShaderFunc() that creates a closure executing the SPIR-V fragment shader with interpolated @location inputs per pixel
- Add Module.GetTypeComponentCount() to resolve variable type widths for correct attribute slicing in the fragment shader dispatch
- Wire up fragment shader path in both executeSPIRVDraw and executeVertexDraw when the fragment shader has @location inputs
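The conditional swap can be sketched as a single write helper that consults the target format. The format constants and the writePixel function below are hypothetical; only the isBGRA name comes from the change list, and its signature here is an assumption.

```go
package main

// TextureFormat is a hypothetical two-value enum for this sketch.
type TextureFormat int

const (
	FormatRGBA8Unorm TextureFormat = iota
	FormatBGRA8Unorm
)

// isBGRA reports whether the target stores blue in the first byte.
func isBGRA(f TextureFormat) bool {
	return f == FormatBGRA8Unorm
}

// writePixel stores one RGBA pixel at offset, swapping R and B only when
// the target format requires it.
func writePixel(dst []byte, offset int, rgba [4]byte, format TextureFormat) {
	r, g, b, a := rgba[0], rgba[1], rgba[2], rgba[3]
	if isBGRA(format) {
		r, b = b, r
	}
	dst[offset+0] = r
	dst[offset+1] = g
	dst[offset+2] = b
	dst[offset+3] = a
}
```

Routing both draw paths through one helper of this kind means an orange pixel such as (255, 128, 0, 255) lands as (0, 128, 255, 255) in a BGRA target and is left untouched in an RGBA one.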
…ixes

CHANGELOG: document all 7 phases, compute HAL, per-pixel fragment, instancing, performance optimization, 3 struct bugs fixed.
ARCHITECTURE: software backend description updated for SPIR-V scope.
software-test: errcheck + nolint for linear test flow.
Codecov Report: all modified and coverable lines are covered by tests.
Summary
Full SPIR-V interpreter for the software backend. Executes vertex, fragment, and compute shaders on CPU — no GPU required. First Pure Go SPIR-V interpreter with shader debugging.
Features (7 phases)
Integration
- CreateComputePipeline + ComputePassEncoder.Dispatch
- DrawTrianglesWithFragmentShader in raster pipeline
Performance
Verified
- gogpu/triangle
- gogpu/particles (4096, compute+render)
- wgpu/software-test (compute)
Test plan
- go test ./...: all pass (370+ tests in shader/)
- golangci-lint: 0 issues on Windows, Linux, macOS
- gofmt: clean