GPU embedding lookup writes garbage when invoked from HybridForwardPass #3

## Summary

The `EmbedLookup` / `EmbedLookupQ4K` Vulkan compute shaders produce garbage output when called from `HybridForwardPass.Forward`, but the same shaders produce correct output when called from `GpuForwardPass.Forward`.

## Evidence

With instrumentation that downloads `_gpuHidden` immediately after the embedding dispatch:

```
[DBG] after-embed: nans=7 big=815 h[0..4]=-4.438109E-20,2.464776E-36,-1.299449E+08,1.798577E+24
```

Out of 2048 floats, 7 are NaN and 815 have absolute value > 1e6. These are clearly not dequantized embedding values, which should be roughly in [-0.1, 0.1].
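The `nans` / `big` counts in the log line can be reproduced offline from a raw dump of the buffer. A minimal Python sketch, assuming the downloaded `_gpuHidden` contents are available as little-endian float32 bytes; the helper name `summarize_hidden` and its `big_threshold` parameter are hypothetical and not part of the codebase:

```python
import math
import struct


def summarize_hidden(raw: bytes, big_threshold: float = 1e6) -> dict:
    """Count NaN and implausibly large values in a little-endian float32 buffer."""
    count = len(raw) // 4  # ignore any trailing partial float
    values = struct.unpack(f"<{count}f", raw[: count * 4])
    nans = sum(1 for v in values if math.isnan(v))
    # Infinities count as "big"; NaNs are reported separately.
    big = sum(1 for v in values if not math.isnan(v) and abs(v) > big_threshold)
    return {"count": count, "nans": nans, "big": big, "head": list(values[:4])}
```

Running the same helper on a dump taken from the `GpuForwardPass` path gives a baseline to compare against.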

A Clear before the embed lookup *does* zero the buffer (verified), and a Clear *after* the embed lookup also zeros it, so both shaders execute. The embed shader runs but writes wrong values. The same shader running from `GpuForwardPass` produces correct values for the same model.

## What was ruled out

- Not a transfer/compute barrier issue (still happens with `RecordComputeCopy` everywhere)
- Not a buffer-aliasing bug (`_buffers` dictionary entries verified unique)
- Not an upload-ordering bug (moved embedding upload to after all layer uploads - same garbage)
- Not specific to the Q4_K shader (`EmbedLookup` produces identical garbage)
- Not a descriptor-set reuse issue (`_embedLookupQ4KPipeline._reusableDs` is updated only once per Forward; embedding is the only call)

## Workaround in place

`HybridForwardPass.ShouldKeepFixedWeightsOnCpu` now always returns `true`, forcing CPU dequantization of the single embedding row per token. Cost is negligible (one row vs. all the layer compute).

## TODO

Find the actual cause. Current suspicion: descriptor-pool or pipeline state that differs between the `HybridForwardPass` and `GpuForwardPass` constructor sequences.
