vulkan: add cpy bf16 -> f32 pipelines #22677
ServeurpersoCom wants to merge 3 commits into ggml-org:master from
Conversation
|
Please add a test to test-backend-ops that would reproduce the original failure without this change. Otherwise, it looks good to me. |
Add explicit cpy test cases for BF16 <-> F32 in both directions. Address review feedback from @jeffbolznv
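For reference, a minimal sketch of what such registrations in tests/test-backend-ops.cpp could look like; the test_cpy constructor arguments here are assumed from the neighbouring F16/F32 cases and may not match the current harness exactly:

```cpp
// Sketch only: BF16 <-> F32 copy cases next to the existing CPY tests.
// The test_cpy(type_src, type_dst, ne) signature is an assumption.
test_cases.emplace_back(new test_cpy(GGML_TYPE_BF16, GGML_TYPE_F32, {256, 4, 4, 4}));
test_cases.emplace_back(new test_cpy(GGML_TYPE_F32,  GGML_TYPE_BF16, {256, 4, 4, 4}));
```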
|
It looks like there was some existing test coverage that was correctly reporting "not supported": How did you run into the assert? Was it through some other op that expands bf16 to f32? Or were you not running through the graph API? |
You're right, I'm dropping the redundant test commit. The assert came from a non-CPY path: ggml_vk_get_cpy_pipeline is called directly inside several matmul variants to materialize a contiguous copy before the matmul runs, and none of those callsites consult supports_op since the surface op is MUL_MAT (Victor214's case is a BF16 LoRA hitting the matmul non-contig path with src0->type == BF16). |
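For readers following along, the failure mode can be sketched roughly like this (a simplified view of the copy-pipeline lookup, not the exact ggml-vulkan.cpp code; the pipeline member names are assumptions):

```cpp
// Simplified sketch of the (src type, dst type) dispatch inside ggml_vk_get_cpy_pipeline.
// Member names such as pipeline_cpy_f32_bf16 / pipeline_cpy_bf16_f32 are assumed.
if (src->type == GGML_TYPE_F32 && to == GGML_TYPE_BF16) {
    return ctx->device->pipeline_cpy_f32_bf16;   // direction that already existed
}
if (src->type == GGML_TYPE_BF16 && to == GGML_TYPE_F32) {
    return ctx->device->pipeline_cpy_bf16_f32;   // direction this PR adds
}
// Without the second branch, the matmul non-contig path falls through to the abort,
// i.e. the "Missing CPY op for types: bf16 f32" message quoted later in this thread.
std::cerr << "Missing CPY op for types: " << ggml_type_name(src->type)
          << " " << ggml_type_name(to) << std::endl;
GGML_ABORT("fatal error");
```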
This reverts commit fee98e9.
|
I checked my LoRA/LoKr merge code... https://github.com/ServeurpersoCom/acestep.cpp/blob/master/src/adapter-merge.h
|
If we're missing test coverage for matmul with noncontig sources, that would be good to add. |
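If that coverage does get added, it might look roughly like the following; the test_mul_mat constructor with batch/repeat/permutation arguments is assumed from the existing permuted cases and may differ in the current harness:

```cpp
// Sketch only: a BF16 matmul whose src0 is permuted (non-contiguous), which should
// exercise the internal contiguous-copy path described above.
// Constructor arguments (m, n, k, batch dims, repeats, permutation) are assumptions.
test_cases.emplace_back(new test_mul_mat(GGML_TYPE_BF16, GGML_TYPE_F32,
                                         32, 32, 256,
                                         {1, 1}, {1, 1},
                                         {0, 2, 1, 3}));
```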
I'm checking this.
|
Ah, I understand! Actually, there are 3 relevant states:
With my patch I can confirm that LoRA loads correctly and the type is now supported ("OK"). Before the patch, "not supported" was the correct report. |
|
This is my final word:
With PR:
No PR:
|
"Not supported" wouldn't have triggered the assertion failure. I think you were on the right track that there's some matmul permutation case that wasn't handled. But if it's hard to find, ultimately I can live without it (we do have coverage for CPY). |
I'm still looking at it! But I'll create another follow-up PR to extend coverage. The actual error in my logs is "Missing CPY op for types: bf16 f32". Original log (no assert):
Overview
Add the missing reverse direction "cpy bf16 -> f32" to the Vulkan backend. Currently only "cpy f32 -> bf16" is supported, which causes runtime aborts when models or LoRAs stored in BF16 need to be transferred back to F32 buffers (typical case: merging a BF16-trained LoRA at runtime; yes, I'm merging on the GPU because it's much faster, and the same code works on CUDA).
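For context on why the reverse direction is straightforward: BF16 is simply the upper 16 bits of an IEEE-754 binary32 value, so the bf16 -> f32 copy is a widening shift. A small standalone C++ illustration of the two conversions (this only illustrates the arithmetic, not the Vulkan shader added here):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// bf16 -> f32: place the 16 stored bits into the top half of a 32-bit float.
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t) h << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// f32 -> bf16 with round-to-nearest-even (NaN handling omitted for brevity);
// this is the direction the backend already supported.
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    bits += 0x7FFF + ((bits >> 16) & 1);
    return (uint16_t) (bits >> 16);
}

int main() {
    const float x = 3.14159f;
    const uint16_t h = f32_to_bf16(x);
    std::printf("%f -> 0x%04x -> %f\n", x, (unsigned) h, bf16_to_f32(h));
    return 0;
}
```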
Downstream issue (Successfully tested by me, awaiting user feedback): ServeurpersoCom/acestep.cpp#69
Testing
With PR:
No PR:
Requirements