Conversation

@0cc4m (Collaborator) commented May 12, 2024

I updated Vulkan for the changes in #7192 and fixed a bug in the soft_max implementation. That allowed me to clean up some code that was only needed for the three-input-tensor soft_max op.

I also updated and fixed the argsort implementation. Now test-backend-ops fully passes for the Vulkan backend.

@0cc4m 0cc4m requested a review from ggerganov May 12, 2024 07:00
@mofosyne mofosyne added the Vulkan (Issues specific to the Vulkan backend) and Review Complexity: High (Generally requires in-depth knowledge of LLMs or GPUs) labels May 12, 2024
@Adriankhl (Contributor) commented May 12, 2024

Not sure if this is the right place to discuss this, but I am digging into issue #7130.

Here is the root cause:

Embedding computation always tries to allocate a buffer of size 0 first.

Because of size += TENSOR_ALIGNMENT, size is always greater than 0 for the CPU backend (not sure if this is the correct behaviour though), so the CPU backend can always allocate a buffer successfully.

https://github.com/ggerganov/llama.cpp/blob/b228aba91ac2cd9eb90e9d423ba1d0d20e0117e2/ggml-backend.c#L625-L631
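For illustration, here is a minimal stand-in (hypothetical code, not the actual ggml-backend.c implementation) showing why the padding makes a 0-size request succeed on the CPU path:

```cpp
#include <cstddef>
#include <cstdlib>

static const size_t TENSOR_ALIGNMENT = 32; // illustrative value

// CPU-like path: the size is padded before malloc, so even a 0-byte
// request becomes a non-zero allocation and practically always succeeds.
void * cpu_like_alloc(size_t size) {
    size += TENSOR_ALIGNMENT; // size is now always > 0
    return malloc(size);
}
```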

For the Vulkan backend, ptr is still nullptr here after ggml_vk_host_malloc if size is 0.

https://github.com/ggerganov/llama.cpp/blob/b228aba91ac2cd9eb90e9d423ba1d0d20e0117e2/ggml-vulkan.cpp#L6031-L6043
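A matching stand-in for the Vulkan host path (again hypothetical, not the real ggml_vk_host_malloc) shows how ptr stays nullptr for a 0-size request without any error being signalled:

```cpp
#include <cstddef>
#include <cstdlib>

// Vulkan-host-like path: nothing is mapped for size == 0, so the caller
// gets nullptr back from a call that did not throw and looks "successful".
void * vk_host_like_malloc(size_t size) {
    void * ptr = nullptr;
    if (size > 0) {
        ptr = malloc(size); // stands in for the real host-visible mapping
    }
    return ptr;
}
```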

And because ggml_vk_host_malloc completes without throwing an exception, the call looks successful, and the resulting nullptr causes problems later on.

Should there be a null check here that throws an exception? Falling back to the CPU buffer actually works despite the warning.
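Something along these lines is what I have in mind (a sketch only; the exact placement and error type would need to match the surrounding ggml-vulkan.cpp code):

```cpp
#include <stdexcept>

// Hypothetical guard after the host allocation: turn a silent nullptr
// into an explicit error instead of letting it propagate.
inline void check_host_alloc(const void * ptr) {
    if (ptr == nullptr) {
        throw std::runtime_error("ggml_vk_host_malloc returned nullptr");
    }
}
```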

@mofosyne mofosyne added the bugfix label May 12, 2024
@ggerganov (Member) left a comment

Might be a good idea before merging to run the 2 tests from #7192 and verify that the output is reasonable.

@Adriankhl (Contributor)

Nevermind, the issue is much deeper than this. Please ignore it here.

@0cc4m 0cc4m merged commit c1b295e into master May 18, 2024
@0cc4m 0cc4m deleted the 0cc4m/soft-max-fix branch May 18, 2024 06:11