Conversation

@MollySophia (Collaborator)

Make sure to read the contributing guidelines before submitting a PR

It seems that some RWKV tensors end up as FP16 rather than FP32 after certain recent commits. However, ggml_cuda_op_bin_bcast requires src1->type == FP32, so newly converted RWKV models cannot run with CUDA; previously converted files are unaffected. This PR fixes that issue.
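A minimal sketch of the idea behind this fix, assuming a simplified standalone helper rather than the actual convert_hf_to_gguf.py hooks; the tensor-name suffixes below are illustrative and not the exact list touched by this PR:

```python
import numpy as np

# Sketch only: keep selected RWKV tensors in F32 during conversion so that the
# CUDA binary-broadcast path (which expects src1 in F32) still works.
# The suffixes are illustrative, not the exact list from this PR.
FORCE_F32_SUFFIXES = (".time_mix_lerp_x.weight", ".time_mix_first.weight")

def choose_dtype(new_name: str, data: np.ndarray, outtype: str) -> str:
    if new_name.endswith(FORCE_F32_SUFFIXES):
        return "F32"                       # force full precision for these tensors
    if data.ndim <= 1:
        return "F32"                       # 1-D tensors are kept in F32 anyway
    return "F16" if outtype == "f16" else "F32"

# A (1, 1, 2048) tensor that would otherwise be written as F16 stays F32:
print(choose_dtype("blk.0.time_mix_lerp_x.weight", np.zeros((1, 1, 2048), np.float32), "f16"))
```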

This PR also adds LLAMA_EXAMPLE_PERPLEXITY to the examples list of the --no-context-shift parameter, so that models without context-shift support can run llama-perplexity again.

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
@github-actions bot added the python (python script changes) label on Dec 20, 2024.
@ggerganov (Member) left a comment

This is likely caused by this change where I removed the squeeze() during conversion:

https://github.com/ggerganov/llama.cpp/blob/0bf2d10c5514ff61b99897a4a5054f846e384e1e/convert_hf_to_gguf.py#L298-L301
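For context, a rough illustration of why dropping the squeeze() changes the stored type, assuming the converter keeps 1-D tensors in F32 and may write higher-rank float tensors as F16 (simplified, not the actual script code):

```python
import torch

def is_kept_f32(data) -> bool:
    # Simplified stand-in for the conversion rule: 1-D tensors stay in F32.
    return data.ndim <= 1

t = torch.zeros(1, 1, 2048)              # an RWKV tensor with leading singleton dims
print(is_kept_f32(t.squeeze().numpy()))  # True  -> written as F32 (old behaviour)
print(is_kept_f32(t.numpy()))            # False -> may be written as F16
```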

@MollySophia (Collaborator, Author)

> This is likely caused by this change where I removed the squeeze() during conversion:
>
> https://github.com/ggerganov/llama.cpp/blob/0bf2d10c5514ff61b99897a4a5054f846e384e1e/convert_hf_to_gguf.py#L298-L301

I see. So there's another way to fix this: squeeze these tensors in rwkv6's modify_tensors() rather than adding them to the F32 list?
Both should solve the issue; I wonder which way is better.
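A rough sketch of that alternative, assuming a standalone function in place of the real Rwkv6Model.modify_tensors() and illustrative tensor-name suffixes (the merged change targets the specific RWKV6 tensors that are affected):

```python
import torch

def modify_tensors(data_torch: torch.Tensor, new_name: str) -> list[tuple[str, torch.Tensor]]:
    # Illustrative suffixes; squeeze() drops the singleton dims so these tensors
    # are 1-D again and therefore kept in F32 by the converter.
    if new_name.endswith((".time_mix_lerp_x.weight", ".time_mix_first.weight")):
        data_torch = data_torch.squeeze()
    return [(new_name, data_torch)]

# Example: a (1, 1, 2048) tensor becomes (2048,) after modify_tensors()
print(modify_tensors(torch.zeros(1, 1, 2048), "blk.0.time_mix_lerp_x.weight")[0][1].shape)
```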

@ggerganov (Member)

Don't think there is any significant advantage one way or the other. Maybe squeezing in modify_tensors is a bit more localized.

@MollySophia (Collaborator, Author)

> Don't think there is any significant advantage one way or the other. Maybe squeezing in modify_tensors is a bit more localized.

Yeah, that's what I meant. Let me change it to use the more localized approach, then.

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
@ggerganov (Member)

This is good to merge?

@MollySophia (Collaborator, Author)

> This is good to merge?

Yes. Thanks a lot for your time!

@ggerganov merged commit 0a11f8b into ggml-org:master on Dec 20, 2024
50 of 51 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
* Enable --no-context-shift for llama-perplexity example

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV 6: Fix error in ggml_cuda_op_bin_bcast

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>