Conversation

@ixgbe ixgbe (Contributor) commented Dec 3, 2025

This PR adds RISC-V Vector (RVV) extension support for the RWKV WKV6 operation, enabling vectorized computation on RISC-V platforms.

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
@ixgbe ixgbe requested a review from ggerganov as a code owner December 3, 2025 06:17
@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Dec 3, 2025
@CISC CISC (Collaborator) commented Dec 3, 2025

Please note that ubuntu-cpu-cmake(-rpc)-riscv64-native will fail test-tokenizers-ggml-vocabs until git-lfs is fixed on the runners; all other tests will run to completion, though.

@ggerganov ggerganov (Member) commented:
I'm thinking that we should deprecate these ops since they are very model-specific:

llama.cpp/ggml/include/ggml.h

Lines 2389 to 2417 in 37adc9c

GGML_API struct ggml_tensor * ggml_rwkv_wkv6(
        struct ggml_context * ctx,
        struct ggml_tensor  * k,
        struct ggml_tensor  * v,
        struct ggml_tensor  * r,
        struct ggml_tensor  * tf,
        struct ggml_tensor  * td,
        struct ggml_tensor  * state);

GGML_API struct ggml_tensor * ggml_gated_linear_attn(
        struct ggml_context * ctx,
        struct ggml_tensor  * k,
        struct ggml_tensor  * v,
        struct ggml_tensor  * q,
        struct ggml_tensor  * g,
        struct ggml_tensor  * state,
        float                 scale);

GGML_API struct ggml_tensor * ggml_rwkv_wkv7(
        struct ggml_context * ctx,
        struct ggml_tensor  * r,
        struct ggml_tensor  * w,
        struct ggml_tensor  * k,
        struct ggml_tensor  * v,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b,
        struct ggml_tensor  * state);

Probably not worth investing much effort in optimizing. Rather, look to implement them as combination of other fundamental ops.

@ixgbe ixgbe (Contributor, Author) commented Dec 3, 2025

> I'm thinking that we should deprecate these ops since they are very model-specific: […]

I wanted to check in on this PR. Should I:

  • Wait for further discussion on the WKV op deprecation?
  • Make any changes to the current implementation?

Happy to follow whatever direction works best for the project. Thanks!

@ggerganov ggerganov (Member) commented:
The best way forward is to try to implement these operations with other ggml ops. If that works, we can replace the current implementations with the composite ones and make the deprecation process smoother. The main question is whether they can be expressed as composite ops in a meaningful way.
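For reference in that discussion, the per-token recurrence behind ggml_rwkv_wkv6 can be sketched in plain Python. This is a minimal scalar sketch of one head for one token, assuming the standard RWKV-6 WKV formulation (decay td and bonus tf applied per channel); the function name wkv6_step and the list-based layout are illustrative, not ggml API:

```python
def wkv6_step(r, k, v, tf, td, state):
    # One token of the (assumed) RWKV-6 WKV recurrence for a single head:
    #   kv[i][j]        = k[i] * v[j]                       (rank-1 update)
    #   y[j]           += r[i] * (tf[i] * kv[i][j] + state[i][j])
    #   state'[i][j]    = state[i][j] * td[i] + kv[i][j]    (per-channel decay)
    n = len(k)
    y = [0.0] * n
    new_state = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            kv = k[i] * v[j]
            y[j] += r[i] * (tf[i] * kv + state[i][j])
            new_state[i][j] = state[i][j] * td[i] + kv
    return y, new_state
```

Written this way, the candidate decomposition is visible: the update is an outer product, an elementwise multiply-add, and a matrix-vector product, which suggests it could in principle be expressed with fundamental ops (outer product / mul / add / mul_mat) at the cost of materializing the kv intermediate per token.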
