## Description

Inspired by https://github.com/fla-org/flash-linear-attention/pull/833

- Support HV > H (`num_v_heads` > `num_qk_heads`) in KDA, following the gated_delta_rule GVA pattern
- Add a corresponding test and benchmark config
- Support both sm90 and sm100
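To illustrate the head layout this change enables, here is a minimal NumPy sketch of the GVA-style head grouping, where each q/k head is shared by a group of value heads. The shapes and variable names are illustrative assumptions, not the kernel's actual API; the only invariant taken from the description is HV > H with q/k heads shared across value-head groups.

```python
import numpy as np

# Hypothetical shapes for illustration: B = batch, T = sequence length,
# H = num_qk_heads, HV = num_v_heads, K/V = head dims.
# GVA requires HV to be a multiple of H.
B, T, H, HV, K, V = 2, 8, 2, 4, 16, 16
assert HV % H == 0
G = HV // H  # each q/k head serves G value heads

rng = np.random.default_rng(0)
q = rng.standard_normal((B, T, H, K))
k = rng.standard_normal((B, T, H, K))
v = rng.standard_normal((B, T, HV, V))

# Broadcast q/k across value-head groups: repeat each
# qk head G times along the head axis so it lines up with v.
q_exp = np.repeat(q, G, axis=2)  # (B, T, HV, K)
k_exp = np.repeat(k, G, axis=2)  # (B, T, HV, K)
assert q_exp.shape == (B, T, HV, K)
assert k_exp.shape == (B, T, HV, K)
```

A fused kernel would index the shared qk head as `i_hv // G` rather than materializing the repeated tensors, but the grouping convention is the same.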