Simplify vsetvl intrinsics. #6
Comments
I have no preference, but one comment: if it's part of the name, then it's obvious it has to be known at compile time - neither SEW nor LMUL can be parametrized. If you put them as parameters, then someone (probably me at some point ;-) ) will find a use case where either one (or both) is parametrized and potentially not known at compile time. Which may or may not be legal and/or supported by an implementation... This is already an open question for VL, as I have a use case where I want/need to play with VL (upper bounded by the return value of vsetvl()) and it's not clear to me from the specification what is legitimate and what isn't - i.e., what restrictions an implementation can impose... |
The V-extension provides two instructions: I propose the intrinsics library offers both forms. It's tempting to simulate the non-immediate form using the immediate form invoked within a big Clearly, the Having SEW and LMUL as arguments would involve a big Some (subjective) advantages of having them as arguments:
Some (subjective) advantages of having them in the intrinsic name:
I vote for the latter approach, based mainly on my belief that very few users will ever use the |
Actually, I can think of cases where an end-user will use vsetvl() over the immediate form, especially if some implementations are tolerant to arbitrary VL. The obvious two are peel loops & tail loops, where it's a lot more natural in V to use a smaller VL rather than a larger VL + masking... |
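The tail-loop case mentioned above can be sketched in plain C as a scalar model. This is an illustration only: `model_vsetvl`, `add_i32`, and the `vlmax` parameter are hypothetical names invented here and are not part of the intrinsics API. The model assumes the simplest behaviour, where the granted VL is the smaller of the requested length and the maximum; real implementations may choose differently within what the specification allows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vsetvl intrinsic: grants min(avl, vlmax)
 * elements for the current iteration. */
static size_t model_vsetvl(size_t avl, size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

/* c[i] = a[i] + b[i], strip-mined. The last strip simply runs with a
 * smaller vl, so no mask and no separate scalar tail loop is needed.
 * Returns the number of strips executed. */
static size_t add_i32(int32_t *c, const int32_t *a, const int32_t *b,
                      size_t n, size_t vlmax) {
    size_t strips = 0;
    assert(vlmax > 0);
    for (size_t vl; (vl = model_vsetvl(n, vlmax)) > 0; n -= vl) {
        for (size_t i = 0; i < vl; ++i) /* models one vector add of vl elements */
            c[i] = a[i] + b[i];
        a += vl;
        b += vl;
        c += vl;
        ++strips;
    }
    return strips;
}
```

With 10 elements and a maximum of 4 elements per strip, the loop runs with VL values 4, 4, and 2; the final strip is the tail, handled by the same loop body.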
If I understand correctly, that means dynamic LMUL and SEW are what you want in some situations. However, I think it is hard to write such code under the current type system, e.g.:
int32_t *a;
int32_t *b;
int sew = 32;
int lmul = large_lmul ? 8 : 1;
for (; vl = vsetvl(sew, lmul, n); n -= vl) {
    va = vload (a); // Type for va ?
    ...
}
Of course, it can be resolved by writing more code, but I think that is hard to write and maintain:
int32_t *a;
int32_t *b;
int sew = 32;
int lmul = large_lmul ? 8 : 1;
for (; vl = vsetvl(sew, lmul, n); n -= vl) {
    if (large_lmul) {
        vint32m8_t va_32m8 = vload_i32m8 (a);
    } else {
        vint32m1_t va_32m1 = vload_i32m1 (a);
    }
    ...
} |
To be honest, SEW and LMUL are much less likely to change at runtime, except for the one case for SEW mentioned below. A variable SEW is conceivable (e.g. genericity between FP32 and FP64), but you still have to change every intrinsic/mnemonic name - and so it will be known at compile time anyway in practice (i.e. a macro-based generic implementation such as https://github.com/rdolbeau/fftw3/tree/riscv-v). SEW is likely to change only when reinterpreting data, e.g. between 64-bit and 32-bit integers as done in my Chacha20 implementation - and then VL changes inversely so that SEW*VL remains constant (we don't change the bits or how many there are in a register, we just change how we interpret them). LMUL should almost always be (number of available registers)/(number of required registers), rounded down to a legal value, if there's enough parallelism in the algorithm. That's unlikely to change for a given implementation of an algorithm. |
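The two rules of thumb in this comment can be written down as small C helpers. This is a sketch under assumptions: the function names `reinterpret_vl` and `pick_lmul` are invented for illustration, and only the integer LMUL values 1, 2, 4, and 8 that appear in this issue are considered.

```c
#include <assert.h>
#include <stddef.h>

/* Reinterpreting the same register bits at a different element width:
 * SEW * VL stays constant, so vl_to = vl * sew_from / sew_to. */
static size_t reinterpret_vl(size_t vl, unsigned sew_from, unsigned sew_to) {
    assert(sew_to != 0 && (vl * sew_from) % sew_to == 0);
    return vl * sew_from / sew_to;
}

/* Largest LMUL in {1, 2, 4, 8} that does not exceed
 * available_regs / required_regs. Returns 0 if even LMUL = 1 does not fit. */
static unsigned pick_lmul(unsigned available_regs, unsigned required_regs) {
    assert(required_regs > 0);
    unsigned ratio = available_regs / required_regs;
    unsigned lmul = 0;
    for (unsigned cand = 1; cand <= 8; cand *= 2) {
        if (cand <= ratio)
            lmul = cand;
    }
    return lmul;
}
```

For example, 4 elements at SEW=64 become 8 elements at SEW=32, and an algorithm needing 5 live vector values out of 32 registers gets 32/5 = 6, rounded down to LMUL = 4.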
I absolutely agree users will want to dynamically change VL. Both My earlier comment only concerned SEW and LMUL: Since these are part of the (static) type system in the current implementation, it is tricky to write code that changes them dynamically. Kito gave an artificial example, but I don't see any real-world value to it. @rdolbeau is it possible for you to extract a minimal example from your "Chacha20" code, demonstrating this use-case? The type-punning I'm concerned you're suggesting can easily lead to non-portable code, since it can expose differences in the architectural parameter SLEN. If it does prove useful in applications to type-pun on SEW and LMUL, I would prefer adding a |
Agreed. We designed the type system and intrinsics with static SEW and LMUL. It does not seem useful to keep a variable form of the vsetvl intrinsics. For example, if we have run-time variables sew and lmul,
Eventually, the compiler will convert the intrinsics into
|
I wrote a draft RFC for it. It follows the naming rules of the other intrinsics, i.e., it encodes SEW and LMUL into the intrinsic names. https://github.com/sifive/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#configuration-setting Please help review it. If we all agree to provide the static version only, we can close the issue. |
Treat SEW and LMUL as parameters to vsetvl, instead of providing a family of vsetvl intrinsics like the following:
size_t vsetvl_8m1 (size_t avl);
size_t vsetvl_8m2 (size_t avl);
size_t vsetvl_8m4 (size_t avl);
size_t vsetvl_8m8 (size_t avl);
size_t vsetvl_16m1 (size_t avl);
size_t vsetvl_16m2 (size_t avl);
size_t vsetvl_16m4 (size_t avl);
size_t vsetvl_16m8 (size_t avl);
size_t vsetvl_32m1 (size_t avl);
size_t vsetvl_32m2 (size_t avl);
size_t vsetvl_32m4 (size_t avl);
size_t vsetvl_32m8 (size_t avl);
size_t vsetvl_64m1 (size_t avl);
size_t vsetvl_64m2 (size_t avl);
size_t vsetvl_64m4 (size_t avl);
size_t vsetvl_64m8 (size_t avl);
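One possible middle ground, sketched here purely as an illustration, is a macro that takes SEW and LMUL as arguments but pastes them into the static intrinsic name, so they still have to be compile-time literals. Everything below is an assumption made for the sketch: `MODEL_VLEN`, `DEFINE_VSETVL`, and `VSETVL` are hypothetical names, and the function bodies are scalar stand-ins that return min(avl, VLMAX) with VLMAX = VLEN*LMUL/SEW rather than real intrinsics.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_VLEN 128u /* assumed vector register width in bits */

/* Defines a stand-in vsetvl_<sew>m<lmul>() returning min(avl, VLMAX),
 * where VLMAX = VLEN * LMUL / SEW. */
#define DEFINE_VSETVL(sew, lmul)                                  \
    static inline size_t vsetvl_##sew##m##lmul(size_t avl) {      \
        size_t vlmax = (size_t)MODEL_VLEN * (lmul) / (sew);       \
        return avl < vlmax ? avl : vlmax;                         \
    }

/* Only a few of the sixteen variants are defined, to keep the sketch short. */
DEFINE_VSETVL(8, 1)
DEFINE_VSETVL(32, 1)
DEFINE_VSETVL(32, 8)
DEFINE_VSETVL(64, 2)

/* Parameter-style spelling that resolves to a static name. Because the
 * arguments are token-pasted, sew and lmul must be literal tokens. */
#define VSETVL(sew, lmul, avl) vsetvl_##sew##m##lmul(avl)
```

`VSETVL(32, 8, n)` expands to `vsetvl_32m8(n)`. Passing a run-time variable instead, as in `VSETVL(sew, 8, n)`, would expand to the undefined name `vsetvl_sewm8` and fail to compile, which keeps the compile-time requirement visible to the user.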