Vector tuple type #17

Closed
kito-cheng opened this issue May 6, 2020 · 11 comments
Labels
Revisit after v1.0 Features or problems we will revisit after the v1.0 release

Comments

@kito-cheng
Collaborator

There are several possible ways to implement the vector tuple type:

  1. Define vector tuple type as primitive type
  2. Define vector tuple type by struct/aggregate
  3. Define vector tuple type by array
  4. Define vector tuple type as primitive type but provide curly braces initialization only.
  5. Define vector tuple type as primitive type but provide subscript operator and curly braces initialization.

The advantage of options 2, 3, 4 and 5 is that they provide syntactic sugar for accessing elements of the vector tuple type instead of intrinsic function calls:

/* ------ Primitive style ------ */
vint32m2x3_t vt;
vint32m2_t va;

vint32m2x3_t vt2 = vcreate_i32m2x3(va, va, va); // Creation.
vt = vset_i32m2x3(vt, 0, va); // Insertion.
va = vget_i32m2x3(vt, 0); // Extraction.

/* ------ Array style ------ */
typedef vint32m2_t vint32m2x3_t[3];
vint32m2x3_t vt;
vint32m2_t va;

vint32m2x3_t vt2 = {va, va, va}; // Creation.
vt[0] = va; // Insertion.
va = vt[0]; // Extraction.

/* ------ Struct style ------ */
typedef struct {
  vint32m2_t x;
  vint32m2_t y;
  vint32m2_t z;
} vint32m2x3_t;
vint32m2x3_t vt;
vint32m2_t va;

vint32m2x3_t vt2 = {va, va, va}; // Creation.
vt.x = va; // Insertion.
va = vt.y; // Extraction.

Currently, GCC's SVE implementation follows option 4 and disallows declaring arrays and structs with scalable vector types.

@rdolbeau
Collaborator

rdolbeau commented May 6, 2020

'native', pre-defined tuple simply needs to exist (for things like Zvlsseg, etc.) and have accessors so they can be (de)constructed; the ability to access/update with either struct-like or array-like syntax falls under the "good-to-have" from my point-of-view - I could live without that if it's too difficult to implement. If it's doable and you want an opinion between the options, mine is 'whichever is easier to implement' :-) (I'm guessing array).

User-defined arrays with vector elements would be good for some algorithms (e.g. storing a local copy of data for block-based algorithms such as those used in video processing). Arm might consider that for SVE - I've put in a request for it and they didn't say 'no' straight away ;-)

User-defined structure with vector element would be nice, but maybe too difficult to implement (alignment rules are going to be hell...) and not worth the effort. I don't think Arm will do that for SVE.

In your examples, when you write vint32m2x3_t, do we agree that this means 6 architected registers (m2 means LMUL=2, so 2 registers; the x3 is a tuple of 3, so 6 in all), and that the data layout will be SLEN-dependent because LMUL>1? Whereas vfloat64m1x3_t would be just 3 registers with an implementation-independent data layout?

@kito-cheng
Collaborator Author

@rdolbeau thanks for your feedback, and yes, I am seeking feedback on those options.

Honestly, I don't yet have an idea of the implementation cost of those options. So personally I am in no rush to decide on any approach other than the first option; I am just collecting more feedback at this stage.

> In your examples, when you write vint32m2x3_t, we agree that means 6 architected registers ? (m2 means LMUL=2 so 2, the x3 is a tuple of 3, so 6 in all), and that the data layout will be SLEN-dependent because LMUL>1? Whereas vfloat64m1x3_t would be just 3 registers with a implementation-independent data layout?

The data layout of a vector tuple type is the same as that of the corresponding vector type, repeated NF times. For example, vint32m2x3_t has the same data layout as vint32m2_t repeated 3 times, with the extra register allocation constraint that the registers must be consecutive.

Conceptually vint32m2x3_t is equivalent to vint32m2_t[3].

So yes, vint32m2x3_t is SLEN-dependent if LMUL > 1, and vfloat64m1x3_t has an implementation-independent data layout.
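
The register arithmetic discussed above can be sketched in plain C. This is only an illustration: the helper names tuple_register_count and tuple_is_encodable are hypothetical and not part of any intrinsic API, only integer LMUL is modelled, and the cap of 8 registers per tuple is an assumption taken from the segment load/store constraint (EMUL * NFIELDS <= 8) in the ratified RVV 1.0 specification rather than from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: number of architected vector registers occupied by a
 * tuple type such as vint32m2x3_t. Each of the NF fields is a register group
 * of LMUL registers, so the total is LMUL * NF. Integer LMUL only. */
static int tuple_register_count(int lmul, int nf) {
    assert(lmul == 1 || lmul == 2 || lmul == 4);
    assert(nf >= 2 && nf <= 8);
    return lmul * nf;
}

/* Hypothetical helper: true if the tuple fits the assumed limit of
 * 8 registers per segment access (EMUL * NFIELDS <= 8 in RVV 1.0). */
static bool tuple_is_encodable(int lmul, int nf) {
    return tuple_register_count(lmul, nf) <= 8;
}
```

With these helpers, vint32m2x3_t corresponds to tuple_register_count(2, 3), which gives 6 registers, and vfloat64m1x3_t to tuple_register_count(1, 3), which gives 3. A combination such as LMUL=4 with three fields would need 12 registers and so would not fit under the assumed limit.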

@rofirrim
Collaborator

I think it might be slightly better to use structs instead of arrays.

C and C++ don't have values of array type and as such they can't be returned from a function call.

That would be the case for a load from Zvlsseg, in which a tuple of registers vfloat64m1x3_t could be returned from a call to a hypothetical vlseg3e_v_f64m1x3.

vfloat64m1x3_t vlseg3e_v_f64m1x3(const double *base);

Arguments don't need to be structs (we could always flatten them), but for consistency I'd expect the store to look like this.

void vsseg3e_v_f64m1x3(double *base, vfloat64m1x3_t);

In that sense, vfloat64m1x3_t would behave like a struct with fields (say) v0, v1, v2.

vfloat64m1x3_t vt;
vt.v0 = ...;
... = vt.v2;
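
The array-versus-struct point can be checked without any vector hardware. The following is a minimal sketch in plain C, assuming a fixed-size mock in place of a real (sizeless) RVV vector type. All names here (mock_vec_t, mock_vec_x3_t, mock_load_seg3, mock_store_seg3) are hypothetical and only mimic the shape of the segment load and store signatures above.

```c
#include <assert.h>

/* Stand-in for a vector register with two lanes. A real RVV type is
 * sizeless; this fixed-size mock is an assumption for illustration. */
typedef struct { double lane[2]; } mock_vec_t;

/* Struct-style tuple: a value type that can be returned and passed. */
typedef struct { mock_vec_t v0, v1, v2; } mock_vec_x3_t;

/* Array-style tuple: the typedef is legal, but a function cannot return it.
 * Uncommenting the next line is a compile error in both C and C++:
 *   mock_vec_t (bad_load(const double *base))[3];
 */
typedef mock_vec_t mock_vec_arr3_t[3];

/* Mock segment load: de-interleaves 3-field records into three "vectors". */
static mock_vec_x3_t mock_load_seg3(const double *base) {
    mock_vec_x3_t t;
    for (int i = 0; i < 2; i++) {
        t.v0.lane[i] = base[3 * i + 0];
        t.v1.lane[i] = base[3 * i + 1];
        t.v2.lane[i] = base[3 * i + 2];
    }
    return t; /* returning a struct by value is fine */
}

/* Mock segment store: re-interleaves the three fields into memory. */
static void mock_store_seg3(double *base, mock_vec_x3_t t) {
    for (int i = 0; i < 2; i++) {
        base[3 * i + 0] = t.v0.lane[i];
        base[3 * i + 1] = t.v1.lane[i];
        base[3 * i + 2] = t.v2.lane[i];
    }
}
```

Loading from a buffer with mock_load_seg3 and writing the result back with mock_store_seg3 round-trips the data, which is the behaviour a struct-returning segment load would need and which an array return type cannot express.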

If we focus only on intrinsic functionality, it is unclear to me that we need to allow users to define their own struct or array types with RVV vectors in them.

So my stance for now would be @kito-cheng's option 1 (a primitive type) above, plus as much of the behaviour of option 2 as makes sense for it.

In that sense I'd be inclined to do something like Arm's ACLE for SVE ( https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_00_en.pdf ).

Note that in general (page 13)

Members of unions, structures and classes cannot have sizeless type.

"Sizeless" is Arm's term for types whose object size is not necessarily known at compile time.

But then on page 14

Each type svBASExN_t is sizeless and contains a sequence of N svBASE_ts. The individual vectors are members with names v0, v1, and so on. For example, svfloat64x4_t contains four svfloat64_t vectors with the names v0, v1, v2 and v3.

Nothing seems to prevent making those types svBASExN_t primitive. Arm's implementation in their Arm Compiler for HPC exposes that detail in the headers via a __sizeless_struct syntax but this seems an implementation detail to me.

typedef __sizeless_struct { svfloat64_t v0, v1, v2; } svfloat64x3_t;

@kito-cheng
Collaborator Author

> C and C++ don't have values of array type and as such they can't be returned from a function call.

Good point, it sounds like an array is not an option.

> Each type svBASExN_t is sizeless and contains a sequence of N svBASE_ts. The individual vectors are members with names v0, v1, and so on. For example, svfloat64x4_t contains four svfloat64_t vectors with the names v0, v1, v2 and v3.

This paragraph seems to be gone in a later version; svBASExN_t can no longer be accessed via v0..vn.
https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_04_en.pdf

As an implementation detail of SVE in GCC 10, struct-style initialization is allowed, but other operations must call intrinsic functions:

  svfloat64_t s64;
  svfloat64x3_t s64x3 = {s64, s64, s64}; // Creation.
  s64x3 = svcreate3(s64, s64, s64); // Creation.
  s64x3 = svset3(s64x3, 1, s64); // Insertion
  s64 = svget3(s64x3, 2); // Extraction

@rofirrim
Collaborator

> This paragraph seems to be gone in a later version; svBASExN_t can no longer be accessed via v0..vn.
> https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_04_en.pdf

Thanks @kito-cheng, I wasn't aware of the new version.

> As an implementation detail of SVE in GCC 10, struct-style initialization is allowed, but other operations must call intrinsic functions:

  svfloat64_t s64;
  svfloat64x3_t s64x3 = {s64, s64, s64}; // Creation.
  s64x3 = svcreate3(s64, s64, s64); // Creation.
  s64x3 = svset3(s64x3, 1, s64); // Insertion
  s64 = svget3(s64x3, 2); // Extraction

This is a reasonable alternative to using structs.
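
The shape of that create/set/get interface can be mocked in plain C to show its value semantics. This is a sketch under stated assumptions: mock_vec_t, mock_vec_x3_t, mock_create3, mock_set3 and mock_get3 are hypothetical names modelled loosely on the calls quoted above, and the fixed-size struct stands in for a sizeless type whose layout users would not see.

```c
#include <assert.h>

/* Stand-in for one vector register (assumption: four int lanes). */
typedef struct { int lane[4]; } mock_vec_t;

/* Stand-in tuple. In a primitive-type design the members would be hidden
 * from users, who would go through the accessor functions below. */
typedef struct { mock_vec_t field[3]; } mock_vec_x3_t;

/* Creation: builds a tuple from three vectors. */
static mock_vec_x3_t mock_create3(mock_vec_t a, mock_vec_t b, mock_vec_t c) {
    mock_vec_x3_t t = { { a, b, c } };
    return t;
}

/* Insertion: returns a new tuple with one field replaced. The argument is
 * passed by value, so the caller's tuple is left unchanged. */
static mock_vec_x3_t mock_set3(mock_vec_x3_t t, int index, mock_vec_t v) {
    assert(index >= 0 && index < 3);
    t.field[index] = v;
    return t;
}

/* Extraction: returns a copy of one field. */
static mock_vec_t mock_get3(mock_vec_x3_t t, int index) {
    assert(index >= 0 && index < 3);
    return t.field[index];
}
```

Because insertion returns a fresh value instead of updating in place, a compiler has to recognise the pattern t = mock_set3(t, i, v) to avoid copying the whole tuple. How well a given toolchain does this for real vector tuples is a code generation question, separate from the interface itself.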

@eopXD eopXD added the Revisit after v1.0 Features or problems we will revisit after the v1.0 release label Jul 29, 2022
@haolongzhangm

haolongzhangm commented Jul 31, 2022

@kito-cheng I tested the "Primitive style" approach with vcreate_*.
After running objdump, we can see that vcreate_ leads to more vector registers being used and to extra "vmv" instructions, which results in poor performance.

#include "riscv_vector.h"
#include <cstdio>

int main() {

  float src[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};

  vfloat32m1_t t0, t1, t2, t3, ret0, ret1;

  t0 = vle32_v_f32m1(src, 4);
  t1 = vle32_v_f32m1(src + 4, 4);
  t2 = vle32_v_f32m1(src + 8, 4);
  t3 = vle32_v_f32m1(src + 12, 4);

  ret0 = vfadd_vv_f32m1(t0, t1, 4);
  ret1 = vfadd_vv_f32m1(t2, t3, 4);
   
  float dst [8] = {0};

  vse32_v_f32m1(dst, ret0, 4);
  vse32_v_f32m1(dst + 4, ret1, 4);

  for (size_t i = 0; i < 8; i++) {
      printf("%f ", dst[i]);
  }

  printf("\n");


  return 0;
}


   10508:       0087f7d7                vsetvli a5,a5,e32,m1,d1
   1050c:       181c                    addi    a5,sp,48
   1050e:       0207f207                vle.v   v4,(a5)
   10512:       1004                    addi    s1,sp,32
   10514:       009c                    addi    a5,sp,64
   10516:       0207f087                vle.v   v1,(a5)
   1051a:       0204f107                vle.v   v2,(s1)
   1051e:       089c                    addi    a5,sp,80
   10520:       0207f187                vle.v   v3,(a5)
   10524:       02221157                vfadd.vv        v2,v2,v4
   10528:       021190d7                vfadd.vv        v1,v1,v3
   1052c:       e802                    sd      zero,16(sp)
   1052e:       ec02                    sd      zero,24(sp)
   10530:       e002                    sd      zero,0(sp)
   10532:       e402                    sd      zero,8(sp)
   10534:       081c                    addi    a5,sp,16
   10536:       840a                    mv      s0,sp
   10538:       6941                    lui     s2,0x10
   1053a:       02017127                vse.v   v2,(sp)
   1053e:       0207f0a7                vse.v   v1,(a5)
   10542:       00042787                flw     fa5,0(s0) # ffffffffffffd000 <__global_pointer$+0xfffffffffffea800>
   10546:       67090513                addi    a0,s2,1648 # 10670 <__libc_csu_fini+0x4>
   1054a:       420787d3                fcvt.d.s        fa5,fa5
   1054e:       0411                    addi    s0,s0,4
   10550:       e20785d3                fmv.x.d a1,fa5
   10554:       f6dff0ef                jal     ra,104c0 <printf@plt>
   10558:       fe8495e3                bne     s1,s0,10542 <main+0x72>
   1055c:       4529                    li      a0,10
   1055e:       f53ff0ef                jal     ra,104b0 <putchar@plt>
   10562:       70e6                    ld      ra,120(sp)
   10564:       7446                    ld      s0,112(sp)
   10566:       74a6                    ld      s1,104(sp)
   10568:       7906                    ld      s2,96(sp)
   1056a:       4501                    li      a0,0
   1056c:       6109                    addi    sp



Using vcreate_:
int main() {

  float src[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};

  //vfloat32m1_t t0, t1, t2, t3, ret0, ret1;

  vfloat32m1x4_t src4;// = vcreate_f32m1x4(t0, t1, t2, t3);
  vfloat32m1x2_t dst2;// = vcreate_f32m1x2(ret0, ret1);

  src4 = vset_f32m1x4(src4, 0, vle32_v_f32m1(src, 4));
  src4 = vset_f32m1x4(src4, 1, vle32_v_f32m1(src + 4, 4));
  src4 = vset_f32m1x4(src4, 2, vle32_v_f32m1(src + 8, 4));
  src4 = vset_f32m1x4(src4, 3, vle32_v_f32m1(src + 12, 4));

  dst2 = vset_f32m1x2(dst2, 0, vfadd_vv_f32m1(vget_f32m1x4_f32m1(src4, 0), vget_f32m1x4_f32m1(src4, 1), 4));
  dst2 = vset_f32m1x2(dst2, 1, vfadd_vv_f32m1(vget_f32m1x4_f32m1(src4, 2), vget_f32m1x4_f32m1(src4, 3), 4));
  
  float dst [8] = {0};

  vse32_v_f32m1(dst, vget_f32m1x2_f32m1(dst2, 0), 4);
  vse32_v_f32m1(dst + 4, vget_f32m1x2_f32m1(dst2, 1), 4);

  for (size_t i = 0; i < 8; i++) {
      printf("%f ", dst[i]);
  }

  printf("\n");


  return 0;
}

   10508:       0087f7d7                vsetvli a5,a5,e32,m1,d1
   1050c:       1004                    addi    s1,sp,32
   1050e:       0204f087                vle.v   v1,(s1)
   10512:       c2002f73                csrr    t5,vl
   10516:       c2102ff3                csrr    t6,vtype
   1051a:       00807057                vsetvli zero,zero,e32,m1,d1
   1051e:       5e008257                vmv.v.v v4,v1
   10522:       81ff7057                vsetvl  zero,t5,t6
   10526:       32002057                vmv.x.s zero,v0
   1052a:       181c                    addi    a5,sp,48
   1052c:       0207f087                vle.v   v1,(a5)
   10530:       c2002f73                csrr    t5,vl
   10534:       c2102ff3                csrr    t6,vtype
   10538:       00807057                vsetvli zero,zero,e32,m1,d1
   1053c:       5e0082d7                vmv.v.v v5,v1
   10540:       81ff7057                vsetvl  zero,t5,t6
   10544:       32002057                vmv.x.s zero,v0
   10548:       009c                    addi    a5,sp,64
   1054a:       0207f087                vle.v   v1,(a5)
   1054e:       c2002f73                csrr    t5,vl
   10552:       c2102ff3                csrr    t6,vtype
   10556:       00807057                vsetvli zero,zero,e32,m1,d1
   1055a:       5e008357                vmv.v.v v6,v1
   1055e:       81ff7057                vsetvl  zero,t5,t6
   10562:       32002057                vmv.x.s zero,v0
   10566:       089c                    addi    a5,sp,80
   10568:       0207f087                vle.v   v1,(a5)
   1056c:       e802                    sd      zero,16(sp)
   1056e:       c2002f73                csrr    t5,vl
   10572:       c2102ff3                csrr    t6,vtype
   10576:       00807057                vsetvli zero,zero,e32,m1,d1
   1057a:       5e0083d7                vmv.v.v v7,v1
   1057e:       81ff7057                vsetvl  zero,t5,t6
   10582:       32002057                vmv.x.s zero,v0
   10586:       ec02                    sd      zero,24(sp)
   10588:       c2002f73                csrr    t5,vl
   1058c:       c2102ff3                csrr    t6,vtype
   10590:       00807057                vsetvli zero,zero,e32,m1,d1
   10594:       5e020457                vmv.v.v v8,v4
   10598:       81ff7057                vsetvl  zero,t5,t6
   1059c:       32002057                vmv.x.s zero,v0
   105a0:       c2002f73                csrr    t5,vl
   105a4:       c2102ff3                csrr    t6,vtype
   105a8:       00807057                vsetvli zero,zero,e32,m1,d1
   105ac:       5e0284d7                vmv.v.v v9,v5
   105b0:       81ff7057                vsetvl  zero,t5,t6
   105b4:       32002057                vmv.x.s zero,v0
   105b8:       c2002f73                csrr    t5,vl
   105bc:       c2102ff3                csrr    t6,vtype
   105c0:       00807057                vsetvli zero,zero,e32,m1,d1
   105c4:       5e0300d7                vmv.v.v v1,v6
   105c8:       81ff7057                vsetvl  zero,t5,t6
   105cc:       32002057                vmv.x.s zero,v0
   105d0:       c2002f73                csrr    t5,vl
   105d4:       c2102ff3                csrr    t6,vtype
   105d8:       00807057                vsetvli zero,zero,e32,m1,d1
   105dc:       5e0382d7                vmv.v.v v5,v7
   105e0:       81ff7057                vsetvl  zero,t5,t6
   105e4:       32002057                vmv.x.s zero,v0
   105e8:       081c                    addi    a5,sp,16
   105ea:       840a                    mv      s0,sp
   105ec:       6941                    lui     s2,0x10
   105ee:       02849257                vfadd.vv        v4,v8,v9
   105f2:       021290d7                vfadd.vv        v1,v1,v5
   105f6:       e002                    sd      zero,0(sp)
   105f8:       e402                    sd      zero,8(sp)
   105fa:       c2002f73                csrr    t5,vl
   105fe:       c2102ff3                csrr    t6,vtype
   10602:       00807057                vsetvli zero,zero,e32,m1,d1
   10606:       5e020157                vmv.v.v v2,v4
   1060a:       81ff7057                vsetvl  zero,t5,t6
   1060e:       32002057                vmv.x.s zero,v0
   10612:       c2002f73                csrr    t5,vl
   10616:       c2102ff3                csrr    t6,vtype
   1061a:       00807057                vsetvli zero,zero,e32,m1,d1
   1061e:       5e0081d7                vmv.v.v v3,v1
   10622:       81ff7057                vsetvl  zero,t5,t6
   10626:       32002057                vmv.x.s zero,v0
   1062a:       c2002f73                csrr    t5,vl
   1062e:       c2102ff3                csrr    t6,vtype
   10632:       00807057                vsetvli zero,zero,e32,m1,d1
   10636:       5e010257                vmv.v.v v4,v2
   1063a:       81ff7057                vsetvl  zero,t5,t6
   1063e:       32002057                vmv.x.s zero,v0
   10642:       c2002f73                csrr    t5,vl
   10646:       c2102ff3                csrr    t6,vtype
   1064a:       00807057                vsetvli zero,zero,e32,m1,d1
   1064e:       5e0180d7                vmv.v.v v1,v3
   10652:       81ff7057                vsetvl  zero,t5,t6
   10656:       32002057                vmv.x.s zero,v0
   1065a:       02017227                vse.v   v4,(sp)
   1065e:       0207f0a7                vse.v   v1,(a5)
   10662:       00042787                flw     fa5,0(s0) # ffffffffffffd000 <__global_pointer$+0xfffffffffffea800>
   10666:       79090513                addi    a0,s2,1936 # 10790 <__libc_csu_fini+0x4>
   1066a:       420787d3                fcvt.d.s        fa5,fa5
   1066e:       0411                    addi    s0,s0,4
   10670:       e20785d3                fmv.x.d a1,fa5
   10674:       e4dff0ef                jal     ra,104c0 <printf@plt>

@haolongzhangm

haolongzhangm commented Jul 31, 2022

Sometimes an array of vector types is needed for ease of coding.

So what is currently the best way to declare an RVV array like
vfloat32m1_t src4[4]
or even
vfloat32m1_t src4[4][2]

without introducing extra instructions?

My build arguments:
riscv64-unknown-linux-gnu-g++ -march=rv64gcv0p7 -mabi=lp64d

@haolongzhangm

I tested with -march=rv64gcv and did not see the issue, but with -march=rv64gcv0p7 the "extra move instructions" issue appears.

However, many boards only support v0p7 at present, so is there any possible fix for this issue on v0p7?

@eopXD
Collaborator

eopXD commented Aug 2, 2022

Closing this issue and redirecting to #139. The question essentially boils down to how we enable sizeless structs in the compiler implementation.

@eopXD eopXD closed this as completed Aug 2, 2022
@kito-cheng
Collaborator Author

> I tested with -march=rv64gcv and did not see the issue, but with -march=rv64gcv0p7 the "extra move instructions" issue appears.
>
> However, many boards only support v0p7 at present, so is there any possible fix for this issue on v0p7?

That sounds like a T-Head toolchain issue rather than an intrinsic interface issue, so I would suggest reporting it to T-Head directly.
