New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add an AVX1 / Sandy Bridge DGEMM kernel #1
Comments
woops, for some reason thought there wasnt an avx kernel |
No worries! We also struggle sometimes with which configurations and kernels we have available. |
gotcha. I may have a go at writing some toy kernels to compare against the ones BLIS has already none the less some time. |
Closed
Closed
loveshack
pushed a commit
to loveshack/blis
that referenced
this issue
Sep 24, 2019
This needs fixing properly somehow, but using -O3 (at least with gcc 8.3), we get this: Program received signal SIGILL, Illegal instruction. 0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0) at ref_kernels/bli_cntx_ref.c:456 456 for ( i = 0; i < BLIS_NUM_LEVEL3_OPS; ++i ) vfuncs[ i ] = NULL; (gdb) bt #0 0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0) at ref_kernels/bli_cntx_ref.c:456 flame#1 0x000000001004c0a8 in bli_cntx_init_power9 (cntx=<optimized out>) at config/power9/bli_cntx_init_power9.c:42 flame#2 0x000000001003c85c in bli_gks_register_cntx (id=BLIS_ARCH_POWER9, nat_fp=0x1004c090 <bli_cntx_init_power9>, ref_fp=0x1004c0d0 <bli_cntx_init_power9_ref>, ind_fp=<optimized out>) at frame/base/bli_gks.c:373 flame#3 0x000000001003c97c in bli_gks_init () at frame/base/bli_gks.c:155 flame#4 0x000000001003cfe8 in bli_init_apis () at frame/base/bli_init.c:78 flame#5 0x00007ffff7e045a8 in __pthread_once_slow () from /lib64/libpthread.so.0 flame#6 0x00000000100492e8 in bli_pthread_once (once=<optimized out>, init=<optimized out>) at frame/thread/bli_pthread.c:314 flame#7 0x000000001003d138 in bli_init_once () at frame/base/bli_init.c:104 flame#8 bli_init_auto () at frame/base/bli_init.c:54 flame#9 0x0000000010011300 in cdotc_ (n=<optimized out>, x=<optimized out>, incx=<optimized out>, y=<optimized out>, incy=<optimized out>) at frame/compat/bla_dot.c:89 flame#10 0x0000000010002a48 in check2_ (sfac=0x103d14dc <sfac>) at blastest/src/cblat1.c:529 flame#11 0x0000000010001ef4 in main () at blastest/src/cblat1.c:112
xrq-phys
referenced
this issue
in xrq-phys/blis
May 29, 2021
- x7, x8: Used to store address for Alpha and Beta. As Alpha & Beta was not used in k-loops, use x0, x1 to load Alpha & Beta's addresses after k-loops are completed, since A & B's addresses are no longer needed there. This "ldr [addr]; -> ldr val, [addr]" would not cause much performance drawback since it is done outside k-loops and there are plenty of instructions between Alpha & Beta's loading and usage. - x9: Used to store cs_c. x9 is multiplied by 8 into x10 and not used any longer. Directly loading cs_c and into x10 and scale by 8 spares x9 straightforwardly. - x11, x12: Not used at all. Simply remove from clobber list. - x13: Alike x9, loaded and scaled by 8 into x14, except that x13 is also used in a conditional branch so that "cmp x13, #1" needs to be modified into "cmp x14, flame#8" to completely free x13. - x3, x4: Used to store next_a & next_b. Untouched in k-loops. Load these addresses into x0 and x1 after Alpha & Beta are both loaded, since then neigher address of A/B nor address of Alpha/Beta is needed.
niyas-sait
pushed a commit
to niyas-sait/blis
that referenced
this issue
Feb 25, 2022
Reimplement setup, with jsonl of make commands
Aaron-Hutchinson
referenced
this issue
in sifive/sifive-blis
Apr 4, 2023
leekillough
pushed a commit
to leekillough/blis
that referenced
this issue
Apr 12, 2023
Added whitespace and other formatting fixes
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I may have a go exploring that if i find some time, but if someone else beats me to it, that'd be great too!
The text was updated successfully, but these errors were encountered: