how to solve cross compile EMLL problem below #9
Comments
It seems that your toolchain doesn't support the ARMv8.2-A architecture. You can try this EMLL source package (fp16 GEMM and sdot/udot disabled):
Will EMLL with fp16 GEMM and sdot/udot disabled perform much worse than EMLL with them enabled?
It depends on the CPU type. On aarch64 processors that support armv8.2a-dotprod, such as Cortex-A55/A75/A76/A77/A78, you may see performance degradation in (u)int8->(u)int32 GEMM tasks, but the speed of fp32 GEMM will not be affected. On other processors (Cortex-A53/A35/A72) there's no difference.
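To see which of these extensions a given toolchain and set of flags actually enables, one option is to check the ACLE feature macros at compile time. The sketch below is only an illustration: the helper names are made up for this example, and the thread does not say whether EMLL itself keys off these macros. It reports what the compiler was told to target, not what the CPU supports at run time.

```c
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with the Armv8.2-A
 * dot-product extension (sdot/udot) enabled, 0 otherwise.
 * __ARM_FEATURE_DOTPROD is an ACLE feature macro. */
static int built_with_dotprod(void) {
#if defined(__ARM_FEATURE_DOTPROD)
  return 1;
#else
  return 0;
#endif
}

/* Returns 1 if half-precision vector arithmetic is enabled, 0 otherwise.
 * __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is an ACLE feature macro. */
static int built_with_fp16_arith(void) {
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: prints a short report of the build target. */
static void print_build_features(void) {
#if defined(__aarch64__)
  printf("target: aarch64\n");
#else
  printf("target: not aarch64\n");
#endif
  printf("dotprod (sdot/udot): %s\n", built_with_dotprod() ? "yes" : "no");
  printf("fp16 vector arithmetic: %s\n", built_with_fp16_arith() ? "yes" : "no");
}
```

Compiling this with and without `-march=armv8.2-a+dotprod+fp16` on a toolchain that accepts the flag should flip both answers; a toolchain that rejects the flag fails earlier, as in the build log of this issue.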
Thanks very much. I will compare the performance of EMLL and OpenBLAS on my device, and I'll ask again if other problems come up.
How can I get a row-major C from sgemm(A, B, C) without converting it manually after the call?
If C is row-major, calling sgemm(!b_rowmajor, !a_rowmajor, B, A, C, N, M, K, beta, num_threads) will do the job.
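The swap works because a row-major C[MxN] occupies the same memory as a column-major C^T[NxM], and C^T = B^T A^T. Here is a minimal sketch that checks this with a naive reference GEMM. `ref_sgemm` is a stand-in written for this example, not EMLL code: it only mimics the argument order quoted above (without `num_threads`) and assumes, as the question implies, that the output is column-major by default.

```c
#include <stddef.h>

/* Stand-in for illustration only (NOT EMLL's implementation).
 * Computes column-major C[MxN] = A[MxK] * B[KxN] + beta * C.
 * a_rowmajor / b_rowmajor select the storage order of A and B. */
static void ref_sgemm(int a_rowmajor, int b_rowmajor,
                      const float *A, const float *B, float *C,
                      int M, int N, int K, float beta) {
  for (int n = 0; n < N; ++n) {
    for (int m = 0; m < M; ++m) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        float a = a_rowmajor ? A[(size_t)m * K + k] : A[(size_t)k * M + m];
        float b = b_rowmajor ? B[(size_t)k * N + n] : B[(size_t)n * K + k];
        acc += a * b;
      }
      float *c = &C[(size_t)n * M + m]; /* column-major output */
      /* With beta == 0 the old value of C is ignored entirely. */
      *c = (beta == 0.0f) ? acc : acc + beta * (*c);
    }
  }
}

/* Hypothetical wrapper: produces a row-major C[MxN] by applying the swap
 * from the comment above: C^T[NxM] = B^T[NxK] * A^T[KxM]. */
static void sgemm_rowmajor_c(int a_rowmajor, int b_rowmajor,
                             const float *A, const float *B, float *C,
                             int M, int N, int K, float beta) {
  ref_sgemm(!b_rowmajor, !a_rowmajor, B, A, C, N, M, K, beta);
}
```

No data is moved or transposed: only the operand order, the order flags, and M/N are exchanged.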
EMLL sgemm doesn't support CblasTrans for A/B. Do we need to transpose manually before calling sgemm?
Let C[MxN] = A[MxK] B[KxN]. Here is a summary of how to call sgemm for all combinations of matrix orders (no additional transposition is needed):
Thanks. What I mean is that EMLL sgemm has no "CblasTrans" parameter for matrix B, unlike the OpenBLAS sgemm function below. I am asking about transposition, not about row-major versus column-major order.
The orders of matrices A, B and C can be determined from the input parameters to cblas_sgemm:
Please note that EMLL doesn't support padding currently, which means ... Also, EMLL currently doesn't support alpha != 1.
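As a rough sketch of that mapping: in CBLAS, a transposed operand under one storage order has the same memory layout as the untransposed operand under the other order, so the effective order of A (as an MxK matrix) and B (as a KxN matrix) is the XOR of "order is row-major" and "operand is transposed". The struct and function names below are hypothetical, and reading "no padding" as "leading dimensions must be tight" is my assumption, not something stated in this thread. The enum values match the standard CBLAS ones and are redefined so the example builds without cblas.h.

```c
/* Same numeric values as CblasRowMajor/CblasColMajor and
 * CblasNoTrans/CblasTrans in the standard CBLAS header. */
enum { ROW_MAJOR = 101, COL_MAJOR = 102 };
enum { NO_TRANS = 111, TRANS = 112 };

/* Hypothetical result type for this sketch. */
typedef struct {
  int a_rowmajor; /* op(A), viewed as MxK, is stored row-major */
  int b_rowmajor; /* op(B), viewed as KxN, is stored row-major */
  int c_rowmajor; /* C, viewed as MxN, is stored row-major */
  int supported;  /* alpha == 1 and no padding (assumed: tight ld) */
} gemm_layout;

static gemm_layout layout_from_cblas(int order, int transA, int transB,
                                     int M, int N, int K, float alpha,
                                     int lda, int ldb, int ldc) {
  gemm_layout out;
  int row = (order == ROW_MAJOR);

  /* Transposing an operand flips its effective storage order. */
  out.a_rowmajor = row != (transA == TRANS);
  out.b_rowmajor = row != (transB == TRANS);
  out.c_rowmajor = row;

  /* Assumption: "no padding" means each leading dimension equals the
   * length of the contiguous dimension of the corresponding matrix. */
  int tight = (lda == (out.a_rowmajor ? K : M)) &&
              (ldb == (out.b_rowmajor ? N : K)) &&
              (ldc == (out.c_rowmajor ? N : M));

  out.supported = (alpha == 1.0f) && tight;
  return out;
}
```

When `c_rowmajor` is set, the operand swap suggested earlier in this thread (exchange A and B, negate both order flags, exchange M and N) would then be applied before the call.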
Let C[MxN] = A[MxK] B[KxN], and A/B/C are row-major,
So I don't know why this phenomenon occurs.
Please show your test code (and maybe the compiled executable) to help us solve the problem :)
Device CPU: https://www.allwinnertech.com/index.php?c=product&a=index&id=92
Is there a WeChat group for EMLL users?
This looks like a thread-local storage issue. You can try modifying the code as suggested in #8 to move buffers from TLS to the stack, or set the environment variable OMP_STACKSIZE to increase the stack size of child threads.
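For readers unfamiliar with the distinction, here is a generic illustration of what "moving a buffer from TLS to the stack" means. This is not the patch from #8 and not EMLL code; the function names and the buffer size are invented for the example.

```c
#include <stddef.h>

#define SCRATCH_LEN 1024 /* hypothetical scratch size, in floats */

/* Variant 1: the scratch buffer lives in thread-local storage, so every
 * thread gets its own copy allocated by the runtime/loader. */
static _Thread_local float tls_scratch[SCRATCH_LEN];

static float scaled_sum_tls(const float *x, size_t n, float s) {
  if (n > SCRATCH_LEN) n = SCRATCH_LEN; /* clamp to the buffer size */
  for (size_t i = 0; i < n; ++i) tls_scratch[i] = x[i] * s;
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) sum += tls_scratch[i];
  return sum;
}

/* Variant 2: the same work with the scratch buffer on the stack of the
 * calling thread. No TLS is involved, but each call now needs
 * SCRATCH_LEN * sizeof(float) bytes of stack. */
static float scaled_sum_stack(const float *x, size_t n, float s) {
  float scratch[SCRATCH_LEN];
  if (n > SCRATCH_LEN) n = SCRATCH_LEN;
  for (size_t i = 0; i < n; ++i) scratch[i] = x[i] * s;
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) sum += scratch[i];
  return sum;
}
```

Both variants compute the same result. The stack variant consumes stack space in whichever thread calls it, which is why the stack size of OpenMP worker threads (controlled by OMP_STACKSIZE) becomes relevant once large buffers live on the stack.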
Also, the file include/common/CommonSkinnyGer.h needs to be modified to move its buffer from TLS to the stack:
/EMLL/src/arm_neon/ARMCompareAndSwap.c:1:0: error: invalid feature modifier in '-march=armv8.2-a+dotprod+fp16'
/*****************************************************************************/
CMakeFiles/eml-armneon.dir/build.make:62: recipe for target 'CMakeFiles/eml-armneon.dir/src/arm_neon/ARMCompareAndSwap.c.o' failed
make[2]: *** [CMakeFiles/eml-armneon.dir/src/arm_neon/ARMCompareAndSwap.c.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/eml-armneon.dir/all' failed
make[1]: *** [CMakeFiles/eml-armneon.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2