
how to solve cross compile EMLL problem below #9

Closed
xuhaoguang opened this issue Oct 9, 2021 · 18 comments

@xuhaoguang

```
/EMLL/src/arm_neon/ARMCompareAndSwap.c:1:0: error: invalid feature modifier in '-march=armv8.2-a+dotprod+fp16'
 /*****************************************************************************/
CMakeFiles/eml-armneon.dir/build.make:62: recipe for target 'CMakeFiles/eml-armneon.dir/src/arm_neon/ARMCompareAndSwap.c.o' failed
make[2]: *** [CMakeFiles/eml-armneon.dir/src/arm_neon/ARMCompareAndSwap.c.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/eml-armneon.dir/all' failed
make[1]: *** [CMakeFiles/eml-armneon.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
```

@netease-youdao
Owner

It seems that your toolchain doesn't support the ARMv8.2-A architecture.

You can try with this EMLL source package (fp16 GEMM and sdot/udot disabled):
EMLL.tar.gz

@xuhaoguang
Author

Will the performance of EMLL with fp16 GEMM and sdot/udot disabled be much worse than with them enabled?

@netease-youdao
Owner

It depends on the CPU type. For aarch64 processors supporting the armv8.2-a dot-product extension, like Cortex-A55/A75/A76/A77/A78, you may see performance degradation in (u)int8 -> (u)int32 GEMM tasks, but the speed of fp32 GEMM will not be affected. For other processors (Cortex-A53/A35/A72) there's no difference.

@xuhaoguang
Author

Thanks very much. I will compare the performance of EMLL and OpenBLAS on my device, and I'll consult you if other problems come up.

@xuhaoguang
Author

How can I make C row-major in sgemm(A, B, C), without a manual conversion after calling sgemm?

@netease-youdao
Owner

If C is row-major, calling sgemm(!b_rowmajor, !a_rowmajor, B, A, C, N, M, K, beta, num_threads) will do the job.

@xuhaoguang
Author

EMLL's sgemm doesn't support CblasTrans for A/B, so do we need to transpose manually before calling the sgemm function?

@netease-youdao
Owner

Let C[MxN] = A[MxK] B[KxN]. Here is a summary of how to call sgemm for all combinations of matrix orders (NO NEED FOR additional transposition work):

| A | B | C | how to call |
| --- | --- | --- | --- |
| row major | row major | row major | `sgemm(0, 0, B, A, C, N, M, K, beta, num_threads)` |
| row major | row major | column major | `sgemm(1, 1, A, B, C, M, N, K, beta, num_threads)` |
| column major | row major | row major | `sgemm(0, 1, B, A, C, N, M, K, beta, num_threads)` |
| column major | row major | column major | `sgemm(0, 1, A, B, C, M, N, K, beta, num_threads)` |
| row major | column major | row major | `sgemm(1, 0, B, A, C, N, M, K, beta, num_threads)` |
| row major | column major | column major | `sgemm(1, 0, A, B, C, M, N, K, beta, num_threads)` |
| column major | column major | row major | `sgemm(1, 1, B, A, C, N, M, K, beta, num_threads)` |
| column major | column major | column major | `sgemm(0, 0, A, B, C, M, N, K, beta, num_threads)` |

@xuhaoguang
Author

Thanks. I mean that EMLL's sgemm has no "CblasTrans" parameter for matrix B like the OpenBLAS sgemm call below, only the row-major/column-major flags:

```c
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);
```

@netease-youdao
Owner

The storage orders of matrices A-C can be determined from the input parameters to cblas_sgemm:

| layout | transa | transb | order of A | order of B | order of C |
| --- | --- | --- | --- | --- | --- |
| CblasColMajor | CblasNoTrans | CblasNoTrans | column major | column major | column major |
| CblasColMajor | CblasTrans | CblasNoTrans | row major | column major | column major |
| CblasColMajor | CblasNoTrans | CblasTrans | column major | row major | column major |
| CblasColMajor | CblasTrans | CblasTrans | row major | row major | column major |
| CblasRowMajor | CblasNoTrans | CblasNoTrans | row major | row major | row major |
| CblasRowMajor | CblasTrans | CblasNoTrans | column major | row major | row major |
| CblasRowMajor | CblasNoTrans | CblasTrans | row major | column major | row major |
| CblasRowMajor | CblasTrans | CblasTrans | column major | column major | row major |

@netease-youdao
Owner

Please note that EMLL doesn't support padding currently, which means
(1) lda must be K for row-major A, or M for column-major A;
(2) ldb must be N for row-major B, or K for column-major B;
(3) ldc must be N for row-major C, or M for column-major C.

And currently EMLL doesn't support alpha != 1.

@xuhaoguang
Author

xuhaoguang commented Oct 12, 2021

Let C[MxN] = A[MxK] B[KxN], with A/B/C all row-major.
If I call sgemm(0, 0, A_f, B_f, C_f, M, N, K, 0, 3), my program runs normally but the result is incorrect.
But if I call sgemm(0, 0, B_f, A_f, C_f, N, M, K, 0, 3), my program crashes with a coredump; it looks like it runs out of memory.

```
Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 9937]
0x0000007fb7fda8bc in do_lookup_x () from /lib/ld-linux-aarch64.so.1
(gdb) bt
#0  0x0000007fb7fda8bc in do_lookup_x () from /lib/ld-linux-aarch64.so.1
#1  0x0000007fb7fdb094 in _dl_lookup_symbol_x () from /lib/ld-linux-aarch64.so.1
#2  0x0000007fb7fde36c in _dl_fixup () from /lib/ld-linux-aarch64.so.1
#3  0x0000007fb7fe3ee4 in _dl_runtime_resolve () from /lib/ld-linux-aarch64.so.1
#4  0x0000007fa653f4d0 in sgemm._omp_fn () from ./libproject.so
#5  0x0000007fa6146ee4 in gomp_thread_start () from /lib/libgomp.so.1
#6  0x0000007fb7d81f4c in start_thread () from /lib/libpthread.so.0
#7  0x0000007fb7cee190 in thread_start () from /lib/libc.so.6
```

So I don't know why this phenomenon occurs

@netease-youdao
Owner

Please show your test code (and, if possible, the compiled executable) to help us track down the problem :)

@xuhaoguang
Author

```c
int emll_sgemm_thread_count = 0;
if (transposed_a == DU_NOTRANS && transposed_b == DU_NOTRANS) {
    // output: A[13, 384], B[384, 384], C[13, 384]
    fprintf(stderr, "2222 A[%d, %d], B[%d, %d], C[%d, %d]", a->_n, a->_m, b->_n, b->_m, c->_n, c->_m);

    // the call below runs normally, but the result is incorrect
    sgemm(0, 0, (DTYPE*)a->_data, (DTYPE*)b->_data, (DTYPE*)c->_data, a->_n, b->_m, a->_m, beta, emll_sgemm_thread_count);

    // the call below causes a coredump
    //sgemm(0, 0, (DTYPE*)b->_data, (DTYPE*)a->_data, (DTYPE*)c->_data, b->_m, a->_n, a->_m, beta, emll_sgemm_thread_count);

    fprintf(stderr, "finish sgemm multiply\n");
}
```

device cpu: https://www.allwinnertech.com/index.php?c=product&a=index&id=92

@xuhaoguang
Author

Is there a WeChat communication group for EMLL?

@xuhaoguang
Author

Here is the gdb info for the coredump (screenshot attached in the original issue).

@netease-youdao
Owner

netease-youdao commented Oct 13, 2021

This looks like a thread-local storage issue. You can try modifying the code as suggested in #8 to move the buffers from TLS to the stack, or set the environment variable OMP_STACKSIZE to increase the stack size allotted to child threads.
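For the second workaround, the environment variable only needs to be set in the shell before launching the process; the binary name below is a placeholder, not from the original issue:

```shell
# Give each OpenMP child thread a larger stack, e.g. 16 MB
# (OMP_STACKSIZE accepts B/K/M/G suffixes per the OpenMP spec):
export OMP_STACKSIZE=16M
./your_program   # placeholder for the executable that crashed
```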

@netease-youdao
Owner

netease-youdao commented Oct 13, 2021

Also, the file include/common/CommonSkinnyGer.h needs a similar modification to move its buffer from TLS to the stack:

```diff
diff --git a/include/common/CommonSkinnyGer.h b/include/common/CommonSkinnyGer.h
index b0af350..802ebf6 100644
--- a/include/common/CommonSkinnyGer.h
+++ b/include/common/CommonSkinnyGer.h
@@ -326,8 +326,6 @@ static inline void inline_##gemm##_acolmajor_bskinny_beta_##n_dim(\
   k_mask, m_mask, stack_size, atype, btype) \
 GEMM_SKINNY_GER_BETA_FUNC(gemm, n_dim)\
 GEMM_SKINNY_GER_INLINE_FUNCS(gemm, n_dim, k_mask, m_mask)\
-__attribute__((aligned(4096))) static __thread gemm##_skinnyger_cscalar\
-  gemm##_acolmajor_bskinny_a##atype##_b##btype##_##n_dim##_cscratch[stack_size];\
 GEMM_SKINNY_GER_INLINE_DEPACK_FUNC(gemm, m_mask, n_dim)\
 void gemm##_acolmajor_bskinny_a##atype##_b##btype##_n##n_dim(\
   const gemm##_skinnyger_ascalar *A,\
@@ -335,6 +333,9 @@ void gemm##_acolmajor_bskinny_a##atype##_b##btype##_n##n_dim(\
   gemm##_skinnyger_cscalar *C,\
   uint32_t M, uint32_t K, uint8_t b_c_order,\
   gemm##_skinnyger_cscalar beta_inp) {\
+\
+  __attribute__((aligned(4096))) gemm##_skinnyger_cscalar\
+    gemm##_acolmajor_bskinny_a##atype##_b##btype##_##n_dim##_cscratch[stack_size];\
 \
   const bool b_rowmajor = b_c_order & 1;\
   const bool c_rowmajor = b_c_order & 2;\
@@ -431,6 +432,8 @@ void gemm##_acolmajor_bskinny_a##atype##_b##btype##_n##n_dim##_omp(\
   omp_set_num_threads(num_threads);\
   _Pragma("omp parallel")\
   {\
+    __attribute__((aligned(4096))) gemm##_skinnyger_cscalar\
+      gemm##_acolmajor_bskinny_a##atype##_b##btype##_##n_dim##_cscratch[stack_size];\
     const gemm##_skinnyger_ascalar * const A = task_info.m_A;\
     const gemm##_skinnyger_bscalar * const B = task_info.m_B;\
     gemm##_skinnyger_cscalar * const C = task_info.m_C;\
```
