Compilation failed with g++ 6.3.0 #4

Hi,
the HPTT compilation failed on my machine (6700K, Ubuntu 17.04, g++ 6.3.0) with the following message:

/usr/lib/gcc/x86_64-linux-gnu/6/include/avxintrin.h:994:1: error: inlining failed in call to always_inline ‘void hptt::_mm256_stream_ps(float*, hptt::__m256)’: target specific option mismatch
 _mm256_stream_ps (float *__P, __m256 __A)
 ^~~~~~~~~~~~~~~~

The compilation is OK with Intel icpc 2018.
I am also getting this error with g++ 5.4.0.
Could it be that you are not compiling on an AVX-enabled CPU? I just changed the Makefile to always use -mavx irrespective of the underlying architecture. Does this solve your issue?
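(For context, this is the usual shape of the problem: GCC only defines __AVX__ and accepts AVX intrinsics when the target supports AVX, e.g. via -mavx. A minimal illustration, not HPTT's actual code:)

```cpp
#include <immintrin.h>

// Minimal illustration (not HPTT's code): compiled without -mavx, GCC rejects
// the AVX intrinsic with the "target specific option mismatch" error seen
// above; compiled with -mavx (or a matching -march), it builds fine.
#ifdef __AVX__
void zero8(float* dst) {
    _mm256_storeu_ps(dst, _mm256_setzero_ps());  // AVX code path
}
#else
void zero8(float* dst) {
    for (int i = 0; i < 8; ++i) dst[i] = 0.0f;   // scalar fallback
}
#endif
```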
@springer13 it builds, but I get the following error when trying to use it within the CTF test_suite (is there a native test I can run?); at first sight it looks possibly related to AVX.
Have you tried the scalar version? Is that working? You can build the benchmark (see Readme.md for instructions). Alternatively, you could build the test framework. Please let me know if this gives you any problems.
Hi, my CPU is a Skylake (6700K).
Hi again,
Thanks for spotting this. I just pushed a fix. I am now using the unaligned loads and stores. As long as the data is aligned to 32-byte boundaries there is no performance penalty; however, these instructions also work in the non-aligned case.
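For readers following along, the fix swaps the aligned AVX load/store intrinsics for their unaligned counterparts; a minimal sketch (not HPTT's actual kernel):

```cpp
#include <immintrin.h>

// Sketch only (not HPTT's kernel): copy 8 floats with AVX.
void copy8(const float* src, float* dst) {
    __m256 v = _mm256_loadu_ps(src);  // unaligned load: works for any address
    _mm256_storeu_ps(dst, v);         // unaligned store: works for any address
    // The aligned variants _mm256_load_ps / _mm256_store_ps fault if the
    // address is not 32-byte aligned, but are no faster on recent CPUs when
    // the data happens to be aligned anyway.
}
```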
OK, I am a bit surprised that there is no performance penalty... In my experience the unaligned stores and loads are slower...
There is an easy way to test this: run the benchmark once with _mm256_storeu_ps and once with _mm256_store_ps and see whether you encounter any performance penalty; usually there should be none. However, if you actually shift the array to, say, a 16-byte boundary, then it might be different. That case would require HPTT to peel off the first 16 bytes to reach a 32-byte boundary; this is not done yet.
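A small sketch of the alignment check such a peel-off step would be built on (hypothetical helper, not part of HPTT):

```cpp
#include <cstdint>

// Hypothetical helper, not part of HPTT: true if p sits on a 32-byte
// boundary, i.e. it is safe for aligned (and streaming) AVX stores.
inline bool isAligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}
```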
OK, I will check. Note that my input arrays A and B are properly 32-byte aligned.
Wow, it is fast!
I am happy to hear that :) From what I have seen, HPTT achieves close to peak performance across a wide range of tensor transpositions and sizes (see the paper). Its advantages really play out once the tensors become too large to fit into the caches. That being said, notice that 50x50x50 is too small to get reliable timings, since such a tensor (50³ single-precision elements, about 488 KiB) actually fits into the L3 cache.
Yep, I should increase the array size. |
Without the source code it is hard to tell whether this is a bug in HPTT or on your side. However, if you tell HPTT to use streaming stores via the expert interface, then HPTT will use streaming stores no matter what; you have to make sure that streaming stores are applicable. This is done to reduce the branching overhead for very small tensor transpositions, where such overhead becomes noticeable.
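The alignment requirement is the usual pitfall here: unlike the regular stores, the streaming store has no unaligned variant. A minimal illustration (not HPTT's code):

```cpp
#include <immintrin.h>

// Illustration only (not HPTT's code): _mm256_stream_ps is a non-temporal
// store that bypasses the cache. It requires a 32-byte-aligned destination;
// an unaligned address faults, and there is no unaligned "streamu" fallback.
void stream8(float* dst, __m256 v) {
    _mm256_stream_ps(dst, v);  // dst must be 32-byte aligned
}
```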
I get the segfault with the non-expert execute method...
Can you please provide the tensor transposition and size that you have used? Was beta = 0? Which compiler did you use?
```cpp
const int dim = 3;
int perm[dim];
double alpha = 1.0;
const int numThreads = 4;
auto plan = hptt::create_plan(perm, dim, ...
```
Compiler: gcc version 6.3.0 20170406 (Ubuntu 6.3.0-12ubuntu2)
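For reference, a complete call through the non-expert interface looks roughly like this (a sketch following the Readme.md; the permutation, extents, and data below are made up, and the exact signature may differ between HPTT versions):

```cpp
#include <hptt.h>
#include <vector>

int main() {
    const int dim = 3;
    int perm[dim] = {2, 0, 1};        // made-up permutation
    int size[dim] = {200, 200, 200};  // made-up tensor extents
    double alpha = 1.0, beta = 0.0;
    const int numThreads = 4;

    std::vector<double> A(200 * 200 * 200), B(A.size());

    // nullptr outer sizes => dense (unpadded) tensors
    auto plan = hptt::create_plan(perm, dim,
                                  alpha, A.data(), size, nullptr,
                                  beta,  B.data(), nullptr,
                                  hptt::ESTIMATE, numThreads);
    plan->execute();
}
```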
Changing line 262 seems to solve the problem ;) |
Oh yes, good catch. Thanks. Please pull. |
Done. Thanks!