-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
4mh failures with icc #51
Comments
Does this only appear with Intel 16.0.1 compilers, i.e. not e.g. Intel 15.1 compilers? I'm trying to figure out if this is a real compiler bug or something else. |
I'm going to try 14, 15, and different flags today. On March 26, 2016 12:53:36 AM CDT, Jeff Hammond notifications@github.com wrote:
|
It happens with 14.0.3 and 15.0.3 too, with either -xAVX or -mavx. What does fix it is changing -O3 to -O0 or -O1 (-O2 is still broken). Note that you can reproduce this directly from devinamatthews/blis@e6e5664 by configuring with |
Can we manually mix flags to narrow down where the bad codegen happens? Is micro kernel assembly or intrinsics? |
Changing flags for ukernel vs. everything else is easy, just tell me what to try. Sandybridge kernels are asm (as are dunnington I think). @fgvanzee would know better what specific code for 4mh might be the problem. |
Although this makes me wonder, does anybody know if cgemm4mh et al. pass on KNC? |
Just get the matrix of {O0,O1,O2,O3}^2 or a 2x2 subset thereof for ukernel and non-ukernel. That should tell us what we need to know. |
Looks like -O2 and above in the framework, regardless of optimization in the ukernel triggers it. I suspect the 4mh packing code, but I do not know how to isolate that. @fgvanzee? |
I assume we can manually recompile specific object code with different This situation motivates the creation of unit tests associated with It is surprising that three major releases of Intel compilers have the same I'll work on this next week. |
clang scan-build comes up empty, but cranking up the warning with icc and clang shows lots of shadowing of |
@devinamatthews, @jeffhammond: Thanks for your efforts in looking into this. Currently, the default induced method for complex gemm is 4m1, so thankfully there is little risk in this issue biting anyone. (You would have to be calling one of the 4mh interfaces explicitly, or change the default at runtime.) And of course, this only applies when a real gemm micro-kernel is optimized but its complex gemm equivalent is not optimized. But still, I appreciate the idea of getting to the bottom of this even if it is not in anyone's use path. @jeffhammond: You raise a good point: it would be good to have some test functionality for packm in the test suite. However, many of these induced methods have their own packm kernels, some of which would be awkward to test. I'd have to give it some thought. "Native" packm would be much easier, but of course that one is not giving us trouble. @devinamatthews: I may come visit you in-person later this week so we can talk more about the shadowing. I haven't looked into it, but I assume it's an easy fix to rule that out. |
My motivation was mostly to get this figured out before putting in PR #54, but since 4mh is not the default (nor is icc for that matter), I went ahead. Should get fixed at some point, though. |
@devinamatthews Perhaps when you get a chance, you can verify that these test failures still manifest with icc post-"big commit". It's low very priority, though. |
The failures have disappeared as mysteriously as they appeared. Closing. |
Spooky. There are very few bugs I've never gotten to the bottom of. This is one I am happy to let go. |
|
KNL is dead, so closing. |
@devinamatthews Not dead, just a branch that was merged into the main branch 😜 |
The following operations fail with the dunnington and sandybridge configurations (and probably haswell too, but not tested) using icc 16.0.1 and compiling with CVECOPTS = '-xSSE4.2' and '-xAVX' respectively:
cgemm4mh
chemm4mh
csymm4mh
csyrk4mh
csyr2k4mh
ctrmm34mh
The text was updated successfully, but these errors were encountered: