Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4mh failures with icc #51

Closed
devinamatthews opened this issue Mar 26, 2016 · 18 comments
Closed

4mh failures with icc #51

devinamatthews opened this issue Mar 26, 2016 · 18 comments

Comments

@devinamatthews
Copy link
Member

The following operations fail with the dunnington and sandybridge configurations (and probably haswell too, but not tested) using icc 16.0.1 and compiling with CVECOPTS = '-xSSE4.2' and '-xAVX' respectively:

cgemm4mh
chemm4mh
csymm4mh
csyrk4mh
csyr2k4mh
ctrmm34mh

@jeffhammond
Copy link
Member

Does this only appear with Intel 16.0.1 compilers, i.e. not e.g. Intel 15.1 compilers? I'm trying to figure out if this is a real compiler bug or something else.

@devinamatthews
Copy link
Member Author

I'm going to try 14, 15, and different flags today.

On March 26, 2016 12:53:36 AM CDT, Jeff Hammond notifications@github.com wrote:

Does this only appear with Intel 16.0.1 compilers? I'm trying to
figure out if this is a real compiler bug or something else.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
#51 (comment)

@devinamatthews
Copy link
Member Author

It happens with 14.0.3 and 15.0.3 too, with either -xAVX or -mavx. What does fix it is changing -O3 to -O0 or -O1 (-O2 is still broken). Note that you can reproduce this directly from devinamatthews/blis@e6e5664 by configuring with ./configure CC=icc sandybridge.

@jeffhammond
Copy link
Member

Can we manually mix flags to narrow down where the bad codegen happens? Is micro kernel assembly or intrinsics?

@devinamatthews
Copy link
Member Author

Changing flags for ukernel vs. everything else is easy, just tell me what to try. Sandybridge kernels are asm (as are dunnington I think). @fgvanzee would know better what specific code for 4mh might be the problem.

@devinamatthews
Copy link
Member Author

Although this makes me wonder, does anybody know if cgemm4mh et al. pass on KNC?

@jeffhammond
Copy link
Member

Just get the matrix of {O0,O1,O2,O3}^2 or a 2x2 subset thereof for ukernel and non-ukernel. That should tell us what we need to know.

@devinamatthews
Copy link
Member Author

Looks like -O2 and above in the framework, regardless of optimization in the ukernel triggers it. I suspect the 4mh packing code, but I do not know how to isolate that. @fgvanzee?

@jeffhammond
Copy link
Member

I assume we can manually recompile specific object code with different
flags.

This situation motivates the creation of unit tests associated with
pack/unpack routines.

It is surprising that three major releases of Intel compilers have the same
issue. Are we sure that the code is strict ISO C?

I'll work on this next week.

@devinamatthews
Copy link
Member Author

clang scan-build comes up empty, but cranking up the warning with icc and clang shows lots of shadowing of i and j in packm_struc_cxk_*. Mostly looks harmless but icc could theoretically be using the wrong variable somewhere.

@fgvanzee
Copy link
Member

@devinamatthews, @jeffhammond: Thanks for your efforts in looking into this. Currently, the default induced method for complex gemm is 4m1, so thankfully there is little risk in this issue biting anyone. (You would have to be calling one of the 4mh interfaces explicitly, or change the default at runtime.) And of course, this only applies when a real gemm micro-kernel is optimized but its complex gemm equivalent is not optimized. But still, I appreciate the idea of getting to the bottom of this even if it is not in anyone's use path.

@jeffhammond: You raise a good point: it would be good to have some test functionality for packm in the test suite. However, many of these induced methods have their own packm kernels, some of which would be awkward to test. I'd have to give it some thought. "Native" packm would be much easier, but of course that one is not giving us trouble.

@devinamatthews: I may come visit you in-person later this week so we can talk more about the shadowing. I haven't looked into it, but I assume it's an easy fix to rule that out.

@devinamatthews
Copy link
Member Author

My motivation was mostly to get this figured out before putting in PR #54, but since 4mh is not the default (nor is icc for that matter), I went ahead. Should get fixed at some point, though.

@fgvanzee
Copy link
Member

@devinamatthews Perhaps when you get a chance, you can verify that these test failures still manifest with icc post-"big commit". It's low very priority, though.

@devinamatthews
Copy link
Member Author

devinamatthews commented Apr 18, 2016

The failures have disappeared as mysteriously as they appeared. Closing.

@fgvanzee
Copy link
Member

Spooky. There are very few bugs I've never gotten to the bottom of. This is one I am happy to let go.

@devinamatthews
Copy link
Member Author

cher2k3m1 and cher2k4m1a are giving residuals of NaN on KNL, so the underlying cause may be resurfacing (everything else passes).

@devinamatthews
Copy link
Member Author

KNL is dead, so closing.

@jeffhammond
Copy link
Member

@devinamatthews Not dead, just a branch that was merged into the main branch 😜

myeh01 added a commit to sifive/sifive-blis that referenced this issue Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants