SSE2 and SSE3 for smearing and clover_leaf.c #37

Open
urbach opened this issue Jan 19, 2012 · 14 comments

@urbach
Contributor

urbach commented Jan 19, 2012

Compiling with --enable-sse3 or --enable-sse2 gives errors:

../../tmLQCD/smearing/stout_stout_smear.c:39: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
../../tmLQCD/smearing/stout_stout_smear.c:30: error: ‘asm’ operand has impossible constraints

../tmLQCD/clover_leaf.c:719: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
../tmLQCD/clover_leaf.c:611: error: ‘asm’ operand has impossible constraints

This is due to problems in the inline assembly implementation of the su3 (and related) macros.
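For context, GCC typically reports "can't find a register in class 'GENERAL_REGS'" when a single asm statement asks for more general-purpose registers than are free, for example when every operand uses an "r" constraint. The sketch below is not the tmLQCD macro; it is a hypothetical two-operand addition (`cplx` and `cplx_add_pair` are made-up names) that passes its operands with "m" constraints, so the compiler can address both inputs off one base pointer instead of reserving a register per operand. It falls back to plain C when SSE2 is not available.

```c
#include <assert.h>

typedef struct { double re, im; } cplx;   /* hypothetical stand-in type */

/* Computes out = v[0] + v[1]. */
void cplx_add_pair(cplx *out, const cplx v[2])
{
    assert(out != 0 && v != 0);
#if defined(__GNUC__) && defined(__SSE2__) && \
    (defined(__x86_64__) || defined(__i386__))
    /* AT&T syntax. Memory constraints ("m") let the compiler pick the
       addressing mode; the SSE registers used are listed as clobbers. */
    __asm__ ("movupd %1, %%xmm0\n\t"
             "movupd %2, %%xmm1\n\t"
             "addpd  %%xmm1, %%xmm0\n\t"
             "movupd %%xmm0, %0"
             : "=m" (*out)
             : "m" (v[0]), "m" (v[1])
             : "xmm0", "xmm1");
#else
    /* Portable fallback for compilers or targets without SSE2. */
    out->re = v[0].re + v[1].re;
    out->im = v[0].im + v[1].im;
#endif
}
```

Whether this kind of change is enough for the su3 macros depends on how many operands each macro passes and on the inter-macro register conventions, which this sketch does not model.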

We need to either rework the routines or #undef the SSE macros in those files.
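The second option can be sketched as follows. This assumes that the configure switches end up defining preprocessor symbols named SSE2 and SSE3 (as the -DSSE2 flag mentioned in this thread suggests) and that the affected code selects its implementation with #if tests on them. The `cplx` type, `CPLX_MUL` macro and helper functions are hypothetical stand-ins, not the actual su3 macros.

```c
#include <assert.h>

/* Assumption: SSE2/SSE3 are the feature macros set by the build system.
   Undefining them near the top of a translation unit makes every later
   #if defined(SSE2) test in this file select the portable branch. */
#undef SSE2
#undef SSE3

typedef struct { double re, im; } cplx;   /* hypothetical stand-in type */

#if defined(SSE2) || defined(SSE3)
/* In real code an inline-assembly macro would be defined here. */
#  define CPLX_MUL(r, a, b) sse_cplx_mul(&(r), &(a), &(b))
#else
/* Portable fallback, selected because the macros were undefined above. */
#  define CPLX_MUL(r, a, b)                              \
    do {                                                 \
        (r).re = (a).re * (b).re - (a).im * (b).im;      \
        (r).im = (a).re * (b).im + (a).im * (b).re;      \
    } while (0)
#endif

/* Returns 1 if this file was compiled with the SSE path, 0 otherwise. */
int uses_sse_path(void)
{
#if defined(SSE2) || defined(SSE3)
    return 1;
#else
    return 0;
#endif
}

/* Complex product a * b using whichever CPLX_MUL was selected. */
cplx cplx_mul(cplx a, cplx b)
{
    cplx r;
    CPLX_MUL(r, a, b);
    return r;
}
```

The #undef lines only affect the file they appear in, so the Dirac operator can keep its hand-written SSE code while the smearing and clover routines fall back to plain C.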

@deuzeman

I'm actually running into issues with the SSE macros for the change to C99 complex as well and am currently having a look at them.

@urbach
Contributor Author

urbach commented Jan 20, 2012

I am not sure it's not a compiler problem. However, the SSE macros were explicitly written for the Dirac operator, and there are inter-macro dependencies which might lead to the observed error messages if the macros are used elsewhere.

@kostrzewa
Member

While we're discussing the SSE macros: I've noticed that with icc we disable the SSE2 and SSE3 defines. (I guess because icc automatically attempts to use as much sse as possible and we would conflict.) So if one forces their compilation when using -DSSE2, the code segfaults (although it compiles without errors). I see that the code compiled with ICC is faster despite this, but do we know that this is optimal? I would be interested in exploring this at some point.

@urbach
Contributor Author

urbach commented Jan 20, 2012

You are saying icc without -DSSE2 is faster than gcc with -DSSE2? What about gcc with -DSSE3?

@kostrzewa
Member

I wouldn't trust the ICC-with-SSE numbers, though, because the HMC produces loads of NaNs. I think there's a good reason the SSE flags are turned off in configure.in when using icc!

@deuzeman

There might be an issue with the syntax there. We're using AT&T ordering, if I'm not mistaken, while icc might be expecting Intel ordering. That would exchange source and destination and explain the appearance of NaNs. In that case, it might be just a matter of setting compiler flags.
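To illustrate why operand order matters, here is a minimal sketch (the function name `sub_att` is made up for this example). In AT&T syntax the destination comes last, so `subsd src, dst` computes dst = dst - src; the same text read in Intel order would instead modify the other register. Subtraction is used because swapping the operands changes the result. Whether icc actually reads the project's macros in Intel order is not confirmed here; with GCC the dialect can be switched with the -masm=intel option.

```c
#include <assert.h>

/* Returns a - b. On x86 with SSE2 this uses one AT&T-syntax instruction. */
double sub_att(double a, double b)
{
#if defined(__GNUC__) && defined(__SSE2__) && \
    (defined(__x86_64__) || defined(__i386__))
    /* AT&T order: "subsd src, dst" means dst = dst - src.
       %0 is a (read and written), %1 is b (read only). */
    __asm__ ("subsd %1, %0" : "+x" (a) : "x" (b));
    return a;
#else
    /* Portable fallback for other compilers or targets. */
    return a - b;
#endif
}
```

If an assembler interpreted the same template in Intel order, the instruction would write to the register holding b and leave a unchanged, which is the kind of silent source/destination swap described above.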

Still, it is interesting that icc appears to do a better job than our manual code -- I suppose that might well be progress in compiler design. It would be good to see what happens for the XLC compiler, actually.

@kostrzewa
Member

I see, the code also segfaulted on the p4 machine when the icc code was (force-)compiled with -DSSE2.

I reproduced the benchmark results using the hmc (sample-hmc4.input) and the values are consistent. Gcc without SSE is faster than gcc with SSE3, icc with SSE2/3 doesn't get anywhere because of NaNs, and icc without SSE is still the fastest on average even though it doesn't use the hand-written code.

@urbach
Contributor Author

urbach commented Sep 1, 2012

Actually, I don't really understand this, because I see a significant difference between gcc with SSE3 and gcc without. With SSE3 it is a factor of 2 faster. So maybe I need to switch on some compiler flag?

@kostrzewa
Member

I think we should remove most of the comments on this issue because I cannot reproduce these results at all anymore. As you say, there's almost a factor of 2 difference between gcc/sse and gcc/nosse.

Having said that, it would still be interesting to see how fast the code with SSE would become if compiled with ICC.

@deuzeman

Does anyone still see an issue here? I don't...

@urbach
Contributor Author

urbach commented Feb 14, 2013

Yes, I have undefined SSE2 and SSE3 at the beginning of the corresponding files. So it's not a bug anymore. But we could optimise the code there by using SSE2 and SSE3.

so, should we close this issue?

@kostrzewa
Member

compiling with --enable-sse3|2 gives errors
../../tmLQCD/smearing/stout_stout_smear.c:39: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
../../tmLQCD/smearing/stout_stout_smear.c:30: error: ‘asm’ operand has impossible constraints
../tmLQCD/clover_leaf.c:719: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
../tmLQCD/clover_leaf.c:611: error: ‘asm’ operand has impossible constraints

was this with GCC?

@urbach
Contributor Author

urbach commented Feb 14, 2013

ah, sorry... yes, this was with gcc, of course.

@deuzeman

I added a missing ALIGN to the declaration of a temporary variable in one of the smearing routines a few days ago. That fixed a segfault for me when compiling with SSE enabled using gcc. Perhaps this was a related issue?
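As background on why a missing alignment attribute can segfault: aligned SSE loads and stores (such as movapd, or the `_mm_store_pd` intrinsic) require a 16-byte-aligned address and fault otherwise. The sketch below assumes that ALIGN expands to something like GCC's `aligned(16)` attribute; `ALIGN16`, `cplx`, `is_aligned16` and `cplx_scale` are hypothetical names used only for illustration, and intrinsics stand in for the project's inline assembly.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical stand-in for the project's ALIGN macro (GCC/clang syntax). */
#define ALIGN16 __attribute__((aligned(16)))

typedef struct { double re, im; } cplx;   /* hypothetical stand-in type */

/* Returns 1 if p is 16-byte aligned, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Returns (*in) * s, going through an aligned temporary. */
cplx cplx_scale(const cplx *in, double s)
{
    /* Without the attribute, a struct of two doubles is only guaranteed
       8-byte alignment, so the aligned store below could fault. */
    cplx tmp ALIGN16;
#if defined(__SSE2__)
    assert(is_aligned16(&tmp));
    __m128d v = _mm_loadu_pd(&in->re);   /* unaligned load: any address */
    v = _mm_mul_pd(v, _mm_set1_pd(s));
    _mm_store_pd(&tmp.re, v);            /* aligned store: needs 16 bytes */
#else
    /* Portable fallback for targets without SSE2. */
    tmp.re = in->re * s;
    tmp.im = in->im * s;
#endif
    return tmp;
}
```

A temporary that happens to land on a 16-byte boundary will work by accident, which may be why such a bug shows up only with some compilers or optimisation levels.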


3 participants