C99 complex #42
Conversation
Implementing unit testing on our codebase is not an easy task, mainly because of the many interdependencies and initialisations. For this particular part, I can imagine it would not be as complicated, as long as we can figure out a representative set of tests for the affected parts. I will attempt to figure something out, but I'm really just starting out with the unit testing stuff in C.
rho1.im=g_mu;
rho2.re=1.;
rho2.im=-g_mu;
rho1 = (1.) + cimag(rho1) * I;
Have these lines been created by an automated script? Seems a bit funny but I might be missing context.
Yes, I've used an automated script, then polished the result. I cleaned up a fair number of this type of assignment, but will have missed more than just this. Assignments like this look weird, but I believe they are generally semantically correct. The idea would be to clean them up when we find them.
The sample input that Carsten is writing for the solvers might be a start for unit tests that would at least partially cover this.
@@ -944,8 +944,7 @@ void compute_little_D_diagonal() {
Block_D_psi(&block_list[blk], tmp, block_list[blk].basis[i]);
for(j = 0; j < g_N_s; j++) {
M[i * g_N_s + j] = scalar_prod(block_list[blk].basis[j], tmp, block_list[blk].volume, 0);
block_list[blk].little_dirac_operator32[i*g_N_s + j].re = M[i * g_N_s + j].re;
block_list[blk].little_dirac_operator32[i*g_N_s + j].im = M[i * g_N_s + j].im;
block_list[blk].little_dirac_operator32[i*g_N_s + j] = creal(M[i * g_N_s + j]);
Same as below, actually...
Not sure what the best thing to do here is, but I suppose it makes sense that I make corrections to the code in my personal fork. I could then close the request and send out a pull request for the mended version. Or would two merges be fine?
On 23/01/12 22:31, Albert Deuzeman wrote:
Anything you commit will show up in the pull request. No need to close.
Bartek
Well, but those are not really unit tests, in the sense that they test very large chunks of code rather than the smallest possible subsets. Having that, however, would already be a huge improvement, especially if we have some "sane" results to go by.
I was wondering how we should organize reviewing this. Do you want some more time to go through it on your own, and then two or three of us read through different sections, making notes as I did yesterday?
Originally, I had in mind that we could have a first implementation that would be semantically correct at least, if not particularly nicely written. After your comments, I decided to see how much work it would be to get rid of at least most of the remaining ugliness too. It wasn't as bad as I had thought, so I think my additional commits have polished things quite a bit. At this point, I just don't know of any other points where immediate improvement is to be had, though they're probably there. So for that reason, I would really be happy for fresh eyes on the work. The alternative would perhaps be for me to let the stuff rest for a bit and then get back to rereading everything, but I would prefer to actually move on to the smearing itself. While test cases for the solvers might not themselves make for very good unit tests, they would actually be very good here. If I did things correctly, the C99 version should be completely equivalent to the previous version. Since the solvers use a good deal of the complex functionality, getting matching results would give me some confidence that things are in order generally.
I also found a few files that were missed by the script: benchmark.c I'm not sure whether these are all of them. I found them by comparing the diff of this pull request with a
and
so I might well have missed some where only imaginary parts were taken.
Thanks! It's true that the smearing directory hasn't been included yet, because I need to go over those files for different reasons anyway. For the other files, I guess they were never compiled, which is how I checked. I'll have a look this afternoon, but there may actually be some dead code in that list as well.
Sloppy that I hadn't done the grep myself, I know, but that should be all... :)
@@ -3,23 +3,24 @@
*
* This file is part of tmLQCD.
*
* tmLQCD is free software: you can redistribute it and/or modify
The script unfortunately is also attacking comment lines. Should be an easy fix, but put it on a list somewhere.
It fails to compile for me right now, complaining about multiple definitions of smearing routines.
I thought I had dealt with this, but apparently not... :( Hmm. I think I'll solve this by removing the smearing directory from the build system path temporarily. That code is in flux and currently unused, so there doesn't seem to be much point in patching it up in this state.
Compiles cleanly for me, again. Sorry about that.
Note that there was supposed to be a compiler flag to switch to C99 complex. The current implementation is essentially this one, though 'complex' is replaced by the standard _Complex double.
… using C99 in io.
Can you check for NaNs? That sort of slowdown would be indicative of NaNs, and therefore of some remaining problem. We found a few spots which created NaNs on Intel hardware too, leading to that type of slowdown as the program dealt with the NaN arithmetic.
No NaNs at all, and the result does agree with that of the old code...
Just compare this (it's interactive with llrun on the JUGENE test rack):
versus old code:
Hmm, interesting: in the commit
I replaced all creal with __creal, and I am getting the performance of the old code now. So, what about merging this pull request?!
That's great to hear. There's one last thing that springs to mind, though: given the abysmal performance of the non-builtin complex operations on XLC/BGP, it would be prudent to test the performance of cexp as well. As far as I understand, cexp is used extensively in the Cayley-Hamilton exponentiation in the smearing code, and it would be quite disappointing to find it bogged down substantially because of the lack of a "built-in" implementation.
Well, we have to write a BG version of the clover term anyhow, and cexp should be replaceable by an optimised version too, shouldn't it? What would be the alternative? I don't see any!?
there is no __cexp as far as I know (unless the xlc docu is out of date or doesn't reflect the true state of affairs) ( http://publib.boulder.ibm.com/infocenter/compbgpl/v9v111/index.jsp )
Implementing our own optimized cexp...
but before we merge in c99complex?
I was gone for the weekend, so there's a bit of catching up I have to do. Reading your comments, Bartek: do you have any sources on the problem? One could always rewrite
As a bit of a tl;dr for now... are we reasonably sure about the code? The high statistics runs worked out?
Albert: yes, I have updated my comments about the criteria with all the tests I did and where I think they are passed now. Maybe this should be moved to a wiki page...!?
AFAIK Carsten is getting good results to within machine precision for the expectation values. I don't know how the individual histories differ at this point. Albert, you mentioned above (https://gist.github.com/2049892) that you saw strongly diverging histories even with the bug fixed in expo? I have no idea how worrisome this is.
…ormance of the old code
Conflicts: bgl.h
@urbach Yes, we can move that to a wiki page. That will also make sure we have the list for future reference. @kostrzewa Well, I saw that the configurations diverge slowly, the differences creeping in over a series of trajectories. That's not necessarily a sign of anything particularly bad happening -- as I mentioned in that post, the rounding behaviour of the code is known to have been changed a little. It seems that there is a point where the two runs really decouple, but that could actually be down to a single Metropolis step resulting in something different.
Speaking of the diverging histories: there is clearly still a bug left in there. The rectangle part of the action is not computed correctly even in the first step!
Well, if we have round-off errors we have to see strongly diverging histories. Currently it takes about O(100) trajectories, which is reasonable, I think. And we get agreement within statistical uncertainties. So I am not worried. Of course, we cannot exclude an error which hides at the order of round-off. Do you expect smaller differences? Any estimate for what you would expect?
@kostrzewa Are you referring to the last column in that comparison? I believe that one refers to the timing information -- performance was a lot better with C99 for me.
Oh dear, indeed hmc0 does not have rectangles...
Where do you see this?
Because for me it is correctly computed. Don't mix it up with the running time for the trajectory.
False alarm, sorry about that. I forgot that hmc0 uses the standard Wilson gauge action (and therefore the last column is the trajectory time). Although it would be informative to actually check this.
It is checked in sample-hmc3.input, for instance. Guys, can we have a short Skype conference to make this a bit more effective!? Say at 11:30?
Euhm... I don't have the setup to do that here. I could probably arrange something, but not within twenty minutes... Is later in the afternoon an option?
Sure, I've got my headphones with me. I just don't know whether it will work this time around, as it failed miserably on the 14th... I'm okay with any time today before 16:30.
Say at 2 pm?
That should work. I'll let you know in advance if it won't.
Conversion to c99 complex implementation.
This is a large set of changes, first of all weeding out all the occurrences of "complex" and "complex32" and replacing them with their C99 counterparts. After that easy part was done, all of the ".re" and ".im" occurrences were found and replaced by "creal()" and "cimag()" where applicable, or completely rewritten in terms of proper complex algebra. All complex defines were replaced by their operator equivalents as well, to make for more readable code.
While this code compiles for me and I am not aware of any concrete errors, it is highly likely that I missed one or two problematic cases, and the meaning of the code may have subtly changed somewhere. Having unit tests available would be fantastic, but in their absence I would urge whoever reviews this to be cautious.