C99 complex #42

Merged
merged 66 commits into from
Mar 19, 2012

Conversation

deuzeman

This is a large set of changes. First of all, it weeds out all occurrences of "complex" and "complex32" and replaces them with their C99 counterparts. After that easy part was done, all ".re" and ".im" occurrences were found and replaced by "creal()" and "cimag()" where applicable, or completely rewritten in terms of proper complex algebra. All complex defines were replaced by their operator equivalents as well, to make for more readable code.

While this code compiles for me and I am not aware of any concrete errors, it is highly likely that I missed one or two problematic cases, and the meaning of the code may have subtly changed somewhere. Having unit tests available would be fantastic, but in their absence I would urge whoever reviews this to be cautious.
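
As a rough illustration of the kind of rewrite involved (this is a made-up snippet, not taken from the actual diff), the old struct-based style

complex z;
z.re = a.re * b.re - a.im * b.im;
z.im = a.re * b.im + a.im * b.re;

becomes, with the C99 type from <complex.h>,

double complex z = a * b;       /* complex multiply via the * operator */
double x = creal(z);            /* real part, instead of z.re */
double y = cimag(z);            /* imaginary part, instead of z.im */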

@kostrzewa
Member

Implementing unit testing on our codebase is not an easy task, mainly because of the many interdependencies and initialisations. For this particular part, I can imagine it would not be as complicated, as long as we can figure out a representative set of tests for the affected parts. I will attempt to figure something out, but I'm really just starting out with the unit testing stuff in C.

rho1.im=g_mu;
rho2.re=1.;
rho2.im=-g_mu;
rho1 = (1.) + cimag(rho1) * I;
Member

Have these lines been created by an automated script? Seems a bit funny but I might be missing context.

Author

Yes, I've used an automated script, then polished the result. I cleaned up a fair number of this type of assignment, but will have missed more than just this. Assignments like this look weird, but I believe they are generally semantically correct. The idea would be to clean them up when we find them.
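
For the snippet above, the fully cleaned-up form would presumably collapse the separate real and imaginary assignments into single complex expressions, assuming the real parts of both are meant to be 1. (as the converted line suggests) and g_mu is real:

rho1 = 1. + g_mu * I;
rho2 = 1. - g_mu * I;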

@deuzeman
Author

The sample input that Carsten is writing for the solvers might be a start for unit tests that would at least partially cover this.

@@ -944,8 +944,7 @@ void compute_little_D_diagonal() {
Block_D_psi(&block_list[blk], tmp, block_list[blk].basis[i]);
for(j = 0; j < g_N_s; j++) {
M[i * g_N_s + j] = scalar_prod(block_list[blk].basis[j], tmp, block_list[blk].volume, 0);
block_list[blk].little_dirac_operator32[i*g_N_s + j].re = M[i * g_N_s + j].re;
block_list[blk].little_dirac_operator32[i*g_N_s + j].im = M[i * g_N_s + j].im;
block_list[blk].little_dirac_operator32[i*g_N_s + j] = creal(M[i * g_N_s + j]);
Author

Same as below, actually...
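
For reference: if the intent here is to keep both the real and imaginary parts, the C99 version would presumably be a single complex copy (assuming little_dirac_operator32 holds single-precision complex values, so the assignment narrows implicitly):

block_list[blk].little_dirac_operator32[i*g_N_s + j] = M[i * g_N_s + j];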

@deuzeman
Author

Not sure what is the best thing to do here, but I suppose it makes sense that I make corrections to the code in my personal fork. I could then close the request and send out a pull request for the mended version. Or would two merges be fine?

@kostrzewa
Member

On 23/01/12 22:31, Albert Deuzeman wrote:

Not sure what is the best thing to do here, but I suppose it makes sense that I make corrections to the code in my personal fork. I could then close the request and send out a pull request for the mended version. Or would two merges be fine?

Anything you commit will show up in the pull request. No need to close
and reup.

Bartek

@kostrzewa
Member

The sample input that Carsten is writing for the solvers might be a start for unit tests that would at least partially cover this.

Well, but those are not really unit tests in the sense that they test very large chunks of code rather than the smallest possible subsets. Having that, however, would already be a huge improvement. Especially if we have some "sane" results to go by.

@kostrzewa
Member

I was wondering how we should organize reviewing this. Do you want some more time to go through it on your own, and should two or three of us then read through different sections, making notes as I did yesterday?

@deuzeman
Author

Originally, I had in mind that we could have a first implementation that would at least be semantically correct, if not particularly nicely written. After your comments, I decided to see how much work it would be to get rid of at least most of the remaining ugliness too. It wasn't as bad as I had thought, so I think my additional commits have polished things quite a bit.

At this point, I just don't know of any other points where immediate improvement is to be had, though they're probably there. So for that reason, I would really be happy for fresh eyes on the work. The alternative would perhaps be for me to let the stuff rest for a bit and then get back to rereading everything, but I would prefer to actually move on to the smearing itself.

While test cases for the solvers might not themselves make for very good unit tests, they would actually be very good here. If I did things correctly, the C99 version should be completely equivalent to the previous version. Since the solvers use a good deal of the complex functionality, getting matching results would give me some confidence that things are in order generally.

@kostrzewa
Member

I also found a few files that were missed by the script:

benchmark.c
check_locality.c
hopping_test.c
init_jacobi_field.c
jacobi.c
overrelaxation.c
sf_calc_action.c
xchange_field.c (this is only in a comment)
test/scalar_prod_r_test.c
smearing/utils_print_su3.c
smearing/utils_project_antiherm.c
smearing/utils_project_herm.c
smearing/utils_reunitarize.c
smearing/utils_reunitarize_MILC.c
test/check_nan.c
test/test_eigenvalues.c

I'm not sure whether these are all of them. I found them by comparing the diff of this pull request with a

grep "\.re" ./*.[c,h]

and

grep "\.re" */*.[c,h]

so I might well have missed some where only imaginary parts were taken.
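
For a more complete sweep, a recursive grep covering both member accesses might look something like this (assuming GNU grep; it will also produce some false positives from unrelated struct members):

grep -rnE '\.(re|im)' --include='*.[ch]' .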

@deuzeman
Author

Thanks! It's true that the smearing directory hasn't been included yet, because I need to go over those files for different reasons anyway. As for the other files, I guess they were never compiled; since compiling was how I checked, anything outside the build would have slipped through. I'll have a look this afternoon, but there may actually be some dead code in that list as well.

@deuzeman
Author

Sloppy that I hadn't done the grep myself, I know, but that should be all... :)

@@ -3,23 +3,24 @@
*
* This file is part of tmLQCD.
*
* tmLQCD is free software: you can redistribute it and/or modify
Contributor

The script unfortunately is also attacking comment lines. Should be an easy fix, but put it on a list somewhere.

@kostrzewa
Member

It fails to compile for me right now, complaining about multiple definitions of smearing routines.

@deuzeman
Author

I thought I had dealt with this, but apparently not... :( Hmm. I think I'll solve this by temporarily removing the smearing directory from the build system path. That code is in flux and currently unused, so there doesn't seem to be much point in patching it up in this state.

@deuzeman
Author

Compiles cleanly for me, again. Sorry about that.

@kostrzewa
Member

but on the BG the c99complex version is a factor 20 slower, even if the BG optimisation is switched on
arg...! :(

Can you check for NaNs? That sort of slowdown would be indicative of them, and therefore of some remaining problem. We found a few spots which created NaNs on Intel hardware too, leading to that type of slowdown as the program dealt with the NaN arithmetic.
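
A quick-and-dirty check over any flat array of C99 complex numbers could look something like this (just a sketch; count_nans and the field layout are made up, not existing tmLQCD code):

#include <complex.h>
#include <math.h>

/* Count NaN entries in an array of double complex values. */
static int count_nans(const double complex *field, int n)
{
  int bad = 0;
  for (int i = 0; i < n; ++i)
    if (isnan(creal(field[i])) || isnan(cimag(field[i])))
      ++bad;
  return bad;
}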

@urbach
Contributor

urbach commented Mar 18, 2012

no NaNs at all, the result does agree with that of the old code...

@urbach
Contributor

urbach commented Mar 18, 2012

just compare this (it's interactive with llrun on the JUGENE test rack):

The number of processes is 32 
The lattice size is 32 x 16 x 16 x 16
The local lattice size is 16 x 8 x 8 x 4
Communication switched on:
(4 Mflops [64 bit arithmetic])

Communication switched off: 
(28 Mflops [64 bit arithmetic])
The size of the package is 221184 bytes.
The bandwidth is  0.37 +  0.37 MB/sec

versus old code:

total time 2.065312e+00 sec, Variance of the time 6.847701e-06 sec. (itarations=4096). 

(639 Mflops [64 bit arithmetic])

communication switched off 
(684 Mflops [64 bit arithmetic])

The size of the package is 221184 Byte 
The bandwidth is 746.10 + 746.10   MB/sec

urbach closed this Mar 18, 2012
urbach reopened this Mar 18, 2012
@urbach
Contributor

urbach commented Mar 18, 2012

hmm, interesting: in commit d7a9b81089302fbd78d9b4699c97f5d2b2706fd6 I replaced, in bgl.h, all creal with __creal and all cimag with __cimag, and I am getting the performance of the old code now. So, what about merging this pull request?!
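
To make the change concrete (illustrative only; z and the variable names are placeholders, not the actual bgl.h contents), wherever bgl.h previously extracted components with the generic C99 accessors it now uses the IBM XL C built-ins:

/* generic C99 accessors (the slow path with this XL C version) */
double re = creal(z);
double im = cimag(z);

/* XL C built-ins used in bgl.h instead, restoring the old performance */
double re_b = __creal(z);
double im_b = __cimag(z);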

@kostrzewa
Member

and I am getting the performance of the old code now. So, what about merging this pull request?!

That's great to hear. There's one last thing that springs to mind though: given the abysmal performance of the non-builtin complex operations on XLC/BGP, it would be prudent to also test the performance of cexp. As far as I understand, cexp is used extensively in the Cayley-Hamilton exponentiation in the smearing code, and it would be quite disappointing to find it bogged down substantially because of the lack of a "built-in" implementation.

@urbach
Contributor

urbach commented Mar 19, 2012

well, we have to write a BG version of the clover term anyhow, and cexp should be replaceable by an optimised version too, shouldn't it?

What would be the alternative? I don't see any!?

@kostrzewa
Member

cexp should be replaceable by an optimised version too, shouldn't it?

there is no __cexp as far as I know (unless the XLC documentation is out of date or doesn't reflect the true state of affairs): http://publib.boulder.ibm.com/infocenter/compbgpl/v9v111/index.jsp

What would be the alternative? I don't see any!?

Implementing our own optimized cexp...

@urbach
Contributor

urbach commented Mar 19, 2012

but before we merge in c99complex?

@deuzeman
Author

I was gone for the weekend, so there's a bit of catching up I have to do. Reading your comments about cexp, however, I must say I'm not too worried. We should be able to write an optimized version just for the BG without too many problems, and it isn't going to come into play for some time -- we're not using the Cayley-Hamilton method in production yet.

Bartek, do you have any sources on the problem? One could always rewrite cexp in terms of sines and cosines -- is that broken somehow?
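
Just to sketch what "in terms of sines and cosines" means here (a naive version that ignores special cases like overflow, infinities and NaNs; my_cexp is a made-up name):

#include <complex.h>
#include <math.h>

/* exp(x + i*y) = exp(x) * (cos(y) + i*sin(y)) */
static inline double complex my_cexp(double complex z)
{
  double r = exp(creal(z));
  return r * cos(cimag(z)) + r * sin(cimag(z)) * I;
}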

@deuzeman
Author

As a bit of a tl;dr for now... are we reasonably sure about the code? The high statistics runs worked out?

@urbach
Contributor

urbach commented Mar 19, 2012

Albert: yes, I have updated my comments about the criteria with all the tests I did, and noted where I think they now pass.

Maybe this should be moved to a wiki page...!?

@kostrzewa
Member

but before we merge in c99complex?
are we reasonably sure about the code? The high statistics runs worked out?

AFAIK Carsten is getting good results to within machine precision for the expectation values. I don't know how the individual histories differ at this point.

Albert, you mentioned above (https://gist.github.com/2049892) that you saw strongly diverging histories even with the bug fixed in expo? I have no idea how worrisome this is.

@deuzeman
Author

@urbach Yes, we can move that to a wiki page. That will also make sure we have the list for future reference.

@kostrzewa Well, I saw that the configurations diverge slowly, the differences creeping in over a series of trajectories. That's not necessarily a sign of anything particularly bad happening -- as I mentioned in that post, the rounding behaviour of the code is known to have been changed a little. It seems that there is a point where the two runs really decouple, but that could actually be down to a single Metropolis step resulting in something different.

@kostrzewa
Member

Speaking of the diverging histories: There is clearly still a bug left in there. The rectangle part of the action is not computed correctly even in the first step!

@urbach
Contributor

urbach commented Mar 19, 2012

well, if we have round-off errors we are bound to see strongly diverging histories. Currently it takes about O(100) trajectories, which is reasonable, I think. And we get agreement within statistical uncertainties. So, I am not worried. Of course, we cannot exclude an error which hides at the order of round-off.

Do you expect smaller differences? Any estimate for what you would expect?

@deuzeman
Author

@kostrzewa Are you referring to the last column in that comparison? I believe that one refers to the timing information -- performance was a lot better with C99 for me.

@kostrzewa
Member

Oh dear, indeed hmc0 does not have rectangles...

@urbach
Contributor

urbach commented Mar 19, 2012

There is clearly still a bug left in there. The rectangle part of the action is not computed correctly even in the first step!

where do you see this?

@urbach
Contributor

urbach commented Mar 19, 2012

Because for me it is computed correctly. Don't mix it up with the running time for the trajectory.

@kostrzewa
Member

There is clearly still a bug left in there. The rectangle part of the action is not computed correctly even in the first step!
where do you see this?

False alarm, sorry about that. I forgot that hmc0 uses the standard Wilson gauge action (and therefore the last column is the trajectory time). Although it would still be informative to actually check this.

@urbach
Contributor

urbach commented Mar 19, 2012

it is checked in sample-hmc3.input, for instance.

Guys, can we have a short skype conference to make this a bit more effective!? Say at 11:30?

@deuzeman
Author

Euhm... I don't have the setup to do that here. I could probably arrange something, but not within twenty minutes... Is later in the afternoon an option?

@kostrzewa
Member

Guys, can we have a short skype conference to make this a bit more effective!? Say at 11:30?

Sure, I've got my headphones with me. I just don't know whether it will work this time around as it failed miserably on the 14th... I'm okay with any time today before 16:30

@urbach
Contributor

urbach commented Mar 19, 2012

say at 2 pm?

@deuzeman
Author

That should work. I'll let you know in advance if it won't.

kostrzewa added a commit that referenced this pull request Mar 19, 2012
Conversion to c99 complex implementation.
kostrzewa merged commit ae01fb2 into etmc:master Mar 19, 2012