C99 complex #42

Merged
merged 66 commits into from
Mar 19, 2012

Conversation

deuzeman

This is a large set of changes. First of all, it weeds out all occurrences of "complex" and "complex32" and replaces them with their C99 counterparts. After that easy part was done, all ".re" and ".im" occurrences were found and replaced by "creal()" and "cimag()" where applicable, or completely rewritten in terms of proper complex algebra. All complex defines were replaced by their operator equivalents as well, to make for more readable code.

While this code compiles for me and I am not aware of any concrete errors, it is highly likely that I missed one or two problematic cases, and the meaning of the code may have subtly changed somewhere. Having unit tests available would be fantastic, but in their absence I would urge whoever reviews this to be cautious.
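
As a rough illustration of the kind of rewrite involved (this is a made-up snippet, not taken from the actual diff), the old struct-based style

complex z;
z.re = a.re * b.re - a.im * b.im;
z.im = a.re * b.im + a.im * b.re;

becomes, with the C99 type from <complex.h>,

double complex z = a * b;       /* complex multiply via the * operator */
double x = creal(z);            /* real part, instead of z.re */
double y = cimag(z);            /* imaginary part, instead of z.im */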

@kostrzewa
Member

Implementing unit testing on our codebase is not an easy task, mainly because of the many interdependencies and initialisations. For this particular part, I can imagine it would not be as complicated, as long as we can figure out a representative set of tests for the affected parts. I will attempt to figure something out, but I'm really just starting out with the unit testing stuff in C.

rho1.im=g_mu;
rho2.re=1.;
rho2.im=-g_mu;
rho1 = (1.) + cimag(rho1) * I;
Member

Have these lines been created by an automated script? Seems a bit funny but I might be missing context.

Author

Yes, I've used an automated script, then polished the result. I cleaned up a fair number of this type of assignment, but will have missed more than just this. Assignments like this look weird, but I believe they are generally semantically correct. The idea would be to clean them up when we find them.
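
For the snippet above, the fully cleaned-up form would presumably collapse the separate real and imaginary assignments into single complex expressions, assuming the real parts of both are meant to be 1. (as the converted line suggests) and g_mu is real:

rho1 = 1. + g_mu * I;
rho2 = 1. - g_mu * I;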

@deuzeman
Author

The sample input that Carsten is writing for the solvers might be a start for unit tests that would at least partially cover this.

@@ -944,8 +944,7 @@ void compute_little_D_diagonal() {
Block_D_psi(&block_list[blk], tmp, block_list[blk].basis[i]);
for(j = 0; j < g_N_s; j++) {
M[i * g_N_s + j] = scalar_prod(block_list[blk].basis[j], tmp, block_list[blk].volume, 0);
block_list[blk].little_dirac_operator32[i*g_N_s + j].re = M[i * g_N_s + j].re;
block_list[blk].little_dirac_operator32[i*g_N_s + j].im = M[i * g_N_s + j].im;
block_list[blk].little_dirac_operator32[i*g_N_s + j] = creal(M[i * g_N_s + j]);
Author

Same as below, actually...
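
For reference: if the intent here is to keep both the real and imaginary parts, the C99 version would presumably be a single complex copy (assuming little_dirac_operator32 holds single-precision complex values, so the assignment narrows implicitly):

block_list[blk].little_dirac_operator32[i*g_N_s + j] = M[i * g_N_s + j];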

@deuzeman
Author

Not sure what is the best thing to do here, but I suppose it makes sense that I make corrections to the code in my personal fork. I could then close the request and send out a pull request for the mended version. Or would two merges be fine?

@kostrzewa
Member

On 23/01/12 22:31, Albert Deuzeman wrote:

Not sure what is the best thing to do here, but I suppose it makes sense that I make corrections to the code in my personal fork. I could then close the request and send out a pull request for the mended version. Or would two merges be fine?

Anything you commit will show up in the pull request. No need to close
and reup.

Bartek

@kostrzewa
Member

The sample input that Carsten is writing for the solvers might be a start for unit tests that would at least partially cover this.

Well, but those are not really unit tests in the sense that they test very large chunks of code rather than the smallest possible subsets. Having that, however, would already be a huge improvement. Especially if we have some "sane" results to go by.

@kostrzewa
Member

I was wondering how we should organize reviewing this. Do you want some more time to go through it on your own, and should two or three of us then read through different sections, making notes as I did yesterday?

@deuzeman
Author

Originally, I had in mind that we could have a first implementation that would at least be semantically correct, if not particularly nicely written. After your comments, I decided to see how much work it would be to get rid of at least most of the remaining ugliness too. It wasn't as bad as I had thought, so I think my additional commits have polished things quite a bit.

At this point, I just don't know of any other points where immediate improvement is to be had, though they're probably there. So for that reason, I would really be happy for fresh eyes on the work. The alternative would perhaps be for me to let the stuff rest for a bit and then get back to rereading everything, but I would prefer to actually move on to the smearing itself.

While test cases for the solvers might not themselves make for very good unit tests, they would actually be very good here. If I did things correctly, the C99 version should be completely equivalent to the previous version. Since the solvers use a good deal of the complex functionality, getting matching results would give me some confidence that things are in order generally.

@kostrzewa
Member

I also found a few files that were missed by the script:

benchmark.c
check_locality.c
hopping_test.c
init_jacobi_field.c
jacobi.c
overrelaxation.c
sf_calc_action.c
xchange_field.c (this is only in a comment)
test/scalar_prod_r_test.c
smearing/utils_print_su3.c
smearing/utils_project_antiherm.c
smearing/utils_project_herm.c
smearing/utils_reunitarize.c
smearing/utils_reunitarize_MILC.c
test/check_nan.c
test/test_eigenvalues.c

I'm not sure whether these are all of them. I found them by comparing the diff of this pull request with a

grep "\.re" ./*.[c,h]

and

grep "\.re" */*.[c,h]

so I might well have missed some where only imaginary parts were taken.
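
For a more complete sweep, a recursive grep covering both member accesses might look something like this (assuming GNU grep; it will also produce some false positives from unrelated struct members):

grep -rnE '\.(re|im)' --include='*.[ch]' .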

@deuzeman
Author

Thanks! It's true that the smearing directory hasn't been included yet, because I need to go over those files for different reasons anyway. As for the other files, I guess they were never compiled; since compiling was how I checked, anything outside the build would have slipped through. I'll have a look this afternoon, but there may actually be some dead code in that list as well.

@deuzeman
Author

Sloppy that I hadn't done the grep myself, I know, but that should be all... :)

@@ -3,23 +3,24 @@
*
* This file is part of tmLQCD.
*
* tmLQCD is free software: you can redistribute it and/or modify
Contributor

The script unfortunately is also attacking comment lines. Should be an easy fix, but put it on a list somewhere.

@kostrzewa
Member

It fails to compile for me right now, complaining about multiple definitions of smearing routines.

@deuzeman
Author

I thought I had dealt with this, but apparently not... :( Hmm. I think I'll solve this by temporarily removing the smearing directory from the build system path. That code is in flux and currently unused, so there doesn't seem to be much point in patching it up in this state.

@deuzeman
Author

Compiles cleanly for me, again. Sorry about that.

@kostrzewa
Member

but on the BG the c99complex version is a factor 20 slower, even if the BG optimisation is switched on
arg...! :(

Can you check for NaNs? That sort of slowdown would be indicative of them, and therefore of some remaining problem. We found a few spots which created NaNs on Intel hardware too, leading to that type of slowdown as the program dealt with the NaN arithmetic.
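
A quick-and-dirty check over any flat array of C99 complex numbers could look something like this (just a sketch; count_nans and the field layout are made up, not existing tmLQCD code):

#include <complex.h>
#include <math.h>

/* Count NaN entries in an array of double complex values. */
static int count_nans(const double complex *field, int n)
{
  int bad = 0;
  for (int i = 0; i < n; ++i)
    if (isnan(creal(field[i])) || isnan(cimag(field[i])))
      ++bad;
  return bad;
}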

@urbach
Contributor

urbach commented Mar 18, 2012

no NaNs at all, the result does agree with that of the old code...

@urbach
Contributor

urbach commented Mar 18, 2012

just compare this (it's interactive with llrun on the JUGENE test rack):

The number of processes is 32 
The lattice size is 32 x 16 x 16 x 16
The local lattice size is 16 x 8 x 8 x 4
Communication switched on:
(4 Mflops [64 bit arithmetic])

Communication switched off: 
(28 Mflops [64 bit arithmetic])
The size of the package is 221184 bytes.
The bandwidth is  0.37 +  0.37 MB/sec

versus old code:

total time 2.065312e+00 sec, Variance of the time 6.847701e-06 sec. (itarations=4096). 

(639 Mflops [64 bit arithmetic])

communication switched off 
(684 Mflops [64 bit arithmetic])

The size of the package is 221184 Byte 
The bandwidth is 746.10 + 746.10   MB/sec

urbach closed this Mar 18, 2012
urbach reopened this Mar 18, 2012
@urbach
Contributor

urbach commented Mar 18, 2012

hmm, interesting: in commit d7a9b81089302fbd78d9b4699c97f5d2b2706fd6 I replaced, in bgl.h, all creal with __creal and all cimag with __cimag, and I am getting the performance of the old code now. So, what about merging this pull request?!
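
To make the change concrete (illustrative only; z and the variable names are placeholders, not the actual bgl.h contents), wherever bgl.h previously extracted components with the generic C99 accessors it now uses the IBM XL C built-ins:

/* generic C99 accessors (the slow path with this XL C version) */
double re = creal(z);
double im = cimag(z);

/* XL C built-ins used in bgl.h instead, restoring the old performance */
double re_b = __creal(z);
double im_b = __cimag(z);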

@kostrzewa
Member

and I am getting the performance of the old code now. So, what about merging this pull request?!

That's great to hear. There's one last thing that springs to mind though: given the abysmal performance of the non-builtin complex operations on XLC/BGP, it would be prudent to also test the performance of cexp. As far as I understand, cexp is used extensively in the Cayley-Hamilton exponentiation in the smearing code, and it would be quite disappointing to find it bogged down substantially because of the lack of a "built-in" implementation.

@urbach
Contributor

urbach commented Mar 19, 2012

well, we have to write a BG version of the clover term anyhow, and cexp should be replaceable by an optimised version too, shouldn't it?

What would be the alternative? I don't see any!?

@kostrzewa
Member

cexp should be replaceable by an optimised version too, shouldn't it?

there is no __cexp as far as I know (unless the XLC documentation is out of date or doesn't reflect the true state of affairs): http://publib.boulder.ibm.com/infocenter/compbgpl/v9v111/index.jsp

What would be the alternative? I don't see any!?

Implementing our own optimized cexp...

@urbach
Contributor

urbach commented Mar 19, 2012

but before we merge in c99complex?

@deuzeman
Author

I was gone for the weekend, so there's a bit of catching up I have to do. Reading your comments about cexp, however, I must say I'm not too worried. We should be able to write an optimized version just for the BG without too many problems, and it isn't going to come into play for some time -- we're not using the Cayley-Hamilton method in production yet.

Bartek, do you have any sources on the problem? One could always rewrite cexp in terms of sines and cosines -- is that broken somehow?
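
Just to sketch what "in terms of sines and cosines" means here (a naive version that ignores special cases like overflow, infinities and NaNs; my_cexp is a made-up name):

#include <complex.h>
#include <math.h>

/* exp(x + i*y) = exp(x) * (cos(y) + i*sin(y)) */
static inline double complex my_cexp(double complex z)
{
  double r = exp(creal(z));
  return r * cos(cimag(z)) + r * sin(cimag(z)) * I;
}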

@deuzeman
Author

As a bit of a tl;dr for now... are we reasonably sure about the code? The high statistics runs worked out?

@urbach
Contributor

urbach commented Mar 19, 2012

Albert: yes, I have updated my comments about the criteria with all the tests I did, and noted where I think they now pass.

Maybe this should be moved to a wiki page...!?

@kostrzewa
Member

but before we merge in c99complex?
are we reasonably sure about the code? The high statistics runs worked out?

AFAIK Carsten is getting good results to within machine precision for the expectation values. I don't know how the individual histories differ at this point.

Albert, you mentioned above (https://gist.github.com/2049892) that you saw strongly diverging histories even with the bug fixed in expo? I have no idea how worrisome this is.

@deuzeman
Author

@urbach Yes, we can move that to a wiki page. That will also make sure we have the list for future reference.

@kostrzewa Well, I saw that the configurations diverge slowly, the differences creeping in over a series of trajectories. That's not necessarily a sign of anything particularly bad happening -- as I mentioned in that post, the rounding behaviour of the code is known to have been changed a little. It seems that there is a point where the two runs really decouple, but that could actually be down to a single Metropolis step resulting in something different.

@kostrzewa
Member

Speaking of the diverging histories: There is clearly still a bug left in there. The rectangle part of the action is not computed correctly even in the first step!

@urbach
Contributor

urbach commented Mar 19, 2012

well, if we have round-off errors we are bound to see strongly diverging histories. Currently it takes about O(100) trajectories, which is reasonable, I think. And we get agreement within statistical uncertainties. So, I am not worried. Of course, we cannot exclude an error which hides at the order of round-off.

Do you expect smaller differences? Any estimate for what you would expect?

@deuzeman
Author

@kostrzewa Are you referring to the last column in that comparison? I believe that one refers to the timing information -- performance was a lot better with C99 for me.

@kostrzewa
Member

Oh dear, indeed hmc0 does not have rectangles...

@urbach
Contributor

urbach commented Mar 19, 2012

There is clearly still a bug left in there. The rectangle part of the action is not computed correctly even in the first step!

where do you see this?

@urbach
Contributor

urbach commented Mar 19, 2012

Because for me it is computed correctly. Don't mix it up with the running time for the trajectory.

@kostrzewa
Member

There is clearly still a bug left in there. The rectangle part of the action is not computed correctly even in the first step!
where do you see this?

False alarm, sorry about that. I forgot that hmc0 uses the standard Wilson gauge action (and therefore the last column is the trajectory time). Although it would still be informative to actually check this.

@urbach
Contributor

urbach commented Mar 19, 2012

it is checked in sample-hmc3.input, for instance.

Guys, can we have a short skype conference to make this a bit more effective!? Say at 11:30?

@deuzeman
Author

Euhm... I don't have the setup to do that here. I could probably arrange something, but not within twenty minutes... Is later in the afternoon an option?

@kostrzewa
Member

Guys, can we have a short skype conference to make this a bit more effective!? Say at 11:30?

Sure, I've got my headphones with me. I just don't know whether it will work this time around as it failed miserably on the 14th... I'm okay with any time today before 16:30

@urbach
Contributor

urbach commented Mar 19, 2012

say at 2 pm?

@deuzeman
Author

That should work. I'll let you know in advance if it won't.

kostrzewa added a commit that referenced this pull request Mar 19, 2012
Conversion to c99 complex implementation.
kostrzewa merged commit ae01fb2 into etmc:master Mar 19, 2012