# getting smearing to work #198

Open · opened this issue Dec 19, 2012 · 139 comments · 3 participants

European Twisted Mass Collaboration member

I've invested a bit of time into helping Albert get this rather complex but beautifully written piece of code working properly. I would like to use this issue to track my progress.

I started by turning the numerical derivative into a symmetric one to remove linear discretization errors. I then added a real numerical derivative for the Wilson plaquette gauge action to the gauge monomial. A trajectory that changes the plaquette measurement by 100% shows a difference in dH of only about 1e-8 when compared to the same trajectory computed with the analytical derivative, and acceptance is identical.

Adding support for smearing was a bit trickier, so I construct a stout smearing control with hardcoded parameters in the gauge monomial derivative itself and destroy it afterwards. I also explicitly disabled the computation and accumulation of smearing force terms in update_momenta and stout_stout_smear.

With these constructs in place and the same parameters in the input file, I am now (apparently) able to perform pure-gauge smeared HMC with 100% acceptance (there is an input file in the branch https://github.com/kostrzewa/tmLQCD/tree/numerical_deriv_num_gauge_monomial ). The code still fails to produce acceptance for more than one iteration of stout smearing, which is a bit strange.

Maybe the buffer swapping doesn't work properly? Or maybe the updating of the gauge energy needs to be separated from the various acceptance and heatbath functions? Right now it is unfortunately necessary to set g_update_gauge_energy explicitly many times and hope that measure_gauge_action is called with the correct gauge field next time around... This really should be changed, but since we'll never calculate the gauge energy on a smeared gauge field, it can wait as long as we make sure measure_gauge_action is always called on g_gf.

Finally, a comment on the numerical derivative. I think the type of check implemented by Albert (rotations on one gauge link only, 8 components) is a great sanity-check tool to implement on a more general per-monomial basis (call the analytical derivative, check for a flag, compute the numerical derivative for this one gauge link and display the output at debug level 5, say). I'm not sure how it would best be integrated into the code to be practical and inconspicuous, though. Since it scales with 64*VOLUME^2, it is unrealistic to do a full numerical derivative for anything other than the gauge monomial (and even there only for 2^4 volumes).

European Twisted Mass Collaboration member

I think the type of check implemented by Albert (rotations on one gauge link only, 8 components) is a great sanity-check tool to implement on a more general per-monomial basis.

This needs a little bit of work to take into account parallelization. In that case it would also be better to compute something at a boundary and something in the bulk as an additional check.

European Twisted Mass Collaboration member

Testing smearing forces

I've now re-enabled the calculation of the force contribution of the smearing but I don't accumulate it in df at the end. I call the monomial derivative with df0 set to df (no remapping of df0!) and then explicitly call smear_forces with df as an input.

smearing_control_monomial[s_type]->force_result[1][1] then contains the following compared to df in the first trajectory, the first time the forces are computed:

# Time for GAUGE monomial derivative: 5.819154e-02 s
[DEBUG] Comparison of force calculation at [1][1]!
Involves monomials: GAUGE
Numerical total force <-> smear forces
[0]  +0.677441909147 <-> +0.674610818420
[1]  +0.651245056815 <-> +0.622717380971
[2]  -0.650447663020 <-> -0.645114865239
[3]  -0.384530363817 <-> -0.366998593273
[4]  +0.027326740337 <-> +0.057354221129
[5]  -0.583583420166 <-> -0.604351922979
[6]  +0.865918178761 <-> +0.860550664196
[7]  -0.944989812979 <-> -0.944308827867


Soon afterwards, however (still in the same trajectory), the two have diverged substantially:

[DEBUG] Comparison of force calculation at [1][1]!
Involves monomials: GAUGE
Numerical total force <-> smear forces
[0]  +2.188621795085 <-> +2.513886660328
[1]  +2.763872868172 <-> +2.223844539664
[2]  -2.256187411831 <-> -2.343240983808
[3]  -0.849516123935 <-> -0.814242697326
[4]  +0.072232415960 <-> +1.132103245325
[5]  -2.549349795800 <-> -2.980789873106
[6]  +3.128423037424 <-> +3.023725888304
[7]  -4.163150993008 <-> -4.485607838646


If I can trust what I did, then I would suspect that there is some buffer or something else which is not properly cleared between calls to smear_forces, thereby accumulating an error.

How large is the contribution to the force coming from the smearing expected to be?

That is a very useful comparison to have, nice work! What sort of configuration are these force comparisons made on, though?

European Twisted Mass Collaboration member

That is a very useful comparison to have, nice work! What sort of configuration are these force comparisons made on, though?

I guess I don't really understand the question. Since I have only one monomial and this one is smeared, there is one relevant smearing and one smeared gauge field in update_momenta. This smeared gauge field should be, by construction, the same one that is used in the monomial derivative (there I memmove from g_gf, rotate, then smear with a local control with the same parameters, so if I smeared g_gf right there and then, I would obtain the same smeared gauge field as here in update_momenta)

    for (int s_ctr = 0; s_ctr < no_relevant_smearings; ++s_ctr)
    {
      int s_type = relevant_smearings[s_ctr];

      smear(smearing_control_monomial[s_type], g_gf);
      ohnohack_remap_g_gauge_field(smearing_control_monomial[s_type]->result);
      g_update_gauge_energy = 1;
      g_update_rectangle_energy = 1;

      for (int k = 0; k < no; k++)
      {
        if (monomial_list[ mnllist[k] ].smearing == s_type)
        {
          if (monomial_list[ mnllist[k] ].derivativefunction != NULL)
          {
    #ifdef MPI
            atime = MPI_Wtime();
    #else
            atime = (double)clock() / (double)(CLOCKS_PER_SEC);
    #endif
            // as df0 has not been remapped, this writes directly into df
            monomial_list[ mnllist[k] ].derivativefunction(mnllist[k], hf);
    #ifdef MPI
            etime = MPI_Wtime();
    #else
            etime = (double)clock() / (double)(CLOCKS_PER_SEC);
    #endif
          }
        }
      }
      smear_forces(smearing_control_monomial[s_type], df);

      [ output comparison ]

    }
    ohnohack_remap_g_gauge_field(g_gf);

That should be legitimate, no?

European Twisted Mass Collaboration member

Okay, I found one bug; I just need to figure out how to fix it. In stout_smear_plain, when multiple iterations are used the result blows up to NaN. I'm presuming, again, a fault in the buffer swap or similar. If I explicitly construct the control with "compute force terms" enabled (and then simply discard the intermediate results from the call to stout_smear_with_force_terms), multiple iterations work just fine and I have acceptance.

European Twisted Mass Collaboration member

I looked into your concern regarding "which configuration the force is computed on" and moved the comparison into the gauge monomial. Here, for one particular component, direction and gauge link, I rotate, smear and then call smear_forces. The result does not agree with the numerical one. I think that should be a good indication that the force terms are wrong, no? Although I think what I did before was more correct...

European Twisted Mass Collaboration member
    stout_control* control = construct_stout_control(1, 2, 0.18);

    for (int x = 0; x < VOLUME; ++x)
    {
      for (int mu = 0; mu < 4; ++mu)
      {
        xm = (double*)&hf->derivative[x][mu];
        control->smearing_performed = 0;
        for (int component = 0; component < 8; ++component)
        {
          double h_rotated[2] = {0.0, 0.0};
          printf("Rotating at %d %d in component %d\n", x, mu, component);
          for (int direction = 0; direction < 2; ++direction)
          {
            /* Introduce a rotation along one of the components */
            ar_rotation[component] = epsilon[direction];
            exposu3(&mat_rotation, &rotation);
            _su3_times_su3(rotated[direction][x][mu], mat_rotation, old_value);

            stout_smear(control, rotated[direction]);

            if (x == 1 && mu == 1 && direction == 1 && component == 1) {
              stout_smear_forces(control, df);
              fprintf(stderr, "[DEBUG] Comparison of force calculation at [1][1]!\n");
              fprintf(stderr, "   smear forces <-> numerical total force\n");
              fprintf(stderr, "    [%d]  %+14.12f <-> ", component, control->force_result[1][1].d1); //*/
            }

            // compute gauge action
            g_update_gauge_energy = 1;
            g_update_rectangle_energy = 1;
            h_rotated[direction] = -g_beta * (mnl->c0 * measure_gauge_action(control->result)); //rotated[direction]));
            // reset modified part of gauge field
          } // direction
          // calculate force contribution from gauge field due to rotation
          xm[component] += (h_rotated[1] - h_rotated[0]) / (2 * eps);
          if (x == 1 && mu == 1 && component == 1)
            fprintf(stderr, "%+14.12f\n", df[1][1].d1); //*/
        } // component
      } // mu
    } // x
    free_stout_control(control);
    return_gauge_field(&rotated[0]);
    return_gauge_field(&rotated[1]);

European Twisted Mass Collaboration member

Just as a side note I can now confirm that I have acceptance even with 4 iterations of stout smearing. For this small volume this pushes the plaquette up to 0.98 though :)

European Twisted Mass Collaboration member

Although I think what I did before was more correct...

Indeed, the very first time the force is computed, the result from smear_forces in gauge_monomial.c is further from the numerical result. In my opinion this makes perfect sense, because here the forces are computed on a rotated and then smeared gauge field, and not on one that has just been smeared.

[DEBUG] Comparison of force calculation at [1][1]!
smear forces <-> numerical total force
[1]  +0.949474738841 <-> +1.300266217186
[DEBUG] Comparison of force calculation at [1][1]!
Involves monomials: GAUGE
Numerical total force <-> smear forces
[0]  +1.300266217186 <-> +1.376853348262
[1]  +1.643219286507 <-> -0.445783308699
[2]  +5.559405138911 <-> +5.240163473078
[3]  -5.011576001834 <-> -4.214708159595
[4]  -3.231603625409 <-> -1.894558971907
[5]  -0.102085107301 <-> -1.403325081582
[6]  +3.653467098275 <-> +0.851817872119
[7]  -7.133829001305 <-> -7.357334898408


In any case I think we need to investigate the force computation now which will inevitably involve going through the derivation...

UPDATE

By the way, the second output still comes from update_momenta. I didn't actually notice the massive deviations in [1], [4], [5] and [6].

European Twisted Mass Collaboration member

In any case I think we need to investigate the force computation now which will inevitably involve going through the derivation...

Well, first I will go through the code line by line to check whether I notice something that has slipped by.

European Twisted Mass Collaboration member

One thing that perplexes me a little is the lack of any mention of beta in the computation of the force terms. The gauge field is stored independently of beta which is why in the computation of the gauge derivative there is an explicit multiplication by beta (in the form of "factor"). Does rho implicitly contain beta? Anyway, I'll read the papers and notes and then I'm sure I'll understand.

the force of the smeared part should be independent of the monomials altogether; it's just the chain rule, isn't it?

European Twisted Mass Collaboration member

the force of the smeared part should be independent of the monomials altogether; it's just the chain rule, isn't it?

The output here

[DEBUG] Comparison of force calculation at [1][1]!
smear forces <-> numerical total force
[1]  +0.949474738841 <-> +1.300266217186


is wrong because it is computed on a rotated gauge field and cannot be correct. Further, the comparison I'm making in update_momenta is also wrong, because the derivative that is fed into smear_forces already contains the smearing term implicitly. I will compute the derivative of the gauge monomial separately analytically and call smear_forces on this temporary derivative to see if any differences are present.

European Twisted Mass Collaboration member

I changed the code a bit more but I still have my doubts... I first compute the numerical derivative as explained before (the gauge field is the smeared one). Then I compute the analytical derivative, but I remap df0 first; the gauge field is still mapped to the smeared field. Now I call smear_forces on this derivative.

Does the numerical derivative really capture the complete influence of the smearing, or is there something missing or something too much? Since I get acceptance, I would think that everything is correct.

# Time for GAUGE monomial derivative: 3.404317e-01 s
# Time for GAUGE monomial derivative: 8.992700e-05 s
[DEBUG] Comparison of force calculation at [1][1]!
numerical force <-> analytical force <-> analytical force + smeared
[0]  -1.399133145696 <-> -1.614094684864 <-> -1.581404238899
[1]  +1.379116918088 <-> +1.273875681870 <-> 1.230786767312
[2]  +3.851927692722 <-> +3.154854008718 <-> 3.174869727485
[3]  -2.446024618052 <-> -1.753064895060 <-> -1.745692061800
[4]  -0.683806388224 <-> -0.141420359739 <-> -0.104222017625
[5]  +2.037271710265 <-> +2.023458429842 <-> 1.976318650656
[6]  -0.187620292991 <-> -0.757265165433 <-> -0.818135956147
[7]  -2.591452161482 <-> -1.657051994768 <-> -1.692047414142
# Time gauge update: 1.118730e-04 s

European Twisted Mass Collaboration member

Reducing rho substantially seems to indicate that the numerical derivative is wrong after all... actually no... forget what I just said!! I just had a parameter not updated. For rho 0.0001:

[DEBUG] Comparison of force calculation at [1][1]!
numerical force <-> analytical force <-> analytical force + smeared
[0]  -1.716202169177 <-> -1.716295097285 <-> -1.716294988623
[1]  +1.106646996618 <-> +1.106390433190 <-> +1.106390261847
[2]  +2.880228134927 <-> +2.879208553427 <-> +2.879208658993
[3]  -1.236969762886 <-> -1.235798700022 <-> -1.235798717189
[4]  +0.409221449615 <-> +0.410269441569 <-> +0.410269551942
[5]  +2.118844865606 <-> +2.118757717394 <-> +2.118757550812
[6]  -1.362196374544 <-> -1.363141425826 <-> -1.363141647349
[7]  -1.145151750848 <-> -1.143913614477 <-> -1.143913783982
# Time gauge update: 1.133960e-04 s


Without smearing (rho=0.0) everything agrees up to discretisation errors:

# Time for GAUGE monomial derivative: 1.753207e-01 s
# Time for GAUGE monomial derivative: 8.376600e-05 s
[DEBUG] Comparison of force calculation at [1][1]!
numerical force <-> analytical force <-> analytical force + smeared
[0]  -1.206807453968 <-> -1.206807453026 <-> -1.206807453026
[1]  +1.138477345819 <-> +1.138477345366 <-> +1.138477345366
[2]  +2.416553850537 <-> +2.416553851685 <-> +2.416553851685
[3]  -1.283008155895 <-> -1.283008157294 <-> -1.283008157294
[4]  -0.301118627988 <-> -0.301118628069 <-> -0.301118628069
[5]  +2.658250213017 <-> +2.658250212384 <-> +2.658250212384
[6]  -1.354716569324 <-> -1.354716569960 <-> -1.354716569960
[7]  -1.729094647374 <-> -1.729094647107 <-> -1.729094647107

European Twisted Mass Collaboration member

the force of the smeared part should be independent of the monomials altogether; it's just the chain rule, isn't it?

Hi, on the train ride to Luxembourg I read and tried to fully understand the Peardon and Morningstar paper. As far as I understand it, it is not true that the smearing contribution to the force is independent of the monomial(s) that use(s) smearing. Every part of the force computation implicitly contains Sigma' (through lambda and gamma) and therefore also contains details of the monomial(s).

The first iteration in the force computation is the derivative of the terms of the action that include fat links with respect to the fattest link. This is exactly the same as the standard derivative computation, just evaluated on the fattest gauge configuration. The further iterations (there are at least two in total for one iteration of stout smearing) then use this force and the smearing definition (which types of staples are used in the fattening) to compute the full force iteratively (equation 75 in the P&M paper). Therefore, in the case of a pure-gauge smeared action like the one the numerical-derivative HMC computes, beta enters in the first step implicitly through what you (or Hasenbusch?) called the force factor in the code.

As far as I understand for a fully smeared pure gauge action, equation 41 in P&M would not contain a thin link contribution at all. (It would only enter at the lowest level (\sigma_0) of the analytical force computation, see eq 75 where the LHS would be \sigma_0 and U in the RHS would correspond to thin links.)

Note also that therefore I am now confident that the numerical derivative does the right thing with respect to smearing:

Using V for the smeared gauge field: I compute
dS/dU (eq 40), which for this action corresponds to
dS_st/dV dV/dU
where dV/dU would be given by "( V(U+dU) - V(U-dU) )/(2*eps)" (formal considerations required to carry out the chain rule "multiplication")

or alternatively

(S[ V(U+dU) ] - S[ V(U-dU) ]) / (2*eps)

This computes the full derivative, because dS/dV would be (S[V+dV] - S[V-dV]) / (2*eps), which is not the same! In that case only one component of the smeared field would be rotated, while in the above definitions the rotation in U propagates to neighbouring links, as it corresponds to the steps: rotate gauge field, smear, compute derivative, in that order.

For an action in which only the fermionic part is smeared, there would of course still be the standard thin link gauge derivative and no fat link gauge derivative. The fat link derivative would come in through the fermion terms only.

I believe I now have a good enough understanding to attempt to follow the force computation in the code and spot problems. I will just have to read the BMW and Albert's notes carefully when I get back from holidays.

I also think that I can actually compute the iterations numerically by following eqn 75. To do this I would first compute the numerical derivative with respect to the smeared field (so smear first, rotate a component in the smeared field and then evaluate the derivative) and then feed this to the final force computation.

Every part of the force computation implicitly contains Sigma' (through lambda and gamma) and therefore also contains details of the monomial(s).

This is true, but the point is that all of the information has already been encoded within the single input force vector. The modifications to that vector made by the smearing routine are completely independent of the details of its production. For that reason, there should really be no entanglement between the code for the specific monomials and the smearing of the forces. It's not clear to me why you implemented the force calculation the way you did.

Note also that therefore I am now confident that the numerical derivative does the right thing with respect to smearing:

Using V for the smeared gauge field: I compute
dS/dU (eq 40), which for this action corresponds to
dS_st/dV dV/dU
where dV/dU would be given by "( V(U+dU) - V(U-dU) )/(2*eps)" (formal considerations required to carry out the chain rule "multiplication")

or alternatively

(S[ V(U+dU) ] - S[ V(U-dU) ]) / (2*eps)

This computes the full derivative, because dS/dV would be (S[V+dV] - S[V-dV]) / (2*eps), which is not the same! In that case only one component of the smeared field would be rotated, while in the above definitions the rotation in U propagates to neighbouring links, as it corresponds to the steps: rotate gauge field, smear, compute derivative, in that order.

For an action in which only the fermionic part is smeared, there would of course still be the standard thin link gauge derivative and no fat link gauge derivative. The fat link derivative would come in through the fermion terms only.

Agreed. It's physically more correct to think of the smearing as part of the definition of the operators appearing in the action, rather than as a procedure applied to the gauge fields. What we care about is the change induced in some scalar product of those operators, given a small change in the underlying gauge field. The correct point to act with a small rotation, if we want to know the MD forces, is therefore always the original gauge field, rather than what amounts to some arbitrary stage halfway through the calculation of the operator.

(formal considerations required to carry out the chain rule "multiplication" )

This part is actually much better discussed in the BMW paper. Due to the constraints imposed by the group structure, one has to explicitly define what one means by the multiplication. Unfortunately so, because life would have been much simpler with a proper factorization of the different contributions...

I also think that I can actually compute the iterations numerically by following eqn 75. To do this I would first compute the numerical derivative with respect to the smeared field (so smear first, rotate a component in the smeared field and then evaluate the derivative) and then feed this to the final force computation.

While some procedure like this should work to get the different stages out, I'm not sure how much we'd gain. Since equation 75 in P&M and its equivalent 3.35 in BMW are iterative and one level of stout smearing doesn't work yet, we can first try to work that single level out. In that case, the value of Sigma is taken directly from the monomial and is known to be correct. It would be very beneficial if we could find some way to segregate the different contributions to the force calculation beyond that, though.

European Twisted Mass Collaboration member

This is true, but the point is that all of the information has already been encoded within the single input force vector. The modifications to that vector made by the smearing routine are completely independent of the details of its production.

Yes, I understand and agree!

For that reason, there should really be no entanglement between the code for the specific monomials and the smearing of the forces. It's not clear to me why you implemented the force calculation the way you did.

Only out of convenience, because I know that I can't do a numerical computation for anything other than the gauge monomial. It seemed easiest to explicitly disable force smearing in update_momenta and compute the entire derivative in the monomial (rather than adding a complicated set of conditionals to the steps in update_momenta which call the derivatives). And since I need to rotate first and only then smear, this approach would have been quite clumsy in update_momenta, where I would have to pass the smearing control (or at least its parameters) through to the derivative (thereby needing to adjust the derivativefunction interface, and so on...)

European Twisted Mass Collaboration member

[...] and one level of stout smearing doesn't work yet, we can first try to work that single level out.

That was the plan but in a sufficiently general way to make it work for multiple iterations too.

In that case, the value of Sigma is taken directly from the monomial and is known to be correct.

Just to be concrete, you are referring to Sigma' rather than Sigma, correct?

It would be very beneficial if we could find some way to segregate the different contributions to the force calculation beyond that, though.

I absolutely agree, of course.

Just to be concrete, you are referring to Sigma' rather than Sigma, correct?

Yes, I was.

Hey guys, merry christmas!

European Twisted Mass Collaboration member

:) Merry Christmas Carsten and Albert :)

:D A merry Christmas to both of you as well! 🎄

I made some progress. While I hope I can push this a little further soon, I'm going to be somewhat occupied in the coming days by personal matters. That's why I figured I should post what I suspect now. Sorry if it's a little rambling, but I'm afraid this is the best I can do for now. If nothing else, it can function as a reminder to myself.

Since we know the numerical derivative is correct even for smeared code, I figured I'd try to explicitly check a very simple case. First make sure the formulae as we understand them produce the numerical result, then make sure the code replicates the formulae. Any errors would have to show up along the way.

My test case was a unit gauge field, but with the [0][0] component set to the exponential of a matrix proportional to lambda_3. That makes for a diagonal SU(3) element, with the last diagonal element equal to unity. I calculated the forces for a pure gauge simulation, with a single level of stout smearing on that gauge monomial.

This setup produces an immediate deviation on the first element of the forces, but in an interesting way. The force calculated for an unsmeared field is only non-zero for the third entry, which makes intuitive sense. The force as calculated by the monomial on the smeared field has the same structure. And so does the force that is numerically calculated for the smeared action. But the analytical calculation of the smeared force has non-zero entries in the third and eighth entry, so it's mixing with the other diagonal generator of SU(3).

After poking around in the code and going through some calculations with Matlab, I'm now suspecting that there may be a problem in the definition of Z as given by BMW. I may be overlooking something or making a mistake somewhere in my calculations, but it doesn't seem to have the right symmetry properties. Note that its definition, except for the added factor of U^\dagger, is identical to that of \Lambda in the P&M paper. But in the latter, \Sigma is defined without an application of the smeared gauge field. If I use that definition and follow P&M, the first part of the force corrections (up to the staple derivatives) produces a result just in the third entry. If I follow BMW's definition, things break down already there...

European Twisted Mass Collaboration member

I give up... I've checked the calculation and I think I now match P&M everywhere so...

European Twisted Mass Collaboration member

Are we confident that the P&M paper is correct? Or more likely, are we sure that the projection of the force from the adjoint to the defining representation is the correct thing to do? Maybe we should transpose after the projection?

Sorry, how does this fit with Albert's latest observations? You would say it's not the definition of Z?

Albert, the code you used for your latest test, is it available? It should be possible to find out where the non-zero eighth element comes from...!?

European Twisted Mass Collaboration member

I will need to look at BMW carefully but I think the factor of V drops out in the actual definition of the smeared force.

are we using the correct basis for the generators?

European Twisted Mass Collaboration member

We are using the functions already present in tmLQCD and we correct for the factor of -2 after projecting back to the adjoint (from trace_lambda).

European Twisted Mass Collaboration member

Regarding your worries about Z, according to 3.32 when it is eventually used the extra factor of U^dagger is cancelled off by the multiplication from the left by U.

From the same equation I can see that the only difference in the final force is a multiplication from the left by U in the first term, by U^dagger from the right in the second term and I guess by U in the third and final term.

The P&M paper should make sense one way or the other, or I'm sure Istvan would have mentioned something. But we may be interpreting it wrongly. And then I am also thinking of something like a subtle issue in the translation between different representations, such that what we use as Sigma is not what appears in their derivation. Not sure where that would be happening, though. We're mainly recycling routines from the rest of the code here, so it's hard to imagine we're mixing up conventions or something.

I don't think I've pushed all the testing modifications to Github, mainly because it produces a pretty mangled executable. But I can make the branch available.

The issue that I saw should not be related to the projection, though. Since I was working with diagonal matrices everywhere, the projection simply amounts to taking the imaginary component of the matrix.

European Twisted Mass Collaboration member

And then I am also thinking of something like a subtle issue in the translation between different representations, such that what we use as Sigma is not what appears in their derivation.

I also doubt this should be a problem, because our projections are inverses of each other, so it doesn't really matter which particular form of the generators we use...

Regarding your worries about Z, according to 3.32 when it is eventually used the extra factor of U^dagger is cancelled off by the multiplication from the left by U.

True, that factor does get cancelled. But I'm not sure the factor V^dagger in Sigma (3.24) can be cancelled in a similar manner when Sigma appears rather non-trivially in the definition of Z. Mind you, I did at some point convince myself that there was no difference between the two definitions. So I was either overlooking something then, or I'm doing it now... ^_^

From the same equation I can see that the only difference in the final force is a multiplication from the left by U in the first term, by U^dagger from the right in the second term and I guess by U in the third and final term.

I think the sandwiching works out in the end; it should be a convenient rewriting of P&M's expression everywhere, except possibly for Sigma in Z and, as far as I'm aware, for the two terms with a negative prefactor in the staple derivative. For the latter, it seems there's some hermitian conjugate being taken, and I think the difference between the two definitions is cancelled in the projection.

European Twisted Mass Collaboration member

But I'm not sure the factor V^dagger in Sigma (3.24) can be cancelled in a similar manner when Sigma appears rather non-trivially in the definition of Z

Yes, I agree... also, the post-multiplication of sigma by V in 3.32/3.33 is essential to recover exp(A) [through post-multiplication with U^dagger]. This V had therefore better not be involved in any cancellations!

Hey Albert, isn't it possible to backtrace where the non-zero eighth element came from?

European Twisted Mass Collaboration member

My test case was a unit gauge field, but with the [0][0] component set to the exponential of a matrix proportional to lambda_3. That makes for a diagonal SU(3) element, with the last diagonal element equal to unity.

Which definition of lambda_3 are you using? When I run such a matrix through exposu3_inplace the result is the unit matrix, and when I run it through Cayley-Hamilton I get a diagonal matrix proportional to the unit matrix. I might be using the wrong lambda though...

~ lambda_3
[ + 1.550000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 1.550000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]

exp_in_place(~lambda_3) [projects onto gell-mann and then exponentiates that]
[ + 1.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 1.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, + 1.000000000000 - 0.000000000000 * i]

~cayley hamilton(~lambda_3)
[ + 0.020794827803 + 0.999783764189 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 0.020794827803 - 0.999783764189 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, + 1.000000000000 - 0.000000000000 * i]


European Twisted Mass Collaboration member

Which definition of lambda_3 are you using?

I guess you used the traceless antihermitian version, correct?

European Twisted Mass Collaboration member

Doing the same thing, I first of all see that Cayley-Hamilton breaks down for this... But exposu3_inplace does what you say. However, I see no such deviation in the eighth component (with my P&M code):

P&M

~ lambda_3 (TA)
[ - 0.000000000000 + 1.550000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 1.550000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]

gauge field at [1][1]
[ + 0.020794827750 + 0.999783764185 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 0.020794827750 - 0.999783764185 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, + 1.000000000000 - 0.000000000000 * i]

force:
[DEBUG] Comparison of force calculation at [1][1]!
numerical force <-> analytical force <-> analytical force + smeared
[0]  +0.000000000000 <-> +0.000000000000 <-> +0.000000000000
[1]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[2]  -4.873255619486 <-> -5.236968500253 <-> +4.080262758119
[3]  +0.000000000000 <-> +0.000000000000 <-> +0.000000000000
[4]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[5]  +0.000000000000 <-> +0.000000000000 <-> +0.000000000000
[6]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[7]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000

European Twisted Mass Collaboration member

I first of all see that Cayley hamilton breaks down for this

this is, of course, for the Cayley Hamilton as used in P&M where it returns exp(iM)...

European Twisted Mass Collaboration member

Oops... I made a big mistake... sorry! I had the P&M commit in what I thought was my BMW branch... I only noticed after the last message. I can confirm that the BMW version of the code produces the weirdness observed by Albert:

BMW

[ - 0.000000000000 + 1.550000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 1.550000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]

[ + 0.020794827750 + 0.999783764185 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 0.020794827750 - 0.999783764185 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, + 1.000000000000 - 0.000000000000 * i]
# Initialised monomial of type GAUGE, no_monomials= 1
# monomial id 0 type = 2 timescale 0
# Computed plaquette value: 0.959199784490.
#
# Starting trajectory no 1
called gauge_heatbath for id 0 1 energy0 = 488.040850
# Time for numerical GAUGE monomial derivative: 7.811116e-02 s
# Time for analytical GAUGE monomial derivative: 2.783700e-05 s
[DEBUG] Comparison of force calculation at [1][1]!
numerical force <-> analytical force <-> analytical force + smeared
[0]  +0.000000000000 <-> +0.000000000000 <-> +0.000000000000
[1]  +0.000000000000 <-> +0.000000000000 <-> +0.000000000000
[2]  -4.873255619486 <-> -5.236968500253 <-> -5.324035854924
[3]  +0.000000000000 <-> +0.000000000000 <-> +0.000000000000
[4]  +0.000000000000 <-> +0.000000000000 <-> +0.000000000000
[5]  +0.000000000000 <-> +0.000000000000 <-> +0.000000000000
[6]  +0.000000000000 <-> +0.000000000000 <-> +0.000000000000
[7]  +0.000000000000 <-> +0.000000000000 <-> -1.147610951801

European Twisted Mass Collaboration member

And hence there is also no agreement between P&M and BMW:

BMW

[0]  -0.262370019755 <-> +0.107905120571 <-> +0.129676042557
[1]  -0.887668051064 <-> +0.467106242213 <-> +0.471951073001
[2]  +0.241326313244 <-> -0.160306431233 <-> -0.182220618397
[3]  -0.225912856422 <-> +0.246215513575 <-> +0.252953544621
[4]  +0.404052889280 <-> -0.250874926745 <-> -0.240528280807
[5]  +0.635294225049 <-> -0.407611167750 <-> -0.408778522129
[6]  -0.295563387454 <-> +0.302704762536 <-> +0.288898460358
[7]  +0.847369193480 <-> -0.603752630725 <-> -0.591925958260


P&M

[0]  -0.262370195969 <-> +0.107905141341 <-> -0.110081222496
[1]  -0.887668170435 <-> +0.467106247189 <-> -0.127970716785
[2]  +0.241326529249 <-> -0.160306450699 <-> -0.089983558874
[3]  -0.225912907581 <-> +0.246215547881 <-> -0.045263761964
[4]  +0.404052951808 <-> -0.250874930380 <-> +0.064943694746
[5]  +0.635294054518 <-> -0.407611191183 <-> +0.018747520689
[6]  -0.295563279451 <-> +0.302704711049 <-> +0.006915465669
[7]  +0.847369142321 <-> -0.603752667446 <-> +0.042348747294


As you can see the smearing is the same but the force computation is not compatible. It's a shame neither of them is correct though at this stage...

My next step was to try to introduce a different projector which uses the standard form for the Gell-Mann matrices and see what happens.

European Twisted Mass Collaboration member

Sorry again about the confusion.. I don't know how that commit snuck in there...

European Twisted Mass Collaboration member

Maybe we should ask Istvan whether he knows details about the development of the P&M implementation in 'his' code?

It's good that you see the same issue, but indeed a shame that the P&M formalism doesn't produce the right answer either. Still, it does a better job than the rather random mixing of BMW...

First, with regards to your remarks on the Cayley-Hamilton method... It wasn't quite clear to me whether you think there's an issue there in the end. I have seen it produce correct results when comparing to Urs' stout smearing code and to Matlab's results for exponentiation. But it's been written to work with BMW's conventions, so it actually calculates exp(A) for A being a traceless antihermitian matrix. Was that accounted for?

Of course, the values of B1 and B2 cannot be checked easily in that manner, so they may still be buggy somehow. They involve some subtle cancellations, however, leading to obviously problematic values when I had a sign wrong earlier. There could, however, be an overall phase missing for those matrices. Maybe something to look into as well?

European Twisted Mass Collaboration member

First, with regards to your remarks on the Cayley-Hamilton method... It wasn't quite clear to me whether you think there's an issue there in the end. I have seen it produce correct results when comparing to Urs' stout smearing code and to Matlab's results for exponentiation. But it's been written to work with BMW's conventions, so it actually calculates exp(A) for A being a traceless antihermitian matrix. Was that accounted for?

Yes, I changed it to compute exp(iQ) and I now store Q in place of A and I've taken care to change all factors of (-i)^2 and (-i) that you introduced. This is why it blew up when I exponentiated the matrix further above. (so it doesn't work as a replacement for exposu3_inplace anymore)

Of course, the values of B1 and B2 cannot be checked easily in that manner, so they may still be buggy somehow. They involve some subtle cancellations, however, leading to obviously problematic values when I had a sign wrong earlier. There could, however, be an overall phase missing for those matrices. Maybe something to look into as well?

Well, I checked the calculation line by line w.r.t. the P&M paper and I can't find fault with it.

I have found the stout implementation in Istvan's code and am currently trying to decode what's done. Luckily it follows the P&M paper to the letter (not even optimizing multiplications). (/homea/hch02/hch02c/ForAbdou/UpdateCode.JUGENE in Juelich)

European Twisted Mass Collaboration member

Yes, I changed it to compute exp(iQ) and I now store Q in place of A and I've taken care to change all factors of (-i)^2 and (-i) that you introduced. This is why it blew up when I exponentiated the matrix further above. (so it doesn't work as a replacement for exposu3_inplace anymore)

Oh wait, but then it computes the wrong B1 and B2, no? I mean, I changed exponent_from_coefficients... Oh wait, no, it computes them correctly since I store Q!

Hey! I think you are making really good progress! Thanks for all of this! I haven't looked into the details of the code, but where does the non-zero eighth element show up exactly? I would think that this can be traced back? And then it's maybe easier to understand what's going wrong (which was my original intention in proposing this kind of check).

I'll try to implement a rational monomial while you debug smearing, unless I can be of help somewhere...!

Hmm, and which terms do violate the proportionality?

European Twisted Mass Collaboration member

Potentially, yes. But I think they may be derivatives of iQ, rather than Q. I think I looked at it and concluded it was fine. But since something must be wrong somewhere...

I think that everything is fine since B_i = b_i0 * I + b_i1 * Q + b_i2 * Q^2 and hence my exponent_from_coefficients computes the correct B1 and B2 with the factors of -i and -1 removed from f1 and f2 respectively. (as I store Q instead of A)

European Twisted Mass Collaboration member

Hmm... looking at Istvan's code, sigma^prime seems to contain more than just the standard force... The computations of Gamma, Lambda and the Cayley-Hamilton seem to match what I have now. The only differences are:

• in line 755 of stout.C, before sigma is projected back to the adjoint representation for use in the HMC, it's multiplied by the unsmeared gauge field (the computation of sigma is the same)
• in line 2228 of update.C, after the (quark) force with respect to the maximally smeared gauge field has been computed but before entering the stout force smearing loop, the quark force is POST-multiplied by V^dagger and then saved as sigma^prime (so it matches neither P&M nor BMW I guess)

so confused...

Where did you obtain Istvan's code?

European Twisted Mass Collaboration member

I have found the stout implementation in Istvan's code and am currently trying to decode what's done. Luckily it follows the P&M paper to the letter. (not even optimizing multiplications) (/homea/hch02/hch02c/ForAbdou/UpdateCode.JUGENE In Juelich)

European Twisted Mass Collaboration member

Needless to say these changes don't fix our problem...

Hmm, and which terms do violate the proportionality?

I was writing a more elaborate answer, but I'm not sure it really clarifies that much. So, in short: I'm going to leave indices out, but I'm looking at the local calculation at the point where I introduced the 'delta spike' in the gauge field.

Taking equation 3.31, we need the projected part of Z to be proportional to \lambda_3, in order for the part of the forces without staple derivatives (the part sandwiched between U and U^\dagger) to be constrained to \lambda_3.

Looking at the different components going into that operator, all of the 'matrix parts' are actually proportional to \lambda_3 after projection, except for A^2 which is itself proportional to the identity. This means that all scalars multiplying them (f1, f2, Tr(U \Sigma B1) and Tr(U \Sigma B2)) have to be real, or the product will be proportional to some superposition of \lambda_3 and \lambda_8. Nothing special about \lambda_8, by the way, it's just that together they form a basis for the space of traceless anti-hermitian diagonal matrices of rank 3.

Unfortunately, f1 and Tr(U \Sigma B2) are both pure imaginary. For Z to be proportional to \lambda_3, they will need to cancel exactly for all values of \rho and any chosen rotation angle. And if Z isn't proportional to \lambda_3, it's as good as impossible to get the total force to be proportional to it. We'd need some miracle cancellations for all parameter values...

What makes me doubt the whole definition given here is that f1 should be pure imaginary. That value has to be correct, or the exponentiation would be off.

Is this at all helpful? Or should I write something out in LaTeX?

Hmm... looking at Istvan's code sigma^prime seems to contain more than just the standard force...

Some convention in his code? It seems a bit odd, otherwise.

Needless to say these changes don't fix our problem...

Yeah, of course that would be too easy. :(

European Twisted Mass Collaboration member

Hmm... looking at Istvan's code sigma^prime seems to contain more than just the standard force...

Some convention in his code? It seems a bit odd, otherwise.

Well... the multiplication by V^dagger certainly embeds it deeply into Sigma through the usage of sigma^prime, while the multiplication by U[0] could potentially be conventional... Doing only the former doesn't help either, though.

As a follow-up on the potential of an issue in Z... I've commented out the contributions to Z proportional to f1 and Tr(U \Sigma B2) -- i.e., lines 14-16 and 25-27. This is enough to remove the \lambda_8 component from the analytical force calculation altogether, so the problem really does originate there somehow.

Quite puzzling. This is all directly from the paper. Is it just plain wrong? Or are we misreading?

But the force is not correct now, by chance? Well, can't be...

It's speculation as long as we don't have a working expression, but I can confirm that rewriting the BMW Z using P&M's definition of Lambda -- i.e., don't multiply the given force by V^\dagger within the expression for Z -- gets rid of the unwanted mixing. That's not quite trivial, because it means the imaginary versus real mess above is fixed everywhere. It's also somewhat plausible that this was a mistake by BMW. They clearly copied the expression, explicitly referring to the P&M paper for most of this part of the calculation and not even defining the numerical coefficients. As such, they may have managed to confuse themselves, too, with the different Sigma's.

Anyway, this could be one issue solved. But then there must be more lurking.

Hmm, that's of course annoying, because it doesn't really improve the confidence in the BMW paper...

@kostrzewa I've looked over the P&M modifications in your branch and I didn't see any issues. Perhaps there is a phase factor somewhere, because they're easy to miss. But it's correct for all I know.

Hmm. Is there an obvious reason why the numerical and analytical force calculations should deviate when one sets rho to 0.0 with your P&M version?

European Twisted Mass Collaboration member

I guess that's normal unless you set rho in update_momenta and in the input file at the same time. I set up a local stout control in update_momenta for convenience so that I wouldn't have to modify the derivativefunction interface (because I would have to pass a smearing control through to the numerical derivative function so it can smear its rotated field). It's a bit of an awkward setup, but it was the quickest way to implement a full numerical derivative. It probably would have been better to extend the function you wrote.

European Twisted Mass Collaboration member

From looking at Enno's code I had an idea regarding the nature of \Sigma. Tomorrow I will take a very careful analytical look at the meaning of the P&M \Sigma = (dS/dU)^T. I have a feeling we are feeding the algorithm with the wrong starting \Sigma.

In the BMW implementation I also think I've figured out why they multiply by V^dagger. If my idea about the P&M \Sigma is correct, then the two are actually equivalent. I hope I can finish it tomorrow and send you my notes.

I guess that's normal unless you set rho in update_momenta and in the input file at the same time.

Aha, that would explain things. Good!

I have a feeling we are feeding the algorithm with the wrong starting \Sigma.

That's the sort of explanation it would almost have to be, so I'm really curious what you think might be the problem.

I can confirm that rewriting the BMW Z using P&M's definition of Lambda [...] gets rid of the unwanted mixing.

I'm not sure of this anymore, as for some reason I'm now seeing the mixing either way. Maybe I mucked something up in the mean time, but don't take this as a given for the moment... Sorry!

European Twisted Mass Collaboration member

I guess that's normal unless you set rho in update_momenta and in the input file at the same time.

Aha, that would explain things. Good!

I meant rho in gauge_monomial, sorry for the confusion...

before sigma is projected back to the adjoint representation for use in the HMC, it's multiplied by the unsmeared gauge field

This seems to echo the structure of BMW's equation 3.12, doesn't it?

the quark force is POST-multiplied by V^dagger

With the transpose as a difference between BMW and P&M, doesn't this mean Sigma' is identical to BMW's Sigma?

European Twisted Mass Collaboration member

With the transpose as a difference between BMW and P&M, doesn't this mean Sigma' is identical to BMW's Sigma?

Yes, I think I'm convincing myself now that this should be the case. In the P&M formulation for the particular situation of a smeared pure-gauge action (say Wilson plaquette with fat links, for simplicity), it seems to me like Sigma should be the transpose of the fat sum of staples multiplied by -beta/3. This also clarifies the notation dS/dU if spelled out in index notation and when the derivative is understood, as in BMW, as a derivative with respect to components of a particular U.

What we usually consider as the "hmc force", by contrast (see eqs. 13, 14 in Gottlieb (87)), contains an extra factor of U at each site (for the gauge contribution as well as for the pseudo-fermion contribution). The pre-multiplication by V^dagger in BMW should remove this factor, but I'm not quite clear about the compatibility of this and the projection from the adjoint representation of the algebra.

In the form we use it in, this doesn't seem to work: I've been trying to reconstruct the -beta/3*staples from the analytic gauge force by pre-multiplication with V^dagger and I don't get them back, not even a traceless-hermitian remnant thereof...

Also, trying to use my best guess (-3/beta*staples) for the P&M sigma currently does not produce the right force, but maybe I'm not multiplying correctly before using it as the momentum derivative.

European Twisted Mass Collaboration member

Just to be definite, and in contrast to the action used in Gottlieb, which contains the hermitian conjugate of the plaquette but doesn't have its real part taken, let's define our gauge action as (in terms of smeared fields):

• S_g = \sum_x ReTr(1 - beta/3 plaq(x) )

Now, looking a bit further, in the notation of Gottlieb (where H is traceless-hermitian!) we store in our momentum field:

• iH in the adjoint representation (as we exponentiate it without multiplication by i so what we store is traceless anti-hermitian)

and therefore we store in the derivative field (as we update the momenta with - deriv * step, due to the extra minus from trace_lambda, as noted in the code)

• -i*dH/dt in the adjoint representation

transforming this to TA form leaves us with

• -i*dH/dt in the defining representation in what we call the smeared_force.

I'm not sure here about either the sign or the prefactor because of the lambda matrices that we use.

Continuing anyway, using the identity that can be derived from eq 12 in Gottlieb and eq 11 (taking into account the simpler gauge action)

• dH/dt = 1/3 tr iF - iF [1],

the cyclicity of the trace, the hermiticity of H and equation 13 and using V for the smeared gauge field and S for the sum of staples,

leaves us with

• F_mu(x) = beta/6.0 V_mu(x) \sum_staples S_mu(x)

which I believe is what we want to multiply by V^+ and then take the transpose of. I think there is a factor of two between sigma and F though so that we want

• \sigma ~ ( V^+ 2 F_mu(x) )^T

which matches what I get for (dS/dV)^T in index notation...

However, our derivative field contains -i*dH/dt, and using [1],

• -idH/dt = 2P_ta(F)

Therefore, when we multiply this by V^dagger to form sigma (for a pure-gauge smeared action only now) our starting sigma would contain, using S for sums of staples and V for the smeared gauge field:

• \beta/3.0 \sum_staples ( ( S - V^+ S^+ V^+) - 1/n_c V^+ tr ( V S - S^+ V^+ ) )

which is not what we actually want, I think...

It appears to me that Sigma is a lot like a sum of staples for pure-gauge theory and hence the final multiplication by U and traceless antihermitian projection to obtain dH/dt. Now I just hope this makes sense and I can coax this damn thing into working.

European Twisted Mass Collaboration member

Note that the pseudo-fermion part of the force has the same prefactor and the same argument applies there.

European Twisted Mass Collaboration member

Tomorrow I will try to dismantle dS/dU for a pure smeared gauge action to see if I can recover the recursion relation for sigma...

European Twisted Mass Collaboration member

I also think that the PF part of the force has an additional symmetry under TA projection which might explain why what Enno does:

V^+ 2 P_ta( F_f )

gives the correct sigma. I think it should be possible to see this explicitly using your implementation with a multiplication by a factor of i and a check of your staple derivative (which I haven't cross-checked yet).

Thanks for the detailed write-up, Bartek! I will have to take a careful look at this tomorrow before I can really form an opinion. But just to make sure that we're talking about the same thing: would you agree that the force that we obtain as input to the smearing routine in our code is just (d S / d V)? The latter is exactly what the numerical calculation produces, after all, and we see perfect agreement there if we don't smear. There may still be a definition issue with respect to the literature, of course.

European Twisted Mass Collaboration member

Well, setting rho=0.0 we just copy the forces because expA = 1 and the staple derivative is 0.

We have to be a bit careful about notation because, as I understand it from following the derivation of Gottlieb yesterday and a year ago, there is a difference between what BMW uses variational notation for and the partial derivative of the action with respect to a gauge field.

In deriving the derivative of the conjugate momenta in the Gottlieb paper, you get terms like

dU/dt U U U (U at different points and different directions)

as an essential step in the derivation, giving you the staples from U U U. dU/dt is then replaced using the equation of motion, eventually leading to eq. 12.

Now when you write down the partial derivative of the action with respect to gauge field components in index notation (the transpose of which is the input sigma for P&M), for a pure gauge action you should just get the staples. In particular this object should not be in the Lie algebra. This also coincides with eq 17 of BMW (sorry, only have the arxiv version here) where \var S / \var V is -P_ta( V (\partial S / \partial V)^T )

Now looking at this definition, it seems like what we store in our derivative field really is \var S / \var V, but as I argued above this means that the resulting \sigma is inequivalent to the P&M sigma, as that should be just (\partial S / \partial V)^T, which is just staples for pure gauge theory, while what we get is the weird love-child of a P_ta projection and a multiplication by V^+. This could well be different for the PF part of the action because of those projectors (P in Gottlieb notation) and the daggering in there. If everything else is implemented correctly, I think this means that BMW should give us the correct output for our input though...

European Twisted Mass Collaboration member

In particular I think this difference between the two sigmas can be seen in eq 47 of P&M, which is essentially equivalent to a sub-step in the computation of the momentum derivative in Gottlieb. Here the sigma multiplies the derivative of the gauge field at a given smearing level. Without smearing for a pure-gauge action this would just be the staple (times beta/3) and for a smeared pure-gauge action at level n_\rho, this is also just the staple (times beta/3).

European Twisted Mass Collaboration member

On a side note, isn't the notation (eq 2.5 f. ex.) for the staple wrong in the BMW paper? The staple "from below" should include a U_nu(x-nu+mu)

which is wrongly replaced by (using V_-nu = V_nu^dagger)

V_-nu (x+mu)^dagger = V_nu(x+mu)

in the minus part of the sum...

European Twisted Mass Collaboration member

Okay, I'm still confused though, because according to the derivation of the staple derivative the last term in the P&M sigma (not the primed one) is equivalent to the unprojected U\sigma(0) in BMW, and that doesn't make sense to me...

European Twisted Mass Collaboration member

Of course, my whole argument rests on the hope that when P&M write a partial derivative, that's what they actually mean...

European Twisted Mass Collaboration member

One last comment and then I need to take a break from smearing to prepare my talk for the 22nd..

Comparing equation 47 in P&M and equation 8 in Gottlieb and taking into account the cyclicity of the trace, one can essentially read off what the P&M \sigma should be, no? And this, for a pure-gauge smeared action, should just correspond to beta/3 * (sum of staples) which corroborates my interpretation of their partial derivative. Of course, this still leaves me wondering why it doesn't work when I feed it with the staples ... but maybe I'm getting a factor or sign wrong or something...

European Twisted Mass Collaboration member

What speaks against my interpretation is the fact that if I try using \beta / 3 staples as an input for pure-gauge with very small rho, I get an answer which is completely wrong... (for my P&M implementation) (order of magnitude wrong...)

When I use 2*\beta/3 staples as an input and multiply the hermitian conjugate (like in the gauge force computation) of the output by the unsmeared field before turning it into a force I get a result which (at least for rho=0.1) always has the correct sign and similar magnitude... but I get a similar quality result by simply using our derivative as input and not multiplying by the unsmeared field at the end. I think in both cases the fact that the sign is correct is simply coincidental because of the limited number of quantities that go into the computation of this force (and so the ta components just have to have the same signs...).

European Twisted Mass Collaboration member

I don't know where to go from here... one thing I noticed is that there exists a mix-up of data types because of the tuple definition of su3 in gauge.h... A consequence of this is that

generic_staples(&v,i,mu,hf->gaugefield) != generic_staples(&v,i,mu,_AS_GAUGE_FIELD_T(hf->gaugefield))

it doesn't crash but it's numerically incompatible... Maybe our problem stems from there? Barring a mistake in the staple derivative I don't see what could be wrong in your implementation of BMW...

Wait, that's different? That's certainly not supposed to be happening. And it should have an effect then, so that's as good a lead as any. I've just been trial-and-erroring my way around modifying Sigma, but it doesn't seem to be going anywhere either for now. I'll have a look, at least see if I can understand / fix the difference.

European Twisted Mass Collaboration member

Well, the compiler warns that it's an incompatible pointer type. Even without alignment it still seems to differ... (gcc -O0, no sse, no alignment)

Ah, wait, this is expected behaviour. generic_staples expects a gauge_field_t as its argument. In that case, passing hf->gaugefield will not work, because the (x,mu) structure is imposed through the secondary pointer array. Even if the data is identical at the binary level, we do need to recast the pointer type to inform the compiler about the fact that the su3's come in blocks of 4. So yeah, calling it as generic_staples(&v,i,mu,hf->gaugefield) is simply wrong. Does that happen anywhere in the code?

European Twisted Mass Collaboration member

So yeah, calling it as generic_staples(&v,i,mu,hf->gaugefield) is simply wrong

I noticed this right when I started looking at the code because I wanted to compare generic_staples and get_staples and thought as much so I didn't mention it before.

Does that happen anywhere in the code?

I don't think so but I haven't checked explicitly.

European Twisted Mass Collaboration member

Hmm.. I'm a bit perplexed now by something. You changed measure_gauge_action to work with gauge_field_t. In my numerical derivative I use it to compute the energy difference for the evaluation of the derivative. However, when I call it directly with control->result as a parameter, the result is wrong. Instead, I need to call it with &control->result[0][0]. That's a bit strange, no?

Actually, forget what I said... I had my rho's mixed up, so it seemed like I was getting the wrong derivative...

European Twisted Mass Collaboration member

I'm trying to derive the derivative of the momentum field directly from a smeared pure gauge action but it is quite hellish. Did you attempt this by any chance? (following Gottlieb's argument but with smeared fields, then expanding the time derivatives of the smeared fields to obtain expressions in terms of unsmeared fields, exp(A) and the time derivative of unsmeared fields only, so that I can plug in the equation of motion for the unsmeared fields directly)

My hope is that I can establish a definite mapping between our momentum derivative and \sigma because all the various combinations and permutations I've tried so far failed miserably...

Looking at the paper, all this effort has already been done... evaluating eq 47 (for k = n_\rho) should be trivial because one would simply obtain, using V = U^(n_\rho) and S_mu(V) for the sum of staples with the factor of beta/3.0 included:

dS/dt = 2 \sum ReTr( S_mu(V) dV/dt )

so \Sigma^{n_\rho} is just S_mu(V)... what do you think? And why does it not work...?!

Did you attempt this by any chance?

No, this seemed like it would be complicated and confusing. Sadly, I think the really appropriate approach is exactly the one used by BMW -- deriving a rigorous chain rule and separating the problem. That clearly doesn't get us anywhere.

Perhaps one workable approach is to try and derive the smeared derivative for a special case; something like our delta spike field? That should drastically reduce the complexity of the expressions, while still telling you something about \Sigma.

And why does it not work...?!

Doesn't this assume a plain chain rule, though? Then there may be a factor of V needed to constrain to SU(3), no? Also, I'm not sure if your argument appropriately takes the localization of the staples into account. Maybe it does, but I find it a little hard to see like this.

Perhaps there's still an alternative approach, though. Since the very simplistic case with a rotation of the gauge field in the direction of \lambda_3 produces wrong results, we can try to fix that one. It's simpler because of the Abelian nature of the matrices involved. Finding a mapping pretty much amounts to tweaking a single angle then, doesn't it? That should be doable, and if we find it, we can try to figure out what combination of V and U produces it.

I've added a rotation with V^\dagger at the start and a rotation with U at the end of the force smearing routines in Bartek's P&M implementation. For the diag(I, -I, 1) test case, now placed at location (1,1), and a stouting constant of 0.1, I get the following:

[DEBUG] Comparison of force calculation at [1][1]!
numerical force <-> analytical force <-> analytical force + smeared
[0]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[1]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[2]  +12.967255338481 <-> +12.967255340835 <-> +12.967255340835
[3]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[4]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[5]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[6]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[7]  +0.000000000000 <-> -0.000000000000 <-> -0.000000000000


Not sure why analytical force and analytical force + smeared are identical -- this code ended up rather hackish and I may have gotten things mixed up. But I get agreement for every value of rho I throw in, so this is something. I'll try to compare this to the BMW implementation on the same field.

European Twisted Mass Collaboration member

Hmm, if all you changed was to multiply the input analytical force by V^dagger and the output by U, then that would mean that exp(A) = 1 and that Lambda and the staple derivative are 0, or that the last two terms cancel each other by accident. I found it insightful to print_su3 the three terms of the resulting smeared_force separately.

No mix-ups. It seems this particular configuration has the interesting property that, even when the smearing and the intermediate results aren't trivial, it doesn't produce a correction to the gauge force other than through the smeared field itself.

Before:
[ - 0.000000000000 + 12.967255340835 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 12.967255340835 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 + 0.000000000000 * i]
Sigma:
[ + 10.702337660419 + 7.321863128794 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 10.702337660419 - 7.321863128794 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, + 0.000000000000 + 0.000000000000 * i]
Z:
[ - 12.967255340835 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 12.967255340835 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, + 0.000000000000 - 0.000000000000 * i]
i * C^+ * Lambda:
[ - 0.000000000000 - 7.780353204501 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 + 7.780353204501 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 + 0.000000000000 * i]
Sigma * exp(A):
[ + 12.967255340835 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 12.967255340835 + 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 + 0.000000000000 * i]
smeared_force:
[ + 12.967255340835 - 7.780353204501 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 12.967255340835 + 7.780353204501 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 + 0.000000000000 * i]
smeared_force after staple derivative:
[ + 12.967255340835 - 8.031377309953 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 12.967255340835 + 8.031377309953 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 + 0.000000000000 * i]
After:
[ + 8.031377309953 + 12.967255340835 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, + 8.031377309953 - 12.967255340835 * i, - 0.000000000000 - 0.000000000000 * i]
[ - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 - 0.000000000000 * i, - 0.000000000000 + 0.000000000000 * i]
[DEBUG] Comparison of force calculation at [1][1]!
numerical force <-> analytical force <-> analytical force + smeared
[0]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[1]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[2]  +12.967255338481 <-> +12.967255340835 <-> +12.967255340835
[3]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[4]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[5]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[6]  +0.000000000000 <-> +0.000000000000 <-> -0.000000000000
[7]  +0.000000000000 <-> -0.000000000000 <-> -0.000000000000


I get the same result for the BMW form of the implementation, but there I get that additional contribution in the other Cartan. If we find out why, that's at least one bug in the code. Hopefully its nature will tell us what is going on.

Ah, that at least explains the difference between BMW and P&M. The derivative of exp(A) as given in equation 3.27 of the BMW paper is identical to the derivative of exp(iQ) given in equation 68 of P&M. But BMW never explicitly give the coefficients. The ones we calculate are taken directly from P&M, but we can only use those if we write exp(A) as exp(i (-i A)) and use -i A in the polynomial expansion. Alternatively, we can absorb that factor of -i into the coefficients or the signs of the expansion.

It's a bit silly I missed this point the first time around. Also a tad sloppy by BMW, though.
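To spell the substitution out (my notation; a sketch of just the replacement step described above, assuming the usual Cayley-Hamilton truncation to a second-order polynomial):

```latex
% P&M expand \exp(iQ) = f_0 \mathbb{1} + f_1 Q + f_2 Q^2 with Q Hermitian.
% BMW work with the anti-Hermitian A = iQ, i.e. Q = -iA, so
\exp(A) = f_0\,\mathbb{1} + f_1\,(-iA) + f_2\,(-iA)^2
        = f_0\,\mathbb{1} + (-i f_1)\,A + (-f_2)\,A^2 ,
% i.e. the P&M coefficients can be reused after absorbing a factor of
% -i into f_1 and a factor of -1 into f_2.
```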

Aha... This should make quite a difference, shouldn't it? But it doesn't fix everything, correct?

Yes, it matters quite a bit. I suspected something was wrong with the phase factors in those coefficients. It cleans up part of the differences between P&M and BMW, but not yet for the general case. I'm now going to try and do the latter.

Perhaps a little more progress. There probably needs to be an additional overall factor of i included in the definition of Z for the BMW implementation, coming from a chain derivative of A w.r.t. Q. This leaves the smearing at (0,0) for my test case unaffected, but pushes the other values close to the numerical results. In that case, there might be one remaining issue coming from the staple derivatives.

Looks as if you are going to solve it over the weekend...! ;)

A progress report, with actually quite a lot of progress to report on!

Within the testing setup I'd been playing with for some time and using the new random gauge rotation routine, I managed to find a case where I could clearly pinpoint the source of the deviations we'd been seeing.

It turns out that there were at least two problems interfering with each other. First, there was indeed still a phase factor missing in the coefficients; that is to say, I know there was at least a factor of i missing in f1. Secondly, there was an issue with the aggregation of the staple derivative term: C_der in add_stout_terms_to_forces was overwritten at one point in the loop. Compensating for both of those got me very near to agreement on this toy configuration.

Even for a full configuration, things are looking much better. There is no perfect agreement yet, but that's to be expected for now. If there was a phase factor missing in f1, the other three parameters are certainly not correct yet. I'll need to tweak this a little still, but wanted to at least get the word out that we're getting close to fixing this.

To demonstrate, here's the comparison with the numerics for a hot start lattice with a realistic smearing coefficient rho of 0.1, so something of a worst case. There are still significant deviations, of course, but before we were seeing terms with opposite signs.

[DEBUG] Comparison of force calculation at (1, 1)!
Involves monomial types: 2
Analytical  vs.  numerical
[0]    +0.1423737 <->   +0.1929715
[1]    +1.5221415 <->   +1.6332804
[2]    +2.0399480 <->   +2.0498113
[3]    -4.4482958 <->   -4.5033163
[4]    +4.9242033 <->   +4.9104359
[5]    -5.0947897 <->   -5.2012666
[6]    -0.1746354 <->   -0.1312299
[7]    -1.0091498 <->   -0.9880012


Some further tweaking (just trial and error actually, because I don't quite understand the systematics behind this one) gives

[DEBUG] Comparison of force calculation at (1, 1)!
Involves monomial types: 2
Analytical  vs.  numerical
[0]    -0.1102262 <->   -0.1103645
[1]    +1.3436790 <->   +1.3437698
[2]    -2.8643168 <->   -2.8639173
[3]    -0.5916673 <->   -0.5918614
[4]    -0.8478323 <->   -0.8476075
[5]    +2.5774264 <->   +2.5774270
[6]    +0.1622520 <->   +0.1623363
[7]    -1.9317261 <->   -1.9318988


So we're down to sub-percent level deviations!

Ah, and I'm apparently getting acceptance! Modifying sample-hmc0.input to include smearing:

L=4
T=4
Measurements = 1000
StartCondition = hot
2KappaMu = 0.177
kappa = 0.177
NSave = 500000
ThetaT = 1
UseEvenOdd = yes
ReversibilityCheck = yes
ReversibilityCheckIntervall = 100
InitialStoreCounter = 0
DebugLevel = 1

BeginSmearing STOUT
Id = 1
Rho = 0.18
Iterations = 1
EndSmearing

BeginMonomial GAUGE
Type = Wilson
beta = 6.00
Timescale = 0
EndMonomial

BeginMonomial DET
Timescale = 1
2KappaMu = 0.177
kappa = 0.177
AcceptancePrecision =  1e-20
ForcePrecision = 1e-12
Name = det
Solver = CG
Smearing = 1
EndMonomial

BeginIntegrator
Type0 = 2MN
Type1 = 2MN
IntegrationSteps0 = 2
IntegrationSteps1 = 6
Tau = 1
Lambda0 = 0.19
Lambda1 = 0.20
NumberOfTimescales = 2
EndIntegrator


I end up getting

# Acceptance rate was 94.90 percent, 949 out of 1000 trajectories accepted.


European Twisted Mass Collaboration member

Excellent!

European Twisted Mass Collaboration member

Secondly, there was an issue with the aggregation of the staple derivative term, where C_der in the add_stout_terms_to_forces

How come?

How come?

I see you already found the culprit. It probably did sneak in at some later point. I think that's one of the risks of prodding at a piece of code for a long time. Either way, it was clearly messing with the debugging effort later on.

Even if things seem to work now, I still want to understand both the exact origin of the phases and the reason for the small deviations from the numerical results we keep seeing. For the latter, I suppose it may even be a precision issue in the numerical approximation. After all, with a non-trivial smearing operation in place, there are significantly more computational steps between the introduction of the small rotations and the calculation of their induced effect. But it seems a little complacent to just assume that to be the case.

European Twisted Mass Collaboration member

Differences of order 1e-4 or 1e-5 are to be expected judging from the accuracy of the numerical approximation when not smearing if I remember correctly.

I have already ported your symmetric derivative back into my branch, so the precision is actually a bit better than that. For a run using sample-hmc0.input, the force calculation without smearing gives

[DEBUG] Comparison of force calculation at (1, 1)!
Involves monomial types: 0
Analytical  vs.  numerical
[0]    +0.5305495 <->   +0.5305495
[1]    -0.3578221 <->   -0.3578222
[2]    -0.1945487 <->   -0.1945489
[3]    -0.7197235 <->   -0.7197235
[4]    -0.2632587 <->   -0.2632587
[5]    +0.2094987 <->   +0.2094987
[6]    -1.2052496 <->   -1.2052498
[7]    +1.0051157 <->   +1.0051157


whereas the same run with 3 iterations of stout smearing on the DET monomial gives

[DEBUG] Comparison of force calculation at (1, 1)!
Involves monomial types: 0
Analytical  vs.  numerical
[0]    +0.3234338 <->   +0.3232503
[1]    +0.2263707 <->   +0.2263535
[2]    +0.2771293 <->   +0.2773533
[3]    +0.0161679 <->   +0.0161866
[4]    +0.2589449 <->   +0.2592188
[5]    -1.5009921 <->   -1.5013020
[6]    +0.9801545 <->   +0.9798860
[7]    +0.3049090 <->   +0.3048943


So there is a definite increase in the deviations. It doesn't seem to be a crude numerics issue either, because I can increase the epsilon in the finite difference determination from 5e-6 to 1e-4 and find it only changes the final digits of the force estimate.

European Twisted Mass Collaboration member

I see, there could then indeed be a slight problem. On the other hand, the smearing can potentially produce large deviations even for the O(1e-6) rotation that we use (judging by the fact that changing rho from 0.1 to 0.12 can change the sign on the forces...)

On the other hand, the smearing can potentially produce large deviations even for the O(1e-6) rotation that we use (judging by the fact that changing rho from 0.1 to 0.12 can change the sign on the forces...)

I agree, but then I would rather expect to see this back in the uncertainty on the numerical estimate, too. Still, it's a rather delicate balance in these calculations. Given the near agreement in the case of random fields, which virtually excludes artificially suppressed contributions, it's hard to come up with possible sources of error that small. There are two I can think of at the moment.

First, there could be a subtle interplay of cancellations with the phase factors on the coefficients. So I've done a systematic check of the signs of the coefficients used in the construction of Z now. I can constrain the phase modulo that sign from the symmetry properties of the forces using just the Abelian degrees of freedom. For f1, I can actually fix the phase completely by looking at smearing with a tiny coefficient (rho = 0.0001), because through the multiplications with A the contributions of all other coefficients will be suppressed by rho^2, rather than rho. So that leaves 3 signs to play with.

Changing the signs on B1 and f2 always messes things up badly, with deviations in the tens of percents. That leaves B2, which actually seems to be a naturally small quantity multiplying a term proportional to A^2, so its contribution should be small. Flipping its sign does appear to be a modest improvement overall.

Without flipping the sign on B2:

[DEBUG] Comparison of force calculation at (1, 1)!
Involves monomial types: 0
Analytical  vs.  numerical
[0]    -0.6771144 <->   -0.6770567  (+0.0085255%)
[1]    +0.1116420 <->   +0.1118456  (-0.1820194%)
[2]    -0.4595177 <->   -0.4588569  (+0.1440059%)
[3]    -0.3309691 <->   -0.3304069  (+0.1701586%)
[4]    -0.6524593 <->   -0.6522012  (+0.0395654%)
[5]    -2.1065385 <->   -2.1069305  (-0.0186051%)
[6]    +1.1621339 <->   +1.1620734  (+0.0052077%)
[7]    +1.2684204 <->   +1.2685082  (-0.0069229%)


With flipping the sign on B2:

[DEBUG] Comparison of force calculation at (1, 1)!
Involves monomial types: 0
Analytical  vs.  numerical
[0]    -0.6765216 <->   -0.6765230  (-0.0002046%)
[1]    +0.1125219 <->   +0.1125230  (-0.0009569%)
[2]    -0.4579526 <->   -0.4579581  (-0.0011936%)
[3]    -0.3305511 <->   -0.3305588  (-0.0023394%)
[4]    -0.6514290 <->   -0.6514091  (+0.0030478%)
[5]    -2.1070045 <->   -2.1069671  (+0.0017745%)
[6]    +1.1623589 <->   +1.1623534  (+0.0004716%)
[7]    +1.2671395 <->   +1.2671021  (+0.0029561%)


The deviations are now becoming properly small, even if they're still two orders of magnitude larger than those on the unsmeared forces. But that pretty much fixes things for the phase factors.

The second potential source of errors is the staple derivative itself. Perhaps one of the components is somewhat messed up and a comparison with the P&M formulation could be helpful here. Then again, none of those components is naturally small compared to the others and the total staple derivative certainly isn't a minor contribution either. So it's hard to see why getting something wrong here would have such a limited effect...

The derivation of the phase factors turns out to be trivial, as one would expect.

Comparing the expression in P&M to the one in BMW, there's the straightforward replacement of Q by -i A. We can then absorb that -i into the coefficients multiplying those matrices.

In addition, there's an overall phase factor. It's there because the P&M derivation obtains the expression from a chain derivative using dQ/dt, whereas in BMW's formulation it is a chain derivative involving dA/dt. It's rather hidden in the final formula, but all coefficients multiply this derivative factor, so all of them pick up an additional factor of -i from dQ = -i dA.

In the end, that implies we get the following phases between P&M and BMW:
f1: -i
f2: -1
B1: -1
B2: i

I will fold them into the Cayley-Hamilton calculation and the definition of Z.

European Twisted Mass Collaboration member

Oh I see, that all makes sense then. And as input force you simply use the 3x3 representation of the TA-projected force?

And as input force you simply use the 3x3 representation of the TA-projected force?

That depends on how you look at it. It's the current input, but I'm folding in the factor V^dagger at the beginning and U at the end of each smearing iteration. For a while I thought it could be taken outside of the loop, but that won't work because of the projection after each stage.

European Twisted Mass Collaboration member

Oh I see, yes that makes sense. I still don't fully understand why V^dagger P_{TA}( force ) gives the correct Sigma [1] but I'm glad it does! This then also matches what Istvan's code does, except for being defined in terms of Q rather than A.

[1] because it would appear to me that doing such a multiplication gives Sigma a trace, first of all, and also possibly projects it out of anti-hermiticity

So, how do we proceed here now? I guess we need to compare with another code, right?

I also have a test set-up, where I can test reproducibility with different parallelisations after one trajectory.

what still needs to be done:

• test against Istvan's code
• test against Chroma!?
• optimisation
• test MPI versions
• openMP with smearing
• rename "Smearing" to "SmearingID"
• add Smearing also to BeginOperator environment
• HEX smearing
• more smearings...
European Twisted Mass Collaboration member

@deuzeman can I consider the MPI implementation to be testable at this point? If so, I will run it through its paces with my high-statistics framework. Not sure when I can get this started, though...

I guess not... I cannot run the MPI version of the smearing branch.

#5  0x00007fba15ea2a29 in PMPI_Isend () from /usr/lib/libmpi.so.0
#6  0x000000000051c006 in generic_exchange (field_in=<value optimized out>, bytes_per_site=576) at ../../tmLQCD/buffers/utils_generic_exchange.2.inc:1
#7  0x00000000004135f1 in exchange_gauge_field (argc=1, argv=0x7fff8a3dfa18) at /storage/urbach/1dim/../tmLQCD/buffers/utils.h:57
#8  main (argc=1, argv=0x7fff8a3dfa18) at ../tmLQCD/hmc_tm.c:384


Having a look at this, it may not be all that serious. Will get back to you!

Luckily, it was indeed a rather trivial thing. The much maligned icc actually spit out the following during compilation

hmc_tm.c(384): warning #167: argument of type "gauge_field_t" is incompatible with parameter of type "gauge_field_t *"
exchange_gauge_field(g_gf);


Adding the missing ampersand makes the code run. A very cursory check (running sample-hmc1.input on four MPI processes for 500 configurations) shows that the code isn't completely borked, at least. I'm getting about 95% acceptance and a very reasonable plaquette average.

Since this is a rather obvious bug fix that's clearly not making things worse, I've already pushed the modification to etmc/smearing.

Ah, that was a little too enthusiastic. There are still some issues once smearing is activated, so I'll have to dive into that, too.

Hmm, an interesting problem. I'm getting a segmentation fault when I try to exchange smeared_field in the stout smearing routine. That would suggest that I didn't allocate enough memory for the halo, but its allocation routine is actually identical to the one used for creating g_gf. I'll need to have a look at this using a debugger, I think.

I have written a number of check routines for the exchange routines. Maybe they are helpful?

Indeed, I also still get a segmentation fault...

#29 0x00000000005162bc in stout_smear_forces (control=0x7fff4af93ed0, in=<value optimized out>) at    ../../tmLQCD/smearing/stout_stout_smear_forces.c:24
#30 0x00000000004d18cd in smear_forces (control=0x23756b0, in=0x2b0f4c0) at ../../tmLQCD/smearing/control_smear_forces.c:18
#31 0x000000000043e741 in update_momenta (mnllist=<value optimized out>, step=<value optimized out>, no=<value optimized out>, hf=<value optimized out>)
at ../tmLQCD/update_momenta.c:100
#32 0x000000000042a261 in integrate_2mn (tau=<value optimized out>, S=<value optimized out>, halfstep=<value optimized out>) at ../tmLQCD/integrator.c:125
#33 0x00000000004257e9 in update_tm (plaquette_energy=<value optimized out>, rectangle_energy=<value optimized out>, filename=<value optimized out>,
return_check=<value optimized out>, acctest=<value optimized out>, traj_counter=<value optimized out>) at ../tmLQCD/update_tm.c:159
#34 0x00000000004139c4 in main (argc=1, argv=0x7fff1d336578) at ../tmLQCD/hmc_tm.c:458


Moving the derivative exchange up, before the smearing of the forces, does actually seem to fix the smearing with MPI. Kudos to @urbach for coming up with that on the spot!

added a commit to deuzeman/tmLQCD that referenced this issue Mar 12, 2013
 deuzeman Move derivative exchange up to precede smearing. Fixes the parallelization issue reported in issue #198. 3d66cb5
European Twisted Mass Collaboration member

Funny, didn't work for me...

Really? sigh

It made the difference for my test case, at least. But perhaps there's another issue then... I'll do a broader range of tests and see what is going on.

European Twisted Mass Collaboration member

I might well be wrong though, maybe I made a gaffe during compilation or something. I got acceptance for one iteration (hot start so that's not surprising) and then everything broke down. This was with rho=0.2, iterations=1, smearing on a det monomial. Good luck :)

well, at least I do get now agreement among serial, 1-dim, 2-dim, 3-dim and 4-dim parallelisation:

serial
00000001 0.310326953527 -0.249603552061 1.283516e+00 56 894 1 9.796238e-01
1dim
00000001 0.310326953527 -0.249603552056 1.283516e+00 56 894 1 7.191510e-01
2dim
00000001 0.310326953527 -0.249603552059 1.283516e+00 56 894 1 8.066070e-01
3dim
00000001 0.310326953527 -0.249603552060 1.283516e+00 56 894 1 1.900004e+00
4dim
00000001 0.310326953527 -0.249603552060 1.283516e+00 56 894 1 3.457767e+00


So it should give acceptance in parallel if the serial run does. This was with one smeared DET monomial. But it should be independent of the monomial types, shouldn't it?

European Twisted Mass Collaboration member

That's good news then. Maybe I had something lingering around in my code which prevented me from getting acceptance after the first trajectory...

I hope it is independent of the monomial type -- if it isn't, we may have a serious problem.

My test was with a DET + DETRAT monomial and three levels of rho=0.18 smearing, but I also get acceptance on the NDRAT sample input with four levels of rho=0.12 smearing. This is with rather low levels of parallelization, though.

European Twisted Mass Collaboration member

Excellent, sorry about the confusion then!

Not getting acceptance could also have been due to too large a dtau...

So, how is the status of comparison with chroma and Istvan?

European Twisted Mass Collaboration member

Maybe but I would imagine that smearing should increase acceptance, if anything...

I will try to start some high statistics runs for self-consistency checks soon and then I will attempt the OpenMP parallelization. As discussed with Albert in Lugano there should not be any surprises in that respect, but I haven't thought about it for every single function yet.