LapH crashes in latest master #192

Open
urbach opened this Issue Dec 11, 2012 · 12 comments

Comments

Projects
None yet
3 participants
Contributor

urbach commented Dec 11, 2012

With all the changes we introduced LapH doesn't work anymore.

The main reason for this is that it is switched on with a configure switch, and therefore, it will never be tested by anyone. So this needs fixing...! Why is this a configure switch at all?

@ggscorzato: could you please fix this!

Owner

kostrzewa commented Dec 12, 2012

This will also include adding a "repro" mode to the random jacobi field function, otherwise it breaks reproducibilty.

Owner

kostrzewa commented Dec 12, 2012

I think when real interleaving is added this will break even more.

Contributor

urbach commented Dec 12, 2012

yes, its a mess...

We need to clean the geometry stuff and/or remove one of the versions, its not maintainable...!!

Member

ggscorzato commented Dec 13, 2012

LapH was broken because timesplitpar was broken.

This happened because the functions tm_sub_Hopping_Matrix and tm_times_Hopping_Matrix assumed some options of hopping matrix that did not include the time-split one.

I have fixed that.

Member

ggscorzato commented Dec 13, 2012

about the geometry and the LapH, I made some proposals that were rejected. I guess we should organize a phono-conf only for that.

Owner

kostrzewa commented Dec 13, 2012

We will also need to move the random_jacobi_field function into start.c and give it the ability to produce random numbers reproducibly, like the other random_* functions. I have introduced some generalizations into the functions in start.c to reduce code duplication:

#193

In particular I have removed unif_su3_vector and generalized random_su3_vector instead. Those two functions are now also limited in scope to this file so that they cannot be called externally to prevent hiccups of the random number generator in "repro" mode (reproduce_randomnumber_flag = 1).

Contributor

urbach commented Dec 13, 2012

I don't think that we rejected anything, and as far as I understand we were still in a discussion about it. However, I think it was a good decision to do these changes not in parallel to all the stuff we did during the last weeks and are still doing.

But it became also very clear that as it is now it is not maintainable. Many of the new features will not work with timesplitpar and/or indexindepgeometry, and its not even clear to me what of this is still needed. I still don't understand why we need all the gI_L_0_0_0 things, because they are already in g_ipt[][][][]. So, we need a discussion, maybe on a Wiki page first, and then we could have a phone conference. This is at least my opinion.

Owner

kostrzewa commented Dec 13, 2012

I think the gI_* structures could be used to make something truly independent of the content of the Index function by relabelling the coordinate system in the order (first parallelized direction, second parallelized direction, third parallelized direction, fourth parallelized direction).

In this way, for instance for an XYZT parallelization, gI_L_0_0_0 would correspond to a coordinate (0,LX,0,0) in the 'normal' (t,x,y,z) format. For the standard TXYZ parallelization it would correspond to (T,0,0,0). In this way it should be possible to express all communicators in a fully uniform fashion without any sort of arithmetic in the communicator itself.

The neighborhood arrays could then consist of g_nb_[0,1,2,3]_up/dn which correspond to the first, second, third and fourth parallelized directions. The same goes for the "edge" and "rand" memory sizes, gathers etc... So in XYZT parallelization the "01" edge would be XY, in TXYZ it would be TX.

Contributor

urbach commented Dec 13, 2012

The same would work using g_ipt, or what am I missing?

Owner

kostrzewa commented Dec 13, 2012

I need to think about it for a while... I guess you're right though.

Member

ggscorzato commented Dec 13, 2012

I defined gI_* for reasons of efficiency. I was worried that loading the huge g_ipt[][][][] array within the communication routine it would waste the cache.
But I could be wrong. I will test it. Anyway I agree that it would make the code much more readable, and it is probably worth do it just for that.

Contributor

urbach commented Dec 13, 2012

I don't know, I'm just trying to understand. Performance could be an issue...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment