We should make the xchange routines more general. One approach is Luigi's _INDEX_INDEP_GEOM, which I think could become the default at some point. Let's start a discussion on this.
I have some points and questions:
1. I find names like 'g_1st_y_int_dn' hard to digest.
2. shouldn't we have a loop over all directions that need communication, with arrays containing the corresponding information, instead of introducing more things like PARALLELXYZ?
3. how many versions of the communication routines do we actually need?
4. this xchange stuff might be connected to a polishing of the geometry indices, as Bartek and I were discussing?
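On point 2, a table-driven loop might look something like the sketch below. All names here (dir_info, count_exchanges, the index fields) are hypothetical, not existing tmLQCD identifiers; the idea is only that one generic loop over a per-direction table replaces a family of compile-time cases like PARALLELXYZ:

```c
/* Hypothetical sketch: drive the halo exchange from a table of
 * directions instead of per-direction compile-time macros.
 * The struct fields mirror the kind of information the g_1st_*
 * indices carry, but the names are illustrative only. */
#include <stddef.h>

typedef struct {
  int dim;          /* 0 = t, 1 = x, 2 = y, 3 = z                 */
  int active;       /* 1 if this direction is parallelised        */
  size_t first_up;  /* index of the first site in the up border   */
  size_t first_dn;  /* index of the first site in the down border */
  size_t count;     /* number of sites per boundary slice         */
} dir_info;

/* A generic exchange loop would issue one send/receive pair per
 * active direction and orientation; here we just count them.
 * In the real routine the loop body would post the communication
 * (e.g. MPI_Isend/MPI_Irecv) using first_up/first_dn and count. */
int count_exchanges(const dir_info *dirs, int n_dirs) {
  int n = 0;
  for (int d = 0; d < n_dirs; d++)
    if (dirs[d].active)
      n += 2; /* up and down neighbour */
  return n;
}
```

With such a table, adding or removing a parallelised direction means changing one entry rather than touching every communication routine.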
g_1st_xt_ext_up stands for the "index of the first point of the 2d slice at constant x and t, in the border on the up side".
I found it convenient because it then makes it easy to understand which pointer you need in the communications. But we can change this if there are other preferences.
This may be a good idea, but it is a major change. I suggest cleaning up the present approach first and then deciding; the cleanup would be necessary anyhow.
I will open another discussion on this specific topic.
There are at least two sub-issues here:
The function geometry() should definitely be polished. Making _INDEX_INDEP_GEOM the default and removing LapH will be a first big step. I understood that SF boundary conditions had been removed from the master branch (or not?), but there is still some SF code there. Is there anything else to do?
As for the function index(), there are two versions, and we can merge them into one in a straightforward way. It will remain a very long function, and I do not see a way to simplify it further. index() basically contains all the complexity of addressing the borders, which is largely unavoidable, I think. But at least we can keep all of that complexity in the index() function alone.
okay, please don't simply continue as it is now...!
There might be a lot of interference, as I said, with what Bartek and I are doing. For the BG/Q we have written a version of the Dirac operator which interleaves communication and computation. This helps a lot, and I don't see why it couldn't be used on Aurora as well. See my NewHalfspinor branch or Bartek's interleaving_hs branch, and https://github.com/etmc/tmLQCD/wiki/Blue-Gene-Performance.
Also, the halfspinor version has changed significantly, so simply removing the "old" communication routines will not work.
And I'm really in favour of rewriting it in loop form: less code, and more general!
Which are the relevant files and routines in NewHalfspinor and interleaving_hs where the communications are done? The xchange_** files seem unchanged.
I will think about how to organize a loop and propose something.
About LapH, I will start another discussion.
In case MPI is used there is indeed no difference. The main difference is a re-ordering of the spinor fields, which is done in init_dirac_halfspinor.c, and the interleaving is done in operator/halfspinor_body.c for NewHalfspinor. For interleaving_hs there are new index arrays that do the re-ordering, and then again it happens in operator/halfspinor_body.c.
On the BG/Q the communication is then done using the BG/Q SPI, but it also works with MPI, provided that overlapping communication and computation actually works on the architecture. Note that the current version of NewHalfspinor is not yet working in the HMC and inverter, as a re-ordering would be needed. Bartek's version is working, but is probably slightly less optimal.
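For reference, the interleaving described above boils down to "post communication, compute the interior, complete communication, compute the boundary". A minimal control-flow sketch, with the MPI/SPI calls replaced by stubs (all names here are illustrative; in tmLQCD the real work happens in operator/halfspinor_body.c):

```c
#include <string.h>

/* Phase log so the ordering is visible; in the real operator the
 * stubs below would be MPI_Isend/MPI_Irecv and MPI_Waitall (or
 * the SPI equivalents on BG/Q) plus the hopping-term kernels. */
static char order[64];

static void log_phase(const char *p) { strcat(order, p); strcat(order, ";"); }

static void start_halo_exchange(void)  { log_phase("start");    } /* post MPI_Isend/MPI_Irecv */
static void compute_interior(void)     { log_phase("interior"); } /* no halo data needed      */
static void finish_halo_exchange(void) { log_phase("finish");   } /* MPI_Waitall              */
static void compute_boundary(void)     { log_phase("boundary"); } /* uses the received halo   */

/* Apply the operator with communication overlapping the interior
 * computation; returns the recorded phase order. */
const char *interleaved_apply(void) {
  order[0] = '\0';
  start_halo_exchange();
  compute_interior();
  finish_halo_exchange();
  compute_boundary();
  return order;
}
```

The gain comes from step 2: while the network moves the halo, the interior sites (which need no remote data) are already being processed, so communication time is hidden rather than added.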