Optimize clover storage and memory traffic #202
Comments
Ok, on closer inspection I see that the clover term relation isn't quite what I wrote above. Since the regular Wilson diagonal term contribution is included in the clover matrix, the actual form is like this:

$$\begin{pmatrix} 1+A & B \\ B^{*} & 1-A \end{pmatrix}$$

so the relation between the upper and lower diagonal parts is slightly more involved, which likely means that the only way to get the memory saving for the inverse matrix is to invert the clover matrix on the fly.
Ok understood, so I should get going with fma soon, and then with the

Alex
Can’t you just recover your original form of the matrix by subtracting 1 from the diagonal?
Yes, you can recover it trivially for the regular clover matrix, but for the inverse, it means there is no easy relationship between the upper and lower block diagonals. So I believe inversion on the fly is required (happy to be proven wrong though).
In any case, there are other strong reasons to push the inversion
Well, the inverse was the thing I did not think about. Maybe one should try anyway to see what happens with the additional one on the diagonal. I remember there are also some formulas for a diagonal matrix + ‘something’, but I don’t remember which restrictions there are on ‘something’.
Addressed by #1213
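There is a symmetry in the clover term that is not currently exploited; using it could reduce both the memory footprint and the memory traffic of the kernels that use the clover term.
Specifically, if we consider the 2x2 block form of each chiral block of the clover matrix, we have

$$\begin{pmatrix} A & B \\ B^{*} & -A \end{pmatrix}$$

where A is a 3x3 Hermitian block and B is a general 3x3 complex block.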
We presently store only B* and don't double store it with B; however, we do double store A and -A. If we only stored A, the number of real numbers required to store the clover term would be reduced from 72 to 54, a 25% reduction in both memory footprint and memory traffic.
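To make the counting concrete, here is a minimal sketch in plain Python. It assumes the block form stated in this issue (diagonal blocks A and -A, with A Hermitian 3x3 and B general 3x3). The names `pack`, `unpack`, `CURRENT_REALS` and `REDUCED_REALS` and the ordering of the packed numbers are hypothetical, chosen for illustration only; they are not QUDA's actual layout.

```python
# Illustrative sketch only: names and packing order are hypothetical, not QUDA's.
N = 3  # A and B are N x N sub-blocks of one 6x6 chiral block


def reals_hermitian(n):
    """Reals for an n x n Hermitian matrix: n real diagonal entries
    plus n(n-1)/2 complex off-diagonal entries."""
    return n + 2 * (n * (n - 1) // 2)


def reals_general(n):
    """Reals for a general complex n x n matrix."""
    return 2 * n * n


# Two chiral blocks, each stored as a 6x6 Hermitian matrix (B* not duplicated).
CURRENT_REALS = 2 * reals_hermitian(2 * N)
# Two chiral blocks, each stored as A (Hermitian) plus B (general) only.
REDUCED_REALS = 2 * (reals_hermitian(N) + reals_general(N))


def pack(A, B):
    """Flatten Hermitian A and general B into 27 real numbers."""
    out = [A[i][i].real for i in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            out += [A[i][j].real, A[i][j].imag]
    for i in range(N):
        for j in range(N):
            out += [B[i][j].real, B[i][j].imag]
    return out


def unpack(packed):
    """Rebuild the full 6x6 block [[A, B], [B^dagger, -A]] from 27 reals."""
    it = iter(packed)
    A = [[0j] * N for _ in range(N)]
    for i in range(N):
        A[i][i] = complex(next(it), 0.0)
    for i in range(N):
        for j in range(i + 1, N):
            re = next(it)
            im = next(it)
            A[i][j] = complex(re, im)
            A[j][i] = A[i][j].conjugate()
    B = [[0j] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            re = next(it)
            im = next(it)
            B[i][j] = complex(re, im)
    M = [[0j] * (2 * N) for _ in range(2 * N)]
    for i in range(N):
        for j in range(N):
            M[i][j] = A[i][j]
            M[i][j + N] = B[i][j]
            M[i + N][j] = B[j][i].conjugate()
            M[i + N][j + N] = -A[i][j]  # lower diagonal block is never stored
    return M


print(CURRENT_REALS, REDUCED_REALS, 1 - REDUCED_REALS / CURRENT_REALS)  # 72 54 0.25
```

The lower diagonal block costs only a sign flip at load time, which is why the saving applies to memory traffic as well as footprint.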
I suspect this optimization is of most utility for the twisted clover formulation, since I believe it requires two clover matrices and is hence more bound by memory footprint.
This symmetry does not hold directly for the clover inverse, so it would only apply to the kernels with the direct clover term. However, clearly we can use only 54 numbers for the clover inverse, since we could just load the direct matrix (using 54 numbers) and invert on the fly. This may be too computationally expensive, but more insight may be gained from considering the 2x2 block form of the inverse.
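As a starting point for looking at the 2x2 block form of the inverse, the standard Schur-complement identity can be written down. The symbols P, Q and S are introduced here for illustration only: P and Q stand for the upper and lower diagonal blocks of a chiral block, whatever their exact form, and the identity assumes P and S are invertible.

$$
M = \begin{pmatrix} P & B \\ B^{*} & Q \end{pmatrix},
\qquad
S = Q - B^{*} P^{-1} B,
$$

$$
M^{-1} = \begin{pmatrix}
P^{-1} + P^{-1} B\, S^{-1} B^{*} P^{-1} & -P^{-1} B\, S^{-1} \\
-S^{-1} B^{*} P^{-1} & S^{-1}
\end{pmatrix}.
$$

The two diagonal blocks of the inverse depend on P, Q and B in different ways, so a simple relation between P and Q does not obviously carry over to the diagonal blocks of the inverse. The inverse is still Hermitian, so its off-diagonal blocks remain conjugates of each other.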
Anyway, as a first step, we should switch the direct clover matrix to this reduced storage form. After that is done, we can consider reduced storage options for the inverse.