
Gpurational #237

Closed
wants to merge 60 commits into
from

5 participants
Contributor

florian-burger commented Feb 21, 2013

gpu code compilable again

Owner

kostrzewa commented Feb 21, 2013

Oops, my mistake. This seems to need some merging with the current master still.

Florian Burger added some commits Feb 8, 2013

Florian Burger Took over the latest GPU-related files from my local code,
as a merge does not seem a good option here.
The functionality added by Falk will be repaired later on
4ddef6c
Florian Burger fixed new allocation of solver fields in all outer solvers 573483a
Florian Burger added GPU support in det and detratio 56b8393
Florian Burger adapted function arguments to allow for hamiltonian field
added gpu support in gauge_monomial
changed dev_gauge_derivative to allow for a multiplicative constant in
momentum calculation
5d2db32
Florian Burger fixed all compile errors due to C99
got rid of unnecessary warnings
0df793a
Florian Burger both degenerate and ND solvers in EO running again.
Cleaned up the ND solver a bit
5440583
Florian Burger renamed ND Matrix to match name in cpu code
fixed a bug in observables.h that prevented compilation with TEMPORALGAUGE
removed lengthy GPU related stuff for TEMPORALGAUGE to function in invert_eo
a1d7898
Florian Burger complex kappa now possible in HoppingMatrix 3884064
Owner

kostrzewa commented Feb 21, 2013

Okay, we rebased onto the current master. @florian-burger had some comments relating to the modenumber computation on GPUs as he essentially removed the C++ in the merge. Maybe Falk and Elena can have a look?

Contributor

urbach commented Feb 21, 2013

I even failed to motivate the two to clean up invert...

Florian Burger added some commits Mar 11, 2013

Florian Burger worked on:
dev_Qtm_pm -> fused kernels to get more performance out of Kepler, old version still available
dev_Qtm_pm_nd -> fused kernels, old version still available
double versions of both the degenerate and nd eo matrices seem to be working
still an issue with a discrepancy between cpu and gpu (double) residues after the solve (->
reconstruction issue?)
added first version of mm-solver (in mixed precision)
7f1e65c
Florian Burger removed the finalize temporal gauge routine in invert_doublet, which caused
segfaults due to a double free
924b5e1
Florian Burger made MPI version compile again
degenerate part is working but ONLY WITHOUT TEMPORALGAUGE -> Why??
f7b84bb
Florian Burger some small changes
added if use_gpuflag in invert.c
9989d3e
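The double free fixed in 924b5e1 is a typical consequence of a finalize routine being reached twice. The commit fixed it by removing the extra call; a defensive alternative is to make init and finalize idempotent. A minimal sketch, assuming a single module-level buffer; `tg_buffer`, `tg_init` and `tg_finalize` are hypothetical names and do not correspond to the actual temporal-gauge code:

```c
#include <stdlib.h>

/* Hypothetical module-level field standing in for the temporal-gauge
   transformation data; not the actual tmLQCD variable. */
static double *tg_buffer = NULL;
static size_t tg_length = 0;

/* Allocate only if nothing is allocated yet, so a repeated call neither
   leaks the old buffer nor replaces it. Returns 0 on success, -1 on failure. */
int tg_init(size_t n) {
  if (tg_buffer != NULL) return 0;
  tg_buffer = calloc(n, sizeof *tg_buffer);
  if (tg_buffer == NULL) return -1;
  tg_length = n;
  return 0;
}

/* Free and reset the pointer: a second call then becomes free(NULL),
   which is a no-op, instead of a double free. */
void tg_finalize(void) {
  free(tg_buffer);
  tg_buffer = NULL;
  tg_length = 0;
}
```

The same guard in `tg_init` also prevents the kind of leak caused by allocating a field again on every call.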
Contributor

deuzeman commented Apr 9, 2013

> I even failed to motivate the two to clean up invert...

So would they actually be hindered by the update? If all they want to do is run the existing code to get their results, they're perfectly welcome to keep doing that.

Owner

kostrzewa commented Apr 9, 2013

> So would they actually be hindered by the update? If all they want to do is run the existing code to get their results, they're perfectly welcome to keep doing that.

Yes, I agree that this should just be pulled in. The broken parts are not maintained anymore and according to Elena they were tested and it was concluded that the GPU implementation is too slow for some reason. We just need to make sure the rest of the codebase is not broken by this.

Florian Burger added some commits Apr 26, 2013

Florian Burger removed all REAL data type define related stuff and changed to float
Carsten's bugfix for incorrectly set defaults of the nd_precision flags and
values
75693b0
Florian Burger removed repeated memory allocation in init_temporalgauge_trafo, which led to
a memory leak in the eo gpu part -> fixes issue 262
some cosmetics on a misleading error message
405259b
Florian Burger added first version of mixed clover tm inversion e6a478a
Florian Burger Merge branch 'master' into gpurational ed3454b
Florian Burger added mpi support for clover
added relativistic basis support for clover
added texture support for clover
f01b09c
Florian Burger Number of gpus per node can now be specified in gpu input section
default is 4 gpus/node
262e6d7
Florian Burger fixed issue with gpu shift solver and added max EV normalization in nd matrix gpu kernels; added gpu mms support in ndratcor monomial
424d485
Florian Burger improved non-EO version of degenerate matrix substantially,
all improvements (TEMPORAL_GAUGE etc.) working
some more minor fixes
2fa0747
Florian Burger added some functionality to limit inner solver iterations in 1+1 1022e3c
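The mixed-precision solvers in these commits (including the cap on inner solver iterations) fit a common pattern, defect correction: the residual is computed in double precision, a correction is solved for in single precision, and the inner solve can be stopped after a fixed number of iterations. A minimal CPU sketch on a small dense SPD matrix, assuming plain CG as the inner solver; `mixed_solve`, `cg_float` and the `max_inner` cap are illustrative names, not the GPU implementation in this PR:

```c
#define N 4 /* toy problem size */

/* y = A x in double precision; A is stored row-major. */
static void matvec_d(const double *A, const double *x, double *y) {
  for (int i = 0; i < N; i++) {
    y[i] = 0.0;
    for (int j = 0; j < N; j++) y[i] += A[i * N + j] * x[j];
  }
}

/* Same product in single precision. */
static void matvec_f(const float *A, const float *x, float *y) {
  for (int i = 0; i < N; i++) {
    y[i] = 0.0f;
    for (int j = 0; j < N; j++) y[i] += A[i * N + j] * x[j];
  }
}

/* Inner solver: plain CG in single precision for A x = b, starting at x = 0.
   Stops after max_inner iterations or once |r_k| <= tol_inner * |r_0|. */
static int cg_float(const float *A, const float *b, float *x, int max_inner,
                    float tol_inner) {
  float r[N], p[N], Ap[N], rr = 0.0f;
  for (int i = 0; i < N; i++) {
    x[i] = 0.0f; r[i] = b[i]; p[i] = b[i]; rr += r[i] * r[i];
  }
  const float stop = tol_inner * tol_inner * rr;
  int k = 0;
  for (; k < max_inner && rr > stop; k++) {
    matvec_f(A, p, Ap);
    float pAp = 0.0f;
    for (int i = 0; i < N; i++) pAp += p[i] * Ap[i];
    float alpha = rr / pAp, rr_new = 0.0f;
    for (int i = 0; i < N; i++) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
      rr_new += r[i] * r[i];
    }
    for (int i = 0; i < N; i++) p[i] = r[i] + (rr_new / rr) * p[i];
    rr = rr_new;
  }
  return k;
}

/* Outer loop: residual and solution in double, correction from the float
   solver. Returns the number of outer iterations, or -1 if not converged. */
int mixed_solve(const double *A, const double *b, double *x, double tol,
                int max_outer, int max_inner) {
  float Af[N * N];
  double bb = 0.0;
  for (int i = 0; i < N * N; i++) Af[i] = (float)A[i];
  for (int i = 0; i < N; i++) { x[i] = 0.0; bb += b[i] * b[i]; }
  for (int outer = 0; outer < max_outer; outer++) {
    double Ax[N], r[N], rr = 0.0;
    float rf[N], ef[N];
    matvec_d(A, x, Ax);
    for (int i = 0; i < N; i++) { r[i] = b[i] - Ax[i]; rr += r[i] * r[i]; }
    if (rr <= tol * tol * bb) return outer;
    for (int i = 0; i < N; i++) rf[i] = (float)r[i];
    cg_float(Af, rf, ef, max_inner, 1e-3f); /* coarse inner accuracy */
    for (int i = 0; i < N; i++) x[i] += (double)ef[i];
  }
  return -1;
}
```

Because the residual is recomputed in double precision at every outer step, the final accuracy is not limited by single precision; capping `max_inner` only trades inner work against the number of outer iterations.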
Owner

kostrzewa commented Nov 25, 2013

Dear Florian and Carsten,
I would like to pull this in as soon as possible. I had one modification in mind with regards to the code duplication in many sections (nothing in the GPU directory though). One could add a few lines to read_input.l which, when HAVE_GPU is undefined, force usegpu_flag to be 0. I think the cost of one "if test" is negligible compared to the improvement in legibility and maintainability.

The HAVE_GPU ifdefs in all those sections could then also be limited to the lines inside of the (usegpu_flag) if construct, perhaps with a safety line which prints something like "Usegpu_flag == 1 despite GPU support being disabled, something must have gone wrong! Check read_input.l!"
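A minimal sketch of this proposal, assuming the check runs in a C helper called right after parsing rather than inside read_input.l itself. `enforce_gpu_flag` and `run_inversion` are hypothetical names, and the return values stand in for the real solver calls:

```c
#include <stdio.h>

/* Hypothetical global set by the input parser. */
int usegpu_flag = 0;

/* Called once after the input file has been read: without GPU support the
   flag is forced to 0, whatever the input file requested. */
void enforce_gpu_flag(void) {
#ifndef HAVE_GPU
  if (usegpu_flag) {
    fprintf(stderr, "GPU support not compiled in, setting usegpu_flag = 0\n");
    usegpu_flag = 0;
  }
#endif
}

/* Call site: HAVE_GPU only guards the lines inside the usegpu_flag branch. */
int run_inversion(void) {
  if (usegpu_flag) {
#ifdef HAVE_GPU
    return 1; /* the GPU solver would be called here */
#else
    fprintf(stderr, "Usegpu_flag == 1 despite GPU support being disabled, "
                    "something must have gone wrong! Check read_input.l!\n");
    return -1;
#endif
  }
  return 0; /* the CPU solver would be called here */
}
```

In a build without `HAVE_GPU`, the `#else` branch is only reachable if the flag was not reset, which is exactly the safety line suggested above.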

Florian Burger GPU input parameter "DeviceNum" can now also be used to set the number
of the first device in use when compiled with define DEVICE_EQUAL_RANK
switched default of this from -1 (which was supposed to give an error)
to 0
8589b26
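One plausible way to combine the "gpus per node" setting (default 4, commit 262e6d7) with `DeviceNum` as the first device (default 0, commit 8589b26) is to offset the node-local rank by the first device index. This is an illustrative assumption, not the PR's code: the formula, `select_device`, and the explicit `device_count` argument (which real code would obtain from `cudaGetDeviceCount`) are hypothetical, and consecutive MPI ranks are assumed to sit on the same node:

```c
#define DEFAULT_GPUS_PER_NODE 4 /* default quoted in commit 262e6d7 */
#define DEFAULT_DEVICE_NUM 0    /* DeviceNum default after commit 8589b26 */

/* Hypothetical mapping of an MPI rank to a CUDA device index.
   Returns -1 if the inputs are invalid or the device does not exist. */
int select_device(int rank, int gpus_per_node, int first_device,
                  int device_count) {
  if (rank < 0 || gpus_per_node <= 0 || first_device < 0) return -1;
  int dev = first_device + rank % gpus_per_node; /* node-local rank + offset */
  return dev < device_count ? dev : -1;
}
```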
Contributor

florian-burger commented Nov 28, 2013

Dear Bartek and Carsten,

sorry for the late answer.

Yes, I think we could do it like that. For the hmc monomials it would also
be good to have a wrapper for the solver. This is where most of the ugly
HAVE_GPU ifdefs are actually. Such a wrapper would also be needed for
the pure CPU mixed-precision solver, if we want to use it in the hmc.
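A sketch of such a solver wrapper, assuming the monomials pass a small parameter struct and that only the wrapper knows about `HAVE_GPU`. `solver_params_t`, `solve_monomial` and the stand-in solvers are hypothetical; each stand-in just returns a distinct code so that the dispatch can be checked:

```c
#include <stdio.h>

/* Hypothetical parameters a monomial hands to its solver. */
typedef struct {
  int use_gpu;         /* mirrors usegpu_flag */
  int mixed_precision; /* select the CPU mixed-precision solver */
  double eps_sq;       /* target squared residual */
  int max_iter;
} solver_params_t;

/* Stand-ins for the real solvers; each returns a distinct code. */
static int cpu_solver_double(const solver_params_t *p) { (void)p; return 1; }
static int cpu_solver_mixed(const solver_params_t *p) { (void)p; return 2; }
#ifdef HAVE_GPU
static int gpu_solver(const solver_params_t *p) { (void)p; return 3; }
#endif

/* Single entry point for all monomials: the only HAVE_GPU ifdef on this path. */
int solve_monomial(const solver_params_t *p) {
  if (p->use_gpu) {
#ifdef HAVE_GPU
    return gpu_solver(p);
#else
    fprintf(stderr, "solve_monomial: GPU solver requested but not compiled in\n");
    return -1;
#endif
  }
  return p->mixed_precision ? cpu_solver_mixed(p) : cpu_solver_double(p);
}
```

Each monomial then calls `solve_monomial` unconditionally, and a CPU mixed-precision solver can be added as one more branch without touching the monomials.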



Florian Burger added some commits Jan 31, 2014

Florian Burger working double sequential mms solver on gpu d8c8ce7
Florian Burger fixed incompatibility with cpu deriv_SB that was introduced by changes
in cpu code
1a19130
Florian Burger fixed a bug:
compile w/o GPU_DOUBLE led to a segfault in gpu_deriv_SB, as two fields were not
allocated in this case
6077b08
Florian Burger su3 matrix mult with +=
fixed a bug which led to a loss of double precision
f8fb681
Florian Burger added basic double2 tex support via int4 detour afb5a8f
Florian Burger added a double Hopping Matrix equivalent 91e2fab
Florian Burger bug fix in polynomial initial guess generation -> currently disabled 50e869c
Florian Burger wrapped a test_double_operator by a #ifdef MATRIX_DEBUG 9e03504
Florian Burger working double inverter for nd doublet called sequentially from gpu mms solver

removed the forced minimum iteration number in ND mixed solver
f790aef
Florian Burger some restructuring to make transition to double2 easier
local dev_spinor_d[6] -> double4[6] such that we only have to adapt
loads and stores from/to global mem
d4cd522
Florian Burger working ndrat ndratcor monomials e10a564
Florian Burger added working TRUE mms solver in pure double
added possibility to specify a maximum number of shifts to be used in
this solver
798d4c0
Florian Burger adapted debug_level for printing out pre/post gauge-fixing data ec2fc3e
Florian Burger less output by adding #ifdef LOWOUTPUT 43ba67c
Florian Burger changed some comments 624facb
Florian Burger removed some further output b027e76
Florian Burger fixed a small bug when benchmark is on b6d2be5
Florian Burger fixes a bug that caused the mms solver to stop too early 60a6518
Florian Burger fixed a wrong pointer type 7c2a8f4
Florian Burger added relativistic basis in double nd operator 36c5455
Florian Burger moved tm and nd-tm operators from the mixedsolver files to a new
(better) place
ed64afc
Florian Burger forcing fewer iterations in mixed solver f88cd4c
Florian Burger Fixed cudaMemcpyToSymbol calls, which did not work with CUDA versions > 3.2 because
this API call has changed across CUDA versions
fixed the error message, which reported "no error" even when an actual error had occurred;
it now uses a new call to cudaPeekAtLastError
1f5c162
Florian Burger unified the way grid and block sizes are set for solver/matrix related
kernels. Some cleanup work in nd solvers
410d105
Florian Burger Added double-precision light and nd matrices for use with MPI
+started reorganizing code

TODO:
single nd matrix not working with relativistic basis
TEMPORALGAUGE broken for mpi

REMOVED:
ALTERNATE.cuh
359bdf5
Florian Burger fixes issue with temporalgauge and mpi ddf5aeb
Florian Burger adds relativistic basis to double light matrix fbfbb5d
Florian Burger code smoothening
* moving nd matrix from ASYNC.cuh to tm_nd_eo.cuh
* call to mpi and non-mpi versions now from a wrapper function preventing
a #define in the solvers
* calling wrappers around blas kernels in dev_cg_eo
b11365d
Florian Burger code rearrangement for more transparency e6c353a
Florian Burger fixed deriv_SB with MPI 18b49aa
Florian Burger added missing xchange + g_debug_level adapted c994751
Florian Burger added debug_level 7f11f1e
Florian Burger fixed a potential bug in dotprod with MPI - abstracted blas a941dc8
Florian Burger started working on clover e9100be
Florian Burger default device_num depending on MPI d81ca62
Florian Burger generic non EO inversion working in light sector 26a3c1c
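Commit afb5a8f ("double2 tex support via int4 detour") refers to a common workaround: CUDA texture fetches cannot return `double` directly, so the data is fetched as `int4` and each `double` is rebuilt from its high and low 32-bit words, on the device typically with the `__hiloint2double(hi, lo)` intrinsic. The host-side C sketch below only illustrates the bit manipulation, assuming IEEE 754 doubles whose byte order matches that of 64-bit integers; `word4_h`, `double2_h`, `pack_double2` and `unpack_double2` are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

/* Host stand-ins for CUDA's double2 and for the int4 a texture fetch returns
   (unsigned here so that the bit manipulation is well defined in C). */
typedef struct { double x, y; } double2_h;
typedef struct { uint32_t x, y, z, w; } word4_h;

/* Split each double into its low (x/z) and high (y/w) 32-bit words, matching
   the memory layout of a double2 on a little-endian GPU. */
word4_h pack_double2(double2_h v) {
  uint64_t a, b;
  memcpy(&a, &v.x, sizeof a);
  memcpy(&b, &v.y, sizeof b);
  word4_h w = {(uint32_t)a, (uint32_t)(a >> 32), (uint32_t)b, (uint32_t)(b >> 32)};
  return w;
}

/* Host equivalent of __hiloint2double(hi, lo). */
static double hilo_to_double(uint32_t hi, uint32_t lo) {
  uint64_t bits = ((uint64_t)hi << 32) | lo;
  double d;
  memcpy(&d, &bits, sizeof d);
  return d;
}

/* Rebuild the double2 from the four fetched words. */
double2_h unpack_double2(word4_h w) {
  double2_h v = {hilo_to_double(w.y, w.x), hilo_to_double(w.w, w.z)};
  return v;
}
```

On the device, the corresponding step after an `int4` fetch `t` would be `__hiloint2double(t.y, t.x)` for the first component and `__hiloint2double(t.w, t.z)` for the second.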

chjost commented on 26a3c1c Mar 26, 2014

Trying to compile this commit gives the following error:
make: *** No rule to make target 'invert_noeo.d', needed by 'dep'.  Stop.

Furthermore, I cannot find a file invert_noeo.* anywhere, if there should be one.

Greetings
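The "TRUE mms solver" of commit 798d4c0 is a multi-mass (multi-shift) solver: all shifted systems (A + sigma_s) x_s = b are solved from the single Krylov space of A, using the standard multi-shift CG recurrences for the scale factors zeta_s. A minimal double-precision sketch on a dense SPD toy matrix, with a `max_shifts` cap in the spirit of that commit; the function name, argument order, and the way the cap is applied (only the first shifts are solved) are assumptions, not the GPU implementation:

```c
#define N 4          /* toy problem size */
#define MAX_SHIFTS 8 /* compile-time bound on the number of shifts */

static void matvec(const double *A, const double *x, double *y) {
  for (int i = 0; i < N; i++) {
    y[i] = 0.0;
    for (int j = 0; j < N; j++) y[i] += A[i * N + j] * x[j];
  }
}

static double dot(const double *a, const double *b) {
  double s = 0.0;
  for (int i = 0; i < N; i++) s += a[i] * b[i];
  return s;
}

/* Solves (A + sigma[s]) x[s] = b for s < min(n_shifts, max_shifts); A must be
   SPD and sigma[s] >= 0. Returns the iteration count, or -1 if not converged. */
int cg_mms(const double *A, const double *b, const double *sigma, int n_shifts,
           int max_shifts, double x[][N], double tol, int max_iter) {
  int ns = n_shifts < max_shifts ? n_shifts : max_shifts;
  if (ns > MAX_SHIFTS) ns = MAX_SHIFTS;
  double r[N], p[N], Ap[N], ps[MAX_SHIFTS][N];
  double zeta[MAX_SHIFTS], zeta_old[MAX_SHIFTS], zeta_new[MAX_SHIFTS];
  for (int i = 0; i < N; i++) { r[i] = b[i]; p[i] = b[i]; }
  for (int s = 0; s < ns; s++) {
    zeta[s] = zeta_old[s] = 1.0;
    for (int i = 0; i < N; i++) { x[s][i] = 0.0; ps[s][i] = b[i]; }
  }
  double rr = dot(r, r);
  double stop = tol * tol * rr;
  double alpha_old = 1.0, beta_old = 0.0;
  for (int k = 0; k < max_iter; k++) {
    if (rr <= stop) return k; /* |zeta_s| <= 1, so all shifts are converged */
    matvec(A, p, Ap);
    double alpha = rr / dot(p, Ap);
    for (int s = 0; s < ns; s++) { /* shifted step lengths from the base CG */
      zeta_new[s] = zeta[s] * zeta_old[s] * alpha_old /
          (alpha * beta_old * (zeta_old[s] - zeta[s]) +
           zeta_old[s] * alpha_old * (1.0 + sigma[s] * alpha));
      double alpha_s = alpha * zeta_new[s] / zeta[s];
      for (int i = 0; i < N; i++) x[s][i] += alpha_s * ps[s][i];
    }
    for (int i = 0; i < N; i++) r[i] -= alpha * Ap[i];
    double rr_new = dot(r, r), beta = rr_new / rr;
    for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
    for (int s = 0; s < ns; s++) { /* shifted search directions */
      double ratio = zeta_new[s] / zeta[s];
      double beta_s = beta * ratio * ratio;
      for (int i = 0; i < N; i++) ps[s][i] = zeta_new[s] * r[i] + beta_s * ps[s][i];
      zeta_old[s] = zeta[s];
      zeta[s] = zeta_new[s];
    }
    alpha_old = alpha; beta_old = beta; rr = rr_new;
  }
  return rr <= stop ? max_iter : -1;
}
```

Only one matrix-vector product per iteration is needed regardless of the number of shifts; the extra cost per shift is a few vector updates, which is what makes this approach attractive for rational approximations.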

Contributor

urbach commented Jun 3, 2014

this is now #282 so I close this one here

@urbach urbach closed this Jun 3, 2014
