
GPU fixes #378

Merged
germasch merged 14 commits into main from gpu-fixes
Feb 27, 2026


@germasch
Contributor

This fixes build errors when CUDA is enabled.

Presumably, this was done to guarantee a certain minimum version of
Thrust, but it fails with current (2.x) versions of Thrust. Let's
assume that any reasonably recent version of Thrust will work, and
not try to enforce a specific version.

Without it, the config wasn't included, and `USE_CUDA` was not defined.

nvcc was complaining about "half" being redefined. It didn't say
anything about where the original definition was; maybe it's a
conflict with CUDA's `half` type. In any case, renaming the function
to `one_half` seems to fix the issue.

This is kinda hacky, since MparticlesCuda hasn't been fully
adapted, but it compiles, at least.

Basically, reviving the hacky way it was handled...
Collaborator

@JamesMcClung JamesMcClung left a comment


Was this related to the problem that MparticlesCuda has to be converted before being dumped to HDF5?

I don't understand why this would do anything, unless the operator() were actually being called somewhere other than in perform_diagnostic and actually got some other type of Mparticles. I trust that it compiles, I'm just confused.

(Referring to 09bdf21, gpu: fix hdf5 output with MparticlesCuda.)

@germasch
Contributor Author

Yes, I had a FIXME comment indicating that I took the easy way out by just converting MparticlesCuda to MparticlesSingle first and then outputting the latter. The dispatch was done by specializing operator() for MparticlesCuda, which stopped working when operator() itself was no longer templated and instead used the now class-level Mparticles.

It was an easy fix, though it is probably even more opaque now. In the end, this should probably use some accessor rather than a conversion anyway, which might improve the situation by eliminating the need for special casing. Or it could use host mirroring. Whatever the case, one thing is pretty clear: having three different ways of achieving the same purpose is not KISS...
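
For readers following along, here is a minimal sketch of the situation described above: once the particle type is a class-level template parameter, operator() can no longer be specialized on its own, so the GPU case has to be selected some other way. This is not the code from the PR. `WriterSketch`, `to_single()`, `write_host()` and the member names are hypothetical, the two structs are simplified stand-ins for the real containers, and `if constexpr` (C++17) is just one possible way to do the dispatch.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the real particle containers; only the names
// MparticlesSingle and MparticlesCuda come from the discussion above.
struct MparticlesSingle
{
  std::vector<float> x; // host-resident data
};

struct MparticlesCuda
{
  std::vector<float> x_dev; // pretend this lives in device memory

  // Hypothetical conversion helper: copy the data into a host container.
  MparticlesSingle to_single() const { return MparticlesSingle{x_dev}; }
};

// Hypothetical writer whose particle type is fixed at class level, so
// operator() is an ordinary (non-template) member function.
template <typename Mparticles>
class WriterSketch
{
public:
  // Returns a description of the path taken, for illustration only.
  std::string operator()(const Mparticles& mprts) const
  {
    if constexpr (std::is_same_v<Mparticles, MparticlesCuda>) {
      // GPU case: convert to a host container first, then reuse the host path.
      return "converted, " + write_host(mprts.to_single());
    } else {
      // Host case: write directly.
      return write_host(mprts);
    }
  }

private:
  // Stand-in for the actual output routine; it only reports the size.
  static std::string write_host(const MparticlesSingle& mprts)
  {
    return "wrote " + std::to_string(mprts.x.size()) + " particles";
  }
};
```

Calling `WriterSketch<MparticlesCuda>{}(mprts)` takes the conversion branch, while `WriterSketch<MparticlesSingle>{}(mprts)` writes directly. The special case is still there, just moved inside the function body, which is why an accessor or host mirror that removes the branch altogether would be the cleaner long-term option.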

@germasch germasch merged commit 78d497e into main Feb 27, 2026
6 checks passed
@JamesMcClung JamesMcClung deleted the gpu-fixes branch February 27, 2026 15:40