Support for additional vendor GPUs #541

Closed
dmcdougall opened this issue Oct 3, 2023 · 6 comments

dmcdougall commented Oct 3, 2023

Short story

I'd like to contribute support for additional GPUs. I work for AMD, so my priority is to support interfaces for AMD GPUs through the HIP interface, but I see no reason I shouldn't use this opportunity to build a generic enough interface that a different vendor can contribute an implementation for their hardware.

I'm happy to do this, and I'm working with folks involved in the NWChemEx project who would find it extremely useful. It would also give you more users.

Is this contribution something you are interested in and would accept?

Long story

I have been working with some folks adjacent to a scientific code called NWChemEx. There are a lot of dependencies there, but the short story is that one of their dependencies, LibIntX, uses cuda-api-wrappers to marshal work off to a CUDA backend for running on NVIDIA GPUs. They've requested something similar for AMD GPUs, so I thought I would open an issue here for a few reasons:

  1. To let you know about it;
  2. To ask you if this is a contribution you are happy to receive;
  3. To propose a structure for a solution that accommodates this request without breaking your existing users;
  4. To get feedback on all of the above.

I've been interacting with Andrey Asadchev to get a sense of what would work for him when he's working in LibIntX. I think a good way to think about this is simply to take an existing example. A snippet from the vectorAdd example currently looks like this:

	auto device = cuda::device::current::get();
	auto d_A = cuda::memory::device::make_unique<float[]>(device, numElements);
	auto d_B = cuda::memory::device::make_unique<float[]>(device, numElements);
	auto d_C = cuda::memory::device::make_unique<float[]>(device, numElements);

	cuda::memory::copy(d_A.get(), h_A.get(), size);
	cuda::memory::copy(d_B.get(), h_B.get(), size);

	auto launch_config = cuda::launch_config_builder()
		.overall_size(numElements)
		.block_size(256)
		.build();

	std::cout
		<< "CUDA kernel launch with " << launch_config.dimensions.grid.x
		<< " blocks of " << launch_config.dimensions.block.x << " threads each\n";

	cuda::launch(
		vectorAdd, launch_config,
		d_A.get(), d_B.get(), d_C.get(), numElements
	);

Andrey and I proposed a more generic interface that looks like this:

	auto device = gpu::device::current::get();
	auto d_A = gpu::memory::device::make_unique<float[]>(device, numElements);
	auto d_B = gpu::memory::device::make_unique<float[]>(device, numElements);
	auto d_C = gpu::memory::device::make_unique<float[]>(device, numElements);

	gpu::memory::copy(d_A.get(), h_A.get(), size);
	gpu::memory::copy(d_B.get(), h_B.get(), size);

	auto launch_config = gpu::launch_config_builder()
		.overall_size(numElements)
		.block_size(256)
		.build();

	std::cout
		<< "GPU kernel launch with " << launch_config.dimensions.grid.x
		<< " blocks of " << launch_config.dimensions.block.x << " threads each\n";

	gpu::launch(
		vectorAdd, launch_config,
		d_A.get(), d_B.get(), d_C.get(), numElements
	);

Basically, everything stays the same but the generic interface makes no references to CUDA specifically. The backend can offload to CUDA, HIP, SYCL, and so on, by simply setting a preprocessor macro to a specific value. Perhaps something like #define CAW_BACKEND cuda for CUDA or #define CAW_BACKEND hip for HIP. This approach would mean that your thin interface layer can stay header-only and anybody using your software could simply set up this macro in their build system and pass it to the compiler with the -DCAW_BACKEND=... flag.
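
To make that concrete, a dispatch header could be as thin as a namespace alias selected by the macro. The sketch below is purely illustrative: the header paths and macro names are assumptions, and it uses one macro per backend rather than a single valued CAW_BACKEND, since comparing token values in the preprocessor is awkward:

	// gpu.hpp -- hypothetical dispatch header; all names here are illustrative.
	// The build defines exactly one backend macro (e.g. -DCAW_BACKEND_CUDA or
	// -DCAW_BACKEND_HIP), and gpu:: becomes an alias for that backend's namespace.
	#if defined(CAW_BACKEND_CUDA)
	#include <cuda/api.hpp>   // existing cuda-api-wrappers umbrella header (path illustrative)
	namespace gpu = cuda;     // gpu::device::current::get() resolves to cuda::device::current::get()
	#elif defined(CAW_BACKEND_HIP)
	#include <hip/api.hpp>    // hypothetical HIP counterpart
	namespace gpu = hip;
	#else
	#error "No GPU backend selected: define CAW_BACKEND_CUDA or CAW_BACKEND_HIP"
	#endif

A namespace alias like this would let the gpu:: snippet above compile unchanged, without duplicating any wrapper code, provided the backend namespaces expose the same subset of the API.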

As far as file organisation goes, I think it makes sense to put each backend's code in its own subdirectory. This will be pretty disruptive. An example of what I'm thinking of is something like this:

cuda-api-wrappers/
  - src/
    - generic/  # generic api stuff here
    - cuda/  # cuda backend
    - hip/  # hip backend
  - examples/ # all examples become generic gpu:: instead of cuda::

This will require some pretty disruptive and invasive changes to the build system, but this structure at least lends itself to putting a CMakeLists.txt file in each vendor-specific backend directory and having the top-level CMakeLists.txt include whichever vendor-specific one the user requested, for example via cmake -DCAW_BACKEND=cuda .. or cmake -DCAW_BACKEND=hip ..
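
For illustration, the top-level CMakeLists.txt could be little more than the selection logic below. This is only a sketch against the proposed layout; the option name, default, and directory names are assumptions:

	# Top-level CMakeLists.txt (sketch; option name, default and layout are illustrative)
	cmake_minimum_required(VERSION 3.18)
	project(cuda-api-wrappers LANGUAGES CXX)

	set(CAW_BACKEND "cuda" CACHE STRING "GPU backend to target (cuda or hip)")

	# The generic, backend-agnostic layer is always built.
	add_subdirectory(src/generic)

	# Exactly one vendor-specific backend is pulled in, as requested by the user.
	if(CAW_BACKEND STREQUAL "cuda")
		add_subdirectory(src/cuda)
	elseif(CAW_BACKEND STREQUAL "hip")
		add_subdirectory(src/hip)
	else()
		message(FATAL_ERROR "Unknown CAW_BACKEND '${CAW_BACKEND}'; expected 'cuda' or 'hip'")
	endif()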

I'd need to work out the details, but hopefully this paints enough of a picture that you can let me know if this is something you're happy with.

I not only welcome comments and criticisms, but I heavily encourage them.

eyalroz (Owner) commented Oct 3, 2023

I am interested, in principle, in an expansion of this library - or at least the idea and the way of doing things in this library - to additional vendors' bespoke GPU use APIs. But there are a lot of caveats. Some of them involve (in no particular order):

  • My hope to expand / adapt this library towards OpenCL support
  • The limits of my time + who would be willing to commit to maintaining this for at least several years
  • The high degree of CUDA specificity of the library. Yes, there is a subset which could be common to multiple vendors, but I'm not sure it is all that large
  • The question of whether it is better to expand or to fork
  • This library not currently being committed to API stability. For example, if I move ahead with Switch memory functions to only work with regions (and spans) #322, there will no longer be any unique_ptr's, but rather unique_span's, i.e. variables with unique ownership but including the size information; so the vectorAdd example would be rewritten (roughly as sketched below). Of course one can still use an older release of the library, but those don't get bug fixes.
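
(To illustrate: with a change along those lines, the allocation and copy lines of the snippet above might end up looking roughly like this. The make_unique_span name and the span-taking copy overload are placeholders for wherever #322 lands, not current API.)

	// Hypothetical post-#322 shape of the allocation/copy lines; names are placeholders.
	auto d_A = cuda::memory::device::make_unique_span<float>(device, numElements);
	auto d_B = cuda::memory::device::make_unique_span<float>(device, numElements);

	// A unique_span carries its size, so the copy needs no explicit byte count.
	cuda::memory::copy(d_A, h_A.get());
	cuda::memory::copy(d_B, h_B.get());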

I think the best thing to do is for us to have a voice/video chat. If that's possible, please e-mail me so we can set one up.

dmcdougall (Author) commented:

Those are good points. Yes, I think it'll be easier to talk on the phone. You're the most knowledgeable about this code, so your insight would be valuable here. I'll include Andrey on the email and that way you will have a user's perspective as well.

eyalroz (Owner) commented Oct 10, 2023

I'm held up at a meeting at work today, and I forget whether we're already at the time we set last week for a meeting. If we are, then please excuse me, and I'll be in touch ASAP.

dmcdougall (Author) commented:

Not a problem. I just hope that you are safe.

Our meeting is scheduled for tomorrow Oct 11th 2023 at 11:30am UTC-5 (7:30pm UTC+3).

eyalroz (Owner) commented Oct 10, 2023

(sigh) great... see you tomorrow :-)

eyalroz (Owner) commented Oct 11, 2023

So, it looks like the direction for now will be a fork, to be based off of the soon-to-be-released v0.6.5, and covering some of the core functionality - enough to satisfy the needs of the libintx project. If we find there is wide enough interest and potential resource-commitment, we may consider a more ambitious endeavor.

eyalroz closed this as completed Oct 11, 2023