Support for additional vendor GPUs #541

Closed
dmcdougall opened this issue Oct 3, 2023 · 6 comments

dmcdougall commented Oct 3, 2023

Short story

I'd like to contribute support for additional GPUs. I work for AMD, so my priority is to support interfaces for AMD GPUs through the HIP interface, but I see no reason I shouldn't use this opportunity to build a generic enough interface that a different vendor can contribute an implementation for their hardware.

I'm happy to do this, and I'm working with folks involved in the NWChemEx project who would find it extremely useful. It would also give you more users.

Is this contribution something you are interested in and would accept?

Long story

I have been working with some folks adjacent to a scientific code called NWChemEx. There are a lot of dependencies there, but the short story is that one of their dependencies, LibIntX, uses cuda-api-wrappers to marshal work off to a CUDA backend for running on NVIDIA GPUs. They've requested something similar for AMD GPUs, so I thought I would open an issue here for a few reasons:

  1. To let you know about it;
  2. To ask you if this is a contribution you are happy to receive;
  3. To propose a structure for a solution that accommodates this request without breaking your existing users;
  4. To get feedback on all of the above.

I've been interacting with Andrey Asadchev to get a sense of what would work for him when he's working in LibIntX. I think a good way to think about this is simply to take an existing example. A snippet from the vectorAdd example currently looks like this:

	auto device = cuda::device::current::get();
	auto d_A = cuda::memory::device::make_unique<float[]>(device, numElements);
	auto d_B = cuda::memory::device::make_unique<float[]>(device, numElements);
	auto d_C = cuda::memory::device::make_unique<float[]>(device, numElements);

	cuda::memory::copy(d_A.get(), h_A.get(), size);
	cuda::memory::copy(d_B.get(), h_B.get(), size);

	auto launch_config = cuda::launch_config_builder()
		.overall_size(numElements)
		.block_size(256)
		.build();

	std::cout
		<< "CUDA kernel launch with " << launch_config.dimensions.grid.x
		<< " blocks of " << launch_config.dimensions.block.x << " threads each\n";

	cuda::launch(
		vectorAdd, launch_config,
		d_A.get(), d_B.get(), d_C.get(), numElements
	);

Andrey and I proposed a more generic interface that looks like this:

	auto device = gpu::device::current::get();
	auto d_A = gpu::memory::device::make_unique<float[]>(device, numElements);
	auto d_B = gpu::memory::device::make_unique<float[]>(device, numElements);
	auto d_C = gpu::memory::device::make_unique<float[]>(device, numElements);

	gpu::memory::copy(d_A.get(), h_A.get(), size);
	gpu::memory::copy(d_B.get(), h_B.get(), size);

	auto launch_config = gpu::launch_config_builder()
		.overall_size(numElements)
		.block_size(256)
		.build();

	std::cout
		<< "GPU kernel launch with " << launch_config.dimensions.grid.x
		<< " blocks of " << launch_config.dimensions.block.x << " threads each\n";

	gpu::launch(
		vectorAdd, launch_config,
		d_A.get(), d_B.get(), d_C.get(), numElements
	);

Basically, everything stays the same but the generic interface makes no references to CUDA specifically. The backend can offload to CUDA, HIP, SYCL, and so on, by simply setting a preprocessor macro to a specific value. Perhaps something like #define CAW_BACKEND cuda for CUDA or #define CAW_BACKEND hip for HIP. This approach would mean that your thin interface layer can stay header-only and anybody using your software could simply set up this macro in their build system and pass it to the compiler with the -DCAW_BACKEND=... flag.
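
To make that concrete, a dispatch header could be as thin as a namespace alias selected by the macro. The sketch below is purely illustrative: the header paths and macro names are assumptions, and it uses one macro per backend rather than a single valued CAW_BACKEND, since comparing token values in the preprocessor is awkward:

	// gpu.hpp -- hypothetical dispatch header; all names here are illustrative.
	// The build defines exactly one backend macro (e.g. -DCAW_BACKEND_CUDA or
	// -DCAW_BACKEND_HIP), and gpu:: becomes an alias for that backend's namespace.
	#if defined(CAW_BACKEND_CUDA)
	#include <cuda/api.hpp>   // existing cuda-api-wrappers umbrella header (path illustrative)
	namespace gpu = cuda;     // gpu::device::current::get() resolves to cuda::device::current::get()
	#elif defined(CAW_BACKEND_HIP)
	#include <hip/api.hpp>    // hypothetical HIP counterpart
	namespace gpu = hip;
	#else
	#error "No GPU backend selected: define CAW_BACKEND_CUDA or CAW_BACKEND_HIP"
	#endif

A namespace alias like this would let the gpu:: snippet above compile unchanged, without duplicating any wrapper code, provided the backend namespaces expose the same subset of the API.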

As far as file organisation goes, I think it makes sense to put each backend's code in its own subdirectory. This will be pretty disruptive. An example of what I'm thinking of is something like this:

cuda-api-wrappers/
  - src/
    - generic/  # generic api stuff here
    - cuda/  # cuda backend
    - hip/  # hip backend
  - examples/ # all examples become generic gpu:: instead of cuda::

This will require some pretty disruptive and invasive changes to the build system, but this structure at least lends itself to putting a CMakeLists.txt file in each vendor-specific backend directory and having the top-level CMakeLists.txt include whichever vendor-specific one the user requested, for example via cmake -DCAW_BACKEND=cuda .. or cmake -DCAW_BACKEND=hip ..
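
For illustration, the top-level CMakeLists.txt could be little more than the selection logic below. This is only a sketch against the proposed layout; the option name, default, and directory names are assumptions:

	# Top-level CMakeLists.txt (sketch; option name, default and layout are illustrative)
	cmake_minimum_required(VERSION 3.18)
	project(cuda-api-wrappers LANGUAGES CXX)

	set(CAW_BACKEND "cuda" CACHE STRING "GPU backend to target (cuda or hip)")

	# The generic, backend-agnostic layer is always built.
	add_subdirectory(src/generic)

	# Exactly one vendor-specific backend is pulled in, as requested by the user.
	if(CAW_BACKEND STREQUAL "cuda")
		add_subdirectory(src/cuda)
	elseif(CAW_BACKEND STREQUAL "hip")
		add_subdirectory(src/hip)
	else()
		message(FATAL_ERROR "Unknown CAW_BACKEND '${CAW_BACKEND}'; expected 'cuda' or 'hip'")
	endif()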

I'd need to work out the details, but hopefully this paints enough of a picture that you can let me know if this is something you're happy with.

I not only welcome comments and criticisms, but I heavily encourage them.

eyalroz (Owner) commented Oct 3, 2023

I am interested, in principle, in an expansion of this library - or at least the idea and the way of doing things in this library - to additional vendors' bespoke GPU use APIs. But there are a lot of caveats. Some of them involve (in no particular order):

  • My hope to expand / adapt this library towards OpenCL support
  • The limits of my time + who would be willing to commit to maintaining this for at least several years
  • The high degree of CUDA specificity of the library. Yes, there is a subset which could be common to multiple vendors, but I'm not sure it is all that large
  • The question of whether it is better to expand or to fork
  • This library not currently being committed to API stability. For example, if I move ahead with Switch memory functions to only work with regions (and spans) #322, there will no longer be any unique_ptr's, but rather unique_span's, i.e. variables with unique ownership but including the size information; so the vectorAdd example would be rewritten (roughly as sketched below). Of course one can still use an older release of the library, but those don't get bug fixes.
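
(To illustrate: with a change along those lines, the allocation and copy lines of the snippet above might end up looking roughly like this. The make_unique_span name and the span-taking copy overload are placeholders for wherever #322 lands, not current API.)

	// Hypothetical post-#322 shape of the allocation/copy lines; names are placeholders.
	auto d_A = cuda::memory::device::make_unique_span<float>(device, numElements);
	auto d_B = cuda::memory::device::make_unique_span<float>(device, numElements);

	// A unique_span carries its size, so the copy needs no explicit byte count.
	cuda::memory::copy(d_A, h_A.get());
	cuda::memory::copy(d_B, h_B.get());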

I think the best thing to do is for us to have a voice/video chat. If that's possible, please e-mail me so we can set one up.

dmcdougall (Author) commented:

Those are good points. Yes, I think it'll be easier to talk on the phone. You're the most knowledgeable about this code, so your insight would be valuable here. I'll include Andrey on the email and that way you will have a user's perspective as well.

eyalroz (Owner) commented Oct 10, 2023

I'm held up at a meeting at work today, and I forget whether we're already at the time we set last week for a meeting. If we are, then please excuse me, and I'll be in touch ASAP.

dmcdougall (Author) commented:

Not a problem. I just hope that you are safe.

Our meeting is scheduled for tomorrow Oct 11th 2023 at 11:30am UTC-5 (7:30pm UTC+3).

eyalroz (Owner) commented Oct 10, 2023

(sigh) great... see you tomorrow :-)

eyalroz (Owner) commented Oct 11, 2023

So, it looks like the direction for now will be a fork, to be based off of the soon-to-be-released v0.6.5, and covering some of the core functionality - enough to satisfy the needs of the libintx project. If we find there is wide enough interest and potential resource-commitment, we may consider a more ambitious endeavor.

eyalroz closed this as completed Oct 11, 2023