Issue with vectorized functions on GPU #174

Closed
pculbertson opened this issue May 11, 2023 · 3 comments

@pculbertson

Hi, I am using Tullio for a CUDA project and having some trouble using it to slice into CUDA arrays (maybe this is inappropriate).

Here is a minimal example of the issue:

using CUDA, Tullio

# Define function that takes slices of z on the RHS.
batch_maximum(z) = @tullio res[ii] := maximum(z[:, ii])

x_cpu = randn(2,5)
batch_maximum(x_cpu) # Runs fine.

x_gpu = CUDA.randn(2,5)
batch_maximum(x_gpu) # Errors with Reason: unsupported dynamic function invocation

Any help would be very appreciated!

@mcabbott
Owner

This is about mapslices-like operations, and I take it that you're not really looking for maximum(z; dims=1) but want other things.

That these work at all in Tullio is a bit of an accident; I thought a bit about this long ago in #20. Mostly Tullio wants arrays of numbers only, things for which zero(T) makes sense.

I don't think there's any nice way of doing these on GPU arrays really, with or without this package. Something like map(f, eachcol(z)) iterates on the CPU and launches one GPU operation per slice, which will be slow. Tullio instead tries to generate a kernel, but such kernels should get numbers, not whole arrays, and this is what the error is trying to say.

What you probably want to do is to alter f to act on the whole array, not one slice at a time.
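As a rough illustration of that advice (a sketch, not code from this thread): the names colwise_slow and batch_logsumexp below are hypothetical, and only Base Julia is used. The per-slice form is the map(f, eachcol(z)) pattern described above; the whole-array forms use reductions with dims plus broadcasting, which are the kind of operations that generally also work on a CuArray. The example is only checked on CPU arrays here.

```julia
# Per-slice version: on a GPU array this would loop on the CPU and
# launch one GPU reduction per column.
colwise_slow(f, z) = map(f, eachcol(z))

# Whole-array version of the function from the original post:
# a single reduction over the first dimension.
batch_maximum(z) = vec(maximum(z; dims=1))

# Hypothetical second example: a numerically stable log-sum-exp per column,
# written with whole-array reductions and broadcasting only.
function batch_logsumexp(z)
    m = maximum(z; dims=1)                      # 1×n row of column maxima
    return vec(m .+ log.(sum(exp.(z .- m); dims=1)))
end

z = [1.0 5.0 -2.0; 3.0 2.0 0.5]

# Both formulations agree on the CPU.
@assert batch_maximum(z) == colwise_slow(maximum, z)
@assert batch_logsumexp(z) ≈ colwise_slow(c -> log(sum(exp, c)), z)
```

The trade-off is that each function has to be rewritten by hand in terms of array-level operations; nothing here vectorizes an arbitrary f automatically.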

@pculbertson
Author

Thank you for the detailed (and fast!) reply! Yes, my ideal workflow would be to vmap a function along a batch dimension for a CUDA array, which seems extremely difficult/inefficient for CuArrays in Julia.

The longer story is that I want to have a user specify a symbolic function f(x, p) for symbolic vectors x, p, and to then evaluate the function in parallel for batches of parameters P. If you have any more thoughts on how to do this, they'd be much appreciated, but this seems well outside the package scope -- but thanks for your work on this, Tullio is really great.
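One possible way to picture the batched evaluation described above, assuming f is built only from scalar arithmetic on the entries of x and p: store one sample per column of X and P, and broadcast the same arithmetic over rows. The functions f and f_batched below are made-up examples, not anything from Tullio or from this thread, and the sketch is only checked against a CPU loop.

```julia
# Hypothetical user function acting on single vectors x and p.
f(x, p) = p[1] * x[1]^2 + p[2] * sin(x[2])

# Batched form: X and P hold one sample per column, so each vector entry
# becomes a row, and the scalar arithmetic is broadcast across the batch.
function f_batched(X, P)
    x1, x2 = X[1, :], X[2, :]    # row slices (these make copies)
    p1, p2 = P[1, :], P[2, :]
    return p1 .* x1 .^ 2 .+ p2 .* sin.(x2)
end

X = [1.0 2.0 3.0; 0.0 0.5 1.0]
P = [2.0 1.0 0.5; 1.0 3.0 -1.0]

# Reference: evaluate f one column at a time.
reference = [f(X[:, k], P[:, k]) for k in axes(X, 2)]
@assert f_batched(X, P) ≈ reference
```

This relies on f_batched being written (or generated) separately from f, so it sidesteps rather than solves the general vmap problem.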

@mcabbott
Owner

Yes I think many people would like something like vmap. I've sent an invitation to something which may interest you...
