I'm not certain whether this issue belongs in the CUDArt repository, the CUDAdrv repository, or both. I'm posting in both, but will remove it from one or the other if advised to.
I am interested in being able to compile PTX modules that include external functions and then import those as functions to use/launch from within Julia. The particular example I was recently working with was for CUBLAS functions, but the principle is far wider. I inquired about the issue on Stack Overflow here. I had thought that it would be relatively manageable, but from the answer I got, it sounds like it is actually fairly complex and involved. On the plus side, it does appear that there are precedents for establishing this kind of capability, e.g. the JCUDA framework for Java.
I could potentially assist with such an implementation, but I doubt I'd be well positioned to take it on all by myself.
Thoughts?
This would be very nice because it would allow one to write a gpuMap function which applies an external function to a GPU array. Being able to solve problems with vectorized functions like that would reduce the complexity of GPU programming immensely.
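To make the gpuMap idea concrete, here is a rough CUDA sketch of the device side only. The names `gpu_map` and `Square` are made up for illustration, and the element-wise function is defined in the same file, so this sidesteps the actual problem raised in this issue: a function living in a separate module or library would additionally need relocatable device code (e.g. `nvcc -rdc=true`) and a device-link step, which is not shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical element-wise function standing in for the "external" function.
// Here it lives in the same translation unit, so no device linking is needed.
struct Square {
    __device__ float operator()(float x) const { return x * x; }
};

// Apply f to every element of `in` and write the result to `out`.
template <typename F>
__global__ void gpu_map(F f, const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = f(in[i]);
    }
}

int main() {
    const int n = 1024;
    float host_in[n], host_out[n];
    for (int i = 0; i < n; ++i) {
        host_in[i] = static_cast<float>(i);
    }

    // Allocate device buffers and copy the input over.
    float *dev_in = nullptr, *dev_out = nullptr;
    if (cudaMalloc(&dev_in, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&dev_out, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);

    // One thread per element, rounding the block count up.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    gpu_map<<<blocks, threads>>>(Square(), dev_in, dev_out, n);

    // Copying back on the default stream also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host_out, dev_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("in[3] = %f, out[3] = %f\n", host_in[3], host_out[3]);

    cudaFree(dev_in);
    cudaFree(dev_out);
    return 0;
}
```

From the Julia side, the wish would be to load something like this from PTX and pass in whichever function is needed, rather than fixing it at compile time as this sketch does.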