Question: Are cuModules shared between kernels from same program #61
Comments
No, currently they are not shared; each kernel instantiation has its own cuModule, so the addresses will be different (I confirmed with a test). This is arguably a design flaw in the Jitify API, and I'd been wondering if/when it would become a problem. I'd be interested to know how important it is for your application. A (hypothetical) new Jitify API that better matched the underlying CUDA APIs would allow (/require) you to provide multiple name expressions for a single program (e.g., template instantiations of multiple kernels, globals etc.), then compile it once to a single module and extract all of the kernels and global addresses. This is doable, but would take a bit of refactoring and would be a slightly less intuitive API for common use-cases. Let us know if you think something like this would be of value.
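At the driver level, the compile-once/extract-many workflow described above might look roughly like the sketch below. The NVRTC and CUDA driver calls (`nvrtcAddNameExpression`, `nvrtcGetLoweredName`, `cuModuleGetFunction`, `cuModuleGetGlobal`, etc.) are real APIs, but the kernel and symbol names are hypothetical and error checking is omitted:

```cpp
// Sketch: compile one program once, then pull several kernels and a
// global out of the *same* CUmodule, so they share device symbols.
#include <cuda.h>
#include <nvrtc.h>
#include <vector>

void single_module_sketch(const char* source) {
  nvrtcProgram prog;
  nvrtcCreateProgram(&prog, source, "prog.cu", 0, nullptr, nullptr);

  // Register every template instantiation *before* compiling.
  nvrtcAddNameExpression(prog, "my_kernel_a<float>");
  nvrtcAddNameExpression(prog, "my_kernel_b<float>");

  nvrtcCompileProgram(prog, 0, nullptr);

  size_t ptx_size;
  nvrtcGetPTXSize(prog, &ptx_size);
  std::vector<char> ptx(ptx_size);
  nvrtcGetPTX(prog, ptx.data());

  // One module for the whole program...
  CUmodule module;
  cuModuleLoadData(&module, ptx.data());

  // ...from which all kernels and globals are extracted, so they
  // genuinely share __constant__/__device__ definitions.
  const char *lowered_a, *lowered_b;
  nvrtcGetLoweredName(prog, "my_kernel_a<float>", &lowered_a);
  nvrtcGetLoweredName(prog, "my_kernel_b<float>", &lowered_b);

  CUfunction kernel_a, kernel_b;
  cuModuleGetFunction(&kernel_a, module, lowered_a);
  cuModuleGetFunction(&kernel_b, module, lowered_b);

  CUdeviceptr const_addr;
  size_t const_bytes;
  cuModuleGetGlobal(&const_addr, &const_bytes, module, "my_constant");

  nvrtcDestroyProgram(&prog);
}
```

This is essentially the shape the proposed API would need to wrap: the key constraint is that all name expressions must be registered before the single `nvrtcCompileProgram` call.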
Thanks for the reply @benbarsdell. This is certainly an issue for us, particularly when it comes to constant memory. We have a number of large constant and statically sized device symbols which we can compile within the same unit, but which need to be accessed by separate kernels in that compilation unit. Your suggestion would be very helpful for our use case, but also for any use case where there are multiple kernels in the same compilation unit. Would it not be possible to simply change the internals so that the cuModule was created by the program and shared with each kernel object? We can work around the device symbols, but I can't see a clear way to work around our use of constant memory. Although I am unclear if the constant memory limitations are per module/context/device.
For the constants, could this be a good use of jitify's new-found linking ability: declare the …
@maddyscientist Yes, this might work so long as you can link multiple kernels against the same module (containing the constant definition). Presumably this is fine as they are in the same context?
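For reference, the driver-level linking flow looks like the sketch below (real `cuLink*` calls, hypothetical inputs). The important point is that each `cuLinkComplete` produces one cubin, which loads into one new CUmodule, so if every kernel instantiation performs its own link against the PTX containing the `__constant__` definition, each resulting module still gets its own copy of that constant:

```cpp
// Sketch (CUDA driver API): one link step -> one cubin -> one CUmodule.
// kernel_ptx / constants_ptx and their sizes are assumed inputs.
CUlinkState link;
cuLinkCreate(0, nullptr, nullptr, &link);
cuLinkAddData(link, CU_JIT_INPUT_PTX, kernel_ptx, kernel_ptx_size,
              "kernel.ptx", 0, nullptr, nullptr);
cuLinkAddData(link, CU_JIT_INPUT_PTX, constants_ptx, constants_ptx_size,
              "constants.ptx", 0, nullptr, nullptr);

void* cubin;
size_t cubin_size;
cuLinkComplete(link, &cubin, &cubin_size);

CUmodule module;                   // a *new* module per link step
cuModuleLoadData(&module, cubin);  // its own copy of every __constant__
cuLinkDestroy(link);
```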
I think linking will have the same issue because there will still be multiple modules, unless I'm misunderstanding.
The problem is that we currently have:
but what we would need is (roughly speaking):
In particular, the call to …
@benbarsdell Yes, I imagine that you are right: after linking there would be multiple modules with duplicate definitions of the constant. Setting the constant value would then require doing so for each instantiation. I see now how this would be a significant change (but one which I would very much support!). Could you support both options? E.g.
Supposedly this would then support things like:
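A purely hypothetical shape for such a dual API (these names are not jitify's real interface, just an illustration of the two options being discussed):

```cpp
// Hypothetical API sketch, not jitify's actual interface.

// Option 1: today's per-kernel path (one module per instantiation).
auto kernel = program.kernel("my_kernel").instantiate<float>();

// Option 2: compile once into a shared module, then extract everything.
auto module    = program.compile({"my_kernel_a<float>", "my_kernel_b<float>"});
auto kernel_a  = module.get_kernel("my_kernel_a<float>");
auto kernel_b  = module.get_kernel("my_kernel_b<float>");
auto const_ptr = module.get_global_ptr("my_constant");  // shared by both kernels
```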
Which would solve all of my problems... What I am currently still unclear on is how constant memory is allocated on the device. The following SO question points to the ISA docs, suggesting: "There is an additional 640 KB of constant memory, organized as ten independent 64 KB regions. The driver may allocate and initialize constant buffers in these regions and pass pointers to the buffers as kernel function parameters." Does this mean I could have a maximum of 10 jitify kernels/modules each using 64 KB of constant space, or could I have any number, with some driver magic taking care of mapping these to regions at kernel launch?
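One concrete consequence of per-module `__constant__` copies, regardless of how the hardware banks are arranged: writing a constant through the driver API targets one module's copy only, so with N duplicated modules the write has to be repeated N times. A minimal sketch (real driver calls; `module`, `host_data`, and the symbol name are assumed):

```cpp
// Sketch: a __constant__ symbol lives inside a specific CUmodule, so
// this updates *that module's* copy only. Duplicated modules each need
// their own cuMemcpyHtoD.
CUdeviceptr addr;
size_t bytes;
cuModuleGetGlobal(&addr, &bytes, module, "my_constant");
cuMemcpyHtoD(addr, host_data, bytes);
```

On the bank question: my understanding is that the ten 64 KB regions are an ISA/driver-level detail; at the CUDA C++ level the documented limit is 64 KB of user `__constant__` data, and the driver handles placement, but this is worth confirming against the programming guide rather than taking from this sketch.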
@benbarsdell We have a workaround for this for now, but it would be a nice feature to enable instantiation of multiple kernels from the same module.
I.e. if I create multiple jitify kernels from the same program which share a device symbol, does get_global_ptr return the same address for each?
Would be good to know before I do some refactoring of some code.
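For anyone wanting to check this empirically, a quick test might look like the following sketch. It assumes jitify's kernel instantiations expose `get_global_ptr` roughly as used in this thread; the exact signature should be checked against jitify.hpp:

```cpp
// Sketch: compare the device address of a shared __device__/__constant__
// symbol across two instantiations from the same jitify program.
// (Per the maintainer's reply above, these currently come from separate
// modules, so the addresses are expected to differ.)
#include <cassert>

auto inst_a = program.kernel("kernel_a").instantiate();
auto inst_b = program.kernel("kernel_b").instantiate();

CUdeviceptr ptr_a = inst_a.get_global_ptr("my_symbol");  // assumed signature
CUdeviceptr ptr_b = inst_b.get_global_ptr("my_symbol");

assert(ptr_a != ptr_b);  // distinct modules -> distinct copies of the symbol
```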