Question: Linking RTC code with executing program #46
If the executing program has been compiled with -rdc=true, is it possible to link RTC-compiled code with it so that the RTC code has access to shared symbols (__device__, __constant__)?

A small example of what I mean is here (tested in Visual Studio 2015):
https://gist.github.com/Robadob/b2b7704a36e2f679942e854a0f41082a

I've found that cuLinkAddFile() would need to be used. However, it's not clear how that could be used to link with the executing program (and whether also building the executing program to a fatbin or similar for linking would be useful).
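(For context, a minimal sketch of the setup being asked about, with hypothetical names: the executing program defines device-side symbols, and the runtime-compiled source declares them extern in the hope that JIT linking resolves them against the host program's device code.)

```cpp
// Host-side .cu file, built into the executing program with: nvcc -rdc=true ...
__constant__ float scale_factor;  // symbol the RTC code would like to share
__device__   int   item_count;    // likewise

// Source string handed to NVRTC/Jitify at runtime.
const char* rtc_source = R"rtc(
extern __constant__ float scale_factor;  // hoped to resolve to the host program's symbol
extern __device__   int   item_count;

__global__ void scale(float* data) {
    data[threadIdx.x] *= scale_factor;
}
)rtc";
```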
Comments

Have just found […]. I've modified […]. On passing the 3 […]. This is not documented as a possible error from that method, so I'm not sure if their lack of support is undocumented or I've done something incorrectly. I've tested this using Windows, Visual Studio 2015 & 2017, CUDA 10.2, and a Titan X Pascal (building for […]). My changes to the above example, and the modified Jitify header, can be found here […]; potentially of note, I forked the header from the […].
Unfortunately it is not possible for runtime code to gain access to symbols in the host program. Jitify does support two related things that may be useful: […]
I'm not very familiar with the CU_JIT_GLOBAL_SYMBOL flags, but it looks like they're used to make extern symbols in the runtime code use host addresses instead of looking for device symbols during linking. Is there a reason this is useful for your application versus, say, using non-extern symbols in your runtime code and accessing them via get_constant_ptr?
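(For reference, a minimal sketch of the non-extern approach suggested above, written against the raw driver API rather than Jitify's wrapper; the symbol name and helper are hypothetical, and get_constant_ptr presumably does something similar internally.)

```cpp
#include <cuda.h>

// Write a host value into a __constant__ symbol *defined in the runtime-compiled source*
// (not in the host program). 'module' is the CUmodule loaded from the JIT-compiled code.
void set_scale_factor(CUmodule module, float value) {
    CUdeviceptr sym_ptr  = 0;
    size_t      sym_size = 0;
    cuModuleGetGlobal(&sym_ptr, &sym_size, module, "scale_factor");  // error checks omitted
    cuMemcpyHtoD(sym_ptr, &value, sizeof(value));
}
```

(One caveat: if the runtime source is compiled as C++, the symbol name may be mangled; NVRTC's nvrtcAddNameExpression/nvrtcGetLoweredName should help recover the lowered name.)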
I'm working with @mondus. We're developing a new framework for complex system simulations that abstracts away most of the CUDA (FLAMEGPU2, to improve on the usability of FLAMEGPU). He's also interested in developing a runtime-compiled Python interface to the library, to make it more accessible, which is where this comes in. (I'm not sure if he told you this much the other day.)
Several common reused items/pointers are stored behind […]. One potential downside of this would be that duplicating symbols would inflate constant cache usage, as we're using a large portion of constant cache for a read-only hash table central to accessing variables within models. For potentially large models with many RTC device functions, this could lead to capacity issues.

Edit: Given we'd be duplicating part or all of that hash table, this could actually fill up constant cache very fast. (The hash table is currently given half the constant cache, simply because it can't be scaled at runtime. We'd presumably need to do some parsing of the RTC code to adapt the size per kernel. This obviously requires some more thought on our part.)

Edit 2: I suppose we could actually not have the hash table in the main runtime (for a special Python/RTC build), and cat all of the RTC code together so it builds as a single unit and we're not duplicating constant memory!
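(A rough sketch of the Edit 2 idea, assuming the per-function runtime sources are already available as strings; the helper name, header variable, and architecture flag are hypothetical. Everything is concatenated and compiled as one NVRTC program, so there is only one module and therefore one copy of any __constant__ data.)

```cpp
#include <nvrtc.h>
#include <string>
#include <vector>

// Compile all runtime device code as a single translation unit (hypothetical helper).
std::string compile_combined(const std::vector<std::string>& user_sources,
                             const std::string& shared_header_src) {
    // Shared header (containing the __constant__ hash table) + every user function.
    std::string combined = shared_header_src;
    for (const auto& src : user_sources) combined += "\n" + src;

    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, combined.c_str(), "combined_rtc.cu", 0, nullptr, nullptr);
    const char* opts[] = {"--gpu-architecture=compute_61"};  // arch is an assumption
    nvrtcCompileProgram(prog, 1, opts);                      // log/error handling omitted

    size_t ptx_size = 0;
    nvrtcGetPTXSize(prog, &ptx_size);
    std::string ptx(ptx_size, '\0');
    nvrtcGetPTX(prog, &ptx[0]);
    nvrtcDestroyProgram(&prog);
    return ptx;  // load once with cuModuleLoadData (or hand to Jitify)
}
```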
At this point I'm not sure how viable it would be to load the entire library from PTX at runtime, as from my understanding that would require changing large parts of the codebase that interact with CUDA. I'm assuming that if we built the library a second time to PTX/cubin and linked against that, it wouldn't also link against the runtime.

Edit 3: I suppose the more technically adept approach would be to combine your suggestions, and build a user's RTC kernels separately and link them, to achieve something similar to Edit 2 above. Thanks for the advice.
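(And a hedged sketch of what Edit 3 might look like with the driver's JIT linker: user RTC kernels compiled to relocatable PTX and linked against a prebuilt fatbin of the library's device code. The file names and the fatbin workflow are assumptions, not something taken from Jitify's documentation.)

```cpp
#include <cuda.h>
#include <cstddef>

// Link runtime-compiled PTX against prebuilt device code from the library (sketch).
CUmodule link_rtc_with_library(const char* rtc_ptx, size_t rtc_ptx_len /* strlen + 1 */) {
    CUlinkState link_state;
    cuLinkCreate(0, nullptr, nullptr, &link_state);  // error checking omitted throughout

    // PTX must have been compiled with relocatable device code,
    // e.g. the NVRTC option "--relocatable-device-code=true".
    cuLinkAddData(link_state, CU_JIT_INPUT_PTX,
                  const_cast<char*>(rtc_ptx), rtc_ptx_len,
                  "user_rtc_kernel.ptx", 0, nullptr, nullptr);

    // Prebuilt device code for the library, e.g. produced with nvcc -rdc=true -fatbin.
    cuLinkAddFile(link_state, CU_JIT_INPUT_FATBINARY,
                  "flamegpu_device.fatbin", 0, nullptr, nullptr);

    void*  cubin      = nullptr;
    size_t cubin_size = 0;
    cuLinkComplete(link_state, &cubin, &cubin_size);

    CUmodule module;
    cuModuleLoadData(&module, cubin);  // cubin is owned by link_state, so load before destroy
    cuLinkDestroy(link_state);
    return module;
}
```

As noted above, the resulting module still gets its own copies of any __device__/__constant__ data; it does not share them with the executing program.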
@benbarsdell It seems too minor an issue for its own ticket (and it's a deprecated IDE).