[CIR][CUDA] Fix CUDA host compilation on kernel launch #1906
This PR implements some missing pieces that allow us to launch kernels from the host. All of the tests listed in this commit now pass.
I spent half a day figuring out the following:
I tried experimenting with host compilation (-fcuda-is-device) with the target triple nvptx64-nvidia-cuda, but was getting a module verification error that, to keep it simple, looked like:
error: 'cir.call' op calling convention mismatch: expected ptx_kernel, but provided c
I thought that was expected, given that we're essentially using the device to compile on the host, which doesn't make a lot of sense, until I tried to replicate the same in OG and didn't run into any problem in that regard. Are calling conventions enforced much more strictly in CIR than in OG? Or is that simply a bug in OG?
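For readers without the tests at hand, here is a minimal sketch of the kind of source that exercises a host-side kernel launch. It is not one of the tests from this PR: the kernel name `scale`, the buffer size, and the launch configuration are made up for illustration. The `<<<...>>>` launch in `main` is the host-side call site, and the `__global__` function is the one that uses the PTX kernel calling convention on the device side.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiplies each element by `factor`.
// As a __global__ function it is a kernel entry point on the device side.
__global__ void scale(float *data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    data[i] *= factor;
}

int main() {
  const int n = 256;
  float host[n];
  for (int i = 0; i < n; ++i)
    host[i] = 1.0f;

  // Allocate device memory and copy the input over.
  float *dev = nullptr;
  if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess)
    return 1;
  cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

  // Host-side kernel launch: this is the construct the host compilation
  // has to lower into a call to the runtime launch machinery.
  const int threads = 128;
  const int blocks = (n + threads - 1) / threads;
  scale<<<blocks, threads>>>(dev, 2.0f, n);

  // Wait for the kernel and pick up any launch or execution error.
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::printf("kernel failed: %s\n", cudaGetErrorString(err));
    cudaFree(dev);
    return 1;
  }

  cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(dev);

  std::printf("host[0] = %f\n", host[0]);
  return 0;
}
```

Compiling a file like this once for the host and once with -fcuda-is-device for nvptx64-nvidia-cuda gives the two views of the same launch that the question above is about.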