[WIP] Implement IR-only pipeline for CUDA (was "Pull NativeLowering pass out into pipeline") #6728
Closed
Conversation
Instead, leave it up to the caller - that way, callers that just want the IR can avoid materializing the module.
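The split described in this commit message can be sketched as follows. All names here are invented for illustration and are not Numba APIs; the point is only that the compile step returns IR, and callers that need a materialized module build one themselves:

```python
def compile_to_ir(source):
    # Stand-in for a real lowering step that produces textual IR.
    return f"; IR for {source}"

def materialize(ir_text):
    # Stand-in for parsing IR into a (more expensive) module object;
    # only callers that actually need the module pay this cost.
    return {"ir": ir_text, "materialized": True}

ir = compile_to_ir("kernel")   # IR-only callers stop here
mod = materialize(ir)          # other callers materialize explicitly
```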
…s pass, the global kernel one finds unexpected func_retval
Fixes numba.cuda.tests.cudapy.test_array_args.TestCudaArrayArg.test_array_ary, but breaks the libdevice compile tests.
Problems remaining:
- Atomics
- Exceptions
- Debug info
A little fixup needed for intrinsic_ir in native_lowering. Skip a test where I'm not sure what the eventual intended outcome will be.
gmarkall
changed the title
[WIP] Pull NativeLowering pass out into pipeline
[WIP] Implement IR-only pipeline for CUDA (was "Pull NativeLowering pass out into pipeline")
Feb 17, 2021
The NVVM IR version metadata needs to be present in all modules passed to NVVM. The IR version was only set when kernel wrapper functions were generated, so device functions never had the IR version added. This commit remedies this by adding the IR version to all modules in the CUDA target, rather than relying on the kernel wrapper generation function to do it. Fixes numba#6719.
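A rough illustration of the fix described above, operating on textual IR with placeholder version numbers. The real implementation adds the metadata through llvmlite, and the actual version values come from querying the installed NVVM library, not from a hard-coded pair:

```python
# Placeholder version pair; the real values are obtained by querying
# NVVM at runtime, not hard-coded like this.
NVVM_IR_VERSION = (1, 6)

def add_ir_version(module_text):
    """Append an !nvvmir.version named-metadata entry to a module,
    so every module carries it rather than only kernel wrappers."""
    major, minor = NVVM_IR_VERSION
    return (module_text
            + "\n!nvvmir.version = !{!1}\n"
            + f"!1 = !{{i32 {major}, i32 {minor}}}\n")

module = "define void @f() {\n  ret void\n}"
print(add_ir_version(module))
```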
The previous commit caused libnvvm.so to be loaded at import time for various reasons, some of which could be resolved by reworking imports in the CUDA target. However, the @cuda.jit-decoration of functions in numba.cuda.intrinsic_wrapper would still force the load of libnvvm.so through an eventual call to compile_device_template: compile_device_template imports the CUDA target descriptor, which creates an empty module and needs to load NVVM to determine which version to use for the metadata addition.

This commit resolves the issue by modifying the CUDA target descriptor so that it only initializes the typing and target contexts when they are required. compile_device_template only needs the typing context, so the target context initialization (and therefore the load of libnvvm.so) is avoided at import time. The modifications to the descriptor also bring it into line with the idiom used for the CPU and ufunc targets, which construct a single instance of the target class rather than having class variables for the typing and target contexts.

Unfortunately it is not yet possible to move the imports of the CUDA target descriptor up to the module level in numba.cuda.compiler, as this creates a circular import. This may be solvable with more effort :-)
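The lazy-initialization idiom described in this commit message can be sketched as below. All class names are invented stand-ins (the real descriptor lives in the CUDA target); the point is that neither context is built until first access, so merely importing the descriptor does not load libnvvm.so:

```python
from functools import cached_property

class FakeTypingContext:
    pass

class FakeTargetContext:
    def __init__(self):
        # Stands in for the expensive step that loads libnvvm.so.
        self.nvvm_loaded = True

class CUDATargetDesc:
    @cached_property
    def typing_context(self):
        # Built on first access only.
        return FakeTypingContext()

    @cached_property
    def target_context(self):
        # Callers that only need typing (like compile_device_template)
        # never trigger this, so the NVVM load is deferred.
        return FakeTargetContext()

# A single shared instance, matching the CPU/ufunc-target idiom of one
# instance rather than class variables for the contexts.
cuda_target = CUDATargetDesc()
```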
CUDACodeLibrary should not call CodeLibrary's add_ir_module, because doing so triggers binding-layer work. `inspect_llvm` needed a fix because it was using the finalized IR from CodeLibrary.
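The distinction can be sketched as follows; the class and method bodies are invented for illustration (plain strings stand in for llvmlite.ir modules), showing only the design choice of keeping the original IR and never handing it to llvmlite.binding:

```python
class IROnlyCodeLibrary:
    """Hypothetical sketch: store IR modules as-is and stitch them
    together on demand, instead of parsing them into binding-layer
    (llvmlite.binding) modules as the base CodeLibrary would."""

    def __init__(self, name):
        self._name = name
        self._ir_modules = []  # unfinalized IR, in insertion order

    def add_ir_module(self, mod):
        # Unlike the base implementation, no binding-layer parsing
        # happens here; the module is kept in its original form.
        self._ir_modules.append(mod)

    def get_llvm_str(self):
        # An inspect_llvm-style view built from the original IR,
        # not from finalized binding-layer modules.
        return "\n".join(str(m) for m in self._ir_modules)
```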
A type annotation is required.
Error is:
```
TypeError: can only assign string to GUFunc.__name__, not 'property'
```
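A minimal reproduction of this error, independent of Numba: CPython requires a class's `__name__` to be assigned a str, so code that accidentally copies a property object onto `__name__` fails with exactly this message:

```python
class GUFunc:
    pass

# A property object, e.g. picked up by accident when copying
# attributes from a class instead of an instance.
name_prop = property(lambda self: "gufunc")

try:
    GUFunc.__name__ = name_prop  # not a str -> TypeError
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```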
…ss without skipping
gmarkall
commented
Feb 17, 2021
```
@@ -11,6 +11,12 @@
class CUDACodeLibrary(CodeLibrary):
```
Note to self: needs some commentary on the differences between it and other CodeLibrary implementations, in particular how it only handles IR from llvmlite.ir, not entities from llvmlite.binding.