
[WIP] Implement IR-only pipeline for CUDA (was "Pull NativeLowering pass out into pipeline") #6728

Closed
wants to merge 28 commits

Conversation

gmarkall (Member)

No description provided.

Instead, leave it up to the caller - that way, callers that just want
the IR can avoid materializing the module.
@gmarkall gmarkall changed the title [WIP] Pull NativeLowering pass out into pipeline [WIP] Implement IR-only pipeline for CUDA (was "Pull NativeLowering pass out into pipeline") Feb 17, 2021
The NVVM IR version metadata needs to be present in all modules passed
to NVVM. The IR version was only set when kernel wrapper functions were
generated, so device functions never had the IR version added.

This commit remedies this by adding the IR version to all modules in the
CUDA target, rather than relying on the kernel wrapper generation
function to do it.

Fixes numba#6719.
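The fix amounts to attaching the `nvvmir.version` named metadata to every module the CUDA target creates, not just kernel wrappers. A minimal sketch with llvmlite follows; the helper name and the version numbers are placeholders, since the real values are queried from libNVVM at runtime:

```python
from llvmlite import ir

def add_ir_version(module, version=(1, 2, 2, 0)):
    # Attach the !nvvmir.version named metadata that NVVM expects
    # on every module it compiles (placeholder version numbers).
    i32 = ir.IntType(32)
    md = module.add_metadata([i32(v) for v in version])
    module.add_named_metadata('nvvmir.version').add(md)
    return module

mod = add_ir_version(ir.Module(name='device_module'))
```

Applying this at module-creation time in the target means device-function-only modules carry the metadata too.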
The previous commit caused libnvvm.so to be loaded at import time for
various reasons, some of which could be resolved by reworking imports in
the CUDA target, but the @cuda.jit-decoration of functions in
numba.cuda.intrinsic_wrapper would force the load of libnvvm.so through
an eventual call to compile_device_template. compile_device_template
imports the CUDA target descriptor, which creates an empty module and
must load NVVM to determine its IR version for the metadata addition.

This commit resolves the issue by modifying the CUDA target descriptor
such that it only initializes the typing and target contexts when they
are required - compile_device_template only needs the typing context, so
the target context initialization (and therefore the load of libnvvm.so)
are avoided at import time. The modifications to the descriptor also
bring it into line with the idiom used for the CPU and ufunc targets,
which also construct a single instance of the target class rather than
having class variables for the typing and target contexts.

Unfortunately it is not yet possible to move the imports of the CUDA
target descriptor up to the module level in numba.cuda.compiler, as this
creates a circular import. This may be solvable with more effort :-)
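The lazy-initialization idiom described above can be sketched in plain Python with `functools.cached_property`; the class and attribute names here are illustrative stand-ins, not Numba's actual descriptor API:

```python
from functools import cached_property

class CUDATargetDesc:
    # Hypothetical sketch of a target descriptor that defers
    # expensive context construction (and hence the libnvvm.so
    # load) until a context is first accessed.

    @cached_property
    def typing_context(self):
        return object()  # stands in for a real typing context

    @cached_property
    def target_context(self):
        # In the real target, this is where NVVM would be loaded.
        return object()

desc = CUDATargetDesc()
tc = desc.typing_context  # builds only the typing context
```

With this shape, `compile_device_template` can touch `typing_context` without ever constructing `target_context`, so importing the module no longer forces the NVVM load.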
CUDACodeLibrary should not call CodeLibrary's add_ir_module, as doing
so triggers binding-layer (llvmlite.binding) work.

`inspect_llvm` needed a fix because it was using the finalized IR from
CodeLibrary.
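The shape of the fix can be sketched as a code library that keeps `llvmlite.ir` modules unfinalized and only stringifies them on demand; the class and method bodies below are hypothetical, not the actual CUDACodeLibrary implementation:

```python
class IRCodeLibrary:
    """Hypothetical sketch: store llvmlite.ir modules as-is and
    render IR on demand, never touching llvmlite.binding."""

    def __init__(self, name):
        self.name = name
        self._ir_modules = []

    def add_ir_module(self, mod):
        # Keep the unfinalized IR module; no binding-layer parsing.
        self._ir_modules.append(mod)

    def get_llvm_str(self):
        # What inspect_llvm would show: pre-finalization IR.
        return '\n'.join(str(m) for m in self._ir_modules)

class FakeModule:
    # Stand-in for an llvmlite.ir.Module, which stringifies to IR.
    def __str__(self):
        return 'define void @kernel() { ret void }'

lib = IRCodeLibrary('example')
lib.add_ir_module(FakeModule())
```

Because the library never hands modules to the binding layer, callers that only want IR avoid materializing a compiled module entirely.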
A type annotation is required. The error is:

```
TypeError: can only assign string to GUFunc.__name__, not 'property'
```
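This error can be reproduced in plain Python: CPython only allows a class's `__name__` to be set to a string, so assigning a `property` object raises exactly this TypeError (the `GUFunc` below is an empty stand-in class, not Numba's):

```python
class GUFunc:
    pass

try:
    # A property object is not a str, so the assignment is rejected
    # by CPython's type.__name__ setter.
    GUFunc.__name__ = property(lambda self: "demo")
except TypeError as exc:
    message = str(exc)

print(message)
```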
@@ -11,6 +11,12 @@


class CUDACodeLibrary(CodeLibrary):
gmarkall (Member Author):
Note to self: needs some commentary on the differences between it and other CodeLibrary implementations, in particular how it only handles IR from llvmlite.ir, not entities from llvmlite.binding.

@gmarkall (Member Author)

Closing this now that the work in it has gone into #6731 and #6735.

@gmarkall gmarkall closed this Feb 19, 2021