[WIP] Implement IR-only pipeline for CUDA (was "Pull NativeLowering pass out into pipeline") #6728
Closed
Conversation
Instead, leave it up to the caller - that way, callers that just want the IR can avoid materializing the module.
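The split described in this commit message can be sketched as follows. All names here are invented for illustration and are not Numba APIs; the point is only that the compile step returns IR, and callers that need a materialized module build one themselves:

```python
def compile_to_ir(source):
    # Stand-in for a real lowering step that produces textual IR.
    return f"; IR for {source}"

def materialize(ir_text):
    # Stand-in for parsing IR into a (more expensive) module object;
    # only callers that actually need the module pay this cost.
    return {"ir": ir_text, "materialized": True}

ir = compile_to_ir("kernel")   # IR-only callers stop here
mod = materialize(ir)          # other callers materialize explicitly
```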
…s pass, the global kernel one finds unexpected func_retval
Fixes numba.cuda.tests.cudapy.test_array_args.TestCudaArrayArg.test_array_ary, but breaks the libdevice compile tests.
Problems remaining:
- Atomics
- Exceptions
- Debug info
A little fixup needed for intrinsic_ir in native_lowering. Skip a test where I'm not sure what the eventual intended outcome will be.
gmarkall
changed the title
[WIP] Pull NativeLowering pass out into pipeline
[WIP] Implement IR-only pipeline for CUDA (was "Pull NativeLowering pass out into pipeline")
Feb 17, 2021
The NVVM IR version metadata needs to be present in all modules passed to NVVM. The IR version was only set when kernel wrapper functions were generated, so device functions never had the IR version added. This commit remedies this by adding the IR version to all modules in the CUDA target, rather than relying on the kernel wrapper generation function to do it. Fixes numba#6719.
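A rough illustration of the fix described above, operating on textual IR with placeholder version numbers. The real implementation adds the metadata through llvmlite, and the actual version values come from querying the installed NVVM library, not from a hard-coded pair:

```python
# Placeholder version pair; the real values are obtained by querying
# NVVM at runtime, not hard-coded like this.
NVVM_IR_VERSION = (1, 6)

def add_ir_version(module_text):
    """Append an !nvvmir.version named-metadata entry to a module,
    so every module carries it rather than only kernel wrappers."""
    major, minor = NVVM_IR_VERSION
    return (module_text
            + "\n!nvvmir.version = !{!1}\n"
            + f"!1 = !{{i32 {major}, i32 {minor}}}\n")

module = "define void @f() {\n  ret void\n}"
print(add_ir_version(module))
```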
The previous commit caused libnvvm.so to be loaded at import time for various reasons, some of which could be resolved by reworking imports in the CUDA target. However, the @cuda.jit-decoration of functions in numba.cuda.intrinsic_wrapper would still force the load of libnvvm.so through an eventual call to compile_device_template: compile_device_template imports the CUDA target descriptor, which creates an empty module and needs to load NVVM to determine which version to use for the metadata addition.

This commit resolves the issue by modifying the CUDA target descriptor so that it only initializes the typing and target contexts when they are required. compile_device_template only needs the typing context, so the target context initialization (and therefore the load of libnvvm.so) is avoided at import time. The modifications to the descriptor also bring it into line with the idiom used for the CPU and ufunc targets, which construct a single instance of the target class rather than having class variables for the typing and target contexts.

Unfortunately it is not yet possible to move the imports of the CUDA target descriptor up to the module level in numba.cuda.compiler, as this creates a circular import. This may be solvable with more effort :-)
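The lazy-initialization idiom described in this commit message can be sketched as below. All class names are invented stand-ins (the real descriptor lives in the CUDA target); the point is that neither context is built until first access, so merely importing the descriptor does not load libnvvm.so:

```python
from functools import cached_property

class FakeTypingContext:
    pass

class FakeTargetContext:
    def __init__(self):
        # Stands in for the expensive step that loads libnvvm.so.
        self.nvvm_loaded = True

class CUDATargetDesc:
    @cached_property
    def typing_context(self):
        # Built on first access only.
        return FakeTypingContext()

    @cached_property
    def target_context(self):
        # Callers that only need typing (like compile_device_template)
        # never trigger this, so the NVVM load is deferred.
        return FakeTargetContext()

# A single shared instance, matching the CPU/ufunc-target idiom of one
# instance rather than class variables for the contexts.
cuda_target = CUDATargetDesc()
```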
CUDACodeLibrary should not call CodeLibrary's add_ir_module, because doing so triggers binding-layer work. `inspect_llvm` needed a fix because it was using the finalized IR from CodeLibrary.
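The distinction can be sketched as follows; the class and method bodies are invented for illustration (plain strings stand in for llvmlite.ir modules), showing only the design choice of keeping the original IR and never handing it to llvmlite.binding:

```python
class IROnlyCodeLibrary:
    """Hypothetical sketch: store IR modules as-is and stitch them
    together on demand, instead of parsing them into binding-layer
    (llvmlite.binding) modules as the base CodeLibrary would."""

    def __init__(self, name):
        self._name = name
        self._ir_modules = []  # unfinalized IR, in insertion order

    def add_ir_module(self, mod):
        # Unlike the base implementation, no binding-layer parsing
        # happens here; the module is kept in its original form.
        self._ir_modules.append(mod)

    def get_llvm_str(self):
        # An inspect_llvm-style view built from the original IR,
        # not from finalized binding-layer modules.
        return "\n".join(str(m) for m in self._ir_modules)
```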
A type annotation is required.
Error is:
```
TypeError: can only assign string to GUFunc.__name__, not 'property'
```
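A minimal reproduction of this error, independent of Numba: CPython requires a class's `__name__` to be assigned a str, so code that accidentally copies a property object onto `__name__` fails with exactly this message:

```python
class GUFunc:
    pass

# A property object, e.g. picked up by accident when copying
# attributes from a class instead of an instance.
name_prop = property(lambda self: "gufunc")

try:
    GUFunc.__name__ = name_prop  # not a str -> TypeError
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```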
…ss without skipping
gmarkall
commented
Feb 17, 2021
```
@@ -11,6 +11,12 @@
class CUDACodeLibrary(CodeLibrary):
```
Note to self: needs some commentary on the differences between it and other CodeLibrary implementations, in particular how it only handles IR from llvmlite.ir, not entities from llvmlite.binding.