Remove macro expansion and replace uses with FE typing + BE lowering #5465
Conversation
…because of the way module globals are treated as constants.
A simple demo:

```
from numba import cuda
import sys
import numpy as np


@cuda.jit
def k(x):
    i = cuda.grid(1)
    tid = cuda.threadIdx
    x[i] = tid.x


x = np.zeros(32, dtype=np.int32)
k[1, 32](x)
print(x)
```

prints:

```
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
```

There is no change in the final PTX for this simple example.
This is consistent with Numba master at present; in CUDA C/C++, however, these values are unsigned. Making them unsigned seems to cause issues in reduce kernels, which requires further investigation.
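As a hypothetical illustration of the hazard (not code from this PR): fixed-width unsigned arithmetic wraps around instead of going negative, so a reduction-style guard written with a signed mental model can silently never fail. NumPy's fixed-width unsigned scalars show the same behaviour:

```
import numpy as np

tid = np.uint32(0)
s = np.uint32(16)

# 0 - 16 wraps around to 4294967280 rather than producing -16...
print(tid - s)

# ...so a guard like this is always True for unsigned values and never
# catches the "went below zero" case.
print(tid - s >= np.uint32(0))
```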
The test_set_registers_* tests were a little too strict in what they checked: the max_registers option should limit the number of registers used, but it does not guarantee that as many registers as the limit are used. When linking for some devices, fewer than the maximum are used. To remedy this, the tests now check that the number of registers used is less than or equal to the maximum. An additional test is added to ensure that the maximum register count would otherwise have been exceeded.
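A minimal sketch of the relaxed check, assuming the documented max_registers option and a get_regs_per_thread()-style accessor (how the register count is read varies across Numba versions):

```
from numba import cuda
import numpy as np

LIMIT = 57  # illustrative cap, not necessarily the value used in the tests


@cuda.jit(max_registers=LIMIT)
def kernel(x):
    i = cuda.grid(1)
    if i < x.size:
        x[i] += 1


x = np.zeros(32, dtype=np.float64)
kernel[1, 32](x)

# max_registers bounds usage but does not force the compiler to use the
# whole budget, hence <= rather than ==.
assert kernel.get_regs_per_thread() <= LIMIT
```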
Also, it looks like there's no use for the […]. Any thoughts on the removal of […]?
@gmarkall thanks for submitting this, I have marked it as in-progress for now.
@esc Thanks - I'm going to split it in two as Siu suggested, so that the CUDA changes can go in without being blocked on testing the one change to the ROC target. I'll keep this PR for the removal of all macro expansion code, and make a new one with just CUDA.
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
A CUDA-only version of this PR is in #5481, which is ready for review; once that is approved and/or merged, I will amend this PR accordingly.
Per recent discussions (in the dev meeting, IIRC) it's neither urgent nor likely that this will get into 0.51, so I've bumped the milestone to 0.52.
Thanks @gmarkall
This is modelled on the implementation in the CUDA target, which should work similarly.
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
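For anyone unfamiliar with the pattern, here is a runnable CPU-target sketch of the general FE-typing + BE-lowering idea using numba.extending.intrinsic (the names are illustrative, not those used in this patch): the frontend resolves a signature and the backend emits IR, with no source rewriting in between, which is what replaces the old macro expansion.

```
from numba import njit
from numba.core import types
from numba.extending import intrinsic


@intrinsic
def always_forty_two(typingctx):
    # FE typing: declare what the operation returns.
    sig = types.int32()

    # BE lowering: emit the IR for the operation when the call is compiled.
    def codegen(context, builder, signature, args):
        return context.get_constant(types.int32, 42)

    return sig, codegen


@njit
def f():
    return always_forty_two()


print(f())  # 42
```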
For 5daa538 the things I'd expect to work ok on ROCm in light of these changes still work, so I think it's ok on that hardware target. The patch itself looks good to me; @sklam, did you want to take another look? If not, this can go through the build farm and be merged if it passes. Thanks again for the large refactoring effort, very pleased to see […]
It will just need to be tested on real ROC hardware in the farm to pick up anything else we missed by eyeballing.
I've run this on real ROC hardware locally; the things I'd expect to work ok are working.
Thanks for working on this and for all the fixes etc., especially for having to guess at behaviour on non-local hardware! Looks good!
BFID: numba_smoketest_cuda_98
Buildfarm passed.
As title. This is a WIP PR, to test on CI and to test the change to the ROC backend (I don't have a device to test on). It will need a bit of tidying up and refactoring before I remove the WIP tag.
Also includes the fix for #5408, for testing convenience.
@stuartarchibald are you able to try this on a ROC device please?