Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove macro expansion and replace uses with FE typing + BE lowering #5465

Merged
merged 25 commits into from Sep 7, 2020

Conversation

gmarkall
Copy link
Member

As title. WIP PR to test on CI, and for testing the change to the ROC backend (I don't have a device to test). This will need a bit of tidying up and refactoring before I remove the WIP tag.

Also includes the fix to #5408, for testing convenience.

@stuartarchibald are you able to try this on a ROC device please?

…ause of the way module globals are treated as constants
A simple demo:

```
from numba import cuda
import sys

import numpy as np

@cuda.jit
def k(x):
    i = cuda.grid(1)
    tid = cuda.threadIdx
    x[i] = tid.x

x = np.zeros(32, dtype=np.int32)

k[1, 32](x)
print(x)
```

prints:

```
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31]
 ```

There is no change in the final PTX for this simple example.
This is consistent with Numba master at present - however, in CUDA C/C++
these values are unsigned. Making them unsigned seems to cause issues in
reduce kernels, which requires investigation.
The test_set_registers_* tests were a little too strict in what they
checked - the max_registers option should limit the number of registers
used, but does not guarantee that as many registers as the limit are
used. When linking for some devices, fewer than the maximum are used.

To remedy this, the tests now check that there are less than or equal to
the max registers used. An additional test is added, to ensure that the
maximum register count would otherwise have been exceeded.
@gmarkall
Copy link
Member Author

Also, it looks like there's no use for the ir.Intrinsic node anymore - https://github.com/numba/numba/blob/master/numba/core/ir.py#L1025 - it seemed to only be used in macro expansion. There is numba.core.extending._Intrinsic ( https://github.com/numba/numba/blob/master/numba/core/extending.py#L258 ) but I think it is separate / different.

Any thoughts on the removal of ir.Intrinsic along with the macro expansion?

@esc
Copy link
Member

esc commented Apr 1, 2020

@gmarkall thanks for submitting this, I have marked it as in-progress for now.

@esc esc added this to the Numba 0.50RC milestone Apr 1, 2020
@gmarkall
Copy link
Member Author

gmarkall commented Apr 1, 2020

@esc Thanks - I'm going to split it in two as Siu suggested so that the CUDA changes can go in without needing to be blocked on testing the one change to the ROC target. I'll keep this PR for the removal of all macro expansion code, and make a new one with just CUDA.

@stuartarchibald
Copy link
Contributor

/AzurePipelines run

@azure-pipelines
Copy link
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@gmarkall
Copy link
Member Author

gmarkall commented Apr 2, 2020

A CUDA-only version of this PR is in #5481, which is ready for review - once that is approved and/or merged, I will amend this PR accordingly.

@sklam sklam added 4 - Waiting on author Waiting for author to respond to review ROC ROC related issue/PR CUDA CUDA related issue/PR and removed 3 - Ready for Review labels Jul 3, 2020
@gmarkall gmarkall modified the milestones: Numba 0.51 RC, Numba 0.52 RC Jul 24, 2020
@gmarkall
Copy link
Member Author

Per recent discussions (in the dev meeting, IIRC) it's not urgent or likely to get this into 0.51, so I've bumped the milestone to 0.52.

@stuartarchibald
Copy link
Contributor

Per recent discussions (in the dev meeting, IIRC) it's not urgent or likely to get this into 0.51, so I've bumped the milestone to 0.52.

Thanks @gmarkall

This is modelled on the implementation in the CUDA target, which should
work similarly.
gmarkall and others added 2 commits September 2, 2020 16:40
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
@stuartarchibald
Copy link
Contributor

For 5daa538 the things I'd expect to work ok on ROCm in light of these changes still work, so I think it's ok on that hardware target. The patch itself looks good to me, @sklam did you want to take another look? If not, this can go through the build farm and be merged if it passes. Thanks again for the large refactoring effort, very pleased to see Macro removed!

@stuartarchibald stuartarchibald added the Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm label Sep 2, 2020
@sklam
Copy link
Member

sklam commented Sep 2, 2020

It will just need to be tested on real ROC hw in the farm to pick up anything else we missed by eyeballing.

@stuartarchibald
Copy link
Contributor

It will just need to be tested on real ROC hw in the farm to pick up anything else we missed by eyeballing.

I've run this on real ROC hw locally, the things I'd expect to work ok are working.

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this and for all the fixes etc, especially guessing for non-local hardware! Looks good!

@stuartarchibald stuartarchibald added 4 - Waiting on CI Review etc done, waiting for CI to finish and removed 4 - Waiting on author Waiting for author to respond to review labels Sep 2, 2020
@esc
Copy link
Member

esc commented Sep 3, 2020

BFID: numba_smoketest_cuda_98

@esc
Copy link
Member

esc commented Sep 3, 2020

Buildfarm passed.

@esc esc added BuildFarm Passed For PRs that have been through the buildfarm and passed and removed Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm labels Sep 3, 2020
@stuartarchibald stuartarchibald added 5 - Ready to merge Review and testing done, is ready to merge and removed 4 - Waiting on CI Review etc done, waiting for CI to finish labels Sep 7, 2020
@stuartarchibald stuartarchibald merged commit f88c3c5 into numba:master Sep 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed CUDA CUDA related issue/PR ROC ROC related issue/PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants