CUDA: Add accessors for function attributes beyond just register use per thread #6448

@gmarkall

PR #6447 adds a public API to get the maximum number of registers per thread (numba.cuda.Dispatcher.get_regs_per_thread()). There are other attributes that would be nice to provide: shared memory per block, local memory per thread, constant memory usage, and maximum block size.

These are all available in the FuncAttr named tuple: https://github.com/numba/numba/blob/master/numba/cuda/cudadrv/driver.py#L2023, set by _read_func_attr_all: https://github.com/numba/numba/blob/master/numba/cuda/cudadrv/driver.py#L2067
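
As a rough sketch of what the extra accessors might look like, the snippet below wraps a stand-in named tuple and exposes one getter per attribute. It does not import Numba: the `FuncAttr` definition here is a local mock whose field names are assumed for illustration, and `KernelAttributes` and every accessor name other than `get_regs_per_thread` are hypothetical, not an existing API.

```python
from collections import namedtuple

# Local stand-in for the FuncAttr named tuple in driver.py.
# The field names are an assumption made for this sketch.
FuncAttr = namedtuple(
    "FuncAttr", ["regs", "shared", "local", "const", "maxthreads"]
)


class KernelAttributes:
    """Hypothetical holder exposing one accessor per function attribute."""

    def __init__(self, attrs):
        self._attrs = attrs

    def get_regs_per_thread(self):
        # Mirrors the accessor added in PR #6447.
        return self._attrs.regs

    def get_shared_mem_per_block(self):
        # Hypothetical name: shared memory per block, in bytes.
        return self._attrs.shared

    def get_local_mem_per_thread(self):
        # Hypothetical name: local memory per thread, in bytes.
        return self._attrs.local

    def get_const_mem_size(self):
        # Hypothetical name: constant memory usage, in bytes.
        return self._attrs.const

    def get_max_threads_per_block(self):
        # Hypothetical name: maximum block size for this kernel.
        return self._attrs.maxthreads


# Made-up values, purely to exercise the accessors.
example = KernelAttributes(
    FuncAttr(regs=32, shared=1024, local=0, const=64, maxthreads=1024)
)
print(example.get_shared_mem_per_block())
print(example.get_max_threads_per_block())
```

In a real implementation, the values would presumably come from the tuple that _read_func_attr_all sets for a compiled kernel rather than from hand-written numbers.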

Cross ref: adding these would provide some of the info in the ptxas output requested in #3482.


Labels: CUDA, good first issue
