
Enable setting of kernel attributes via cuFuncSetAttribute #7897

Open
gmarkall opened this issue Mar 9, 2022 · 0 comments

Labels: CUDA (CUDA related issue/PR), feature_request

gmarkall commented Mar 9, 2022

Presently we support getting some attributes of a kernel, e.g. via Function.read_func_attr_all:

    def read_func_attr_all(self):
        nregs = self.read_func_attr(enums.CU_FUNC_ATTRIBUTE_NUM_REGS)
        cmem = self.read_func_attr(enums.CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES)
        lmem = self.read_func_attr(enums.CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES)
        smem = self.read_func_attr(enums.CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES)
        maxtpb = self.read_func_attr(
            enums.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK)
        return FuncAttr(regs=nregs, const=cmem, local=lmem, shared=smem,
                        maxthreads=maxtpb)

However, we don't provide a way to set function attributes. This would be useful, for example, for setting the shared memory carveout (an evolution of the functionality provided by cache_config() that allows finer-grained control over the level of carveout; a sketch of a possible setter follows the quote below). From the Ampere Tuning Guide:

In the NVIDIA Ampere GPU architecture, the portion of the L1 cache dedicated to shared memory (known as the carveout) can be selected at runtime as in previous architectures such as Volta, using cudaFuncSetAttribute() with the attribute cudaFuncAttributePreferredSharedMemoryCarveout. The NVIDIA A100 GPU supports shared memory capacity of 0, 8, 16, 32, 64, 100, 132 or 164 KB per SM. GPUs with compute capability 8.6 support shared memory capacity of 0, 8, 16, 32, 64 or 100 KB per SM.
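A minimal sketch of what the setter could look like, mirroring read_func_attr above. The name set_func_attr is hypothetical; it assumes the driver binding exposes cuFuncSetAttribute (available since CUDA 9.0), which takes the function handle, the attribute enum, and an int value:

    def set_func_attr(self, attrid, value):
        # Hypothetical counterpart to read_func_attr. Note that only a
        # few attributes are settable via cuFuncSetAttribute, notably
        # CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES and
        # CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT.
        driver.cuFuncSetAttribute(self.handle, attrid, value)

With that in place, requesting a carveout might look like the following (assuming the relevant enum values are exposed in enums; note the value is a percentage of the maximum shared memory capacity, not a size in KB):

    # Hint that the L1/shared split should maximally favour shared memory;
    # CU_SHAREDMEM_CARVEOUT_DEFAULT (-1) would restore the default split.
    func.set_func_attr(
        enums.CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT, 100)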

(cc @gavinpotter)
