🐛 Describe the bug
Hi @hliuca
On NVIDIA, PyTorch supports torch._C._set_sm_carveout_experimental for better compute-comms overlap. This is useful during the backward pass of DDP and the forward/backward passes of FSDP, to ensure there are enough available SMs/CUs for the RCCL comm kernels so they are not blocked by compute kernels that use up all the SMs/CUs.
Furthermore, it is useful for benchmarking the real-world GEMMs that occur in the backward pass, where a GEMM cannot take up all the available SMs/CUs because RCCL comm kernels occupy some of them.
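For reference, usage on the NVIDIA side looks roughly like the sketch below. The exact argument semantics are my assumption, inferred from the cuBLASLt snippet further down: the value passed is the number of SMs/CUs to reserve, which PyTorch subtracts from multiProcessorCount.

```python
import torch

# Assumed semantics: reserve 16 SMs for comm kernels; cuBLASLt GEMMs
# then target (multiProcessorCount - 16) SMs (see snippet below).
torch._C._set_sm_carveout_experimental(16)

a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")
c = a @ b  # compute GEMM, leaving headroom for NCCL/RCCL kernels
```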
Related to #147966
I was looking into implementing this myself, but it seems it isn't as simple as calling hipblasLtMatmulDescSetAttribute: it requires changes to hipblaslt itself, because unlike the cublasLt side, HIPBLASLT_MATMUL_DESC_CU_COUNT_TARGET is not among the hipblasLtMatmulDescAttributes_t enum values accepted by hipblasLtMatmulDescSetAttribute, at least according to the AMD docs. For comparison, the CUDA path in PyTorch sets the target like this:
```cpp
// CUDA path in PyTorch (the enclosing if-guard is restored here for context):
if (at::globalContext()._SMCarveout_EXPERIMENTAL().has_value()) {
  computeDesc.setAttribute<int32_t>(
      CUBLASLT_MATMUL_DESC_SM_COUNT_TARGET,
      at::cuda::getCurrentDeviceProperties()->multiProcessorCount -
          at::globalContext()._SMCarveout_EXPERIMENTAL().value());
}
```
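A quick way to check whether the carveout is actually honored on ROCm would be to time a large GEMM with and without it; on current ROCm builds I'd expect no measurable difference, since there is no CU-count-target attribute for PyTorch to set through hipBLASLt. A rough sketch (the binding name is taken from the NVIDIA path above; the harness itself is hypothetical):

```python
import torch

def time_gemm(n=8192, iters=20):
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per GEMM

baseline = time_gemm()
torch._C._set_sm_carveout_experimental(16)  # assumed: reserve 16 SMs/CUs
carved = time_gemm()
print(f"baseline {baseline:.2f} ms, with carveout {carved:.2f} ms")
```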
Versions
Any ROCm torch version.
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd