Turns off fabric for non-cuda0 multi-GPU runs to avoid mGPU errors in USDRT #4959
kellyguo11 merged 3 commits into isaac-sim:develop
Greptile Summary
This PR fixes multi-GPU hangs in Isaac Lab by disabling Fabric mode for any device that is not `cuda` or `cuda:0`.
Confidence Score: 2/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[XFormPrimView.__init__] --> B{fabricEnabled in settings?}
B -- No --> C[_use_fabric = False\nUSD path]
B -- Yes --> D{device == 'cpu'?}
D -- Yes --> E[_use_fabric = False\nWarning logged]
D -- No --> F{device NOT in\n'cuda' or 'cuda:0'?}
F -- Yes\ncuda:1, cuda:2, etc. --> G[_use_fabric = False\nWarning logged\nFalls back to USD]
F -- No\ncuda or cuda:0 --> H[_use_fabric = True\nFabric path enabled]
H --> I[_initialize_fabric called lazily]
I --> J[Normalize fabric_device\nto 'cuda:0']
J --> K[SelectPrims on cuda:0]
K --> L[_view_to_fabric on fabric_device\n_fabric_to_view fabricarray]
L --> M[wp.launch kernels\ndevice=fabric_device]
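
The decision branches in the flowchart above can be sketched in plain Python. This is a minimal illustration, not the code from the PR: the helper names `should_use_fabric` and `normalize_fabric_device`, the `FABRIC_DEVICES` tuple, and the warning texts are all hypothetical, and the sketch assumes the device is passed as a string such as `"cpu"`, `"cuda"`, or `"cuda:1"`.

```python
import logging

logger = logging.getLogger(__name__)

# Devices on which the flowchart keeps the Fabric path enabled (assumed set).
FABRIC_DEVICES = ("cuda", "cuda:0")


def should_use_fabric(fabric_enabled: bool, device: str) -> bool:
    """Hypothetical helper mirroring the flowchart's branches B, D, and F."""
    if not fabric_enabled:
        # Fabric is disabled in the settings: use the USD path.
        return False
    if device == "cpu":
        logger.warning("Fabric is not used on CPU; falling back to USD.")
        return False
    if device not in FABRIC_DEVICES:
        # e.g. cuda:1, cuda:2 in a multi-GPU run.
        logger.warning("Fabric is disabled on %s; falling back to USD.", device)
        return False
    return True


def normalize_fabric_device(device: str) -> str:
    """Hypothetical helper for step J: map the bare 'cuda' alias to 'cuda:0'."""
    return "cuda:0" if device == "cuda" else device
```

With these assumptions, only the process running on `cuda` or `cuda:0` keeps Fabric enabled in a multi-GPU run, while every other rank takes the USD fallback.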
Description
USDRT select prim currently requires cuda:0. The fix for this will be available in the next Kit version.
For now, we turn off Fabric for non-cuda:0 devices to avoid the USDRT error, which would otherwise cause a hang in multi-GPU runs.
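Type of change
- Bug fix (non-breaking change which fixes an issue)
Checklist
- [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there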