@msimberg
Collaborator

This is a starting point for docs on configuring Slurm for GH200. It covers running a single rank and multiple ranks per GPU. It mentions the default process mode, which can be a big footgun, and recommends using NVIDIA's Multi-Process Service (MPS) when sharing a GPU between ranks. The MPS wrapper script is the one from the current knowledge base.
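
For readers unfamiliar with the multiple-ranks-per-GPU case, here is a minimal sketch of how a node-local rank could be mapped to a GPU index. It is not the MPS wrapper script from the knowledge base. The function name `gpu_for_local_rank` is hypothetical, and the default of 4 GPUs per node and the contiguous block assignment of ranks to GPUs are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: map a node-local rank to a GPU index.
# Assumptions: ranks are assigned to GPUs in contiguous blocks, and a node
# has 4 GPUs unless a third argument says otherwise.
gpu_for_local_rank() {
    local local_rank=$1        # e.g. $SLURM_LOCALID
    local ranks_per_node=$2    # e.g. $SLURM_NTASKS_PER_NODE
    local gpus_per_node=${3:-4}
    # Round up so that every rank gets a valid GPU index.
    local ranks_per_gpu=$(( (ranks_per_node + gpus_per_node - 1) / gpus_per_node ))
    echo $(( local_rank / ranks_per_gpu ))
}

# Possible use inside a wrapper script launched by srun (assumed setup):
#   export CUDA_VISIBLE_DEVICES=$(gpu_for_local_rank "$SLURM_LOCALID" "$SLURM_NTASKS_PER_NODE")
#   exec "$@"
```

With 8 ranks per node this places ranks 0 and 1 on GPU 0, ranks 2 and 3 on GPU 1, and so on. Whether block or round-robin assignment is preferable depends on how ranks are bound to CPU cores.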

I expect some refactoring might be useful if/when there are MI300 docs for Slurm, as some of the content might be similar, but I haven't attempted to put that in a generic section yet.

@msimberg msimberg requested a review from bcumming February 17, 2025 12:53
@msimberg msimberg force-pushed the slurm-gh200 branch 2 times, most recently from 7922cd8 to 3211a0b Compare February 17, 2025 16:21
@github-actions

preview available: https://docs.tds.cscs.ch/12

@msimberg msimberg requested a review from RMeli February 19, 2025 10:20
@bcumming bcumming merged commit a2be996 into main Feb 20, 2025
1 check passed
@RMeli RMeli deleted the slurm-gh200 branch February 20, 2025 08:37
@RMeli RMeli mentioned this pull request Feb 25, 2025