
Can I limit the distributed GPUs? #8

Closed
bermeitinger-b opened this issue Oct 21, 2021 · 9 comments

@bermeitinger-b commented Oct 21, 2021

I'm using your script on a machine with 16 GPUs. For my tasks, I want specific GPUs not to be used, or rather, I want to select which GPUs are used.

For example, I want GPUs 0-8 to be available to ts but GPUs 9-15 to be left alone. Is this something that can be done?

@justanhduc (Owner)

Hi @bermeitinger-b. If I understand correctly, it is not possible for ts to auto-select GPUs from a subset, because it uses NVML, which discovers all available GPUs. You can try using -g in this case to specify the GPU ID manually. May I ask about your specific use case, i.e., why you want to do that?
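For example, something like this (just a sketch; ./job.sh is a placeholder and the exact -g syntax may vary between versions, so please check ts -h):

  # pin the queued job to GPU 3 (assumed syntax)
  ts -g 3 ./job.sh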

@bermeitinger-b (Author)

Thanks, let me clarify.

I'm running experiments on a machine that has 16 GPUs. I run a lot of tasks and use ts with -S 24 to schedule them so that they are distributed among the GPUs. That is working very well.
But there are other users on the machine who also need access to the GPUs. So I want to limit my user to specific GPUs (e.g. 0-8), of course with a reduced -S.

Fixing which job runs on which GPU with -g would make ts useless in my case. If I knew beforehand which task should run on which GPU, I could just use a simple shell script per GPU that runs its tasks one after another, right? (Roughly the sketch at the end of this comment.)
That's the beauty of ts: as soon as one task finishes, it checks which GPU is free and runs the next task there, so there are no idle GPUs.
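For illustration, the static fallback I mean would be roughly this (the script and task names are made up):

  # hypothetical run_gpu0.sh: everything runs on GPU 0, one task after another
  export CUDA_VISIBLE_DEVICES=0
  ./task_a.sh
  ./task_b.sh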

@justanhduc (Owner)

I got it. I have sometimes gotten complaints about ts selecting GPUs randomly too. I guess limiting the visibility of GPUs to ts would be useful. If you don't mind waiting, I can work on it in the next few days.

@justanhduc (Owner)

For now you can use -g together with -D or -W to queue jobs for a specific GPU ID.
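For instance, a rough sketch (I'm assuming here that -D takes the ID of the job to depend on; ./job1.sh and ./job2.sh are placeholders, so please double-check with ts -h):

  # chain two jobs on GPU 2 so they run one after the other (assumed semantics)
  id=$(ts -g 2 ./job1.sh)
  ts -g 2 -D "$id" ./job2.sh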

@justanhduc (Owner)

Hi @bermeitinger-b. I prototyped this feature in this branch. Basically, you can set an env var like TS_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 before starting ts for the first time. If ts is already up, you can use the flag --setenv TS_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 to set the env var. Please see the Readme for more details.
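For example (./job.sh is just a placeholder):

  # before starting the ts server for the first time
  TS_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ts ./job.sh
  # if the server is already running
  ts --setenv TS_VISIBLE_DEVICES=0,1,2,3,4,5,6,7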

@bermeitinger-b (Author)

Thank you very much. I tried the branch and it seems to be working exactly as intended.

@bermeitinger-b (Author)

Does this new feature also limit the number of concurrent tasks?

My current approach to using GPUs 0-7 with 16 concurrent jobs (so 2 per GPU) would be to

  1. ts -K (to make sure)
  2. TS_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ts -S 16
  3. for i in $(seq 1 16); do ts -G 1 ./job.sh; done

However, I'm not seeing 16 concurrent jobs but only 8. (The jobs are small enough that GPU memory usage never comes close to the 90% threshold.)

@justanhduc (Owner)

Hi @bermeitinger-b. This is intended. -G queues jobs until there is a free GPU to run on. A GPU is deemed free if more than 90% of its memory is free. So far, the only way to run multiple processes on one GPU is to use -g (rough sketch below).
I have thought about adding an option to set the free-memory percentage manually, but it never materialized. If you can wait a couple of days, I can make a quick patch for this option.
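For example, pinning with -g would look roughly like this (assuming -g takes the GPU ID to pin to; ./job.sh is a placeholder):

  # 2 jobs per GPU on GPUs 0-7, pinned explicitly
  for gpu in $(seq 0 7); do
    ts -g "$gpu" ./job.sh
    ts -g "$gpu" ./job.sh
  done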

@justanhduc (Owner)

Hi @bermeitinger-b. You can pull to get the new features. To run multiple processes on a GPU, you can set the free-memory threshold (in percent) appropriately via ts --set_gpu_free_perc.
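For example (assuming the flag takes the percentage as its argument; 40 is only an illustrative value):

  # treat a GPU as free when at least 40% of its memory is free
  ts --set_gpu_free_perc 40
  # then queue the jobs as before
  for i in $(seq 1 16); do ts -G 1 ./job.sh; done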
