
RFC: single-GPU setups, improving worker 0 utilization #15

Closed
simra opened this issue Oct 10, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

simra (Contributor) commented Oct 10, 2022

This issue is to discuss a known limitation: FLUTE expects a minimum of two GPUs for any CUDA-based training. There must always be a GPU for Worker 0, plus at least one more for client training. It would be valuable to be able to specify arbitrary mappings so that, say, Worker 0 and Worker 1 share the same GPU. From a memory standpoint this should be OK because they never need the GPU at the same time. I'm not sure that torch.distributed can support arbitrary mappings (note: CUDA_VISIBLE_DEVICES=0,0 doesn't work as a solution). Alternatively, we could assign Worker 0 to the CPU and Workers 1+ to GPUs; relatively speaking, model aggregation is less expensive and could potentially be done on CPU.

Thoughts?
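To make the proposal concrete, here is a minimal sketch of the kind of rank-to-device mapping being suggested. This is a hypothetical helper, not FLUTE's actual API: `assign_devices`, its parameters, and the device strings are all assumptions for illustration. Rank 0 (the server/aggregator) either shares `cuda:0` with the first client or falls back to CPU, and the remaining ranks cycle over the available GPUs.

```python
def assign_devices(world_size: int, num_gpus: int, server_on_cpu: bool = False) -> list[str]:
    """Map each worker rank to a device string.

    Rank 0 is the server/aggregator; ranks 1..world_size-1 are client
    trainers. If server_on_cpu is True, rank 0 gets "cpu"; otherwise it
    shares cuda:0 with the first client, which should be safe memory-wise
    because the two never need the GPU at the same time.
    """
    if num_gpus == 0:
        # No CUDA devices: everything runs on CPU.
        return ["cpu"] * world_size
    devices = []
    for rank in range(world_size):
        if rank == 0:
            devices.append("cpu" if server_on_cpu else "cuda:0")
        else:
            # Clients cycle round-robin over the available GPUs.
            devices.append(f"cuda:{(rank - 1) % num_gpus}")
    return devices

print(assign_devices(3, 1))                      # ['cuda:0', 'cuda:0', 'cuda:0']
print(assign_devices(3, 1, server_on_cpu=True))  # ['cpu', 'cuda:0', 'cuda:0']
```

In a real implementation the returned strings would be passed to something like `torch.device(...)` when each worker builds its model; whether torch.distributed tolerates two ranks on one GPU is exactly the open question above.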

@simra simra added the question Further information is requested label Oct 10, 2022
@Mirian-Hipolito Mirian-Hipolito added enhancement New feature or request and removed question Further information is requested labels Jan 3, 2023
Mirian-Hipolito (Contributor) commented

Hi Rob, this issue has been addressed in the latest commit 43e1530. We have removed the hard constraint on the minimum number of GPUs available in FLUTE by allowing the Server and Clients to be instantiated on the same worker device. For documentation on how to run an experiment using a single GPU, please refer to the README.
