Is it necessary to bind only one GPU to one slave pod? #10

Closed
ilyee opened this issue Jan 28, 2021 · 3 comments

Comments

@ilyee
Contributor

ilyee commented Jan 28, 2021

Hello, I am elihe from Zhihu, where I read your article. After reading your code, I have a question:
Why does the GetAvailableGPU method in pkg/util/gpu/allocator/allocator.go bind only one GPU to each slave pod?
As I see it, this has two drawbacks in a large-scale cluster. First, it puts additional load on the master node, because there are more pod creation requests. Second, creating multiple single-GPU pods can cause two competing GPU mount requests to both fail. For example, suppose there are 4 available GPUs and two requests that each want to mount 4 GPUs. One request successfully creates slave pods 1 and 2, the other creates slave pods 3 and 4, and neither can obtain any more resources.
If you agree with me, may I submit a merge request to optimize this?

@pokerfaceSad
Owner

Hi @ilyee, thanks for raising this issue.

In my opinion, binding only one GPU to each slave pod is a trade-off. If a single slave pod requested all the GPUs, unmounting would become complicated.

In the current implementation, we only need to delete a slave pod when we want to unmount a GPU. If one slave pod requested all the GPUs, then during an unmount operation it would be complicated to tell kubelet and kube-scheduler that the unmounted GPU is free (it might need some hack).

So I think it really is a trade-off.

Please feel free to correct me if my reasoning is wrong.

@ilyee
Contributor Author

ilyee commented Jan 28, 2021

I fully get it. Your approach is more reasonable.

ilyee closed this as completed Jan 28, 2021

@pokerfaceSad
Owner

I hope we can communicate more in the future, if possible.
I have sent you my WeChat ID via Zhihu.
