[Feature] Allow user to specify a fraction of the GPU memory. #48172

Conversation
💊 CI failures summary and remediations (as of commit ada1960; more details on the Dr. CI page):
codecov.io: 1 failed
torch/cuda/memory.py (Outdated)

```diff
@@ -72,6 +72,33 @@ def caching_allocator_delete(mem_ptr):
     torch._C._cuda_cudaCachingAllocator_raw_delete(mem_ptr)


+def set_memory_fraction(fraction, device: Union[Device, int] = None) -> None:
+    r"""Set memory fraction for a device.
+    The fraction is used to limit allocated memory on a CUDA device.
```
The statement is incorrect: you are only limiting the memory used by the caching allocator, so running two processes with a fraction of 0.5 each is not going to be possible (though the description suggests it should be), because some memory is consumed by operators and the CUDA context.
@VitalyFedyunin This does not work for limiting across multiple processes, because each subprocess gets its own allocator and they do not share this information with each other. In most cases, though, one can still use it to split memory, provided the setting is applied inside each subprocess. So should I just change the statement, or do something more, such as handling the multiprocess situation?
@VitalyFedyunin Hi, I've just changed the statement and the name of this function to make it clearer.
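To illustrate the reviewer's point above: the CUDA context itself consumes device memory outside the caching allocator, so two processes capped at 0.5 each still will not fit. A minimal sketch (assuming a PyTorch version that provides torch.cuda.mem_get_info):

```python
import torch

torch.cuda.init()  # creating the CUDA context already reserves device memory
free, total = torch.cuda.mem_get_info(0)
overhead = total - free  # context/driver overhead, not counted by the allocator
print(f"context overhead: {overhead / 2**20:.0f} MiB")
# Two processes each capped at 0.5 * total would additionally pay this
# overhead twice, so together they would exceed physical device memory.
```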
Force-pushed from 58c87d1 to 2296fde.
Codecov Report

```
@@            Coverage Diff             @@
##           master   #48172      +/-   ##
==========================================
- Coverage   81.26%   81.25%   -0.01%
==========================================
  Files        1840     1840
  Lines      198865   198875      +10
==========================================
- Hits       161598   161588      -10
- Misses      37267    37287      +20
```
Force-pushed from 9ad2288 to ada1960.
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@VitalyFedyunin merged this pull request in 47aa253.
@VitalyFedyunin Hi, I noticed issue #58466 and tried to fix it. However, I could not find a way to solve it with only a small change. One approach I have found is to use the NVML library to get per-process info and adjust the allocator's total-memory figure accordingly. The user's maximum memory would then be limited on a GPU, but this does not work in a Docker environment, because the host process PID does not match the container PID unless the user enables Docker PID mapping.
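A minimal sketch of the NVML approach described above (assuming the pynvml package; note the Docker PID mismatch mentioned in the comment):

```python
import os
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Query per-process memory usage on the device. Inside a container without
# PID mapping, os.getpid() will not match the host PIDs reported here.
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    used = (proc.usedGpuMemory or 0) / 2**20  # may be None on some drivers
    tag = " (this process)" if proc.pid == os.getpid() else ""
    print(f"pid {proc.pid}: {used:.0f} MiB{tag}")

pynvml.nvmlShutdown()
```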
Add a new function, torch.cuda.set_per_process_memory_fraction(fraction, device), to torch.cuda. Related: #18626
The fraction (a float from 0 to 1) is used to limit the memory of the caching allocator on a GPU device. One can set it on any visible GPU. The allowed memory equals total memory * fraction. An OOM error is raised when an application tries to allocate more GPU memory than the allowed value. This function is similar to TensorFlow's per_process_gpu_memory_fraction.
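For instance, the effective cap can be computed from the device properties (a minimal sketch; the 0.5 fraction is illustrative):

```python
import torch

fraction = 0.5
total = torch.cuda.get_device_properties(0).total_memory  # bytes
print(f"allocator cap: {fraction * total / 2**30:.2f} GiB "
      f"of {total / 2**30:.2f} GiB")
```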
Note that this setting only limits the caching allocator within one process. If you are using multiprocessing, you need to apply this setting inside each subprocess to limit its GPU memory, because each subprocess has its own allocator; see the sketch below.
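A minimal multiprocessing sketch under that assumption (the worker function and the 0.4 fractions are hypothetical):

```python
import torch
import torch.multiprocessing as mp

def worker(fraction: float) -> None:
    # Each subprocess has its own caching allocator, so the limit must be
    # set here, inside the subprocess, before any GPU allocation happens.
    torch.cuda.set_per_process_memory_fraction(fraction, 0)
    x = torch.empty(1024, 1024, device="cuda:0")  # allocations are now capped
    print(f"fraction={fraction}: allocated {x.numel() * x.element_size()} bytes")

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # required for CUDA with multiprocessing
    procs = [mp.Process(target=worker, args=(f,)) for f in (0.4, 0.4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```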
Usage
In some cases, one needs to split a GPU device into two parts, and the memory limit can be set before any GPU memory is used. E.g., for device 0, with each part taking half of the memory, the code is as follows.
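A minimal sketch consistent with the description (the 0.45 and 0.55 sizes are illustrative assumptions):

```python
import torch

# Limit this process's caching allocator to half of device 0's memory.
torch.cuda.set_per_process_memory_fraction(0.5, 0)
total = torch.cuda.get_device_properties(0).total_memory

# Just under the cap: this allocation succeeds.
ok = torch.empty(int(total * 0.45), dtype=torch.int8, device="cuda:0")
del ok
torch.cuda.empty_cache()

# Over the cap: this raises a CUDA out-of-memory error, even though the
# physical device may still have free memory.
try:
    too_big = torch.empty(int(total * 0.55), dtype=torch.int8, device="cuda:0")
except RuntimeError as e:
    print("OOM as expected:", e)
```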