
Conversation

@CalvinXKY (Contributor) commented Nov 18, 2020

Add a new function, torch.cuda.set_per_process_memory_fraction(fraction, device), to torch.cuda. Related: #18626
The fraction (a float from 0 to 1) limits how much memory the caching allocator may use on a given GPU device, and it can be set for any visible GPU. The allowed memory equals total memory * fraction. An out-of-memory (OOM) error is raised when a process tries to allocate more GPU memory than the allowed amount. This function is similar to TensorFlow's per_process_gpu_memory_fraction.
Note that this setting only limits the caching allocator within one process. If you use multiprocessing, you need to apply the setting inside each subprocess to limit its GPU memory, because each subprocess has its own allocator.
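For example, a minimal multiprocessing sketch (the worker function, fraction, and sizes here are illustrative only), where each subprocess applies its own limit before allocating:

import torch
import torch.multiprocessing as mp

def worker(rank):
    # Each subprocess has its own caching allocator, so the limit must be set here,
    # not in the parent process.
    torch.cuda.set_per_process_memory_fraction(0.4, 0)
    total = torch.cuda.get_device_properties(0).total_memory
    # Stays within this process's 0.4 fraction (the CUDA context itself is extra).
    x = torch.empty(int(total * 0.3), dtype=torch.int8, device='cuda:0')
    print(f"worker {rank}: allocated {x.numel()} bytes")

if __name__ == '__main__':
    mp.spawn(worker, nprocs=2)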

Usage

In some cases one needs to split a single GPU device between two parts; the limit should be set before any GPU memory is used.
E.g., on device 0, letting each part take half of the memory:

torch.cuda.set_per_process_memory_fraction(0.5, 0)

A full example:

import torch
torch.cuda.set_per_process_memory_fraction(0.5, 0)
torch.cuda.empty_cache()
total_memory = torch.cuda.get_device_properties(0).total_memory
# allocating just under half of total memory is fine:
tmp_tensor = torch.empty(int(total_memory * 0.499), dtype=torch.int8, device='cuda')
del tmp_tensor
torch.cuda.empty_cache()
# this allocation will raise an OOM error:
torch.empty(total_memory // 2, dtype=torch.int8, device='cuda')

"""
It raises an error as follows: 
RuntimeError: CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 11.17 GiB total capacity; 0 bytes already allocated; 10.91 GiB free; 5.59 GiB allowed; 0 bytes reserved in total by PyTorch)
"""

@dr-ci dr-ci bot commented Nov 18, 2020

💊 CI failures summary and remediations

As of commit ada1960 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

codecov.io: 1 failed



@zhangguanheng66 zhangguanheng66 added the module: cuda label (Related to torch.cuda, and CUDA support in general) Nov 18, 2020
@zhangguanheng66 zhangguanheng66 added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Nov 18, 2020
@@ -72,6 +72,33 @@ def caching_allocator_delete(mem_ptr):
    torch._C._cuda_cudaCachingAllocator_raw_delete(mem_ptr)


def set_memory_fraction(fraction, device: Union[Device, int] = None) -> None:
    r"""Set memory fraction for a device.
    The fraction is used to limit allocated memory on a CUDA device.
@VitalyFedyunin (Contributor) commented on the diff:
The statement is incorrect, as you are only limiting memory used by the caching allocator, so running two processes with a fraction of 0.5 each is not going to be possible (although the description suggests it is), because some of the memory is used by operators and the CUDA context.
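To illustrate this (a minimal sketch; the printed numbers vary by GPU and driver), the fraction only bounds what the caching allocator reserves, while the CUDA context created by the process is not counted against it:

import torch

torch.cuda.set_per_process_memory_fraction(0.5, 0)
x = torch.empty(1024, device='cuda:0')  # creates the CUDA context plus a tiny allocation

# Only the allocator's reservations are governed by the fraction:
print("reserved by the caching allocator:", torch.cuda.memory_reserved(0))
# nvidia-smi will additionally show several hundred MiB for the CUDA context itself,
# which is why two processes each set to 0.5 can still oversubscribe the device.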

@CalvinXKY (Contributor Author) replied:

@VitalyFedyunin This does not work for limiting across multiple processes, because each subprocess gets its own space and they do not share information with each other. However, in most cases one can still use it to split memory, as long as the setting is applied inside the subprocess. So should I just change the statement, or do something more, such as handling the multiprocess situation?

@CalvinXKY (Contributor Author) replied:

@VitalyFedyunin hi, I've just changed the statement and the name of this function to make it clearer.
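For reference, the renamed wrapper's argument handling might look roughly like the sketch below; the private binding name torch._C._cuda_setMemoryFraction is an assumption here, not taken from the diff.

import torch

def set_per_process_memory_fraction(fraction, device=None):
    # Sketch of the validation such a wrapper needs before touching the allocator.
    torch.cuda.init()  # make sure the CUDA state is initialized
    if device is None:
        device = torch.cuda.current_device()
    if not isinstance(fraction, float):
        raise TypeError('fraction must be a float, got {}'.format(type(fraction)))
    if not 0.0 <= fraction <= 1.0:
        raise ValueError('fraction must be within 0~1, got {}'.format(fraction))
    # Assumed private binding that forwards the limit to the caching allocator:
    torch._C._cuda_setMemoryFraction(fraction, device)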

@CalvinXKY CalvinXKY force-pushed the master branch 4 times, most recently from 58c87d1 to 2296fde Compare November 19, 2020 09:26
@codecov codecov bot commented Nov 19, 2020

Codecov Report

Merging #48172 (ada1960) into master (df0ae24) will decrease coverage by 0.00%.
The diff coverage is 10.00%.

@@            Coverage Diff             @@
##           master   #48172      +/-   ##
==========================================
- Coverage   81.26%   81.25%   -0.01%     
==========================================
  Files        1840     1840              
  Lines      198865   198875      +10     
==========================================
- Hits       161598   161588      -10     
- Misses      37267    37287      +20     

@CalvinXKY CalvinXKY force-pushed the master branch 3 times, most recently from 9ad2288 to ada1960 Compare November 20, 2020 08:37
@facebook-github-bot facebook-github-bot (Contributor) left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@VitalyFedyunin merged this pull request in 47aa253.

@CalvinXKY (Contributor Author) commented Jan 11, 2022

@VitalyFedyunin hi, I noticed issue #58466 and tried to fix it, but I could not find a way to solve it with only a small change. One approach I found is to use the NVML library to get per-process info and adjust the allocator's total-memory figure; the user's maximum memory on a GPU would then be limited. However, it does not work in a Docker environment, because the host process PID does not match the container PID unless the user enables Docker PID mapping (see the sketch below).
This approach covers part of the cases and only requires importing the "nvml.h" header and changing fewer than 100 lines of code. Is this OK?
Another option is to rename the function and clarify in the docs which situations it covers. Any suggestions?
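A hedged sketch of the NVML-based idea mentioned above, using the pynvml Python bindings purely for illustration (the PR itself would call nvml.h directly). NVML reports host PIDs, which is exactly why the approach breaks inside a container without PID namespace mapping:

import os
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
# List compute processes on GPU 0 and find this process's current usage.
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    if proc.pid == os.getpid():
        print(f"this process uses {proc.usedGpuMemory} bytes on GPU 0")
pynvml.nvmlShutdown()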
