[Feature] Add AvoidOOM to avoid OOM #7434

Merged 21 commits on May 25, 2022


@BIGWangYuDong (Collaborator) commented Mar 17, 2022

First, I tried replacing torch.mm with torch.einsum to avoid OOM.
Before the change (mAP: 0.331):

inter_matrix = torch.mm(flatten_masks, flatten_masks.transpose(1, 0))

After the change (mAP: 0.331):

inter_matrix = torch.einsum('ik, kj -> ij', flatten_masks, flatten_masks.transpose(1, 0))

However, this change does not reduce GPU memory usage:

                   torch.mm   torch.einsum
process time (s)   4.69856    4.5713
GPU memory (MiB)   807        807

To avoid OOM, we add a class, AvoidOOM, that retries a function with its inputs converted to FP16, and then moved to the CPU, when PyTorch raises a CUDA out-of-memory error.
It performs the following steps:

  1. First, retry after calling torch.cuda.empty_cache().
  2. If that still fails, retry with the inputs converted to FP16.
  3. If that still fails, retry with the inputs moved to the CPU. In this case, the function is expected to dispatch to its CPU implementation.
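
The three steps above can be sketched as a small wrapper. This is a minimal illustration, not the code in mmdet/utils/memory.py: the names `with_oom_fallback` and `is_cuda_oom`, and the hooks `empty_cache`, `to_fp16`, and `to_cpu`, are hypothetical and exist only for this example. The hooks are passed in so that the sketch runs without a GPU. It also assumes that a CUDA OOM surfaces as a `RuntimeError` whose message contains "CUDA out of memory".

```python
from functools import wraps


def is_cuda_oom(exc):
    """Heuristic check (assumption): CUDA OOM is a RuntimeError with this message."""
    return isinstance(exc, RuntimeError) and "CUDA out of memory" in str(exc)


def with_oom_fallback(func, empty_cache, to_fp16, to_cpu):
    """Wrap `func` with the three-step fallback (hypothetical helper).

    empty_cache(): frees cached GPU memory.
    to_fp16(x), to_cpu(x): convert one argument; they should return
    non-tensor arguments unchanged.
    """

    @wraps(func)
    def wrapped(*args, **kwargs):
        def call(convert=None):
            if convert is None:
                return func(*args, **kwargs)
            new_args = [convert(a) for a in args]
            new_kwargs = {k: convert(v) for k, v in kwargs.items()}
            return func(*new_args, **new_kwargs)

        def try_call(convert=None):
            """Return (True, result), or (False, None) on a CUDA OOM error."""
            try:
                return True, call(convert)
            except RuntimeError as exc:
                if not is_cuda_oom(exc):
                    raise  # unrelated errors are never swallowed
                return False, None

        # Plain call first; the fallback only starts after an OOM error.
        ok, result = try_call()
        if ok:
            return result

        # Step 1: free cached memory and retry with the original inputs.
        empty_cache()
        ok, result = try_call()
        if ok:
            return result

        # Step 2: retry with the inputs converted to FP16.
        ok, result = try_call(to_fp16)
        if ok:
            return result

        # Step 3: retry on the CPU; any error raised here propagates.
        return call(to_cpu)

    return wrapped
```

With PyTorch, `empty_cache` would typically be `torch.cuda.empty_cache`, while `to_fp16` and `to_cpu` would call `.half()` and `.cpu()` on tensor arguments and pass everything else through unchanged. Because the wrapper takes the function as its first argument, it can also be applied as a decorator once the hooks are bound, for example with `functools.partial`.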

TODO:

  • Add docs in FAQ

Close: #6908

@jbwang1997 (Collaborator) commented

Could we expose only a keep_dtype option, so that users decide whether the outputs keep the same data type as the inputs or stay in the type used to avoid OOM (FP16, CPU)?

@codecov (bot) commented Mar 17, 2022

Codecov Report

Merging #7434 (929ea0d) into dev (151a803) will decrease coverage by 0.58%.
The diff coverage is 41.02%.

@@            Coverage Diff             @@
##              dev    #7434      +/-   ##
==========================================
- Coverage   65.09%   64.50%   -0.59%     
==========================================
  Files         357      360       +3     
  Lines       28852    29233     +381     
  Branches     4891     4954      +63     
==========================================
+ Hits        18782    18858      +76     
- Misses       9061     9370     +309     
+ Partials     1009     1005       -4     
Flag Coverage Δ
unittests 64.49% <41.02%> (-0.61%) ⬇️


Impacted Files Coverage Δ
mmdet/utils/memory.py 40.25% <40.25%> (ø)
mmdet/utils/__init__.py 100.00% <100.00%> (ø)
mmdet/models/detectors/__init__.py 100.00% <0.00%> (ø)
mmdet/models/dense_heads/__init__.py 100.00% <0.00%> (ø)
mmdet/models/dense_heads/solo_head.py 65.18% <0.00%> (ø)
mmdet/models/dense_heads/solov2_head.py 9.83% <0.00%> (ø)
mmdet/models/detectors/solov2.py 83.33% <0.00%> (ø)
mmdet/core/bbox/assigners/max_iou_assigner.py 73.68% <0.00%> (+1.31%) ⬆️
mmdet/models/roi_heads/test_mixins.py 52.85% <0.00%> (+2.14%) ⬆️
mmdet/models/dense_heads/dense_test_mixins.py 43.20% <0.00%> (+2.46%) ⬆️
... and 1 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 151a803...929ea0d.

@BIGWangYuDong BIGWangYuDong requested a review from chhluo May 13, 2022 03:27
@BIGWangYuDong (Collaborator, Author) commented

Updated the logic in AvoidOOM: by default, outputs are now returned with the source data type and on the source device, and no extra option is exposed for this. This makes the code simpler.

@ZwwWayne ZwwWayne merged commit 7b03639 into open-mmlab:dev May 25, 2022
ZwwWayne pushed a commit that referenced this pull request Jul 18, 2022
* [Feature] Add AvoidOOM to avoid OOM

* support multiple outputs

* add docs in faq

* add docs in faq

* fix logic

* minor fix

* minor fix

* minor fix

* minor fix

* add the tutorials of using avoidoom as a decorator

* minor fix

* add convert tensor type test unit

* minor fix

* minor fix
ZwwWayne pushed a commit to ZwwWayne/mmdetection that referenced this pull request Jul 19, 2022
@BIGWangYuDong BIGWangYuDong deleted the avoidoom branch July 20, 2022 05:09
SakiRinn pushed a commit to SakiRinn/mmdetection-locount that referenced this pull request Mar 17, 2023