Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tweak the arena mr to reduce fragmentation #845

Merged
merged 8 commits into from
Sep 9, 2021

Conversation

rongou
Copy link
Contributor

@rongou rongou commented Aug 17, 2021

We're seeing out-of-memory errors in some cases when there are plenty of free space available, probably due to memory fragmentation. This PR attempts to address the issue:

  • Added an option to dump the state when running out of memory. For now just logging the number of free blocks and total size, can be expanded later if needed.
  • Limit the number of free blocks in each per-thread arena. This seems to be helpful with the producer-consumer allocation pattern where a large number of small buffers accumulate in an arena.
  • Got rid of the allocation tracking code.

With these changes I can run those TPC-DS queries successfully with very high concurrency that previously would run out of memory. Need to do some more benchmarking to verify the behavior.

@abellina

@rongou rongou requested a review from a team as a code owner August 17, 2021 23:08
@github-actions github-actions bot added the cpp Pertains to C++ code label Aug 17, 2021
@rongou rongou added 3 - Ready for review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 17, 2021
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job. I think the byte formatting can be turned into a useful utility for the rest of RMM.

include/rmm/mr/device/arena_memory_resource.hpp Outdated Show resolved Hide resolved
include/rmm/mr/device/detail/arena.hpp Outdated Show resolved Hide resolved
include/rmm/mr/device/detail/arena.hpp Outdated Show resolved Hide resolved
include/rmm/mr/device/detail/arena.hpp Outdated Show resolved Hide resolved
include/rmm/mr/device/detail/arena.hpp Show resolved Hide resolved
include/rmm/mr/device/arena_memory_resource.hpp Outdated Show resolved Hide resolved
Co-authored-by: Mark Harris <mharris@nvidia.com>
@abellina
Copy link

Also noticed copyrights need to be updated to include 2021.

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need to update copyrights.

include/rmm/mr/device/detail/arena.hpp Show resolved Hide resolved
include/rmm/logger.hpp Show resolved Hide resolved
@caryr35 caryr35 added this to PR-WIP in v21.10 Release via automation Aug 31, 2021
@caryr35 caryr35 moved this from PR-WIP to PR-Needs review in v21.10 Release Aug 31, 2021
@harrism
Copy link
Member

harrism commented Sep 1, 2021

@rongou are you still working on this?

@rongou
Copy link
Contributor Author

rongou commented Sep 3, 2021

Got busy with some other things. Will try to get back to this next week.

Copy link

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thank you for adding the defragment case on complete failure to allocate. @jlowe fyi.

@rongou rongou requested a review from harrism September 8, 2021 21:51
v21.10 Release automation moved this from PR-Needs review to PR-Reviewer approved Sep 9, 2021
Copy link
Contributor

@codereport codereport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Couple small comments.

include/rmm/mr/device/detail/arena.hpp Outdated Show resolved Hide resolved
Comment on lines 379 to 382
std::size_t total_upstream{};
for (auto const& b : upstream_blocks_) {
total_upstream += b.size();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use accumulate same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

include/rmm/mr/device/detail/arena.hpp Show resolved Hide resolved
@rongou rongou self-assigned this Sep 9, 2021
@rongou
Copy link
Contributor Author

rongou commented Sep 9, 2021

@codereport please take another look.

Copy link
Contributor

@codereport codereport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@codereport
Copy link
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 548a8ee into rapidsai:branch-21.10 Sep 9, 2021
v21.10 Release automation moved this from PR-Reviewer approved to Done Sep 9, 2021
@rongou rongou deleted the arena-tweaks branch October 8, 2021 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for review Ready for review by team cpp Pertains to C++ code improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants