Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GTEST/UCT: Limit MEMIC allocation to prevent out-of-memory test failures #9892

Merged
merged 1 commit into from
May 27, 2024

Conversation

gleon99
Copy link
Contributor

@gleon99 gleon99 commented May 21, 2024

What

Cap MEMIC allocations, to reduce out-of-memory failures

Why ?

Memory is very limited, and some might already be allocated.

@@ -411,7 +411,8 @@ UCS_TEST_SKIP_COND_P(test_md, alloc,
ucs_for_each_bit(mem_type, md_attr().alloc_mem_types) {
for (unsigned i = 0; i < iterations; ++i) {
size = orig_size = ucs::rand() %
ucs_min(65536, md_attr().max_alloc);
ucs_min(65536,
ucs_max(md_attr().max_alloc / 8, 1024));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. if 1024 > max_alloc this can allocate too much
  2. simplify using helper vars

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re (1) - see line 416 below. no need to add extra checks, it's already there..

test/gtest/uct/test_atomic_key_reg_rdma_mem_type.cc Outdated Show resolved Hide resolved
@gleon99 gleon99 requested a review from yosefe May 21, 2024 16:05
@gleon99 gleon99 force-pushed the lgenkin/silence-memic-failures branch from f72c0c7 to 833bfa4 Compare May 21, 2024 16:07
@@ -384,6 +384,7 @@ UCS_TEST_SKIP_COND_P(test_md, alloc,
uct_md_h md_ref = md();
uct_alloc_method_t method = UCT_ALLOC_METHOD_MD;
unsigned num_alloc_failures = 0;
const auto max_alloc = ucs_max(md_attr().max_alloc / 8, 1024);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if max_alloc < 1024, we may allocate 1024 which is wrong
it should be

const auto max_alloc = md_attr().max_alloc;
const auto max_size = ucs_max(max_alloc / 8, ucs_min(max_alloc, 1024))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think line 417 prevents it. Anyway, I'll refactor a bit. Push shortly.

@gleon99
Copy link
Contributor Author

gleon99 commented May 21, 2024

@yosefe I made some refactoring to remove excessive ops in each iter. wdyt?

@@ -408,12 +412,13 @@ UCS_TEST_SKIP_COND_P(test_md, alloc,
pack_params.flags = UCT_MD_MKEY_PACK_FLAG_INVALIDATE_RMA |
UCT_MD_MKEY_PACK_FLAG_INVALIDATE_AMO;

max_size = ucs_min(max_size, 63356);
max_size = ucs_rounddown_pow2(max_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe do ucs_align_down_pow2(max_size, sizeof(size_t)) ?

@gleon99 gleon99 requested a review from yosefe May 22, 2024 07:48
@@ -8,6 +8,12 @@

class uct_atomic_key_reg_rdma_mem_type : public uct_amo_test {
protected:
void init() override
{
modify_config("DM_COUNT", 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it should be IB_DM_COUNT
pls validate it really works..

@gleon99 gleon99 requested a review from yosefe May 23, 2024 06:44
@gleon99 gleon99 force-pushed the lgenkin/silence-memic-failures branch from 7198c61 to d7d57e1 Compare May 23, 2024 07:08
@gleon99 gleon99 enabled auto-merge May 23, 2024 07:09
@gleon99 gleon99 merged commit 56582c1 into openucx:master May 27, 2024
142 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants