Skip to content

MEMHEAP/BASE: removed hard segments limit #10478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

hoopoepg
Copy link
Contributor

  • enabled dynamic segment's info allocation
  • removed hard limit for segments count

@hoopoepg hoopoepg force-pushed the topic/oshmem-dynamic-segment-allocation branch from 9b5d882 to ad59721 Compare June 17, 2022 11:53
@hoopoepg hoopoepg force-pushed the topic/oshmem-dynamic-segment-allocation branch from f0200bc to 8309ddf Compare June 20, 2022 10:15
return NULL;
}

if (map->n_segments >= map->capacity) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this conflicts with the assert above (line 109)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this conflicts with the assert above (line 109)

no: when capacity == 0 n_segments must be 0 too, in other cases must be (n_segments < capacity). assert is not strongly valid but covers cases when memory corruption happened

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not checking for 0 here instead of comparing n_segments and capacity?
currently it looks quite confusing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fixed

Comment on lines 70 to 73
MEMHEAP_ERROR("failed to allocate segment");
return OSHMEM_ERR_OUT_OF_RESOURCE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz make a cleanup of already allocated ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz make a cleanup of already allocated ones

ok, done

@hoopoepg
Copy link
Contributor Author

@yosefe ok to squash?

@hoopoepg hoopoepg force-pushed the topic/oshmem-dynamic-segment-allocation branch from 4aa50cb to 87fd4b4 Compare June 27, 2022 12:33
if (map->n_segments == map->capacity) {
capacity = opal_min(mca_memheap_base_max_segments,
opal_max(map->capacity * 2,
MCA_MEMHEAP_MAX_SEGMENTS));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think need to rename MCA_MEMHEAP_MAX_SEGMENTS to MCA_MEMHEAP_MIN_SEGMENTS
since this is actually the initial/minimal value of number of segments

@@ -36,6 +36,7 @@ int mca_memheap_base_key_exchange = 1;
opal_list_t mca_memheap_base_components_opened = {{0}};
int mca_memheap_base_already_opened = 0;
mca_memheap_map_t mca_memheap_base_map = {{{{0}}}};
int mca_memheap_base_max_segments = MCA_MEMHEAP_MAX_SEGMENTS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need to limit it by default to 32?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need to limit it by default to 32?

used as allowed number of segments to allocate, when limit is reached - warning prompted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's rename it (and respective configuration) to something like mca_memheap_num_segments_warn
since it's not a hard limit anymore, but just threshold for showing a warning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, macro is removed, variable is renamed

@@ -59,6 +60,13 @@ static int mca_memheap_base_register(mca_base_register_flag_t flags)
MCA_BASE_VAR_SCOPE_LOCAL,
&mca_memheap_base_config.device_nic_mem_seg_size);

mca_base_var_register("oshmem", "memheap", "base", "max_segments",
"Maximum number of segments to register per process",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Maximum number of shared data segments per process",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Maximum number of shared data segments per process",

updated

void* start;
void* end;
} mem_segs[MCA_MEMHEAP_MAX_SEGMENTS];
memheap_static_segment_t *mem_segs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just allocate mca_memheap_base_max_segments up front?
this struct is small enough anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just allocate mca_memheap_base_max_segments up front?
this struct is small enough anyway

removed static segments allocation

@hoopoepg
Copy link
Contributor Author

hoopoepg commented Jul 4, 2022

@yosefe ok to squash?

@hoopoepg
Copy link
Contributor Author

@yosefe ???

@awlauria
Copy link
Contributor

awlauria commented Aug 2, 2022

@yosefe @janjust please review

@awlauria awlauria requested a review from janjust August 2, 2022 16:38
@@ -36,6 +36,7 @@ int mca_memheap_base_key_exchange = 1;
opal_list_t mca_memheap_base_components_opened = {{0}};
int mca_memheap_base_already_opened = 0;
mca_memheap_map_t mca_memheap_base_map = {{{{0}}}};
int mca_memheap_base_max_segments = MCA_MEMHEAP_MAX_SEGMENTS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's rename it (and respective configuration) to something like mca_memheap_num_segments_warn
since it's not a hard limit anymore, but just threshold for showing a warning

@hoopoepg hoopoepg force-pushed the topic/oshmem-dynamic-segment-allocation branch from 0a8c5e3 to 6dc1101 Compare August 11, 2022 12:52
@@ -61,11 +61,12 @@ static int mca_memheap_base_register(mca_base_register_flag_t flags)
&mca_memheap_base_config.device_nic_mem_seg_size);

mca_base_var_register("oshmem", "memheap", "base", "max_segments",
"Maximum number of shared data segments per process",
"Display a warning if the number of segments of "
"shared heap exceeds this value",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor:
Display a warning if the number of segments of the shared memheap exceeds this value"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-worded

@yosefe
Copy link
Contributor

yosefe commented Aug 18, 2022

@hoopoepg pls squash

- enabled dynamic segment's info allocation
- removed hard limit for segments count
- fixed error handling on no enough memory
- refactpring for static segments initialization: removed
  middle segment allocation
- fixed poterntial issue in spml_ucx - incorrect error
  handling which may lead to crash

Signed-off-by: Sergey Oblomov <sergeyo@nvidia.com>
@hoopoepg hoopoepg force-pushed the topic/oshmem-dynamic-segment-allocation branch from b516a4d to 1b26bbd Compare August 18, 2022 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants