-
Notifications
You must be signed in to change notification settings - Fork 911
MEMHEAP/BASE: removed hard segments limit #10478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MEMHEAP/BASE: removed hard segments limit #10478
Conversation
hoopoepg
commented
Jun 17, 2022
- enabled dynamic segment's info allocation
- removed hard limit for segments count
9b5d882
to
ad59721
Compare
f0200bc
to
8309ddf
Compare
return NULL; | ||
} | ||
|
||
if (map->n_segments >= map->capacity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this conflicts with the assert above (line 109)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this conflicts with the assert above (line 109)
no: when capacity == 0 n_segments must be 0 too, in other cases must be (n_segments < capacity). assert is not strongly valid but covers cases when memory corruption happened
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not checking for 0
here instead of comparing n_segments
and capacity
?
currently it looks quite confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fixed
MEMHEAP_ERROR("failed to allocate segment"); | ||
return OSHMEM_ERR_OUT_OF_RESOURCE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz make a cleanup of already allocated ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz make a cleanup of already allocated ones
ok, done
@yosefe ok to squash? |
4aa50cb
to
87fd4b4
Compare
if (map->n_segments == map->capacity) { | ||
capacity = opal_min(mca_memheap_base_max_segments, | ||
opal_max(map->capacity * 2, | ||
MCA_MEMHEAP_MAX_SEGMENTS)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think need to rename MCA_MEMHEAP_MAX_SEGMENTS to MCA_MEMHEAP_MIN_SEGMENTS
since this is actually the initial/minimal value of number of segments
@@ -36,6 +36,7 @@ int mca_memheap_base_key_exchange = 1; | |||
opal_list_t mca_memheap_base_components_opened = {{0}}; | |||
int mca_memheap_base_already_opened = 0; | |||
mca_memheap_map_t mca_memheap_base_map = {{{{0}}}}; | |||
int mca_memheap_base_max_segments = MCA_MEMHEAP_MAX_SEGMENTS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need to limit it by default to 32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need to limit it by default to 32?
used as allowed number of segments to allocate, when limit is reached - warning prompted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's rename it (and respective configuration) to something like mca_memheap_num_segments_warn
since it's not a hard limit anymore, but just threshold for showing a warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, macro is removed, variable is renamed
@@ -59,6 +60,13 @@ static int mca_memheap_base_register(mca_base_register_flag_t flags) | |||
MCA_BASE_VAR_SCOPE_LOCAL, | |||
&mca_memheap_base_config.device_nic_mem_seg_size); | |||
|
|||
mca_base_var_register("oshmem", "memheap", "base", "max_segments", | |||
"Maximum number of segments to register per process", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Maximum number of shared data segments per process",
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Maximum number of shared data segments per process",
updated
void* start; | ||
void* end; | ||
} mem_segs[MCA_MEMHEAP_MAX_SEGMENTS]; | ||
memheap_static_segment_t *mem_segs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just allocate mca_memheap_base_max_segments up front?
this struct is small enough anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just allocate mca_memheap_base_max_segments up front?
this struct is small enough anyway
removed static segments allocation
@yosefe ok to squash? |
@yosefe ??? |
@@ -36,6 +36,7 @@ int mca_memheap_base_key_exchange = 1; | |||
opal_list_t mca_memheap_base_components_opened = {{0}}; | |||
int mca_memheap_base_already_opened = 0; | |||
mca_memheap_map_t mca_memheap_base_map = {{{{0}}}}; | |||
int mca_memheap_base_max_segments = MCA_MEMHEAP_MAX_SEGMENTS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's rename it (and respective configuration) to something like mca_memheap_num_segments_warn
since it's not a hard limit anymore, but just threshold for showing a warning
0a8c5e3
to
6dc1101
Compare
@@ -61,11 +61,12 @@ static int mca_memheap_base_register(mca_base_register_flag_t flags) | |||
&mca_memheap_base_config.device_nic_mem_seg_size); | |||
|
|||
mca_base_var_register("oshmem", "memheap", "base", "max_segments", | |||
"Maximum number of shared data segments per process", | |||
"Display a warning if the number of segments of " | |||
"shared heap exceeds this value", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor:
Display a warning if the number of segments of the shared memheap exceeds this value"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-worded
@hoopoepg pls squash |
- enabled dynamic segment's info allocation - removed hard limit for segments count - fixed error handling on no enough memory - refactpring for static segments initialization: removed middle segment allocation - fixed poterntial issue in spml_ucx - incorrect error handling which may lead to crash Signed-off-by: Sergey Oblomov <sergeyo@nvidia.com>
b516a4d
to
1b26bbd
Compare