Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zhash_insert()/zhashx_insert() return EEXIST #5217

Merged
merged 3 commits into from Jun 7, 2023

Conversation

chu11
Copy link
Member

@chu11 chu11 commented May 30, 2023

Problem: zhash_insert() and zhashx_insert() return -1 when a duplicate entry exists, NOT when the process is out of memory. However, a number of locations assume it returns when out of memory and set errno = ENOMEM as a result.

Do not assume zhash_insert()/zhashx_insert() fail on memory allocation errors. Always ssume duplicate entry. When necessary set/return errno = EEXIST instead of ENOMEM.

Fixes #5216

Notes:

My initial reason for doing this was just to make things consistent b/c there was inconsistency in flux-core. Don't know how necessary this is ... Maybe this was just a pointless excercise ... but since I did it, here's the PR :P

There were obviously borderline cases. Maybe another approach would be to set errno = ENOMEM when a zhash_insert() is done directly after a zhash_lookup()?

@garlick
Copy link
Member

garlick commented May 30, 2023

Did you see my comment in #5216 and look at the referenced czmq issue? I was hoping that would spark a discussion about the right way to fix this.

These are simple containers that do not change much upstream and that we use quite a bit. Maybe it would be OK to change our copy to not assert on allocation error?

@chu11
Copy link
Member Author

chu11 commented May 30, 2023

Yeah, I saw it ... I guess I had started as an exercise to see how pervasive the issue was and just continued until I finished since it didn't take long, figuring "consistency" was better than nothing if we can't get to fixing it for real. Lets chat more in that issue.

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In several cases simply changing the code to (void)zhashx_insert() is appropriate since the key is known not to be in the hash.

errno = ENOMEM;
errno = EEXIST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inside a block that has already checked whether the group name is in the hash, so I think the insert return code can be ignored.

Comment on lines 435 to 436
if (zhashx_insert (r->entries, name, entry) < 0) {
errno = EEXIST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, just ignore the error

Comment on lines 163 to 164
if (zhash_insert (sh->services, name, svc) < 0) {
errno = ENOMEM;
errno = EEXIST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, ignore the error

Comment on lines 1613 to 1615
if (zhashx_insert (properties, name, ids) < 0) {
idset_destroy (ids);
errno = ENOMEM;
errno = EEXIST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just ignore return value and delete the comment above that explains the check.

Comment on lines 176 to 177
if (zhashx_insert (dcon->hash, key, dmsg) < 0) {
errno = EEXIST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside block that already checked. Just ignore return value.

Comment on lines 96 to 97
if (zhashx_insert (queues, name, &duration) < 0) { // dups duration
errno = ENOMEM;
errno = EEXIST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix the caller too - it hardwires "out of memory".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh good catch on this one and one below. Having the queues_insert() just return void is probably for best.

Comment on lines 167 to 168
if (zhashx_insert (queues, name, limits) < 0) { // dups limits
errno = ENOMEM;
errno = EEXIST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix queues_insert caller error message too

Comment on lines 93 to 97
if (zhashx_insert (perilog_config.processes, &proc->id, proc) < 0) {
errno = EEXIST;
free (proc);
return NULL;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set errno after free to avoid it getting clobbered.

Comment on lines 463 to 467
if (zhashx_insert (queue->named, name, q) < 0) {
jobq_destroy (q);
errprintf (error, "error adding new queue");
errno = EEXIST;
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already checked with zhashx_lookup. Just ignore return value of insert.

Comment on lines 929 to 933
if (zhash_insert (ctx->namespaces, ns, nsm) < 0) {
namespace_destroy (nsm);
errno = EEXIST;
return NULL;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already checked, ignore return value.

@chu11 chu11 force-pushed the issue5216_zhashx_insert branch 2 times, most recently from 6f61be5 to 075578d Compare May 31, 2023 18:58
@chu11
Copy link
Member Author

chu11 commented May 31, 2023

re-pushed per @garlick comments above, mostly updating to ignore return from zhash_insert() after zhash_lookup(). In some job-manager plugins, the result was making a queue_insert() function return. I found one other case of zhash_insert() after a zhash_lookup() that I corrected in flux-exec.

I also noticed a dumb issue in cmd/flux-exec at the same time and fixed it with a commit on top.

Comment on lines 120 to 121
if (!zhashx_freefn (exitsets, buf, idset_destroy_wrapper))
log_err_exit ("zhashx_freefn");
log_msg_exit ("zhashx_freefn");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this return value should be ignored, since zhashx_freefn() only returns NULL when the item doesn't exist, but we just inserted it so we know it does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, tweaked that last commit to ignore the return value instead.

@garlick
Copy link
Member

garlick commented Jun 6, 2023

Sorry I guess I didn't get back to this for a final approval. Want to rebase and fix that conflict and then I'll give it a quick final pass?

@chu11
Copy link
Member Author

chu11 commented Jun 6, 2023

@garlick doh! i had missed that there was a conflict given a recent merge, thanks for the heads up. rebased and re-pushed. I just realized I re-pushed before doing a run of the testsuite locally. Hopefully I did not re-push too aggressively :P

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@chu11 chu11 force-pushed the issue5216_zhashx_insert branch 2 times, most recently from bab0da9 to 5d9bdfe Compare June 6, 2023 23:41
chu11 added 3 commits June 6, 2023 18:57
Problem: A comment noted an invalid errno to be returned to the
user.
Problem: zhash_insert() and zhashx_insert() return -1 when a duplicate
entry exists, NOT when the process is out of memory.  However, a number
of locations assume it returns when out of memory and set errno = ENOMEM
as a result.

Do not assume zhash_insert()/zhashx_insert() fail on memory allocation errors.
Always assume duplicate entry.  When necessary set/return errno = EEXIST
instead of ENOMEM.  When a hash insert comes directly after a hash lookup,
assume EEXIST isn't possible and do not check return value of insert call.

Fixes flux-framework#5216
Problem: zhashx_freefn() only returns an error on an invalid
hash entry.  However, we know a successful insert just occurred
so we don't have to check for an error.

Ignore return value from zhashx_freefn().
@codecov
Copy link

codecov bot commented Jun 7, 2023

Codecov Report

Merging #5217 (017b9bf) into master (df3d812) will increase coverage by 0.03%.
The diff coverage is 63.88%.

@@            Coverage Diff             @@
##           master    #5217      +/-   ##
==========================================
+ Coverage   83.72%   83.75%   +0.03%     
==========================================
  Files         448      448              
  Lines       75876    75855      -21     
==========================================
+ Hits        63527    63533       +6     
+ Misses      12349    12322      -27     
Impacted Files Coverage Δ
src/common/libkvs/kvs_txn_compact.c 70.80% <0.00%> (ø)
src/common/libsubprocess/command.c 69.81% <ø> (ø)
src/common/libsubprocess/local.c 83.98% <0.00%> (-0.41%) ⬇️
src/common/libsubprocess/remote.c 79.22% <0.00%> (-0.26%) ⬇️
src/modules/job-exec/job-exec.c 77.00% <0.00%> (-0.11%) ⬇️
src/modules/job-manager/plugins/perilog.c 76.27% <0.00%> (-0.36%) ⬇️
src/modules/kvs/kvsroot.c 70.28% <0.00%> (-0.52%) ⬇️
src/shell/output.c 76.97% <0.00%> (ø)
src/modules/job-list/job_state.c 74.24% <33.33%> (-0.10%) ⬇️
src/modules/job-manager/jobtap.c 85.72% <40.00%> (-0.26%) ⬇️
... and 14 more

... and 6 files with indirect coverage changes

@mergify mergify bot merged commit c624084 into flux-framework:master Jun 7, 2023
31 of 32 checks passed
@chu11 chu11 deleted the issue5216_zhashx_insert branch June 7, 2023 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

zhashx_insert: cannot return ENOMEM
2 participants