Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kvs cleanup error paths #1119

Merged
merged 13 commits into from Jul 23, 2017

Conversation

Projects
None yet
4 participants
@chu11
Copy link
Contributor

chu11 commented Jul 22, 2017

I was in the process of doing a lot of random cleanup (remove oom() calls, remove use of xzmalloc(), etc.) and found a number of circumstances where there were cleanup errors along error/cleanup
paths. Here are descriptions of some of the bigger ones:

17d1764 - if RPCs are already in flight when an error occurs, stall and let them finish, before returning error to user. This was in older KVS code and I broke it while refactoring.

a41bfed - Way back in commit 0cbeacc, there was a relatively harmless logic error (i.e. had no effect except for possibly a tad less optimal). But with the commit API refactoring, it actually mattered a bit more b/c it complicated some logic. This logic has been corrected. The key is that it is not necessary to wait twice for a dirty cache entry to become flushed. Once is sufficient.

7fb6a41 - This was actually a pretty bad mistake on my part while refactoring. If an error occurred while flushing a list of dirty cache entries, then what do you do with the dirty cache entries that were not flushed? They should be removed from the cache, otherwise they will sit there being dirty forever and un-expireable. It may seem strange that I create both a cache_entry_clear_dirty() function and a cache_remove_entry() function to deal with this cleanup. But I felt it was important that the latter only work on non-dirty entries (and could be used elsewhere in the future). So it obviated the need for a "clear dirty" function as well.

@chu11 chu11 force-pushed the chu11:kvs_cleanup_error_paths branch from 4306d76 to 8b83735 Jul 22, 2017

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 22, 2017

Codecov Report

Merging #1119 into master will decrease coverage by 0.02%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master    #1119      +/-   ##
==========================================
- Coverage   77.98%   77.96%   -0.03%     
==========================================
  Files         157      157              
  Lines       25988    26035      +47     
==========================================
+ Hits        20268    20297      +29     
- Misses       5720     5738      +18
Impacted Files Coverage Δ
src/modules/kvs/cache.c 93.24% <100%> (+0.59%) ⬆️
src/modules/kvs/kvs.c 75.49% <26.66%> (-1.96%) ⬇️
src/modules/kvs/commit.c 85.13% <87.09%> (-0.19%) ⬇️
src/common/libflux/attr.c 90.81% <0%> (-2.05%) ⬇️
src/common/libkvs/kvs_watch.c 86.76% <0%> (-0.99%) ⬇️
src/common/libflux/future.c 87.7% <0%> (-0.54%) ⬇️
src/common/libflux/message.c 81.92% <0%> (+0.47%) ⬆️
src/broker/modservice.c 80.19% <0%> (+0.99%) ⬆️
src/common/libkvs/jansson_dirent.c 83.33% <0%> (+2.77%) ⬆️
@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 22, 2017

Coverage Status

Coverage decreased (-0.04%) to 78.215% when pulling 8b83735 on chu11:kvs_cleanup_error_paths into 1e1480f on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 22, 2017

The diff target percent is pretty low, which may not be too surprising. A lot of code changes along error paths.

@chu11 chu11 force-pushed the chu11:kvs_cleanup_error_paths branch from 8b83735 to 52c792b Jul 22, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 22, 2017

Coverage Status

Coverage decreased (-0.04%) to 78.212% when pulling 52c792b on chu11:kvs_cleanup_error_paths into 1e1480f on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 22, 2017

OK, looks good to me. Thanks @chu11.

chu11 added some commits Jul 21, 2017

modules/kvs: Fix json decref in error path
The callers of store_initial_rootdir() and store_cache() expect
the function to decref the passed in object or place it in the
cache.  In the event of an error, the object must be decref'ed
regardless.
modules/kvs: Add commit_get/set_aux_errnum ().
Add convenience function commit_get_aux_errnum () and
commit_set_aux_errnum ().

These functions are for convenience for the user.  This is
primarily used by the user to indicate an error has occured
but must stall to wait for RPCs already in transit to complete.

Add unit tests appropriately.
modules/kvs: Fix stall corner case
If commit_iter_missing_refs() or commit_iter_dirty_cache_entries()
errors out, RPCs may already by in flight.  Stall for them to
complete before returning an error to user.  Use new commit_get_aux_errnum()
and commit_set_aux_errnum() to detect if prior stall was due to
an error.
modules/kvs: Fix dirty cache logic
In earlier versions of the KVS, code logic could allow
cache_entry_wait_notdirty() to be called multiple times
on the same cache entry.  By itself, introduced in
0cbeacc, this was harmless.

This code logic was carried over spiritually into the
commit API and its callback commit_iter_dirty_cache_entries().
This required the addition of flags in the cache to determine
if a cache entry had been flushed to the content store yet or not.

With the commit API and its dirty cache list, it is not longer
necessary nor reasonable that a cache entry be added to the list
multiple times.  Changing this logic will clean up the code and allow
future error handling corner cases to be handled more easily.

Adjust commit unit tests accordingly.
modules/kvs: Remove cache content_store_flag
Remove cache_entry_get_content_store_flag() and
cache_entry_set_content_store_flag(), as they are no longer used.

Adjust unit tests appropriately.
modules/kvs: Fix cache cleanup corner case
If a content store flush fails, cleanup the cache entry that
just had a json object added to it.  Do not want future callers
to believe that entry is valid.
modules/kvs: Add cache_entry_clear_dirty()
Add cache_entry_clear_dirty() to aid in cleanup paths by clearing
dirty bit if appropriate.

Add appropriate unit tests.
modules/kvs: Add cache_remove_entry()
Add cache_remove_entry() to aid in cleanup paths by removing cache
entry if appropriate.

Add appropriate unit tests.
modules/kvs: Add several error checks & logs
Add error checks and logs to content_store_get() and
content_load_completion() just in case.
modules/kvs: Fix cleanup of dirty cache entries
In commit API, from commit_unroll() or commit_iter_dirty_cache_entries(),
destroy any entries lingering on the dirty cache list that
have not yet been processed in error paths of code.

In kvs, on error with the commit_cache_cb(), cleanup cache entries that
have been passed in.

Add unit tests appropriately.
modules/kvs: Empty missing refs list on error
To prevent potential mistakes, empty missing refs from list on
error.

@chu11 chu11 force-pushed the chu11:kvs_cleanup_error_paths branch from 52c792b to de06bb7 Jul 23, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 23, 2017

rebased on master

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 23, 2017

Coverage Status

Coverage decreased (-0.03%) to 78.227% when pulling de06bb7 on chu11:kvs_cleanup_error_paths into 1e1480f on flux-framework:master.

@garlick garlick merged commit 1b8d54b into flux-framework:master Jul 23, 2017

2 of 4 checks passed

codecov/patch 66.66% of diff hit (target 77.98%)
Details
codecov/project 77.96% (-0.03%) compared to 1e1480f
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.03%) to 78.227%
Details

@grondo grondo referenced this pull request Aug 23, 2017

Closed

0.8.0 Release #1160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.