Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add KVS checkpoints #2783

Merged
merged 8 commits into from Mar 7, 2020
Merged

Conversation

garlick
Copy link
Member

@garlick garlick commented Feb 28, 2020

Thought I'd park this although it needs further work.

This adds a table to the content-sqlite database that can store arbitrary key-value pairs, then modifies the KVS to read its root blobref from this table (if available) during startup, and write its final root blobref during shutdown. In the future, the table could accommodate multiple keys for live namespaces if we want to go that way (but right now we don't need to support that).

If an instance has the content.backing-path attribute set to the path to a sqlite file when the content-sqlite module is loaded, then it is used (existing content preserved). If it is set but the file does not exist, it is created empty, and retained on exit.

The net result is the systemd unit file could set this attribute to say /var/lib/flux/content.sqlite and then the KVS is preserved across service restarts.

I got rid of the persist-filesystem and persist-directory attributes. Hopefully they will not be missed.

Couple loose ends

  • If the hash method changes (from sha1 to sha256 say) this is not handled
  • Need to check but I think sha256 hash digests are being truncated to sha1 size right now in the backing store, which would be a big gaping security hole for multi-user
  • Maybe it would be nice to have control over "restart" behavior independent of the path setting
  • Ultimately this should be handled via config not attributes, but this was easier to get launched
  • More testing required

@garlick
Copy link
Member Author

garlick commented Mar 3, 2020

Rebased on current master and forced a push.

@grondo
Copy link
Contributor

grondo commented Mar 3, 2020

Without actually looking at the changes 😛, does this make it impossible for multiple instances to share the same content backing store? Not sure if that was even a use case, but it felt like before this was actually a possibility, but with non-unique metadata, no longer so.

(Sorry if a dumb question)

@@ -149,7 +149,6 @@ static void runlevel_cb (runlevel_t *r, int level, int rc, double elapsed,
static void runlevel_io_cb (runlevel_t *r, const char *name,
const char *msg, void *arg);

static int create_persistdir (attr_t *attrs, uint32_t rank);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit - add "persist-filesystem" and "persist-directory" into commit message, as its something people may search on (maybe applies to a few other commit messages)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized I spread the removal of this over several commits. I'll try to consolidate and include a good, searchable commit message.

if (!(dir = flux_attr_get (h, "broker.rundir"))) {
flux_log_error (h, "broker.rundir");
/* If 'content.backing-path' attribute is already set, then:
* - use value is the sqlite backing file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"value is" -> "value of"?

@@ -446,14 +443,6 @@ static int content_sqlite_opendb (struct content_sqlite *ctx)
log_sqlite_error (ctx, "preparing store stmt");
goto error;
}
if (sqlite3_prepare_v2 (ctx->db,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like dump_stmt just wasn't used anymore? Perhaps should be different commit than this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I removed the code that used it in #2786 but forgot to remove the statement. I'll break it out to a separate commit.

NULL,
},
{ "metaput",
"KEY VAL",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency to flux kvs should input be "key=val" instead of "key val"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up dropping this from flux-content(1) and moving to a test program. I retained the existing usage but I assume it's no big deal in a test program.

@@ -36,12 +36,23 @@ const char *sql_load = "SELECT object,size FROM objects"
const char *sql_store = "INSERT INTO objects (hash,size,object) "
" values (?1, ?2, ?3)";

const char *sql_create_table_meta = "CREATE TABLE if not exists meta("
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message, "intented" -> "intended"?

@garlick
Copy link
Member Author

garlick commented Mar 4, 2020

Thanks @chu11 for the review comments, really helpful! I'll fix those tomorrow.

does this make it impossible for multiple instances to share the same content backing store? Not sure if that was even a use case, but it felt like before this was actually a possibility, but with non-unique metadata, no longer so.

Uhh yeah. True. I don't think it's a near term use case, but we do have a nice clean design that permits it, and this messes it up. Let me ponder whether there's a way we can do this that's looks less like a malignant growth on a perfectly good abstraction.

@garlick
Copy link
Member Author

garlick commented Mar 4, 2020

Just forced a push that refactors/rewords some of the commits and hopefully addresses all @chu11's review comments.

I went ahead and implemented a separate service called kvs-checkpoint for the checkpoint metadata. It is still implemented in the content-sqlite module at the moment, to share the same database, but I've tried to indicate that it's not part of the content store per se, and that this was done for expediency. Also, I removed the metaget/metaput subcommands from flux-content(1) and instead just implemented those in a test program since really this wasn't intended for general purpose use. Hopefully that at least makes its mixing with the content-sqlite service less of a design violation.

@garlick garlick changed the title WIP: add KVS checkpoints add KVS checkpoints Mar 4, 2020
@garlick
Copy link
Member Author

garlick commented Mar 5, 2020

This is probably ready for a review. I might be able to increase the diff coverage in a couple of cases, but a lot of the uncovered code is in hard to reach error paths.

Copy link
Member

@chu11 chu11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM, only noticed this one thing

@@ -376,27 +375,31 @@ static void content_sqlite_closedb (struct content_sqlite *ctx)
if (sqlite3_finalize (ctx->store_stmt) != SQLITE_OK)
log_sqlite_error (ctx, "sqlite_finalize store_stmt");
}
if (ctx->load_stmt) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a rebase error, where ctx->load_stmt got moved (in your "content-sqlite: handle content.backing-path attr " commit)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Fixing...

Problem: The 'sql_dump' prepared statement is unused,
left over from code removed in flux-framework#2786.

Remove unused prepared statement.
Drop code that validates a 'persist-directory' attribute,
if set, or that sets 'persist-directory' to a unique
directory in 'persist-filesystem', if set.

Drop code that sets the broker 'log-filename' attribute
to a location in 'persist-directory', if the former is
unset and the latter is set.

Drop code from rc3 that attempts to write out a
KVS snapshot if 'persist-directory', if set.

Drop relevant sharness tests from t0001-basic.t.
Drop all tests from t2010-kvs-snapshot-restore.t for now.
Drop these attributes from flux-broker-attributes(7) man page.
Drop the code that manipulates the location and persistence
of the sqlite db file based on the 'persist-directory'
attribute, and replace it with the following:

If 'content.backing-path' attribute is set on entry,
use its value as the sqlite db file.  sqlite creates
this file if it doesn't exist.  On exit, it is preserved.

If 'content.backing-path' is not set on entry, use
'${rundir}/content.sqlite' as the sqlite db file.
sqlite creates this file if it doesn't exist(*).
On exit it is removed.  Set the 'content.backing-path'
attribte so the location of this file.

Add a 'flux content flush' command to rc3 just before
content-sqlite unload, to ensure that any dirty cache
entries are flushed to the backing store before content-sqlite
is unloaded.

_______
(*) This file is not expected to exist initially given that
rundir is a temporary directory, but it _could_ exit if the
content-sqlite module is reloaded.  Reusing its content
is appropriate in that case.
Add a new sqlite table to store KVS checkpoint data,
and offer a new service, 'kvs-checkpoint', with get/put
methods.  The table has the following properties:
- keys and values are strings of any length
- keys are unique
- empty string key is not allowed
- put to existing key replaces value

Note:  piggybacking this service/table onto content-sqlite is
a temporary short cut to get something working quickly.
It should probably be split out to its own module eventually,
as implementing it here will make it cumbersome to replace the
content backing module.
On rank 0 startup, try to get a 'kvs-primary' key from the
kvs-checkpoint service.  If the service is available and the
key is found, use it as the blobref for the initial KVS root.
If unavailable or not found, start with an empty directory as
before.

On rank 0 unload, try to put 'kvs-primary' to the kvs-checkpoint
service.  If the service is unavailable, don't treat this as
an error, since the instance may be purposefully running without one.
Add a test program for exercising kvs-snapshot.get
and kvs-snapshot.put service.
Describe the new content.backing-path attribute.
@chu11
Copy link
Member

chu11 commented Mar 7, 2020

Good to go now? I'll let ya set merge-when-passing if ya think so.

@garlick
Copy link
Member Author

garlick commented Mar 7, 2020

Thanks!

Restarted a builder that failed like this:

There was an error in the .travis.yml file from which we could not recover.
Unfortunately, we do not know much about this error.
Please review https://docs.travis-ci.com, or contact us at support@travis-ci.com with the error ID: cee1e4cd24984025927555fb39ff1d89

@codecov-io
Copy link

Codecov Report

Merging #2783 into master will decrease coverage by 0.05%.
The diff coverage is 63.55%.

@@            Coverage Diff             @@
##           master    #2783      +/-   ##
==========================================
- Coverage   81.03%   80.98%   -0.06%     
==========================================
  Files         250      250              
  Lines       39424    39497      +73     
==========================================
+ Hits        31949    31988      +39     
- Misses       7475     7509      +34
Impacted Files Coverage Δ
src/broker/broker.c 73.28% <ø> (-0.03%) ⬇️
src/broker/log.c 71.17% <ø> (+0.49%) ⬆️
src/modules/content-sqlite/content-sqlite.c 55.66% <59.3%> (+0.66%) ⬆️
src/modules/kvs/kvs.c 66.81% <75%> (+0.44%) ⬆️
src/common/libpmi/simple_client.c 86.33% <0%> (-3.28%) ⬇️
src/common/libflux/handle.c 83.7% <0%> (-2.01%) ⬇️
src/common/libpmi/pmi.c 91.5% <0%> (-1.89%) ⬇️
src/common/libsubprocess/local.c 80.06% <0%> (-0.35%) ⬇️
src/cmd/flux-kvs.c 81.67% <0%> (-0.23%) ⬇️
... and 5 more

@mergify mergify bot merged commit 02ef011 into flux-framework:master Mar 7, 2020
@garlick garlick deleted the checkpoint_restart branch March 7, 2020 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants