content: require backing store for checkpoint put #6251

chu11 · 2024-08-31T14:42:34Z

As noted in #6242, there appears to be an inconsistency in the content module:

a content.flush will fail if a backing store is not loaded
a content.checkpoint-put will succeed if a backing store is not loaded

These two behaviors are inconsistent with each other. And since we normally want to call content.flush before content.checkpoint-put, it's not clear how callers should handle the difference.

It's not 100% clear to me why it was done this way, but my theory is that it's b/c we can write data to the content module and content.flush later on, once the backing module is loaded. So the idea was to support the same pattern with checkpoints: when a backing module is loaded, the checkpoints stored in memory are flushed to the backing store. However, no checkpoint "flush" equivalent RPC exists.
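To make the asymmetry concrete, here is a toy Python model of the behavior described above. It is only a sketch of my reading of this issue, not flux-core code: the class `ToyContent`, its method names, the key `"kvs-primary"`, and the use of ENOSYS for the flush failure are all assumptions for illustration.

```python
import errno


class ToyContent:
    """Toy stand-in for the content module (hypothetical, not flux-core code)."""

    def __init__(self):
        self.backing = None            # becomes a dict once a backing store is "loaded"
        self.dirty = {}                # data written but not yet flushed
        self.cached_checkpoints = {}   # checkpoints held only in memory

    def store(self, key, data):
        # Writes always succeed; the data waits for a later flush.
        self.dirty[key] = data

    def flush(self):
        # Fails outright when no backing store is loaded (errno is an assumption).
        if self.backing is None:
            raise OSError(errno.ENOSYS, "no backing store loaded")
        self.backing["blobs"].update(self.dirty)
        self.dirty.clear()

    def checkpoint_put(self, key, value):
        # Succeeds either way: without a backing store it is only cached.
        if self.backing is None:
            self.cached_checkpoints[key] = value
        else:
            self.backing["checkpoints"][key] = value

    def load_backing(self):
        # Loading the backing module writes out any cached checkpoints.
        self.backing = {"blobs": {}, "checkpoints": {}}
        self.backing["checkpoints"].update(self.cached_checkpoints)
        self.cached_checkpoints.clear()


c = ToyContent()
c.store("a", b"x")
c.checkpoint_put("kvs-primary", "rootref")   # succeeds, memory only
try:
    c.flush()                                # fails, no backing store
except OSError as e:
    print("flush failed:", e.strerror)
```

In this model a caller has no way to force the cached checkpoint out to storage other than loading the backing module, which is the missing "flush" equivalent mentioned above.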

IMO there are two things we could do to make this consistent:

A) return ENOSYS on checkpoint-put if the backing store does not exist. In /etc/rc1 and /etc/rc3 we already do special handling to ensure that steps requiring a backing store are skipped if the "none" backing module is used. This would also allow the fixes in #6242 and #6240 to be smoother.

B) support a content-checkpoint-flush equivalent to content.flush, so we can store checkpoints in memory and flush them later on. (Alternatively, content.flush could do both, but that requires writing a checkpoint before calling content.flush. Or a new "do both" flush could be added.)
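The two options can be contrasted with a small Python sketch. Again, this is a toy model under my own assumptions, not the real implementation: `OptionA`, `OptionB`, and `checkpoint_flush` are hypothetical names, and the errno returned by the hypothetical flush in option B is a guess.

```python
import errno


class OptionA:
    """Option A: checkpoint-put fails with ENOSYS without a backing store."""

    def __init__(self):
        self.backing = None

    def load_backing(self):
        self.backing = {}

    def checkpoint_put(self, key, value):
        if self.backing is None:
            raise OSError(errno.ENOSYS, "checkpoint-put requires a backing store")
        self.backing[key] = value


class OptionB:
    """Option B: keep caching checkpoints, but add an explicit flush for them."""

    def __init__(self):
        self.backing = None
        self.pending = {}   # checkpoints waiting for a backing store

    def load_backing(self):
        self.backing = {}

    def checkpoint_put(self, key, value):
        if self.backing is None:
            self.pending[key] = value
        else:
            self.backing[key] = value

    def checkpoint_flush(self):
        # Hypothetical RPC; mirrors content.flush by failing without a backing store.
        if self.backing is None:
            raise OSError(errno.ENOSYS, "checkpoint flush requires a backing store")
        self.backing.update(self.pending)
        self.pending.clear()
```

Option A has no in-memory checkpoint state at all, which is why it would let the "cached" checkpoint handling be removed. Option B keeps that state and adds a second call that callers must remember to make.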

I like option A better. Conceptually, I don't think users expect a checkpoint to work only in memory; it works b/c a backing store exists.

AFAICT, the only thing this would change is a bunch of tests that currently test that checkpoint-put works without a backing store. In rc1 we already load the backing module before doing a content restore. So I think all cases that assume a backing store exists for checkpointing already check that a backing store is desired.

Problem: A backing store is required for content.flush but it is not required for content.checkpoint-put. This is inconsistent and can lead to checkpointing problems down the line. Require content.checkpoint-put to only work if there is a backing store available. As a consequence, remove code that handled "cached" checkpoints when a backing store is not available. Fixes flux-framework#6251

chu11 self-assigned this Aug 31, 2024

chu11 mentioned this issue Sep 3, 2024

WIP: kvs: call content.flush before checkpoint #6240

Open
