You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As noted in #6242, there appears to be an inconsitency in the content module
a content.flush will fail if a backing store is not loaded
a content.checkpoint-put will succeed if a backing store is not loaded
this appears to be inconsistent with each other. And since normally we want to content.flush before doing content.checkpoint-put, it's not clear how to handle this inconsistency.
It's not 100% clear to me why it was done this way, but my theory is it's b/c we can write data to the content module and content.flush later on when the backing module is loaded. So the idea was to support that same idea with checkpoints. When a backing module is loaded, the checkpoints stored in memory are flushed to the backing store. However, no checkpoint "flush" equivalent RPC exists.
IMO there are two things we could do to make this consistent:
A) return ENOSYS on checkpoint-put if the backing store does not exist. In /etc/rc1 and /etc/rc3 we already do special handling to ensure that backing store necessary things are not done if the none backing module is used. This would also allow fixes in #6242 and #6240 to be smoother.
B) support a content-checkpoint-flush equivalent to content.flush. So we can store checkpoints in memory and flush later on. (alternately content.flush could do both, but that requires writing a checkpoint before calling content.flush. Or alternately a new "do both" flush could be done.)
I like option A better. I think conceptually users will not think that a checkpoint should work only in memory, it works b/c a backing store exists.
AFAICT, the only thing this would change is a bunch of tests that currently test that checkpoint-put works without a backing store. In rc1 we already load the backing module before doing a content restore. So I think all cases that assume a backing store exists for checkpointing already checks that a backing store is desired.
The text was updated successfully, but these errors were encountered:
Problem: A backing store is required for content.flush but it
is not required for content.checkpoint-put. This is inconsistent
and can lead to checkpointing problems done the line.
Require content.checkpoint-put to only work if there is a backing
store available. As a consequence, remove code that handled
"cached" checkpoints when a backing store is not available.
Fixesflux-framework#6251
chu11
added a commit
to chu11/flux-core
that referenced
this issue
Sep 3, 2024
Problem: A backing store is required for content.flush but it
is not required for content.checkpoint-put. This is inconsistent
and can lead to checkpointing problems done the line.
Require content.checkpoint-put to only work if there is a backing
store available. As a consequence, remove code that handled
"cached" checkpoints when a backing store is not available.
Fixesflux-framework#6251
chu11
added a commit
to chu11/flux-core
that referenced
this issue
Sep 4, 2024
Problem: A backing store is required for content.flush but it
is not required for content.checkpoint-put. This is inconsistent
and can lead to checkpointing problems done the line.
Require content.checkpoint-put to only work if there is a backing
store available. As a consequence, remove code that handled
"cached" checkpoints when a backing store is not available.
Fixesflux-framework#6251
As noted in #6242, there appears to be an inconsitency in the content module
content.flush
will fail if a backing store is not loadedcontent.checkpoint-put
will succeed if a backing store is not loadedthis appears to be inconsistent with each other. And since normally we want to
content.flush
before doingcontent.checkpoint-put
, it's not clear how to handle this inconsistency.It's not 100% clear to me why it was done this way, but my theory is it's b/c we can write data to the content module and
content.flush
later on when the backing module is loaded. So the idea was to support that same idea with checkpoints. When a backing module is loaded, the checkpoints stored in memory are flushed to the backing store. However, no checkpoint "flush" equivalent RPC exists.IMO there are two things we could do to make this consistent:
A) return ENOSYS on
checkpoint-put
if the backing store does not exist. In/etc/rc1
and/etc/rc3
we already do special handling to ensure that backing store necessary things are not done if thenone
backing module is used. This would also allow fixes in #6242 and #6240 to be smoother.B) support a
content-checkpoint-flush
equivalent tocontent.flush
. So we can store checkpoints in memory and flush later on. (alternatelycontent.flush
could do both, but that requires writing a checkpoint before callingcontent.flush
. Or alternately a new "do both" flush could be done.)I like option
A
better. I think conceptually users will not think that acheckpoint
should work only in memory, it works b/c a backing store exists.AFAICT, the only thing this would change is a bunch of tests that currently test that
checkpoint-put
works without a backing store. Inrc1
we already load the backing module before doing a content restore. So I think all cases that assume a backing store exists for checkpointing already checks that a backing store is desired.The text was updated successfully, but these errors were encountered: