irmin-pack: Strengthen crash and SWMR consistencies #1690

Closed
wants to merge 1 commit

Conversation

Ngoguey42 (Contributor)

Should it be mentioned in the changelog? Under bugfix?

@Ngoguey42 Ngoguey42 force-pushed the more-swmr-consistency branch 4 times, most recently from 0f52db9 to a653481 Compare December 29, 2021 15:37
@@ -17,7 +17,7 @@
 module type S = sig
   include Irmin.Atomic_write.S

-  val flush : t -> unit
+  val sync : t -> unit
Member

I think it's better to keep it calling flush for write instances.

Ngoguey42 (Contributor Author), Dec 29, 2021

I've removed flush and added sync.

Flush was a no-op because set_entry always flushes at the end. https://github.com/mirage/irmin/blob/main/src/irmin-pack/atomic_write.ml#L82

Sync was automatically called before each unsafe_find, which is unsafe because it may pick up branches that point to a non-flushed index.
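
A minimal sketch of these two points, with hypothetical simplified names rather than the actual atomic_write.ml code: set already flushes on its own, and an implicit sync before every read can surface branches whose commits have not reached the index yet.

let set t k v =
  unsafe_set t k v;
  (* set_entry already ends with a flush, so a separate [flush t] was a no-op *)
  IO.flush t.block

let find t k =
  (* the implicit [sync t] that used to run here could load a branch written
     by the RW instance before the corresponding index flush, i.e. a branch
     pointing to a commit the RO instance cannot resolve yet *)
  unsafe_find t k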

Contributor

Agree with flush, good catch.

For the sync, do you mean that the index/pack store is maybe not flushed when the branch store is? I'm wondering if the problem isn't that set does not force the pack file to flush (which would in turn force the index).
set calls W.notify. I'm not sure how watches are used, but don't we need to guarantee that both the branch tag and the commit reach the disk? (cc @zshipko).

Contributor

Right now irmin-pack watches don't listen for changes on disk (which is something I'd like to improve at some point). Watchers are notified directly when the process in which the watch was created performs a set/remove operation.
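
A minimal sketch, with hypothetical simplified names, of the behaviour described above (W.notify is the call mentioned earlier in this thread): watchers are invoked in-process from the write path, they do not observe the files on disk.

let set t branch hash =
  unsafe_set t branch hash;              (* write + flush the branch file *)
  W.notify t.watches branch (Some hash)  (* direct, in-process notification *)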

Contributor

In general, we use some words in unusual ways, and I would say that sync is one such word. It tends to be linked in a programmer's mind to fsync. However, in our codebase, sync tends to be a special function used by RO instances to detect changes in the underlying files and somehow force a reload of the in-memory structures. So the flow of information is roughly the opposite of a normal fsync. In any case, I think we need to clearly document what "sync" is and what it is supposed to do. If it is for RO instances, we should make this clear. Then we can decide whether the implementation actually does what it is supposed to do.
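
To make the distinction concrete, a minimal documentation sketch (not the actual irmin-pack interface) of the RO-oriented meaning of sync described above:

module type Read_only_sync = sig
  type t

  (* [sync t] is for read-only instances: it detects changes made to the
     underlying files by the read-write instance and reloads the in-memory
     structures accordingly. It does not push any data to disk; that is the
     role of [flush] on the read-write instance. *)
  val sync : t -> unit
end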

samoht (Member), Mar 18, 2022

Should we use push/pull instead of flush/sync? or flush/fetch?

Ngoguey42 (Contributor Author)

I find push, pull and fetch too strongly associated with interactions with a remote server.

flush/reload?

[clear] the order should be: branch, index, pack and dict.

In case an rw instance concurrently flushes, the [sync] the order
should be: branch, index, pack and dict.
Member

I don't understand why sync has to be done in reverse order. Why is this different from clear and flush?

Member

Actually, that's a very good point, read-only sync should be done in reverse order and I believe we are not doing this properly at the moment.

Ngoguey42 (Contributor Author), Dec 29, 2021

It is flush that is done in the opposite order to clear and sync.

I can attempt another explanation. Consider this dependency graph:

(top)
branch
  |
  v
index
  |
  v
pack
  |
  v
dict
(bot)

crash during flush

flush top to bot (not this PR): RO gets dangling dict references

RW: flush branches/index/pack  <crash>    
RO:                                     sync all

flush bot to top (this PR): RO gets no dangling dict references

RW: flush dict  <crash>   
RO:                      sync all

concurrent sync (RW hangs) during flush

flush top to bot (not this PR): RO gets dangling dict references

RW: flush branches/index/pack            flush dict
RO:                            sync all

flush bot to top (this PR): RO gets no dangling dict references

RW: flush dict            flush index/pack/branches
RO:             sync all

concurrent sync (RO hangs) during flush

sync bot to top (not this PR): RO gets dangling dict references.

RW:            flush all
RO: sync dict             sync index/pack/branches

sync top to bot (this PR): RO gets no dangling dict references.

RW:                           flush all
RO: sync branches/index/pack             sync dict                    

crash during clear

clear bot to top (not this PR): RO gets dangling dict references

RW: clear dict  <crash>
RO:                      sync all

clear top to bot (this PR): RO gets no dangling dict references

RW: clear branches/index/pack  <crash>
RO:                                     sync all

concurrent sync (RW hangs) during clear

clear bot to top (not this PR): RO gets dangling dict references

RW: clear dict            clear pack/index/branches
RO:             sync all

clear top to bot (this PR): RO gets no dangling dict references

RW: clear branches/index/pack            clear dict
RO:                            sync all

concurrent sync (RO hangs) during clear ⚠️ ⚠️

sync bot to top (not this PR): RO gets no dangling dict references ⚠️ ⚠️

RW:            clear all
RO: sync dict             sync index/pack/branches

sync top to bot (this PR): RO gets dangling dict references ⚠️ ⚠️

RW:                           clear all
RO: sync branches/index/pack             sync dict
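
To summarise the argument above as code, a minimal sketch with hypothetical Branch/Index/Pack/Dict modules (not the actual irmin-pack store): flush goes from the bottom of the dependency graph up, while sync and clear go from the top down, so that a crash or a concurrent RO sync never yields a dangling dict reference, except for the clear-vs-sync race flagged with the warnings above.

let flush_all t =
  (* bottom to top: referenced data reaches disk before the data referencing it *)
  Dict.flush t.dict;
  Pack.flush t.pack;
  Index.flush t.index;
  Branch.flush t.branch

let sync_all t =
  (* top to bottom: the referencing level is never newer than the level it points to *)
  Branch.sync t.branch;
  Index.sync t.index;
  Pack.sync t.pack;
  Dict.sync t.dict

let clear_all t =
  (* top to bottom: a crash mid-clear never leaves a branch pointing to cleared data *)
  Branch.clear t.branch;
  Index.clear t.index;
  Pack.clear t.pack;
  Dict.clear t.dict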

Ngoguey42 (Contributor Author)

I now believe that sync can't be consistent with both clear and flush.

samoht (Member), Dec 29, 2021

It is flush that is done in the opposite order to clear and sync

this doesn't sound right: clear and flush are write operations, they should be done in the same order. Sync is a read operation and should be done in the reverse order to be consistent.

Ngoguey42 (Contributor Author), Dec 30, 2021

clear isn't a regular write operation; it is the only operation that removes data from the store. If writing and reading should have opposite orders, so should adding and removing data.

Changing clear to use the same order as flush wouldn't change the fact that sync can't be consistent with both clear and flush at the same time. Clearing from top to bottom has the advantage of being crash consistent.

Contributor

Thanks @Ngoguey42 for this analysis, I agree that sync cannot be consistent with both clear and flush.

But we don't support clear anymore; the only place it is still used is when opening an RW instance with fresh:true while an RO instance is running (which is anyway not very well supported, see #955). So I propose we move the comments about clear into the pack_store (where it is used) and not deal with the clear/sync issues for now. Wdyt?

src/irmin-pack/ext.ml (outdated; resolved)
@Ngoguey42 Ngoguey42 added this to the 3.0.0 milestone Jan 5, 2022
@Ngoguey42 Ngoguey42 mentioned this pull request Jan 5, 2022
let former_offset = IO.offset t.pack.block in
let offset = IO.force_offset t.pack.block in
if offset > former_offset then (
  Dict.sync t.pack.dict;
  Index.sync t.pack.index)
Contributor

this was to prevent an unnecessary IO for index - which I think needs to read the offset from disk to determine if it needs to sync. But if we don't update the pack, there is no need to do so for the index, no?

Comment on lines 521 to 525
(* It is expected that a flush was performed before [close]. Otherwise
data is lost. (ext takes care about it).

This function does not flush because index is expected to already be
closed. *)
Contributor

Suggested change
- (* It is expected that a flush was performed before [close]. Otherwise
-    data is lost. (ext takes care about it).
-
-    This function does not flush because index is expected to already be
-    closed. *)
+ (* This function does not flush because: 1/ index is expected to already be closed and 2/ the pack file was flushed upstream by [ext]. *)

@icristescu icristescu added this to In progress in Irmin - Roadmap Jan 12, 2022
@icristescu icristescu removed this from the 3.0.0 milestone Jan 20, 2022
@@ -93,9 +93,9 @@ module Make (IO : IO.S) : S = struct
    t

  let close t =
    (* If called from ext, flush was already called *)
Contributor

I would say: it is much better if code correctness does not depend on how it is used by others; after all, we don't know how others will use the code in the future, and verifying that it is used in a particular way can be a very expensive operation requiring traversing the entire codebase. So I would prefer correctness arguments that can be made "locally" if possible.

In this particular case, my expectations surrounding close are: all associated caches are expunged, the underlying file descriptor is closed, etc. Now, this implementation of close doesn't do that. It only does something if open_instances becomes 0. It doesn't even cause a flush of the IO. So this is already an extremely hard function to reason about. Again, I think writing some clear documentation about what "close" is supposed to do, in dict_intf.ml, will show up all sorts of problems.

Aside: I really don't like the caching of IO instances in the code, for various reasons that I can maybe explain sometime.
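
As a concrete form of this suggestion, a documentation sketch for dict_intf.ml of the kind of local contract described above; this is what one might want close to guarantee, not what the current implementation does:

val close : t -> unit
(* [close t] releases the resources held by [t] once the last open instance
   is released: associated in-memory caches are dropped and the underlying
   file descriptor is closed. [close] does not flush; callers must flush
   before closing or accept data loss. Using [t] after [close] is a
   programming error. *)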

- the dict references nothing.

In case of crash (or an ro instance that concurrently syncs), the
[clear] the order should be: branch, index, pack and dict.
Contributor

What is "the [clear] the order"? Do you mean: in order to avoid problems when the system crashes, the order should be: ...? And then, shouldn't we flush the dict first, then the pack, etc.?

Contributor

BTW I really appreciate that you tried to clarify things with this documentation. I think we really need more documentation, especially in areas where there is "tricky" code.

In case of crash (or an ro instance that concurrently syncs), the
[clear] the order should be: branch, index, pack and dict.

In case an rw instance concurrently flushes, the [sync] the order
Contributor

What does "the [sync] the order" mean? I don't understand this part. As above, if A references B, then doesn't B need to flush before A, to avoid any "dangling" references from A?

In case of new data in the 4 components that reference one another
and in case of crash (or an ro instance that concurrently syncs),
the [flush] order should be: dict, pack, index and branch.

Contributor

I agree that this is the right order, given what you said earlier, if we want to avoid dangling references. But don't we also need to make an argument about async flushes etc? Maybe each component is periodically writing data out anyway (which then asynchronously flushes to disk), and don't these need to be co-ordinated too, not just what happens on sync/flush?

tomjridge (Contributor)

I added a few comments, but I would say that I don't really have a good feel for what is going on. There are 4 components involved: branch, index, pack and dict. It is not clear to me what guarantees each component gives as far as syncing data to disk is concerned: if each can write data out to disk asynchronously, then things are trickier than if they only write out on sync/flush etc. Looking at the simplest component, dict, the code uses the interface IO.S, and when I tried recently to document this interface, it seemed to me that there were quite a few buggy corner cases. So it is hard to be confident about code that uses an IO. And so on.

@@ -127,6 +127,8 @@ module Maker (Config : Conf.S) = struct
     let branch_t t = t.branch

     let batch t f =
+      let readonly = Conf.readonly t.config in
+      if readonly then failwith "Can't batch an RO pack store";
Contributor

At various points, some particular exception is thrown "RO_not_allowed" or similar. Perhaps we should standardise on that? In general, if a function is intended for RW only, why not add a dummy argument batch ~rw_only:() t f to mark that the interface should not be called in RO instances? Similarly ~ro_only could mark those interfaces (sync?) that are for RO instances only.
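
A minimal sketch of both conventions suggested above, assuming a shared RO_not_allowed exception along the lines of the one already raised elsewhere in the codebase; the labelled unit arguments are purely illustrative:

exception RO_not_allowed

(* RW-only entry point: the labelled unit argument documents at every call
   site that this must not be used on an RO instance, and the check raises
   the shared exception instead of an ad-hoc [failwith]. *)
let batch ~rw_only:() t f =
  if Conf.readonly t.config then raise RO_not_allowed;
  f t

(* A [~ro_only:()] argument could mark RO-only entry points such as [sync]
   in the same way. *)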

samoht (Member) commented Mar 7, 2022

@tomjridge the way the components interact is described in https://tarides.com/blog/2020-09-01-introducing-irmin-pack - do you have a more precise question in mind? Happy to answer those of course.

Regarding:

there were quite a few buggy corner cases

Could you point out which corner cases? It would be useful to open issues about those so that we don't forget, and even better to write small reproduction tests so that we can make sure any fix is definitive.

More generally, it would be helpful to have a more systematic way to address those issues, for instance using something like https://github.com/CraigFe/ocaml-fiu

tomjridge (Contributor)

Regarding:

there were quite a few buggy corner cases

Could you point out which corner cases? It would be useful to open issues about those so that we don't forget, and even better to write small reproduction tests so that we can make sure any fix is definitive.

More generally, it would be helpful to have a more systematic way to address those issues, for instance using something like https://github.com/CraigFe/ocaml-fiu

I was referring to the IO_intf documentation PR #1758 , particularly the comments on line 92 and following where there seems to be at least the possibility of a bug, and probably more than one (at least, I also hit another problem with this code when trying to read the full data in the file - if the file size is reported as n bytes, you can actually only read n-16 bytes, which causes issues with a simple copy loop where you expect to read n bytes). And if we wrote down the actual invariants this code was supposed to satisfy, I think we would find some more. I think we are still processing this PR - ie we are tracking the issue and not just dropping it on the floor.

samoht (Member) commented Mar 8, 2022

particularly the comments on line 92 and following where there seems to be at least the possibility of a bug,

Line 92 is truncate, which is used by clear, which is not used by Tezos (and so far only used by the tests), so it's not critical, it seems?

tomjridge (Contributor)

particularly the comments on line 92 and following where there seems to be at least the possibility of a bug,

Line 92 is truncate, which is used by clear, which is not used by Tezos (and so far only used by the tests), so it's not critical, it seems?

Yes precisely that! So Craig and I were not overly worried even if there was a bug there. And I think clear/truncate is maybe going away anyway so perhaps the whole thing sorts itself out. At least, it seemed sensible not to jump immediately to bug fighting mode.

@icristescu icristescu mentioned this pull request Mar 17, 2022
Ngoguey42 (Contributor Author)

Superseded by #1865
