kvs: add eventual consistency test coverage #1832

Closed
garlick opened this issue Nov 12, 2018 · 6 comments

@garlick
Member

garlick commented Nov 12, 2018

The kvs makes several guarantees for eventual consistency:

  • Causal consistency: If process A communicates with process B that it has updated a data item (passing a store version in that message), a subsequent access by process B will return the updated value.
  • Read-your-writes consistency: A process that has updated a data item never accesses an older value.
  • Monotonic read consistency: If a process has seen a particular value for an object, any subsequent accesses will never return previous values.

We should have tests for these specific guarantees.

Since the relaxed consistency is mainly due to the open-loop setroot event, we might consider adding test hooks that delay setroot event publishing until some triggering event, so that non-racy tests can be written to validate these guarantees.
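As a rough illustration of the test-hook idea, here is a minimal toy model in Python. It is not Flux code: `Leader`, `Follower`, `release`, and `wait_version` are hypothetical names invented for this sketch. Setroot events are held back until the test releases them, so the causal and monotonic-read guarantees can be checked without a race.

```python
class Follower:
    """A rank's cached root; it changes only when a setroot event is applied."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply_setroot(self, version, data):
        # Ignore stale or duplicate events so reads never go backwards
        # (monotonic read consistency).
        if version > self.version:
            self.version, self.data = version, data

    def get(self, key, wait_version=0):
        """Read key, refusing to answer from a root older than wait_version."""
        if self.version < wait_version:
            # A real client would block here; the toy model raises instead.
            raise LookupError("local root is older than the requested version")
        return self.data.get(key)


class Leader:
    """Owner of the authoritative root (hypothetical stand-in for rank 0)."""

    def __init__(self):
        self.version = 0
        self.data = {}
        self.held = []  # setroot events held back by the test hook

    def commit(self, key, value):
        self.version += 1
        self.data = {**self.data, key: value}  # new dict, old snapshots stay intact
        self.held.append((self.version, self.data))
        return self.version  # the version process A would pass to process B

    def release(self, follower):
        """Test trigger: deliver all held setroot events to the follower."""
        for version, data in self.held:
            follower.apply_setroot(version, data)
        self.held.clear()


leader, rank_b = Leader(), Follower()
v = leader.commit("a", 1)
assert rank_b.get("a") is None          # stale read is allowed without a version
leader.release(rank_b)                  # the triggering event
assert rank_b.get("a", wait_version=v) == 1   # causal: B sees A's update
```

The point of the explicit `release` step is that the test, not timing, decides when a follower learns about a new root.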

@chu11
Member

chu11 commented Dec 12, 2018

Replying to a comment from @garlick in #1863.

Thoughts on testing read-your-writes e.g. as suggested in #1832 ?

My initial thought was creating some type of pause / unpause of all setroot events. When setroots are paused, queue them up, then send them out on an unpause.

So imagine we can do tests like this:

flux kvs put a=0
flux kvs setroot pause
flux exec -r 1 flux kvs put a=1
flux exec -r 2 flux kvs getroot # this should get the root of a=0
flux exec -r 1 flux kvs getroot # this should get the root of a=1
flux kvs setroot unpause
flux exec -r 2 flux kvs getroot # gets the root of a=1

I believe the above test effectively tests read-your-writes (maybe these should be flux kvs get calls instead, but the main idea is the same). A variant should be able to test causal consistency, I think.

Dunno about monotonic, I think that might be implicitly taken care of with many of the stress / racy tests?

@chu11 chu11 self-assigned this Dec 20, 2018
@chu11
Member

chu11 commented Dec 26, 2018

My initial thought was creating some type of pause / unpause of all setroot events. When setroots are paused, queue them up, then send them out on an unpause.

When I initially thought about this, I thought about pausing the "send" of setroot events. It turns out this won't work. I had forgotten that setroot events also contain the fence names used during a transaction. If a rank can't receive those names, then its transactions will never complete.

What may work instead is pausing / unpausing the processing of "received" setroot events on a given rank. So the test would be a bit different.

flux kvs put a=0
flux exec -r 2 flux kvs setroot pause # pause on rank 2 only
flux exec -r 1 flux kvs put a=1
flux exec -r 2 flux kvs get a # this should get 0
flux exec -r 1 flux kvs get a # this should get 1
flux exec -r 2 flux kvs setroot unpause # process received setroot events
flux exec -r 2 flux kvs wait version # to make sure setroot processed
flux exec -r 2 flux kvs get a # gets 1
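To make the expected behavior of that sequence concrete, here is a small Python toy model of receive-side pausing. It is only a sketch under assumptions: `ToyKVS`, `Rank`, and the `commit-N` names are hypothetical, and the model collapses the real commit protocol into a single event that carries the new root plus the commit name. A commit counts as complete on a rank only once that rank has processed the setroot event naming it.

```python
from collections import deque


class Rank:
    """One rank's local view of the KVS root (toy model, not Flux code)."""

    def __init__(self):
        self.root = {}        # snapshot this rank reads from
        self.version = 0      # version of that snapshot
        self.paused = False   # True: queue received setroot events, do not apply
        self.queue = deque()  # received but unprocessed setroot events
        self.seen = set()     # commit names seen in processed setroot events

    def receive(self, event):
        self.queue.append(event)
        if not self.paused:
            self.process()

    def process(self):
        while self.queue:
            version, snapshot, name = self.queue.popleft()
            if version > self.version:  # never step back to an older root
                self.version, self.root = version, snapshot
            self.seen.add(name)

    def unpause(self):
        self.paused = False
        self.process()


class ToyKVS:
    def __init__(self, nranks):
        self.ranks = [Rank() for _ in range(nranks)]
        self.master = {}
        self.version = 0

    def put(self, rank, key, value):
        """Commit from `rank`; return True if the commit completed there."""
        self.version += 1
        self.master = {**self.master, key: value}
        name = f"commit-{self.version}"
        for r in self.ranks:
            r.receive((self.version, self.master, name))
        # False models the hang: the committing rank never saw its own name.
        return name in self.ranks[rank].seen

    def get(self, rank, key):
        return self.ranks[rank].root.get(key)


kvs = ToyKVS(3)
assert kvs.put(0, "a", 0)
kvs.ranks[2].paused = True                   # pause on rank 2 only
assert kvs.put(1, "a", 1)                    # completes on rank 1
assert kvs.get(2, "a") == 0                  # rank 2 still sees the old root
assert kvs.get(1, "a") == 1                  # read-your-writes on rank 1
kvs.ranks[2].unpause()                       # process received setroot events
assert kvs.ranks[2].version == kvs.version   # stands in for the version wait
assert kvs.get(2, "a") == 1
```

In this model a `put` issued from a paused rank returns False, which corresponds to the transaction never completing because the rank has not processed the setroot event carrying its name.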

@garlick
Member Author

garlick commented Dec 26, 2018

Oh, because the flux kvs put a=0 command will hang? I didn't think about that...

This approach seems equivalent so sounds good to me!

@chu11
Member

chu11 commented Dec 26, 2018

Oh, because the flux kvs put a=0 command will hang? I didn't think about that...

I think you have the right idea, but it isn't the initial "a=0" line that hangs. In my original example far above, these two lines:

flux kvs setroot pause
flux exec -r 1 flux kvs put a=1

would have hung.

With the new approach, hangs can still happen, so the tests just have to be careful about which ranks they pause and which ones they don't.

@chu11
Member

chu11 commented Jan 7, 2019

Shall we consider this issue closed? #1907 added read-your-writes coverage. I believe monotonic consistency is handled via normal tests and causal consistency is handled via all of the "wait for version" tests.

Or perhaps all that has to be done is to document that those tests test these conditions?

@garlick
Member Author

garlick commented Jan 7, 2019

I think we're good for now. Thanks!

@garlick garlick closed this as completed Jan 7, 2019