kvs.err[0]: XXX encountered old commit state=4 fence_name=lwj.1.0 #729

Closed
garlick opened this Issue Jul 12, 2016 · 6 comments

garlick commented Jul 12, 2016

There appears to be a problem with kz/kvs handling of stderr from jobs launched by wreckrun on master (78980c6).

I added a little debug to libpmi-client:

diff --git a/src/common/libpmi-client/pmi.c b/src/common/libpmi-client/pmi.c
index 9234dc5..f66961b 100644
--- a/src/common/libpmi-client/pmi.c
+++ b/src/common/libpmi-client/pmi.c
@@ -110,6 +110,7 @@ pmi_t *pmi_create_guess (void)

 int pmi_init (pmi_t *pmi, int *spawned)
 {
+    fprintf (stderr, "%s\n", __FUNCTION__);
     if (!pmi->init)
         return PMI_FAIL;
     return pmi->init (pmi->impl, spawned);
@@ -223,6 +224,7 @@ int pmi_kvs_get_value_length_max (pmi_t *pmi, int *length)
 int pmi_kvs_put (pmi_t *pmi,
         const char *kvsname, const char *key, const char *value)
 {
+    fprintf (stderr, "%s\n", __FUNCTION__);
     if (!pmi->kvs_put)
         return PMI_FAIL;
     return pmi->kvs_put (pmi->impl, kvsname, key, value);
@@ -231,6 +233,7 @@ int pmi_kvs_put (pmi_t *pmi,

 int pmi_kvs_commit (pmi_t *pmi, const char *kvsname)
 {
+    fprintf (stderr, "%s\n", __FUNCTION__);
     if (!pmi->kvs_commit)
         return PMI_FAIL;
     return pmi->kvs_commit (pmi->impl, kvsname);
@@ -239,6 +242,7 @@ int pmi_kvs_commit (pmi_t *pmi, const char *kvsname)
 int pmi_kvs_get (pmi_t *pmi,
         const char *kvsname, const char *key, char *value, int len)
 {
+    fprintf (stderr, "%s\n", __FUNCTION__);
     if (!pmi->kvs_get)
         return PMI_FAIL;
     return pmi->kvs_get (pmi->impl, kvsname, key, value, len);

This seems to make kz go off the rails pretty reliably when launching flux with wreckrun:

$ ./flux wreckrun -N4 ./flux start flux getattr size
pmi_init
pmi_init
pmi_init
pmi_init
pmi_kvs_put
pmi_kvs_commit
pmi_kvs_put
2016-07-12T22:30:29.363645Z lwj.7.err[2]: kz_put (pmi_kvs_commit
): Invalid argument
2016-07-12T22:30:29.363722Z lwj.7.err[1]: kz_put (pmi_kvs_commit
): Invalid argument
pmi_kvs_put
2016-07-12T22:30:29.363965Z kvs.err[0]: XXX encountered old commit state=4 fence_name=lwj.7.0
2016-07-12T22:30:29.364063Z kvs.err[0]: XXX encountered old commit state=4 fence_name=lwj.7.0
pmi_kvs_put
pmi_kvs_commit
pmi_kvs_get
garlick commented Jul 12, 2016

Strangely, this is not easy to reproduce any other way.

It does look like this is strictly a KVS problem:

  • put_request_cb() is detecting that this client already has a commit in progress and is returning EINVAL to the RPC.
  • commit_request_cb() is complaining that the client attempting the commit already has a fence (state=4) in progress.

Uh oh, we added the split fence calls (kvs_fence_begin / kvs_fence_finish). That may have violated an invariant about not having more than one commit and/or fence in progress at a time per handle...
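
For concreteness, here is a sketch of the suspect interleaving on a single handle. Only the call names come from this thread; the prototypes, the key name, and the wrapper function are assumptions:

/* Sketch only, not a reproducer: prototypes approximate the libkvs
 * calls named in this thread, and the key name is made up. */
static void suspect_interleaving (flux_t h, int nprocs)
{
    /* the PMI barrier enters an asynchronous fence on this handle */
    kvs_fence_begin (h, "lwj.7.0", nprocs);

    /* kz writes a stderr line on the SAME handle before it finishes */
    kvs_put (h, "lwj.7.2.stderr", "\"pmi_kvs_commit\"");
    /*  -> put_request_cb(): commit already in progress, EINVAL */
    kvs_commit (h);
    /*  -> commit_request_cb(): old commit state=4 (the fence)  */

    kvs_fence_finish (h);   /* would complete the PMI barrier */
}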

grondo commented Jul 13, 2016

Uh oh, this will give me nightmares tonight!

Another reason to push output somewhere other than the kvs? Or is this a more general problem with asynchronous kvs commit?

garlick commented Jul 13, 2016

This is my fault and should not be that hard to fix. Don't worry!

garlick added a commit to garlick/flux-core that referenced this issue Jul 13, 2016

modules/kvs: send 'put' data with 'commit' request
Consolidate the "put" and "commit" request messages in the client
code, so that the commit request includes an ordered list of
(key, dirent) pairs representing put, unlink, symlink, etc. calls,
which now return immediately, unless a large value is stored.

If kvs_put() receives a value whose JSON encoding is longer than
the size of its blobref, the value is synchronously written to the
content store, and its blobref is referenced in the dirent.

Note that the format of the commit message did not need to change,
since "internal" commits already carried the array of (key, dirent)
operations - they were just empty when the handle code sent a commit.

In addition to (probably) improving performance, since many KVS put
operations will now return immediately, this avoids a race in which
the new asynchronous fence calls are used and a put is sent before
an in-progress fence (on the same handle) has completed.

Fixes flux-framework#729.
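
A rough sketch of the client-side rule this describes; the helper names and the length test are illustrative assumptions, not flux-core internals:

/* Sketch of the described kvs_put() behavior (helpers hypothetical). */
int sketch_kvs_put (flux_t h, const char *key, const char *json_str)
{
    if (strlen (json_str) > blobref_strlen ()) {         /* assumed test */
        /* large value: written synchronously to the content store,
         * then referenced by its blobref in the dirent */
        char *ref = store_content_sync (h, json_str);    /* hypothetical */
        queue_op (h, key, dirent_create_ref (ref));      /* hypothetical */
    } else {
        queue_op (h, key, dirent_create_val (json_str)); /* hypothetical */
    }
    return 0;   /* returns immediately; queued ops are sent with the
                   next commit (or fence) request */
}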

garlick added a commit to garlick/flux-core that referenced this issue Jul 13, 2016

test/kvs: add test for async fence
Exercise kvs_fence_begin() / kvs_fence_finish().
Verify that the following sequence works (issue flux-framework#729):
   put a
   fence_begin
   put b
   fence_finish
   get a
   (get b fails)
   fence
   get b
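
In C, that sequence might look like the sketch below; the prototypes for the async fence calls and kvs_get() are assumptions based on the names in this issue:

/* Sketch of the test sequence above (prototypes assumed,
 * test harness omitted). */
static void check_async_fence (flux_t h, int nprocs)
{
    char *val = NULL;

    kvs_put (h, "a", "1");                  /* queued before the fence  */
    kvs_fence_begin (h, "f.1", nprocs);     /* fence takes queued ops   */
    kvs_put (h, "b", "2");                  /* queued after fence_begin */
    kvs_fence_finish (h);                   /* fence completes          */

    assert (kvs_get (h, "a", &val) == 0);   /* 'a' rode the fence       */
    assert (kvs_get (h, "b", &val) < 0);    /* 'b' not yet committed    */

    kvs_fence (h, "f.2", nprocs);           /* flushes remaining ops    */
    assert (kvs_get (h, "b", &val) == 0);   /* 'b' is now visible       */
}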

garlick self-assigned this Jul 13, 2016

garlick added the review label Jul 13, 2016

grondo closed this in #732 Jul 13, 2016

grondo removed the review label Jul 13, 2016

garlick commented Jul 13, 2016

Oh bummer, I just rebased PR #713 on master after this was merged and I can still reproduce this problem. I may have assumed too much about what was going on, e.g. maybe there really are multiple fences stacking up. Sigh.

garlick reopened this Jul 13, 2016

garlick commented Jul 13, 2016

Oh duh, the PMI trace causes it because it emits a line that's committed by kz when the PMI barrier is called, which is right where the fence is entered. I fixed only the first half of the problem.

garlick commented Jul 14, 2016

There are actually a couple of tricky problems here:

  • Only one commit/fence can be in flight at a time (per handle).
  • Operations (put, unlink, mkdir, symlink) are not associated with a particular commit/fence.

I'm trying to find a way to fix this without redesigning the API or forcing the use of multiple handles.
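
A sketch of how the two limitations interact on one handle; the keys and fence name are hypothetical:

/* Sketch: ops have no owner, so whichever commit/fence goes out next
 * sweeps them all up, and a second one collides (names hypothetical). */
kvs_put (h, "stdio.0", "\"line\"");    /* anonymous op, no owner       */
kvs_fence_begin (h, "pmi.0", nprocs);  /* sweeps up the stdio op too;  */
                                       /* a fence is now in flight     */
kvs_put (h, "stdio.1", "\"line\"");
kvs_commit (h);                        /* second commit/fence on this  */
                                       /* handle: EINVAL / state=4     */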

garlick added a commit to garlick/flux-core that referenced this issue Jul 15, 2016

modules/kvs: add kvs_fence_set_context()
If kvs_fence_set_context() has been called with the name of an
upcoming fence operation, all subsequent operations (put, unlink,
mkdir, symlink) are queued for that fence, until kvs_fence_clear_context()
is called.  Operations queued while no fence context is set are queued
anonymously, as before.

A kvs_commit() call flushes out the handle's pending anonymous operations.

A kvs_fence() flushes out only the pending operations that match
its name.  If no operations match its name, then it flushes anonymous
operations, as before.

Consistency note:  operations associated with a fence may complete
out of order with respect to other operations on the handle.

API note:  it probably would be better in the future to implement a
commit handle abstraction, where a commit is opened, operations are
applied to a particular commit, then the commit handle can be passed
to either commit or fence.  For now, though, we just need to support
asynchronous fences co-mingled with synchronous commits and want to
perturb the API as little as possible.

Partially fixes flux-framework#729
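
A hedged usage sketch of the new calls; exact prototypes are assumed from the names above:

/* Sketch: scoping operations to a named fence (prototypes assumed). */
kvs_fence_set_context (h, "pmi.1");     /* tag subsequent ops          */
kvs_put (h, "pmi.kvs.key", "\"v\"");    /* queued for fence "pmi.1"    */
kvs_fence_clear_context (h);

kvs_put (h, "stdio.2", "\"line\"");     /* queued anonymously          */
kvs_commit (h);                         /* flushes only anonymous ops  */

kvs_fence (h, "pmi.1", nprocs);         /* flushes the ops queued for  */
                                        /* "pmi.1", as described above */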

garlick added a commit to garlick/flux-core that referenced this issue Jul 15, 2016

modules/wreck: wrexecd PMI sets KVS fence context
When wrexecd is running a PMI server, it uses asynchronous
kvs_fence().  At the same time, it may be handling standard
I/O output and writing it to the KVS.

Have the PMI server call kvs_fence_set_context () so that the
fence only applies to PMI_KVS_Put() data, not the stdio or other data.

This solves a potential problem.  If the following sequence occurs:
-  stdio data is kvs_put()
-  PMI kvs_fence_begin()
-  stdio kvs_commit()

the kvs_commit() is expected to return only after stdio data
is committed to the kvs namespace, but since the fence ran
in between, the stdio data is not applied until the PMI fence completes.
Since libkz applies a sequence number to each KVS key it "appends",
this is likely resolved quickly without I/O sequencing issues; however,
it is egregiously incorrect KVS behavior.  (Note that this behavior
was introduced with async fence in ae2eb13 three weeks ago.)

Partially fixes flux-framework#729
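
Roughly, the wrexecd PMI server path becomes something like this; the function structure and the fence_name variable are assumptions, only the kvs_* calls come from the commit message:

/* Sketch of the fix (structure hypothetical). */
static const char *fence_name = "lwj.7.0";    /* hypothetical */

static int pmi_server_kvs_put (flux_t h, const char *key, const char *val)
{
    kvs_fence_set_context (h, fence_name);  /* tag PMI data for the fence */
    int rc = kvs_put (h, key, val);
    kvs_fence_clear_context (h);
    return rc;
}

static void pmi_server_barrier (flux_t h, int nprocs)
{
    kvs_fence_begin (h, fence_name, nprocs); /* takes only tagged ops */
    /* stdio kvs_put()/kvs_commit() on h proceed independently, so
     * commit returns when the output data is committed, regardless
     * of the in-flight PMI fence */
}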

garlick added a commit to garlick/flux-core that referenced this issue Jul 15, 2016

modules/kvs: reimplement fence
Reimplement fence, this time decoupled from commit message handling.
KVS fence works as follows:

A set of clients call kvs_fence() with identical name, nprocs arguments.
Each sends a kvs.fence request containing a stack of operations to the
local kvs module.

The local kvs module stores a copy of the kvs.fence request message,
then sends a kvs.relayfence request to rank 0, containing a copy of
the kvs.fence payload.  No response to the kvs.relayfence request
is expected - the fence will be completed with a multicast event.

The rank 0 kvs, receiving a combination of kvs.fence and kvs.relayfence
request messages, concatenates the operations from all requests, and
builds a count of messages received.  Once the count reaches nprocs,
the operations are applied as one commit.  An extra "fence" field
containing the fence name is added to the kvs.setroot event that updates
the root hash across the instance.

Upon receipt of a kvs.setroot event with embedded fence name, all kvs
ranks look up the fence name, send responses to the fence requestors,
and destroy the fence state.  Since the local kvs modules update their
root hashes before responding to the fence requestors, the property of
read-your-writes consistency[Vogel] is retained.

Since fence state is no longer hashed by kvs.fence sender uuid, a client
may have any number of fences in progress, which is now possible with
the addition of the asynchronous kvs_fence_begin() and kvs_fence_finish()
calls.  In addition, fence state is kept separate from commit state,
so while commit still has the old one-commit-per-client limitation,
fences may be going on at the same time as regular commits without
a problem, so long as the user keeps the "fence context" straight
with kvs_fence_set_context(), kvs_fence_clear_context().

Fixes flux-framework#729
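
A conceptual sketch of the rank-0 side of this protocol; the struct and helpers are illustrative, not the flux-core implementation:

/* Conceptual sketch of rank-0 fence aggregation (names illustrative). */
struct fence {
    char *name;         /* fence name, also the hash key            */
    int nprocs;         /* expected number of participants          */
    int count;          /* kvs.fence + kvs.relayfence requests seen */
    json_object *ops;   /* concatenated operations, in order        */
};

static void fence_add_request (struct fence *f, json_object *req_ops)
{
    append_ops (f->ops, req_ops);          /* hypothetical helper */
    if (++f->count == f->nprocs) {
        apply_as_one_commit (f->ops);      /* hypothetical helper */
        /* kvs.setroot event carries the fence name; on receipt,
         * each rank updates its root hash, responds to its local
         * fence requestors, and destroys the fence state */
        setroot_event_send (f->name);      /* hypothetical helper */
    }
}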

garlick added the review label Jul 15, 2016

garlick added a commit to garlick/flux-core that referenced this issue Jul 18, 2016

modules/kvs: reimplement fence

grondo closed this in #733 Jul 20, 2016

grondo removed the review label Jul 20, 2016
