Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: add namespace parameter to most kvs api functions #1977

Merged
merged 12 commits into from Feb 2, 2019

Conversation

@chu11
Copy link
Contributor

commented Feb 1, 2019

Add a namespace parameter to most kvs functions, such as flux_kvs_lookup(), flux_kvs_fence(), flux_kvs_commit(), flux_kvs_get_version(), flux_kvs_wait_version(), and a variety of flux_kvs_watch functions. Long ago, this was avoided to not break ABI. But recent changes, including the use of namespaces in flux_kvs_copy/move() showed the necessity of finally biting the bullet on this.

As a side effect of this, remove flux_kvs_set_namespace(), which is no longer needed as namespaces can be passed directly to all these functions. Now callers pass in a namespace to a function, or set it via an environment variable, or they get the default of primary. There is no namespace attached to a flux handle anymore.

As a side effect of removing flux_kvs_set_namespace(), the --namespace option in flux-kvs no longer works. We instead make the --namespace option in flux-kvs an option for all subcommands that can use it. In my opinion, this is much better. The --namespace option only applies to those subcommands that can use it (i.e. specify --namespace for get/put, but there is no option for dropcache). In addition, I think the flow of the command makes a lot more sense. Instead of:

flux kvs --namespace=FOO get a.key

it's now

flux kvs get --namespace=FOO get a.key

The namespace option should be specified to the "right" of the subcommand.

Also fixed a few dinky things along the way.

I will be sending a PR to update flux-sched as well on this.

@chu11 chu11 force-pushed the chu11:issue1435 branch from 53f590e to 4028ae3 Feb 1, 2019
@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

repushed, python formatting error

@chu11 chu11 force-pushed the chu11:issue1435 branch from 4028ae3 to 4406535 Feb 1, 2019
@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

hmmmm error in t2000-wreck hit.

2019-02-01T01:51:33.581578Z connector-local.err[0]: send kvs.watch response to client CF36A: Broken pipe
2019-02-01T01:51:37.390247Z job.err[0]: job16: wrexecd says: wrexecd: ev.c:3875: void ev_io_start(struct ev_loop *, ev_io *): Assertion `("libev: ev_io_start called with negative fd", fd >= 0)' failed.
2019-02-01T01:56:29.033020Z broker.err[0]: runlevel 2 timeout, sending SIGTERM
2019-02-01T01:56:29.033409Z broker.err[0]: Run level 2 Unknown signal 143 (rc=143) 300.0s
flux-broker: src/zframe.c:154: zframe_send: Assertion `dest' failed.
flux-broker: src/zframe.c:154: zframe_send: Assertion `dest' failed.
flux-broker: src/zframe.c:154: zframe_send: Assertion `dest' failed.
flux-start: 0 (pid 16508) exited with rc=143
flux-start: 3 (pid 16514) Aborted
flux-start: 1 (pid 16509) Aborted
flux-start: 2 (pid 16510) Aborted

concerning b/c I have course modified code in area of kvs.watch. Possible a bug in connector.local? Mentioned in #1641, #1386 (which points to #542).

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

concerning b/c I have course modified code in area of kvs.watch

hmmm. While I have modified code in libkvs/kvs_watch, the changes are pretty simple / stupid overall. It's hard to see how I could have introduced a mem-corruption or something similarly terrible.

@garlick

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

I wonder if wrexecd is calling flux_log() to log an error after it has already closed the flux_t handle? If so, then it could be related to this PR if some KVS operation is now failing and sending wrexecd down a heretofore untraveled error path...

Not saying that's what's happening, but maybe worth giving the KVS functions used by wrexecd one more visual check (like the flux_kvsdir_*() functions for example)

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

I wonder if wrexecd is calling flux_log() to log an error after it has already closed the flux_t handle? If so, then it could be related to this PR if some KVS operation is now failing and sending wrexecd down a heretofore untraveled error path...

If the flux_t handle were closed, hopefully wrexecd isn't doing any kvs calls? :-)

The removal of flux_kvs_set_namespace() is a behavior change, but I don't believe any code was using it except for the kvs command. wrexecd is only operating on the primary kvs?

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

The removal of flux_kvs_set_namespace() is a behavior change, but I don't believe any code was using it except for the kvs command. wrexecd is only operating on the primary kvs?

Actually, I forgot to point this out originally. The removal of flux_kvs_set_namespace() means that a large number of kvs operations (most notable all kvs classic functions) can only work on the primary namespace or the namespace specified via the environment variable. If some code were to mix kvs-new and kvs-classic API calls, there could be issues. For example:

f = flux_kvs_lookup (h, A_NAMESPACE, key);
...
flux_kvs_get (h, key, &json_str);

the kvs-classic get cannot be passed A_NAMESPACE. So the only way to deal with this is via the environment variable. But again, the only use of flux_kvs_set_namespace() was the flux-kvs command, so I don't believe this to be an issue.

@garlick

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

If the flux_t handle were closed, hopefully wrexecd isn't doing any kvs calls? :-)

I was suggesting a kvs call fails in a new way, then handle is closed, then a use after free, such as a flux_log() call. That might produce the above assertion from libev. But there isn't much to go on in the log, so hypothetical!

@garlick

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

Skimming through, I see what you mean - not a lot to worry about!

One minor thing flux_kvs_get_namespace() probably shouldn't be a public function and shouldn't take a flux_t argument since it just looks up the environment variable and substitutes primary if unset.

@garlick

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

Is the fact that (in the kvs module) unwatch_cmp() doesn't compare namespaces a problem?

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

One minor thing flux_kvs_get_namespace() probably shouldn't be a public function and shouldn't take a flux_t argument since it just looks up the environment variable and substitutes primary if unset.

Now that you mention it, I agree, it's not a useful "convenience" function to make public anymore. I'll yoink it.

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

Is the fact that (in the kvs module) unwatch_cmp() doesn't compare namespaces a problem?

Hmmm.

    if (wait_destroy_msg (root->watchlist, unwatch_cmp, &p) < 0) {
        errnum = errno;
        flux_log_error (h, "%s: wait_destroy_msg", __FUNCTION__);
        goto done;
    }
    if (cache_wait_destroy_msg (ctx->cache, unwatch_cmp, &p) < 0) {
	errnum = errno;
        flux_log_error (h, "%s: cache_wait_destroy_msg", __FUNCTION__);
        goto done;
    }

The call to wait_destroy_msg() is fine, since that's a per-namespace watchlist. The cache_wait_destroy_msg() though is not per namespace. So I think it could be a problem.

BUT the cache is shared between lookups, commits, watches, etc. and not every operation requires a namespace to be specified, most notably lookupat() (in just finalized #1971).

This is probably for another issue. Or do we just punt until we remove the old watch interface?

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

Actually, the call to cache_wait_destroy_msg() may not be safe at all. B/c there could be a wait sitting in the cache that was just written, and that is completely independent of of the watch. The above code may have assumed that all waiters were associated with a watch.

@garlick

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

So based on your comment in #1980 do you think we should go ahead and merge this? (If so maybe rebase after #1982 goes in?)

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

yeah, I think the unwatch issue is a separate issue. I'll re-push this after #1982 goes in.

chu11 added 11 commits Jan 30, 2019
Values being required to be JSON formatted was removed during
tree object conversion in the KVS.
Add a namespace parameter to flux_kvs_commit() and flux_kvs_fence().
If the ns argument is NULL, follow prior rules on selecting a default
namespace.

As a side effect remove flux_kvs_commit_ns().

Update all docs, callers, and tests appropriately.
Add namespace parameter to flux_kvs_get_version() and
flux_kvs_wait_version().

Update callers appropriately.
Add a namespace parameter to flux_kvs_watch family of functions.

Update callers appropriately.
Add a namespace parameter to flux_kvs_lookup().
If the ns argument is NULL, follow prior rules on selecting a default
namespace.

As a side effect remove flux_kvs_lookup_ns().

Update all docs, callers, and tests appropriately.
With flux kvs functions now taking namespaces, remove use of
flux_kvs_set_namespace() in several tests.
Support a -N ns option for any kvs commands that support namespaces.

This is in preperation for the removal of the global -N option.
Use the --namespace option assigned to individual flux-kvs sub-commands
and not the global --namespace option.
Remove calls to it and remove documentation as well.
Remove global --namespace option and document --namespace
option in the sub-commands that it is supported in.
As namespaces can only be set via environment variable now, the
need for a public function like flux_kvs_get_namespace() is no longer
necessary.  Create a private convenience function kvs_get_namespace()
for internal use.
@chu11 chu11 force-pushed the chu11:issue1435 branch from 4406535 to 7528813 Feb 1, 2019
@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

re-pushed, removing flux_kvs_get_namespace(). Also realized I had forgotten to remove kvs_lookup_private.h in the prior pull request.

@garlick

This comment has been minimized.

Copy link
Member

commented Feb 2, 2019

Thanks!

@garlick garlick merged commit 5ed7f26 into flux-framework:master Feb 2, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.