Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stop using deprecated libczmq classes #1013

Merged
merged 26 commits into from Mar 29, 2017

Conversation

Projects
None yet
4 participants
@garlick
Copy link
Member

garlick commented Mar 22, 2017

This PR moves flux-core off of the deprecated zsocket, zctx, and zauth_v2 libczmq classes, converting to zsock, zsys, and zauth respectively. As discussed in #974, these classes have been removed as of czmq-4.0.2 so we ought to move forward ASAP.

zsys installs an atexit() handler that tries to close any open zsock sockets, and complains about them on stderr while doing it. Since zmq sockets are not MT-safe I had to change the exit() call in the SIGLARM timeout handler to _exit() and manually invoke the (file) cleanup function, otherwise there were segfaults. To avoid chattiness on stderr when we manage to avoid the timeout, I added rc3 files for the sharness "personality" rc1 files, and some flux module remove calls for modules that were loaded in sharness tests.

Finally, I tidied up the flux_sec_t class a bit and added a TAP test for it. In the course of writing that test, I needed an unlink_recursive() function like we had in the cleanup class, so I pulled that out to a new module, exported it, and added a TAP test for that (fixing a bug in the process).

I had some problems earlier with segfaults (only) in travis so may need to run those down, but thought I'd go ahead and get this PR out there for review.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 22, 2017

Coverage Status

Coverage decreased (-0.01%) to 77.893% when pulling 19cc62d on garlick:czmq4 into a549739 on flux-framework:master.

@garlick garlick force-pushed the garlick:czmq4 branch from 19cc62d to ca659de Mar 24, 2017

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Mar 24, 2017

Codecov Report

Merging #1013 into master will increase coverage by 0.15%.
The diff coverage is 71.28%.

@@            Coverage Diff             @@
##           master    #1013      +/-   ##
==========================================
+ Coverage   77.72%   77.87%   +0.15%     
==========================================
  Files         152      150       -2     
  Lines       25752    25487     -265     
==========================================
- Hits        20016    19849     -167     
+ Misses       5736     5638      -98
Impacted Files Coverage Δ
src/cmd/builtin/hwloc.c 80.59% <ø> (ø) ⬆️
src/connectors/shmem/shmem.c 86.31% <100%> (+11.31%) ⬆️
src/broker/broker.c 72.68% <52.63%> (-0.3%) ⬇️
src/cmd/flux-keygen.c 57.69% <66.66%> (-3.03%) ⬇️
src/common/libflux/security.c 70.06% <68.75%> (+12.83%) ⬆️
src/broker/overlay.c 70.83% <77.27%> (-0.24%) ⬇️
src/modules/barrier/barrier.c 75% <80%> (-2.64%) ⬇️
src/common/libutil/ev_zmq.c 91.3% <81.81%> (-3.44%) ⬇️
src/broker/module.c 82.15% <83.33%> (-0.07%) ⬇️
src/common/libflux/message.c 81.02% <0%> (-2.63%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9685852...56fa866. Read the comment docs.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 24, 2017

Coverage Status

Coverage decreased (-0.003%) to 77.896% when pulling ca659de on garlick:czmq4 into f78422e on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 24, 2017

Coverage Status

Coverage decreased (-0.01%) to 77.888% when pulling b98d86d on garlick:czmq4 into f78422e on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Mar 24, 2017

It's strange but a bunch of sharness tests are failing because the new "rc3" scripts cannot be executed.

I verified that they were committed with +x, checking with:

$ cd t/rc
$ git ls-tree HEAD
100755 blob 52d75f21b7aa3ae77bd947a91c3fd4e9da2dfc02	rc1-kvs
100755 blob d9949b13f87a477e62580f3304a86eaa090a8089	rc1-testenv
100755 blob ecd2ea7b2e25cd6734f68179ef7270d676230a66	rc1-wreck
100755 blob 49e323741296ec1b138a2990229d7a4886849806	rc3-kvs
100755 blob 7f615156ea784b3a472686e5c4ff8c7b91473fdf	rc3-testenv
100755 blob 65a631cd1d70b218a697c954c54d83778cb02856	rc3-wreck

if anybody else has any suggestions I'd appreciate it...

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 24, 2017

Coverage Status

Coverage increased (+0.08%) to 77.976% when pulling b98d86d on garlick:czmq4 into f78422e on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Mar 24, 2017

The new rc3 scripts are not in EXTRA_DIST.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Mar 24, 2017

Oh duh. Thank you!

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 24, 2017

Coverage Status

Coverage increased (+0.07%) to 77.965% when pulling 3f0ec5c on garlick:czmq4 into f78422e on flux-framework:master.

@garlick garlick force-pushed the garlick:czmq4 branch from 3f0ec5c to cf0e083 Mar 24, 2017

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Mar 24, 2017

That fixed it (thanks again @grondo). I just squashed down the incremental development and think this is ready for review.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Mar 24, 2017

I don't know how many times I've hit that problem with missing EXTRA_DIST.

Looks like a lot of work here!

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 24, 2017

Coverage Status

Coverage increased (+0.08%) to 77.976% when pulling cf0e083 on garlick:czmq4 into f78422e on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Mar 25, 2017

This PR includes some changes to the cleanup() code and to the man page test driver that are unrelated to the main topic. I think I'll go ahead and pull those out as separate PR's and rebase this one once those are merged.

garlick added some commits Mar 14, 2017

libflux/security: drop large comment
The big explanatory comment at the top of security.c is about
to become very out of date.  Since ZeroMQ security is discussed
in RFC 12, drop the comment rather than attempt to modernize it.
libutil/ev_zmq: accept zsock_t sockets
Problem: ev_zmq watcher assumes its socket argument is
the raw void * socket type used by libzmq and the
deprecated libczmq "zsocket" class.  It would be easier
for us to convert stepwise to zsocket's replacement
"zsock" if czmq socket polymorphism was implemented in
ev_zmq.

Call zsock_resolve() on zmq socket argument to convert
from any of the socket types to the raw libzmq socket.
It thus works as before but also accepts a zsock_t socket.
connectors/shmem: convert to zsock
Problem: the shmem connector uses zsocket, which
is deprecated in libczmq 4.0.2.

Convert the shmem connector and the broker end to use
zsock_t.

The FLUX_OPT_ZEROMQ_CONTEXT handle option is
dead code now that shmem connector uses zsock class.
Remove FLUX_OPT_ZEROMQ_CONTEXT (and opt_set/opt_get methods
since that was the only option supported) from connector.
Remove broker-side support for obtaining it, then drop
the unused handle option from handle.h.
broker: force czmq to log to stderr
Problem: if any zsock_t sockets are left open, czmq's
atexit() handler lists them on stdout, which causes problems
for some of our tests.

Call zsys_set_logstream (stderr).

Note that a NULL stream doesn't work here.  One has to pass a
FILE pointer open to /dev/null to suppress this output.
We'll just leave it on stderr and address the dangling sockets.
broker/snoop: drop snoop facility
Problem: the "snoop" facility uses the deprecated czmq
zsocket class.

Other difficulties: the snoop faciliity no longer captures
all the messages passing through the broker as its hooks
have not been kept up to date.  It uses a seperate socket
when it probably should be built as a regular flux service.
The socket is created on demand, but listing the snoop-uri
attribute (including lsattr -v) causes it to be created,
which is an open issue.

Drop the snoop facility for now, including broker class,
broker hooks, command, tests, and man page.

Fixes #455
test/tmunge: convert to zsock_t
Problem: munge-encoded message test uses deprecated czmq
zsocket class.

Convert to zsock_t.
test/tasyncsock: drop unused test
Problem: zeromq asynchronous socket test case uses
deprecated czmq zsocket class.

This test is more of an experiment probing how zeromq works
than a Flux test and is not driven by make check.  Drop it.
test/message: convert to zsock_t
Problem: standalone message test uses deprecated czmq
zsocket class.

Convert to zsock_t.
test/reactor: convert to zsock_t
Problem: standalone reactor test uses deprecated czmq
zsocket class.

Convert to zsock_t.
libflux/security: drop internal mutex
Problem: mutex is not needed and adds complexity.

There is really nothing that requires a mutex to
make it thread safe here.  Drop the unnecessary mutex.
broker: drop ZMQ_IMMEDIATE backwards compat
Problem: broker defines ZMQ_IMMEDIATE and zmq_set_immediate()
macros for backwards compatibility with older libzmq/libczmq,
then doesn't use them.

Drop unused macros.
libflux/security: change constructor prototype
Problem: There is no use case for changing security
type flags or security config directory at runtime
yet the API implements setters for them.

Drop the following functions:
  flux_sec_enable ()
  flux_sec_disable ()
  flux_sec_set_directory ()

Add typemask and config directory params to constructor:
  flux_sec_create ()

Update users:
  flux-keygen
  broker
  tmunge (test)
libflux/security: avoid deprecated czmq classes
Problem: security API uses the deprecated czmq
zsocket_t, zctx_t, and zauth_v2 classes.

Convert to zsock, zsys, and zauth (actor).

The separate functions for initializing security for
munge and zauth are no longer necessary since zauth
requires no arguments (ZAP domain is internally fixed
to "flux", libzmq context is now obtained through zsys).
The combined function is now called flux_sec_comms_init().

Update users: broker and tmunge test.

garlick added some commits Mar 21, 2017

libflux/security: return EEXIST on PLAIN keygen
Problem: errno is not set if flux_sec_keygen() is called
on PLAIN keys with force flag clear.

Set errono to EEXIST on error.
libflux/security: rename FLUX_SEC_FAKEMUNGE
Rename FLUX_SEC_TYPE_FAKEMUNGE to FLUX_SEC_FAKEMUNGE
since it isn't a type, it's a flag.
libflux/security: use flags for keygen params
Drop the boolean parameters from flux_sec_keygen()
and add FLUX_SEC_VERBOSE and FLUX_SEC_KEYGEN_OVERWRITE
flags.

Specifying FLUX_SEC_VERBOSE turns on zauth verbosity
as well.
libflux/security: restrict CURVE clients
Problem: any zeromq client in possession of the server
public key may connect to overlay network due to
CURVE_ALLOW_ANY configuration.

Register $confdir/curve as the "certstore" (directory
containing client public keys authorized to connect
to server).

Note: all keys (public and private) were stored in a private
directory by flux-keygen with time of use failsafe checks.
Due to the 0.x.x release status of flux-core, a detailed
analysis of whether other opportunties exist for an attacker
to obtain the server public key was not performed.
broker/overlay: convert to zsock_t
Problem: overlay code uses deprecated czmq zsocket class.

Convert to zsock_t.

Note: now that the broker configured zsys default HWM values,
we no longer need to set them at socket creation.

@garlick garlick force-pushed the garlick:czmq4 branch from cf0e083 to 9fd9939 Mar 28, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 28, 2017

Coverage Status

Coverage increased (+0.2%) to 78.19% when pulling 9fd9939 on garlick:czmq4 into 9685852 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Mar 28, 2017

I am seeing reproducible failure in security unit test on this branch, running on hype after make clean && make && make check

Still looking into it, perhaps something wrong with my environment? Seems like the problem is in some of the zpoller_wait calls. If you give me a hint of what to look for I can try debugging this so you don't have to log in to hype.

ok 1 - flux_sec_destroy accepts a NULL argument
ok 2 - flux_sec_create with no selected method works
ok 3 - flux_sec_errstr returns 'Success'
ok 4 - flux_sec_get_directory returns configured confdir
ok 5 - flux_sec_type_enabled FLUX_SEC_TYPE_PLAIN false
ok 6 - flux_sec_type_enabled FLUX_SEC_TYPE_CURVE false
ok 7 - flux_sec_type_enabled FLUX_SEC_TYPE_CURVE false
ok 8 - flux_sec_create with NULL confdir works
ok 9 - flux_sec_get_directory returns configured NULL
ok 10 - flux_sec_create PLAIN|CURVE returns EINVAL
ok 11 - flux_sec_create PLAIN works
ok 12 - flux_sec_type_enabled FLUX_SEC_TYPE_PLAIN true
ok 13 - flux_sec_type_enabled FLUX_SEC_TYPE_CURVE false
ok 14 - flux_sec_type_enabled FLUX_SEC_TYPE_CURVE false
ok 15 - flux_sec_create PLAIN|MUNGE works
ok 16 - flux_sec_type_enabled FLUX_SEC_TYPE_PLAIN true
ok 17 - flux_sec_type_enabled FLUX_SEC_TYPE_CURVE false
ok 18 - flux_sec_type_enabled FLUX_SEC_TYPE_CURVE false
ok 19 - flux_sec_create CURVE|MUNGE works
ok 20 - flux_sec_type_enabled FLUX_SEC_TYPE_PLAIN true
ok 21 - flux_sec_type_enabled FLUX_SEC_TYPE_CURVE false
ok 22 - flux_sec_type_enabled FLUX_SEC_TYPE_CURVE false
ok 23 - flux_sec_keygen fails with EINVAL if confdir not set
ok 24 - flux_sec_keygen fails with EACCES if confdir does not exist
ok 25 - flux_sec_keygen (force) fails with EACCES if confdir does not exist
ok 26 - flux_sec_keygen with no security modes works
ok 27 - confdir is a directory with mode 0700
ok 28 - unlinked 1 file/dir
ok 29 - flux_sec_keygen with bad mode confdir fails with EPERM
ok 30 - unlinked 1 file/dir
ok 31 - flux_sec_keygen PLAIN works
ok 32 - unlinked 2 file/dir
ok 33 - flux_sec_keygen CURVE works
ok 34 - unlinked 6 file/dir
ok 35 - flux_sec_keygen CURVE-overwrite fails with EEXIST
ok 36 - unlinked 6 file/dir
ok 37 - flux_sec_keygen (force) CURVE-overwrite works
ok 38 - unlinked 6 file/dir
ok 39 - flux_sec_keygen PLAIN-overwrite fails with EEXIST
ok 40 - unlinked 2 file/dir
ok 41 - flux_sec_keygen (force) PLAIN-overwrite works
ok 42 - unlinked 2 file/dir
ok 43 - flux_sec_create MUNGE-real works
ok 44 - flux_sec_comms_init MUNGE-real works
ok 45 - flux_sec_create MUNGE-fake works
ok 46 - flux_sec_comms_init MUNGE-fake works
ok 47 - flux_sec_csockinit MUNGE-fake works (no-op)
ok 48 - flux_sec_ssockinit MUNGE-fake works (no-op)
ok 49 - flux_sec_munge (fake) works
ok 50 - flux_sec_unmunge (fake) works
ok 51 - unmunge(munge(x))==x
Saving /var/tmp/grondo/sectest.GbvPcl/passwd
I: 17-03-28 16:12:38 zauth: API command=PLAIN
ok 52 - flux_sec_comms_init PLAIN works
ok 53 - flux_sec_ssockinit works
ok 54 - server bound to localhost on port 49152
ok 55 - flux_sec_csockinit works
ok 56 - client connected to server
ok 57 - client sent Hi
I: 17-03-28 16:12:38 zauth: ZAP request mechanism=PLAIN ipaddress=127.0.0.1
I: 17-03-28 16:12:38 zauth: - allowed (PLAIN) username=client password=77AFD409528A3835315F260EF3B2F80D
I: 17-03-28 16:12:38 zauth: - ZAP reply status_code=200 status_text=OK
ok 58 - server ready within 100ms timeout
ok 59 - server received Hi
ok 60 - rogue connected to server with no security
ok 61 - rogue sent Blimey!
ok 62 - server not ready within 100ms timeout
ok 63 - rogue connected to server using wrong password
ok 64 - rogue sent Skallywag!
I: 17-03-28 16:12:38 zauth: ZAP request mechanism=PLAIN ipaddress=127.0.0.1
I: 17-03-28 16:12:38 zauth: - denied (PLAIN) username=client password=not-the-correct-password
I: 17-03-28 16:12:38 zauth: - ZAP reply status_code=400 status_text=No access
ok 65 - server not ready within 100ms timeout
I: 17-03-28 16:12:38 zauth: API command=$TERM
Saving /var/tmp/grondo/sectest.IOXEE5/curve/client
Saving /var/tmp/grondo/sectest.IOXEE5/curve/client_private
Saving /var/tmp/grondo/sectest.IOXEE5/curve/server
Saving /var/tmp/grondo/sectest.IOXEE5/curve/server_private
I: 17-03-28 16:12:38 zauth: API command=CURVE
ok 66 - flux_sec_comms_init CURVE works
ok 67 - flux_sec_ssockinit works
ok 68 - server bound to localhost on port 49152
ok 69 - flux_sec_csockinit works
ok 70 - client connected to server
ok 71 - client sent Greetings!
not ok 72 - server ready within 100ms timeout
#   Failed test 'server ready within 100ms timeout'
#   at test/security.c line 414.
not ok 73 - server received Greetings!
#   Failed test 'server received Greetings!'
#   at test/security.c line 418.
ok 74 - rogue connected to server with no security
ok 75 - rogue sent Avast
I: 17-03-28 16:12:38 zauth: ZAP request mechanism=CURVE ipaddress=127.0.0.1
I: 17-03-28 16:12:38 zauth: - allowed (CURVE) client_key=8.v#+IEqsrf9A>*ciKVOYngk]Be[<FJFT}$Ywsz#
I: 17-03-28 16:12:38 zauth: - ZAP reply status_code=200 status_text=OK
not ok 76 - server not ready within 100ms timeout
#   Failed test 'server not ready within 100ms timeout'
#   at test/security.c line 430.
ok 77 - rogue connected to server using right server, wrong client key
ok 78 - rogue sent Haar!
not ok 79 - server not ready within 100ms timeout
#   Failed test 'server not ready within 100ms timeout'
#   at test/security.c line 457.
I: 17-03-28 16:12:38 zauth: API command=$TERM
1..79
# Looks like you failed 4 tests of 79 run.
FAIL: test_security.t

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Mar 28, 2017

I suspect what's happening is for some reason the 100ms timeout is too small, and then a cascading failure when messages that were supposed to be consumed in a previous test are there confusing the next test. I guess the timeouts really only come into play when a test is going to fail so it doesn't hurt to make them bigger like say 1000ms. Maybe try that? Meanwhile I'll have a look at how to avoid the cascading failures... Thanks!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Mar 28, 2017

Yes, If I increase the timeout on the zpoller_wait() after client sends Greetings, all tests pass.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Mar 28, 2017

I misspoke a bit though, some of the tests actually do wait out the timeout (when they expect not to get a response), so increasing all the timeouts does slow down the test. Let me ponder if there's a better way. Thanks for trying that out!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Mar 28, 2017

last commit fixes the issues I was seeing on hype, thanks!

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 29, 2017

Coverage Status

Coverage increased (+0.2%) to 78.198% when pulling d881b00 on garlick:czmq4 into 9685852 on flux-framework:master.

garlick added some commits Mar 21, 2017

libflux/security: fix zcert format args
Problem: non-const format string passed to zcert_add().
The newer czmq has appropriate compiler attributes to flag
this.

Change the format string to %s and move argument to first
variable argument.
modules/barrier: avoid deprecated zhash_foreach()
Problem: zhash_foreach() is deprecated in czmq4.

Convert to FOREACH_ZHASH from libutil/iterators.h.

@garlick garlick force-pushed the garlick:czmq4 branch from d881b00 to 56fa866 Mar 29, 2017

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Mar 29, 2017

OK, I squashed that one down. Thanks!

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 29, 2017

Coverage Status

Coverage increased (+0.2%) to 78.19% when pulling 56fa866 on garlick:czmq4 into 9685852 on flux-framework:master.

@grondo grondo merged commit ea0d930 into flux-framework:master Mar 29, 2017

3 of 4 checks passed

codecov/patch 71.28% of diff hit (target 77.72%)
Details
codecov/project 77.87% (+0.15%) compared to 9685852
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.2%) to 78.19%
Details

@grondo grondo referenced this pull request Mar 29, 2017

Closed

0.7.0 Release Notes #1019

@garlick garlick deleted the garlick:czmq4 branch Mar 29, 2017

@garlick garlick referenced this pull request Mar 15, 2018

Open

update czmq dependency #1355

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.