Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flux-shell: use RFC 24 eventlog output #2308

Merged
merged 6 commits into from Aug 15, 2019

Conversation

@garlick
Copy link
Member

commented Aug 14, 2019

This PR changes the format of the guest.output key in the KVS from a JSON array of "iodecode" objects to an RFC 24 eventlog.

This opens the door to more work such as

  • "compressing" identical output that occurs within a time window down to one event
  • UTF-8 encoding option
  • committing output at regular intervals or upon reaching some buffer threshold

However in this PR, only the output format is changed.

Problem: translating 'ioencode' objects to RFC 24
data events requires that an integer rank be converted
to string.

Simplify our lives and switch 'ioencode' to use a string
encoding for rank so that it is a valid RFC 24 data event
context object.

For now, retain the interfaces for ioencode/decode that
assume a rank is a single integer.  If we want to "compress"
I/O on a shell before transmitting it to the leader, e.g.
so that one object represents output from multiple tasks,
we can add a new API interface for that.
@grondo

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

Very nice!

On fluke, the t2601-job-shell-standalone.t tests fail for me on this branch. (Haven't investigated further, figure you're working on it already)

Also, in the output eventlog, the stream names don't match RFC 24 or the header, i.e. instead of "stdout" "stderr" we have STDOUT STDERR, presumably from libsubprocess.

I would actually argue, unless there is some reason I'm not thinking of, that we should change libsubprocess internal names for stdout stderr from upper to lowercase.

@garlick

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

Sorry about that - just pushed a fixup commit for the test failure.

Heading in now, will look at the stream names when I get to work. Good catch!

@codecov-io

This comment has been minimized.

Copy link

commented Aug 14, 2019

Codecov Report

Merging #2308 into master will decrease coverage by 0.05%.
The diff coverage is 79.01%.

@@            Coverage Diff             @@
##           master    #2308      +/-   ##
==========================================
- Coverage   80.91%   80.86%   -0.06%     
==========================================
  Files         214      214              
  Lines       33677    33732      +55     
==========================================
+ Hits        27251    27278      +27     
- Misses       6426     6454      +28
Impacted Files Coverage Δ
src/common/libioencode/ioencode.c 92.53% <100%> (+0.22%) ⬆️
src/shell/io.c 75.52% <75%> (-1.9%) ⬇️
src/cmd/flux-job.c 85.26% <78.57%> (-0.07%) ⬇️
src/common/libeventlog/eventlog.c 85.96% <79.41%> (-1.98%) ⬇️
src/common/libutil/aux.c 90.74% <0%> (-3.71%) ⬇️
src/common/libflux/mrpc.c 87.79% <0%> (-1.19%) ⬇️
src/modules/connector-local/local.c 73.26% <0%> (-1.16%) ⬇️
@garlick

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

Ok, pushed the following

  • small change to eventlog_encode() that caused it to run slowly for large eventlog chunks (thanks for catching the excessive calls to strlen() with perf @grondo!)
  • libsubprocess: eliminate stream name defaults when name=NULL
  • libsubprocess: rename default stream names to lower case to comply with RFC 24 naming
@chu11

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

I would actually argue, unless there is some reason I'm not thinking of, that we should change libsubprocess internal names for stdout stderr from upper to lowercase.

No issues with this. I think I did upper case b/c we tended to do upper case with channel names.

Copy link
Contributor

left a comment

Not done reviewing, but wanted to log these before lunch :-) (Edit: Now done)

@@ -730,14 +730,13 @@ int flux_subprocess_stream_start (flux_subprocess_t *p, const char *stream)
struct subprocess_channel *c;

This comment has been minimized.

Copy link
@chu11

chu11 Aug 14, 2019

Contributor

in comments, you can mark fixes #2288

@@ -612,10 +612,10 @@ void test_basic_stdin_default_stream (flux_reactor_t *r)
ok (flux_subprocess_state (p) == FLUX_SUBPROCESS_RUNNING,
"subprocess state == RUNNING after flux_local_exec");

ok (flux_subprocess_write (p, NULL, "hi", 2) == 2,
ok (flux_subprocess_write (p, "STDIN", "hi", 2) == 2,

This comment has been minimized.

Copy link
@chu11

chu11 Aug 14, 2019

Contributor

I'd go ahead and delete the test test_basic_stdin_default_stream(), it's now the same as test_basic_stdin().

@@ -521,7 +521,7 @@ void output_default_stream_cb (flux_subprocess_t *p, const char *stream)
ok (flux_subprocess_read_stream_closed (p, stream) > 0,
"flux_subprocess_read_stream_closed saw EOF on %s", "STDOUT");

ptr = flux_subprocess_read (p, NULL, -1, &lenp);
ptr = flux_subprocess_read (p, "STDOUT", -1, &lenp);

This comment has been minimized.

Copy link
@chu11

chu11 Aug 14, 2019

Contributor

I'd delete the test test_basic_stdout_default_stream() and the callback output_default_stream_cb(), since they don't do anything different than other tests now.

@garlick garlick force-pushed the garlick:stdio_eventlog branch from add8e68 to a788c19 Aug 14, 2019
@garlick

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

Pushed a fixup for the suggested test removals, and also forced a commit message change to reference #2288. LMK if I should go ahead and squash the fixups.

@grondo

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

I'm fine with a squash

garlick added 5 commits Aug 13, 2019
Problem: there is an eventlog_decode() and
eventlog_entry_encode(), but no eventlog_encode().
One is needed to append events in batches to the
stdio eventlog per RFC 24.

Add eventlog_encode().
Problem: stdio output KVS format uses an ad-hoc format.

Change flux-shell to store stdio data in RFC 24 eventlog
format:  one 'header' event followed by 'data' events.

Change flux-job attach to read data in that format.

Assumptions for first cut
- base64 encoding
- no options are recorded in the stream header
- each data event represents output from one task (no idset)

Fixes #2305
Problem: subprocess API accepts NULL for a stream name,
to indicate a "default stream".  This feature has no users,
and encourages code to be less readable.

Handle a NULL stream by failing with errno == EINVAL.

Update subprocess unit test.

Fixes #2288
Problem: upper case subprocess stream names STDIN, STDOUT,
and STDERR are not compatible with RFC 24 stream names.

Change the subprocess stream names to lower case.

Update users and tests.
@garlick garlick force-pushed the garlick:stdio_eventlog branch from a788c19 to 695d4fe Aug 14, 2019
@garlick

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

Squashed.

@codecov-io

This comment has been minimized.

Copy link

commented Aug 14, 2019

Codecov Report

Merging #2308 into master will decrease coverage by 0.06%.
The diff coverage is 85.59%.

@@            Coverage Diff             @@
##           master    #2308      +/-   ##
==========================================
- Coverage   80.91%   80.85%   -0.07%     
==========================================
  Files         214      214              
  Lines       33677    33732      +55     
==========================================
+ Hits        27251    27275      +24     
- Misses       6426     6457      +31
Impacted Files Coverage Δ
src/common/libsubprocess/remote.c 71.22% <ø> (ø) ⬆️
src/modules/cron/task.c 79.31% <100%> (ø) ⬆️
src/broker/broker.c 73.52% <100%> (ø) ⬆️
src/cmd/flux-exec.c 75.79% <100%> (ø) ⬆️
src/common/libsubprocess/local.c 79.79% <100%> (ø) ⬆️
src/modules/job-ingest/worker.c 72.72% <100%> (ø) ⬆️
src/common/libioencode/ioencode.c 92.53% <100%> (+0.22%) ⬆️
src/common/libsubprocess/subprocess.c 88.44% <100%> (-1.49%) ⬇️
src/shell/io.c 75.52% <76.66%> (-1.9%) ⬇️
src/cmd/flux-job.c 85.26% <78.57%> (-0.07%) ⬇️
... and 5 more
@grondo

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

Poked at this again, and looks good! Merging.

@grondo grondo merged commit 5052842 into flux-framework:master Aug 15, 2019
4 checks passed
4 checks passed
Summary 1 potential rule
Details
codecov/patch 85.59% of diff hit (target 80.91%)
Details
codecov/project Absolute coverage decreased by -0.06% but relative coverage increased by +4.67% compared to 624aa04
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@garlick

This comment has been minimized.

Copy link
Member Author

commented Aug 15, 2019

Thanks!

@garlick garlick referenced this pull request Sep 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.