Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flux-shell: limit the number of I/O requests in flight + cleanup #2296

Merged
merged 3 commits into from Aug 9, 2019

Conversation

@garlick
Copy link
Member

commented Aug 9, 2019

When tasks produce a lot of output quickly, many write requests to the leader shell can be concurrently in flight. This PR adds a hard coded high water mark of 1000 requests per shell, with a low water mark of 100, such that I/O is paused when the high water mark is reached, and resumed when the low water mark is reached. This utilizes the subprocess stream start/stop mechanism added in #2271.

This limits the consumption of client resources such as matchtags, and also should meter the overall message rate somewhat, although this simple mechanism is completely local to a shell and has no idea what other shells are doing, so a many-task job can still overwhelm the leader shell. Complexity is very low so it seemed like a good first step.

Without this patch, when running an lptest 79 X, where X is on the order of 100K, one can observe the matchtag pool incrementally growing as tags are consumed. With the patch, the matchtag pool never grows beyond its initial size. Yet wall clock performance is about the same.

In addition, some cleanup was performed to eliminate gratuitous wrapper functions for subprocess API in shell/task.c.

@garlick garlick changed the title flux-shell: limit the number of requests in flight + cleanup flux-shell: limit the number of I/O requests in flight + cleanup Aug 9, 2019
@garlick garlick force-pushed the garlick:shell_xoff branch 2 times, most recently from 5a7167a to 6d263de Aug 9, 2019
garlick added 3 commits Aug 9, 2019
Problem: shells sending task output to the leader
shell have no limit on the number of outstanding
RPC requests, and can exhaust the tagpool and
potentially overwhelm the leader shell and broker
sockets.

Add a per-shell high water mark and low water mark
to limit the number of outstanding write requests.

When the number of outstanding write requests reaches
the high water mark, task output is stopped. While
stopped, output accumulates in subprocess buffers.
If they fill, tasks may block in write(2).  When the
number of outstanding write requests drops below the
low water mark, task output is resumed.

Fixes #2256
Problem: task.c provides wrappers for flux_subprocess_getline(),
flux_subprocess_write(), and flux_subprocess_read_stream_closed()
which are not necessary since shell->task->proc is available
to io.c and pmi.c where these functions are used.

Convert users in io.c and pmi.c to directly use libsubprocess API.

Drop the following wrapper functions:

- shell_task_io_readline()
- shell_task_io_at_eof()
- shell_task_pmi_readline()
- shell_task_pmi_write()
Run a single task job that produces 10K lines of output
in a burst.  This should exceed the shell's high water
mark and trigger the shell's rate limiting code.
@garlick garlick force-pushed the garlick:shell_xoff branch from 6d263de to 1afe30e Aug 9, 2019
@codecov-io

This comment has been minimized.

Copy link

commented Aug 9, 2019

Codecov Report

Merging #2296 into master will increase coverage by 0.01%.
The diff coverage is 90.9%.

@@            Coverage Diff             @@
##           master    #2296      +/-   ##
==========================================
+ Coverage   80.87%   80.88%   +0.01%     
==========================================
  Files         214      214              
  Lines       33649    33654       +5     
==========================================
+ Hits        27212    27221       +9     
+ Misses       6437     6433       -4
Impacted Files Coverage Δ
src/shell/task.c 89.74% <ø> (-1.37%) ⬇️
src/shell/pmi.c 74.53% <100%> (ø) ⬆️
src/shell/io.c 77.41% <90%> (+1.71%) ⬆️
src/common/libutil/aux.c 90.74% <0%> (-3.71%) ⬇️
src/common/libflux/message.c 80.62% <0%> (-0.26%) ⬇️
src/cmd/flux-module.c 83.96% <0%> (-0.24%) ⬇️
src/modules/connector-local/local.c 74.42% <0%> (+1.15%) ⬆️
src/common/libutil/veb.c 98.85% <0%> (+1.71%) ⬆️
@garlick

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2019

Apologies for the stream of force pushes. All done now.

@grondo

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2019

Nice!

Possibly naive question: Could this support to limit the number of outstanding messages be moved into the shell "svc" wrapper? This might benefit plugins that might be collecting other data on rank 0 shell (e.g. performance, energy, or other monitoring data).

@grondo

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2019

This might benefit plugins that might be collecting other data on rank 0 shell (e.g. performance, energy, or other monitoring data).

Sorry, that might not be clear. I was thinking all shell services might want to share a single limit for # of outstanding requests, which could be implemented by the svc code.

However, I'm not sure if that would allow one service (e.g. I/O) to starve out others, and also belatedly realized that the responses are coming back as fulfilled futures local to the implementation code, so svc would have no way to account for responses.

Sorry for the noise.

@garlick

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2019

I'm not sure what that might look like, but it's something to keep in mind, should the same need come up again. This was pretty simple to add given that the source of data can now be stopped and started, and the futures were already kept in a list so they could be handled synchronously in the destructor (to ensure all I/O gets flushed when the shell exits). It was a small amount of code to stop/start when the watermarks are surpassed.

@grondo

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2019

but it's something to keep in mind, should the same need come up again.

I guess my question was more about whether you wanted to limit outstanding messages for the whole shell or just per service. There's only one service that matters for now, so I guess it isn't important.

@grondo grondo merged commit ecf3f78 into flux-framework:master Aug 9, 2019
4 checks passed
4 checks passed
Summary 1 potential rule
Details
codecov/patch 90.9% of diff hit (target 80.87%)
Details
codecov/project 80.88% (+0.01%) compared to 7bdda9d
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@garlick

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2019

Sorry, I reread your message and realized I misunderstood. I guess I still vote we kick the can down the road though :-)

Thanks for the merge!

@garlick garlick deleted the garlick:shell_xoff branch Aug 12, 2019
@garlick garlick referenced this pull request Sep 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.