Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flux-jobs: improve recursive job listing #4024

Merged
merged 6 commits into from Jan 5, 2022

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jan 4, 2022

This PR makes some improvements to recursive job listing by

  • only attempting to recurse into current user's jobs by default, unless --recurse-all is used.
  • connect and fetch all child job info in parallel using a ThreadPoolExecutor. A new --threads option is added to optionally adjust the max_workers value for this recursive jobs as well as instance-info gathering ThreadPools.

This is based on top of #4022

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, there are a lot of great improvements in here! Approving, just a couple of minor comments.

@@ -191,7 +191,7 @@ def __getattr__(self, attr):
raise AttributeError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit message for c4a3555, missed a "t" in "AtributeError".

Comment on lines 81 to 87
**--recurse-all**
By default, jobs not owned by the user running ``flux jobs`` are
skipped with ``-R, --recurseive``. This option forces the command
to attempt to recurse into the jobs of other users. Implies
``--recursive``.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a note that "By default, Flux instances can only permit the instance owner to connect"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! I added that comment, rebased on current master and pushed the result.

Problem: In a multi-user instance, flux-jobs -R, --recursive will
attempt to recurse into all jobs found in the initial listing. If the
-A or --user=all options are used, then many of these jobs may be for
different users, which almost certainly will result in failures to
connect, greatly slowing the program down as each flux_open(3) fails.

Only recurse into jobs owned by the current user by default, unless
a new --recurse-all option is provided.
Problem: The --recurse-all option is not documented in the flux-jobs
man page.

Add documentation for this option.
Problem: flux-jobs -R, --recursive currently queries recursive job
lists serially, which can be quite slow especially if one or more
connections time out or get errors.

Use a ThreadPoolExecutor to run all recursive job-list queries.
Problem: It may be useful to allow user to tune the max number of
worker threads when flux-jobs uses a ThreadPoolExecutor.

Add a --threads option to set the max_workers option in the
ThreadPoolExecutor constructor.
Problem: The flux-jobs --threads option is not documented.

Add a section describing the --threads option to the flux-jobs(1)
manpage.
Problem: t2800-jobs-recursive.t doesn't test handling of jobs owned
by users other than the current user and the --recurse-all option.

Add support for running a fake job as an alternate user to this test,
and ensure that this job is skipped by default for recursion, even
with -A, and recursion on this job is attempted wtih --recurse-all.
@grondo
Copy link
Contributor Author

grondo commented Jan 5, 2022

Rebased after #4022 landed. Ok to set MWP?

@grondo
Copy link
Contributor Author

grondo commented Jan 5, 2022

went ahead and set MWP.

@codecov
Copy link

codecov bot commented Jan 5, 2022

Codecov Report

Merging #4024 (fe89e96) into master (05e6fe6) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4024      +/-   ##
==========================================
+ Coverage   83.32%   83.36%   +0.04%     
==========================================
  Files         373      373              
  Lines       61060    61078      +18     
==========================================
+ Hits        50877    50918      +41     
+ Misses      10183    10160      -23     
Impacted Files Coverage Δ
src/cmd/flux-jobs.py 96.04% <100.00%> (+1.70%) ⬆️
src/cmd/top/top.c 76.71% <0.00%> (-3.43%) ⬇️
src/cmd/top/summary_pane.c 79.14% <0.00%> (-2.14%) ⬇️
src/modules/job-archive/job-archive.c 60.66% <0.00%> (-0.74%) ⬇️
src/cmd/flux-module.c 83.38% <0.00%> (+0.29%) ⬆️
src/shell/input.c 83.33% <0.00%> (+0.34%) ⬆️
src/broker/overlay.c 87.69% <0.00%> (+0.47%) ⬆️
src/modules/job-info/guest_watch.c 77.02% <0.00%> (+0.81%) ⬆️
src/bindings/python/flux/core/handle.py 92.10% <0.00%> (+1.31%) ⬆️
src/modules/job-info/watch.c 68.24% <0.00%> (+1.71%) ⬆️
... and 3 more

@mergify mergify bot merged commit 8a472fe into flux-framework:master Jan 5, 2022
@grondo grondo deleted the flux-jobs-fixes branch January 5, 2022 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants