wreck: need a way to list running,pending jobs #1456

Closed
grondo opened this issue Apr 12, 2018 · 0 comments

grondo commented Apr 12, 2018

The wreck prototype is not well suited for real-world use. It doesn't have a concept of a queue (so pending jobs can't be sorted in priority order), doesn't track which jobs are running other than through the lwj.state kvs entry, and doesn't offer any kind of search functionality (e.g. show all single-node jobs, or jobs by name).

This means that the job listing tool flux-wreck ls is very awkward to use, since it orders output by job id, which is fairly arbitrary in real-world use.

Like job listing utilities in other schedulers, we may need a way to list pending and running jobs in priority order (with running jobs having de facto infinite priority, so they are always listed first) if we want to get much more mileage out of real-world use of the wreck system.
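To make the proposed ordering concrete, here is a minimal Python sketch (not wreck code). The `Job` record, its `priority` field, and the tie-break on job id are assumptions for illustration only; the issue does not say where priorities would come from beyond "query the scheduler".

```python
import math
from typing import NamedTuple


class Job(NamedTuple):
    # Hypothetical job record; field names are illustrative only.
    jobid: int
    state: str             # e.g. "running" or "pending"
    priority: float = 0.0  # assumed to be supplied by the scheduler


def effective_priority(job):
    # Running jobs get de facto infinite priority so they always sort first.
    return math.inf if job.state == "running" else job.priority


def list_jobs(jobs):
    # Highest effective priority first; ties are broken by ascending job id.
    return sorted(jobs, key=lambda j: (-effective_priority(j), j.jobid))
```

With this key, all running jobs appear first (in job id order), followed by pending jobs from highest to lowest priority.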

There are probably multiple ways to accomplish this, but reading all job states from the kvs is probably contraindicated, so we may want to set up some new soft link dirs like lwj-complete (because everything is solved with another layer of indirection).

We could also just keep a list of not-complete jobs in the job module. This might help that module ensure that runrequest jobs make it to running within some timeout. The wreck tools could query for this list.
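A rough Python sketch of that idea, under stated assumptions: the `ActiveJobs` class, its method names, the 30 second default timeout, and the injectable clock are all hypothetical, and the real job module is not written in Python. Only the state names "runrequest", "running", "complete", and "failed" come from this issue.

```python
import time

# States after which a job no longer needs to be tracked as not-complete.
DONE_STATES = {"complete", "failed"}


class ActiveJobs:
    """In-memory list of not-complete jobs (illustrative only)."""

    def __init__(self, runrequest_timeout=30.0, clock=time.monotonic):
        self.timeout = runrequest_timeout
        self.clock = clock
        self.jobs = {}  # jobid -> (state, time of last state change)

    def update(self, jobid, state):
        # Record a state change; drop the job once it is complete or failed.
        if state in DONE_STATES:
            self.jobs.pop(jobid, None)
        else:
            self.jobs[jobid] = (state, self.clock())

    def active(self):
        # What a listing tool would query instead of walking the kvs.
        return sorted(self.jobs)

    def stalled(self):
        # Jobs stuck in "runrequest" for longer than the timeout.
        now = self.clock()
        return sorted(
            jobid
            for jobid, (state, since) in self.jobs.items()
            if state == "runrequest" and now - since > self.timeout
        )
```

The same table serves both purposes mentioned above: `active()` answers the listing query, and `stalled()` flags runrequest jobs that have not reached running in time.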

If we wanted to sort pending jobs in priority order, then we'd probably have to query the scheduler.

In the wreck-replacement, we'll have to have a good story for how to do this in a distributed manner. I think we already had the beginning of a design in the KVS schema doc, but I haven't looked back at that yet. The purpose of this issue would be to satisfy the minimum requirement for splash.

grondo added a commit to garlick/flux-core that referenced this issue Apr 24, 2018
Improve wreck.joblist with the following additions:

 * retrieve a list of "active" jobs first using the new `job.list`
   rpc available from the wreck/job module. Inactive jobs are
   appended to this active list only if the returned list of
   active jobs does not meet or exceed arg.max.
   Fixes flux-framework#1456

 * Allow filtering jobs by state in wreck.joblist with a states
   table with two allowable members

    - include:  include *only* states where include[state] == true
    - exclude:  exclude states in this table where exclude[state] == true

   These states are passed to `job.list` rpc for the active job list
   and directly filtered on kvs job state for the jobs retrieved from
   the kvs.

 * Add an active_only flag to wreck.joblist which returns immediately
   after retrieving the active job list from the job module. This effectively
   skips kvs traversal and all complete and failed jobs.

 * Add a kvs_only flag to wreck.joblist which skips the retrieval of active
   jobs from the `job.list` rpc. This avoids an unnecessary rpc when it is
   known that no active jobs are required to be returned from the function.
   (Should be used in combination with exclude/include to restrict job
   states returned from kvs)
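
The control flow described in this commit message can be sketched in Python roughly as follows. This is not the actual Lua `wreck.joblist` implementation: `list_active` and `list_kvs` are hypothetical callables standing in for the `job.list` rpc and the kvs traversal, the default of 25 for `max_jobs` is arbitrary, and the sketch filters the active list locally, whereas the commit passes the states to the rpc.

```python
def state_filter(states=None):
    """Build a predicate from an optional {'include': {...}, 'exclude': {...}} table."""
    states = states or {}
    include = states.get("include")
    exclude = states.get("exclude") or {}

    def keep(state):
        # include: keep *only* states where include[state] is true
        if include is not None and not include.get(state, False):
            return False
        # exclude: drop states where exclude[state] is true
        return not exclude.get(state, False)

    return keep


def joblist(list_active, list_kvs, max_jobs=25, states=None,
            active_only=False, kvs_only=False):
    keep = state_filter(states)
    jobs = []
    if not kvs_only:
        # Active jobs come first (stand-in for the job.list rpc).
        jobs = [j for j in list_active() if keep(j["state"])][:max_jobs]
    if active_only or len(jobs) >= max_jobs:
        return jobs
    # Append jobs from the kvs only while the list is still short of max_jobs.
    # Skipping ids already returned by the active list is an assumption here.
    seen = {j["id"] for j in jobs}
    for j in list_kvs():
        if len(jobs) >= max_jobs:
            break
        if j["id"] not in seen and keep(j["state"]):
            jobs.append(j)
    return jobs
```

For example, `joblist(list_active, list_kvs, kvs_only=True, states={"include": {"complete": True}})` would skip the active-list lookup and return only completed jobs found in the kvs.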
garlick pushed a commit to garlick/flux-core that referenced this issue Apr 25, 2018
garlick pushed a commit to garlick/flux-core that referenced this issue Apr 26, 2018
grondo closed this as completed in 245cd1f Apr 26, 2018