wreck: need a way to list running,pending jobs #1456

grondo · 2018-04-12T21:03:07Z

The wreck prototype is not well suited for real-world use. It doesn't have a concept of a queue (so pending jobs can't be sorted in priority order), doesn't track which jobs are running besides the lwj.state kvs entry, and doesn't offer any kind of search functionality (like showing all single-node jobs, or jobs by name, etc.).

This means that the job listing tool flux-wreck ls is very awkward, since it orders output based on job id, which is pretty arbitrary in real-world use.

Like job listing utilities in other schedulers, we may need a way to list pending and running jobs, in priority order (with running jobs having de facto infinite priority, so they are always listed first), to get much more mileage with real-world use of the wreck system.
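As a rough illustration of the ordering described above, here is a minimal Python sketch. It is not wreck code: the job records, the `priority` field, and the `listing_order` helper are all hypothetical (wreck has no priority concept today). Running jobs are treated as having infinite priority so they always sort first, and pending jobs follow in descending priority, with job id as a tie-breaker.

```python
import math

def listing_order(jobs):
    """Sort jobs for display: running first, then pending by descending priority.

    Each job is a dict with hypothetical keys "id", "state" and "priority".
    """
    def key(job):
        # Running jobs get de facto infinite priority.
        effective = math.inf if job["state"] == "running" else job["priority"]
        # Negate so that higher priority sorts earlier; break ties by job id.
        return (-effective, job["id"])
    return sorted(jobs, key=key)

jobs = [
    {"id": 1, "state": "pending", "priority": 10},
    {"id": 2, "state": "running", "priority": 0},
    {"id": 3, "state": "pending", "priority": 50},
    {"id": 4, "state": "running", "priority": 5},
]
ordered_ids = [job["id"] for job in listing_order(jobs)]
print(ordered_ids)
```

In a real implementation the priority would presumably come from the scheduler rather than from a field on the job record.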

There are probably multiple ways to accomplish this, but reading all job states from the kvs is probably contraindicated, so we may want to set up some new soft link dirs like lwj-complete (because everything is solved with another layer of indirection).

We could also just keep a list of not-complete jobs in the job module. This might help that module ensure that runrequest jobs make it to running within some timeout. The wreck tools could query for this list.

If we wanted to sort pending jobs in priority order, then we'd probably have to query the scheduler.

In the wreck-replacement, we'll have to have a good story for how to do this in a distributed manner. I think we already had the beginning of a design in the KVS schema doc, but I haven't looked back at that yet. The purpose of this issue would be to satisfy the minimum requirement for splash.


Improve wreck.joblist with the following additions:

* retrieve a list of "active" jobs first using the new `job.list` rpc available from the wreck/job module. Inactive jobs are appended to this active list only if the returned list of active jobs does not meet or exceed arg.max. Fixes flux-framework#1456

* Allow filtering jobs by state in wreck.joblist with a states table with two allowable members
  - include: include *only* states where include[state] == true
  - exclude: exclude states in this table where exclude[state] == true

  These states are passed to `job.list` rpc for the active job list and directly filtered on kvs job state for the jobs retrieved from the kvs.

* Add an active_only flag to wreck.joblist which returns immediately after retrieving the active job list from the job module. This effectively skips kvs traversal and all complete and failed jobs.

* Add a kvs_only flag to wreck.joblist which skips the retrieval of active jobs from the `job.list` rpc. This avoids an unnecessary rpc when it is known that no active jobs are required to be returned from the function. (Should be used in combination with exclude/include to restrict job states returned from kvs)
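The selection logic in this commit message can be modeled with a short Python sketch. This is an illustrative model only, not the actual Lua wreck.joblist: the function names, argument names, and job records are hypothetical. The active list stands in for the result of the `job.list` rpc, and the kvs list stands in for a kvs traversal. Two behaviors are assumptions not stated in the message: that the kvs pass stops once the maximum is reached, and that jobs already returned as active are skipped during that pass.

```python
def state_allowed(state, include=None, exclude=None):
    """Apply the include/exclude state tables described in the commit message."""
    if include and not include.get(state, False):
        return False  # an include table admits *only* the listed states
    if exclude and exclude.get(state, False):
        return False
    return True

def joblist(active_jobs, kvs_jobs, max_jobs,
            include=None, exclude=None, active_only=False, kvs_only=False):
    """Model of the selection order: active jobs first, then jobs from the kvs."""
    result = []
    if not kvs_only:
        # Stand-in for the job.list rpc, which receives the state filters.
        result = [j for j in active_jobs
                  if state_allowed(j["state"], include, exclude)]
    if active_only:
        return result  # skip the kvs traversal entirely
    seen = {j["id"] for j in result}
    for job in kvs_jobs:
        if len(result) >= max_jobs:
            break  # assumption: stop filling once max is met or exceeded
        if job["id"] in seen:
            continue  # assumption: avoid listing an active job twice
        if state_allowed(job["state"], include, exclude):
            result.append(job)
    return result

active = [{"id": 5, "state": "running"}, {"id": 6, "state": "submitted"}]
kvs = [{"id": 1, "state": "complete"}, {"id": 2, "state": "failed"},
       {"id": 5, "state": "running"}]

print([j["id"] for j in joblist(active, kvs, max_jobs=3)])
print([j["id"] for j in joblist(active, kvs, max_jobs=10, active_only=True)])
print([j["id"] for j in joblist(active, kvs, max_jobs=10, kvs_only=True,
                                include={"complete": True})])
```

The last call shows why kvs_only is suggested in combination with include/exclude: without a state filter, the kvs pass would also return jobs that are still active.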

garlick mentioned this issue Apr 20, 2018

wreck: track active jobs #1481

Merged

grondo closed this as completed in 245cd1f Apr 26, 2018
