flux-jobs: add --json option to dump output in JSON #4994

grondo · 2023-03-14T15:27:06Z

There seems to be a need for a command line utility that dumps all known information for a job (i.e. all fields available in flux-jobs) to JSON. Currently, we have the deprecated (and hidden) flux job list-ids, but that dumps the raw information returned from the job-list module and thus is not as useful as the processed information available in flux jobs via the JobInfo class.

Therefore, perhaps the easiest path forward would be to add a --json argument to flux jobs which ignored any format and just dumped a JSON representation of the matching JobInfo objects.

The text was updated successfully, but these errors were encountered:

vsoch · 2023-03-14T15:28:54Z

bump bump bump! 🙌

I have badly needed this like 8 times!! 😆

I'll also add a note I made in chat - the json dump might include more info than shown in the table. Specifically I'd like the status and state, and return code. Exceptions would be extra nice! A suggested prototype:

{
    "id": 361295786278912,
    "userid": 0,
    "urgency": 16,
    "priority": 16,
    "t_submit": 1668305924.5898263,
    "t_depend": 1668305924.5898263,
    "t_run": 1668305924.6036532,
    "state": "RUN",
    "name": "sleep",
    "ntasks": 1,
    "ncores": 1,
    "duration": 0.0,
    "nnodes": 1,
    "ranks": "3",
    "nodelist": "57d0f60fce2e",
    "expiration": 4821905924.0,
    "result": "",
    "returncode": "",
    "runtime": 0.4636096954345703,
    "waitstatus": "",
    "exception": {
        "occurred": "",
        "severity": "",
        "type": "",
        "note": ""
    }
}

chu11 · 2023-03-27T12:54:07Z

The last time we discussed this, I think one of the major questions was how to deal with "non existent" fields. e.g. a job that is not yet running does not have a t_run field or a runtime. jobs that haven't failed don't have an exception. etc.

It seems from @vsoch response above, folks would prefer ALL fields and appropriate emptiness/defaults when data is not yet sensible.

If we're going to return ALL fields, possible crazy idea. Could json output just be a special formatting output, i.e. like --format=json. The format would be (excuse possible need to escape stuff) { "id": {id}, "name": "{name}", ...}

Pro: no need to change JobInfo class or refactor any logic in flux-jobs or add a --json option or anything.

Con: when new fields are added, have to update the format. Although, if we add a JobInfo.get_json() or whatever function, that may need to be updated anyways. And a JobInfo.get_json() may be useful to users of the JobInfo class.

grondo · 2023-03-27T13:42:35Z

I had that same thought and came to same conclusion as you, a JobInfo.to_json() or similar method would be more useful.

grondo · 2023-03-30T15:42:08Z

I have a prototype of this. Currently it emits a JSON object with an array of job dictionaries under a jobs, key, e.g.

{
  "jobs": [
    {
      "t_depend": 1680190618.4250562,
      "t_run": 0,
      "t_cleanup": 0,
      "t_inactive": 0,
      "duration": 0,
      "expiration": 0,
      "name": "hostname",
      "queue": "",
      "ntasks": 16,
      "ncores": 16,
      "nnodes": "",
      "priority": 16,
      "ranks": "",
      "nodelist": "",
      "success": "",
      "result": "",
      "waitstatus": "",
      "id": 453219713024,
      "t_submit": 1680190618.4123542,
      "t_remaining": 0,
      "state": "SCHED",
      "username": "grondo",
      "userid": 1000,
      "urgency": 16,
      "runtime": 0,
      "status": "SCHED",
      "returncode": "",
      "dependencies": [],
      "annotations": {},
      "exception": {
        "occurred": "",
        "severity": "",
        "type": "",
        "note": ""
      }
    },
    {
      "t_depend": 1680190612.6026108,
      "t_run": 1680190612.61606,
      "t_cleanup": 0,
      "t_inactive": 0,
      "duration": 0,
      "expiration": 0,
      "name": "sleep",
      "queue": "",
      "ntasks": 2,
      "ncores": 2,
      "nnodes": 1,
      "priority": 16,
      "ranks": "0",
      "nodelist": "asp",
      "success": "",
      "result": "",
      "waitstatus": "",
      "id": 355542761472,
      "t_submit": 1680190612.5900893,
      "t_remaining": 0,
      "state": "RUN",
      "username": "grondo",
      "userid": 1000,
      "urgency": 16,
      "runtime": 11.922030448913574,
      "status": "RUN",
      "returncode": "",
      "dependencies": [],
      "annotations": {
        "sched": {
          "resource_summary": "rank0/core[0-1]"
        }
      },
      "exception": {
        "occurred": "",
        "severity": "",
        "type": "",
        "note": ""
      }
    },
    {
      "t_depend": 1680190600.589544,
      "t_run": 1680190600.603245,
      "t_cleanup": 1680190600.6439605,
      "t_inactive": 1680190600.6472647,
      "duration": 0,
      "expiration": 0,
      "name": "hostname",
      "queue": "",
      "ntasks": 1,
      "ncores": 1,
      "nnodes": 1,
      "priority": 16,
      "ranks": "0",
      "nodelist": "asp",
      "success": true,
      "result": "COMPLETED",
      "waitstatus": 0,
      "id": 153998065664,
      "t_submit": 1680190600.5772986,
      "t_remaining": 0,
      "state": "INACTIVE",
      "username": "grondo",
      "userid": 1000,
      "urgency": 16,
      "runtime": 0.04071545600891113,
      "status": "COMPLETED",
      "returncode": 0,
      "dependencies": [],
      "annotations": {
        "sched": {
          "resource_summary": "rank0/core0"
        }
      },
      "exception": {
        "occurred": false,
        "severity": "",
        "type": "",
        "note": ""
      }
    }
  ]
}

If there are no jobs you still get {"jobs": []}.

Would it be better if, when only one job is requested with flux jobs --json JOBID, a single JSON object for that job was output? Seems like consistency would be better but I thought I'd ask.

Also, as noted by @chu11 above, this implementation just uses the defaults from JobInfo so a lot of "unset" attributes are empty strings. Maybe this is a good place to start?

chu11 · 2023-03-30T16:09:53Z

Just skimming the above, it occurred to me that the default return code and exception occurred are empty strings (ie “”). Should we pick a default that is the correct type? Granted it’s hard to come up with a good default for an integer return code and boolean. Maybe it can’t be helped.

Edit: oh and waitstatus

grondo · 2023-03-30T16:16:37Z

Yeah, I'm not sure either. This approach is the least code. Replacing empty strings with something "valid" will be a bit fussy and will all have to go into the to_dict() function. The second easiest thing would just be to clear any empty values from the dict before returning it.

grondo · 2023-03-30T16:23:41Z

Here's an active job with empty values removed from the result (just for comparison):

{
  "jobs": [
    {
      "t_depend": 1680193341.4998024,
      "t_run": 1680193341.513417,
      "t_cleanup": 0,
      "t_inactive": 0,
      "duration": 0,
      "expiration": 0,
      "name": "sleep",
      "ntasks": 1,
      "ncores": 1,
      "nnodes": 1,
      "priority": 16,
      "ranks": "0",
      "nodelist": "asp",
      "id": 46138837172224,
      "t_submit": 1680193341.486764,
      "t_remaining": 0,
      "state": "RUN",
      "username": "grondo",
      "userid": 1000,
      "urgency": 16,
      "runtime": 5.172987937927246,
      "status": "RUN",
      "dependencies": [],
      "annotations": {
        "sched": {
          "resource_summary": "rank0/core0"
        }
      },
      "exception": {
        "occurred": false
      }
    }
  ]
}

grondo · 2023-03-30T16:27:34Z

Calling on @vsoch! We're in dire need of a strong opinion here!

vsoch · 2023-03-30T16:44:58Z

I would typically keep the keys consistent so it’s easier to parse in languages like Go, but if you think it looks better without, those fields could be optional. Another pro is that likely the listings could be large, and if we hide the missing fields it would reduce the size a bit. So probably for that reason alone missing or unknown fields should be removed.

For the structure, I would expect the result without an ID to return a listing of jobs, and with an ID just one job. It’s not perfectly consistent, but it will be easier to parse. As long as the JOBID is a field we wouldn’t need it redundantly as a key too.

grondo · 2023-03-30T16:51:09Z

Yeah, if it is ok to leave the unknown fields out, let's go with that. Not only for the reasons you mention, but also because sometimes it is not obvious what the unset value should even be.

I'll change the output when a single job is requested to be just one JSON object for the job. Will get to that later today and post a WIP PR. Thanks!

chu11 · 2023-03-30T16:51:20Z

The second easiest thing would just be to clear any empty values from the dict before returning it.

This gets complicated even more, b/c when t_run is 0, that's also "empty", we just defaulted it to 0 instead of empty string.

My gut feeling is to leave all of the "empty/default" stuff in. It just seems easier for the average user scripting/programming than having to do the average "does this field exist" check.

But @vsoch's point about less data is important. Dunno where the pro/con tradeoff lies.

Edit: also dependencies is an empty array in above, that's also "empty"

vsoch · 2023-03-30T16:52:49Z

@chu11 I’m conflicted about it for the same reasons. It was the payload potential size that flipped me to err on the side of taking them out.

chu11 · 2023-03-30T16:54:49Z

Would it be hard/offensive to have an option for one way or the other? --json and --json-prune? (prune isn't good, just throwing that out there).

grondo · 2023-03-30T16:56:15Z

yeah, we could remove the unset t_* attributes as well. Something like dependencies is more questionable since an empty array is meaningful in that context (i.e. dependencies is not unset it is set to an empty list)

Would it be hard/offensive to have an option for one way or the other? --json and --json-prune? (prune isn't good, just throwing that out there).

I say we pick one way for now, and if a use case comes up that requires fully populated JSON objects for jobs, we could add an option for that.

chu11 · 2023-03-30T17:04:30Z

sounds good. skimming the list, the only "doesnt make sense" ones that are 0 are runtime and expiration, which are populated when the job starts I think. I think duration=0 does make sense being empty, as that's a user input. (IIRC)

chu11 · 2023-03-30T18:55:57Z

I say we pick one way for now, and if a use case comes up that requires fully populated JSON objects for jobs, we could add an option for that.

Thinking about this for a bit, shouldn't it be the other way around? fully populated JSON objects is the less-efficient "easy" way, non-fully-populated is the optimized advanced way? thus the latter would get an option?

grondo · 2023-03-30T21:20:07Z

I don't know, it doesn't matter which way is technically easy vs advanced. We pick the way that is the "default" and then people are upset we add an option to change the default. It is so easy to work around the missing fields that I think nobody is ever going to ask for the option.

vsoch · 2023-04-02T07:00:51Z

For others that stumble on this issue, right now I'm using:

$ flux jobs --no-header -o '{status}:{returncode}' ƒj4CgebBV

To get a return code (and you could change the -o parameter to get something else.

grondo added this to the flux-core v0.49.0 milestone Mar 23, 2023

grondo self-assigned this Mar 30, 2023

grondo mentioned this issue Mar 31, 2023

flux-jobs: add --json option #5054

Merged

mergify bot closed this as completed in #5054 Apr 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

flux-jobs: add --json option to dump output in JSON #4994

flux-jobs: add --json option to dump output in JSON #4994

grondo commented Mar 14, 2023

vsoch commented Mar 14, 2023

chu11 commented Mar 27, 2023

grondo commented Mar 27, 2023

grondo commented Mar 30, 2023

chu11 commented Mar 30, 2023 •

edited

grondo commented Mar 30, 2023

grondo commented Mar 30, 2023

grondo commented Mar 30, 2023

vsoch commented Mar 30, 2023

grondo commented Mar 30, 2023

chu11 commented Mar 30, 2023 •

edited

vsoch commented Mar 30, 2023

chu11 commented Mar 30, 2023

grondo commented Mar 30, 2023

chu11 commented Mar 30, 2023

chu11 commented Mar 30, 2023

grondo commented Mar 30, 2023

vsoch commented Apr 2, 2023

flux-jobs: add --json option to dump output in JSON #4994

flux-jobs: add --json option to dump output in JSON #4994

Comments

grondo commented Mar 14, 2023

vsoch commented Mar 14, 2023

chu11 commented Mar 27, 2023

grondo commented Mar 27, 2023

grondo commented Mar 30, 2023

chu11 commented Mar 30, 2023 • edited

grondo commented Mar 30, 2023

grondo commented Mar 30, 2023

grondo commented Mar 30, 2023

vsoch commented Mar 30, 2023

grondo commented Mar 30, 2023

chu11 commented Mar 30, 2023 • edited

vsoch commented Mar 30, 2023

chu11 commented Mar 30, 2023

grondo commented Mar 30, 2023

chu11 commented Mar 30, 2023

chu11 commented Mar 30, 2023

grondo commented Mar 30, 2023

vsoch commented Apr 2, 2023

chu11 commented Mar 30, 2023 •

edited

chu11 commented Mar 30, 2023 •

edited