flux-jobs: add --json option to dump output in JSON #4994
Comments
bump bump bump! 🙌 I have badly needed this like 8 times!! 😆 I'll also add a note I made in chat - the json dump might include more info than shown in the table. Specifically I'd like the status and state, and return code. Exceptions would be extra nice! A suggested prototype:
|
The last time we discussed this, I think one of the major questions was how to deal with "non existent" fields. e.g. a job that is not yet running does not have a It seems from @vsoch response above, folks would prefer ALL fields and appropriate emptiness/defaults when data is not yet sensible. If we're going to return ALL fields, possible crazy idea. Could json output just be a special formatting output, i.e. like Pro: no need to change Con: when new fields are added, have to update the format. Although, if we add a |
I had that same thought and came to same conclusion as you, a |
I have a prototype of this. Currently it emits a JSON object with an array of job dictionaries under a
{
"jobs": [
{
"t_depend": 1680190618.4250562,
"t_run": 0,
"t_cleanup": 0,
"t_inactive": 0,
"duration": 0,
"expiration": 0,
"name": "hostname",
"queue": "",
"ntasks": 16,
"ncores": 16,
"nnodes": "",
"priority": 16,
"ranks": "",
"nodelist": "",
"success": "",
"result": "",
"waitstatus": "",
"id": 453219713024,
"t_submit": 1680190618.4123542,
"t_remaining": 0,
"state": "SCHED",
"username": "grondo",
"userid": 1000,
"urgency": 16,
"runtime": 0,
"status": "SCHED",
"returncode": "",
"dependencies": [],
"annotations": {},
"exception": {
"occurred": "",
"severity": "",
"type": "",
"note": ""
}
},
{
"t_depend": 1680190612.6026108,
"t_run": 1680190612.61606,
"t_cleanup": 0,
"t_inactive": 0,
"duration": 0,
"expiration": 0,
"name": "sleep",
"queue": "",
"ntasks": 2,
"ncores": 2,
"nnodes": 1,
"priority": 16,
"ranks": "0",
"nodelist": "asp",
"success": "",
"result": "",
"waitstatus": "",
"id": 355542761472,
"t_submit": 1680190612.5900893,
"t_remaining": 0,
"state": "RUN",
"username": "grondo",
"userid": 1000,
"urgency": 16,
"runtime": 11.922030448913574,
"status": "RUN",
"returncode": "",
"dependencies": [],
"annotations": {
"sched": {
"resource_summary": "rank0/core[0-1]"
}
},
"exception": {
"occurred": "",
"severity": "",
"type": "",
"note": ""
}
},
{
"t_depend": 1680190600.589544,
"t_run": 1680190600.603245,
"t_cleanup": 1680190600.6439605,
"t_inactive": 1680190600.6472647,
"duration": 0,
"expiration": 0,
"name": "hostname",
"queue": "",
"ntasks": 1,
"ncores": 1,
"nnodes": 1,
"priority": 16,
"ranks": "0",
"nodelist": "asp",
"success": true,
"result": "COMPLETED",
"waitstatus": 0,
"id": 153998065664,
"t_submit": 1680190600.5772986,
"t_remaining": 0,
"state": "INACTIVE",
"username": "grondo",
"userid": 1000,
"urgency": 16,
"runtime": 0.04071545600891113,
"status": "COMPLETED",
"returncode": 0,
"dependencies": [],
"annotations": {
"sched": {
"resource_summary": "rank0/core0"
}
},
"exception": {
"occurred": false,
"severity": "",
"type": "",
"note": ""
}
}
]
}
If there are no jobs you still get Would it be better if, when only one job is requested with Also, as noted by @chu11 above, this implementation just uses the defaults from JobInfo so a lot of "unset" attributes are empty strings. Maybe this is a good place to start? |
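For anyone consuming the prototype output, here is a minimal Python sketch of reading that shape and spotting the empty-string placeholders. It assumes the top-level "jobs" array shown above; the sample is a trimmed copy of one job from that output, and `is_unset` is a hypothetical helper, not part of flux.

```python
import json

# Trimmed copy of one job from the prototype output above.
SAMPLE = """
{
  "jobs": [
    {"id": 453219713024, "name": "hostname", "state": "SCHED",
     "nnodes": "", "returncode": "", "t_run": 0}
  ]
}
"""


def is_unset(value):
    """Return True for the empty-string placeholder used for unset attributes."""
    return isinstance(value, str) and value == ""


doc = json.loads(SAMPLE)
for job in doc["jobs"]:
    # Collect the attributes that only carry the "" placeholder.
    unset = sorted(key for key, value in job.items() if is_unset(value))
    print(job["id"], job["state"], "unset:", unset)
```

Note that a zero timestamp such as "t_run": 0 is not caught by this check, which is the same ambiguity raised later in the thread.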
Just skimming the above, it occurred to me that the default return code and exception occurred are empty strings (ie “”). Should we pick a default that is the correct type? Granted it’s hard to come up with a good default for an integer return code and boolean. Maybe it can’t be helped. Edit: oh and waitstatus |
Yeah, I'm not sure either. This approach is the least code. Replacing empty strings with something "valid" will be a bit fussy and will all have to go into the |
Here's an active job with empty values removed from the result (just for comparison):
{
"jobs": [
{
"t_depend": 1680193341.4998024,
"t_run": 1680193341.513417,
"t_cleanup": 0,
"t_inactive": 0,
"duration": 0,
"expiration": 0,
"name": "sleep",
"ntasks": 1,
"ncores": 1,
"nnodes": 1,
"priority": 16,
"ranks": "0",
"nodelist": "asp",
"id": 46138837172224,
"t_submit": 1680193341.486764,
"t_remaining": 0,
"state": "RUN",
"username": "grondo",
"userid": 1000,
"urgency": 16,
"runtime": 5.172987937927246,
"status": "RUN",
"dependencies": [],
"annotations": {
"sched": {
"resource_summary": "rank0/core0"
}
},
"exception": {
"occurred": false
}
}
]
} |
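A rough sketch of how empty values could be dropped from a job dictionary, to make the comparison concrete. This is an illustration only, not the code in the prototype: `strip_unset` is a hypothetical helper, and it assumes that "empty" means exactly the empty string, so zeros, false, and empty arrays are kept, as in the output above.

```python
def strip_unset(value):
    """Recursively drop dictionary keys whose value is the empty string."""
    if isinstance(value, dict):
        return {k: strip_unset(v) for k, v in value.items() if v != ""}
    if isinstance(value, list):
        return [strip_unset(v) for v in value]
    return value


# Hypothetical job dictionary mixing set and unset attributes.
job = {
    "name": "sleep",
    "queue": "",
    "t_cleanup": 0,
    "returncode": "",
    "dependencies": [],
    "exception": {"occurred": False, "severity": "", "type": "", "note": ""},
}

stripped = strip_unset(job)
print(stripped)
```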
Calling on @vsoch! We're in dire need of a strong opinion here! |
I would typically keep the keys consistent so it’s easier to parse in languages like Go, but if you think it looks better without, those fields could be optional. Another pro is that likely the listings could be large, and if we hide the missing fields it would reduce the size a bit. So probably for that reason alone missing or unknown fields should be removed. For the structure, I would expect the result without an ID to return a listing of jobs, and with an ID just one job. It’s not perfectly consistent, but it will be easier to parse. As long as the JOBID is a field we wouldn’t need it redundantly as a key too. |
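If the output does end up with two shapes (a listing without an ID, a single object with one), a consumer can normalize them in a few lines. A sketch under that assumption; `jobs_from_output` is a hypothetical helper and the "jobs" key is taken from the prototype output earlier in the thread.

```python
import json


def jobs_from_output(text):
    """Return a list of job dicts from either a listing or a single job object."""
    doc = json.loads(text)
    if isinstance(doc.get("jobs"), list):
        return doc["jobs"]  # listing: {"jobs": [...]}
    return [doc]  # single job: the object itself


listing = '{"jobs": [{"id": 1, "state": "RUN"}, {"id": 2, "state": "SCHED"}]}'
single = '{"id": 3, "state": "INACTIVE"}'

print([job["id"] for job in jobs_from_output(listing)])
print([job["id"] for job in jobs_from_output(single)])
```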
Yeah, if it is ok to leave the unknown fields out, let's go with that. Not only for the reasons you mention, but also because sometimes it is not obvious what the unset value should even be. I'll change the output when a single job is requested to be just one JSON object for the job. Will get to that later today and post a WIP PR. Thanks! |
This gets complicated even more, b/c when My gut feeling is to leave all of the "empty/default" stuff in. It just seems easier for the average user scripting/programming than having to do the average "does this field exist" check. But @vsoch's point about less data is important. Dunno where the pro/con tradeoff lies. Edit: also dependencies is an empty array in above, that's also "empty" |
@chu11 I’m conflicted about it for the same reasons. It was the payload potential size that flipped me to err on the side of taking them out. |
Would it be hard/offensive to have an option for one way or the other? |
yeah, we could remove the unset
I say we pick one way for now, and if a use case comes up that requires fully populated JSON objects for jobs, we could add an option for that. |
sounds good. skimming the list, the only "doesn't make sense" ones that are 0 are |
Thinking about this for a bit, shouldn't it be the other way around? fully populated JSON objects is the less-efficient "easy" way, non-fully-populated is the optimized advanced way? thus the latter would get an option? |
I don't know, it doesn't matter which way is technically easy vs advanced. We pick the way that is the "default" and then people are upset we add an option to change the default. It is so easy to work around the missing fields that I think nobody is ever going to ask for the option. |
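To illustrate the workaround for missing fields: in Python, `dict.get` covers most cases. A small sketch, using a hypothetical job object with the unset keys omitted as in the comparison above.

```python
# Hypothetical job object with unset fields left out.
job = {"id": 46138837172224, "state": "RUN", "exception": {"occurred": False}}

# Missing keys fall back to a caller-chosen default instead of raising KeyError.
returncode = job.get("returncode")  # None while the job has no return code
queue = job.get("queue", "")  # default chosen by the caller
severity = job.get("exception", {}).get("severity")  # nested lookup

if returncode is None:
    print("job", job["id"], "has no return code yet")
```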
For others that stumble on this issue, right now I'm using:
$ flux jobs --no-header -o '{status}:{returncode}' ƒj4CgebBV
To get a return code (and you could change the |
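A small sketch for scripting against that format string: it splits one line of the form status:returncode as produced by the -o argument above. `parse_status_line` is a hypothetical helper, and the sample lines are made up for illustration. What flux prints for a job without a return code is not shown in this thread, so anything that is not an integer is mapped to None.

```python
def parse_status_line(line):
    """Split 'STATUS:RETURNCODE' into (status, returncode or None)."""
    status, _, code = line.strip().partition(":")
    try:
        return status, int(code)
    except ValueError:
        # Empty or placeholder return code, e.g. for a job that has not finished.
        return status, None


print(parse_status_line("COMPLETED:0\n"))
print(parse_status_line("RUN:"))
```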
There seems to be a need for a command line utility that dumps all known information for a job (i.e. all fields available in flux-jobs) to JSON. Currently, we have the deprecated (and hidden) flux job list-ids, but that dumps the raw information returned from the job-list module and thus is not as useful as the processed information available in flux jobs via the JobInfo class.
Therefore, perhaps the easiest path forward would be to add a --json argument to flux jobs which ignored any format and just dumped a JSON representation of the matching JobInfo objects.