Handling batch submit/status failure for many tasks #210
Comments
I agree with returning a 200 when the full batch goes through. On a partial failure, we could do … So we'd have:
@yadudoc This seems reasonable; my only concern is this case: the user does … If the user had called … Alternatively, maybe when we get a 207 back, we just want to iterate through it and raise the first error we come across in the list of responses. For example, if a user had multiple …
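A minimal client-side sketch of the "raise the first error in a 207" idea. It assumes each entry in the response list is a dict with a per-task "status" code and an optional "reason"; those field names, the TaskSubmissionError class, and the raise_first_error helper are all hypothetical and not part of the funcX SDK.

```python
from typing import Any, Dict, List


class TaskSubmissionError(Exception):
    """Hypothetical error raised for the first failed task in a batch response."""

    def __init__(self, index: int, status: int, reason: str) -> None:
        super().__init__(f"task {index} failed with HTTP {status}: {reason}")
        self.index = index
        self.status = status
        self.reason = reason


def raise_first_error(http_status: int, responses: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the per-task responses, or raise on the first failed entry of a 207.

    Assumes each entry looks like {"status": <int>, "reason": <str>, ...};
    these field names are illustrative only.
    """
    if http_status != 207:
        # 200 means everything succeeded; other codes go through normal HTTP error handling.
        return responses
    for index, entry in enumerate(responses):
        status = entry.get("status", 200)
        if status >= 400:
            raise TaskSubmissionError(index, status, entry.get("reason", "unknown error"))
    return responses
```

With this approach a caller who submitted several tasks only learns about the first failure; if users need the remaining entries, they would have to be exposed separately, for example by attaching the full list to the exception.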
Handling partial failures is easier in
Some design thoughts for this approach:
the returned format for
to:
We need to consider this case:
This behavior is consistent with the current model: once the client has retrieved a state, that state is no longer saved on our end. If the user wanted to know that …
We also need to do more thinking about standardizing the format of task status objects. The current schema for task info from the service is pretty nice:
though we should switch … Also, we need to think about whether it is a bad idea to mix these with error objects, for example when we are returning status info.
FuncX SDK Changes

Non-async mode for single task run (default)
async mode
There is currently some inconsistent behavior in how the /batch_status and /submit routes handle errors when many tasks are involved.

In the case of /batch_status, there is no real error handling. This route returns a list of statuses for the tasks requested via the task_ids parameter. If a task is not found, it is added to this list with 'status': 'Failed'; if the lookup succeeds, it is added with the desired data for that task. The route always responds with a success code, even if every task is marked as 'Failed'.

In the case of /submit, there is some error handling: the route responds with a 4xx/5xx if any of the submitted tasks fails during submission. This means that even if some tasks were submitted successfully during the request, a failure is sent back, with no additional information, as soon as any task fails.

This is particularly tricky for the /submit endpoint. When a user submits a single task, the ideal is readable errors. But since a single submission is internally just a batch submit, it must also stay consistent with what happens when many tasks are submitted and some fail while others succeed.

My proposal: if an internal error prevents the request from being processed at all, or if status/submit fails for all of the tasks, send back a 4xx/5xx with an error. If status/submit succeeds for at least one of the tasks, send back a 200 with a list of successes/failures for each individual task. For both status and submit, even if a task fails, proceed with the other tasks in the batch until all of them have been tried.
The advantage of this approach is readability for simple, single-task submissions. The disadvantage is that it could be confusing that a 4xx/5xx is sent back when all tasks fail, but a 200 is sent back when only some tasks fail.
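A minimal server-side sketch of the proposal above: every task in the batch is tried, each outcome is recorded, and the status code is chosen only at the end. The TaskOutcome and submit_batch names, the submit_one callback, and the choice of 400 for "all tasks failed" are assumptions for illustration, not the service's actual code.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class TaskOutcome:
    """Result of trying to submit one task from a batch (hypothetical type)."""

    index: int
    task_id: Optional[str] = None  # set when the task was accepted
    error: Optional[str] = None    # set when the task was rejected


def submit_batch(
    tasks: List[Any], submit_one: Callable[[Any], str]
) -> Tuple[int, List[TaskOutcome]]:
    """Try every task, then choose the HTTP status code for the whole request."""
    outcomes: List[TaskOutcome] = []
    for index, task in enumerate(tasks):
        try:
            outcomes.append(TaskOutcome(index, task_id=submit_one(task)))
        except Exception as exc:  # one bad task must not stop the rest of the batch
            outcomes.append(TaskOutcome(index, error=str(exc)))

    if outcomes and all(o.error is not None for o in outcomes):
        # Every task failed: report an error code (400 is an illustrative choice).
        return 400, outcomes
    # At least one task succeeded (or the batch was empty): 200 plus per-task details.
    return 200, outcomes
```

An internal error that prevents the request from being processed at all is not modeled here; in this sketch it would propagate out of submit_batch and be turned into a 5xx by the web framework.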
Proposed Changes
The /submit route response would change from

(response code usually 200 unless some or all task launches fail)

to

(This response code would be a 207 HTTP Multi-Status response, since there were some successes and some failures. If everything succeeded, it would be a 200. If an internal error occurred that made everything fail, it would be some 5xx.)
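The 200/207/5xx rule can be sketched as a small function over the per-task codes. The function name and the internal_error flag are hypothetical. The proposal does not say which code applies when every task fails individually; this sketch assumes 207 in that case, which is only one possible reading.

```python
from http import HTTPStatus
from typing import List


def overall_status(per_task_codes: List[int], internal_error: bool = False) -> int:
    """Pick the response code for a batch under the 200/207/5xx proposal."""
    if internal_error:
        # An internal error made everything fail: some 5xx (500 chosen here).
        return HTTPStatus.INTERNAL_SERVER_ERROR
    if all(code < 400 for code in per_task_codes):
        # Every task succeeded.
        return HTTPStatus.OK
    # At least one task failed; this sketch also lands here if all of them did.
    return HTTPStatus.MULTI_STATUS
```

HTTPStatus members are integers, so the result can be compared with or returned as a plain status code.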
When the funcX SDK receives such a batch response, it would store all the failed task submits in the local table, to be retrieved with get_result or get_batch_result. These failures would not be saved on the service side.

I think similar changes to the /batch_status route would be fitting, though they wouldn't need to be as drastic. Each status response object in the list would be an "http response object" of its own, like above.
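On the SDK side, the local table of failed submits could look roughly like the sketch below. The method names get_result and get_batch_result mirror the SDK methods mentioned in this issue, but the class, the fetch_remote callbacks, and the entry fields ("task_id", "status", "reason") are hypothetical and not the funcX SDK's real implementation. It also assumes that each failed entry in the submit response carries an identifier the client can later look up.

```python
from typing import Any, Callable, Dict, List


class LocalResultTable:
    """Hypothetical client-side table for tasks whose submission failed.

    Failed submits are kept only here, since the service does not save them.
    """

    def __init__(self) -> None:
        self._failed: Dict[str, Exception] = {}

    def record_submit_response(self, responses: List[Dict[str, Any]]) -> None:
        # Assumes every entry carries a "task_id" and a per-task "status" code.
        for entry in responses:
            if entry["status"] >= 400:
                reason = entry.get("reason", "task submission failed")
                self._failed[entry["task_id"]] = RuntimeError(reason)

    def get_result(self, task_id: str, fetch_remote: Callable[[str], Any]) -> Any:
        # Locally recorded failures are raised without contacting the service.
        if task_id in self._failed:
            raise self._failed[task_id]
        return fetch_remote(task_id)

    def get_batch_result(
        self,
        task_ids: List[str],
        fetch_remote_batch: Callable[[List[str]], Dict[str, Any]],
    ) -> Dict[str, Any]:
        # Local failures become per-task entries; the rest are fetched in one call.
        results: Dict[str, Any] = {
            tid: {"status": "failed", "exception": self._failed[tid]}
            for tid in task_ids
            if tid in self._failed
        }
        remaining = [tid for tid in task_ids if tid not in results]
        if remaining:
            results.update(fetch_remote_batch(remaining))
        return results
```

Keeping failed submits on the client means get_batch_result can return one merged mapping, so callers see local submit failures and remote per-task statuses in the same shape.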