New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Rollup] Add wait_for_completion
option to StopRollupJob API
#34811
[Rollup] Add wait_for_completion
option to StopRollupJob API
#34811
Conversation
This adds a flag which allows the user to block the Stop API until the task has actually moved to a stopped state, instead of returning immediately. If the flag is unset (default) the behavior is as before: returns immediately. If the flag is set to a TimeValue, it will block the call until the status changes to `STOPPED` (up to the specified time limit). If the time value is exceeded a timeout exception is thrown. Note: The job will still likely move to STOPPED eventually, the timeout just means the blocking call timed out before the state changed.
Pinging @elastic/es-search-aggs |
Quick note: if we find this too risky to put in 6.5, #34574 should largely negate the race condition that Kibana was worried about. If this PR doesn't merge the behavior will just be: user hits stop, spinny for a second, hits delete, gets an error about job still running, tries again ( Not wonderful, but not end of the world either. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @polyfractal . I left some comments.
@@ -22,6 +22,14 @@ Stopping an already stopped job has no action. | |||
`job_id` (required):: | |||
(string) Identifier for the job | |||
|
|||
==== Query Parameters | |||
|
|||
`wait_for_stopped` (optional):: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait_for_completion
for consistency with the other APIs that waits for the completion of the command ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it might be difficult for a user to set up a value for the timeout. So maybe we should make it a boolean
and have another parameter (timeout
) that could default to 30s
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was hoping to avoid two parameters, since configuring a timeout without setting wait_for_completion
to true won't do anything. But I can see how it's a strange setup. I could go either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two params solution is consistent with the GetTask API but I don't have a strong opinion here either.
@@ -52,12 +57,52 @@ protected void taskOperation(StopRollupJobAction.Request request, | |||
ActionListener<StopRollupJobAction.Response> listener) { | |||
if (jobTask.getConfig().getId().equals(request.getId())) { | |||
jobTask.stop(listener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not call the listener
if waitForStopped
is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urgh yes, that would have been bad...
@@ -52,12 +57,52 @@ protected void taskOperation(StopRollupJobAction.Request request, | |||
ActionListener<StopRollupJobAction.Response> listener) { | |||
if (jobTask.getConfig().getId().equals(request.getId())) { | |||
jobTask.stop(listener); | |||
waitForStopped(request, jobTask, listener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't block the current thread. I think we can use the generic
thread pool here in order to make sure that we don't block on important threads (network threads). Something like:
threadPool.generic().execute(new AbstractRunnable() {
@Override
protected void doRun() throws Exception {
if (waitForStopped(request, jobTask)) {
listener.onResponse(new Stop...);
} else {
listener.onFailure(new Stop...);
}
}
@Override
public void onFailure(Exception e) {
listener.onFailure(e);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just have the task run on the Generic thread from the start? E.g. in the ctor just specify generic
instead of same
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer if we explicitly fork to another thread with a comment explaining why we do that. But yes I think it would work if we use generic
directly. No strong opinion here either, just a preference ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
@jimczi pending CI passing again I think I addressed your comments
|
Right, trying this again. This time it has the generic threadpool stuff :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for iterating !
wait_for_stopped
option to StopRollupJob APIwait_for_completion
option to StopRollupJob API
Jenkins, run gradle build tests please |
With elastic#34811 the API for stopping rollup jobs got two new url parameters "wait_for_completion" and "timeout". This change adds these to the HLRC APIs as well. Relates to elastic#34811
This adds a `wait_for_completion` flag which allows the user to block the Stop API until the task has actually moved to a stopped state, instead of returning immediately. If the flag is set, a `timeout` parameter can be specified to determine how long (at max) to block the API call. If unspecified, the timeout is 30s. If the timeout is exceeded before the job moves to STOPPED, a timeout exception is thrown. Note: this is just signifying that the API call itself timed out. The job will remain in STOPPING and evenutally flip over to STOPPED in the background. If the user asks the API to block, we move over the the generic threadpool so that we don't hold up a networking thread.
This adds a flag which allows the user to block the Stop API until the task has actually moved to a stopped state, instead of returning immediately.
If the flag is unset (default) the behavior is as before: returns immediately and stopping happens async in the background. If the flag is set to a TimeValue, it will block the call until the status changes to
STOPPED
(up to the specified time limit). If the time value is exceeded a timeout exception is thrown.Question: The current code executes the Tasks operation on the
SAME
thread, which I'm assuming would be the networking thread. Since this opens the chance of blocking the thread temporarily, should we switch toMANAGEMENT
orGENERIC
or something?Also, this is technically an enhancement but critical to a Kibana blocker, so I'd lobby that it be allowed into 6.5
Followup to #34574
/cc @jen-huang @cjcenizal