Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rollup] Add wait_for_completion option to StopRollupJob API #34811

Merged
merged 11 commits into from Nov 13, 2018

Conversation

polyfractal
Copy link
Contributor

@polyfractal polyfractal commented Oct 24, 2018

This adds a flag which allows the user to block the Stop API until the task has actually moved to a stopped state, instead of returning immediately.

If the flag is unset (default) the behavior is as before: returns immediately and stopping happens async in the background. If the flag is set to a TimeValue, it will block the call until the status changes to STOPPED (up to the specified time limit). If the time value is exceeded a timeout exception is thrown.

Question: The current code executes the Tasks operation on the SAME thread, which I'm assuming would be the networking thread. Since this opens the chance of blocking the thread temporarily, should we switch to MANAGEMENT or GENERIC or something?

Also, this is technically an enhancement but critical to a Kibana blocker, so I'd lobby that it be allowed into 6.5

Followup to #34574

/cc @jen-huang @cjcenizal

This adds a flag which allows the user to block the Stop API until
the task has actually moved to a stopped state, instead of returning
immediately.

If the flag is unset (default) the behavior is as before: returns
immediately.  If the flag is set to a TimeValue, it will block the call
until the status changes to `STOPPED` (up to the specified time limit).
If the time value is exceeded a timeout exception is thrown.

Note: The job will still likely move to STOPPED eventually, the timeout
just means the blocking call timed out before the state changed.
@polyfractal polyfractal added >enhancement v7.0.0 :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v6.5.0 labels Oct 24, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@polyfractal
Copy link
Contributor Author

Quick note: if we find this too risky to put in 6.5, #34574 should largely negate the race condition that Kibana was worried about. If this PR doesn't merge the behavior will just be:

user hits stop, spinny for a second, hits delete, gets an error about job still running, tries again (n times) until it finally deletes.

Not wonderful, but not end of the world either.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @polyfractal . I left some comments.

@@ -22,6 +22,14 @@ Stopping an already stopped job has no action.
`job_id` (required)::
(string) Identifier for the job

==== Query Parameters

`wait_for_stopped` (optional)::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_for_completion for consistency with the other APIs that waits for the completion of the command ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it might be difficult for a user to set up a value for the timeout. So maybe we should make it a boolean and have another parameter (timeout) that could default to 30s ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was hoping to avoid two parameters, since configuring a timeout without setting wait_for_completion to true won't do anything. But I can see how it's a strange setup. I could go either way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two params solution is consistent with the GetTask API but I don't have a strong opinion here either.

@@ -52,12 +57,52 @@ protected void taskOperation(StopRollupJobAction.Request request,
ActionListener<StopRollupJobAction.Response> listener) {
if (jobTask.getConfig().getId().equals(request.getId())) {
jobTask.stop(listener);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not call the listener if waitForStopped is set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urgh yes, that would have been bad...

@@ -52,12 +57,52 @@ protected void taskOperation(StopRollupJobAction.Request request,
ActionListener<StopRollupJobAction.Response> listener) {
if (jobTask.getConfig().getId().equals(request.getId())) {
jobTask.stop(listener);
waitForStopped(request, jobTask, listener);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't block the current thread. I think we can use the generic thread pool here in order to make sure that we don't block on important threads (network threads). Something like:

 threadPool.generic().execute(new AbstractRunnable() {
                   @Override
                    protected void doRun() throws Exception {
                          if (waitForStopped(request, jobTask)) {
                            listener.onResponse(new Stop...);
                          } else {
                            listener.onFailure(new Stop...);
                         }
                    }

                    @Override
                    public void onFailure(Exception e) {
                        listener.onFailure(e);
                    }
   

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just have the task run on the Generic thread from the start? E.g. in the ctor just specify generic instead of same?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer if we explicitly fork to another thread with a comment explaining why we do that. But yes I think it would work if we use generic directly. No strong opinion here either, just a preference ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@colings86 colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018
@polyfractal
Copy link
Contributor Author

@jimczi pending CI passing again I think I addressed your comments

  • swapped to wait_for_completion, added a timeout. If timeout is omitted the default is 30s like the other task APIs. If wait_for_completion: false then the timeout doesn't do anything
  • Spawns to a generic threadpool on demand, as discussed.
  • Refactored so that the listener is wrapped correctly instead of being called twice.

@polyfractal
Copy link
Contributor Author

Right, trying this again. This time it has the generic threadpool stuff :)

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for iterating !

@polyfractal polyfractal changed the title [Rollup] Add wait_for_stopped option to StopRollupJob API [Rollup] Add wait_for_completion option to StopRollupJob API Nov 13, 2018
@polyfractal
Copy link
Contributor Author

Jenkins, run gradle build tests please

@polyfractal polyfractal merged commit c346a0f into elastic:master Nov 13, 2018
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Nov 14, 2018
With elastic#34811 the API for stopping rollup jobs got two new url parameters
"wait_for_completion" and "timeout". This change adds these to the HLRC APIs as
well.

Relates to elastic#34811
polyfractal added a commit that referenced this pull request Nov 14, 2018
This adds a `wait_for_completion` flag which allows the user to block
the Stop API until the task has actually moved to a stopped state,
instead of returning immediately.  If the flag is set, a `timeout` parameter
can be specified to determine how long (at max) to block the API
call.  If unspecified, the timeout is 30s.

If the timeout is exceeded before the job moves to STOPPED, a
timeout exception is thrown.  Note: this is just signifying that the API
call itself timed out.  The job will remain in STOPPING and evenutally
flip over to STOPPED in the background.

If the user asks the API to block, we move over the the generic
threadpool so that we don't hold up a networking thread.
cbuescher pushed a commit that referenced this pull request Nov 15, 2018
With #34811 the API for stopping rollup jobs got two new url parameters
"wait_for_completion" and "timeout". This change adds these to the HLRC APIs as
well.

Relates to #34811
cbuescher pushed a commit that referenced this pull request Nov 15, 2018
With #34811 the API for stopping rollup jobs got two new url parameters
"wait_for_completion" and "timeout". This change adds these to the HLRC APIs as
well.

Relates to #34811
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants