Makes search action cancelable by task management API #20405

Merged
imotov merged 1 commit into elastic:master from imotov:make-searches-cancellable on Oct 26, 2016

Conversation

Projects
None yet
4 participants
@imotov
Member

commented Sep 9, 2016

Long-running searches can now be cancelled using the standard task cancellation mechanism.

@nik9000

Contributor

commented Sep 9, 2016

@imotov, want me to review it?

@imotov

Member Author

commented Sep 9, 2016

@nik9000 sure - you from the Task Management perspective, and ideally we will also need somebody closer to Lucene.

public SearchTask(long id, String type, String action, String description, TaskId parentTaskId) {
super(id, type, action, description, parentTaskId);
}

@nik9000

nik9000 Sep 9, 2016

Contributor

It'd be cool to be able to list the phase and/or which shards you are waiting for. You could put all kinds of cool stuff in here one day! But for now this seems like the right thing to do.

@imotov

imotov Sep 9, 2016

Author Member

Yes, that's the plan, but I didn't want to clump everything into a single PR.

@@ -227,10 +229,11 @@ protected void doClose() {
keepAliveReaper.cancel();
}

- public DfsSearchResult executeDfsPhase(ShardSearchRequest request) throws IOException {
+ public DfsSearchResult executeDfsPhase(ShardSearchRequest request, SearchTask task) throws IOException {
final SearchContext context = createAndPutContext(request);

@nik9000

nik9000 Sep 9, 2016

Contributor

Maybe createContext should take the SearchTask as an argument? That'd make it very difficult to forget to set it.

@jpountz

jpountz Sep 20, 2016

Contributor

+1

@imotov

imotov Oct 13, 2016

Author Member

See @nik9000's comment below; this doesn't seem to be a good way to handle it, since the task can change over time.

final SearchContext context = findContext(request.id());
SearchOperationListener operationListener = context.indexShard().getSearchOperationListener();
context.incRef();
try {
context.setTask(task);

@nik9000

nik9000 Sep 9, 2016

Contributor

Oooh! Yeah! That makes sense - scroll contexts will outlast the task that spawns them and need to get a new task.

I haven't read the whole PR yet, but I wonder if we should clear the task from the context between requests? If we don't, then we keep a reference to it after the task manager is done with it, and that seems a bit weird. Changes to it wouldn't be useful to anyone.

@nik9000


core/src/main/java/org/elasticsearch/search/SearchService.java Outdated
@@ -633,6 +643,7 @@ private void cleanContext(SearchContext context) {
try {
assert context == SearchContext.current();
context.clearReleasables(Lifetime.PHASE);
context.setTask(null);

@nik9000

nik9000 Sep 9, 2016

Contributor

And here is the clear. Cool.
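
A minimal sketch of the set-then-clear pattern discussed in this thread, using simplified stand-ins rather than the real classes. `SketchTask`, `SketchContext`, and `PhaseRunner` are hypothetical names, and clearing in a `finally` block is an illustration of the idea, not necessarily where the PR performs the clear.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a cancellable search task. */
final class SketchTask {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    void cancel() { cancelled.set(true); }

    boolean isCancelled() { return cancelled.get(); }
}

/** Hypothetical stand-in for a long-lived (for example scroll) search context. */
final class SketchContext {
    // Not final: the context can outlive the task that created it
    // and is handed a new task for each request.
    private SketchTask task;

    void setTask(SketchTask task) { this.task = task; }

    SketchTask getTask() { return task; }

    boolean isCancelled() { return task != null && task.isCancelled(); }
}

final class PhaseRunner {
    /** Runs one phase with the task attached and always detaches it afterwards. */
    static void runPhase(SketchContext context, SketchTask task, Runnable phase) {
        context.setTask(task);
        try {
            phase.run();
        } finally {
            // Drop the reference so the context does not hold a finished task.
            context.setTask(null);
        }
    }
}
```

With this shape the context never keeps a reference to a task between requests, even when the phase throws.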

@nik9000


core/src/main/java/org/elasticsearch/search/dfs/DfsPhase.java Outdated
@@ -58,6 +59,9 @@ public void execute(SearchContext context) {
TermStatistics[] termStatistics = new TermStatistics[terms.length];
IndexReaderContext indexReaderContext = context.searcher().getTopReaderContext();
for (int i = 0; i < terms.length; i++) {
if(context.isCancelled()) {
throw new SearchContextException(context, "Cancelled");

@nik9000

nik9000 Sep 9, 2016

Contributor

This one is upper case and the one below is lower case.

@nik9000


core/src/main/java/org/elasticsearch/search/internal/SearchContext.java Outdated
@@ -114,6 +116,18 @@ public ParseFieldMatcher parseFieldMatcher() {
return parseFieldMatcher;
}

public void setTask(SearchTask task) {

@nik9000

nik9000 Sep 9, 2016

Contributor

It'd be nice to have javadoc for these, especially given that task changes for scrolls.

@nik9000


core/src/main/java/org/elasticsearch/search/internal/SearchContext.java Outdated
@@ -102,6 +103,7 @@ public static SearchContext current() {
private Map<Lifetime, List<Releasable>> clearables = null;
private final AtomicBoolean closed = new AtomicBoolean(false);
private InnerHitsContext innerHitsContext;
private SearchTask task;

@nik9000

nik9000 Sep 9, 2016

Contributor

It is probably worth some javadoc explaining why this isn't final and why it is ok that it isn't volatile.

@nik9000


core/src/main/java/org/elasticsearch/search/query/CancellableCollector.java Outdated

@Override
public void collect(int doc) throws IOException {
if (searchContext.isCancelled()) {

@nik9000

nik9000 Sep 9, 2016

Contributor

It is worth benchmarking this because it adds a volatile read. If it is too heavy we can make the check less frequent I imagine.
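
One way the "less frequent check" idea could look, as a sketch only. `ThrottledCancellationCheck` and its `interval` parameter are hypothetical and not part of this PR; the point is just that the (possibly volatile) flag is read once every `interval` documents instead of on every `collect` call.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: consults the cancellation flag only once every {@code interval} calls. */
final class ThrottledCancellationCheck {
    private final BooleanSupplier cancelled;
    private final int interval;
    private int sinceLastCheck = 0;

    ThrottledCancellationCheck(BooleanSupplier cancelled, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.cancelled = cancelled;
        this.interval = interval;
    }

    /** Call once per collected document; throws if a check observes cancellation. */
    void onCollect() {
        if (++sinceLastCheck >= interval) {
            sinceLastCheck = 0;
            if (cancelled.getAsBoolean()) {
                throw new IllegalStateException("cancelled");
            }
        }
    }
}
```

The cost is responsiveness: up to `interval - 1` further documents may be collected before a cancellation is noticed, so any such interval would need to be chosen by benchmarking.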

@nik9000


core/src/main/java/org/elasticsearch/search/query/QueryPhase.java Outdated
@@ -361,6 +361,15 @@ public TopDocs call() throws Exception {
}
}

if (collector != null && searchContext.getTask() != null) {
final Collector child = collector;

@nik9000

nik9000 Sep 9, 2016

Contributor

A null task here is pretty weird, right?

@pickypg pickypg added the das awesome label Sep 9, 2016

@nik9000

Contributor

commented Sep 12, 2016

@jpountz, would you like to have a look at this from a Lucene perspective?

@jpountz
Contributor

left a comment

In general it looks good to me. However, I am concerned that wrapping the collector can cause significant slowdowns, so I'd like to either remove it or make it opt-in for now and see what other options we have. For instance, I'm wondering whether we could make the slowdown more acceptable by checking whether tasks are cancelled at the bulk scorer level, so that we can perform the check less often.

core/src/main/java/org/elasticsearch/transport/TransportService.java Outdated
handler.handleException(new TransportException(ex));
return;
}
sendRequest(node, action, request, options, handler);

@jpountz

jpountz Sep 20, 2016

Contributor

nit: I'd prefer that we move the sendRequest call to the try block and remove the return from the catch block
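
A sketch of the control-flow shape this nit asks for, under assumptions: the hunk does not show what the `try` block guards, so `registerChild` below is a hypothetical precondition, and `SendChildRequestSketch` with its event list is only a stand-in for the real transport code.

```java
import java.util.ArrayList;
import java.util.List;

final class SendChildRequestSketch {
    final List<String> events = new ArrayList<>();

    // Hypothetical precondition that may fail, e.g. because the parent task is already cancelled.
    void registerChild(boolean parentCancelled) {
        if (parentCancelled) {
            throw new IllegalStateException("parent task cancelled");
        }
    }

    void sendRequest() {
        events.add("sent");
    }

    void handleException(Exception e) {
        events.add("failed: " + e.getMessage());
    }

    /** The send lives inside the try block, so the catch block needs no early return. */
    void sendChildRequest(boolean parentCancelled) {
        try {
            registerChild(parentCancelled);
            sendRequest();
        } catch (IllegalStateException ex) {
            handleException(ex);
        }
    }
}
```

One thing to keep in mind with this shape: an exception of the caught type thrown by the send itself would now also be routed to the handler, which may or may not be desired.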

core/src/main/java/org/elasticsearch/search/internal/SearchContext.java Outdated
}

public boolean isCancelled() {
return task != null && task.isCancelled();

@jpountz

jpountz Sep 20, 2016

Contributor

I don't think task should ever be null at that point? Should we just assert that task is not null?

core/src/main/java/org/elasticsearch/search/query/QueryPhase.java Outdated
@@ -361,6 +361,15 @@ public TopDocs call() throws Exception {
}
}

if (collector != null) {
final Collector child = collector;
collector = new CancellableCollector(searchContext, collector);

@jpountz

jpountz Sep 20, 2016

Contributor

This part can cause significant slowdowns. There may be ways to make it better, e.g. by doing it at the bulk scorer level so that we can perform the check less often. Can we remove it for now and work on making this part cancellable in another PR? Or at least make it opt-in for now with a setting?

@imotov

imotov Oct 13, 2016

Author Member

@jpountz what do you think about this 662e1f5e3a2e9db89511749fe104e922edbb86ad?

@jpountz

jpountz Oct 13, 2016

Contributor

I'm not a fan of the name (but I don't have better ideas) but this looks good to me.
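
A Lucene-free sketch of the trade-off in this thread: always check for cancellation when moving to a new segment, and check per document only when a flag opts in. `SegmentScan`, the `int[]`-per-segment model, and returning a count instead of throwing are all simplifications assumed for illustration; the contents of the linked commit are not reproduced here.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

final class SegmentScan {
    /**
     * Visits every document of every segment. Cancellation is always checked at
     * segment boundaries; with {@code lowLevel == true} it is also checked before
     * every document. Returns how many documents were visited before the scan
     * finished or observed cancellation.
     */
    static int scan(List<int[]> segments, BooleanSupplier cancelled, boolean lowLevel) {
        int visited = 0;
        for (int[] segment : segments) {
            if (cancelled.getAsBoolean()) {
                return visited; // cheap: one check per segment
            }
            for (int doc : segment) {
                if (lowLevel && cancelled.getAsBoolean()) {
                    return visited; // responsive, but one flag read per document
                }
                visited++;
            }
        }
        return visited;
    }
}
```

With the per-document check disabled, a cancellation that arrives in the middle of a large segment is only noticed once that segment has been fully processed.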

@imotov imotov force-pushed the imotov:make-searches-cancellable branch Oct 13, 2016

@jpountz
Contributor

left a comment

I left a few more comments about the search part of the PR.

core/src/main/java/org/elasticsearch/search/query/CancellableCollector.java Outdated
private final SearchContext searchContext;
private final boolean leafLevel;

public CancellableCollector(SearchContext searchContext, Collector in) {

@jpountz

jpountz Oct 13, 2016

Contributor

Can we avoid taking a SearchContext, as it makes this class hard to unit test? Maybe we could, e.g., replace the search context with a BooleanSupplier and use a method reference. Then can you add some basic unit tests for this class?
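
A minimal sketch of why a `BooleanSupplier` makes this kind of class easy to test. `CancellationGuard` is a hypothetical, Lucene-free stand-in for the collector wrapper, and `CancellationGuardExample` plays the role of a unit test; in the real code the supplier would presumably be a method reference to the context's cancellation check.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a collector wrapper that depends only on a BooleanSupplier. */
final class CancellationGuard {
    private final BooleanSupplier cancelled;

    CancellationGuard(BooleanSupplier cancelled) {
        this.cancelled = cancelled;
    }

    void ensureNotCancelled() {
        if (cancelled.getAsBoolean()) {
            throw new IllegalStateException("cancelled");
        }
    }
}

final class CancellationGuardExample {
    /** Returns true if the guard starts throwing only after the flag flips. */
    static boolean guardFollowsFlag() {
        AtomicBoolean flag = new AtomicBoolean(false);
        // A method reference is enough; no search context has to be constructed.
        CancellationGuard guard = new CancellationGuard(flag::get);
        guard.ensureNotCancelled(); // flag is false: must not throw
        flag.set(true);
        try {
            guard.ensureNotCancelled();
            return false; // should have thrown
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```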

core/src/main/java/org/elasticsearch/search/profile/query/CollectorResult.java Outdated
@@ -45,6 +45,7 @@
public static final String REASON_SEARCH_MIN_SCORE = "search_min_score";
public static final String REASON_SEARCH_MULTI = "search_multi";
public static final String REASON_SEARCH_TIMEOUT = "search_timeout";
public static final String REASON_SEARCH_CANCELED = "search_canceled";

@jpountz

jpountz Oct 13, 2016

Contributor

Should we be consistent with the number of `l`s? There is only one here, while you put two in "cancellation". (Feel free to ignore, it could easily be a special case in English that I don't know about!)

core/src/main/java/org/elasticsearch/search/internal/SearchContext.java Outdated
@@ -220,6 +238,11 @@ public InnerHitsContext innerHits() {

public abstract void terminateAfter(int terminateAfter);

// Indicates if the current index should perform frequent low level search cancellation check

@jpountz

jpountz Oct 13, 2016

Contributor

Can you make it an actual javadoc and explain the trade-off?

@imotov imotov force-pushed the imotov:make-searches-cancellable branch Oct 17, 2016

@imotov

Member Author

commented Oct 17, 2016

@jpountz I have implemented the changes that you requested and rebased against the current master, since quite a few things have changed. Would you mind taking another look when you have a chance?

@jpountz
Contributor

left a comment

I left a question.

core/src/main/java/org/elasticsearch/search/query/CancellableCollector.java Outdated
private final boolean leafLevel;

/**
* Constractor

@jpountz

jpountz Oct 18, 2016

Contributor

construction contractor? :)

@@ -546,6 +567,7 @@ final SearchContext createContext(ShardSearchRequest request, @Nullable Engine.S
keepAlive = request.scroll().keepAlive().millis();
}
context.keepAlive(keepAlive);
context.lowLevelCancellation(lowLevelCancellation);

@jpountz

jpountz Oct 18, 2016

Contributor

why do we only do it in the fetch phase?

@jpountz

jpountz Oct 18, 2016

Contributor

sorry I just realized this is the createContext method and not executeFetchPhase as I initially thought.

then I'm wondering if we could have the lowLevelCancellation only on DefaultSearchContext rather than the SearchContext base class?

@imotov

imotov Oct 19, 2016

Author Member

done

@jpountz

Contributor

commented Oct 20, 2016

The search part looks good to me.

@nik9000
Contributor

left a comment

I left 19 review comments, mostly just comments and a few nits. Please make whatever changes you think are appropriate and merge when ready.

This is a huge start for canceling searches, but I think it is worth expanding on the PR's description so folks know exactly what they are getting. Even with its limitations I think this should be a release highlight for whatever release gets it. 5.1 I think....

/**
* Transport request handlers that is using task context
*/
public abstract class TaskAwareTransportRequestHandler<T extends TransportRequest> implements TransportRequestHandler<T> {

@nik9000

nik9000 Oct 20, 2016

Contributor

I guess you could make this a final class that takes a functional interface as its only parameter and runs it and you could still use lambdas for it. My instincts are that that is marginally better from a design standpoint (composition vs inheritance) and a little easier to read but I'm not sure.

@imotov

imotov Oct 20, 2016

Author Member

This sounds like an interesting idea, but I think we should then do it on the TransportRequestHandler, and it should probably be another PR.

@nik9000

nik9000 Oct 21, 2016

Contributor

++
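
For reference, the composition idea floated in this thread could be sketched roughly as follows. All names here are hypothetical, the handler contract is heavily simplified (no channel, `Object` as the task type), and the sketch assumes the purpose of the abstract base class is to reject the task-less entry point.

```java
/** Hypothetical simplified handler contract with a task-less and a task-aware entry point. */
interface SketchRequestHandler<T> {
    void messageReceived(T request);

    default void messageReceived(T request, Object task) {
        messageReceived(request);
    }
}

/** Functional interface for logic that needs the task; can be written as a lambda. */
@FunctionalInterface
interface TaskAwareLogic<T> {
    void handle(T request, Object task);
}

/** Composition: a final adapter that wraps the logic instead of an abstract class to extend. */
final class TaskAwareHandler<T> implements SketchRequestHandler<T> {
    private final TaskAwareLogic<T> logic;

    TaskAwareHandler(TaskAwareLogic<T> logic) {
        this.logic = logic;
    }

    @Override
    public void messageReceived(T request) {
        // The task-less entry point is not supported by task-aware handlers.
        throw new UnsupportedOperationException("the task parameter is required");
    }

    @Override
    public void messageReceived(T request, Object task) {
        logic.handle(request, task);
    }
}
```

A handler is then registered as `new TaskAwareHandler<>((request, task) -> ...)` rather than as an anonymous subclass.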

core/src/main/java/org/elasticsearch/search/SearchService.java Outdated
@@ -107,6 +109,8 @@
Setting.positiveTimeSetting("search.default_keep_alive", timeValueMinutes(5), Property.NodeScope);
public static final Setting<TimeValue> KEEPALIVE_INTERVAL_SETTING =
Setting.positiveTimeSetting("search.keep_alive_interval", timeValueMinutes(1), Property.NodeScope);
public static final Setting<Boolean> LOW_LEVEL_CANCELLATION_SETTING =

@nik9000

nik9000 Oct 20, 2016

Contributor

Can haz javadoc?

core/src/main/java/org/elasticsearch/search/query/CancellableCollector.java Outdated

/**
* Constructor
* @param cancelled supplier of the cancellation flag

@nik9000

nik9000 Oct 20, 2016

Contributor

Maybe mention that this amounts to a volatile read.

@imotov

imotov Oct 20, 2016

Author Member

It's a supplier implementation detail, isn't it? This seems to be the wrong place to say something like that. However, information about how often it will be called can be useful here. I will add that.

@nik9000

nik9000 Oct 21, 2016

Contributor

Something like "This class assumes that the supplier is fast, with performance on the order of a volatile read." would give a lot of context to the decisions around how to use the Supplier.

core/src/test/java/org/elasticsearch/search/SearchCancellationIT.java Outdated
import static org.hamcrest.Matchers.greaterThan;

/**
*/

@nik9000

nik9000 Oct 20, 2016

Contributor

Can you remove the empty javadoc? We are going to fail the build on those at some point....

core/src/test/java/org/elasticsearch/search/SearchCancellationIT.java Outdated

/**
*/
@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.SUITE)

@nik9000

nik9000 Oct 20, 2016

Contributor

Can you explain why this is needed in a comment? I'm never sure when to use this and when not to so I figure it'd be nice to have it explained.

@imotov

imotov Oct 20, 2016

Author Member

Because I want nodes with specific settings and I don't want other tests to run with these settings.

@nik9000

nik9000 Oct 21, 2016

Contributor

Then could you do something like:

@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.SUITE) // Changes settings

That'd make it clear why it is needed so folks don't go copying it into their tests without knowing why... 👼

core/src/test/java/org/elasticsearch/search/SearchCancellationTests.java Outdated
for(int i=0; i<1000; i++) {
leafCollector.collect(0);
}
logger.info("took: {}", System.nanoTime() - start);

@nik9000

nik9000 Oct 20, 2016

Contributor

If we keep this maybe we should rename it so it doesn't run by default? And javadoc it? It is fairly obvious what it is for but I expect it isn't worth doing it during a regular test run because no one is reading it....

@imotov

imotov Oct 21, 2016

Author Member

Leftovers, deleted.

docs/reference/search.asciidoc Outdated
[[global-search-cancellation]]
== Search Cancellation

Long running searches can be cancelled using standard <<task-cancellation,task cancellation>>

@nik9000

nik9000 Oct 20, 2016

Contributor

s/Long running searches/Searches/ ?

docs/reference/search.asciidoc Outdated
mechanism. By default, a running search only checks if it is cancelled or
not on segment boundaries, therefore the cancellation can be delayed by large
segments. The search cancellation responsiveness can be improved by setting
the dynamic cluster-level setting `search.low_level_cancellation` to `true`.

@nik9000

nik9000 Oct 20, 2016

Contributor

Mention that it only affects searches started after the setting is changed.

@nik9000

nik9000 Oct 20, 2016

Contributor

Are we thinking of automatically opting certain things into this setting? Maybe we can have search benchmarks that compare?

@imotov

imotov Oct 21, 2016

Author Member

No, we are thinking about making it generally faster in the next iteration, so we can opt everything in.
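
Why a change to the dynamic setting would only apply to searches started afterwards can be sketched like this, assuming (as the `createContext` hunk earlier in this review suggests) that the flag is copied into the context when the context is created. `FlagSnapshotService` and `FlagSnapshotContext` are hypothetical names, not the real classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical holder for a dynamically updatable node-level flag. */
final class FlagSnapshotService {
    private final AtomicBoolean lowLevelCancellation = new AtomicBoolean(false);

    /** Called when the dynamic setting is updated. */
    void setLowLevelCancellation(boolean value) {
        lowLevelCancellation.set(value);
    }

    /** The current value is copied into the context once, at creation time. */
    FlagSnapshotContext createContext() {
        return new FlagSnapshotContext(lowLevelCancellation.get());
    }
}

/** Hypothetical search context that keeps the value it was created with. */
final class FlagSnapshotContext {
    private final boolean lowLevelCancellation;

    FlagSnapshotContext(boolean lowLevelCancellation) {
        this.lowLevelCancellation = lowLevelCancellation;
    }

    boolean lowLevelCancellation() {
        return lowLevelCancellation;
    }
}
```

A context created before the update keeps reporting the old value, which is the behaviour the documentation note would need to spell out.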

@@ -150,7 +157,8 @@ This will yield the following result:
// TESTRESPONSE[s/"build_scorer": 42602/"build_scorer": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.build_scorer/]
// TESTRESPONSE[s/"create_weight": 89323/"create_weight": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.create_weight/]
// TESTRESPONSE[s/"next_doc": 2852/"next_doc": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.next_doc/]
- // TESTRESPONSE[s/"time": "0.06989100000ms"/"time": $body.profile.shards.0.searches.0.collector.0.time/]
+ // TESTRESPONSE[s/"time": "0.3043110000ms"/"time": $body.profile.shards.0.searches.0.collector.0.time/]
// TESTRESPONSE[s/"time": "0.03227300000ms"/"time": $body.profile.shards.0.searches.0.collector.0.children.0.time/]

@nik9000

nik9000 Oct 20, 2016

Contributor

I'm sorry about this nightmare.....

@nik9000

nik9000 Oct 20, 2016

Contributor

I'd love some way to say "just match anything in this position that looks like X" so you don't have to describe the whole path....

docs/reference/search/profile.asciidoc Outdated
- We see a single collector named `SimpleTopScoreDocCollector`. This is the default "scoring and sorting" Collector
- used by Elasticsearch. The `"reason"` field attempts to give a plain english description of the class name. The
+ We see a single collector named `SimpleTopScoreDocCollector` wrapped into `CancellableCollector`. `SimpleTopScoreDocCollector` is the default "scoring and sorting"
+ Collector used by Elasticsearch. The `"reason"` field attempts to give a plain english description of the class name. The

@nik9000

nik9000 Oct 20, 2016

Contributor

Collector?

@nik9000

nik9000 Oct 21, 2016

Contributor

Like, with the backticks.

Makes search action cancelable by task management API
Long running searches now can be cancelled using standard task cancellation mechanism.

@imotov imotov force-pushed the imotov:make-searches-cancellable branch to 17ad88d Oct 26, 2016

@imotov imotov merged commit 17ad88d into elastic:master Oct 26, 2016

1 of 2 checks passed

elasticsearch-ci Build started sha1 is merged.
Details
CLA Commit author is a member of Elasticsearch
Details