Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add delete-by-query plugin #11516

Merged
merged 1 commit into from Jun 17, 2015

Conversation

Projects
None yet
7 participants
@tlrx
Copy link
Member

commented Jun 5, 2015

This pull request adds a new plugin called "delete-by-query" which implements the now deprecated delete-by-query feature using scan/scroll/bulk requests.

Notes:

  • size parameter controls the scroll shard_size and the number of actions in bulk requests (defaults to 1000)
  • timeout parameter can be used to stop scrolling documents after a given time
  • response now looks like this (here a node is killed during the DBQ execution):
{  
   "took":60866,
   "timed_out":false,
   "_indices":{  
      "_all":{  
         "found":531046,
         "deleted":79901,
         "missing":0,
         "failed":301702
      },
      "disposants-2014":{  
         "found":375702,
         "deleted":74000,
         "missing":0,
         "failed":301702
      },
      "beer":{  
         "found":5901,
         "deleted":5901,
         "missing":0,
         "failed":0
      }
   },
   "failures":[  
      {  
         "shard":-1,
         "index":null,
         "reason":{  
            "type":"node_not_connected_exception",
            "reason":"[Puck][inet[/192.168.1.16:9300]] Node not connected"
         }
      }
   ]
}

Since the process involves the execution of a scan request (which can fail), then successive async scroll requests (which can also fail) we may imagine a better failure reporting. When a scroll request succeed, the scrolled documents are added to a Bulk request executed in an async manner. If the bulk fails, all documents are reported as failed documents in the counter.

Rest API documentation and test will be added later.

}
scanRequest.source(source);

logger.debug("executing scan request");

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 6, 2015

Member

this probably need to be trace, action package has DEBUG enabled by default

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 6, 2015

Member

applies to other logging statements here

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 9, 2015

Author Member

Done

}

void executeScroll(final String scrollId) {
threadPool.generic().execute(new AbstractRunnable() {

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 6, 2015

Member

I don't think we need to have it execute on another thread pool, its perfectly fine just to go and execute it right away, it doesn't block on anything

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 9, 2015

Author Member

Ok

}

// Delete the scrolled documents using the Bulk API
threadPool.generic().execute(new AbstractRunnable() {

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 6, 2015

Member

I don't think we need to execute it on a thread pool, nothing blocking, we can do it without one and delegate to the async bulk execution

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 9, 2015

Author Member

Right

@kimchy

This comment has been minimized.

Copy link
Member

commented Jun 6, 2015

I left some minor comments around logging and usage of thread pool (not needed I think).

I could't follow why we need a semaphore and such, I think I a missing something. My thought was that we do search -> bulk -> search -> .... until there are no more results, so always async callback execution type chain until we are done.

@tlrx tlrx force-pushed the tlrx:delete-by-query branch from bd5d1b9 Jun 9, 2015

@tlrx

This comment has been minimized.

Copy link
Member Author

commented Jun 9, 2015

@kimchy thanks for your review! Your comments make sense, no need to use semaphore stuff... I rebased and updated the code, it is way simpler now.

I'll add some rest tests too.

@kimchy

View changes

...query/src/main/java/org/elasticsearch/action/deletebyquery/TransportDeleteByQueryAction.java Outdated
final String nextScrollId = scrollResponse.getScrollId();
addShardFailures(scrollResponse.getShardFailures());

if (logger.isDebugEnabled()) {

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 9, 2015

Member

this should be trace?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 9, 2015

Author Member

raaaah yes

@kimchy

View changes

...query/src/main/java/org/elasticsearch/action/deletebyquery/TransportDeleteByQueryAction.java Outdated
}

void onBulkResponse(int bulkId, String scrollId, BulkResponse bulkResponse) {
if (logger.isDebugEnabled()) {

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 9, 2015

Member

trace? I don't think we need the check, btw (isDebug)

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 9, 2015

Member

this applies to the rest of the logging statements here, so won't comment on those as well

@s1monw

View changes

plugins/delete-by-query/pom.xml Outdated

<properties>
<!-- You can add any specific project property here -->
<tests.jvms>1</tests.jvms>

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 9, 2015

Contributor

why do we need 1 JVM here only?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 9, 2015

Author Member

Wrong copy paste, we don't need to limit to 1 JVM.

@Override
public void readFrom(StreamInput in) throws IOException {
super.readFrom(in);
index = in.readString();

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 9, 2015

Member

we need to serialize the counters here as well

@tlrx

This comment has been minimized.

Copy link
Member Author

commented Jun 10, 2015

@s1monw thanks for your review. I updated the code following your comment and added a REST test. Can you please have another look if possible? Thanks :)

Documentation will be added in another PR.

@s1monw

View changes

...lete-by-query/src/main/java/org/elasticsearch/action/deletebyquery/DeleteByQueryRequest.java Outdated
out.writeBoolean(false);
} else {
out.writeBoolean(true);
out.writeVLong(timeout);

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

if you write a VLong make sure it's not negative!

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Ok, timeout has been changed to TimeValue

@s1monw

View changes

...lete-by-query/src/main/java/org/elasticsearch/action/deletebyquery/DeleteByQueryRequest.java Outdated

private Scroll scroll = new Scroll(TimeValue.timeValueMinutes(10));

private Long timeout;

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

IMO timeout should be a TimeValue

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Ok, timeout has been changed to TimeValue

@s1monw

View changes

...lete-by-query/src/main/java/org/elasticsearch/action/deletebyquery/DeleteByQueryRequest.java Outdated

private String routing;

private Integer size = 1_000;

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

can this be int rather than Integer?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Sure.

I also think that a default size of 1000 docs per shard is not a good idea... Let's use the default value of 10.

}

public DeleteByQueryRequest indicesOptions(IndicesOptions indicesOptions) {
if (indicesOptions == null) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

maybe validate size and timeout too here?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Ok

out.writeStringArray(types);
out.writeBytesReference(source);
out.writeOptionalString(routing);
out.writeVInt(size);

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

watach out size must be positive here though

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Ok. size is initialized to 0 and can't be set to a negative value.

For my education, writeVInt does not support negative values at all or it's just a question of better serialization?

@s1monw

View changes

...lete-by-query/src/main/java/org/elasticsearch/action/deletebyquery/DeleteByQueryRequest.java Outdated
out.writeBytesReference(source);
out.writeOptionalString(routing);
out.writeVInt(size);
if (scroll == null) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

you can use writeOptionalStreamable here?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Ok

@s1monw

View changes

...lete-by-query/src/main/java/org/elasticsearch/action/deletebyquery/DeleteByQueryRequest.java Outdated
} catch (Exception e) {
// ignore
}
return "[" + Arrays.toString(indices) + "][" + Arrays.toString(types) + "], source[" + sSource + "]";

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

maybe prepend that this is a delete by query?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Ok

@s1monw

View changes

...y-query/src/main/java/org/elasticsearch/action/deletebyquery/IndexDeleteByQueryResponse.java Outdated
}

IndexDeleteByQueryResponse(String index) {
super();

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

super crot is not needed

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Right

@s1monw

View changes

...y-query/src/main/java/org/elasticsearch/action/deletebyquery/IndexDeleteByQueryResponse.java Outdated
}

public IndexDeleteByQueryResponse(String index, long found, long deleted, long missing, long failed) {
this.index = index;

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

call this(index) here?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Yes

@s1monw

View changes

...y-query/src/main/java/org/elasticsearch/action/deletebyquery/IndexDeleteByQueryResponse.java Outdated
}

public void incrementFound(long delta) {
this.found = found + delta;

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

can assert here everywhere that they never become negative

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Sure. I also added some unit tests for that.

@s1monw

View changes

...query/src/main/java/org/elasticsearch/action/deletebyquery/TransportDeleteByQueryAction.java Outdated
AsyncDeleteByQueryAction(DeleteByQueryRequest request, ActionListener<DeleteByQueryResponse> listener) {
this.request = request;
this.listener = listener;
this.startTime = System.currentTimeMillis();

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

maybe we wanna use ThreadPool.estimatedTimeInMillis() here?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

I didn't know about this class but for what I saw it captures the system current time at regular intervals, trading off a less time precision for better performance, right?

If so, I think we can use it here since millisecond precision for this potentially long process is not really necessary.

@s1monw

View changes

...query/src/main/java/org/elasticsearch/action/deletebyquery/TransportDeleteByQueryAction.java Outdated
return;
}

if (isTimedOut()) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

can we call it hasTimedOut?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Yes

@s1monw

View changes

...query/src/main/java/org/elasticsearch/action/deletebyquery/TransportDeleteByQueryAction.java Outdated
}

private boolean isTimedOut() {
return request.timeout() != null && (System.currentTimeMillis() >= (startTime + request.timeout().millis()));

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

also use the Threadpool estimations here mabye?

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Yes

@s1monw

View changes

...elete-by-query/src/main/java/org/elasticsearch/plugin/deletebyquery/DeleteByQueryPlugin.java Outdated
public Collection<Module> modules(Settings settings) {
Collection<Module> modules = new ArrayList<>();
modules.add(new DeleteByQueryModule());
return modules;

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

just call return Arrays.asList(new DeleteByQueryModule())

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

Ok

listener.onFailure(e);
}
});
} catch (Throwable t) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

I think you should have such a catch clause in every method in here since might easily hang the entire requires if there is a simple assertion tripped elsewhere

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

I agree, good catch, I should have thought of that. I reuse the finishHim method when possible in order to clean the scrolIId and release search context and report failures.

@s1monw

View changes

...query/src/main/java/org/elasticsearch/action/deletebyquery/TransportDeleteByQueryAction.java Outdated
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class TransportDeleteByQueryAction extends HandledTransportAction<DeleteByQueryRequest, DeleteByQueryResponse> {

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

If possible I'd love to have a single node test that really calls all these functions directly to ensure we tests all the corner cases, makse sense?

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

can you document this class and state what the capabilities of this plugin are? I think we should make clear to the user that this is best effort, non-transactional, memory intensive etc. etc. We should maybe also think about having a shortcut if there is a match all query passed to this. I think folks to this all the time so I wonder if we should have such a destructive operation as feature on this as well behind the scenes. Doesn't need to be added now but in general we can think about adding this to IndexShard / Engine since in lucene that is quite fast.

This comment has been minimized.

Copy link
@tlrx

tlrx Jun 16, 2015

Author Member

I created the TransportDeleteByQueryActionTests to test the methods, let me know what you think about it.

We should maybe also think about having a shortcut if there is a match all query passed to this. I think folks to this all the time so I wonder if we should have such a destructive operation as feature on this as well behind the scenes. Doesn't need to be added now but in general we can think about adding this to IndexShard / Engine since in lucene that is quite fast.

I created an issue for that: #11700

import org.elasticsearch.action.deletebyquery.TransportDeleteByQueryAction;
import org.elasticsearch.rest.action.deletebyquery.RestDeleteByQueryAction;
import org.elasticsearch.rest.RestModule;

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2015

Contributor

can we have a doc-string here explaining what this thing does and why. Ie. I'd also like to have some docs in here that explain why it's not in core anymore etc. if you need help I am happy to explain

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2015

looks pretty good though. I left a bunch of comments

@tlrx

This comment has been minimized.

Copy link
Member Author

commented Jun 16, 2015

@s1monw thanks a lot for your review, very valuable. I updated the code following your comments, please let me know if there are still things to improve.

I'd love to have your help on writing documentation for this plugin, since I'm not sure to be able to explain all fallacies of the previous implementation.

@tlrx tlrx force-pushed the tlrx:delete-by-query branch Jun 16, 2015

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jun 16, 2015

I'd love to have your help on writing documentation for this plugin, since I'm not sure to be able to explain all fallacies of the previous implementation.

lets get this in as is and open an issue for the documentation I will take a look at comment on it what aspects I would take into account?

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jun 16, 2015

oh yeah so here is my LGTM ;)

Add delete-by-query plugin
The delete by query plugin adds support for deleting all of the documents (from one or more indices) which match the specified query. It is a replacement for the problematic delete-by-query functionality which has been removed from Elasticsearch core in 2.0. Internally, it uses the Scan/Scroll and Bulk APIs to delete documents in an efficient and safe manner. It is slower than the old delete-by-query functionality, but fixes the problems with the previous implementation.

Closes #7052

@tlrx tlrx force-pushed the tlrx:delete-by-query branch to ba35406 Jun 17, 2015

@tlrx tlrx merged commit ba35406 into elastic:master Jun 17, 2015

1 check passed

CLA Commit author is a member of Elasticsearch
Details

@kevinkluge kevinkluge removed the review label Jun 17, 2015

@tlrx

This comment has been minimized.

Copy link
Member Author

commented Jun 17, 2015

@s1monw thanks!

I created #11723 for the java doc aspect.

@dadoonet

This comment has been minimized.

@tlrx I think we should write here addValidationError("source is missing", null); (actually IntelliJ is throwing a warning about it when compiling)

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jun 23, 2015

Add documentation for delete by query plugin (see elastic#11516)
This page is placed in a /plugins directory until we figure where to place all plugins documentation.

@tlrx tlrx deleted the tlrx:delete-by-query branch May 19, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.