Switch default batch size for reindex to 1000 #18340
@tlrx want to review this? @clintongormley we discussed this earlier today.
LGTM
LGTM too :)
5f65a4e to f569576
Thanks for reviewing, @clintongormley and @tlrx!
@nik9000 Would you backport this to 2.x as well please?
Sure. I'll head back to the 2.x branch later today if possible.
thanks - btw i've already doc'ed the batch size as 1000 in 2.x (see #18484)
@nik9000 scratch my previous comment - i've reverted that commit as I see you include the docs in this PR
2.x: 62bf6c8
thanks @nik9000
Thanks @nik9000, but out of necessity, is there a way to configure the batch size, even if it isn't through the Java API? What I mean is that even something like a server-side configuration would do.
You can't change the default but you can set it on each request. Off hand I
@nik9000 thanks for the quick reply. I'm aware of the ability to change the batch size via the HTTP REST API (I believe it is called "scroll_size" there), however I don't see a way to set the same parameter via the Java API. All I see through the Java API is a "size", which is actually the hard cap on the total number of objects to process, not the batch size. That's why I was hoping that there might be a way to configure this value on the server itself, via the yaml properties file. Because I don't think as a Java consumer of ES I have any other option (unless of course I write my own Java HTTP client to interface w/ ES via its HTTP API).
In the Java API it is the size parameter on the search request.
@nik9000 Are you sure? I tested this earlier today by setting size to 1000. What I got was an update which updated exactly 1000 items, and did so in 19 batches (1000 updates + 835 conflicts = 1835 total; 1835 / 19 batches ≈ 100 per batch, which is the default for ES 2.3). For your reference this is how I'm building it:
UpdateByQueryRequestBuilder requestBuilder = UpdateByQueryAction.INSTANCE.newRequestBuilder(CLIENT)
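For anyone following the arithmetic above, here is a minimal, self-contained sketch of the batch-count calculation. It does not call Elasticsearch at all; the class and method names (`ScrollBatchMath`, `batchesNeeded`) are made up for illustration, and it assumes every scroll batch is full except possibly the last one.

```java
// Hypothetical helper, not part of any Elasticsearch API: it only models
// how many scroll batches are needed for a given number of documents.
final class ScrollBatchMath {

    // Ceiling division: batches needed to read totalDocs documents when
    // each batch returns at most batchSize documents.
    static long batchesNeeded(long totalDocs, int batchSize) {
        if (totalDocs < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and batchSize > 0");
        }
        return (totalDocs + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // Figures reported in the comment above: 1000 updates + 835 conflicts.
        long touched = 1000 + 835;

        // With the old default batch size of 100.
        System.out.println(batchesNeeded(touched, 100));

        // With the new default batch size of 1000.
        System.out.println(batchesNeeded(touched, 1000));
    }
}
```

With a batch size of 100 this gives the 19 batches observed in the test, which is consistent with the batch size not having changed; with 1000 the same 1835 documents would fit in 2 batches.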
This ought to work:
Thank you very much @nik9000, that (almost) did the trick! The solution for anyone else who is interested ended up being:
Just FYI, the Javadocs are VERY misleading about this, i.e.:
Yeah, part of the reason for that is that reindex is reusing SearchRequestBuilder to give you control over the search request it uses to start the scroll. So everything under
1000 is likely to perform much better.