Switch default batch size for reindex to 1000 #18340

Merged
merged 1 commit into from May 16, 2016

Conversation

nik9000
Member

@nik9000 nik9000 commented May 13, 2016

A batch size of 1000 is likely to perform much better than the current default of 100.

@nik9000
Member Author

nik9000 commented May 13, 2016

@tlrx want to review this?

@clintongormley we discussed this earlier today.

@clintongormley

LGTM

@tlrx
Member

tlrx commented May 14, 2016

LGTM too :)

@tlrx tlrx removed the review label May 14, 2016
@nik9000 nik9000 force-pushed the reindex_bigger_batch_size branch from 5f65a4e to f569576 Compare May 16, 2016 12:25
@nik9000 nik9000 merged commit f569576 into elastic:master May 16, 2016
@nik9000
Member Author

nik9000 commented May 16, 2016

Thanks for reviewing @clintongormley and @tlrx !

@clintongormley

@nik9000 Would you backport this to 2.x as well please?

@nik9000
Member Author

nik9000 commented May 20, 2016

Sure. I'll head back to the 2.x branch later today if possible.

@clintongormley

thanks - btw i've already doc'ed the batch size as 1000 in 2.x (see #18484)

@clintongormley

@nik9000 scratch my previous comment - i've reverted that commit as I see you include the docs in this PR

@nik9000
Member Author

nik9000 commented May 23, 2016

2.x: 62bf6c8

@clintongormley

thanks @nik9000

@persiarash

Thanks @nik9000, but out of necessity, is there a way to configure the batch size, even if it isn't through the Java API? What I mean is that even something like a server-side configuration would do.

@nik9000
Member Author

nik9000 commented May 25, 2016

You can't change the default, but you can set it on each request. Offhand, I
believe it is called batch_size in update_by_query, and it is the "size"
field in the "source" object of a reindex.
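
To make the per-request settings concrete, here is a minimal sketch of the two REST shapes involved. It assumes the reindex body carries the batch size as "size" inside "source" (as described above) and that update_by_query takes it as the `scroll_size` URL parameter (the name given later in this thread). The class, method, and index names are hypothetical helpers for illustration, not part of Elasticsearch, and the index names are not escaped.

```java
/** Illustrative helper (hypothetical, not part of Elasticsearch) that builds the per-request batch-size settings. */
final class BatchSizeRequests {

    /** JSON body for POST /_reindex: the batch size is the "size" field inside "source". */
    static String reindexBody(String sourceIndex, String destIndex, int batchSize) {
        // Assumes plain index names that need no JSON escaping.
        return "{\"source\": {\"index\": \"" + sourceIndex + "\", \"size\": " + batchSize + "}, "
            + "\"dest\": {\"index\": \"" + destIndex + "\"}}";
    }

    /** Path and query string for POST /{index}/_update_by_query using the scroll_size URL parameter. */
    static String updateByQueryPath(String index, int batchSize) {
        return "/" + index + "/_update_by_query?scroll_size=" + batchSize;
    }

    public static void main(String[] args) {
        // "old_index" and "new_index" are made-up index names.
        System.out.println("POST /_reindex " + reindexBody("old_index", "new_index", 500));
        System.out.println("POST " + updateByQueryPath("old_index", 500));
    }
}
```

Either string can be sent with any HTTP client; neither changes the server-wide default.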

@persiarash

@nik9000 thanks for the quick reply.

I'm aware that the batch size can be changed via the HTTP REST API (I believe it is called "scroll_size" there), but I don't see a way to set the same parameter via the Java API. All I see in the Java API is "size", which is actually the hard cap on the total number of documents to process, not the batch size.

That's why I was hoping there might be a way to configure this value on the server itself, via the YAML properties file: as a Java consumer of ES I don't think I have any other option (unless, of course, I write my own Java HTTP client to talk to ES via its HTTP API).

@nik9000
Member Author

nik9000 commented May 25, 2016

In the Java API it is the size parameter on the search request.

@persiarash

persiarash commented May 25, 2016

@nik9000 Are you sure? I tested this earlier today by setting size to 1000. The result was an update that changed exactly 1000 items and did so in 19 batches (1000 updates + 835 conflicts = 1835 documents; 1835 / 19 batches ≈ 97 per batch, i.e. a batch size of 100, which is the default for ES 2.3).

For your reference this is how I'm building it:

UpdateByQueryRequestBuilder requestBuilder = UpdateByQueryAction.INSTANCE.newRequestBuilder(CLIENT)
    .source(toStringArray(indexIds))
    .filter(queryBuilder)
    .abortOnVersionConflict(false)
    .consistency(WriteConsistencyLevel.DEFAULT)
    .size(1000) // This does NOT work, it sets maximum results count not batch size.
    .script(updateScript);
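
The batch arithmetic in the comment above can be checked with a short sketch. It assumes every document read by the scroll is counted as either an update or a version conflict, so 1835 documents were read in total; the class and method names are made up for illustration.

```java
/** Checks the batch arithmetic from the comment above (illustrative, not Elasticsearch code). */
final class BatchArithmetic {

    /** Number of scroll batches needed to read totalDocs documents, batchSize at a time (ceiling division). */
    static int batchesNeeded(int totalDocs, int batchSize) {
        if (totalDocs < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and batchSize must be > 0");
        }
        return (totalDocs + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int docsRead = 1000 + 835; // updates + version conflicts reported in the thread
        System.out.println(batchesNeeded(docsRead, 100));  // 19, matching the observed batch count
        System.out.println(batchesNeeded(docsRead, 1000)); // 2, what a batch size of 1000 would give
    }
}
```

Seeing 19 batches is therefore consistent with the scroll still using the old default of 100 rather than the requested 1000.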

@nik9000
Member Author

nik9000 commented May 26, 2016

This ought to work:

UpdateByQueryRequestBuilder requestBuilder = UpdateByQueryAction.INSTANCE.newRequestBuilder(CLIENT)
    .source(toStringArray(indexIds))
    .filter(queryBuilder)
    .abortOnVersionConflict(false)
    .consistency(WriteConsistencyLevel.DEFAULT)
    .script(updateScript);
requestBuilder.source().size(1000);

@persiarash

Thank you very much @nik9000, that (almost) did the trick! The solution for anyone else who is interested ended up being:
requestBuilder.source().setSize(1000);

Just FYI the Javadocs are VERY misleading about this, i.e.:
.setSize() => "The number of search hits to return. Defaults to 10."
Of course in this scenario it is actually the batch size, and it defaults to 100 (or 1000 in 2.4+)

@nik9000
Member Author

nik9000 commented May 27, 2016

> Javadocs are VERY misleading

Yeah, part of the reason for that is that reindex reuses SearchRequestBuilder to give you control over the search request it uses to start the scroll. So everything under source() should be read from a scrolling perspective, but the SearchRequestBuilder Javadocs weren't written with scrolling in mind.
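
The naming collision described in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions: the classes below are hypothetical stand-ins, not the real Elasticsearch builders, and the defaults (10 for a plain search, 100 for the wrapped scroll) are simply the values quoted in the comments above.

```java
/** Toy model (hypothetical classes, not the Elasticsearch API) of a request that wraps a search request. */
final class SizeVsBatchSize {

    /** Stand-in for the reused search request: "size" normally means "hits to return". */
    static final class ToySearchRequest {
        private int size = 10; // plain-search default quoted from the Javadoc in the thread

        ToySearchRequest setSize(int size) { this.size = size; return this; }
        int size() { return size; }
    }

    /** Stand-in for update-by-query: it scrolls using the wrapped search request. */
    static final class ToyUpdateByQueryRequest {
        // The wrapper replaces the search default with its own batch default (100 per the thread).
        private final ToySearchRequest source = new ToySearchRequest().setSize(100);
        private int maxDocs = -1; // -1 means no cap on documents processed

        ToySearchRequest source() { return source; }

        /** Outer size(): a cap on the total number of documents, not the batch size. */
        ToyUpdateByQueryRequest size(int maxDocs) { this.maxDocs = maxDocs; return this; }

        int maxDocs() { return maxDocs; }

        /** The scroll batch size is whatever "size" the wrapped search request carries. */
        int scrollBatchSize() { return source.size(); }
    }

    public static void main(String[] args) {
        ToyUpdateByQueryRequest request = new ToyUpdateByQueryRequest();
        request.size(1000);             // caps the total; the batch size is unchanged
        System.out.println(request.scrollBatchSize());
        request.source().setSize(1000); // changes the batch size used by the scroll
        System.out.println(request.scrollBatchSize());
    }
}
```

In this model the same word, size, means two different things depending on which object it is set on, which is the confusion the Javadoc complaint points at.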

@lcawl lcawl added :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Reindex API labels Feb 13, 2018
Labels
:Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement v2.4.0 v5.0.0-alpha3

5 participants