
Default reindex to return first 50 bulk failures #20461

Closed
wants to merge 1 commit into from


nik9000 (Member) commented Sep 13, 2016

Reindex, update_by_query, and delete_by_query work in batches based
on the scroll size of their source query, defaulting to 1000 documents
at a time. If all 1000 of those documents fail, the response contains an
avalanche of errors. That is just fine over the Java API and probably
fine for programmatic consumers of the REST API as well, but for
interactive users the size of the response can be devastating.

So this commit adds a URL parameter, `max_reported_bulk_failures`,
that controls the number of indexing failures returned in the response
and defaults to 50. The option also exists in the Transport Client for
completeness' sake but defaults to `Integer.MAX_VALUE` there, because
transport client users are unlikely to suffer from a large response.
Even with that default, no more than `batch_size` errors are ever
returned, because a single error causes reindex to abort after it
finishes processing the current batch.
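The interaction between the cap and the abort-after-batch behavior can be sketched in Java. This is a minimal, hypothetical model, not the actual reindex code: `BulkFailureCapSketch`, `Indexer`, `run`, and `Result` are invented for illustration, and each document is treated as an independent index call rather than part of a real bulk request.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Elasticsearch code): process documents in
 * batches, stop after the first batch that has any failure, and report
 * at most maxReportedFailures of the failures collected.
 */
public class BulkFailureCapSketch {

    /** Failures returned to the caller, total failures seen, batches run. */
    record Result(List<String> reportedFailures, int totalFailures, int batchesProcessed) {}

    /** Hypothetical indexer: returns null on success or an error message. */
    interface Indexer {
        String index(int docId);
    }

    static Result run(int totalDocs, int batchSize, int maxReportedFailures, Indexer indexer) {
        List<String> reported = new ArrayList<>();
        int totalFailures = 0;
        int batches = 0;
        for (int start = 0; start < totalDocs; start += batchSize) {
            int end = Math.min(start + batchSize, totalDocs);
            int failuresInBatch = 0;
            for (int doc = start; doc < end; doc++) {
                String error = indexer.index(doc);
                if (error != null) {
                    failuresInBatch++;
                    totalFailures++;
                    // Only the first maxReportedFailures errors reach the response.
                    if (reported.size() < maxReportedFailures) {
                        reported.add(error);
                    }
                }
            }
            batches++;
            // Finish the current batch, then abort if anything in it failed.
            if (failuresInBatch > 0) {
                break;
            }
        }
        return new Result(List.copyOf(reported), totalFailures, batches);
    }

    public static void main(String[] args) {
        // Worst case from the description: every document fails.
        Indexer alwaysFails = doc -> "failed to index doc " + doc;
        Result rest = run(5000, 1000, 50, alwaysFails);
        System.out.println(rest.reportedFailures().size() + " of " + rest.totalFailures());
        Result transport = run(5000, 1000, Integer.MAX_VALUE, alwaysFails);
        System.out.println(transport.reportedFailures().size() + " of " + transport.totalFailures());
    }
}
```

With a cap of 50 the caller sees 50 of the 1000 failures from the first batch. With `Integer.MAX_VALUE` it sees all 1000, which is still bounded by the batch size because the run stops after that batch.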

Closes #20199

@clintongormley

I don't think we need an override setting here. Reindex aborts when it gets error messages, and there is no indication of how many more exceptions would be thrown if reindex had just continued. I can't imagine a use case where a user will programmatically process each error case.

nik9000 (Member Author) commented Sep 15, 2016

> I can't imagine a use case where a user will programmatically process each error case.

I'm thinking mostly of shops where it takes a long time to push new code. If it takes two weeks to push out a new reindex, you really want all the error messages you can get. I suppose they could get them from the log, though.

nik9000 closed this Feb 3, 2017
lcawl added :Distributed/CRUD (A catch all label for issues around indexing, updating and getting a doc by id. Not search.) and removed :Reindex API labels Feb 13, 2018
Labels
:Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search.

3 participants