Default reindex to return first 50 bulk failures #20461

nik9000 · 2016-09-13T21:46:14Z

Reindex, update_by_query, and delete_by_query work in batches based
on the scroll size of their source query, defaulting to 1000 documents
at a time. If all 1000 of those documents fail, this returns an
avalanche of errors, which is just fine over the Java API and probably
fine for any programmatic consumers of the REST API as well. But for
interactive users the size of the response can be devastating.

So this commit creates a URL parameter to control the number of
indexing failures returned in the response,
max_reported_bulk_failures, which defaults to 50. The option
also exists in the Transport Client for completeness' sake but
defaults to Integer.MAX_VALUE because transport client users are
unlikely to suffer if the response is large. Even though the
default is Integer.MAX_VALUE, no more than batch_size errors
are ever returned because a single error causes reindex to abort
after processing the current batch.
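
The capping behavior described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this commit: the function name `reported_failures`, the constants, and the string-valued failures are all hypothetical, and the real change lives in the Java reindex module.

```python
from typing import List, Sequence

# Hypothetical constants mirroring the defaults described in the commit message.
REST_DEFAULT_MAX_REPORTED = 50   # assumed default for the REST parameter
UNLIMITED = 2**31 - 1            # stand-in for Java's Integer.MAX_VALUE


def reported_failures(
    batch_failures: Sequence[str],
    max_reported: int = REST_DEFAULT_MAX_REPORTED,
) -> List[str]:
    """Return at most max_reported failures from a single batch.

    Because the operation aborts after the first failing batch, the input
    never holds more than one batch's worth of failures, so even an
    "unlimited" cap returns at most batch_size entries.
    """
    if max_reported < 0:
        raise ValueError("max_reported must be non-negative")
    # Keep the first failures in order; the rest are simply not reported.
    return list(batch_failures[:max_reported])
```

With a fully failing batch of 1000 documents, the default cap would report only the first 50 failures, while the unlimited cap would report all 1000 and no more.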

Closes #20199

clintongormley · 2016-09-15T14:17:39Z

I don't think we need an override setting here. Reindex aborts when it gets error messages, and there is no indication of how many more exceptions would be thrown if reindex had just continued. I can't imagine a use case where a user will programmatically process each error case.

nik9000 · 2016-09-15T14:21:50Z

> I can't imagine a use case where a user will programmatically process each error case.

I'm thinking mostly of shops where it takes a long time to push new code. If it takes two weeks to push a new reindex you really want all the error messages you can get. I suppose they could get them from the log though.

nik9000 added >enhancement review v6.0.0-alpha1 v5.1.1 labels Sep 13, 2016

Mpdreamz mentioned this pull request Sep 15, 2016

reindex can produce very large failures in API response #20199

Open

clintongormley added v5.2.0 and removed >enhancement v5.1.1 labels Dec 7, 2016

clintongormley added v5.3.0 and removed v5.2.0 labels Jan 24, 2017

nik9000 closed this Feb 3, 2017

clintongormley removed v5.3.0 v6.0.0-alpha1 labels Feb 5, 2017

lcawl added :Distributed/CRUD and removed :Reindex API labels Feb 13, 2018
