New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete api: remove broadcast delete if routing is missing when required #10136
Delete api: remove broadcast delete if routing is missing when required #10136
Conversation
897bd76
to
8649abb
Compare
+1 LGTM nice stats!! can you update the migration guide to ensure we mention it? |
good point @s1monw I just updated the migration guide. |
This commit changes the behaviour of the delete api when processing a delete request that refers to a type that has routing set to required in the mapping, and the routing is missing in the request. Up until now the delete api sent a broadcast delete request to all of the shards that belong to the index, making sure that the document could be found although the routing value wasn't specified. This was probably not the best choice: if the routing is set to required, an error should be thrown instead. A `RoutingMissingException` gets now thrown instead, like it happens in the same situation with every other api (index, update, get etc.). Last but not least, this change allows to get rid of a couple of `TransportAction`s, `Request`s and `Response`s and simplify the codebase. Closes elastic#9123 Closes elastic#10136
49019fa
to
4348959
Compare
As part of elastic#10136 we removed the transport action for broadcast deletes in case routing is required but not specified. Bulk api worked differently though and kept on doing the broadcast delete internally in that case. This commit makes sure that delete items are marked as failed in such cases. Also the check has been moved up in the code together with the existing check for the update api, and we now make sure that the exception is the same as the one thrown for single document apis (delete/update). Note that the failure for the update api contained the wrong optype (the type of the document rather than "update"), that's been fixed too and tested. Closes elastic#16645
As part of elastic#10136 we removed the transport action for broadcast deletes in case routing is required but not specified. Bulk api worked differently though and kept on doing the broadcast delete internally in that case. This commit makes sure that delete items are marked as failed in such cases. Also the check has been moved up in the code together with the existing check for the update api, and we now make sure that the exception is the same as the one thrown for single document apis (delete/update). Note that the failure for the update api contained the wrong optype (the type of the document rather than "update"), that's been fixed too and tested. Closes elastic#16645
As part of elastic#10136 we removed the transport action for broadcast deletes in case routing is required but not specified. Bulk api worked differently though and kept on doing the broadcast delete internally in that case. This commit makes sure that delete items are marked as failed in such cases. Also the check has been moved up in the code together with the existing check for the update api, and we now make sure that the exception is the same as the one thrown for single document apis (delete/update). Note that the failure for the update api contained the wrong optype (the type of the document rather than "update"), that's been fixed too and tested. Closes elastic#16645
The docs state:
This doesn't seem to be the case anymore, unfortunately :-( :-( :-( |
this is a leftover @mrkamel sorry about that. Care to send a PR? |
is it somehow possible to do a broadcast? i use the mysql binlog to to have ES in sync - and at the time of the delete there is no way to know the routing key :-( |
@mrkamel use delete-by-query instead |
thx, ... unfortunately, delete-by-query doesn't seem to work for this case, as the query only sees records after index refresh, right? So, this has consistency issues for records that are deleted right after being created. |
Hi @mrkamel I see what you mean. May I ask how come that you know the routing key when you index documents but not when deleting? sorry if I previously missed that. I also wonder if at this point it wouldn't be better for you to use ordinary routing rather than custom routing keys. |
@javanna i) i'm using mysql with binlog_row_image=minimal. Thus, my binlog-to-es-replicator has the PK only ... no chance to get the routing key for delete operations. I'll probably switch to binlog_row_image=full, but this will have huge performance implications - i need to check ii) I can't use ordinary routing, because for a statistical/reporting app, i'm heavily using aggregations - and to get correct counts per bucket, i have to use custom routing. Having an option to at least explicitly state, that you want the request to be broadcasted would be awsome - or - to explicitly state to which shard you want the request to be sent, such that you can broadcast "manually" would be okish. |
@javanna @clintongormley wondering if a bit more context can be supplied around this and if you guys will consider rethinking the necessity of this change. I got hit hard by this change today, we don't use custom routing but we have parent/child mappings which in ES 2 automatically sets Deletes however aren't as straight forward. Even the Delete API documentation mentions that routing values aren't often known at the time of a delete:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html#delete-routing The breaking changes documentation only mentions 'custom routing' in regards to the delete API - https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_crud_and_routing_changes.html#_delete_api_with_custom_routing. I didn't take this to mean deleting child docs would also be affected. When using custom routing typically that routing value would be static and known at all times, but with parent/child configurations, it now requires extra leg work to look up associated parent ids. I can understand from a purely correctness point of view that the routing should be enforced but from a practical perspective it makes the APIs (including bulk) much harder to consume. |
…e is specified This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed Relates to #10136
…e is specified This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed Relates to #10136
…e is specified This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed Relates to #10136
…e is specified This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed Relates to #10136
…e is specified This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed Relates to #10136
…e is specified This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed Relates to #10136
Hi @rayward I amended the documentation, sorry about the leftover. I think that broadcasting the delete request to all shards was a wrong tradeoff. Trying the request on all shards is a scary thing, also made worse by the problems caused in the bulk api as described in #16645 (comment). Ideally one would always provide the routing value when deleting a document. It is certainly a shame if this requirement causes problems to our users. We recommend using delete by query when the routing value is not known. |
i'd like to mention that delete by query doesn't seem to be a perfect fit, because it seems much slower and i guess you need to refresh the index before the call to be sure to not miss docs. |
You are right @mrkamel delete by query executes a search. The perfect fit is providing the routing values when they were used at index time. Or even consider not using custom routing as a whole given that it complicates things in some cases. |
Thanks for amending the docs @javanna, I guess I can understand sending those cluster wide requests could be problematic, especially on indexes with many shards. |
This commit changes the behaviour of the delete api when processing a delete request that refers to a type that has routing set to required in the mapping, and the routing is missing in the request. Up until now the delete api sent a broadcast delete request to all of the shards that belong to the index, making sure that the document could be found although the routing value wasn't specified. This was probably not the best choice: if the routing is set to required, an error should be thrown instead.
A
RoutingMissingException
gets now thrown instead, like it happens in the same situation with every other api (index, update, get etc.). Last but not least, this change allows to get rid of a couple ofTransportAction
s,Request
s andResponse
s and simplify the codebase.Closes #9123