Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete api: remove broadcast delete if routing is missing when required #10136

Merged
merged 1 commit into from Mar 20, 2015

Conversation

javanna
Copy link
Member

@javanna javanna commented Mar 18, 2015

This commit changes the behaviour of the delete api when processing a delete request that refers to a type that has routing set to required in the mapping, and the routing is missing in the request. Up until now the delete api sent a broadcast delete request to all of the shards that belong to the index, making sure that the document could be found although the routing value wasn't specified. This was probably not the best choice: if the routing is set to required, an error should be thrown instead.

A RoutingMissingException gets now thrown instead, like it happens in the same situation with every other api (index, update, get etc.). Last but not least, this change allows to get rid of a couple of TransportActions, Requests and Responses and simplify the codebase.

Closes #9123

@javanna javanna force-pushed the enhancement/remove_broadcast_delete branch from 897bd76 to 8649abb Compare March 18, 2015 13:26
@javanna javanna added the :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. label Mar 18, 2015
@s1monw
Copy link
Contributor

s1monw commented Mar 18, 2015

+1 LGTM nice stats!! can you update the migration guide to ensure we mention it?

@javanna
Copy link
Member Author

javanna commented Mar 18, 2015

good point @s1monw I just updated the migration guide.

This commit changes the behaviour of the delete api when processing a delete request that refers to a type that has routing set to required in the mapping, and the routing is missing in the request. Up until now the delete api sent a broadcast delete request to all of the shards that belong to the index, making sure that the document could be found although the routing value wasn't specified. This was probably not the best choice: if the routing is set to required, an error should be thrown instead.

A `RoutingMissingException` gets now thrown instead, like it happens in the same situation with every other api (index, update, get etc.). Last but not least, this change allows to get rid of a couple of `TransportAction`s, `Request`s and `Response`s and simplify the codebase.

Closes elastic#9123
Closes elastic#10136
@javanna javanna force-pushed the enhancement/remove_broadcast_delete branch from 49019fa to 4348959 Compare March 20, 2015 08:34
@javanna javanna removed the review label Mar 20, 2015
@javanna javanna merged commit 4348959 into elastic:master Mar 20, 2015
javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 27, 2016
As part of elastic#10136 we removed the transport action for broadcast deletes in case routing is required but not specified. Bulk api worked differently though and kept on doing the broadcast delete internally in that case. This commit makes sure that delete items are marked as failed in such cases. Also the check has been moved up in the code together with the existing check for the update api, and we now make sure that the exception is the same as the one thrown for single document apis (delete/update).

Note that the failure for the update api contained the wrong optype (the type of the document rather than "update"), that's been fixed too and tested.

Closes elastic#16645
javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 29, 2016
As part of elastic#10136 we removed the transport action for broadcast deletes in case routing is required but not specified. Bulk api worked differently though and kept on doing the broadcast delete internally in that case. This commit makes sure that delete items are marked as failed in such cases. Also the check has been moved up in the code together with the existing check for the update api, and we now make sure that the exception is the same as the one thrown for single document apis (delete/update).

Note that the failure for the update api contained the wrong optype (the type of the document rather than "update"), that's been fixed too and tested.

Closes elastic#16645
javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 29, 2016
As part of elastic#10136 we removed the transport action for broadcast deletes in case routing is required but not specified. Bulk api worked differently though and kept on doing the broadcast delete internally in that case. This commit makes sure that delete items are marked as failed in such cases. Also the check has been moved up in the code together with the existing check for the update api, and we now make sure that the exception is the same as the one thrown for single document apis (delete/update).

Note that the failure for the update api contained the wrong optype (the type of the document rather than "update"), that's been fixed too and tested.

Closes elastic#16645
@mrkamel
Copy link

mrkamel commented Jul 14, 2016

The docs state:

Many times, the routing value is not known when deleting a document. For those cases, when specifying the _routing mapping as required, and no routing value is specified, the delete will be broadcast automatically to all shards.

This doesn't seem to be the case anymore, unfortunately :-( :-( :-(

@javanna
Copy link
Member Author

javanna commented Jul 14, 2016

this is a leftover @mrkamel sorry about that. Care to send a PR?

@mrkamel
Copy link

mrkamel commented Jul 14, 2016

is it somehow possible to do a broadcast? i use the mysql binlog to to have ES in sync - and at the time of the delete there is no way to know the routing key :-(

@clintongormley
Copy link

@mrkamel use delete-by-query instead

@mrkamel
Copy link

mrkamel commented Jul 15, 2016

thx, ... unfortunately, delete-by-query doesn't seem to work for this case, as the query only sees records after index refresh, right? So, this has consistency issues for records that are deleted right after being created.

@javanna
Copy link
Member Author

javanna commented Jul 15, 2016

Hi @mrkamel I see what you mean. May I ask how come that you know the routing key when you index documents but not when deleting? sorry if I previously missed that. I also wonder if at this point it wouldn't be better for you to use ordinary routing rather than custom routing keys.

@mrkamel
Copy link

mrkamel commented Jul 15, 2016

@javanna i) i'm using mysql with binlog_row_image=minimal. Thus, my binlog-to-es-replicator has the PK only ... no chance to get the routing key for delete operations. I'll probably switch to binlog_row_image=full, but this will have huge performance implications - i need to check ii) I can't use ordinary routing, because for a statistical/reporting app, i'm heavily using aggregations - and to get correct counts per bucket, i have to use custom routing.

Having an option to at least explicitly state, that you want the request to be broadcasted would be awsome - or - to explicitly state to which shard you want the request to be sent, such that you can broadcast "manually" would be okish.

@rayward
Copy link

rayward commented Aug 8, 2016

@javanna @clintongormley wondering if a bit more context can be supplied around this and if you guys will consider rethinking the necessity of this change.

I got hit hard by this change today, we don't use custom routing but we have parent/child mappings which in ES 2 automatically sets {"_routing": {"required": true}} on the child type. This is fine for indexing as it enforces the requirement for child docs existing on the same shard as the parent doc. It's also easy to fulfil this requirement as we have that data readily available.

Deletes however aren't as straight forward. Even the Delete API documentation mentions that routing values aren't often known at the time of a delete:

Many times, the routing value is not known when deleting a document. For those cases, when specifying the _routing mapping as required, and no routing value is specified, the delete will be broadcast automatically to all shards.

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html#delete-routing

The breaking changes documentation only mentions 'custom routing' in regards to the delete API - https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_crud_and_routing_changes.html#_delete_api_with_custom_routing. I didn't take this to mean deleting child docs would also be affected.

When using custom routing typically that routing value would be static and known at all times, but with parent/child configurations, it now requires extra leg work to look up associated parent ids.

I can understand from a purely correctness point of view that the routing should be enforced but from a practical perspective it makes the APIs (including bulk) much harder to consume.

javanna added a commit that referenced this pull request Aug 8, 2016
…e is specified

This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed

Relates to #10136
javanna added a commit that referenced this pull request Aug 8, 2016
…e is specified

This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed

Relates to #10136
javanna added a commit that referenced this pull request Aug 8, 2016
…e is specified

This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed

Relates to #10136
javanna added a commit that referenced this pull request Aug 8, 2016
…e is specified

This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed

Relates to #10136
javanna added a commit that referenced this pull request Aug 8, 2016
…e is specified

This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed

Relates to #10136
javanna added a commit that referenced this pull request Aug 8, 2016
…e is specified

This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed

Relates to #10136
@javanna
Copy link
Member Author

javanna commented Aug 8, 2016

Hi @rayward I amended the documentation, sorry about the leftover. I think that broadcasting the delete request to all shards was a wrong tradeoff. Trying the request on all shards is a scary thing, also made worse by the problems caused in the bulk api as described in #16645 (comment).

Ideally one would always provide the routing value when deleting a document. It is certainly a shame if this requirement causes problems to our users. We recommend using delete by query when the routing value is not known.

@mrkamel
Copy link

mrkamel commented Aug 8, 2016

i'd like to mention that delete by query doesn't seem to be a perfect fit, because it seems much slower and i guess you need to refresh the index before the call to be sure to not miss docs.

@javanna
Copy link
Member Author

javanna commented Aug 8, 2016

You are right @mrkamel delete by query executes a search. The perfect fit is providing the routing values when they were used at index time. Or even consider not using custom routing as a whole given that it complicates things in some cases.

@rayward
Copy link

rayward commented Aug 8, 2016

Thanks for amending the docs @javanna, I guess I can understand sending those cluster wide requests could be problematic, especially on indexes with many shards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't broadcast deletes to all shards
5 participants