Possible to index duplicate documents with same id and routing id. #31976
Comments
|
The description of this problem seems similar to #10511; however, I have double-checked that all of the documents are of the type "ce". |
|
Pinging @elastic/es-distributed |
|
@kylelyk Can you provide more info on the bulk indexing process? Are you setting the routing value on the bulk request? Are you using auto-generated IDs? |
|
We are using routing values for each document indexed during a bulk request and we are using external GUIDs from a DB for the id. Searching using the preferences you specified, I can see that there are two documents on shard 1 primary with same id, type, and routing id, and 1 document on shard 1 replica. |
Did you mean the duplicate occurs on the primary? Can you also provide the |
|
Yes, the duplicate occurs on the primary shard. When I search using _version as documented here, I get two documents, with versions 60 and 59. I get only 1 document when I then specify preference=shards:X, where X is any number. Maybe _version doesn't play well with preferences? |
|
@kylelyk Thanks a lot for the info. Which version type did you use for these documents? |
|
I am not using any kind of versioning when indexing, so the default should be no version checking and automatic version incrementing.
@kylelyk There is no need to delete a document before reindexing it. However, can you confirm whether you always use a bulk request with delete and index when updating documents, or only sometimes? Thank you! |
|
@ywelsch found that this issue is related to and fixed by #29619. Given the way we deleted/updated these documents and their versions, this issue can be explained as follows:
Our formal model uncovered this problem and we already fixed this in 6.3.0 by #29619. @kylelyk I really appreciate your helpfulness here. |
|
@kylelyk can you update to the latest ES version (6.3.1 as of this reply) and check if this still happens? |
|
Unfortunately, we're using the AWS hosted version of Elasticsearch so it might take some time for Amazon to update it to 6.3.x. I'll close this issue and re-open it if the problem persists after the update. |
|
@ywelsch I'm having the same issue, which I can reproduce with the following commands: Followed by: Which will result in: The same commands issued against an index without joinType do not produce duplicate documents. My template looks like: I'm on Elasticsearch 6.3.2. |
|
@HJK181 You have different routing keys, so your documents most likely go to different shards. This is expected behaviour. If you use routing values, you need to ensure that two documents with the same id cannot have different routing keys. If you have any further questions or need help with Elasticsearch, please don't hesitate to ask on our discussion forum. |
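To illustrate the point about routing keys (this sketch is not part of the original reply): the shard is derived from the routing value rather than from the id, so the same id with two different routing values can end up on two shards, and each shard will happily hold its own copy. The snippet below is a minimal model of that idea. It uses CRC32 as a stand-in hash, which is an assumption for illustration only; Elasticsearch uses its own Murmur3-based hash, so real shard numbers will differ. The function names are hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard from the routing value alone.

    CRC32 is a stand-in hash (assumption); Elasticsearch hashes the
    routing value with a Murmur3-based function, so actual shard
    numbers will not match this model.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shards_for_same_id(routings, num_primary_shards):
    """Map each routing value used for one document id to its shard."""
    return {r: shard_for(r, num_primary_shards) for r in routings}


if __name__ == "__main__":
    # Same document id indexed under several hypothetical routing values.
    placement = shards_for_same_id(["parent-1", "parent-2", "parent-3"], 5)
    for routing, shard in placement.items():
        print(f"routing={routing!r} -> shard {shard}")
```

If two routing values map to different shards, an index request under the second value cannot see or overwrite the copy stored under the first, which is why the id alone does not guarantee uniqueness across the index.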
Elasticsearch version: 6.2.4
Plugins installed: []
JVM version: 1.8.0_172
OS version: MacOS (Darwin Kernel Version 15.6.0)
Description of the problem including expected versus actual behavior:
Over the past few months, we've been seeing completely identical documents pop up which have the same id, type and routing id. We're using custom routing to get parent-child joins working correctly and we make sure to delete the existing documents when re-indexing them to avoid two copies of the same document on the same shard. We use Bulk Index API calls to delete and index the documents. The indexTime field below is set by the service that indexes the document into ES and as you can see, the documents were indexed about 1 second apart from each other. This problem only seems to happen on our production server which has more traffic and 1 read replica, and it's only ever 2 documents that are duplicated on what I believe to be a single shard.
The problem can be fixed by deleting the existing documents with that id and re-indexing them, which is odd since that is what the indexing service does in the first place.
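For readers unfamiliar with the pattern described above, here is a rough sketch of what a delete-then-index pair for one document could look like as a Bulk API body. It is not the reporter's actual code. The index name `my-index` and type `ce` are taken from this thread; the helper name, the `indexTime` value, and the use of the `routing` metadata key are assumptions, and the exact metadata key names depend on the Elasticsearch version.

```python
import json


def delete_then_index_body(index, doc_type, doc_id, routing, source):
    """Build a newline-delimited bulk body: delete the old copy, then index the new one.

    Both actions carry the same routing value so that they target the same shard.
    The metadata key names used here are an assumption for a 6.x-style cluster.
    """
    meta = {"_index": index, "_type": doc_type, "_id": doc_id, "routing": routing}
    lines = [
        json.dumps({"delete": meta}),  # action line only, no source line follows
        json.dumps({"index": meta}),   # action line for the new copy
        json.dumps(source),            # source line for the index action
    ]
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = delete_then_index_body(
        "my-index", "ce", "hypothetical-guid", "hypothetical-parent-id",
        {"indexTime": "2018-07-11T00:00:00Z"},  # placeholder value
    )
    print(body, end="")
```

As noted later in the thread, the explicit delete is not required for a plain overwrite: indexing a document with the same id and the same routing value replaces the existing copy.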
Queries:
GET /my-index/_search