Shard initialization fails with DocValues exception #8009
@egueidan Thanks for reporting this issue.
This could indeed be the cause of the issue. Can you please provide us with all the mappings that you have on the |
Hi @jpountz, thanks for looking into this. Sadly I'm not at liberty to share the full mapping here as it contains sensitive data... Here is a dummied down and anonymized version which I hope will help you get going.
Cheers,
The thing I was looking for in your mappings is whether you have two types on the same index that have the |
Another question: do you know which Elasticsearch version the problematic index was created on?
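To make the condition being asked about concrete, here is a minimal, purely illustrative Python sketch (not Elasticsearch code; all names are hypothetical) of detecting the same field name mapped with different settings, such as `doc_values`, across two types of one index:

```python
# Hypothetical sketch: find fields that two types on the same index map
# with different settings (e.g. doc_values on in one type, off in another).
def find_conflicting_fields(type_mappings):
    """type_mappings: {type_name: {field_name: settings_dict}}"""
    seen = {}       # field_name -> (first type that mapped it, its settings)
    conflicts = []
    for type_name, fields in type_mappings.items():
        for field, settings in fields.items():
            if field in seen and seen[field][1] != settings:
                conflicts.append((field, seen[field][0], type_name))
            else:
                seen.setdefault(field, (type_name, settings))
    return conflicts

mappings = {
    "type_a": {"counter": {"type": "long", "doc_values": True}},
    "type_b": {"counter": {"type": "long", "doc_values": False}},
}
print(find_conflicting_fields(mappings))  # [('counter', 'type_a', 'type_b')]
```

Since both types share one Lucene index, such a conflict means one Lucene field would need two different doc values configurations, which is the failure mode discussed in this thread.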
Concerning your first question, no, we only have one type on those indices.
Would it be possible to get the |
Sorry, but the index is gone now; I'll get the files if we run into the issue again. All the nodes are running 1.3.4 (build_hash a70f3cc) and we haven't started testing 1.4.0.Beta1 (yet!).
Guys, as I was writing my last comment we had to restart one of our nodes... Immediately after, we started seeing this exception again. At the moment the cluster is green, but we get lots of exceptions at indexing time:
This happens, for example, on a field that was dynamically mapped as a double when we try to index a document with "Infinity" as the value for that field. I have the segment* and *.si files from the shard/node reporting the exception, but for privacy reasons I can't attach them on GitHub. What would be the best way to share them privately with you?
This is actually a different exception. Your first exception (the one you opened the issue with) happens during IndexWriter creation, so there is somehow a conflicting segment on that node, and it might be related to problems that we have already fixed. Yet, let me ask a couple more questions to get closer to the source of the problem.
This second failure is due to the fact that you are trying to index the same field as a string while it is mapped as a double, which obviously doesn't work. Are you using dynamic mappings for this?
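A rough sketch of the dynamic-mapping behaviour behind this failure: the first concrete value a field receives locks in its type, and later documents whose JSON type disagrees produce a mapping conflict. This is illustrative Python, not Elasticsearch's implementation; all names are hypothetical.

```python
# Hypothetical sketch: the first value seen for a field fixes its mapped
# type; later documents with a different JSON type conflict (the
# "string vs. double" failure discussed above).
def infer_type(value):
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

def index_docs(docs):
    mapping, errors = {}, []
    for i, doc in enumerate(docs):
        for field, value in doc.items():
            inferred = infer_type(value)
            locked = mapping.setdefault(field, inferred)
            if locked != inferred:
                errors.append((i, field, locked, inferred))
    return mapping, errors

mapping, errors = index_docs([{"price": 9.99}, {"price": "Infinity"}])
print(mapping)   # {'price': 'double'}
print(errors)    # [(1, 'price', 'double', 'string')]
```

Note that the JSON string "Infinity" is rejected not because it can't be parsed as a number, but because its JSON type (string) disagrees with the already-locked mapping (double).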
Yes, sorry, it is indeed a different exception. I mentioned it because it never happened before this morning's restart of the machine, although I know for a fact that similar messages (with a string instead of a double) were being inserted. Note that we are using . I had a gut feeling it could be related, and the funny thing is that right now the index with those exceptions is yellow, with one replica shard perpetually initializing and many exceptions like:
To answer your questions this happens on a fully 1.3.4 cluster. The index was created on 1.3.4. There is no 1.3.2 node involved. To summarize what we see, it goes:
Let me know if you need more info.
Hi @egueidan. Are the segment files small enough to email? If so, you can send them to: clinton dot gormley at elasticsearch dot com. Thanks for the detailed info.
Hi @clintongormley,
Hi,
Thought it might be relevant. Regards,
Hi @egueidan
When we look at the segment files that you sent, some of them are from Lucene 4.9 (1.3.2), not Lucene 4.9.1. So 1.3.2 is involved somewhere here: either you have mixed nodes in your cluster, or the index was originally created with 1.3.2. Can you clarify? Thanks.
Hi @clintongormley, |
I opened a Lucene issue for this: https://issues.apache.org/jira/browse/LUCENE-6019
@s1monw Glad you were able to pinpoint the issue, and thanks a lot for taking the time. Any idea (even ballpark) of which version this will be integrated into Elasticsearch, and when? Regards.
@egueidan the Lucene issue is fixed, and we're going to spin a new Lucene release with this fix. This will prevent mis-use of Lucene's APIs from causing index corruption. But we will also separately fix Elasticsearch to handle the same field name having different settings across types (since Elasticsearch currently stores multiple types in one Lucene index), likely by refusing mappings where the same field name has different index settings (such as doc values type) across different types. Net/net, even once we've fixed Elasticsearch/Lucene not to corrupt the index on inconsistent doc values types, you'll still have to fix your usage of Elasticsearch so that the same field name does not have different mapping settings across different types.
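The Elasticsearch-side fix described above amounts to rejecting a mapping update up front rather than corrupting the index later. Here is a small Python sketch of that idea; it mirrors the behaviour described in this comment, and is not actual Elasticsearch code (all class and method names are hypothetical).

```python
# Hypothetical sketch of the proposed fix: refuse a mapping update when the
# same field name would get different settings in another type of the same
# index, instead of letting conflicting doc values formats reach Lucene.
class MappingConflict(Exception):
    pass

class Index:
    def __init__(self):
        self.types = {}  # type_name -> {field_name: settings_dict}

    def put_mapping(self, type_name, fields):
        for other_type, other_fields in self.types.items():
            for field, settings in fields.items():
                existing = other_fields.get(field)
                if existing is not None and existing != settings:
                    raise MappingConflict(
                        f"field '{field}' already mapped in type "
                        f"'{other_type}' with different settings")
        self.types[type_name] = fields

idx = Index()
idx.put_mapping("type_a", {"n": {"type": "long", "doc_values": True}})
try:
    idx.put_mapping("type_b", {"n": {"type": "long", "doc_values": False}})
except MappingConflict as e:
    print("rejected:", e)
```

Identical settings across types remain allowed; only a divergence in per-field settings (such as the doc values type) is refused.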
Hi @mikemccand and thanks for the quick fix. |
Hi @egueidan. I don't fully understand what `index.mapping.ignore_malformed` should do; maybe someone else can chime in here. It sounds like a spooky setting: it seems dangerous to ignore serious errors like a type change.
Closing this in favour of #8688 |
Hi,
we have been running into strange errors lately. We get a lot of exceptions of the type:
We are in a daily-index situation, so the index is quite new. It contains 10 to 20 million documents spread over 10 shards and 5 nodes. At some point (after hours of the index being green), one of the shards becomes INITIALIZING and can never start (because of the aforementioned exception). The index is then in a red state and we cannot set it back on track... In this case the only solution we have found is to scroll over the whole index and reindex the data into a new index (but we have most likely lost the data from the failing shard).

The field that causes the issue has the following definition: `{"type":"long","doc_values":true,"include_in_all":false}`. This mapping is inferred from a dynamic template: `{"mapping":{"index":"not_analyzed","include_in_all":false,"doc_values":true,"type":"{dynamic_type}"},"match":"*"}`.

One important note is that this part of the data is free-form (i.e. user input) and it is possible that some documents have conflicting types (one document having the field as a string, another as a long); that's why the index has the setting `index.mapping.ignore_malformed` set to `true`.

Also, it might not be relevant, but this only happened on days when we had at least one node that was restarted.
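For readers unfamiliar with `ignore_malformed`, here is a rough Python sketch of the intended behaviour for a numeric field: a value that cannot be parsed into the mapped type is dropped from that document instead of failing the whole indexing request. This is illustrative only, under my reading of the setting, and is not Elasticsearch's implementation.

```python
# Hypothetical sketch of ignore_malformed for a field mapped as long:
# an unparseable value is silently skipped for that document rather than
# rejecting the request; with the setting off, the error propagates.
def parse_long_field(value, ignore_malformed):
    try:
        return int(value)          # mapped type is long
    except (TypeError, ValueError):
        if ignore_malformed:
            return None            # field silently skipped for this doc
        raise

print(parse_long_field("42", ignore_malformed=True))    # 42
print(parse_long_field("oops", ignore_malformed=True))  # None
```

Note that this only papers over parse failures per document; it does not resolve the underlying situation of the same field name carrying conflicting mapping settings, which is what this issue is about.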
We have noticed this issue since running Elasticsearch 1.3.4 (but we can't be 100% sure that it did not happen before).
We cannot isolate and reproduce the issue, but we have faced it several times over the past few days. Feel free to suggest actions we can take, should it happen again, to gather more details and help fix it. Also, if you have suggestions for bypassing the issue when it happens (so that we can avoid reindexing the data), that'd be great.
Thanks,
Emmanuel