[mappings] partially parsed documents can cause mapping loss #9851
brwe added a commit that referenced this issue on Mar 6, 2015
… fails for the rest of doc. The local DocumentMapper is updated while parsing, and dynamic fields are added before parsing has finished. If parsing fails after a dynamic field has already been added, then the field was not added to the cluster state but was present in the local mapper of this node. New documents with the same field would not necessarily cause an update either, and after restarting the node the mapping for these fields was lost. Instead, the new fields should always be updated. closes #9851 closes #9874
brwe added a commit to brwe/elasticsearch that referenced this issue on Mar 17, 2015
brwe added a commit that referenced this issue on Mar 26, 2015
jpountz added a commit to jpountz/elasticsearch that referenced this issue on Apr 14, 2015
…ing from the API.

We have two completely different code paths for mappings updates, depending on whether they come from the API or are guessed based on the parsed documents. This commit makes dynamic mappings updates execute like updates from the API. The only change in behaviour is that a document that fails parsing cannot modify mappings anymore (useful to prevent issues such as elastic#9851). Other than that, this change should be fairly transparent to users, but working this way opens doors to other changes such as validating dynamic mappings updates on the master node (elastic#8688).

The way it works internally is that Mapper.parse now returns a Mapper instead of being void. The returned Mapper represents a mapping update that has been performed in order to parse the document. Mappings updates are propagated recursively back to the root mapper, and once parsing is finished, we check that the mappings update can be applied, and either fail the parsing if the update cannot be merged (e.g. because of a concurrent mapping update from the API) or merge the update into the mappings.

However, not all mappings updates can be applied recursively: `copy_to`, for instance, can add mappings at totally different places in the tree. Because of that, I added ParseContext.rootMapperUpdates, which `copy_to` fills when the field to copy data to does not exist in the mappings yet. These mappings updates are merged with the ones generated by regular parsing.

One particular mapping update was the `auto_boost` setting on the `all` root mapper. Since it was tricky to work on, I removed it in favour of search-time checks that payloads have been indexed.

One interesting side-effect of the change is that concurrency on ObjectMapper is greatly simplified, since we no longer have to care about concurrent dynamic mappings and API updates.
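The core idea of this commit, parsing that returns a mapping update instead of mutating the mappings in place, can be sketched as below. This is a hypothetical, single-threaded model, not the Elasticsearch `Mapper` API: `DynamicMapping`, `parseObject`, `merge` and `guessType` are illustrative names, mappings are nested `Map`s whose leaves are type names, and the type guessing is deliberately crude. Parsing returns only the new fields (or `null`), each level hands its update up to its caller, and the merge into the mappings happens only after the whole document has parsed. The check against concurrent API updates and the `copy_to` root-level updates are not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapperParsingException extends RuntimeException {
    MapperParsingException(String message) {
        super(message);
    }
}

/**
 * Hypothetical sketch, not the Elasticsearch Mapper API. A mapping is a tree:
 * leaves are type names (String), objects are nested Maps. Parsing never mutates
 * the mapping; it returns the update needed for the document, or null.
 */
final class DynamicMapping {
    final Map<String, Object> root = new LinkedHashMap<>();

    /** Parses the whole document first; the collected update is merged only if no exception was thrown. */
    void index(Map<String, Object> doc) {
        Map<String, Object> update = parseObject(root, doc, "");
        if (update != null) {
            merge(root, update);
        }
    }

    /** Returns the new fields found under this object (recursively), or null if there are none. */
    @SuppressWarnings("unchecked")
    private static Map<String, Object> parseObject(Map<String, Object> mapping, Map<String, Object> doc, String path) {
        Map<String, Object> update = null;
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            String name = e.getKey();
            Object value = e.getValue();
            Object existing = mapping.get(name);
            String fullPath = path.isEmpty() ? name : path + "." + name;
            Object fieldUpdate;
            if (value == null) {
                throw new MapperParsingException("failed to parse [" + fullPath + "]");
            } else if (value instanceof Map) {
                if (existing != null && !(existing instanceof Map)) {
                    throw new MapperParsingException("[" + fullPath + "] is not an object in the mapping");
                }
                Map<String, Object> subMapping =
                        existing != null ? (Map<String, Object>) existing : new LinkedHashMap<String, Object>();
                Map<String, Object> subUpdate = parseObject(subMapping, (Map<String, Object>) value, fullPath);
                // A brand-new object is part of the update even if it has no fields yet.
                fieldUpdate = (subUpdate == null && existing == null) ? new LinkedHashMap<String, Object>() : subUpdate;
            } else {
                if (existing instanceof Map) {
                    throw new MapperParsingException("[" + fullPath + "] is an object in the mapping");
                }
                fieldUpdate = existing == null ? guessType(value) : null; // known leaves need no update
            }
            if (fieldUpdate != null) {
                if (update == null) {
                    update = new LinkedHashMap<>();
                }
                update.put(name, fieldUpdate); // handed back up to the caller, towards the root
            }
        }
        return update;
    }

    /** Crude, illustrative type guessing for new leaf fields. */
    private static String guessType(Object value) {
        if (value instanceof Integer || value instanceof Long) {
            return "long";
        }
        if (value instanceof Number) {
            return "double";
        }
        if (value instanceof Boolean) {
            return "boolean";
        }
        return "text";
    }

    /** Merges a successfully collected update into the live mapping. */
    @SuppressWarnings("unchecked")
    private static void merge(Map<String, Object> mapping, Map<String, Object> update) {
        for (Map.Entry<String, Object> e : update.entrySet()) {
            Object existing = mapping.get(e.getKey());
            if (existing instanceof Map && e.getValue() instanceof Map) {
                merge((Map<String, Object>) existing, (Map<String, Object>) e.getValue());
            } else {
                mapping.put(e.getKey(), e.getValue());
            }
        }
    }
}
```

With this shape, a document such as `{"user": {"name": "kimchy"}, "broken": null}` throws before `merge` runs, so `root` stays untouched, which is exactly the behaviour change the commit describes for documents that fail parsing.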
jpountz added a commit to jpountz/elasticsearch that referenced this issue on Apr 16, 2015
jpountz added a commit to jpountz/elasticsearch that referenced this issue on Apr 16, 2015
When a document is only partially parsed because parsing fails with a MapperParsingException, the fields parsed up to that point remain in the local DocumentMapper but are never added to the cluster state. Once the nodes are restarted, the mapping is gone. I wrote a test for it here: brwe@52cd27c#diff-defbaaff93b959a2f9a93e7167f6f345R246
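To make the failure sequence concrete, here is a minimal Java model of it. All class names (`ClusterStateMapping`, `LocalMapper`, `Node`) are hypothetical stand-ins, not Elasticsearch classes: the local mapper is just a set of field names that is mutated while parsing, the cluster state is only updated after a parse that changed the local mapper succeeds, and a "restart" rebuilds the local mapper from the cluster state.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are not Elasticsearch classes.
class MapperParsingException extends RuntimeException {
    MapperParsingException(String message) {
        super(message);
    }
}

/** The mapping as stored in the cluster state; it is what survives a restart. */
class ClusterStateMapping {
    final Set<String> fields = new LinkedHashSet<>();
}

/** A node-local mapper that is mutated while the document is being parsed. */
class LocalMapper {
    final Set<String> fields = new LinkedHashSet<>();

    LocalMapper(ClusterStateMapping source) {
        fields.addAll(source.fields); // the local mapper is (re)built from the cluster state
    }

    /**
     * Adds unknown fields as soon as they are seen; a null value simulates a
     * parse error further down the document. Returns true if a field was added.
     */
    boolean parse(Map<String, Object> doc) {
        boolean changed = false;
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (fields.add(e.getKey())) {
                changed = true;
            }
            if (e.getValue() == null) {
                throw new MapperParsingException("failed to parse [" + e.getKey() + "]");
            }
        }
        return changed;
    }
}

class Node {
    final ClusterStateMapping clusterState;
    LocalMapper mapper;

    Node(ClusterStateMapping clusterState) {
        this.clusterState = clusterState;
        this.mapper = new LocalMapper(clusterState);
    }

    /** Buggy flow: the cluster state is only updated after a successful parse that changed the local mapper. */
    void index(Map<String, Object> doc) {
        boolean changed = mapper.parse(doc); // may throw after already mutating the local mapper
        if (changed) {
            clusterState.fields.addAll(mapper.fields);
        }
    }

    /** Simulates a node restart: local state is discarded and rebuilt from the cluster state. */
    void restart() {
        mapper = new LocalMapper(clusterState);
    }
}
```

In this model, indexing `{"title": "x", "broken": null}` fails after `title` has been added locally; a later `{"title": "y"}` succeeds but sees no change, so no update is sent, and after `restart()` the `title` mapping is gone.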
We could potentially fix this by also updating the mapping on MapperParsingExceptions (brwe@52cd27c#diff-9669e07f0556311d187e534e321a0393R422), but we would probably need to check first whether the mapper was actually changed; otherwise we might end up with one update for each exception.
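A sketch of the proposed fix under a similarly simplified model (hypothetical names, not Elasticsearch classes): the node publishes the local mapping after a `MapperParsingException` too, but only when a version counter on the local mapper shows that this particular parse actually added a field. Repeatedly failing on the same document therefore sends a single update rather than one per exception.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are not Elasticsearch classes.
class MapperParsingException extends RuntimeException {
    MapperParsingException(String message) {
        super(message);
    }
}

/** A node-local mapper that is still mutated while parsing, plus a change counter. */
class LocalMapper {
    final Set<String> fields = new LinkedHashSet<>();
    private long version = 0; // bumped only when a field is actually added

    LocalMapper(Set<String> initialFields) {
        fields.addAll(initialFields);
    }

    long version() {
        return version;
    }

    /** Adds unknown fields as soon as they are seen; a null value simulates a parse error. */
    void parse(Map<String, Object> doc) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (fields.add(e.getKey())) {
                version++;
            }
            if (e.getValue() == null) {
                throw new MapperParsingException("failed to parse [" + e.getKey() + "]");
            }
        }
    }
}

class Node {
    final Set<String> clusterStateFields; // stands in for the mapping in the cluster state
    LocalMapper mapper;
    int mappingUpdatesSent = 0;

    Node(Set<String> clusterStateFields) {
        this.clusterStateFields = clusterStateFields;
        this.mapper = new LocalMapper(clusterStateFields);
    }

    void index(Map<String, Object> doc) {
        long versionBefore = mapper.version();
        try {
            mapper.parse(doc);
        } catch (MapperParsingException e) {
            // Publish whatever dynamic fields were added before the failure, then rethrow.
            publishIfChanged(versionBefore);
            throw e;
        }
        publishIfChanged(versionBefore);
    }

    /** Sends an update only if this parse changed the local mapper. */
    private void publishIfChanged(long versionBefore) {
        if (mapper.version() != versionBefore) {
            clusterStateFields.addAll(mapper.fields);
            mappingUpdatesSent++;
        }
    }

    /** Simulates a node restart: the local mapper is rebuilt from the cluster state. */
    void restart() {
        mapper = new LocalMapper(clusterStateFields);
    }
}
```

The version counter is one assumed way to "check first if the mapper was actually changed"; comparing a snapshot of the field set would also work but copies it for every document. Note that this approach still lets a failed document change the mappings, which the later commit above chose to forbid instead.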