[mappings] partially parsed documents can cause mapping loss #9851

brwe · 2015-02-24T17:05:02Z

When a document is only partially parsed and parsing exits with a MapperParsingException, the fields that were parsed up to that point are still in the local DocumentMapper but are never added to the cluster state. Once the nodes are restarted, the mapping is gone. I wrote a test for it here: brwe@52cd27c#diff-defbaaff93b959a2f9a93e7167f6f345R246

We can potentially fix this by also updating the mapping on MapperParsingExceptions (brwe@52cd27c#diff-9669e07f0556311d187e534e321a0393R422), but we would probably need to check first whether the mapper was actually changed; otherwise we might end up with one update for each exception.
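To make the proposed fix concrete, here is a minimal sketch in plain Java. It is a toy model, not the Elasticsearch code: `PartialParseSketch`, `ParsingFailed`, `localFields`, `clusterStateFields`, and `updatesSent` are all hypothetical names, and a `null` field value is simply an assumed stand-in for whatever makes real parsing fail. It shows fields being added to the local mapper during parsing, and an update being published on failure only if the local mapper actually changed.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: these classes are hypothetical and are NOT the Elasticsearch API.
class PartialParseSketch {

    /** Stands in for the exception thrown when a document cannot be parsed. */
    static class ParsingFailed extends RuntimeException {
        ParsingFailed(String message) {
            super(message);
        }
    }

    /** Fields known to this node only (plays the role of the local DocumentMapper). */
    final Set<String> localFields = new LinkedHashSet<>();

    /** Fields persisted cluster-wide (plays the role of the cluster state). */
    final Set<String> clusterStateFields = new LinkedHashSet<>();

    /** Number of mapping updates published so far. */
    int updatesSent = 0;

    /** Adds each field to the local mapper as it is seen; a null value simulates a parse failure. */
    private void parse(Map<String, Object> doc) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (e.getValue() == null) {
                throw new ParsingFailed("cannot parse field [" + e.getKey() + "]");
            }
            // The dynamic field is added before parsing of the whole document has finished.
            localFields.add(e.getKey());
        }
    }

    /** Indexes a document and publishes newly added fields even when parsing fails halfway. */
    void index(Map<String, Object> doc) {
        int sizeBefore = localFields.size();
        try {
            parse(doc);
        } finally {
            // Publish only if the local mapper changed, so that a failing
            // document with no new fields does not trigger an update.
            // (The set only grows, so comparing sizes is enough here.)
            if (localFields.size() != sizeBefore) {
                clusterStateFields.addAll(localFields);
                updatesSent++;
            }
        }
    }
}
```

In this model, indexing a document with a valid field `a` followed by an unparseable field `b` still throws, but `a` reaches `clusterStateFields`. Indexing the same failing document again sends no further update.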


… fails for the rest of doc

The local DocumentMapper is updated while parsing, and dynamic fields are added before parsing has finished. If parsing fails after a dynamic field has already been added, the field was not added to the cluster state but was present in the local mapper of this node. New documents with the same field would not necessarily cause an update either, and after restarting the node the mapping for these fields was lost. Instead, the new fields should always be updated.

closes #9851 closes #9874


…ing from the API.

We have two completely different code paths for mappings updates, depending on whether they come from the API or are guessed based on the parsed documents. This commit makes dynamic mappings updates execute like updates from the API.

The only change in behaviour is that a document that fails parsing cannot modify mappings anymore (useful to prevent issues such as elastic#9851). Other than that, this change should be fairly transparent to users, but working this way opens doors to other changes such as validating dynamic mappings updates on the master node (elastic#8688).

The way it works internally is that Mapper.parse now returns a Mapper instead of being void. The returned Mapper represents a mapping update that has been performed in order to parse the document. Mappings updates are propagated recursively back to the root mapper, and once parsing is finished, we check that the mappings update can be applied, and either fail the parsing if the update cannot be merged (e.g. because of a concurrent mapping update from the API) or merge the update into the mappings.

However, not all mappings updates can be applied recursively: `copy_to`, for instance, can add mappings at totally different places in the tree. Because of this, I added ParseContext.rootMapperUpdates, which `copy_to` fills when the field to copy data to does not exist in the mappings yet. These mappings updates are merged from the ones generated by regular parsing.

One particular mapping update was the `auto_boost` setting on the `all` root mapper. Being tricky to work on, I removed it in favour of search-time checks that payloads have been indexed.

One interesting side-effect of the change is that concurrency on ObjectMapper is greatly simplified, since we do not have to care anymore about having concurrent dynamic mappings and API updates.
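The "parse returns the update" idea described in this commit message can be illustrated with a small sketch. This is a toy model under stated assumptions, not the actual Mapper.parse implementation: `ReturnedUpdateSketch`, `parseField`, and the `"number"`/`"string"` type names are hypothetical, mappings are flattened to a field-to-type map instead of a mapper tree, and `copy_to` is not modelled. It shows only the core property: parsing collects an update without touching the mappings, and the update is merged only once parsing has succeeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: hypothetical names, NOT the Elasticsearch Mapper API.
class ReturnedUpdateSketch {

    /** Current mappings: field name -> type name. */
    final Map<String, String> mappings = new LinkedHashMap<>();

    /** Parses one field; returns its type if the field is new, or null if it is already mapped. */
    private String parseField(String name, Object value) {
        if (value == null) {
            // A null value is an assumed stand-in for any parse failure.
            throw new IllegalArgumentException("cannot parse field [" + name + "]");
        }
        if (mappings.containsKey(name)) {
            return null; // no mapping update needed
        }
        return (value instanceof Number) ? "number" : "string";
    }

    /** Parses a whole document without modifying mappings; returns the collected update. */
    Map<String, String> parse(Map<String, Object> doc) {
        Map<String, String> update = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            String type = parseField(e.getKey(), e.getValue());
            if (type != null) {
                update.put(e.getKey(), type);
            }
        }
        return update;
    }

    /** Merges the update only after parsing has succeeded; rejects updates that conflict. */
    void index(Map<String, Object> doc) {
        // If parse throws, no mapping has been changed yet.
        Map<String, String> update = parse(doc);
        // Validate the whole update first. A conflict here would correspond to
        // a mapping change made concurrently, e.g. through the API.
        for (Map.Entry<String, String> e : update.entrySet()) {
            String existing = mappings.get(e.getKey());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalStateException(
                        "mapping update for [" + e.getKey() + "] cannot be merged");
            }
        }
        mappings.putAll(update);
    }
}
```

With this shape, a document that fails halfway leaves `mappings` untouched, which is the behaviour change the commit message describes.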

brwe changed the title ~~partially parsed documents can cause mapping loss~~ [mappings] partially parsed documents can cause mapping loss Feb 25, 2015

brwe mentioned this issue Feb 25, 2015

Update dynamic fields in mapping on master even if parsing fails for the rest of the doc #9874

Closed

brwe closed this as completed in d9a1540 Mar 6, 2015

brwe reopened this Mar 6, 2015

brwe closed this as completed in edb6319 Mar 26, 2015

jpountz mentioned this issue Apr 14, 2015

Same code path for dynamic mappings updates and updates coming from the API. #10593

Merged
