Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mappings] partially parsed documents can cause mapping loss #9851

Closed
brwe opened this issue Feb 24, 2015 · 0 comments
Closed

[mappings] partially parsed documents can cause mapping loss #9851

brwe opened this issue Feb 24, 2015 · 0 comments

Comments

@brwe
Copy link
Contributor

brwe commented Feb 24, 2015

When a document is parsed only halfway but parsing exits with a MapperParsingException, the fields that were parsed till then are still in the local DocumentMapper but never added to the cluster state. Once the nodes are restarted the mapping is gone. I wore a test for it here brwe@52cd27c#diff-defbaaff93b959a2f9a93e7167f6f345R246

We can potentailly fix this by also update the mapping on MapperParsingExceptions (brwe@52cd27c#diff-9669e07f0556311d187e534e321a0393R422) but we would probably need to check first if the mapper was actually changed, otherwise we might end with one update for each Exception.

@brwe brwe changed the title partially parsed documents can cause mapping loss [mappings] partially parsed documents can cause mapping loss Feb 25, 2015
@brwe brwe closed this as completed in d9a1540 Mar 6, 2015
brwe added a commit that referenced this issue Mar 6, 2015
… fails for the rest of doc

The local DocumentMapper is updated while parsing and dynamic fields are added before
parsing has finished. If parsing fails after a dynamic field has been added already
then the field was not added to the cluster state but was present in the local mapper of this
node. New documents with the same field would not necessarily cause an update either and
after restarting the node the mapping for these fields were lost. Instead the new fields
should always be updated.

closes #9851
closes #9874
@brwe brwe reopened this Mar 6, 2015
brwe added a commit to brwe/elasticsearch that referenced this issue Mar 17, 2015
… fails for the rest of doc

The local DocumentMapper is updated while parsing and dynamic fields are added before
parsing has finished. If parsing fails after a dynamic field has been added already
then the field was not added to the cluster state but was present in the local mapper of this
node. New documents with the same field would not necessarily cause an update either and
after restarting the node the mapping for these fields were lost. Instead the new fields
should always be updated.

closes elastic#9851
closes elastic#9874
brwe added a commit that referenced this issue Mar 26, 2015
… fails for the rest of doc

The local DocumentMapper is updated while parsing and dynamic fields are added before
parsing has finished. If parsing fails after a dynamic field has been added already
then the field was not added to the cluster state but was present in the local mapper of this
node. New documents with the same field would not necessarily cause an update either and
after restarting the node the mapping for these fields were lost. Instead the new fields
should always be updated.

closes #9851
closes #9874
@brwe brwe closed this as completed in edb6319 Mar 26, 2015
jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 14, 2015
…ing from the API.

We have two completely different code paths for mappings updates, depending on
whether they come from the API or are guessed based on the parsed documents.
This commit makes dynamic mappings updates execute like updates from the API.

The only change in behaviour is that a document that fails parsing can not
modify mappings anymore (useful to prevent issues such as elastic#9851). Other than
that, this change should be fairly transparent to users but working this way
opens doors to other changes such as validating dynamic mappings updates on the
master node (elastic#8688).

The way it works internally is that Mapper.parse now returns a Mapper instead
of being void. The returned Mapper represents a mapping update that has been
performed in order to parse the document. Mappings updates are propagated
recursively back to the root mapper, and once parsing is finished, we check
that the mappings update can be applied, and either fail the parsing if the
update cannot be merged (eg. because of a concurrent mapping update from the
API) or merge the update into the mappings.

However not all mappings updates can be applied recursively, `copy_to` for
instance can add mappings at totally different places in the tree. Because of
it I added ParseContext.rootMapperUpdates which `copy_to` fills when the
field to copy data to does not exist in the mappings yet. These mappings
updates are merged from the ones generated by regular parsing.

One particular mapping update was the `auto_boost` setting on the `all` root
mapper. Being tricky to work on, I removed it in favour of search-time checks
that payloads have been indexed.

One interesting side-effect of the change is that concurrency on ObjectMapper
is greatly simplified since we do not have to care anymore about having
concurrent dynamic mappings and API updates.
jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 16, 2015
…ing from the API.

We have two completely different code paths for mappings updates, depending on
whether they come from the API or are guessed based on the parsed documents.
This commit makes dynamic mappings updates execute like updates from the API.

The only change in behaviour is that a document that fails parsing can not
modify mappings anymore (useful to prevent issues such as elastic#9851). Other than
that, this change should be fairly transparent to users but working this way
opens doors to other changes such as validating dynamic mappings updates on the
master node (elastic#8688).

The way it works internally is that Mapper.parse now returns a Mapper instead
of being void. The returned Mapper represents a mapping update that has been
performed in order to parse the document. Mappings updates are propagated
recursively back to the root mapper, and once parsing is finished, we check
that the mappings update can be applied, and either fail the parsing if the
update cannot be merged (eg. because of a concurrent mapping update from the
API) or merge the update into the mappings.

However not all mappings updates can be applied recursively, `copy_to` for
instance can add mappings at totally different places in the tree. Because of
it I added ParseContext.rootMapperUpdates which `copy_to` fills when the
field to copy data to does not exist in the mappings yet. These mappings
updates are merged from the ones generated by regular parsing.

One particular mapping update was the `auto_boost` setting on the `all` root
mapper. Being tricky to work on, I removed it in favour of search-time checks
that payloads have been indexed.

One interesting side-effect of the change is that concurrency on ObjectMapper
is greatly simplified since we do not have to care anymore about having
concurrent dynamic mappings and API updates.
jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 16, 2015
…ing from the API.

We have two completely different code paths for mappings updates, depending on
whether they come from the API or are guessed based on the parsed documents.
This commit makes dynamic mappings updates execute like updates from the API.

The only change in behaviour is that a document that fails parsing can not
modify mappings anymore (useful to prevent issues such as elastic#9851). Other than
that, this change should be fairly transparent to users but working this way
opens doors to other changes such as validating dynamic mappings updates on the
master node (elastic#8688).

The way it works internally is that Mapper.parse now returns a Mapper instead
of being void. The returned Mapper represents a mapping update that has been
performed in order to parse the document. Mappings updates are propagated
recursively back to the root mapper, and once parsing is finished, we check
that the mappings update can be applied, and either fail the parsing if the
update cannot be merged (eg. because of a concurrent mapping update from the
API) or merge the update into the mappings.

However not all mappings updates can be applied recursively, `copy_to` for
instance can add mappings at totally different places in the tree. Because of
it I added ParseContext.rootMapperUpdates which `copy_to` fills when the
field to copy data to does not exist in the mappings yet. These mappings
updates are merged from the ones generated by regular parsing.

One particular mapping update was the `auto_boost` setting on the `all` root
mapper. Being tricky to work on, I removed it in favour of search-time checks
that payloads have been indexed.

One interesting side-effect of the change is that concurrency on ObjectMapper
is greatly simplified since we do not have to care anymore about having
concurrent dynamic mappings and API updates.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant