Mapping updates should be synchronous #8688

Closed
clintongormley opened this Issue Nov 27, 2014 · 14 comments

Projects

None yet
@clintongormley
Member

Today, a new field can be added to the local mapping of two shards simultaneously. If the detected field type is different, then each shard can end up with a different field type. These shards then send their new mapping to the master, but only of the mappings will win.

This can result in incorrect results and even data loss, eg: one shard thinks that the field is a string, and the other shard (and the master) thinks that it is a number. In this case, sorting and aggregations will be wrong (#8485). Then, when the string-shard is allocated to a new node, that node will receive the "number" mapping from the master. Replaying the transaction log can cause shard failures (eg #8684). Or new numeric doc values are written but, when we try to merge segments, it fails with a doc values exception (#8009).

The only way to ensure that dynamic fields are the same on all shards is to wait for the master to acknowledge the mapping changes before indexing the document, so:

  • parse the document
  • update local mapping
  • if changed, send to master and wait for ack
  • index document

This will potentially slow down indexing when many dynamic fields are being added, but it is the only way to automatically protect against data loss due to mapping conflicts.

Should a user wish to disable waiting for the master (and they are certain that their dynamic mapping rules are good enough to prevent these problems) then we should allow them to opt-out by setting "dynamic": "unsafe"

@rjernst rjernst self-assigned this Dec 1, 2014
@rjernst rjernst removed the adoptme label Dec 1, 2014
@elvarb
elvarb commented Jan 27, 2015

+1 I have issues with this every now and then, worst case when it happens I loose a index, not good.

@sammcj
sammcj commented Feb 8, 2015

+1 - Pretty sure this is the issue we have every day.

Screenshot: https://twitter.com/s_mcleod/status/560237698758111232

logstash-2015.02.07/qlMwQhWoRIedLZ5r23eCcw [1]
{
    state: INITIALIZING
    primary: false
    node: qlMwQhWoRIedLZ5r23eCcw
    relocating_node: null
    shard: 1
    index: logstash-2015.02.07
}
@kimchy
Member
kimchy commented Feb 8, 2015

@sammcj can you post the failure in the logs for the shards being allocated? it might be related, and might not, depends on the actual failure.

@sammcj
sammcj commented Feb 9, 2015

@kimchy it looks one of the situations where we've seen this might be caused by the timestamp field in logstash: https://gist.githubusercontent.com/sammcj/66c212311d99a9ca93fe/raw/de99f6ae149745c7b42262d6c2c26c46458e253f/logstash_shard_logs

@bleskes
Member
bleskes commented Feb 9, 2015

@sammcj it looks like your are using a timestamp field, which is correctly recognized as a date but has the data in the wrong format:

 MapperParsingException[failed to parse [timestamp]]; nested: MapperParsingException[failed to parse date field [Feb  9 11:00:01], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "Feb  9 11:00:01"]; ]]

This shouldn't fail the recovery (as it should not have been accepted by the primary) but I don't see (yet) how async dynamic mapping creation should have caused this failure. Do you use any custom index templates, the default type or dynamic templates?

@sammcj
sammcj commented Feb 9, 2015

Hi @bleskes - I could see the error around the date and am going to track down which app is logging it to logstash - I was thinking that this shouldn't cause ES to fail recovery (as you said).

The index type for logstash is created and managed by logstash itself - as far as our other indexes we just use the default type.

This sounds like it is unrelated - so I'll log a separate bug for not handling recovery as not to pollute this ticket if required.

@bleskes
Member
bleskes commented Feb 10, 2015

I was thinking that this shouldn't cause ES to fail recovery (as you said).

@sammcj it maybe unrelated. When you say the recovery failed - did you check using GET _recovery to see there were no progress at all? (As opposed to being very slow...)

@sammcj
sammcj commented Feb 12, 2015

@bleskes It looks like ES permanently stuck intializing shard 2

Index Info: https://gist.github.com/sammcj/5128f1479df8754c1bc5
Metadata: https://gist.github.com/sammcj/a110ee0aa72f003ec45e

Resulting in:

{
  "cluster_name" : "int-elastic-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 79,
  "active_shards" : 155,
  "relocating_shards" : 0,
  "initializing_shards" : 3,
  "unassigned_shards" : 0
}
@bleskes
Member
bleskes commented Feb 12, 2015

@sammcj thx. The index info doesn't show any initializing shard. Can you say which command you used? also , can you post the output of GET {index_name_here}/_recovery?human&pretty as well?

@jlintz
jlintz commented Feb 19, 2015

I'm seeing this as well on a dynamic field mapping that according to _mapping shows as a string but getting errors about it trying to parse it as a date field. This happens fairly frequently to us as well

shard-failed ([logstash-nginx_gator-2015.02.19][3], node[Ffuof69PSOaOxxlr4Lkosw], relocating [h9y4GeSnRiWh4fBkdrmqQA], [R], s[RELOCATING]), reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException[[logstashesfast03-logstash_es][inet[/10.45.4.115:9300]][bulk/shard/replica]]; nested: MapperParsingException[failed to parse [qs_start]]; nested: MapperParsingException[failed to parse date field [2015-02-18 17:00:46], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "2015-02-18 17:00:46" is malformed at " 17:00:46"]; ]]

@jpountz jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 14, 2015
@jpountz jpountz Mappings: Same code path for dynamic mappings updates and updates com…
…ing from the API.

We have two completely different code paths for mappings updates, depending on
whether they come from the API or are guessed based on the parsed documents.
This commit makes dynamic mappings updates execute like updates from the API.

The only change in behaviour is that a document that fails parsing can not
modify mappings anymore (useful to prevent issues such as #9851). Other than
that, this change should be fairly transparent to users but working this way
opens doors to other changes such as validating dynamic mappings updates on the
master node (#8688).

The way it works internally is that Mapper.parse now returns a Mapper instead
of being void. The returned Mapper represents a mapping update that has been
performed in order to parse the document. Mappings updates are propagated
recursively back to the root mapper, and once parsing is finished, we check
that the mappings update can be applied, and either fail the parsing if the
update cannot be merged (eg. because of a concurrent mapping update from the
API) or merge the update into the mappings.

However not all mappings updates can be applied recursively, `copy_to` for
instance can add mappings at totally different places in the tree. Because of
it I added ParseContext.rootMapperUpdates which `copy_to` fills when the
field to copy data to does not exist in the mappings yet. These mappings
updates are merged from the ones generated by regular parsing.

One particular mapping update was the `auto_boost` setting on the `all` root
mapper. Being tricky to work on, I removed it in favour of search-time checks
that payloads have been indexed.

One interesting side-effect of the change is that concurrency on ObjectMapper
is greatly simplified since we do not have to care anymore about having
concurrent dynamic mappings and API updates.
8a3cac3
@jpountz jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 16, 2015
@jpountz jpountz Mappings: Same code path for dynamic mappings updates and updates com…
…ing from the API.

We have two completely different code paths for mappings updates, depending on
whether they come from the API or are guessed based on the parsed documents.
This commit makes dynamic mappings updates execute like updates from the API.

The only change in behaviour is that a document that fails parsing can not
modify mappings anymore (useful to prevent issues such as #9851). Other than
that, this change should be fairly transparent to users but working this way
opens doors to other changes such as validating dynamic mappings updates on the
master node (#8688).

The way it works internally is that Mapper.parse now returns a Mapper instead
of being void. The returned Mapper represents a mapping update that has been
performed in order to parse the document. Mappings updates are propagated
recursively back to the root mapper, and once parsing is finished, we check
that the mappings update can be applied, and either fail the parsing if the
update cannot be merged (eg. because of a concurrent mapping update from the
API) or merge the update into the mappings.

However not all mappings updates can be applied recursively, `copy_to` for
instance can add mappings at totally different places in the tree. Because of
it I added ParseContext.rootMapperUpdates which `copy_to` fills when the
field to copy data to does not exist in the mappings yet. These mappings
updates are merged from the ones generated by regular parsing.

One particular mapping update was the `auto_boost` setting on the `all` root
mapper. Being tricky to work on, I removed it in favour of search-time checks
that payloads have been indexed.

One interesting side-effect of the change is that concurrency on ObjectMapper
is greatly simplified since we do not have to care anymore about having
concurrent dynamic mappings and API updates.
38bec83
@jpountz jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 16, 2015
@jpountz jpountz Mappings: Same code path for dynamic mappings updates and updates com…
…ing from the API.

We have two completely different code paths for mappings updates, depending on
whether they come from the API or are guessed based on the parsed documents.
This commit makes dynamic mappings updates execute like updates from the API.

The only change in behaviour is that a document that fails parsing can not
modify mappings anymore (useful to prevent issues such as #9851). Other than
that, this change should be fairly transparent to users but working this way
opens doors to other changes such as validating dynamic mappings updates on the
master node (#8688).

The way it works internally is that Mapper.parse now returns a Mapper instead
of being void. The returned Mapper represents a mapping update that has been
performed in order to parse the document. Mappings updates are propagated
recursively back to the root mapper, and once parsing is finished, we check
that the mappings update can be applied, and either fail the parsing if the
update cannot be merged (eg. because of a concurrent mapping update from the
API) or merge the update into the mappings.

However not all mappings updates can be applied recursively, `copy_to` for
instance can add mappings at totally different places in the tree. Because of
it I added ParseContext.rootMapperUpdates which `copy_to` fills when the
field to copy data to does not exist in the mappings yet. These mappings
updates are merged from the ones generated by regular parsing.

One particular mapping update was the `auto_boost` setting on the `all` root
mapper. Being tricky to work on, I removed it in favour of search-time checks
that payloads have been indexed.

One interesting side-effect of the change is that concurrency on ObjectMapper
is greatly simplified since we do not have to care anymore about having
concurrent dynamic mappings and API updates.
563e704
@jpountz jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 16, 2015
@jpountz jpountz Mappings: Validate dynamic mappings updates on the master node.
This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:
 - a failing index request on a non-existing type does not implicitely create
   the type anymore
 - dynamic mappings updates cannot create inconsistent mappings on different
   shards
 - indexing requests that introduce new fields might induce latency spikes
   because of the overhead to update the mappings on the master node

Close #8688
608ffcf
@kevinkluge kevinkluge assigned jpountz and unassigned rjernst Apr 16, 2015
@jpountz jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 21, 2015
@jpountz jpountz Mappings: Validate dynamic mappings updates on the master node.
This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:
 - a failing index request on a non-existing type does not implicitely create
   the type anymore
 - dynamic mappings updates cannot create inconsistent mappings on different
   shards
 - indexing requests that introduce new fields might induce latency spikes
   because of the overhead to update the mappings on the master node

Close #8688
1adf232
@jpountz jpountz closed this in #10634 Apr 21, 2015
@kevinkluge kevinkluge removed the in progress label Apr 21, 2015
@garyelephant

Does this problem still exists in es1.5.2 ?

@jpountz
Contributor
jpountz commented Jul 27, 2015

Yes, this fix will be available in 2.0.

@chenryn
chenryn commented Oct 6, 2016

So, is the "dynamic":"unsafe" options exists now?

@clintongormley
Member

@chenryn no - that was never added. we're always safe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment