Mapping updates should be synchronous #8688

clintongormley · 2014-11-27T17:00:20Z

Today, a new field can be added to the local mapping of two shards simultaneously. If the detected field type is different, then each shard can end up with a different field type. These shards then send their new mapping to the master, but only of the mappings will win.

This can result in incorrect results and even data loss, eg: one shard thinks that the field is a string, and the other shard (and the master) thinks that it is a number. In this case, sorting and aggregations will be wrong (#8485). Then, when the string-shard is allocated to a new node, that node will receive the "number" mapping from the master. Replaying the transaction log can cause shard failures (eg #8684). Or new numeric doc values are written but, when we try to merge segments, it fails with a doc values exception (#8009).

The only way to ensure that dynamic fields are the same on all shards is to wait for the master to acknowledge the mapping changes before indexing the document, so:

parse the document
update local mapping
if changed, send to master and wait for ack
index document

This will potentially slow down indexing when many dynamic fields are being added, but it is the only way to automatically protect against data loss due to mapping conflicts.

Should a user wish to disable waiting for the master (and they are certain that their dynamic mapping rules are good enough to prevent these problems) then we should allow them to opt-out by setting "dynamic": "unsafe"

The text was updated successfully, but these errors were encountered:

elvarb · 2015-01-27T13:11:36Z

+1 I have issues with this every now and then, worst case when it happens I loose a index, not good.

sammcj · 2015-02-08T06:04:14Z

+1 - Pretty sure this is the issue we have every day.

Screenshot: https://twitter.com/s_mcleod/status/560237698758111232

logstash-2015.02.07/qlMwQhWoRIedLZ5r23eCcw [1]
{
    state: INITIALIZING
    primary: false
    node: qlMwQhWoRIedLZ5r23eCcw
    relocating_node: null
    shard: 1
    index: logstash-2015.02.07
}

kimchy · 2015-02-08T16:47:20Z

@sammcj can you post the failure in the logs for the shards being allocated? it might be related, and might not, depends on the actual failure.

sammcj · 2015-02-09T00:05:23Z

@kimchy it looks one of the situations where we've seen this might be caused by the timestamp field in logstash: https://gist.githubusercontent.com/sammcj/66c212311d99a9ca93fe/raw/de99f6ae149745c7b42262d6c2c26c46458e253f/logstash_shard_logs

bleskes · 2015-02-09T09:50:06Z

@sammcj it looks like your are using a timestamp field, which is correctly recognized as a date but has the data in the wrong format:

 MapperParsingException[failed to parse [timestamp]]; nested: MapperParsingException[failed to parse date field [Feb  9 11:00:01], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "Feb  9 11:00:01"]; ]]

This shouldn't fail the recovery (as it should not have been accepted by the primary) but I don't see (yet) how async dynamic mapping creation should have caused this failure. Do you use any custom index templates, the default type or dynamic templates?

sammcj · 2015-02-09T22:10:57Z

Hi @bleskes - I could see the error around the date and am going to track down which app is logging it to logstash - I was thinking that this shouldn't cause ES to fail recovery (as you said).

The index type for logstash is created and managed by logstash itself - as far as our other indexes we just use the default type.

This sounds like it is unrelated - so I'll log a separate bug for not handling recovery as not to pollute this ticket if required.

bleskes · 2015-02-10T21:07:04Z

I was thinking that this shouldn't cause ES to fail recovery (as you said).

@sammcj it maybe unrelated. When you say the recovery failed - did you check using GET _recovery to see there were no progress at all? (As opposed to being very slow...)

sammcj · 2015-02-12T00:06:43Z

@bleskes It looks like ES permanently stuck intializing shard 2

Index Info: https://gist.github.com/sammcj/5128f1479df8754c1bc5
Metadata: https://gist.github.com/sammcj/a110ee0aa72f003ec45e

Resulting in:

{
  "cluster_name" : "int-elastic-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 79,
  "active_shards" : 155,
  "relocating_shards" : 0,
  "initializing_shards" : 3,
  "unassigned_shards" : 0
}

bleskes · 2015-02-12T18:06:27Z

@sammcj thx. The index info doesn't show any initializing shard. Can you say which command you used? also , can you post the output of GET {index_name_here}/_recovery?human&pretty as well?

jlintz · 2015-02-19T22:22:54Z

I'm seeing this as well on a dynamic field mapping that according to _mapping shows as a string but getting errors about it trying to parse it as a date field. This happens fairly frequently to us as well

shard-failed ([logstash-nginx_gator-2015.02.19][3], node[Ffuof69PSOaOxxlr4Lkosw], relocating [h9y4GeSnRiWh4fBkdrmqQA], [R], s[RELOCATING]), reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException[[logstashesfast03-logstash_es][inet[/10.45.4.115:9300]][bulk/shard/replica]]; nested: MapperParsingException[failed to parse [qs_start]]; nested: MapperParsingException[failed to parse date field [2015-02-18 17:00:46], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "2015-02-18 17:00:46" is malformed at " 17:00:46"]; ]]

…ing from the API. We have two completely different code paths for mappings updates, depending on whether they come from the API or are guessed based on the parsed documents. This commit makes dynamic mappings updates execute like updates from the API. The only change in behaviour is that a document that fails parsing can not modify mappings anymore (useful to prevent issues such as elastic#9851). Other than that, this change should be fairly transparent to users but working this way opens doors to other changes such as validating dynamic mappings updates on the master node (elastic#8688). The way it works internally is that Mapper.parse now returns a Mapper instead of being void. The returned Mapper represents a mapping update that has been performed in order to parse the document. Mappings updates are propagated recursively back to the root mapper, and once parsing is finished, we check that the mappings update can be applied, and either fail the parsing if the update cannot be merged (eg. because of a concurrent mapping update from the API) or merge the update into the mappings. However not all mappings updates can be applied recursively, `copy_to` for instance can add mappings at totally different places in the tree. Because of it I added ParseContext.rootMapperUpdates which `copy_to` fills when the field to copy data to does not exist in the mappings yet. These mappings updates are merged from the ones generated by regular parsing. One particular mapping update was the `auto_boost` setting on the `all` root mapper. Being tricky to work on, I removed it in favour of search-time checks that payloads have been indexed. One interesting side-effect of the change is that concurrency on ObjectMapper is greatly simplified since we do not have to care anymore about having concurrent dynamic mappings and API updates.

This commit changes dynamic mappings updates so that they are synchronous on the entire cluster and their validity is checked by the master node. There are some important consequences of this commit: - a failing index request on a non-existing type does not implicitely create the type anymore - dynamic mappings updates cannot create inconsistent mappings on different shards - indexing requests that introduce new fields might induce latency spikes because of the overhead to update the mappings on the master node Close elastic#8688

garyelephant · 2015-07-27T10:26:17Z

Does this problem still exists in es1.5.2 ?

jpountz · 2015-07-27T10:27:24Z

Yes, this fix will be available in 2.0.

chenryn · 2016-10-06T16:12:08Z

So, is the "dynamic":"unsafe" options exists now?

clintongormley · 2016-10-07T17:52:03Z

@chenryn no - that was never added. we're always safe

clintongormley added resiliency >enhancement :Search/Mapping Index mappings, including merging and defining field types >feature help wanted adoptme v2.0.0-beta1 and removed >feature labels Nov 27, 2014

clintongormley mentioned this issue Nov 28, 2014

stuck initializing shards & NumberFormatException #8684

Closed

rjernst self-assigned this Dec 1, 2014

rjernst removed the help wanted adoptme label Dec 1, 2014

This was referenced Dec 24, 2014

Very strange behavior of index templates #5000

Closed

Shard initialization fails with DocValues exception #8009

Closed

jpountz mentioned this issue Jan 20, 2015

Mappings: dynamic mapping updates should use the same path as mapping updates from the API #9364

Closed

bleskes mentioned this issue Feb 23, 2015

Flush IndexWriter to disk on close and shutdown #7563

Merged

clintongormley mentioned this issue Mar 9, 2015

nested: NumberFormatException[Invalid shift value in prefixCoded bytes (is encoded value really an INT?)] #9191

Closed

jpountz mentioned this issue Mar 16, 2015

Sum Aggregation On Nested Doc Not Handling Negative Numbers #9979

Closed

clintongormley mentioned this issue Apr 5, 2015

After restart elasticsearch reads double as array of string #5705

Closed

jpountz mentioned this issue Apr 14, 2015

Same code path for dynamic mappings updates and updates coming from the API. #10593

Merged

jpountz mentioned this issue Apr 16, 2015

Validate dynamic mappings updates on the master node. #10634

Merged

kevinkluge assigned jpountz and unassigned rjernst Apr 16, 2015

kevinkluge added the in progress label Apr 16, 2015

jpountz closed this as completed in #10634 Apr 21, 2015

kevinkluge removed the in progress label Apr 21, 2015

hanneskaeufler mentioned this issue May 5, 2015

Force update of mapping elastic/elasticsearch-php#186

Closed

clintongormley mentioned this issue May 26, 2015

Exception occured when accessing elastic search #11340

Closed

awislowski mentioned this issue Jul 14, 2015

Update index.asciidoc #12223

Merged

jpountz mentioned this issue Dec 22, 2015

ClassCastException DoubleTerms$Bucket cannot be cast to LongTerms$Bucket #15609

Closed

DimonHo mentioned this issue Sep 8, 2017

关于elasticsearch动态映射同名字段类型不一致的问题 DimonHo/DH_Note#9

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mapping updates should be synchronous #8688

Mapping updates should be synchronous #8688

clintongormley commented Nov 27, 2014

elvarb commented Jan 27, 2015

sammcj commented Feb 8, 2015

kimchy commented Feb 8, 2015

sammcj commented Feb 9, 2015

bleskes commented Feb 9, 2015

sammcj commented Feb 9, 2015

bleskes commented Feb 10, 2015

sammcj commented Feb 12, 2015

bleskes commented Feb 12, 2015

jlintz commented Feb 19, 2015

garyelephant commented Jul 27, 2015

jpountz commented Jul 27, 2015

chenryn commented Oct 6, 2016

clintongormley commented Oct 7, 2016

Mapping updates should be synchronous #8688

Mapping updates should be synchronous #8688

Comments

clintongormley commented Nov 27, 2014

elvarb commented Jan 27, 2015

sammcj commented Feb 8, 2015

kimchy commented Feb 8, 2015

sammcj commented Feb 9, 2015

bleskes commented Feb 9, 2015

sammcj commented Feb 9, 2015

bleskes commented Feb 10, 2015

sammcj commented Feb 12, 2015

bleskes commented Feb 12, 2015

jlintz commented Feb 19, 2015

garyelephant commented Jul 27, 2015

jpountz commented Jul 27, 2015

chenryn commented Oct 6, 2016

clintongormley commented Oct 7, 2016