Mappings: Update mapping on master in async manner #6648

kimchy · 2014-06-30T10:44:16Z

Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This becomes heavier and heavier as new mappings keep being introduced. We can move this work to another thread that is responsible for refreshing the source and then sending the mapping update to the master (note that this doesn't change the semantics of new mapping introduction, since it is async anyhow).
When doing so, the thread can also try to batch as many updates as possible, which is especially handy when multiple shards of the same index exist on the same node. An internal setting that controls the time to wait for batches is also added (defaults to 0).
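
As a rough illustration of the batching idea described above, here is a minimal sketch of a queue that indexing threads write to and a background loop drains. It is not the code in this PR: `MappingBatcherSketch`, `Update`, `Sender`, and `additionalWaitMillis` are hypothetical names, and the real updater also refreshes the mapping source before sending.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; these names are illustrative, not the classes in this PR.
final class MappingBatcherSketch {

    /** One pending update, identified by index and type (assumed shape). */
    static final class Update {
        final String index;
        final String type;
        Update(String index, String type) { this.index = index; this.type = type; }
    }

    /** Stand-in for "send the update mapping to the master". */
    interface Sender { void send(Collection<Update> batch); }

    private final BlockingQueue<Update> queue = new LinkedBlockingQueue<>();
    private final long additionalWaitMillis; // 0 means "do not wait for more"

    MappingBatcherSketch(long additionalWaitMillis) {
        this.additionalWaitMillis = additionalWaitMillis;
    }

    /** Called from indexing threads: cheap, only enqueues. */
    void submit(String index, String type) {
        queue.add(new Update(index, type));
    }

    /** Non-blocking: collapses everything queued so far into one entry per index/type. */
    Map<String, Update> drainBatch(Update first) {
        List<Update> drained = new ArrayList<>();
        drained.add(first);
        queue.drainTo(drained);
        Map<String, Update> batch = new LinkedHashMap<>();
        for (Update u : drained) {
            batch.put(u.index + "/" + u.type, u); // later entries replace earlier ones
        }
        return batch;
    }

    /** Body of the background thread: block for work, optionally wait, then send a batch. */
    void runLoop(Sender sender) {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Update first = queue.take();
                if (additionalWaitMillis > 0) {
                    Thread.sleep(additionalWaitMillis); // let more updates accumulate
                }
                sender.send(drainBatch(first).values());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit quietly when asked to stop
        }
    }
}
```

With several shards of the same index on one node, repeated submissions for the same index/type collapse into a single entry per batch, which is the benefit the description points at.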

Testing-wise, a new support method, ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll, is added to allow waiting for the concrete manifestation of mappings on all relevant nodes. Some tests mistakenly rely on there being no more pending tasks to mean that mappings have been updated, so if we see timing-related failures later on (all tests pass now), those tests will need to be fixed to either awaitBusy on the master for the new mapping or, in the rare case, wait for the concrete mapping on all the nodes using the new method.
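
To make the "wait until the mapping is actually visible" pattern concrete, here is a small self-contained polling helper. It is a sketch only: `AwaitSketch`, `Check`, and `await` are hypothetical names and not the test framework's awaitBusy. A test would put its own "does the master have the mapping yet?" check inside `ok()`.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a busy-wait helper; not the test framework's implementation.
final class AwaitSketch {

    /** The condition to poll, e.g. "the master's cluster state contains the new mapping". */
    interface Check { boolean ok(); }

    /** Polls with a growing sleep until the check passes or the timeout elapses. */
    static boolean await(Check check, long maxWaitMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        long sleepMillis = 1;
        while (true) {
            if (check.ok()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // timed out without seeing the condition
            }
            long remainingMillis = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            try {
                Thread.sleep(Math.min(sleepMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            sleepMillis = Math.min(sleepMillis * 2, 100); // back off, capped at 100 ms
        }
    }
}
```

The point of polling the condition itself, rather than checking that the pending task queue is empty, is that an empty queue no longer implies the mapping update has been sent.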

Note, this change also removes action.wait_on_mapping_change. This is an internal setting, and setting it is not recommended. It was used with the old test infrastructure to validate whether a problem was due to mapping propagation, but we have much better infra for this now.

martijnvg · 2014-06-30T11:22:57Z

src/main/java/org/elasticsearch/cluster/action/index/MappingUpdatedAction.java

+     * and sent to master for heavy single index requests that each introduce a new mapping, and when
+     * multiple shards exists on the same nodes, allowing to work on the index level in this case.
+     */
+    private class MasterMappingUpdater extends Thread {


Can we implement Runnable instead of extending from Thread?

I thought about it, but then we need to keep the runnable around, along with the thread, as variables. I like this encapsulation, and I don't think extending Thread is such a bad idea in this case. Will switch if there is strong sentiment about it.

martijnvg · 2014-06-30T11:43:16Z

Left two comments, this change looks good to me. Maybe someone else can also take a look?

bleskes · 2014-06-30T13:53:51Z

src/main/java/org/elasticsearch/node/internal/InternalNode.java

@@ -323,6 +324,8 @@ public void close() {
        stopWatch.stop().start("rivers");
        injector.getInstance(RiversManager.class).close();

+        stopWatch.stop().start("mapping");
+        injector.getInstance(MappingUpdatedAction.class).close();


I think it's safer to introduce a stop() method and call it in the stop phase, as the masterMappingUpdater thread might try to talk to the cluster service after it has been stopped, leading to errors and warning logs.
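
The suggestion above could look roughly like the following. This is only a sketch of a stop() that signals the worker and waits for it to exit, so that it cannot call into a service that has already been shut down. `UpdaterLifecycleSketch` and its members are hypothetical names, not code from this PR.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a start/stop lifecycle around a background updater thread.
final class UpdaterLifecycleSketch {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final AtomicInteger processed = new AtomicInteger();
    private volatile boolean running;
    private Thread worker;

    synchronized void start() {
        running = true;
        worker = new Thread(new Runnable() {
            @Override
            public void run() {
                while (running) {
                    try {
                        pending.take(); // a real updater would talk to the cluster service here
                        processed.incrementAndGet();
                    } catch (InterruptedException e) {
                        // woken up by stop(); the loop re-checks the running flag
                    }
                }
            }
        }, "mapping-updater-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    void submit(String update) {
        pending.add(update);
    }

    /** Signals the worker and waits for it; returns true if it exited within the timeout. */
    synchronized boolean stop(long timeoutMillis) {
        running = false;
        if (worker == null) {
            return true; // never started
        }
        worker.interrupt();
        boolean interrupted = false;
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            interrupted = true;
        }
        boolean stopped = !worker.isAlive();
        if (interrupted) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        return stopped;
    }
}
```

Calling stop() during the node's stop phase, before the services the worker depends on are stopped, is what avoids the late errors and warnings.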

bleskes · 2014-06-30T14:55:05Z

I went through the change. Bulk of it looks good. Left some minor comments. I also wonder if we should mark it as breaking because we removed the action.wait_on_mapping_change option.

kimchy · 2014-06-30T16:02:05Z

@bleskes used the support method, added a note on breaking, also bit the bullet and cleaned all calls to update mapping to include doc mapper and UUID actually used

bleskes · 2014-06-30T18:10:26Z

src/main/java/org/elasticsearch/index/mapper/object/ObjectMapper.java

-            sortedMappers.put(cursor.key, cursor.value);
-        }
+        Mapper[] sortedMappers = mappers.values().toArray(Mapper.class);
+        Arrays.sort(sortedMappers, new Comparator<Mapper>() {


cool. did this come out of profiling?
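
For readers without the full diff, the change in the snippet above swaps a sorted map that is filled entry by entry for a one-shot array copy followed by a sort. A minimal stand-alone sketch of that pattern is below. It uses `java.util.Map` rather than the collection type in the real code, and both the `Mapper` stand-in and the sort key (name) are assumptions, since the comparator body is not shown in the snippet.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

// Hypothetical sketch; Mapper here is a stand-in, and sorting by name is an assumption.
final class SortedMappersSketch {

    static final class Mapper {
        private final String name;
        Mapper(String name) { this.name = name; }
        String name() { return name; }
    }

    /** Copies the values once and sorts the array, instead of inserting into a sorted map. */
    static Mapper[] sortedByName(Map<String, Mapper> mappers) {
        Mapper[] sorted = mappers.values().toArray(new Mapper[0]);
        Arrays.sort(sorted, new Comparator<Mapper>() {
            @Override
            public int compare(Mapper a, Mapper b) {
                return a.name().compareTo(b.name());
            }
        });
        return sorted;
    }
}
```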

bleskes · 2014-06-30T19:36:17Z

+1

During phase1 we copy over all lucene segments. These may refer to mapping updates that are still queued up to be sent to master. We must make sure those pending updates are processed before completing the relocation. Relates to #6648 Closes #6762

kimchy mentioned this pull request Jun 30, 2014

Don't create mapping entry for dynamic templates #6619

Closed

jpountz assigned kimchy Jun 30, 2014

martijnvg reviewed Jun 30, 2014

kimchy added review labels Jun 30, 2014

bleskes reviewed Jun 30, 2014

bleskes removed the review label Jun 30, 2014

kimchy added the breaking label Jun 30, 2014

bleskes reviewed Jun 30, 2014

kimchy added 4 commits June 30, 2014 21:37

allow to change the additional time window dynamically

8878132

better sorting on mappers when refreshing source

c12ba7b

also, no need to call nodes info in test, we already have the node names

clean calls to mapping update to provide doc mapper and UUID always

14b418a

also use the internal cluster support method to get the list of nodes an index is on

kimchy added 5 commits June 30, 2014 21:37

reverse the order to pick the latest change first

0e6d249

remove unused field

47a72fe

and fix constructor param

712069c

move to start/stop on mapping update action

ec1e3c8

randomize INDICES_MAPPING_ADDITIONAL_MAPPING_CHANGE_TIME

bdfce12

kimchy closed this in 5273410 Jun 30, 2014

kimchy deleted the update_mapping_master_async branch June 30, 2014 20:24

bleskes mentioned this pull request Jul 7, 2014

During relocation, process pending mapping update in phase 2 #6762

Closed

clintongormley changed the title ~~Update mapping on master in async manner~~ Mappings: Update mapping on master in async manner Jul 16, 2014

im-denisenko added a commit to im-denisenko/Elastica that referenced this pull request Jan 5, 2015

Remove ES_WAIT_ON_MAPPING_CHANGE from travis

9a75bdf

Config option "action.wait_on_mapping_change" was removed since ES 1.3.0 elastic/elasticsearch#6648

im-denisenko added a commit to im-denisenko/Elastica that referenced this pull request Jan 5, 2015

Remove ES_WAIT_ON_MAPPING_CHANGE from travis

72c3987

Config option "action.wait_on_mapping_change" was removed since ES 1.3.0 elastic/elasticsearch#6648

clintongormley added :Search/Mapping, v2.0.0-beta1 and removed v2.0.0-beta1 labels Jun 6, 2015
