Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mappings: Update mapping on master in async manner #6648

Closed
wants to merge 9 commits into from

Conversation

kimchy
Copy link
Member

@kimchy kimchy commented Jun 30, 2014

Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keeps on being introduced, we can move this process to another thread that will be responsible to refresh the source and then send the update mapping to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow).
When doing so, the thread can also try and batch as much updates as possible, this is handy especially when multiple shards for the same index exists on the same node. An internal setting that can control the time to wait for batches is also added (defaults to 0).

Testing wise, a new support method on ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll to allow to wait for the concrete manifestation of mappings on all relevant nodes is added. Some tests mistakenly rely on the fact that there are no more pending tasks to mean mappings have been updated, so if we see, timing related, failures down later (all tests pass), then those will need to be fixed to wither awaitBusy on the master for the new mapping, or in the rare case, wait for the concrete mapping on all the nodes using the new method.

Note, this change also removes action.wait_on_mapping_change, this is an internal setting, and is not recommended to set it. It was used using the old test infrastructure to validate if the problem was due to mapping propagation, but we have a much better infra for this now.

* and sent to master for heavy single index requests that each introduce a new mapping, and when
* multiple shards exists on the same nodes, allowing to work on the index level in this case.
*/
private class MasterMappingUpdater extends Thread {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we implement Runnable instead of extending from Thread?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but then we need to have the runnable around, with the thread as variables. I like this encapsulation, I don't think extending Thread is such a bad idea for this case. Will switch if there is strong sentiment about it

@martijnvg
Copy link
Member

Left two comments, this change looks good to me. Maybe someone else can also take a look?

@@ -323,6 +324,8 @@ public void close() {
stopWatch.stop().start("rivers");
injector.getInstance(RiversManager.class).close();

stopWatch.stop().start("mapping");
injector.getInstance(MappingUpdatedAction.class).close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safer to introduce a stop() method and call it in the stop phase as the masterMappingUpdater thread might try to speak to the cluster service which has been stopped, leading to errors & warning logs

@bleskes
Copy link
Contributor

bleskes commented Jun 30, 2014

I went through the change. Bulk of it looks good. Left some minor comments. I also wonder if we should mark it as breaking because we removed the action.wait_on_mapping_change option.

@bleskes bleskes removed the review label Jun 30, 2014
@kimchy
Copy link
Member Author

kimchy commented Jun 30, 2014

@bleskes used the support method, added a note on breaking, also bit the bullet and cleaned all calls to update mapping to include doc mapper and UUID actually used

sortedMappers.put(cursor.key, cursor.value);
}
Mapper[] sortedMappers = mappers.values().toArray(Mapper.class);
Arrays.sort(sortedMappers, new Comparator<Mapper>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool. did this come out of profiling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@bleskes
Copy link
Contributor

bleskes commented Jun 30, 2014

+1

Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keeps on being introduced, we can move this process to another thread that will be responsible to refresh the source and then send the update mapping to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow).
When doing so, the thread can also try and batch as much updates as possible, this is handy especially when multiple shards for the same index exists on the same node. An internal setting that can control the time to wait for batches is also added (defaults to 0).

Testing wise, a new support method on ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll to allow to wait for the concrete manifestation of mappings on all relevant nodes is added. Some tests mistakenly rely on the fact that there are no more pending tasks to mean mappings have been updated, so if we see, timing related, failures down later (all tests pass), then those will need to be fixed to wither awaitBusy on the master for the new mapping, or in the rare case, wait for the concrete mapping on all the nodes using the new method.
closes elastic#6648
also, no need to call nodes info in test, we already have the node names
also use the internal cluster support method to get the list of nodes an index is on
@kimchy kimchy closed this in 5273410 Jun 30, 2014
kimchy added a commit that referenced this pull request Jun 30, 2014
Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keeps on being introduced, we can move this process to another thread that will be responsible to refresh the source and then send the update mapping to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow).
When doing so, the thread can also try and batch as much updates as possible, this is handy especially when multiple shards for the same index exists on the same node. An internal setting that can control the time to wait for batches is also added (defaults to 0).

Testing wise, a new support method on ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll to allow to wait for the concrete manifestation of mappings on all relevant nodes is added. Some tests mistakenly rely on the fact that there are no more pending tasks to mean mappings have been updated, so if we see, timing related, failures down later (all tests pass), then those will need to be fixed to wither awaitBusy on the master for the new mapping, or in the rare case, wait for the concrete mapping on all the nodes using the new method.
closes #6648
@kimchy kimchy deleted the update_mapping_master_async branch June 30, 2014 20:24
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jul 7, 2014
During phase1 we copy over all lucene segments. These make refer to mapping updates that are still queued up to be sent to master. We must make sure those pending updates are sent before completing the relocation.

Relates to elastic#6648
bleskes added a commit that referenced this pull request Jul 7, 2014
During phase1 we copy over all lucene segments. These may refer to mapping updates that are still queued up to be sent to master. We must make sure those pending updates are processed before completing the relocation.

Relates to #6648

Closes #6762
bleskes added a commit that referenced this pull request Jul 7, 2014
During phase1 we copy over all lucene segments. These may refer to mapping updates that are still queued up to be sent to master. We must make sure those pending updates are processed before completing the relocation.

Relates to #6648

Closes #6762
@clintongormley clintongormley changed the title Update mapping on master in async manner Mappings: Update mapping on master in async manner Jul 16, 2014
im-denisenko added a commit to im-denisenko/Elastica that referenced this pull request Jan 5, 2015
Config option "action.wait_on_mapping_change" was removed since ES 1.3.0
elastic/elasticsearch#6648
im-denisenko added a commit to im-denisenko/Elastica that referenced this pull request Jan 5, 2015
Config option "action.wait_on_mapping_change" was removed since ES 1.3.0
elastic/elasticsearch#6648
@clintongormley clintongormley added :Search/Mapping Index mappings, including merging and defining field types v2.0.0-beta1 and removed v2.0.0-beta1 labels Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking >enhancement :Search/Mapping Index mappings, including merging and defining field types v1.3.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants