Validate dynamic mappings updates on the master node. #10634

Merged
merged 2 commits into elastic:master from jpountz:fix/validate_mappings_on_master Apr 21, 2015

Conversation

@jpountz (Contributor) commented Apr 16, 2015

This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:

  • a failing index request on a non-existing type does not implicitly create
    the type anymore
  • dynamic mappings updates cannot create inconsistent mappings on different
    shards
  • indexing requests that introduce new fields might induce latency spikes
    because of the overhead to update the mappings on the master node

Closes #8688
Closes #8650
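(For the first bullet, a hypothetical illustration with the Java client of that era; index, type, and field names are invented for the example.)

// Hypothetical illustration (names invented): indexing into a type that does
// not exist yet. If the master rejects the dynamic mapping update, the
// "events" type is no longer created as a side effect of the failure.
try {
    client.prepareIndex("logs", "events")
          .setSource("count", "not-a-number") // suppose this produces an invalid dynamic update
          .get();
} catch (Exception e) {
    // Before this commit the type could already exist at this point despite
    // the failure; after it, the mappings are untouched because the master
    // validates the update before it is applied anywhere.
}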

@jpountz (Author, Contributor) commented Apr 16, 2015

This is still a work in progress as I have test failures in RiverTests. I suspect this is because the test triggers lots of concurrent dynamic mapping updates and there is a deadlock somewhere, possibly because of my ignorance of how our transport works (e.g. is it OK to update the mappings the way I do in MapperUtils.validateDynamicMappingsUpdateOnMaster?). Any help would be greatly appreciated.

try {
    MapperUtils.validateDynamicMappingsUpdateOnMaster(client, logger, indexService.index().name(),
            mappingUpdate.getKey(), mappingUpdate.getValue(), waitForMappingUpdatePostRecovery);
} catch (Throwable t) {
    logger.debug("failed to send mapping update post recovery to master for [{}]", t, mappingUpdate.getKey());

@rjernst (Member) commented Apr 17, 2015
This seems like a bad case to get into? At the least, should it be warn or error level? This means we replayed some operations from the translog, but the master rejected the resulting mapping updates... yet we don't seem to do anything with the validation result, so does that mean the mappings have already been updated in place? Also, could we limit the catch to just whatever exception means failed validation? Otherwise a bug as simple as an NPE in validate would get caught and merely logged.
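A sketch of the narrower handling suggested here, applied to the snippet above; MapperParsingException is only a stand-in for whatever exception actually signals failed validation:

try {
    MapperUtils.validateDynamicMappingsUpdateOnMaster(client, logger, indexService.index().name(),
            mappingUpdate.getKey(), mappingUpdate.getValue(), waitForMappingUpdatePostRecovery);
} catch (MapperParsingException e) { // stand-in type: catch only validation rejections
    logger.warn("master rejected mapping update post recovery for [{}]", e, mappingUpdate.getKey());
} // anything else (an NPE, a transport bug) now propagates instead of being swallowed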

 * Merge {@code mergeWith} into {@code mergeInto}. Note: this method only
 * merges mappings, not lookup structures. Conflicts are returned as exceptions.
 */
public static void merge(Mapping mergeInto, Mapping mergeWith) {

@rjernst (Member) commented Apr 17, 2015
This seems like a duplicate of the method above?

@jpountz (Author, Contributor) commented Apr 20, 2015
It does the same thing indeed, but on a Mapping object instead of a Mapper.

}
final PutMappingResponse response = requestBuilder.get();
if (response.isAcknowledged() == false) {
    throw new ElasticsearchIllegalStateException("Failed to validate dynamic mappings updates on the master node - response was not acknowledged");

@rjernst (Member) commented Apr 17, 2015
Is illegal state the right exception to use? I would normally reserve it for inconsistent internal state (meaning we have broken code).

}
}

if (!mergeContext.mergeFlags().simulate()) {

@rjernst (Member) commented Apr 17, 2015
Use == false like we do in many other places?
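For illustration, the quoted condition rewritten in the suggested style:

if (mergeContext.mergeFlags().simulate() == false) { // explicit comparison is harder to misread than a lone '!'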

if (docMapper.v2() != null) {
    doc.addDynamicMappingsUpdate(docMapper.v2());
}
if (doc.dynamicMappingsUpdate() != null) {

@rjernst (Member) commented Apr 17, 2015
Could you not merge this with the if above?

@jpountz (Author, Contributor) commented Apr 20, 2015
I don't think we can? It might happen that the first condition is false and the second is true if you only got dynamic updates through copy_to directives (which need to be handled differently, since they can insert dynamic mappings at arbitrary places in the mappings).
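For context, a hypothetical mapping (field names invented) showing why copy_to is special: indexing a document that only contains first_name dynamically introduces a full_name field elsewhere in the mappings, outside the field that triggered the update.

{"type":{"properties":{"first_name":{"type":"string","copy_to":"full_name"}}}}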

{"type":{"_timestamp":{"enabled":false},"_index":{"enabled":false},"_size":{"enabled":false}}}

@rjernst (Member) commented Apr 17, 2015
Removed the trailing newline?

@jpountz (Author, Contributor) commented Apr 20, 2015
Yes, so that tests can check "foo".equals(string) instead of "foo\n".equals(string).

@rjernst (Member) commented Apr 17, 2015

The mappings side looks great! My main question is about what to do on translog replay if sending mapping updates to the master fails.

@jpountz (Author, Contributor) commented Apr 17, 2015

Good question. This should hopefully not happen once mapping updates are synchronous, but I believe it could still happen when upgrading from 1.x. In that case I don't think we can do better than logging a scary warning message?

@rjernst (Member) commented Apr 19, 2015

@jpountz Ok, let's put it at warn level then?

@bleskes (Member) commented Apr 19, 2015

I think we should fail the shard in this case. It means that some data was indexed in a way that is inconsistent with the “master” mapping, in which case you probably have rogue Lucene indices. Picking up another copy is a better alternative. If we end up having no copies, we are at least clearly communicating that things are bad (the master has a mapping that is inconsistent with every shard). If people still want to recover, they can intervene and delete the translog (which may lose some data, but I think that's an acceptable solution for this bad place to be in). Also note that we flush on close during node shutdown, so the chance this will happen is small...


@jpountz removed the WIP label Apr 20, 2015
@jpountz (Author, Contributor) commented Apr 20, 2015

I added (and beasted) more tests for concurrent indexing requests. The rivers issue turned out to be specific to rivers, so for rivers I left mapping updates handled as they are today (i.e. applied locally first and then propagated to the master).

Also, failing to apply a mapping update on recovery now fails the shard.

logger.debug("waited for mapping update on master for [{}], yet timed out", type);

private void validateMappingUpdate(final String type, Mapping update) {
    final CountDownLatch latch = new CountDownLatch(1);
    final Throwable[] error = new Throwable[1];

@kimchy (Member) commented Apr 20, 2015
Can this be an AtomicReference?
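A sketch of that suggestion: an AtomicReference makes the cross-thread hand-off explicit instead of smuggling the error through a one-element array (surrounding listener code elided).

final CountDownLatch latch = new CountDownLatch(1);
final AtomicReference<Throwable> error = new AtomicReference<>(); // java.util.concurrent.atomic

// in the mapping-update listener:
//   onFailure: error.set(t); latch.countDown();
//   onResponse: latch.countDown();

latch.await();
if (error.get() != null) {
    throw new RuntimeException("failed to validate mapping update", error.get());
}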

@Override
public void run() throws InterruptedException {
    try {
        if (latch.await(waitForMappingUpdatePostRecovery.millis(), TimeUnit.MILLISECONDS) == false) {

@bleskes (Member) commented Apr 20, 2015
waitForMappingUpdatePostRecovery defaults to 30s. Given that the timeout now results in a failed shard (good!), I think we should be more lenient, especially since local gateway recovery runs on full cluster restart, when the master might be overloaded with things to do. How about something very conservative like 15m, which is what we use for the same mapping update elsewhere (see RecoverySourceHandler#updateMappingOnMaster)?
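If the timeout is exposed as a setting whose key mirrors the field name (an assumption; the exact key is not shown in this thread), the more lenient value would be configured along these lines in elasticsearch.yml:

index.gateway.wait_for_mapping_update_post_recovery: 15m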

@jpountz (Author, Contributor) commented Apr 20, 2015

@bleskes @kimchy Pushed a new commit to address your comments.

jpountz added 2 commits Apr 16, 2015
This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:
 - a failing index request on a non-existing type does not implicitly create
   the type anymore
 - dynamic mappings updates cannot create inconsistent mappings on different
   shards
 - indexing requests that introduce new fields might induce latency spikes
   because of the overhead to update the mappings on the master node

Close #8688
…ization.

As requested on #10399
@jpountz force-pushed the jpountz:fix/validate_mappings_on_master branch from e5da1df to 1adf232 Apr 21, 2015
jpountz added a commit that referenced this pull request Apr 21, 2015
Mappings: Validate dynamic mappings updates on the master node.
@jpountz merged commit ac74247 into elastic:master Apr 21, 2015
1 check passed: CLA (commit author is a member of Elasticsearch)
@purduemike commented Apr 22, 2015

+1, this is happening every day for me during an index rollover.
It looks like this fix is slated for v2.0.0. Could it instead be merged into the next minor release?

@clintongormley changed the title from "Mappings: Validate dynamic mappings updates on the master node." to "Validate dynamic mappings updates on the master node." Jun 6, 2015