Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimpleRoutingIT.testRequiredRoutingMapping fails because a doc remains undeleted #16645

Closed
polyfractal opened this issue Feb 12, 2016 · 4 comments · Fixed by #16675
Closed

SimpleRoutingIT.testRequiredRoutingMapping fails because a doc remains undeleted #16645

polyfractal opened this issue Feb 12, 2016 · 4 comments · Fixed by #16675
Assignees

Comments

@polyfractal
Copy link
Member

@polyfractal polyfractal commented Feb 12, 2016

http://build-us-00.elastic.co/job/es_core_master_metal/12355/

elastic/elasticsearch
master

SUMMARY

Please direct your attention to the attached stacktraces associated with some or all of these tests:

org.elasticsearch.routing.SimpleRoutingIT testRequiredRoutingMapping
FAILED: 1
ERROR: 0
SKIPPED: 80
TOTAL: 5644

BUILD INFO

Build   20160212182431-BE4C2D89
Log https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/406/console
Duration    12m 33s (753008ms)
Started 2016-02-12T18:24:32.000Z
Ended   2016-02-12T18:37:05.008Z
Exit Code   1
Host    slave-3740eaed (up 43 days)
OS  Ubuntu 15.04, Linux 3.19.0-42-generic
Specs   4 CPUs, 15.67GB RAM
java.version    1.8.0_45-internal
java.vm.name    OpenJDK 64-Bit Server VM
java.vm.version 25.45-b02
java.runtime.version    1.8.0_45-internal-b14
java.home   /usr/lib/jvm/java-8-openjdk-amd64

Failure replicates for me.

Due to #10136, when a _routing is defined in the mapping, "Broadcast Deletes" are no longer allowed. The testRequiredRoutingMapping test was updated in the single-delete case so that isExists(), equalTo(true) each time through the loop, since the document should not be deleted.

However, it looks like the Bulk Delete case wasn't changed, so the test is still trying to verify that the bulk delete "broadcasted" despite no _routing being specified:

for (int i = 0; i < 5; i++) {
    try {
        client().prepareGet(indexOrAlias(), "type1", "1").execute().actionGet().isExists();
        fail();
    } catch (RoutingMissingException e) {
        assertThat(e.status(), equalTo(RestStatus.BAD_REQUEST));
        assertThat(e.getMessage(), equalTo("routing is required for [test]/[type1]/[1]"));
    }
    assertThat(
      client().prepareGet(indexOrAlias(), "type1", "1")
        .setRouting("0")
        .execute()
        .actionGet()
        .isExists(),
      equalTo(false));  // <-- Here
}

I assumed this was just a test problem, so I updated it to match the single-delete case (isExists(), equalTo(true)). But that begins to fail under other seeds:

gradle :core:integTest -Dtests.seed=42425D480F1F72E8 -Dtests.class=org.elasticsearch.routing.SimpleRoutingIT -Dtests.method="testRequiredRoutingMapping" -Dtests.locale=es-CL -Dtests.timezone=AET

So either:

  1. The test was correct, Bulk broadcast deletes without _routing are allowed, and the document isn't being deleted for some reason
  2. Or the test is incorrect, bulk broadcast deletes shouldn't be allowed, but they are working anyway under certain conditions.

/cc @javanna any thoughts?

@polyfractal polyfractal added the >test label Feb 12, 2016
polyfractal added a commit that referenced this issue Feb 12, 2016
@clintongormley

This comment has been minimized.

Copy link
Member

@clintongormley clintongormley commented Feb 13, 2016

Or the test is incorrect, bulk broadcast deletes shouldn't be allowed, but they are working anyway under certain conditions.

I'd say the test is incorrect. Could they be working when the custom routing just happens to map to the same shard as default routing would?

@javanna javanna self-assigned this Feb 15, 2016
@javanna

This comment has been minimized.

Copy link
Member

@javanna javanna commented Feb 15, 2016

Wow, the change was made almost a year ago, yet I see there is something missing in bulk. Seems like bulk has always worked differently around routing required hence why the test was left unchanged. This looks like an actual bug, digging and coming up with a PR.

@javanna

This comment has been minimized.

Copy link
Member

@javanna javanna commented Feb 15, 2016

I did some digging, here are my findings.

As part of #10136 I should have removed broadcast delete from bulk as well, but I didn't realize it had a completely different code path for bulk, so it was left behind, which is why the test was still testing the broadcast delete.

The leftover broadcast delete for bulk is broken in many ways though:

  • one delete item becomes multiple delete requests, one per shard. but the response item is only one, so if one fails, you'll see the failure in the bulk response only if it was the last one returned from the shards.
  • the same delete request object was reused throughout the different shards. As a result, the version in the request was updated with the result of the last delete executed, which affects the following delete operations (if the shards are on the same node) by setting a version that is not -3 (match_any) but rather 1. This may cause stuff like VersionConflictEngineException[[type1][1]: version conflict, current version [2] is higher or equal to the one provided [1]] depending on the execution order and the number of shards.

I have no idea why this test started failing only recently, but the reason why it failed was that in some cases the broadcast delete on the shard that contained the document to delete caused a version conflict, which may get returned or not as part of the bulk response (anyways the response wasn't checked in the test). I think the follow-up is simply to remove any leftover of broadcast delete and return a bulk item failure in case routing is required but it is not specified.

@itcuihao

This comment has been minimized.

Copy link

@itcuihao itcuihao commented Feb 22, 2016

This code is what role ?Thanks.

javanna added a commit to javanna/elasticsearch that referenced this issue Feb 27, 2016
As part of elastic#10136 we removed the transport action for broadcast deletes in case routing is required but not specified. Bulk api worked differently though and kept on doing the broadcast delete internally in that case. This commit makes sure that delete items are marked as failed in such cases. Also the check has been moved up in the code together with the existing check for the update api, and we now make sure that the exception is the same as the one thrown for single document apis (delete/update).

Note that the failure for the update api contained the wrong optype (the type of the document rather than "update"), that's been fixed too and tested.

Closes elastic#16645
javanna added a commit to javanna/elasticsearch that referenced this issue Feb 29, 2016
As part of elastic#10136 we removed the transport action for broadcast deletes in case routing is required but not specified. Bulk api worked differently though and kept on doing the broadcast delete internally in that case. This commit makes sure that delete items are marked as failed in such cases. Also the check has been moved up in the code together with the existing check for the update api, and we now make sure that the exception is the same as the one thrown for single document apis (delete/update).

Note that the failure for the update api contained the wrong optype (the type of the document rather than "update"), that's been fixed too and tested.

Closes elastic#16645
javanna added a commit to javanna/elasticsearch that referenced this issue Feb 29, 2016
As part of elastic#10136 we removed the transport action for broadcast deletes in case routing is required but not specified. Bulk api worked differently though and kept on doing the broadcast delete internally in that case. This commit makes sure that delete items are marked as failed in such cases. Also the check has been moved up in the code together with the existing check for the update api, and we now make sure that the exception is the same as the one thrown for single document apis (delete/update).

Note that the failure for the update api contained the wrong optype (the type of the document rather than "update"), that's been fixed too and tested.

Closes elastic#16645
@lcawl lcawl added :Distributed/CRUD and removed :Bulk labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.