ReindexIT#testDeleteByQuery and ReindexIT#testUpdateByQuery failures #60811

Closed
DaveCTurner opened this issue Aug 6, 2020 · 3 comments · Fixed by #60834 or #73018
Labels: :Distributed/Reindex, Team:Distributed, >test-failure

Comments

@DaveCTurner
Contributor

Build scan:

https://gradle-enterprise.elastic.co/s/jbzekcqwjf7xc
https://gradle-enterprise.elastic.co/s/kbwxhs4vp764k
https://gradle-enterprise.elastic.co/s/cxcldr6dfkzba
https://gradle-enterprise.elastic.co/s/wwzzzkafjmkb4

Repro line:

e.g. REPRODUCE WITH: ./gradlew ':client:rest-high-level:asyncIntegTestRunner' --tests "org.elasticsearch.client.ReindexIT.testDeleteByQuery" -Dtests.seed=E520D301C9568B27 -Dtests.security.manager=true -Dbuild.snapshot=false -Dtests.jvm.argline="-Dbuild.snapshot=false" -Dtests.locale=ru -Dtests.timezone=EST5EDT -Druntime.java=8

Reproduces locally?:

No. There's nothing particularly helpful in the node logs either AFAICT.

Applicable branches:

Only seen on 7.8 and 7.9.

Failure history:

Four failures over the last 90 days. Maybe duplicates #46301?

Failure excerpt:

org.elasticsearch.client.ReindexIT > testDeleteByQuery FAILED
    java.lang.AssertionError:
    Expected: a collection with size <1>
         but: collection size was <0>
        at __randomizedtesting.SeedInfo.seed([E520D301C9568B27:EBEFD62A9099F4B1]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.client.ReindexIT.testDeleteByQuery(ReindexIT.java:421)

The failure indicates that the rethrottle request did not return the expected task. However, we know the task is running before the rethrottle request, since we call findTaskToRethrottle, and we know it is still running afterwards, since subsequent tests report "There are still tasks running after this test that might break subsequent tests [indices:data/write/delete/byquery]".

DaveCTurner added the >test-failure and :Distributed/Reindex labels Aug 6, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Reindex)

elasticmachine added the Team:Distributed label Aug 6, 2020
@henningandersen
Contributor

I think this happens because of other delete-by-query and update-by-query jobs running in the background. findTaskToRethrottle could pick up the wrong task and try to rethrottle that instead, which fits the symptoms (including that the task continues to run).
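To illustrate the suspected mismatch, here is a minimal, self-contained sketch. It does not use the Elasticsearch client. `RunningTask`, the task IDs, the descriptions, and both finder methods are hypothetical stand-ins; only the action name comes from the report above. It assumes that a lookup keyed on the action alone can return a background job's task, and that adding a test-specific marker (here a substring of the description) narrows the match. How the real test narrows the match is not stated in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TaskMatchSketch {

    /** Hypothetical stand-in for one entry in a task listing; not an Elasticsearch type. */
    public static final class RunningTask {
        final String id;
        final String action;
        final String description;

        public RunningTask(String id, String action, String description) {
            this.id = id;
            this.action = action;
            this.description = description;
        }
    }

    /** Loose match: every task with the given action, including unrelated background jobs. */
    public static List<String> findByAction(List<RunningTask> tasks, String action) {
        return tasks.stream()
            .filter(t -> t.action.equals(action))
            .map(t -> t.id)
            .collect(Collectors.toList());
    }

    /** Stricter match: same action AND a description containing a marker unique to the test. */
    public static List<String> findByActionAndDescription(
            List<RunningTask> tasks, String action, String marker) {
        return tasks.stream()
            .filter(t -> t.action.equals(action) && t.description.contains(marker))
            .map(t -> t.id)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String action = "indices:data/write/delete/byquery";
        // Two tasks share the action; only the second belongs to the (hypothetical) test index.
        List<RunningTask> tasks = Arrays.asList(
            new RunningTask("node1:10", action, "delete-by-query [background-index]"),
            new RunningTask("node1:11", action, "delete-by-query [source1]"));

        System.out.println("by action only: " + findByAction(tasks, action));
        System.out.println("by action and description: "
            + findByActionAndDescription(tasks, action, "[source1]"));
    }
}
```

With the loose match, a caller that takes the first result may rethrottle the background task, so the task the test started keeps running untouched.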

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Aug 6, 2020
ReindexIT would rethrottle any delete or update by query task, fixed to
more precisely match the task started by the test.

#Closes elastic#60811
henningandersen added a commit that referenced this issue Aug 11, 2020
ReindexIT would rethrottle any delete or update by query task, fixed to
more precisely match the task started by the test.

Closes #60811
@tlrx
Member

tlrx commented Apr 30, 2021

The test failed again today with a similar failure:

org.elasticsearch.client.ReindexIT > testDeleteByQuery FAILED
    java.lang.AssertionError: 
    Expected: a collection with size <1>
         but: collection size was <0>
        at __randomizedtesting.SeedInfo.seed([EA8533146648813C:E44A363F3F87FEAA]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.client.ReindexIT.testDeleteByQuery(ReindexIT.java:266)

Build scan:

Build stats history reveals more failures over the past two months:

I'm reopening this issue for investigation and I'll mute the test.

tlrx reopened this Apr 30, 2021
tlrx added a commit that referenced this issue Apr 30, 2021
tlrx added a commit that referenced this issue Apr 30, 2021
chengyang14 pushed a commit to chengyang14/elasticsearch that referenced this issue May 7, 2021
henningandersen added a commit to henningandersen/elasticsearch that referenced this issue May 12, 2021
Reindex and friends have tasks that start but are not ready to
rethrottle before they figured out if they are leader or worker
tasks. Now wait for the task to fully start before rethrottling.

Also added additional assertions to help see if the inability
to rethrottle is caused by some failure.

Closes elastic#60811
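
The "wait for the task to fully start" idea in the commit message above can be sketched as a simple poll-until-ready loop. This is an illustration only, not the code from the commit: `waitUntil`, its parameters, and the simulated readiness check are all hypothetical, and the real test presumably checks the task's actual state through the client rather than a counter.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class WaitBeforeRethrottleSketch {

    /**
     * Polls the readiness check until it returns true or the timeout elapses.
     * Returns true if the check passed, false on timeout or interruption.
     */
    public static boolean waitUntil(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated task: reports "not ready" on the first two polls, as if it had
        // started but not yet decided whether it is a leader or a worker.
        AtomicInteger polls = new AtomicInteger();
        BooleanSupplier taskReady = () -> polls.incrementAndGet() >= 3;

        if (waitUntil(taskReady, 1000, 10)) {
            System.out.println("task ready after " + polls.get() + " polls; safe to rethrottle");
        } else {
            System.out.println("task never became ready; fail the test with a clear message");
        }
    }
}
```

Failing explicitly on timeout, rather than proceeding to rethrottle, keeps a slow start distinguishable from a rethrottle that matched no task.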
henningandersen added a commit that referenced this issue Jul 1, 2021
Reindex and friends have tasks that start but are not ready to
rethrottle before they figured out if they are leader or worker
tasks. Now wait for the task to fully start before rethrottling.

Also added additional assertions to help see if the inability
to rethrottle is caused by some failure.

Closes #60811