[CI] TransformInsufficientPermissionsIT testTransformPermissionsDeferUnattendedNoDest failing #105683

Closed
idegtiarenko opened this issue Feb 21, 2024 · 5 comments · Fixed by #105759
Assignees
Labels
blocker :ml/Transform Transform Team:ML Meta label for the ML team >test-failure Triaged test failures from CI

Comments

@idegtiarenko

idegtiarenko commented Feb 21, 2024

Build scan:
https://gradle-enterprise.elastic.co/s/q6zf5ubzzzd2a/tests/:x-pack:plugin:transform:qa:multi-node-tests:javaRestTest/org.elasticsearch.xpack.transform.integration.TransformInsufficientPermissionsIT/testTransformPermissionsDeferUnattendedNoDest

Reproduction line:

./gradlew ':x-pack:plugin:transform:qa:multi-node-tests:javaRestTest' --tests "org.elasticsearch.xpack.transform.integration.TransformInsufficientPermissionsIT.testTransformPermissionsDeferUnattendedNoDest" -Dtests.seed=79583EF32FA6B70B -Dtests.locale=el-GR -Dtests.timezone=Pacific/Enderbury -Druntime.java=21

Applicable branches:
main, 8.13

Reproduces locally?:
Didn't try

Failure history:
Failure dashboard for org.elasticsearch.xpack.transform.integration.TransformInsufficientPermissionsIT#testTransformPermissionsDeferUnattendedNoDest

Failure excerpt:

java.lang.AssertionError: Stats were: {checkpointing={last={checkpoint=1, timestamp_millis=1708495449698, time_upper_bound_millis=1708495448698}}, node={transport_address=127.0.0.1:40149, name=javaRestTest-1, attributes={}, id=mkiRJ3gTS7CPfwTEIHwrvA, ephemeral_id=sm-DQ1zAQWa_utYiFknKzA}, stats={pages_processed=1, index_time_in_ms=0, documents_deleted=0, search_failures=0, index_failures=0, search_total=1, processing_total=1, delete_time_in_ms=0, documents_indexed=0, trigger_count=2, documents_processed=0, search_time_in_ms=12, index_total=0, exponential_avg_checkpoint_duration_ms=53.0, exponential_avg_documents_processed=0.0, processing_time_in_ms=0, exponential_avg_documents_indexed=0.0}, health={issues=[{issue=Privileges check failed, count=1, details=Cannot create transform [transform-permissions-defer-unattended] because user john_junior lacks the required permissions [transform-permissions-defer-unattended-dest:[create_index, index, read], transform-permissions-defer-unattended-index:[read, view_index_metadata]], type=privileges_check_failed}], status=red}, id=transform-permissions-defer-unattended, state=started}
Expected: a collection with size <2>
     but: collection size was <1>

  at __randomizedtesting.SeedInfo.seed([79583EF32FA6B70B:DC424CAF6C04221A]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2123)
  at org.elasticsearch.xpack.transform.integration.TransformInsufficientPermissionsIT.assertRed(TransformInsufficientPermissionsIT.java:591)
  at org.elasticsearch.xpack.transform.integration.TransformInsufficientPermissionsIT.testTransformPermissionsDeferUnattendedNoDest(TransformInsufficientPermissionsIT.java:430)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

@idegtiarenko idegtiarenko added :ml/Transform Transform >test-failure Triaged test failures from CI labels Feb 21, 2024
@elasticsearchmachine elasticsearchmachine added blocker Team:ML Meta label for the ML team labels Feb 21, 2024
@elasticsearchmachine

Pinging @elastic/ml-core (Team:ML)

@prwhelan prwhelan self-assigned this Feb 21, 2024
@prwhelan

Expected: a collection with size <2>
     but: collection size was <1>
{
  "checkpointing": {
    "last": {
      "checkpoint": "1",
      "timestamp_millis": "1708495449698",
      "time_upper_bound_millis": "1708495448698"
    }
  },
  "node": {
    "transport_address": "127.0.0.1:40149",
    "name": "javaRestTest-1",
    "attributes": {},
    "id": "mkiRJ3gTS7CPfwTEIHwrvA",
    "ephemeral_id": "sm-DQ1zAQWa_utYiFknKzA"
  },
  "stats": {
    "pages_processed": "1",
    "index_time_in_ms": "0",
    "documents_deleted": "0",
    "search_failures": "0",
    "index_failures": "0",
    "search_total": "1",
    "processing_total": "1",
    "delete_time_in_ms": "0",
    "documents_indexed": "0",
    "trigger_count": "2",
    "documents_processed": "0",
    "search_time_in_ms": "12",
    "index_total": "0",
    "exponential_avg_checkpoint_duration_ms": "53.0",
    "exponential_avg_documents_processed": "0.0",
    "processing_time_in_ms": "0",
    "exponential_avg_documents_indexed": "0.0"
  },
  "health": {
    "issues": [
      {
        "issue": "Privileges check failed",
        "count": "1",
        "details": "Cannot create transform [transform-permissions-defer-unattended] because user john_junior lacks the required permissions [transform-permissions-defer-unattended-dest:[create_index, index, read], transform-permissions-defer-unattended-index:[read, view_index_metadata]]",
        "type": "privileges_check_failed"
      }
    ],
    "status": "red"
  },
  "id": "transform-permissions-defer-unattended",
  "state": "started"
}

Looks like we were expecting a second issue to exist, which is odd because we do have the call in the logs:

  1> [2024-02-14T14:32:59,181][INFO ][o.e.x.t.i.TransformInsufficientPermissionsIT] [testTransformPermissionsDeferUnattendedNoDest] Transform audit: [2024-02-14T14:32:58.747Z] [transform-permissions-defer-unattended] [Created transform.] [javaRestTest-1]	
  1> [2024-02-14T14:32:59,182][INFO ][o.e.x.t.i.TransformInsufficientPermissionsIT] [testTransformPermissionsDeferUnattendedNoDest] Transform audit: [2024-02-14T14:32:59.016Z] [transform-permissions-defer-unattended] [Updated transform state to [STARTED].] [javaRestTest-1]	
  1> [2024-02-14T14:32:59,182][INFO ][o.e.x.t.i.TransformInsufficientPermissionsIT] [testTransformPermissionsDeferUnattendedNoDest] Transform audit: [2024-02-14T14:32:59.037Z] [transform-permissions-defer-unattended] [Transform encountered an exception: [Could not create destination index [transform-permissions-defer-unattended-dest] for transform [transform-permissions-defer-unattended]]; Will automatically retry [1/-1]] [javaRestTest-1]	

And we see at least 2 iterations: "trigger_count": "2"

Can't repro with 100 iterations, will try again tomorrow =)
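
For readers following along, the assertion failure above amounts to a size check on `health.issues` in the stats response. A minimal sketch of that check, assuming the response has already been parsed into nested maps; the `HealthIssueCount` class and `issueCount` helper are hypothetical and are not part of the test class:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper; the real test does this inside assertRed with a size matcher. */
final class HealthIssueCount {

    /** Returns how many entries the "issues" list under "health" contains. */
    @SuppressWarnings("unchecked")
    static int issueCount(Map<String, Object> stats) {
        Map<String, Object> health = (Map<String, Object>) stats.get("health");
        List<Object> issues = (List<Object>) health.get("issues");
        return issues.size();
    }

    public static void main(String[] args) {
        // Trimmed copy of the stats from the failing run: only one issue is present.
        Map<String, Object> issue = Map.of("type", "privileges_check_failed", "count", "1");
        Map<String, Object> health = Map.of("status", "red", "issues", List.of(issue));
        Map<String, Object> stats = Map.of("id", "transform-permissions-defer-unattended", "health", health);

        System.out.println(issueCount(stats)); // prints 1, while the test expected 2
    }
}
```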

@prwhelan

Our audit log for this test:

  1> [2024-02-21T19:04:09,813][INFO ][o.e.x.t.i.TransformInsufficientPermissionsIT] [testTransformPermissionsDeferUnattendedNoDest] Transform audit: [2024-02-21T06:04:04.682Z] [transform-permissions-defer-unattended] [Transform encountered an exception: [Could not create destination index [transform-permissions-defer-unattended-dest] for transform [transform-permissions-defer-unattended]]; Will automatically retry [1/-1]] [javaRestTest-1]	

Corresponding stack trace:

» [2024-02-21T06:04:04,663][ERROR][o.e.x.t.p.TransformIndex ] [javaRestTest-1] Could not create destination index [transform-permissions-defer-unattended-dest] for transform [transform-permissions-defer-unattended] org.elasticsearch.ElasticsearchSecurityException: action [indices:admin/create] is unauthorized for user [john_junior] with effective roles [transform_admin] on indices [transform-permissions-defer-unattended-dest], this action is granted by the index privileges [create_index,manage,all]
»  

The next stack trace is here:

» [2024-02-21T06:04:09,729][WARN ][o.e.x.t.t.TransformIndexer] [javaRestTest-1] [transform-permissions-defer-unattended] unable to load progress information for task. org.elasticsearch.ElasticsearchSecurityException: action [indices:data/read/search] is unauthorized for user [john_junior] with effective roles [transform_admin] on indices [transform-permissions-defer-unattended-index], this action is granted by the index privileges [read,all]	

That code path shouldn't run: we should be calling the failure handler every time, because the destination index cannot be created.

Confirmed in tests that the health blob can cycle from 1 issue to 2 and then back to 1. We seem to be logging the issue correctly, but then incorrectly overwriting it?

@prwhelan

https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformIndexer.java#L367

When we retry, this check now evaluates to true, so we do not reattempt creating the index, and the next permission check will not invoke the failure listener so the previous failure gets reset.
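
A minimal sketch of that sequence, written as a toy model rather than the real `TransformIndexer`. The class, its fields, and the `checkpoint > 0` guard are assumptions made for illustration; the actual check is the one at the linked line and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the retry path; every name here is hypothetical. */
final class BuggyRetrySketch {
    private long checkpoint = 0;        // assumed to drive the guard that skips index creation
    private String lastFailure = null;  // what the failure listener last recorded

    /** One trigger of the transform. In this model the user can never create the destination index. */
    void trigger() {
        if (checkpoint > 0) {
            // Guard is true on retry: index creation is skipped, nothing calls the
            // failure listener, and the earlier failure is cleared.
            lastFailure = null;
            return;
        }
        // First run: creation is attempted, fails, and the failure listener records it.
        lastFailure = "could_not_create_destination_index";
        checkpoint++; // the behaviour being modelled: progress is recorded despite the failure
    }

    /** Issues as the health blob would list them. */
    List<String> issues() {
        List<String> issues = new ArrayList<>();
        issues.add("privileges_check_failed"); // present throughout in this scenario
        if (lastFailure != null) {
            issues.add(lastFailure);
        }
        return issues;
    }

    public static void main(String[] args) {
        BuggyRetrySketch transform = new BuggyRetrySketch();
        transform.trigger();
        System.out.println(transform.issues().size()); // prints 2
        transform.trigger();
        System.out.println(transform.issues().size()); // prints 1
    }
}
```

If the test samples the stats after the second trigger, it sees one issue instead of two, which is consistent with the `trigger_count` of 2 in the failing run.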

prwhelan added a commit to prwhelan/elasticsearch that referenced this issue Feb 22, 2024
For Unattended Transforms, if we fail to create the destination index on
the first run, we will retry the transformation iteration, but we will
not retry the destination index creation on that next iteration.

This change stops the Unattended Transform from progressing beyond the
0th checkpoint, so all retries will include the destination index
creation.

Fix elastic#105683
prwhelan added a commit that referenced this issue Feb 23, 2024
For Unattended Transforms, if we fail to create the destination index on
the first run, we will retry the transformation iteration, but we will
not retry the destination index creation on that next iteration.

This change stops the Unattended Transform from progressing beyond the
0th checkpoint, so all retries will include the destination index
creation.

Fix #105683
Relate #104146
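
The effect of the fix can be sketched with a small toy model. This is an illustration under assumptions, not the code from the commit: the class, its fields, and the `checkpoint > 0` guard are hypothetical, and the only behaviour taken from the commit message is that a failed run does not progress beyond checkpoint 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the fixed retry path; every name here is hypothetical. */
final class FixedRetrySketch {
    private final boolean canCreateDestIndex; // false models the under-privileged user
    private long checkpoint = 0;
    private String lastFailure = null;
    private int createAttempts = 0;

    FixedRetrySketch(boolean canCreateDestIndex) {
        this.canCreateDestIndex = canCreateDestIndex;
    }

    /** One trigger of the transform. */
    void trigger() {
        if (checkpoint > 0) {
            // Only reachable once the destination index exists.
            lastFailure = null;
            return;
        }
        createAttempts++;
        if (canCreateDestIndex == false) {
            // Failure listener records the problem and the checkpoint stays at 0,
            // so the next trigger attempts the creation again.
            lastFailure = "could_not_create_destination_index";
            return;
        }
        lastFailure = null;
        checkpoint++;
    }

    int createAttempts() {
        return createAttempts;
    }

    /** Issues as the health blob would list them in the permissions test. */
    List<String> issues() {
        List<String> issues = new ArrayList<>();
        issues.add("privileges_check_failed");
        if (lastFailure != null) {
            issues.add(lastFailure);
        }
        return issues;
    }

    public static void main(String[] args) {
        FixedRetrySketch transform = new FixedRetrySketch(false);
        for (int i = 0; i < 3; i++) {
            transform.trigger();
        }
        System.out.println(transform.createAttempts()); // prints 3
        System.out.println(transform.issues().size());  // prints 2
    }
}
```

In this model the issue count stays at two no matter how many retries happen, because every retry goes through the index-creation attempt again.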
@przemekwitek

When we retry, this check now evaluates to true, so we do not reattempt creating the index, and the next permission check will not invoke the failure listener so the previous failure gets reset.

Good catch, Pat!
