Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeleteExpiredDataIT.testDeleteExpiredData failure #39575

Closed
gwbrown opened this issue Mar 1, 2019 · 8 comments · Fixed by #39683 or #39574
Closed

DeleteExpiredDataIT.testDeleteExpiredData failure #39575

gwbrown opened this issue Mar 1, 2019 · 8 comments · Fixed by #39683 or #39574
Assignees
Labels
:ml Machine learning >test-failure Triaged test failures from CI

Comments

@gwbrown
Copy link
Contributor

gwbrown commented Mar 1, 2019

Failed build at https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/813/

Does not reproduce locally.
Reproduce block:

./gradlew :x-pack:plugin:ml:qa:native-multi-node-tests:integTestRunner \
  -Dtests.seed=26207D14EDCE60BC \
  -Dtests.class=org.elasticsearch.xpack.ml.integration.DeleteExpiredDataIT \
  -Dtests.method="testDeleteExpiredData" \
  -Dtests.security.manager=true \
  -Dtests.locale=pl-PL \
  -Dtests.timezone=Etc/GMT \
  -Dcompiler.java=11 \
  -Druntime.java=8

There are a few interesting stack traces here, but this one jumped out at me, and happens a lot:


[2019-03-01T07:39:15,383][DEBUG][o.e.a.b.TransportShardBulkAction] [node-1] [.ml-state][3] failed to execute bulk item (index) index {[.ml-state-write][doc][non_existing_job_categorizer_state#859], source[{}]}
--
java.lang.IllegalArgumentException: Rejecting mapping update to [.ml-state] as the final mapping would have more than 1 type: [_doc, doc]
at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:449) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:398) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:331) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:315) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:238) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]

But this is also present:


[2019-03-01T07:46:45,108][WARN ][o.e.p.PersistentTasksNodeService] [node-2] task datafeed-realtime-job-given-process-is-killed-datafeed failed with an exception
--
org.elasticsearch.xpack.ml.datafeed.DatafeedJob$AnalysisProblemException: ElasticsearchStatusException[Cannot perform requested action because job [realtime-job-given-process-is-killed] is not open]
at org.elasticsearch.xpack.ml.datafeed.DatafeedJob.flushJob(DatafeedJob.java:457) ~[x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.ml.datafeed.DatafeedJob.run(DatafeedJob.java:396) ~[x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.ml.datafeed.DatafeedJob.runRealtime(DatafeedJob.java:177) ~[x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.ml.datafeed.DatafeedManager$Holder.executeRealTime(DatafeedManager.java:397) ~[x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.ml.datafeed.DatafeedManager$Holder.access$600(DatafeedManager.java:290) [x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.ml.datafeed.DatafeedManager$3.doRun(DatafeedManager.java:227) [x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.elasticsearch.ElasticsearchStatusException: Cannot perform requested action because job [realtime-job-given-process-is-killed] is not open
at org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper.conflictStatusException(ExceptionsHelper.java:50) ~[?:?]
at org.elasticsearch.xpack.ml.action.TransportJobTaskAction.doExecute(TransportJobTaskAction.java:55) ~[?:?]
at org.elasticsearch.xpack.ml.action.TransportJobTaskAction.doExecute(TransportJobTaskAction.java:31) ~[?:?]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:145) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$apply$0(SecurityActionFilter.java:86) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$authorizeRequest$4(SecurityActionFilter.java:171) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$4(AuthorizationService.java:237) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:599) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:574) ~[?:?]
at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.security.authz.RBACEngine.authorizeClusterAction(RBACEngine.java:140) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationService.authorizeAction(AuthorizationService.java:239) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationService.maybeAuthorizeRunAs(AuthorizationService.java:223) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorize$1(AuthorizationService.java:189) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.security.authz.RBACEngine.lambda$resolveAuthorizationInfo$1(RBACEngine.java:113) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.getRoles(CompositeRolesStore.java:193) ~[?:?]
at org.elasticsearch.xpack.security.authz.RBACEngine.getRoles(RBACEngine.java:119) ~[?:?]
at org.elasticsearch.xpack.security.authz.RBACEngine.resolveAuthorizationInfo(RBACEngine.java:107) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationService.authorize(AuthorizationService.java:191) ~[?:?]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.authorizeRequest(SecurityActionFilter.java:171) ~[?:?]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$3(SecurityActionFilter.java:157) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:245) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$6(AuthenticationService.java:305) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:316) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:243) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:195) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:138) ~[?:?]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.applyInternal(SecurityActionFilter.java:154) ~[?:?]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$apply$2(SecurityActionFilter.java:100) ~[?:?]
at org.elasticsearch.xpack.core.security.SecurityContext.executeAsUser(SecurityContext.java:97) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationUtils.switchUserBasedOnActionOriginAndExecute(AuthorizationUtils.java:115) ~[?:?]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:98) ~[?:?]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:143) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:121) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:64) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:394) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:383) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.xpack.ml.datafeed.DatafeedJob.flushJob(DatafeedJob.java:444) ~[x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
... 10 more
[2019-03-01T07:46:45,112][INFO ][o.e.x.m.d.DatafeedManager] [node-2] [realtime_analysis_error] datafeed [realtime-job-given-process-is-killed-datafeed] for job [realtime-job-given-process-is-killed] has been stopped
@gwbrown gwbrown added >test-failure Triaged test failures from CI :ml Machine learning labels Mar 1, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@davidkyle
Copy link
Member

This will be closed by #39574

@droberts195
Copy link
Contributor

There are two interrelated problems:

  1. The type conflict makes the test very slow - the error gets logged 10010 times which takes a long time - see also ML QA tasks are slow, take up 20 minutes of build time #37339 (comment)
  2. If the slowness results in the job taking longer than 10 seconds to run then an assertBusy times out and results in the test failing, but not in the most obvious way because the C++ process gets killed by the cleanup and that appears as the cause of the failure rather than the fact that the assertBusy timed out

@droberts195
Copy link
Contributor

droberts195 commented Mar 1, 2019

I hope and believe that this is only broken in the master branch at the moment. For 7.x and 7.0 we'll incorporate the fixes of #39574 into the backport PRs for #39256, namely #39573 and #39572.

@gwbrown
Copy link
Contributor Author

gwbrown commented Mar 3, 2019

@gwbrown gwbrown reopened this Mar 3, 2019
droberts195 added a commit that referenced this issue Mar 3, 2019
droberts195 added a commit that referenced this issue Mar 3, 2019
droberts195 added a commit that referenced this issue Mar 3, 2019
@droberts195
Copy link
Contributor

I have muted this test in the following commits:

master: fcd72ec
7.x: 085ff38
7.0: bea2628

@dimitris-athanasiou
Copy link
Contributor

The new failure is that the test timed out while waiting for the jobs to close. This was probably caused due to slow workers and my changes that we now run the 5 jobs in parallel. I'm looking into what should the best solution be.

@original-brownbear
Copy link
Member

link #39658 which seems to cause this. Maybe it's an option to simply turn off the audit logging for this test?

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Mar 5, 2019
Investigating how to make DeleteExpiredDataIT faster, it was
revealed that the security audit trail threads were quite hot.
Disabling that seems to be helping quite a bit with making this
test faster. This commit also unmutes the test to see how it goes
with the audit trail disabled.

Relates elastic#39658
Closes elastic#39575
dimitris-athanasiou added a commit that referenced this issue Mar 5, 2019
Investigating how to make DeleteExpiredDataIT faster, it was
revealed that the security audit trail threads were quite hot.
Disabling that seems to be helping quite a bit with making this
test faster. This commit also unmutes the test to see how it goes
with the audit trail disabled.

Relates #39658
Closes #39575
dimitris-athanasiou added a commit that referenced this issue Mar 5, 2019
Investigating how to make DeleteExpiredDataIT faster, it was
revealed that the security audit trail threads were quite hot.
Disabling that seems to be helping quite a bit with making this
test faster. This commit also unmutes the test to see how it goes
with the audit trail disabled.

Relates #39658
Closes #39575
dimitris-athanasiou added a commit that referenced this issue Mar 5, 2019
Investigating how to make DeleteExpiredDataIT faster, it was
revealed that the security audit trail threads were quite hot.
Disabling that seems to be helping quite a bit with making this
test faster. This commit also unmutes the test to see how it goes
with the audit trail disabled.

Relates #39658
Closes #39575
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml Machine learning >test-failure Triaged test failures from CI
Projects
None yet
6 participants