CI failures due to indices not being cleaned up as expected #46091
Pinging @elastic/es-core-infra
Some doc test failures that are probably the same / related:
Also #45600
Have a look at the server-side logs. Do they show something like:
If so, then it's not just the test client cleanup code that's to blame, but the same server code that customers will be running in production, and this is looking like a big worry for 7.4. See #45600 (comment)
@DaveCTurner I believe you're working on this now as well. So ping :)
Yep, I'm seeing if I can reproduce this, specifically the test failure #45956. I did see that test fail on my machine reporting a similar failure.
This failure appears to be related: https://gradle-enterprise.elastic.co/s/6q3xz77vvfo7m/console-log#L3863
Duplicate of #45605?
Another build failure, seems to be related: https://gradle-enterprise.elastic.co/s/wbwkzcamo7a4e
@original-brownbear looks like this is still causing builds to fail. Any thoughts?
@atorok the team is looking into this this morning.
In order to track down #46091: enables debug logging in REST tests for the `master` and `coordination` packages, since we suspect that the issues are caused by failed and then retried publications.
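As an aside, package-level debug logging like this can also be raised on a running cluster via the dynamic logger settings in the cluster settings API. This is a hedged sketch for reference only — the PR itself changed the REST test clusters' logging config, not cluster settings, and the exact package names here are assumptions:

```shell
# Hypothetical sketch: turn on DEBUG logging for the coordination and
# master-service packages on a running cluster. The package names below
# are assumed, not taken from the PR.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{
  "transient": {
    "logger.org.elasticsearch.cluster.coordination": "DEBUG",
    "logger.org.elasticsearch.cluster.service": "DEBUG"
  }
}'
```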
Further investigation into elastic#46091, expanding on elastic#46363, to add even more detailed logging around the retry behaviour during index creation.
We have an explanation for why index creation might result in a ResourceAlreadyExistsException. We can reproduce this by waiting, after the index creation has completed, for the client to time out:
diff --git a/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java b/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java
index 9bfbec9..1dc3163 100644
--- a/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java
+++ b/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java
@@ -215,6 +215,11 @@ public class MetaDataCreateIndexService {
} else {
logger.trace("[{}] index created and shards acknowledged", request.index());
}
+ try {
+ Thread.sleep(30000);
+ } catch (InterruptedException e) {
+ throw new AssertionError(e);
+ }
listener.onResponse(new CreateIndexClusterStateUpdateResponse(response.isAcknowledged(), shardsAcknowledged));
}, listener::onFailure);
} else {

We can extend the client's timeout to work around this, and I think it would be good to stop it from retrying at all, to save us from going through the same deeply confusing investigation the next time round. However, the fundamental issue is why it is taking over 30 seconds to create an index in CI. The one that we investigated in detail took ~6 sec to write the metadata to disk when the index was first created (confirmed by …)
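The race described above can be sketched without any Elasticsearch code at all. Below is a minimal, self-contained simulation (all names are invented for illustration, and the millisecond delays stand in for the real 30-second timeout): the server-side create succeeds but outlasts the client's timeout, so the client's retry observes "already exists" even though nothing actually failed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation (not Elasticsearch code) of the timeout-then-retry
// failure mode described in this issue.
public class RetryRaceSketch {
    static final Set<String> indices = ConcurrentHashMap.newKeySet();

    // Simulated server-side index creation; 'delayMs' stands in for the slow
    // cluster-state publication seen in CI.
    static String createIndex(String name, long delayMs) {
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            throw new AssertionError(e);
        }
        return indices.add(name) ? "created" : "resource_already_exists_exception";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService server = Executors.newSingleThreadExecutor();
        // First attempt takes 200 ms server-side.
        Future<String> first = server.submit(() -> createIndex("test-idx", 200));
        String outcome;
        try {
            // Client only waits 50 ms (stand-in for the default 30 s timeout).
            outcome = first.get(50, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The client gives up and retries, but the first attempt still
            // completes server-side, so the retry fails even though the index
            // was created successfully.
            first.get(); // let the slow first attempt finish
            outcome = createIndex("test-idx", 0);
        }
        System.out.println(outcome); // prints "resource_already_exists_exception"
        server.shutdown();
    }
}
```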
We are seeing requests take more than the default 30 s, which leads to requests being retried and returning unexpected failures such as "index already exists", because the initial requests that timed out actually worked out functionally anyway. => Double the timeout to reduce the likelihood of the failures described in elastic#46091. => As suggested in the issue, we should probably turn off retrying altogether in a follow-up.
Cache pushing happens at the end of each task's execution, so it happens as the build progresses. Because it's part of task execution, it's effectively limited by existing parallelism. With the local cache disabled, the additional IO here should be minimal: we've already built the outputs, and Gradle streams the result to the remote server. The only IO overhead is in the form of reads, and those are probably going to be bottlenecked by network IO to the remote cache anyhow. From what I can tell it's mostly write IOPS hurting tests. In other words, I suspect the addition of the remote build cache amounts to negligible IO overhead compared to the build/tests themselves. I've done experiments in the past that confirm this suspicion as well.
Another relevant failure: https://gradle-enterprise.elastic.co/s/drahj2lhx6rvu/console-log?task=:docs:integTestRunner
Not sure those are the same failure, @matriv; it's more likely something specific to that particular test. I suggest we open a different issue for it and mute it.
I'm going ahead and closing this issue to avoid confusion.
The trace logging was added for #46091. Now that it's closed we can remove it.
Older versions don't support component / composable index templates and/or data streams, yet the test base class tries to remove these objects after each test, which adds a significant number of lines to the log files (and slows the tests down). The ESRestTestCase will now check whether all nodes are on a specific version and then decide whether data streams and component / composable index templates should be deleted. Also removed the old debug log config that was enabled to investigate a build failure (elastic#46091), which has since been closed; the debug logging added many log lines to the log files. Relates to elastic#69973
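The "check whether all nodes have a specific version" gate can be sketched as follows. This is a hypothetical illustration with invented names, not the actual ESRestTestCase code:

```java
import java.util.List;

// Hypothetical sketch (names invented; not the actual ESRestTestCase code):
// only attempt cleanup of data streams / composable templates when every
// node in the cluster is on a version that supports them.
public class VersionGatedCleanup {
    // Compare a "major.minor[.patch]" version string against a required
    // minimum; the patch component, if present, is ignored.
    static boolean atLeast(String version, int major, int minor) {
        String[] parts = version.split("\\.");
        int ma = Integer.parseInt(parts[0]);
        int mi = Integer.parseInt(parts[1]);
        return ma > major || (ma == major && mi >= minor);
    }

    // Clean up only if ALL nodes meet the minimum version; a single older
    // node means the API may not exist, and issuing the DELETE would just
    // add noise to the logs.
    static boolean allNodesAtLeast(List<String> nodeVersions, int major, int minor) {
        return nodeVersions.stream().allMatch(v -> atLeast(v, major, minor));
    }
}
```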
…0361) Backport of the testing-related changes from #70314: Older versions don't support component / composable index templates and/or data streams, yet the test base class tries to remove these objects after each test, which adds a significant number of lines to the log files (and slows the tests down). The ESRestTestCase will now check whether all nodes are on a specific version and then decide whether data streams and component / composable index templates should be deleted. Also ensured that the logstash-index-template and security-index-template aren't deleted between tests: these are builtin templates that ES will install if missing, so if tests remove them ES will add them back almost immediately, causing many log lines and a lot of cluster state updates that slow tests down. Also removed the old debug log config that was enabled to investigate a build failure (#46091), which has since been closed; the debug logging added many log lines to the log files. Note this change wasn't part of #70314. Relates to #69973
…70364) Backporting #70361 to the 7.11 branch (same description as #70361 above).
…0363) Backport of #70361 to the 7.12 branch (same description as #70361 above).
There appears to be some kind of problem with the cleanup code that ensures all indices are cleaned up between tests. It's difficult to be sure, but there seems to be a trend of a wide variety of tests failing because of a ResourceAlreadyExistsException when creating an index, with no visible cause in the test itself. I noticed this in this master intake build, which is IndexLifecycleIT.testStartStopILM, a test that creates an index as one of the first operations it does; the cluster should be blank at this point. Looking at build stats for failures with messages containing resource_already_exists_exception, we can see it was very quiet prior to July 23 and has built up more and more since, suggesting this may be related to recent build changes. [Note: the graph excludes a 10-minute period on Aug. 15 which had 608 failures, to make the scale clearer.]
There is no clear relation between the tests which fail due to indices already existing that shouldn't and a specific functional area. It appears to happen most often to client and YAML tests, although not exclusively. In a few spot checks I've seen:
IndexLifecycleIT.testStartStopILM
CCSDuelIT.testPagination
test {yaml=search/240_date_nanos/date_nanos requires dates after 1970 and before 2262}
IndicesClientIT.testCloseExistingIndex
DatafeedJobsRestIT.testLookbackWithGeo
All of these appear to have failed while creating indices when the cluster should be a blank slate.
I believe this is also the cause of #45605 and #45805.