
CI failures due to indices not being cleaned up as expected #46091

Closed · gwbrown opened this issue Aug 28, 2019 · 48 comments

Labels: :Delivery/Build (Build or test infrastructure), Team:Delivery (Meta label for Delivery team), >test (Issues or PRs that are addressing/adding tests), >test-failure (Triaged test failures from CI)

Comments

@gwbrown (Contributor) commented Aug 28, 2019

There appears to be some kind of problem with the cleanup code that ensures all indices are removed between tests. It's difficult to be sure, but a wide variety of tests appear to be failing with a ResourceAlreadyExistsException when creating an index, with no visible cause in the test itself.

I noticed this in this master intake build, in IndexLifecycleIT.testStartStopILM, which creates an index as one of its first operations - the cluster should be blank at that point.

Looking at build stats for failures with messages containing resource_already_exists_exception, we can see that these failures were very rare prior to July 23 and have increased steadily since, suggesting this may be related to recent build changes:
[Screenshot: build-stats graph of resource_already_exists_exception failures over time, Aug 28, 2019]
[Note: This graph excludes a 10-minute period on Aug. 15, which had 608 failures, to make the scale clearer]

There is no clear relation between a specific functional area and the tests that fail because an index unexpectedly already exists. It appears to happen most often in client and YAML tests, although not exclusively. In a few spot checks I've seen:

  • IndexLifecycleIT.testStartStopILM
  • CCSDuelIT.testPagination
  • test {yaml=search/240_date_nanos/date_nanos requires dates after 1970 and before 2262}
  • IndicesClientIT.testCloseExistingIndex
  • DatafeedJobsRestIT.testLookbackWithGeo

All of these appear to have failed while creating indices when the cluster should be a blank slate.

I believe this is also the cause of #45605 and #45805.

gwbrown added the >test, :Delivery/Build, and >test-failure labels on Aug 28, 2019
@elasticmachine (Collaborator) commented:
Pinging @elastic/es-core-infra

@droberts195 (Contributor) commented:
Also #45600

@droberts195 (Contributor) commented:
> All of these appear to have failed while creating indices when the cluster should be a blank slate.

Have a look at the server-side logs. Do they show something like:

[2019-08-14T18:07:42,945][WARN ][o.e.i.IndicesService     ] [integTest-0] [test/ee-7zmiUQGayvOq6LgEShg] failed to delete index
org.elasticsearch.env.ShardLockObtainFailedException: [test][0]: obtaining shard lock timed out after 0ms, previous lock details: [shard creation] trying to lock for [deleting index directory]

If so, then it's not just the test client cleanup code that's to blame but also the same server code that customers will be running in production, and this is looking like a big worry for 7.4. See #45600 (comment).

@original-brownbear (Member) commented:
@DaveCTurner I believe you're working on this now as well. So ping :)

@DaveCTurner (Contributor) commented:
Yep I'm seeing if I can reproduce this, specifically the test failure #45956. I did see that test fail on my machine reporting a similar ShardLockObtainFailedException after 1400 iterations while concurrently running stress -c 16 -i 4 -m 8 -d 4, but unfortunately I was running on a commit from before my holidays (302d29c). I've added more logging and updated to a more recent master and am trying again.

@gwbrown (Contributor, Author) commented Aug 28, 2019

The testStartStopILM failure did not have any indication that the deletion failed. There was an exception roughly aligned with the failure, but it was caused by an IndexNotFoundException on the index that was double-created, after it was logged as having been deleted:

[2019-08-28T17:05:49,025][WARN ][o.e.i.s.RetentionLeaseBackgroundSyncAction] [integTest-0] unexpected error during the primary phase for action [indices:admin/seq_no/retention_lease_background_sync], request [RetentionLeaseBackgroundSyncAction.Request{retentionLeases=RetentionLeases{primaryTerm=1, version=1, leases={peer_recovery/sIFqVv9GQ86guDpZYRI0pg=RetentionLease{id='peer_recovery/sIFqVv9GQ86guDpZYRI0pg', retainingSequenceNumber=0, timestamp=1567011946292, source='peer recovery'}}}, shardId=[baz][0], timeout=1m, index='baz', waitForActiveShards=0}]
org.elasticsearch.index.IndexNotFoundException: no such index [baz]
	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:190) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:116) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteSingleIndex(IndexNameExpressionResolver.java:278) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.concreteIndex(TransportReplicationAction.java:234) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:651) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onNewClusterState(TransportReplicationAction.java:795) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onNewClusterState(ClusterStateObserver.java:311) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:169) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:120) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:112) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retry(TransportReplicationAction.java:792) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$1.handleException(TransportReplicationAction.java:771) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1091) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1200) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1174) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:60) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:56) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onFailure(TransportReplicationAction.java:408) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.handleException(TransportReplicationAction.java:402) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$runWithPrimaryShardReference$3(TransportReplicationAction.java:370) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:70) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:70) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:70) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:64) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:253) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.seqno.RetentionLeaseBackgroundSyncAction.shardOperationOnPrimary(RetentionLeaseBackgroundSyncAction.java:97) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.seqno.RetentionLeaseBackgroundSyncAction.shardOperationOnPrimary(RetentionLeaseBackgroundSyncAction.java:54) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:916) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:108) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:393) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:315) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$21(IndexShard.java:2753) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:112) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:285) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2727) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:857) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:311) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:274) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:228) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:196) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.lambda$messageReceived$0(SecurityServerTransportInterceptor.java:277) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.authorizeSystemUser(AuthorizationService.java:376) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.authorize(AuthorizationService.java:184) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter.lambda$inbound$1(ServerTransportFilter.java:112) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:246) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$6(AuthenticationService.java:306) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:317) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:244) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:139) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter.inbound(ServerTransportFilter.java:103) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:284) [x-pack-security-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:724) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:769) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]

@dakrone (Member) commented Aug 29, 2019

This failure appears to be related: https://gradle-enterprise.elastic.co/s/6q3xz77vvfo7m/console-log#L3863

@alpar-t (Contributor) commented Sep 2, 2019

Duplicate of #45605?

@benwtrent (Member) commented:
Another build failure that seems to be related: https://gradle-enterprise.elastic.co/s/wbwkzcamo7a4e

@tvernum (Contributor) commented Sep 4, 2019

@alpar-t (Contributor) commented Sep 5, 2019

@original-brownbear looks like this is still causing builds to fail. Any thoughts?

@original-brownbear (Member) commented:
@atorok the team is looking into this this morning.

original-brownbear added a commit that referenced this issue Sep 5, 2019
In order to track down #46091:
* Enables debug logging in REST tests for `master` and `coordination` packages
since we suspect that issues are caused by failed and then retried publications
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Sep 5, 2019

In order to track down elastic#46091:
* Enables debug logging in REST tests for `master` and `coordination` packages
since we suspect that issues are caused by failed and then retried publications
original-brownbear added a commit that referenced this issue Sep 5, 2019
…46374)

In order to track down #46091:
* Enables debug logging in REST tests for `master` and `coordination` packages
since we suspect that issues are caused by failed and then retried publications
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Sep 6, 2019
Further investigation into elastic#46091, expanding on elastic#46363, to add even more
detailed logging around the retry behaviour during index creation.
DaveCTurner added a commit that referenced this issue Sep 6, 2019
Further investigation into #46091, expanding on #46363, to add even more
detailed logging around the retry behaviour during index creation.
@DaveCTurner (Contributor) commented:
We have an explanation for why index creation might result in a resource_already_exists_exception response. By default the HttpAsyncClient in use will retry if it hasn't received a response within 30 seconds. We have observed situations where the CI machine grinds to a halt for a while, causing the creation of an index to take over 30 seconds (recalling that it doesn't just create the index, but also waits for the primaries to start). This triggers a retry from the client, but the second attempt discovers that the first attempt already created the index and returns the resource_already_exists_exception that we're seeing.

We can reproduce this by waiting for the client to time out after the index creation has completed:

diff --git a/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java b/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java
index 9bfbec9..1dc3163 100644
--- a/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java
+++ b/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java
@@ -215,6 +215,11 @@ public class MetaDataCreateIndexService {
                         } else {
                             logger.trace("[{}] index created and shards acknowledged", request.index());
                         }
+                        try {
+                            Thread.sleep(30000);
+                        } catch (InterruptedException e) {
+                            throw new AssertionError(e);
+                        }
                         listener.onResponse(new CreateIndexClusterStateUpdateResponse(response.isAcknowledged(), shardsAcknowledged));
                     }, listener::onFailure);
             } else {

We can extend the client's timeout to work around this, and I think it would be good to stop it from retrying at all, to save us from going through the same deeply confusing investigation the next time round. However, the fundamental issue is why it is taking over 30 seconds to create an index in CI.

The one that we investigated in detail took ~6sec to write the metadata to disk when the index was first created (confirmed by TRACE logging of MetaStateService) then ~4sec to apply the resulting cluster state and then ~22 sec to actually start the primary. The slow metadata writing suggests there's an issue with IO on the CI worker.
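
As a rough illustration only (this is not the change that was actually made; the host, port, and 60-second value below are placeholders), a test suite built on the low-level Java REST client could raise the socket timeout so that a slow create-index call is not abandoned and retried by the underlying async HTTP client:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

// Placeholder host/port. Raising the socket timeout past the default 30s means a
// create-index request that also waits for primaries to start is less likely to be
// retried and then fail with resource_already_exists_exception on the second attempt.
RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http"))
    .setRequestConfigCallback(requestConfigBuilder ->
        requestConfigBuilder.setSocketTimeout(60_000))
    .build();

Turning retries off entirely, as suggested above, would be a separate client-side change.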

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Sep 6, 2019
We are seeing requests take more than the default 30s,
which leads to requests being retried and returning
unexpected failures (e.g. "index already exists")
because the initial requests that timed out worked
out functionally anyway.
=> double the timeout to reduce the likelihood of
the failures described in elastic#46091
=> As suggested in the issue, we should probably turn off
retrying altogether in a follow-up
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Sep 6, 2019
We are seeing requests take more than the default 30s,
which leads to requests being retried and returning
unexpected failures (e.g. "index already exists")
because the initial requests that timed out worked
out functionally anyway.
=> double the timeout to reduce the likelihood of
the failures described in elastic#46091
=> As suggested in the issue, we should probably turn off
retrying altogether in a follow-up
@mark-vieira (Contributor) commented:
> @mark-vieira does the caching implementation push as the build progresses, or is it something that happens at the end of the build? It would be nice if there were a way to delay it so it doesn't interfere with the build.

Cache pushing happens at the end of execution of each task, so it happens as the build progresses. Because it's part of task execution, it's effectively limited by existing parallelism. With the local cache disabled, the additional IO here should be minimal, as we've already built the outputs and Gradle streams the result to the remote server. The only IO overhead is in the form of reads, and this is probably going to be bottlenecked by network IO to the remote cache anyhow. From what I can tell it's mostly write IOPS hurting tests.

In other words, I suspect the addition of the remote build cache amounts to a negligible IO overhead compared to the build/tests themselves. I've done experiments in the past that confirm this suspicion as well.

@matriv (Contributor) commented Sep 20, 2019

@matriv (Contributor) commented Sep 20, 2019

@alpar-t (Contributor) commented Sep 20, 2019

Not sure those are the same failure, @matriv; it's more likely something with that particular test. I suggest we open a different issue for it and mute it.

@alpar-t (Contributor) commented Sep 20, 2019

I'm going to go ahead and close this issue to avoid confusion.
After moving to a RAM disk, this has not happened again in over a week.
@original-brownbear we can track the removal of retries from the client in a different ticket if you would like. That's the reason I initially left this one open.

alpar-t closed this as completed on Sep 20, 2019
alpar-t added a commit that referenced this issue Sep 23, 2019
The trace logging was added for #46091.
Now that it's closed we can remove it.
jkakavas pushed a commit to jkakavas/elasticsearch that referenced this issue Sep 25, 2019
The trace logging was added for elastic#46091.
Now that it's closed we can remove it.
mark-vieira added the Team:Delivery label on Nov 11, 2020
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Mar 14, 2021
Older versions don't support component / composable index templates
and/or data streams. Yet the test base class tries to remove objects
after each test, which adds a significant number of lines to the
log files (which slows the tests down). The ESRestTestCase will
now check whether all nodes have a specific version and then decide
whether data streams and component / composable index templates will
be deleted.

Also removed old debug log config that was enabled to investigate
a build failure (elastic#46091), which has since been closed. However, the debug
logging added many log lines to the log files.

Relates to elastic#69973
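
As a rough sketch only (the method name and version constant here are hypothetical, and the actual ESRestTestCase change may be implemented differently), the version gate described above could look something like:

// Inside an ESRestTestCase subclass (imports: java.io.IOException,
// org.elasticsearch.Version, org.elasticsearch.client.Request).
// Hypothetical cleanup helper: only wipe data streams and composable index
// templates when every node in the cluster is new enough to support those APIs.
private void wipeDataStreamsAndComposableTemplatesIfSupported() throws IOException {
    // The version cut-off below is an assumption for this sketch.
    if (minimumNodeVersion().onOrAfter(Version.V_7_8_0)) {
        adminClient().performRequest(new Request("DELETE", "/_data_stream/*"));
        adminClient().performRequest(new Request("DELETE", "/_index_template/*"));
    }
}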
martijnvg added a commit that referenced this issue Mar 15, 2021
…0361)

Backport of the testing related changes from #70314:

Older versions don't support component / composable index templates
and/or data streams. Yet the test base class tries to remove objects
after each test, which adds a significant number of lines to the
log files (which slows the tests down). The ESRestTestCase will
now check whether all nodes have a specific version and then decide
whether data streams and component / composable index templates will
be deleted.

Also ensured that the logstash-index-template and security-index-template
aren't deleted between tests; these templates are builtin templates that
ES will install if missing. So if tests remove these templates between tests,
then ES will add these templates back almost immediately. This causes
many log lines and a lot of cluster state updates, which slow tests down.

Also removed old debug log config that was enabled to investigate
a build failure (#46091), which has since been closed. However, the debug logging
added many log lines to the log files. Note this change wasn't part
of #70314.

Relates to #69973
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Mar 15, 2021
…astic#70361)

Backport of the testing related changes from elastic#70314:

Older versions don't support component / composable index templates
and/or data streams. Yet the test base class tries to remove objects
after each test, which adds a significant number of lines to the
log files (which slows the tests down). The ESRestTestCase will
now check whether all nodes have a specific version and then decide
whether data streams and component / composable index templates will
be deleted.

Also ensured that the logstash-index-template and security-index-template
aren't deleted between tests; these templates are builtin templates that
ES will install if missing. So if tests remove these templates between tests,
then ES will add these templates back almost immediately. This causes
many log lines and a lot of cluster state updates, which slow tests down.

Also removed old debug log config that was enabled to investigate
a build failure (elastic#46091), which has since been closed. However, the debug logging
added many log lines to the log files. Note this change wasn't part
of elastic#70314.

Relates to elastic#69973
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Mar 15, 2021
…astic#70361)

Backport of the testing related changes from elastic#70314:

Older versions don't support component / composable index templates
and/or data streams. Yet the test base class tries to remove objects
after each test, which adds a significant number of lines to the
log files (which slows the tests down). The ESRestTestCase will
now check whether all nodes have a specific version and then decide
whether data streams and component / composable index templates will
be deleted.

Also ensured that the logstash-index-template and security-index-template
aren't deleted between tests; these templates are builtin templates that
ES will install if missing. So if tests remove these templates between tests,
then ES will add these templates back almost immediately. This causes
many log lines and a lot of cluster state updates, which slow tests down.

Also removed old debug log config that was enabled to investigate
a build failure (elastic#46091), which has since been closed. However, the debug logging
added many log lines to the log files. Note this change wasn't part
of elastic#70314.

Relates to elastic#69973
martijnvg added a commit that referenced this issue Mar 15, 2021
…70364)

Backporting #70361 to 7.11 branch.

Backport of the testing related changes from #70314:

Older versions don't support component / composable index templates
and/or data streams. Yet the test base class tries to remove objects
after each test, which adds a significant number of lines to the
log files (which slows the tests down). The ESRestTestCase will
now check whether all nodes have a specific version and then decide
whether data streams and component / composable index templates will
be deleted.

Also ensured that the logstash-index-template and security-index-template
aren't deleted between tests; these templates are builtin templates that
ES will install if missing. So if tests remove these templates between tests,
then ES will add these templates back almost immediately. This causes
many log lines and a lot of cluster state updates, which slow tests down.

Also removed old debug log config that was enabled to investigate
a build failure (#46091), which has since been closed. However, the debug logging
added many log lines to the log files. Note this change wasn't part
of #70314.

Relates to #69973
martijnvg added a commit that referenced this issue Mar 15, 2021
…0363)

Backport of #70361 to 7.12 branch.

Backport of the testing related changes from #70314:

Older versions don't support component / composable index templates
and/or data streams. Yet the test base class tries to remove objects
after each test, which adds a significant number of lines to the
log files (which slows the tests down). The ESRestTestCase will
now check whether all nodes have a specific version and then decide
whether data streams and component / composable index templates will
be deleted.

Also ensured that the logstash-index-template and security-index-template
aren't deleted between tests; these templates are builtin templates that
ES will install if missing. So if tests remove these templates between tests,
then ES will add these templates back almost immediately. This causes
many log lines and a lot of cluster state updates, which slow tests down.

Also removed old debug log config that was enabled to investigate
a build failure (#46091), which has since been closed. However, the debug logging
added many log lines to the log files. Note this change wasn't part
of #70314.

Relates to #69973