Fix: ILMDownsampleDisruptionIT.testILMDownsampleRollingRestart #137423
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Thanks a lot for the investigation, Mary!!
And sorry that this ended up being ILM/Data Management after all...
| // We assert that ILM to successfully completed the phase | ||
| assertBusy(() -> { | 
Could we do this with ClusterServiceUtils#addTemporaryStateListener rather than polling any API? We can check the API response afterwards ofc but every assertBusy adds a little extra idle time to tests and in total that's a pretty big contributor to overall test runtime.
        
          
...c/internalClusterTest/java/org/elasticsearch/xpack/downsample/ILMDownsampleDisruptionIT.java (Outdated)
      | }, 60, TimeUnit.SECONDS); | ||
| // We assert that ILM successfully completed the phase | ||
| logger.info("Waiting for ILM to complete the phase for index [{}]", targetIndex); | ||
| ClusterServiceUtils.addTemporaryStateListener(clusterState -> { | 
This just adds (and returns) the listener - you have to safeAwait on it too :)
face palm.....
You could also use awaitClusterState instead :)
LGTM
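For readers following this thread, here is a minimal, self-contained sketch of the point above: registering a state listener only hands back a handle, and the test must still block on that handle before it ends. Everything in it is hypothetical. `FakeClusterService`, `registerOneShotListener`, and the string states are stand-ins invented for illustration, not the Elasticsearch test framework API, and `CompletableFuture` is used here in place of whatever listener type the real helpers return.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch; none of these types belong to Elasticsearch.
public class StateListenerSketch {

    /** Tiny stand-in for a cluster service that notifies listeners on each state change. */
    static final class FakeClusterService {
        private final List<Runnable> listeners = new CopyOnWriteArrayList<>();
        private volatile String state = "downsampling";

        String state() {
            return state;
        }

        void publish(String newState) {
            state = newState;
            listeners.forEach(Runnable::run);
        }

        /** Registers a one-shot listener; the returned future completes once the predicate matches. */
        CompletableFuture<Void> registerOneShotListener(Predicate<String> predicate) {
            CompletableFuture<Void> done = new CompletableFuture<>();
            Runnable check = new Runnable() {
                @Override
                public void run() {
                    if (predicate.test(state)) {
                        listeners.remove(this);
                        done.complete(null);
                    }
                }
            };
            listeners.add(check);
            check.run(); // handles the case where the state already matches
            return done;
        }
    }

    /** Registers the listener AND waits on it; returns the state seen after the wait. */
    static String waitForPhaseComplete(FakeClusterService service) {
        CompletableFuture<Void> listener = service.registerOneShotListener("complete"::equals);
        // Simulate the phase finishing on another thread.
        CompletableFuture<Void> ilm = CompletableFuture.runAsync(() -> service.publish("complete"));
        // Registering alone does not block: without this wait the caller could
        // return before the phase has completed.
        listener.orTimeout(10, TimeUnit.SECONDS).join();
        ilm.join();
        return service.state();
    }

    public static void main(String[] args) {
        System.out.println(waitForPhaseComplete(new FakeClusterService()));
    }
}
```

Compared with polling in a retry loop, the listener is notified on the state change itself, so the wait adds no idle time between polls; a helper that registers and waits in one call removes the chance of forgetting the wait.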
        
          
...c/internalClusterTest/java/org/elasticsearch/xpack/downsample/ILMDownsampleDisruptionIT.java (Outdated)
      | ensureGreen(targetIndex); | ||
| // We assert that ILM successfully completed the phase | ||
| logger.info("Waiting for ILM to complete the phase for index [{}]", targetIndex); | ||
| awaitClusterState(clusterState -> { | 
Ohh yes I'd completely forgotten we had an awaitClusterState to do this 👍
Me too, credits to @nielsbauman
I have less of an excuse tho 😇
$ git annotate -- test/framework/src/main/java/org/elasticsearch/test/ClusterServiceUtils.java | grep awaitClusterState
1337476bc9efc	(David Turner	2025-08-26 12:54:45 +0100	241)    public static void awaitClusterState(Predicate<ClusterState> statePredicate, ClusterService clusterService) {
…ticsearch/xpack/downsample/ILMDownsampleDisruptionIT.java Co-authored-by: David Turner <david.turner@elastic.co>
This PR stabilises testILMDownsampleRollingRestart and closes #136585. The test started failing more often after #135834 because that change increased the chance of triggering #137422.
To reduce this chance, we wait for ILM to finish the phase before the test ends.
Closes #136585