ILM use Priority.IMMEDIATE for stop ILM cluster update #54909

andreidan · 2020-04-07T17:08:09Z

This changes the priority of the cluster state update that stops ILM
altogether to `IMMEDIATE`. We chose this because it can be useful to
temporarily stop ILM when a cluster is overwhelmed, and at `NORMAL`
priority the "stop ILM" update may never make it to the front of the task queue.

Conversely, we deliberately keep the start ILM cluster update at
`NORMAL` priority so that ILM is only started when the cluster can
handle it.
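Here is a minimal, self-contained sketch of why the priority matters. It assumes a single-threaded master that drains pending cluster state updates most-urgent-first and FIFO within a priority. This is a simplified model of the master's task queue, not the Elasticsearch implementation. The `Priority` enum is a standalone copy of the ordering used by `org.elasticsearch.common.Priority`. The class and task names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class StopIlmPrioritySketch {

    // Standalone copy of the Elasticsearch priority ordering, most urgent first.
    public enum Priority { IMMEDIATE, URGENT, HIGH, NORMAL, LOW, LANGUID }

    // A pending cluster state update, ordered by priority and then by submission order.
    static final class PendingTask implements Comparable<PendingTask> {
        final String source;
        final Priority priority;
        final long insertionOrder;

        PendingTask(String source, Priority priority, long insertionOrder) {
            this.source = source;
            this.priority = priority;
            this.insertionOrder = insertionOrder;
        }

        @Override
        public int compareTo(PendingTask other) {
            int byPriority = priority.compareTo(other.priority); // enum ordinal: IMMEDIATE sorts first
            return byPriority != 0 ? byPriority : Long.compare(insertionOrder, other.insertionOrder);
        }
    }

    private final PriorityQueue<PendingTask> pending = new PriorityQueue<>();
    private long nextInsertionOrder = 0;

    public void submit(String source, Priority priority) {
        pending.add(new PendingTask(source, priority, nextInsertionOrder++));
    }

    // Executes everything queued, returning the sources in execution order.
    public List<String> drain() {
        List<String> executed = new ArrayList<>();
        while (!pending.isEmpty()) {
            executed.add(pending.poll().source);
        }
        return executed;
    }

    // An overwhelmed master: NORMAL work is already queued when stop and start requests arrive.
    public static List<String> overwhelmedMaster() {
        StopIlmPrioritySketch master = new StopIlmPrioritySketch();
        master.submit("backlog-1", Priority.NORMAL);
        master.submit("backlog-2", Priority.NORMAL);
        master.submit("stop ILM", Priority.IMMEDIATE); // jumps the whole backlog
        master.submit("start ILM", Priority.NORMAL);   // waits behind earlier NORMAL work
        return master.drain();
    }

    public static void main(String[] args) {
        System.out.println(overwhelmedMaster());
    }
}
```

In this model the stop request runs first regardless of the backlog. A start request at `NORMAL` only runs once the earlier `NORMAL` work has cleared, which matches the intent of keeping start at the lower priority.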


elasticmachine · 2020-04-07T17:08:10Z

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

andreidan · 2020-04-07T17:08:21Z

@elasticmachine update branch

dakrone

Thanks @andreidan, this needs to update the second half of the shutdown to also use the priority though, so that both parts are urgent.

x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/action/TransportStopILMAction.java
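Stopping ILM takes two operation-mode transitions. The stop API moves ILM to `STOPPING`, and a later cluster state update moves it to `STOPPED`. The hypothetical sketch below illustrates the reviewer's point with a helper that picks the priority for each target mode. Both halves of the shutdown share the elevated priority (`IMMEDIATE`, which the PR eventually settled on), and starting stays at `NORMAL`. The enums and `priorityFor` are illustrative stand-ins, not ILM's actual `OperationMode` handling or task classes.

```java
public class IlmModeTransitionSketch {

    // Standalone copies; the real types live in Elasticsearch and x-pack ILM.
    public enum Priority { IMMEDIATE, URGENT, HIGH, NORMAL, LOW, LANGUID }

    public enum OperationMode { RUNNING, STOPPING, STOPPED }

    // Hypothetical helper: both halves of the shutdown are elevated, starting is not.
    public static Priority priorityFor(OperationMode target) {
        switch (target) {
            case STOPPING: // first half: requested through the stop API
            case STOPPED:  // second half: completes the shutdown
                return Priority.IMMEDIATE;
            case RUNNING:  // only start when the master has spare capacity
            default:
                return Priority.NORMAL;
        }
    }

    public static void main(String[] args) {
        for (OperationMode mode : OperationMode.values()) {
            System.out.println(mode + " -> " + priorityFor(mode));
        }
    }
}
```

Centralising the choice in one place means the `STOPPED` update cannot silently fall back to a lower priority than the `STOPPING` update that precedes it.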

DaveCTurner · 2020-04-08T10:47:33Z

I suggest IMMEDIATE rather than URGENT for this task. I don't know of any starvation issues at URGENT priority but my concern is that we may uncover such an issue in future which prevents us from stopping ILM in a timely fashion. This action is very lightweight, only available to admins, rarely called, but useful when the cluster is under severe pressure, and I can't think of a reason for spamming it that isn't obviously abusive.
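A small model makes the queueing argument concrete. Assume the master runs pending tasks strictly by priority and FIFO within a priority. A newly submitted task then waits for every already-queued task of equal or higher priority. The `tasksAheadOf` helper below is hypothetical. It shows that at `URGENT` a stop request would sit behind any burst of `URGENT` work, whereas at `IMMEDIATE` only other `IMMEDIATE` tasks can delay it.

```java
import java.util.Arrays;
import java.util.List;

public class StarvationSketch {

    // Standalone copy of the priority ordering, most urgent first.
    public enum Priority { IMMEDIATE, URGENT, HIGH, NORMAL, LOW, LANGUID }

    // Number of already-queued tasks that run before a new task at the given priority:
    // everything strictly more urgent, plus earlier tasks at the same priority (FIFO).
    // Tasks arriving later are ignored here, although more urgent arrivals would also go first.
    public static long tasksAheadOf(List<Priority> queued, Priority submitted) {
        return queued.stream()
                .filter(p -> p.compareTo(submitted) <= 0)
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical backlog: a burst of URGENT work plus some NORMAL tasks.
        List<Priority> backlog = Arrays.asList(
                Priority.URGENT, Priority.URGENT, Priority.URGENT, Priority.NORMAL, Priority.NORMAL);
        System.out.println("stop at URGENT waits for " + tasksAheadOf(backlog, Priority.URGENT) + " tasks");
        System.out.println("stop at IMMEDIATE waits for " + tasksAheadOf(backlog, Priority.IMMEDIATE) + " tasks");
    }
}
```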

Please sprinkle some comments throughout these cluster state updates indicating that they may be running at IMMEDIATE priority, as a defence against accidentally making them more heavyweight in future.
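One way to follow this suggestion is sketched below with hypothetical stand-in types rather than the real `ClusterState` API. The update body stays O(1) and carries a comment warning that it may run at `IMMEDIATE` priority. It returns the unchanged state object when there is nothing to do; Elasticsearch's master service likewise treats an `execute` that returns the same `ClusterState` instance as a no-op.

```java
public class LightweightModeUpdateSketch {

    public enum OperationMode { RUNNING, STOPPING, STOPPED }

    // Hypothetical, immutable stand-in for the cluster state: only the ILM mode is modelled.
    public static final class StateSketch {
        public final OperationMode ilmMode;

        public StateSketch(OperationMode ilmMode) {
            this.ilmMode = ilmMode;
        }
    }

    // NOTE: this update may run at IMMEDIATE priority, ahead of all other pending cluster
    // state updates. Keep it cheap: no I/O, no iteration over indices or policies, just a
    // single field change on top of the current state.
    public static StateSketch execute(StateSketch current, OperationMode target) {
        if (current.ilmMode == target) {
            return current; // same instance: nothing to publish
        }
        return new StateSketch(target);
    }

    public static void main(String[] args) {
        StateSketch running = new StateSketch(OperationMode.RUNNING);
        StateSketch stopping = execute(running, OperationMode.STOPPING);
        System.out.println(stopping.ilmMode + " " + (execute(stopping, OperationMode.STOPPING) == stopping));
    }
}
```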

DaveCTurner · 2020-04-08T10:57:16Z

Also FWIW you can test that the task really runs at a high enough priority with something like this:

elasticsearch/server/src/test/java/org/elasticsearch/cluster/ClusterHealthIT.java

Lines 268 to 287 in 95a7eed

    
```java
final AtomicBoolean keepSubmittingTasks = new AtomicBoolean(true);
final ClusterService clusterService = internalCluster().getInstance(ClusterService.class, internalCluster().getMasterName());
// Keep the master busy with a task that resubmits itself every time it completes.
clusterService.submitStateUpdateTask("looping task", new ClusterStateUpdateTask(Priority.LOW) {
    @Override
    public ClusterState execute(ClusterState currentState) {
        return currentState; // no-op: returning the same state publishes nothing
    }

    @Override
    public void onFailure(String source, Exception e) {
        throw new AssertionError(source, e);
    }

    @Override
    public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
        if (keepSubmittingTasks.get()) {
            clusterService.submitStateUpdateTask("looping task", this); // requeue at the same priority
        }
    }
});
```

(replace Priority.LOW with Priority.IMMEDIATE). Even while the cluster is processing this looping task, it should still process other IMMEDIATE tasks, but nothing of lower priority. I prefer that to simply asserting that the task returns the right .priority().
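The behaviour described here can be simulated without a cluster. The minimal sketch below assumes a single-threaded master that always runs the most urgent pending task first, FIFO within a priority. It models a looping task that resubmits itself after each run, plus one other `IMMEDIATE` task and one `NORMAL` task. The queue model and task names are hypothetical; a real test would go through `ClusterService#submitStateUpdateTask` as in the excerpt above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class LoopingTaskSketch {

    // Standalone copy of the priority ordering, most urgent first.
    public enum Priority { IMMEDIATE, URGENT, HIGH, NORMAL, LOW, LANGUID }

    static final class Task implements Comparable<Task> {
        final String source;
        final Priority priority;
        final long insertionOrder;

        Task(String source, Priority priority, long insertionOrder) {
            this.source = source;
            this.priority = priority;
            this.insertionOrder = insertionOrder;
        }

        @Override
        public int compareTo(Task other) {
            int byPriority = priority.compareTo(other.priority);
            return byPriority != 0 ? byPriority : Long.compare(insertionOrder, other.insertionOrder);
        }
    }

    // Runs up to `steps` tasks on a master kept busy by a self-resubmitting looping task,
    // and returns the sources of everything that executed.
    public static List<String> run(Priority loopingPriority, int steps) {
        PriorityQueue<Task> queue = new PriorityQueue<>();
        long order = 0;
        queue.add(new Task("looping task", loopingPriority, order++));
        queue.add(new Task("stop ILM", Priority.IMMEDIATE, order++));
        queue.add(new Task("start ILM", Priority.NORMAL, order++));

        List<String> executed = new ArrayList<>();
        for (int i = 0; i < steps && !queue.isEmpty(); i++) {
            Task next = queue.poll();
            executed.add(next.source);
            if (next.source.equals("looping task")) {
                // mirrors clusterStateProcessed() resubmitting the same task
                queue.add(new Task(next.source, next.priority, order++));
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(run(Priority.IMMEDIATE, 10));
    }
}
```

With the looping task at `IMMEDIATE`, "stop ILM" still runs because it was queued before the loop's resubmission, while "start ILM" never does. Switching `loopingPriority` to `LOW` lets both run, which is the contrast such a test would check.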

andreidan · 2020-04-08T12:15:19Z

@DaveCTurner thanks for pointing out the ClusterHealthIT. I wasn't aware of it.

If you don't have a strong opinion on adding an integration test, I'd rather not: ILM is a client of the ClusterService and uses its API according to the specification. I think ensuring the order and fairness of task execution is something we should test (and probably already do) as part of the cluster module; I believe PrioritizedExecutorsTests makes sure the executor invariants hold.
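The invariants in question (strict priority order, FIFO within a priority) can be checked against a plain JDK executor. This is a minimal sketch, not `PrioritizedExecutorsTests` itself. It uses a single-threaded `ThreadPoolExecutor` backed by a `PriorityBlockingQueue` and blocks the worker with a latch so that the remaining tasks queue up. It then records the order in which they run. The `PrioritizedRunnable` class and task names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PrioritizedExecutorSketch {

    public enum Priority { IMMEDIATE, URGENT, HIGH, NORMAL, LOW, LANGUID }

    private static final AtomicLong INSERTION_ORDER = new AtomicLong();

    // A Runnable the PriorityBlockingQueue can order: by priority, then by creation order.
    static final class PrioritizedRunnable implements Runnable, Comparable<PrioritizedRunnable> {
        final Priority priority;
        final long insertionOrder = INSERTION_ORDER.getAndIncrement();
        final Runnable body;

        PrioritizedRunnable(Priority priority, Runnable body) {
            this.priority = priority;
            this.body = body;
        }

        @Override
        public void run() {
            body.run();
        }

        @Override
        public int compareTo(PrioritizedRunnable other) {
            int byPriority = priority.compareTo(other.priority);
            return byPriority != 0 ? byPriority : Long.compare(insertionOrder, other.insertionOrder);
        }
    }

    public static List<String> executionOrder() {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());
        List<String> executed = new CopyOnWriteArrayList<>();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // The first task goes straight to the only worker and blocks it, so the rest queue up.
            executor.execute(new PrioritizedRunnable(Priority.LANGUID, () -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
            // Use execute(), not submit(): submit() wraps tasks in a non-comparable FutureTask.
            executor.execute(new PrioritizedRunnable(Priority.NORMAL, () -> executed.add("normal-1")));
            executor.execute(new PrioritizedRunnable(Priority.LOW, () -> executed.add("low")));
            executor.execute(new PrioritizedRunnable(Priority.NORMAL, () -> executed.add("normal-2")));
            executor.execute(new PrioritizedRunnable(Priority.IMMEDIATE, () -> executed.add("immediate")));
        } finally {
            release.countDown();
            executor.shutdown(); // already-queued tasks still run
        }
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("executor did not terminate");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(executionOrder());
    }
}
```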

dakrone

LGTM, thanks Andrei

This changes the priority of the cluster state update that stops ILM altogether to `IMMEDIATE`. We've chosen to change this as it can be useful to temporarily stop ILM if a cluster is overwhelmed, but a `NORMAL` priority can see the "stop ILM update" not make it up the tasks queue. On the same note, we're keeping the `start ILM` cluster update priority to `NORMAL` on purpose such that we only start `ILM` if the cluster can handle it. (cherry picked from commit d67df3a) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>


andreidan added :Data Management/ILM+SLM Index and Snapshot lifecycle management v8.0.0 v7.8.0 v7.7.1 labels Apr 7, 2020

Merge branch 'master' into stop-ilm-cluster-update-priority

6d6aade

andreidan requested a review from dakrone April 7, 2020 17:08

dakrone requested changes Apr 7, 2020

x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/action/TransportStopILMAction.java

andreidan added 2 commits April 8, 2020 09:49

ILM Make STOPPED cluster updates URGENT

791e602

Fix tests

fc6c65f

Make stop ILM updates run at IMMEDIATE priority

04b08eb

andreidan changed the title ~~ILM use Priority.URGENT for stop ILM cluster update~~ ILM use Priority.IMMEDIATE for stop ILM cluster update Apr 8, 2020

andreidan requested a review from dakrone April 8, 2020 13:11

dakrone approved these changes Apr 8, 2020

andreidan merged commit d67df3a into elastic:master Apr 8, 2020

andreidan added backport pending v7.6.3 labels Apr 8, 2020

This was referenced Apr 9, 2020

[7.6] ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909) #55016

Merged

[7.7] ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909) #55017

Merged

[7.x] ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909) #55018

Merged

andreidan removed the backport pending label Apr 11, 2020

bpintea added v7.7.0 and removed v7.7.1 labels Apr 21, 2020

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
