Log cluster health state (green/yellow/red) change #11657

Closed
markwalkom opened this issue Jun 15, 2015 · 10 comments

@markwalkom
Member

commented Jun 15, 2015

If we go from one state to another we should explicitly log this.

If not at info/warn, even though that'd be super dooper handy, at least at trace/debug.

@clintongormley

Member

commented Jun 15, 2015

We do already, as DEBUG:

[2015-06-15 11:13:16,339][DEBUG][cluster.service          ] [Gladiator] set local cluster state to version 10
@bleskes

Member

commented Jun 15, 2015

A few more handy tips:

Debug logging also gives you the reason for the change; we try to give it good names. Here it says it got a new cluster state from the master.

1> [2015-06-14 04:51:45,068][DEBUG][cluster.service          ] [node_t1] processing [zen-disco-receive(from master [[node_t0][UUL8MVMFT1iz_2G5po6Haw][hotel][local[node_0]]{enable_custom_paths=true, mode=local}])]: execute

If you set the logging level of cluster.service to TRACE, you will get the complete cluster state with every change. Note, though, that it can be huge, so use it with care.

@markwalkom

Member Author

commented Jun 16, 2015

Sorry, by state I meant green/yellow/red :)

@markwalkom markwalkom reopened this Jun 16, 2015

@clintongormley clintongormley changed the title from "Log cluster state change" to "Log cluster health status change" Jun 18, 2015

@ppf2 ppf2 changed the title from "Log cluster health status change" to "Log cluster health state (green/yellow/red) change" Oct 23, 2015

@danielmitterdorfer

Member

commented Nov 3, 2015

I've looked into this and thought about how it could be implemented. The problem is that the cluster health status is currently calculated from a variety of parameters (see ClusterHealthResponse), and it is only evaluated when a cluster health REST request is handled. So the calculation is triggered by a user action, not by an event from within ES itself.

So we have basically two problems to solve:

  1. Where do we calculate the cluster health status?
  2. How to recognize cluster state changes in general?

Regarding (1), I can think of different approaches:

  • We could precalculate the cluster health status and put it into ClusterState. However, I am not really convinced that this is a good solution, as it increases the size of ClusterState further.
  • We could factor out the common logic and recalculate it every time it's needed (i.e. on state change and when handling the aforementioned REST request).

Regarding (2) I think we'd need to implement a ClusterStateListener which logs the message.

Apart from that, the old cluster health status could be kept around to suppress logging when the cluster health status did not change.
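The idea of keeping the old status around to suppress redundant log lines can be sketched roughly as follows. This is a minimal illustration, not the actual Elasticsearch implementation: HealthStatus, HealthChangeTracker, the in-memory message list (standing in for a real logger) and the message wording are all hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the health status; not the real Elasticsearch type.
enum HealthStatus { GREEN, YELLOW, RED }

// Remembers the last observed status and records a message only on a transition.
class HealthChangeTracker {
    private HealthStatus last; // null until the first observation
    private final List<String> messages = new ArrayList<>(); // stand-in for a logger

    // Returns true if the status differs from the previous observation.
    boolean onStateChange(HealthStatus current, String reason) {
        HealthStatus previous = last;
        last = current;
        if (previous == null || previous == current) {
            return false; // first observation or unchanged: nothing to log
        }
        messages.add("cluster health status changed from [" + previous
                + "] to [" + current + "] (reason: " + reason + ")");
        return true;
    }

    List<String> messages() {
        return messages;
    }
}

class HealthChangeTrackerDemo {
    public static void main(String[] args) {
        HealthChangeTracker tracker = new HealthChangeTracker();
        tracker.onStateChange(HealthStatus.GREEN, "initial state");
        tracker.onStateChange(HealthStatus.GREEN, "mapping update"); // suppressed
        tracker.onStateChange(HealthStatus.YELLOW, "node left");     // recorded
        tracker.messages().forEach(System.out::println);
    }
}
```

In a real listener the tracker would be fed from whatever hook observes cluster state changes; the sketch only shows the comparison against the remembered status.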

What do you think, @clintongormley, @bleskes?

@bleskes

Member

commented Nov 3, 2015

I'm +1 on having a generic method to calculate the cluster status (it is fairly encapsulated in ClusterHealthResponse). Note that a status change only happens after specific events. Going from green to yellow to red happens on shard failures and nodes leaving the cluster. Going from red to yellow to green happens on shard started events. We can just do the calculation there and do some logging there (including the reason / what has changed).
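A generic status calculation of the kind discussed here might look like the sketch below. It assumes the usual reading of the colours (red when some primary shard is not active, yellow when all primaries are active but some replica is not, green otherwise) and uses simplified, hypothetical types (Health, ShardCopy, HealthCalculator) instead of the real ClusterHealthResponse logic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the health colours.
enum Health { GREEN, YELLOW, RED }

// Hypothetical, simplified shard copy: only the fields the calculation needs.
class ShardCopy {
    final boolean primary;
    final boolean active;

    ShardCopy(boolean primary, boolean active) {
        this.primary = primary;
        this.active = active;
    }
}

class HealthCalculator {
    // RED if any primary is inactive, YELLOW if only replicas are inactive, else GREEN.
    static Health calculate(List<ShardCopy> copies) {
        Health result = Health.GREEN;
        for (ShardCopy copy : copies) {
            if (copy.active) {
                continue; // active copies never degrade the status
            }
            if (copy.primary) {
                return Health.RED; // an inactive primary is the worst case
            }
            result = Health.YELLOW; // inactive replica; keep scanning for a worse case
        }
        return result;
    }
}

class HealthCalculatorDemo {
    public static void main(String[] args) {
        List<ShardCopy> copies = Arrays.asList(
                new ShardCopy(true, true),    // active primary
                new ShardCopy(false, false)); // unassigned replica
        System.out.println(HealthCalculator.calculate(copies));
    }
}
```

Calling such a method at the shard-failed, node-left and shard-started events, and comparing the result with the previous value, would give both the new status and the triggering reason at the point where the log line is written.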

@danielmitterdorfer

Member

commented Nov 3, 2015

Thanks for your thoughts. I'll give it a shot and ping you.

@nik9000

Contributor

commented Nov 3, 2015

We can just do the calculation there and do some logging there (including the reason / what has changed).

I wonder if we could go so far as to always log something at those events and raise the log level if the cluster state changed. Something like

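// Sketch only: ClusterHealth, fetchOldHealth() and calculateNewHealth() are placeholders.
public static void logClusterStateChangeEvent(String eventMessage) {
  ClusterHealth oldHealth = fetchOldHealth();     // health before the event
  ClusterHealth newHealth = calculateNewHealth(); // health after the event
  // Reference comparison is fine if ClusterHealth is an enum; otherwise use equals().
  if (oldHealth != newHealth) {
    // Health changed: log at INFO so it is visible by default.
    logger.info("{} and cluster state changed from {} to {}", eventMessage, oldHealth, newHealth);
  } else {
    // Health unchanged: keep the event at DEBUG.
    logger.debug("{} and cluster state stayed {}", eventMessage, oldHealth);
  }
}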

That way if you switched to DEBUG level logging you'd see this stream of events and it'd be really obvious which one might impact the cluster state. I kinda figure we already have logging at these events so it'd be a matter of hunting them down.

@bleskes

Member

commented Nov 3, 2015

@nik9000 I think we can definitely do better, but watch out with logging cluster states. They can be huge.

@nik9000

Contributor

commented Nov 3, 2015

@nik9000 I think we can definitely do better, but watch out with logging cluster states. They can be huge.

Oh I didn't mean the whole thing! I just meant the health. Yeah. I know full well that they can take up megabytes of text.....

@nik9000

Contributor

commented Nov 3, 2015

Oh I didn't mean the whole thing! I just meant the health. Yeah. I know full well that they can take up megabytes of text.....

And I've fixed the comment so it looks like what I meant the first time around.

danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this issue Nov 16, 2015

Log cluster health status changes
With this commit the cluster health status changes are logged
on INFO level. The change is only logged on master and actively
triggered in AllocationService in order to minimize the impact of
constantly reevaluating ClusterState in a ClusterStateListener
although we know that no health-relevant change happened.

Closes elastic#11657

danielmitterdorfer added a commit that referenced this issue Nov 16, 2015

Log cluster health status changes
With this commit the cluster health status changes are logged
on INFO level. The change is only logged on master and actively
triggered in AllocationService in order to minimize the impact of
constantly reevaluating ClusterState in a ClusterStateListener
although we know that no health-relevant change happened.

Closes #11657

@danielmitterdorfer danielmitterdorfer removed their assignment Nov 16, 2015
