Skip to content

lichyc/clusterWatchdog

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 

Repository files navigation

Cluster Watchdog for JBoss EAP6

If a JEE-AppServer clustering fails, e.g. due to network outage, this can cause harm to your application.

  • By default distributed caches get out of sync, so you have to decide whether you can trust/use cache content.
  • How to deal e.g. with Hibernate L2-Cache?
  • If the size of an application cluster falls below a threshold a reduction of functionality might get forced to survive
  • and plenty more scenarios

The intention of this watchdog application is to provide a framework and core service to allow an active management on little effort. The service provides 2 interfaces which can get implemented to react on changes in cluster topology.

  • To take action if the number of cluster members change, please implement JgroupsViewChangeListener.
  • To listen to changes on the state of the local node, please implement ChannelListener.

Cluster Watchdog Architecture

This basically is an EJB-Jar wrapped by a trivial EAR. So you can use it as a separate deployment-unit or re-use the EJB-Jar inside your application. The Watchdog creates a JGroups channel based on the default JGroups configuration of the JBoss EAP instance it gets deployed into. So it uses the same configuration as used by the web, ejb, infispan cluster, unless you tweaked the default JBoss configuration. Status changes (changeView, disconnect, etc.) on this JGroups channel will trigger events the listener can process. A new channel is used to not predict any established channel or channel name, while reusing the default configuration this channel should behave in-sync with your working channels. The Watchdog is not simply using a org.infinispan.notifications.Listener as this is limited to view changes. Only on the JGroups level we have a chance to get closer to the reason of a state change, that might be of interest to derive the required action.

Please note: Whether a shutdown or a network issue causes a disconnect of a node can get detected on this node only. This is due to the fact that a ping is used to test the connections. So a node is able to detect whether another node is connected or not. If a node doesn't answer any more, there is no chance from outside to figure out why. Only locally it is possible to check whether the connection is closed (intentionally) or disconnected (failure). Same as applies to a ping on OS level.

To get around this limitation the Watchdog is sending it's EAP server state to all members in the cluster when it get destroyed by the server. If a node is sending stopping, the other Watchdog instances can assume that this node will shutdown. The channel is also used to send a assumeNormalOperationsMode to all active clsuter members. All other notifications and no notification before a leaving the view will be processed as failure.

How to extend

If you like to take action on:

  • a another member is joining||disappearing from the cluster: Implement com.redhat.gss.eap6.clustering.JgroupsViewChangeListener and register your class.
  • com.redhat.gss.eap6.clustering.jmx.AbstractJmxViewChangeListener can get used for implementations using JMX.
  • local channel status change (connect/disconnect/close): Implement org.jgroups.ChannelListener and register your class. Please use it for a notofication only and start a new thread for the real operation execution as jgroups is very sensitive on getting blocked.
  • com.redhat.gss.eap6.clustering.jmx.AbstractJmxChannelListener can get used for implementations using JMX.

How to configure

The configuration of the Watchdog is in clusterWatchdog.properties. Simply edit to register you classes. The parameters can get overwritten by using System-Properties, so feel free to use command-line parameters.

How to operate

The usual mode is deploy and let the Watchdog do its job. Nevertheless an MBean is exposed to to ask a cluster to assume normal operation mode. Let's assume in cluster one node had to get hardly killed. The ClusterWatchdog} will recognise this as failure of this node. In case the node can't get restarted to allow a analysis, calling this operation allow all other node to resume to normal operations mode.

Example Implementations of Listeners

  • com.redhat.gss.eap6.clustering.LoggingClusterChannelListener: a trivial implementation of org.jgroups.ChannelListener, which simply logs the state changes on the channel fired by JGroups.
  • com.redhat.gss.eap6.clustering.infinispan.EapInfinispanCacheCleaner: invokes a clear() operation via JMX on an Infinispan entity cache.
  • com.redhat.gss.eap6.clustering.infinispan.InfinispanEapHibernateSecondLevelCacheClear: invokes a clear() operation via JMX on an Infinispan entity cache.

How to build

It's a Maven project, so arrest the usual suspects.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages