New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for JMX notifications (events) #644 #646
base: master
Are you sure you want to change the base?
Conversation
First attempt
…ification to attribute to be able to re-use JmxResultProcessor (WIP) + extended integration test
❌ Build jmxtrans 99 failed (commit db34444d0e by @dgloeckner) |
|
||
@ToString | ||
@ThreadSafe | ||
public class JMXConnection implements Closeable { | ||
@Nullable private final JMXConnector connector; | ||
@Nonnull @Getter private final MBeanServerConnection mBeanServerConnection; | ||
@Nullable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to not reformat code that you don't touch. It makes reviewing it much easier...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry was a mistake and yes makes reviewing a lot easier ;) I'll double check next time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, that's really a minor issue...
AttributeChangeNotification changeNotification = (AttributeChangeNotification) notification; | ||
return attributes.contains(changeNotification.getAttributeName()); | ||
} | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style issue: I prefer to have preconditions with early return instead of nested blocks. In this case, I think it would be nicer to have:
if (! notification instanceof AttributeChangeNotification) return false;
[...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the style issue, do you think that limiting the support to AttributeChangeNotifications is OK for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that supporting other kind of notification could have value. But starting with just AttributeChange and building up from there is perfectly fine (even better!). We can always improve later if there is a need.
private boolean markedAsDestroyed; | ||
private static final Logger logger = LoggerFactory.getLogger(JMXConnection.class); | ||
|
||
private final IdentityHashMap<Object, QueryNotificationListener> notificationListeners = new IdentityHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that JMXConnections are pooled and should be considered ephemeral. Keeping a list of notificationListeneners as an instance member will probably cause issues when connections are recycled by the pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the design with polling (see the other comment) I needed to map received notifications to the query (List getNotifications(Object query)). We would not need such a mapping if we implement a push-based approach as you suggested. Let's see how it goes :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the real issue is that JMX notifications rely on a long lived JMX connection. Which to some extend goes against the model we have. And in all cases, JMX connection can break for a number of reason (transient network issues seems the most obvious). So we need to account for that in some way. Not an easy task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to re-create listeners when re-creating connections. We would need to detect the liveness of JMX connections and deal with listener re-creation. Any hints appreciated :)
private static final Logger logger = LoggerFactory.getLogger(Query.class); | ||
|
||
/** The JMX object representation: java.lang:type=Memory */ | ||
@Nonnull @Getter private final ObjectName objectName; | ||
@Getter private final QueryType queryType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queryType should be @nonnull
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
@@ -241,6 +256,41 @@ public String makeTypeNameValueString(List<String> typeNames, String typeNameStr | |||
return ImmutableList.of(); | |||
} | |||
|
|||
public synchronized boolean isNotificationListenerRegistered(JMXConnection jmxConnection) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synchronization is probably not necessary here. The Query itself is immutable, so you're not protecting any state that is local to that class. If synchronization needs to happen, it is in the JMXConnection itself.
|
||
for (ObjectName queryName : query.queryNames(connection)) { | ||
results.addAll(query.fetchResults(connection, queryName)); | ||
if(query.getQueryType() == Query.QueryType.POLL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An Enum looks like a good candidate for a switch
statement instead of if
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure... final cleanup pending :)
} | ||
} else { | ||
// Subscribe to notifications | ||
if(!query.isNotificationListenerRegistered(jmxConnection)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A notification should probably not be executed, but registered at startup. The processing of results should be done when the notification is received, not polled. Basically, the notification listener should directly send results to the ResultProcessor...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design with polling was the simplest starting point I could think of. Just for my understanding, why do you think polling is not a good idea in this case?
Pushing (converted) notifications to registered ResultProcessors could also be implemented.
I need to understand
- where I could register the notification listeners at startup
- how to retrieve the output writers for a given query
Are output writers thread safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polling an intermediate list of notification requires to keep that state for some time. As you noted in your comment, that state can grow unbounded and end in an OOM (so at least that queue should be bounded). Also, having a queue of notification will delay those notifications from being processed for a somewhat arbitrary reason. If in the end, we're functionally just doing polling of an attribute value, then why not do just that and use the current query mechanics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree to the points about polling. Whether the delay is acceptable depends on the use case I guess. For me having a few seconds delay is acceptable. I disagree that polling from a list of received notifications is equivalent to polling MBeans. In the former case, the publisher sends explicit events and we should receive all of them (if we stay connected). This is not guaranteed with polling of MBeans as we might miss intermediate attribute values.
One problem with the pushing design is that we need to re-register listeners when re-creating JMX connections. I guess MBeanServerConnectionFactory can't do this for us. On the other hand, it's the only one who knows when a connection is created. I was thinking that the initial listener registration could happen from JmxTransformer when we iterate over servers and queries. But how to handle broken and re-created connections is still a mistery to me. Would you have any idea?
@@ -52,6 +52,9 @@ protected void before() throws Throwable { | |||
protected void after() { | |||
System.setOut(originalOut); | |||
System.setErr(originalErr); | |||
// I guess it's useful to print this after completion | |||
System.out.println("What was captured:"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would disagree... It is the job of the client of OutputCapture to print stuff if it make sense. In most cases, it probably does not. And even if it does, it should be sent to a logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found that debugging the integration test was painful. After adding the above output, I was able to see that my test needed to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in this case, you should probably add the output to the integration test itself, via a logger. Or use a debugger when debugging, step through the code and check the state...
*/ | ||
class QueryNotificationListener implements NotificationListener { | ||
|
||
private final List<Notification> notifications = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to use a concurrent queue instead of an ArrayList here. You could then get rid of all the synchronization of this class.
Great to see this moving forward! I added a few comments inline. I really need to take more time to review this (I know next to nothing about JMX notifications). But here is already some feedback. Thanks a lot for digging into this! That's a great addition to jmxtrans!
Not sure I understand the question. I need to spend some more time reading that code.
The
I think that we should treat notifications as events, and process them as soon as they arrive. This might be problematic for some output writers, which have the assumption that results arrive at regular intervals. But for most, it should work and would remove the delay. If we do need to do time based batches, this should be taken out of the
No project should allow package cycles :) In this case, it looks like we have mixed responsibilities a bit. I have not looked into details, but it looks like the code related to notifications should not be in JMXConnection, but probably extracted to its own class, and moved probably to its own package.
Short answer: we deal badly with config changes. We should drop everything and restart, but the connection pools are probably not re-initialized correctly. Note that JMXConnection is pooled and should be considered as an ephemeral object. It can be closed, killed, ... at any time. |
…ners are subscribed when a new JMX connection for a server is created. NotificationProcessor translates notifications to results and passes them to output writers of the query.
❌ Build jmxtrans 101 failed (commit 0b3489ae08 by @dgloeckner) |
Hi, I've just pushed a commit where I changed my design to use pushing instead of polling. Here's how it works:
Some questions popped up:
Keen to hear your feedback. Cheers, |
See #648 for problem with AppVeyorCI |
✅ Build jmxtrans 106 completed (commit aee8030fc9 by @dgloeckner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting
} else { | ||
JMXConnector connection = server.getServerConnection(); | ||
return new JMXConnection(connection, connection.getMBeanServerConnection()); | ||
newJMXConnection = new JMXConnection(connection, connection.getMBeanServerConnection()); | ||
// TODO: also for local??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not local? Can you see an issue?
executor.submit(pqt); | ||
} catch (RejectedExecutionException ree) { | ||
logger.error("Could not submit query {}. You could try to size the 'queryProcessorExecutor' to a larger size.", pqt, ree); | ||
if(query.getQueryType() != Query.QueryType.NOTIFICATIONS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(query.getQueryType() == Query.QueryType.NOTIFICATIONS) {
continue;
}
would be clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we could abstract how values are fetched using some strategy pattern: local poll, remote poll, notifications. It may avoid all the if local, if notification... It may be a pure utopia...
} | ||
}; | ||
|
||
jmxConnection.addNotificationListener(oi.getObjectName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NotificationListener is never removed, except when connection is closed. Isn't there a risk of memory/resource leak in the monitored app?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of a broken connection the monitored app would need to remove the (dead) listeners. I haven't checked the JMX server code yet but there must be some logic for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't any logic yet. It is probably possible to plug into the connection pool to process the removal of listener before a JMXConnection is killed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which connection pool are you referring to here? From our pov the monitored app is a pure JAVA app which exposes metrics via JMX. About listener removal in jmxtrans, I'm taking care of it (I hope).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JMX Connections are pooled. The pool is managed by a Guice module, and is used in Server.execute(). Having a pool for the JMX connection is mainly useful to ensure that connections are validated, recycled, and overall managed correctly. It make things a bit awkward for notifications. You can't assume that a there is a single JMX Connection to any application. You also can't assume that those connections are long lived (well, you should never assume they are, things can and will break, connections will be reset outside of your control).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't assume that a there is a single JMX Connection to any application.
That's true, but can I assume that there's a single JMX connection per configured server (might be multiple "server configs" for the same application)? If yes, my design should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, you can't even assume that there is a single connection per server. The GenericKeyedObjectPool that we use works as a map of sub pools. This allows multiple threads to work in parallel to execute queries. Since JMXConnections are threadsafe and probably have very low contention, we could use a smarter way of managing them, getting away from the KeyedObjectPool and implementing connection validation / recycling. That might be a solution, but it requires a non trivial amount of work.
|
||
public JmxResultProcessor(Query query, ObjectInstance objectInstance, List<Attribute> attributes, String className, String objDomain) { | ||
this(query, objectInstance, attributes, className, objDomain, System.currentTimeMillis()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, it makes more sense to leave this specific instantiation in NotificationProcessorImpl. I didn't understand the purpose of this constructor at first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! Also, don't use System.currentTimeMillis()
, there is a Clock
abstraction in jmxtrans, you should use it. (and we should replace it with java.time.Clock
at some point).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reading that class again, it is a mess (it was already before you touched it!). The time of query should be injected, not instantiated here.
@@ -52,6 +52,9 @@ protected void before() throws Throwable { | |||
protected void after() { | |||
System.setOut(originalOut); | |||
System.setErr(originalErr); | |||
// I guess it's useful to print this after completion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug
@gquintana @gehel I appreciate if you could have a look at my questions posted after the latest commit :) |
A few comments:
That is problematic as multiple JMX Connections can be opened to a single application (see inline comment about connection pool). We need to abstract the subscription even more. Also I think that the pool has a default min idle of 0, which means that the connections are created only on demand. If we only have notification queries, the connection will probably never be created and the notifications never registered.
That looks fine.
Yes, you seem to use it correctly. A query can resolve to multiple mbeans.
I'm not sure what you are referring to. Could you add a link? |
|
||
import javax.management.Notification; | ||
|
||
public interface NotificationProcessor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface has a single implementation, and does not provide any additional abstraction. Just merge it with the implementation class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to avoid package cycle. You might suggest another design but there are similar examples in jmxtrans already (also with single implementation):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jmxtrans is full of examples of things you should never do :) I'll have a look to see if I can find a better solution.
|
||
import javax.management.ObjectInstance; | ||
|
||
public interface NotificationProcessorFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this interface does not add any value, please remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to avoid package cycle. You might suggest another design but there are similar examples in jmxtrans already (also with single implementation):
private List<Result> transform(Notification notification) { | ||
if (notification instanceof AttributeChangeNotification) { | ||
AttributeChangeNotification changeNotification = (AttributeChangeNotification) notification; | ||
List<Attribute> attributes = new ArrayList<>(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Collections.singletonList()
instead of an explicitly sized ArrayList.
Despite all the comments, congratulation on the nice job! You are trying to address a complex problem, within a code base that could be much cleaner than it is! I hope we'll manage to make all that work! |
…otFoundException "no security manager: RMI class loader disabled"
❌ Build jmxtrans 160 failed (commit 1501042351 by @dgloeckner) |
…ion handling + using an object pool with max size set to 1. Server + query are used as a compound key for the pool. That way, the same NotificationProcessor instance is returned each time we ask for a certain server / query combination. + periodically subscribing, re-subscribing to notifications. + using a dedicated JMX connection per server and query to deal with notifications. Makes the implementation simpler as we know which listeners in the JMX connection are registered already.
✅ Build jmxtrans 161 completed (commit 23fefe097e by @dgloeckner) |
@gehel I've just pushed a major change. There's now an explicit object pool which handles a notification processor per server-query tuple. A notification processor has a private JMX connection so managing the subscriptions (also with varying nr of returned object names) is much easier. The notification listeners will be subscribed / re-subscribed at a regular interval. |
I'm sure there are more reviews for "cosmetic" changes. I hope we get closer to the final design now and than the fine tuning can start. |
…s not exist any more in the remote + Improved JavaDoc
Here's the gist of the re-design: public void subscribeToNotifications(final Server server) throws Exception {
NotificationProcessor processor = null;
ServerQuery serverQuery = new ServerQuery(server, this);
try {
processor = notificationProcessors.borrowObject(serverQuery);
processor.subscribeToNotifications();
} catch (Exception e) {
if(processor != null) {
notificationProcessors.invalidateObject(serverQuery, processor);
}
throw e;
} finally {
notificationProcessors.returnObject(serverQuery, processor);
}
} Calling processor.subscribeToNotifications() takes care of adding new notification listeners and removing notification listeners for which the remote object name does not exist any more. The notification processor contains a JMXConnection. Since the notification processor relies on state in the JMXConnection (registered listeners) the 1:1 relation between processor and connection simplifies things a lot. If anything goes wrong during subscribeToNotifications(), we tell the object pool to invalidate the processor. The object pool is configured with max active = 1 so the periodic call to server. subscribeToNotifications() will always get the same notification processor for a given server - query tuple. What I like is that the state for notification listeners is isolated in one object, the NotificationProcessor. That object is responsible for bookkeeping. It features "let it crash" as we can simply throw away a notification processor which threw an exception (possibly due to a connection issue) and we'll try again from scratch without any extra code. Calling processor.subscribeToNotifications() periodically also acts as a "heart beat" to ensure that the JMX connection is still alive so we can be sure that we're able to receive notifications. It also takes care of new object names appearing at run time which could match our query. |
TODOs:
|
…tr" values as enabled attributes. If not "attr" are provided, no filter will be passed.
@gehel would be cool if you could review the re-design, specifically handling of JMX connections. Maybe one JMX connection per server and query to receive notifications is too extreme. On the other hand, I wanted to keep state management simple which is a feature of the current design. |
@@ -178,7 +178,7 @@ public void doWrite(Server server, Query query, Iterable<Result> results) throws | |||
|
|||
HashMap<String, Object> filteredValues = newHashMap(); | |||
Object value = result.getValue(); | |||
if (isValidNumber(value)) { | |||
if (isValidNumber(value) || value instanceof String) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems strange to only allow "valid numbers". InfluxDB for sure allows to store Strings. I'm now also accepting Strings but I guess we should review the whole filtering logic.
Was this merged into JMXTrans? Does the latest JMXTrans release support mbean notifications? |
I've started to implement support for JMX notifications as discussed in the issue.
It's working end-to-end as can be seen in JmxTransformerIT integration test.
For now the only kind of Notification which is supported is AttributeChangeNotification. Applications could of course implement their own Notifications but jmxtrans would need to have related classes on its class path. It's probably not the common case.
My use case is the following:
Example publisher:
Example jmxtrans configuration:
The query will subscribe to nofitications of the CacheSize attribute which are published by the configured MBean. queryType is set to NOTIFICATIONS to tell jmxtrans that we want to use events here and not polling.
Example data from InfluxDB:
We can of course discuss the overall design and limitations. I'm keen to hear feedback.