Skip to content

Instrumentation feature#471

Closed
Aki-7 wants to merge 1 commit into
jenkinsci:masterfrom
Aki-7:instrumentation
Closed

Instrumentation feature#471
Aki-7 wants to merge 1 commit into
jenkinsci:masterfrom
Aki-7:instrumentation

Conversation

@Aki-7
Copy link
Copy Markdown
Member

@Aki-7 Aki-7 commented Jul 14, 2021

Proposal to introduce remoting instrumentation feature.

Related to GSoC '21 Remoting Monitoring project.

In the Remoting Monitoring project, we are trying to observe the behavior of remoting modules using OpenTelemetry. And users will be able to visualize the remoting behavior like this.

image

To collect more accurate and valuable data, we want to introduce a new feature to observe the remoting module.

What I did

  1. Adjust Launcher's -cp flag for Java9+
  • It doesn't behave as Java's -cp flag in Java9+, but somehow valuable.
  • In Java9+, we use a new URLClassLoader to load and set it to context class loader.
  1. Introduce LauncherInstrumentationListener and EngineInstrumentationListener
  • They issue an event. e.g. onRunWithStdinStdout
  1. Enable to load user-defined LauncherInstrumentationListener and EngineInstrumentationListener
  • Users have to pass the JAR file using the -cp flag
    • the JAR file will include user-defined listener classes, which inherit LauncherInstrumentationListenerAdapter or EngineInstrumentationListenerAdapter
  • Users have to set System properties.
    • value is the comma-separated canonical names of the listener classes users created.

Command example

java \
-Dhudson.remoting.Engine.engineInstrumentationListenerCanonicalNames=user.defined.engine.instrumentation.listener \
-Dhudson.remoting.Launcher.launcherInstrumentationListenerCanonicalNames=user.defined.launcher.instrumentation.listener \
-jar agent.jar -jnlpUrl $JENKINS_ROOT/computer/$NODE_NAME/jenkins-agent.jnlp \
-workDir $WORK_DIR -cp user-defined-instrumentations.jar

Some references regarding the -cp flag.
#468
#443
https://issues.jenkins.io/browse/JENKINS-64831

I would appreciate any negative or positive feedback!

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

@oleg-nenashev oleg-nenashev self-requested a review July 20, 2021 11:50
@oleg-nenashev oleg-nenashev added the enhancement For changelog: An enhancement providing new capability. label Jul 20, 2021
Copy link
Copy Markdown
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not rely on -cp.


ClassLoader currentThreadClassLoader = Thread.currentThread().getContextClassLoader();
URLClassLoader urlClassLoader = new URLClassLoader(urls.toArray(new URL[]{}), currentThreadClassLoader);
Thread.currentThread().setContextClassLoader(urlClassLoader);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not rely on Thread.contextClassLoader please.

If you want to offer a flag allowing custom listener code to be loaded, do not do it this way. First of all, stop using system properties; introduce a real option with documentation. Then have it somehow specify the location of a JAR and/or class name(s) to load from it in a fresh class loader.

(Nicest would be to use SezPoz but adding a new library dep for this purpose is likely overkill, and anyway it would potentially clash with the copy loaded from jenkins-core.jar.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, define a hook as a JAR whose manifest has a Main-Class entry giving the FQN of a class implementing whichever listener interface makes sense, with a public no-arg constructor. Then define a CLI arg

-listener /path/to/impl.jar

When specified, the hook would be loaded in its own private URLClassLoader parented to that of agent.jar.

Comment on lines +355 to +362
} catch (ClassNotFoundException e) {
LOGGER.log(Level.WARNING, String.format("Failed to load %s", listenerName), e);
success = false;
} catch (NoSuchMethodException e) {
LOGGER.log(Level.WARNING, String.format("Failed to get constructor of %s", listenerName), e);
success = false;
} catch (InvocationTargetException | InstantiationException | IllegalAccessException e) {
LOGGER.log(Level.WARNING, String.format("Failed to instantiate %s", listenerName), e);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KISS

Suggested change
} catch (ClassNotFoundException e) {
LOGGER.log(Level.WARNING, String.format("Failed to load %s", listenerName), e);
success = false;
} catch (NoSuchMethodException e) {
LOGGER.log(Level.WARNING, String.format("Failed to get constructor of %s", listenerName), e);
success = false;
} catch (InvocationTargetException | InstantiationException | IllegalAccessException e) {
LOGGER.log(Level.WARNING, String.format("Failed to instantiate %s", listenerName), e);
} catch (Exception e) {
LOGGER.log(Level.WARNING, "Failed to load " + listenerName, e);
success = false;

Comment on lines +348 to +354
if (listener instanceof EngineInstrumentationListener) {
instrumentation.add((EngineInstrumentationListener) listener);
LOGGER.log(Level.INFO, String.format("Loaded an external engine instrumentation listener (%s)", listenerName));
} else {
LOGGER.log(Level.WARNING, String.format("Failed to use %s. Not an instance of %s", listenerName, EngineInstrumentationListener.class.getCanonicalName()));
success = false;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java’s built-in CCEs in newer versions are better than any message you come up with.

Suggested change
if (listener instanceof EngineInstrumentationListener) {
instrumentation.add((EngineInstrumentationListener) listener);
LOGGER.log(Level.INFO, String.format("Loaded an external engine instrumentation listener (%s)", listenerName));
} else {
LOGGER.log(Level.WARNING, String.format("Failed to use %s. Not an instance of %s", listenerName, EngineInstrumentationListener.class.getCanonicalName()));
success = false;
}
instrumentation.add((EngineInstrumentationListener) listener);
LOGGER.log(Level.INFO, "Loaded an external engine instrumentation listener ({0})", listenerName);

@jglick
Copy link
Copy Markdown
Member

jglick commented Jul 20, 2021

I would actually suggest considering a different design altogether. Use the existing

public interface EngineListener {
/**
* Status message that indicates the progress of the operation.
*/
void status(String msg);
/**
* Status message, with additional stack trace that indicates an error that was recovered.
*/
void status(String msg, Throwable t);
/**
* Fatal error that's non recoverable.
*/
void error(Throwable t);
/**
* Called when a connection is terminated.
*/
void onDisconnect();
/**
* Called when a re-connection is about to be attempted.
* @since 2.0
*/
void onReconnect();
}
and/or
public static abstract class Listener {
/**
* When the channel was closed normally or abnormally due to an error.
*
* @param cause
* if the channel is closed abnormally, this parameter
* represents an exception that has triggered it.
* Otherwise null.
*/
public void onClosed(Channel channel, IOException cause) {}
/**
* Called when a command is successfully received by a channel.
* @param channel a channel
* @param cmd a command
* @param blockSize the number of bytes used to read this command
* @since 3.17
*/
public void onRead(Channel channel, Command cmd, long blockSize) {}
/**
* Called when a command is successfully written to a channel.
* See {@link #onRead} for general usage guidelines.
* @param channel a channel
* @param cmd a command
* @param blockSize the number of bytes used to write this command
* @since 3.17
*/
public void onWrite(Channel channel, Command cmd, long blockSize) {}
/**
* Called when a response has been read from a channel.
* @param channel a channel
* @param req the original request
* @param rsp the resulting response
* @param totalTime the total time in nanoseconds taken to service the request
* @since 3.17
*/
public void onResponse(Channel channel, Request<?, ?> req, Response<?, ?> rsp, long totalTime) {}
/**
* Called when a JAR file is being sent to the remote side.
* @param channel a channel
* @param jar the JAR file from which code is being loaded remotely
* @see Capability#supportsPrefetch
* @since 3.17
*/
public void onJar(Channel channel, File jar) {}
}
(adding some default methods as needed for finer-grained or more informative events) and otherwise do nothing in remoting. Rather, make a Jenkins plugin which injects listener code into the agent when it connects. There are various examples of this already. This would make it much easier to distribute various listener implementations (just use the update center); there is no need to adjust the startup arguments for agent.jar; the Remoting code stays cleaner. The only downside is that you would not get access to details of what is going on before the connection is established.

@jglick
Copy link
Copy Markdown
Member

jglick commented Jul 20, 2021

you would not get access to details of what is going on before the connection is established

I see this survey but it is not obvious to me that the extra bit of diagnostics possible with the current approach is worth making the distribution and configuration of the listener much more awkward. Typically the problem is sporadic disconnection; if you cannot connect at all, there are better ways to diagnose that directly, not using general telemetry.

Not sure what “robust against agent restart” is about. With the plugin approach, if the agent restarts, Jenkins will see a new Channel, and send the listener to it again. So that should not be an issue.

@Aki-7
Copy link
Copy Markdown
Member Author

Aki-7 commented Jul 20, 2021

Thank you so much for your review and advice!
I'm very grateful for your positive suggestions!

I'll provide a short answer for

Not sure what “robust against agent restart” is about.

before going to bed.

Typically the problem is sporadic disconnection

I agree with this and I thought the plugin approach would work well but I found that, in the UNIX agent, UnixSlaveRestarter restarts the agent with execv when disconnection happens.

code: https://github.com/jenkinsci/jenkins/blob/95e85495ca6b26ba61ff4a1bf18d181d5e218bc9/core/src/main/java/jenkins/slaves/restarter/UnixSlaveRestarter.java#L50-L63
issue in my repo: jenkinsci/remoting-opentelemetry-plugin#65

So if we take the plugin approach, the monitoring program will be killed every time sporadic disconnection happens. And we cannot collect the trace of the next connection and metrics after the sporadic disconnection. I think this information can be useful for troubleshooting and that's what I mean by "not robust against agent restart".

@jeffret-b
Copy link
Copy Markdown
Contributor

Generally, I agree with Jesse's comments and recommendations, particularly about keeping it simple. Launching the agent is already complicated enough -- it's important not to further complicate that. If the feature is valuable we need to implement reasonable support for it.

While I understand users would like to have diagnostics about what happens before a connection is completed, I doubt they would actually find anything useful there. I wouldn't prioritize that as a high-value feature or sacrifice other improvements for that.

@Aki-7
Copy link
Copy Markdown
Member Author

Aki-7 commented Jul 21, 2021

Thank you so much for a lot of feedback!
I'll try to things more simple!

@Aki-7
Copy link
Copy Markdown
Member Author

Aki-7 commented Jul 27, 2021

I'll close this PR, thank you so much for your feedback!

context: jenkinsci/remoting-opentelemetry-plugin#65 (comment)

@Aki-7 Aki-7 closed this Jul 27, 2021
@oleg-nenashev
Copy link
Copy Markdown
Member

oleg-nenashev commented Jul 27, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement For changelog: An enhancement providing new capability.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants