Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-72163] Retry on initial connection failure occurs in one entrypoint but not the other #675

Merged
merged 1 commit into from Oct 12, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
18 changes: 14 additions & 4 deletions src/main/java/hudson/remoting/Engine.java
Expand Up @@ -694,6 +694,7 @@
}
events.onDisconnect();
while (true) {
// TODO refactor various sleep statements into a common method

Check warning on line 697 in src/main/java/hudson/remoting/Engine.java

View check run for this annotation

ci.jenkins.io / Open Tasks Scanner

TODO

NORMAL: refactor various sleep statements into a common method
TimeUnit.SECONDS.sleep(10);
// Unlike JnlpAgentEndpointResolver, we do not use $jenkins/tcpSlaveAgentListener/, as that will be a 404 if the TCP port is disabled.
URL ping = new URL(hudsonUrl, "login");
Expand Down Expand Up @@ -758,11 +759,18 @@
final JnlpAgentEndpoint endpoint;
try {
endpoint = resolver.resolve();
} catch (Exception e) {
if (Boolean.getBoolean(Engine.class.getName() + ".nonFatalJnlpAgentEndpointResolutionExceptions")) {
events.status("Could not resolve JNLP agent endpoint", e);
} catch (IOException e) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor cleanup while I was here. The original code was catching any exception type, including InterruptedException (which it made no attempt to handle). Following generic advice such as that given on this page, prefer specific exceptions in catch blocks.

if (!noReconnect) {
events.status("Could not locate server among " + candidateUrls + "; waiting 10 seconds before retry", e);
// TODO refactor various sleep statements into a common method

Check warning on line 765 in src/main/java/hudson/remoting/Engine.java

View check run for this annotation

ci.jenkins.io / Open Tasks Scanner

TODO

NORMAL: refactor various sleep statements into a common method
TimeUnit.SECONDS.sleep(10);
continue;
} else {
events.error(e);
if (Boolean.getBoolean(Engine.class.getName() + ".nonFatalJnlpAgentEndpointResolutionExceptions")) {
events.status("Could not resolve JNLP agent endpoint", e);
Comment on lines +769 to +770
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to readers excluding the author (not a review): this property derives from #449, where SwarmClient wraps hudson.remoting.jnlp.Main in-JVM and thus the call to System.exit would be “fatal”. For normal uses of Remoting, the distinction is merely between logging a stack trace at SEVERE and exiting with -1 vs. logging a stack trace at INFO and exiting with 0 (but still effectively treating the error as fatal).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thus the call to System.exit would be “fatal”

Not sure why scare quotes were used here, but yes a call to System.exit is fatal to the life of the process.

} else {
events.error(e);
}
}
return;
}
Expand Down Expand Up @@ -891,6 +899,7 @@

private void onConnectionRejected(String greeting) throws InterruptedException {
events.status("reconnect rejected, sleeping 10s: ", new Exception("The server rejected the connection: " + greeting));
// TODO refactor various sleep statements into a common method

Check warning on line 902 in src/main/java/hudson/remoting/Engine.java

View check run for this annotation

ci.jenkins.io / Open Tasks Scanner

TODO

NORMAL: refactor various sleep statements into a common method
TimeUnit.SECONDS.sleep(10);
}

Expand All @@ -913,6 +922,7 @@
if(retry++>10) {
throw e;
}
// TODO refactor various sleep statements into a common method

Check warning on line 925 in src/main/java/hudson/remoting/Engine.java

View check run for this annotation

ci.jenkins.io / Open Tasks Scanner

TODO

NORMAL: refactor various sleep statements into a common method
TimeUnit.SECONDS.sleep(10);
events.status(msg+" (retrying:"+retry+")",e);
}
Expand Down
1 change: 1 addition & 0 deletions src/main/java/hudson/remoting/Launcher.java
Expand Up @@ -569,6 +569,7 @@
System.err.println("Failed to obtain "+ agentJnlpURL);
e.printStackTrace(System.err);
System.err.println("Waiting 10 seconds before retry");
// TODO refactor various sleep statements into a common method

Check warning on line 572 in src/main/java/hudson/remoting/Launcher.java

View check run for this annotation

ci.jenkins.io / Open Tasks Scanner

TODO

NORMAL: refactor various sleep statements into a common method
TimeUnit.SECONDS.sleep(10);
// retry
} finally {
Expand Down
Expand Up @@ -378,6 +378,7 @@
try {
int retries = 0;
while (true) {
// TODO refactor various sleep statements into a common method

Check warning on line 381 in src/main/java/org/jenkinsci/remoting/engine/JnlpAgentEndpointResolver.java

View check run for this annotation

ci.jenkins.io / Open Tasks Scanner

TODO

NORMAL: refactor various sleep statements into a common method
Thread.sleep(1000 * 10);
try {
// Jenkins top page might be read-protected. see http://www.nabble
Expand Down