Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle empty port to fix RealJenkinsRule flakiness #465

Merged
merged 4 commits into from Aug 8, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
44 changes: 31 additions & 13 deletions src/main/java/org/jvnet/hudson/test/RealJenkinsRule.java
Expand Up @@ -385,10 +385,11 @@ public RealJenkinsRule includeTestClasspathPlugins(boolean includeTestClasspathP
}
System.out.println("Will load plugins: " + Stream.of(plugins.list()).filter(n -> n.matches(".+[.][hj]p[il]")).sorted().collect(Collectors.joining(" ")));
base.evaluate();
} catch (Throwable t) {
LOGGER.log(Level.WARNING, "Something went wrong", t);
jglick marked this conversation as resolved.
Show resolved Hide resolved
throw t;
Comment on lines +388 to +390
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand the purpose of this. If an error is thrown out of a test, the test fails and the error is displayed by the test runner right?

Suggested change
} catch (Throwable t) {
LOGGER.log(Level.WARNING, "Something went wrong", t);
throw t;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lost a lot of time understanding root cause with empty port because first exception was lost when finally throw it’s own exception, I still suggest leave it to at least log an exception.

Copy link
Member

@jtnord jtnord Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lost a lot of time understanding root cause with empty port because first exception was lost when finally throw it’s own exception

Sounds like a bug somewhere, the first exception should be a supressed exception of whatever was thrown in the finally

Perhaps a bug in junit runner or surefire or somewhere else where it is ignoring supressed exceptions (I expect there could be multiple cases of this).

https://issues.apache.org/jira/browse/SUREFIRE-1772 possibly.

} finally {
if (proc != null) {
stopJenkins();
}
stopJenkins();
try {
tmp.dispose();
} catch (Exception x) {
Expand Down Expand Up @@ -608,25 +609,42 @@ private static boolean supportsPortFileName(String war) throws IOException {

private static int readPort(File portFile) throws IOException {
String s = FileUtils.readFileToString(portFile, StandardCharsets.UTF_8);

// Work around to non-atomic write of port value in Winstone releases prior to 6.1.
// TODO When Winstone 6.2 has been widely adopted, this can be deleted.
if (s.isEmpty()) {
LOGGER.warning(() -> String.format("PortFile: %s exists, but value is still not written", portFile.getAbsolutePath()));
return 0;
}
Comment on lines +615 to +618
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, where is the code that writes this file, and can it be improved to do so atomically? Please fix here if in JTH. This logic is OK as a workaround if that code is in Winstone and you file a patch for it but it would take a long time for actual tests to pick it up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so then that should be patched.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a PR was filed to fix the underlying problem in Winstone, I could accept this workaround as a temporary measure. But since no PR has been filed to fix the underlying problem in Winstone, I am concerned that accepting this workaround will decrease the incentive to implement a fix for the underlying problem in Winstone. In the absence of such a fix to Winstone this workaround may not be temporary but rather it may become permanent (i.e., technical debt). The Winstone fix seems trivial and could likely be coded in less than an hour, so my preference would be for it to be implemented prior to the addition of this workaround.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@basil I created a PR#249 in Winstone to unblock

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, and thank you! Could we now add a comment to the (indeed, temporary!) workaround being added in this PR that says something to the effect of:

Work around […] in Winstone releases prior to […]. When Winstone […] has been widely adopted, this can be deleted.

The reason I find it important to leave such comments is that without them, it is difficult to know when it is safe to remove the workaround in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment added in 1479165


try {
return Integer.parseInt(s);
} catch (NumberFormatException e) {
throw new AssertionError("Unable to parse port from " + s + ". Jenkins did not start.");
}
}

/**
* Stops Jenkins and releases any system resources associated
* with it. If Jenkins is already stopped then invoking this
* method has no effect.
*/
public void stopJenkins() throws Throwable {
endpoint("exit").openStream().close();
if (!proc.waitFor(60, TimeUnit.SECONDS) ) {
System.err.println("Jenkins failed to stop within 60 seconds, attempting to kill the Jenkins process");
proc.destroyForcibly();
throw new AssertionError("Jenkins failed to terminate within 60 seconds");
jtnord marked this conversation as resolved.
Show resolved Hide resolved
}
int exitValue = proc.exitValue();
if (exitValue != 0) {
throw new AssertionError("nonzero exit code: " + exitValue);
if (proc != null) {
Process _proc = proc;
proc = null;
Copy link
Contributor Author

@Pldi23 Pldi23 Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when test flakes and fails:

  1. test runs well and going to stopJenkins at the end:
  2. stopJenkins hit timeout:
    if (!proc.waitFor(60, TimeUnit.SECONDS) ) {
  3. AssertExceptions thown as a result
  4. finally block executed
    if (proc != null) {
    stopJenkins();
    }
    try {
    tmp.dispose();
    } catch (Exception x) {
    LOGGER.log(Level.WARNING, null, x);
    }
    and because proc != null tries to stop Jenkins again:
  5. ConnectionException thrown:

Copy link
Contributor Author

@Pldi23 Pldi23 Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is to set proc = null like

proc.destroyForcibly();
proc = null;
instead of throwing AssertError

endpoint("exit").openStream().close();

if (!_proc.waitFor(60, TimeUnit.SECONDS) ) {
System.err.println("Jenkins failed to stop within 60 seconds, attempting to kill the Jenkins process");
_proc.destroyForcibly();
throw new AssertionError("Jenkins failed to terminate within 60 seconds");
}
int exitValue = _proc.exitValue();
if (exitValue != 0) {
throw new AssertionError("nonzero exit code: " + exitValue);
}
}
proc = null;
}

public void runRemotely(Step s) throws Throwable {
Expand Down