Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-67075] Strip a possible daemon-injected prefix when restarting the server. #5899

Conversation

CyrilBrulebois
Copy link

At least the Debian packages rely on the daemon command to start the
server with appropriate parameters, leveraging the /etc/init.d/jenkins
shell script and the variables set in the /etc/default/jenkins config
file.

The init script calls /usr/bin/daemon --name=$NAME […], with NAME
defaulting to jenkins. Starting with daemon 0.7.0-1 (as found in
Debian 11 “Bullseye” and in Ubuntu since Hirsute Hippo), using the
--name option results in the specified value's being prepended to
the program being started.

This can be checked with:

root@jenkins:~# pgrep -f jenkins: -a
73109 jenkins: /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080

root@jenkins:~# tr '\0' '\n' < /proc/73109/cmdline
jenkins: /usr/bin/java
-Djava.awt.headless=true
-jar
/usr/share/jenkins/jenkins.war
--webroot=/var/cache/jenkins/war
--httpPort=8080

Therefore, detect and remove such a prefix from the first argument
(the executable). Since the daemon's name is configurable, don't
hardcode "jenkins: ", but strip everything until the first slash if
it's preceeded by a colon and a space.

Signed-off-by: Cyril Brulebois cyril@debamax.com

See JENKINS-67075.

Proposed changelog entries

  • Entry 1: JENKINS-67075 Fix server restart on recent Debian and Ubuntu environments.

Submitter checklist

  • (If applicable) Jira issue is well described
  • Changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developer, depending on the change). Examples
  • Appropriate autotests or explanation to why this change has no tests

I've written this change on top of my local stable-2.303 branch (and it can be cherry-picked trivially on top of master, which is what I'm submitting here), and tested it by deploying the generated war file instead of the one shipped in the Debian package, then using the /safeRestart entry point to make sure the server does restart correctly. I'm not sure it warrants adding tests in the source code tree.

Desired reviewers

Maintainer checklist

Before the changes are marked as ready-for-merge:

  • There are at least 2 approvals for the pull request and no outstanding requests for change
  • Conversations in the pull request are over OR it is explicit that a reviewer does not block the change
  • Changelog entries in the PR title and/or Proposed changelog entries are correct
  • Proper changelog labels are set so that the changelog can be generated automatically
  • If the change needs additional upgrade steps from users, upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the PR title. (example)
  • If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

…ng the server.

At least the Debian packages rely on the `daemon` command to start the
server with appropriate parameters, leveraging the /etc/init.d/jenkins
shell script and the variables set in the /etc/default/jenkins config
file.

The init script calls `/usr/bin/daemon --name=$NAME […]`, with NAME
defaulting to `jenkins`. Starting with daemon 0.7.0-1 (as found in
Debian 11 “Bullseye” and in Ubuntu since Hirsute Hippo), using the
`--name` option results in the specified value's being prepended to
the program being started.

This can be checked with:

    root@jenkins:~# pgrep -f jenkins: -a
    73109 jenkins: /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080

    root@jenkins:~# tr '\0' '\n' < /proc/73109/cmdline
    jenkins: /usr/bin/java
    -Djava.awt.headless=true
    -jar
    /usr/share/jenkins/jenkins.war
    --webroot=/var/cache/jenkins/war
    --httpPort=8080

Therefore, detect and remove such a prefix from the first argument
(the executable). Since the daemon's name is configurable, don't
hardcode "jenkins: ", but strip everything until the first slash if
it's preceeded by a colon and a space.

Signed-off-by: Cyril Brulebois <cyril@debamax.com>
String exe = args.get(0);
exe = exe.replaceFirst("^(.*): /", "/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this regular expression discard the first string that does not contain a ':' character, ending with a ':'? Shouldn't it require at least one character in the prefix string before the ':' character?

Suggested change
exe = exe.replaceFirst("^(.*): /", "/");
exe = exe.replaceFirst("^([^:]+): /", "/");

I still need to test this on a Debian 11 system before I'm ready to approve the pull request or my tentatively proposed change to the pull request.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right; I was so focused on figuring out how replacements work in the Java world that I have lost track of the actual pattern, and forgot to try and improve it. Given https://sources.debian.org/src/adduser/3.118/adduser/#L866 (and probably other constraints on different platforms), it looks like an “interesting” topic.

Anyway, it looks like:

  • *+ makes sense;
  • excluding : from the character class looks good to me;
  • maybe excluding (space) and/or / as well for good measure?

Copy link
Member

@basil basil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense, but UnixLifecycle seems like the wrong layer to introduce this logic. I think it would be better to place this logic in JavaVMArguments, where the fix would extend to any functionality consuming the Jenkins controller JVM arguments (e.g., HsErrPidList) rather than just the UnixLifecycle class.

@CyrilBrulebois
Copy link
Author

The change makes sense, but UnixLifecycle seems like the wrong layer to introduce this logic. I think it would be better to place this logic in JavaVMArguments, where the fix would extend to any functionality consuming the Jenkins controller JVM arguments (e.g., HsErrPidList) rather than just the UnixLifecycle class.

Oh, right. I took JavaVMArguments for granted, and didn't verify whether/that it came from the same source code. It would make sense to have the fix in that class, just in case it's reused elsewhere.

Also, for completeness: chatting a little with @evgeni on Twitter, we mentioned one could try and get to the actual executable (e.g. /proc/<pid>/exe) but that seems fragile for other reasons… For example, it could be some /usr/bin/java via ProcessTree, while exe points to the actual executable (that /usr/bin/java points to via symlinks), and system upgrades could have changed e.g. /usr/lib/jvm/java-11-openjdk-amd64/bin/java to some other JRE…

@jglick
Copy link
Member

jglick commented Dec 22, 2021

UnixLifecycle seems like the wrong layer to introduce this logic

UnixLifecycle seems like the wrong lifecycle altogether. This mode should only be used for

ssh me@someserver nohup java -jar jenkins.war

or something ridiculous like that, where execvp to self is the only way to get the process to restart because there is literally nothing else managing it. Any actual packaging of Jenkins should pass -Dhudson.lifecycle=hudson.lifecycle.ExitLifecycle. (This is what you must currently set in a K8s StatefulSet for example.) I am inclined to just deprecate UnixLifecycle and its scary JNA tricks, as suggested before in #5561 (comment), and make ExitLifecycle the default on all platforms.

(SolarisSMFLifecycle is basically the same, just with an exit code of 0. For whatever reason, ExitLifecycle defaults to 5.)

@timja
Copy link
Member

timja commented Dec 22, 2021

Why must you set it for StatefulSet? I can't see it being used in the official helm chart

@jglick
Copy link
Member

jglick commented Dec 22, 2021

I do not work on that chart and cannot vouch for its correctness. CloudBees CI sets -Dhudson.lifecycle=hudson.lifecycle.ExitLifecycle.

@timja
Copy link
Member

timja commented Dec 22, 2021

But why must you set it? As it seems to work fine in my experience with the default (at least on Linux x64)

@jglick
Copy link
Member

jglick commented Jan 4, 2022

Because the default is to use UnixLifecycle which tries to restart in place (same PID) using low-level tricks, rather than just exiting and letting the container orchestrator spawn a new container and a new process.

@timja
Copy link
Member

timja commented Jan 5, 2022

thanks created jenkinsci/helm-charts#529

@basil
Copy link
Member

basil commented Jan 11, 2022

Because the default is to use UnixLifecycle which tries to restart in place (same PID) using low-level tricks, rather than just exiting and letting the container orchestrator spawn a new container and a new process.

I agree with this. If Jenkins is being run in a container, we should always be using ExitLifecycle and delegating to the container orchestrator. I filed jenkinsci/docker#1268 for this.

When JENKINS-41218 is fixed, I think we should also use ExitLifecycle when running in systemd(1), assuming the fix for JENKINS-41218 specifies Restart=on-failure in the systemd(1) unit.

@jglick
Copy link
Member

jglick commented Jan 18, 2022

jenkinsci/packaging#216?

@timja
Copy link
Member

timja commented Jan 27, 2022

Is this needed after jenkinsci/packaging#261 ?

@basil
Copy link
Member

basil commented Jan 27, 2022

Yes, jenkinsci/packaging#261 should fix JENKINS-67075. My vote would be to ship it in a weekly or two and then go for LTS, but @MarkEWaite likely wants to hold off.

@timja timja closed this Jan 27, 2022
@timja
Copy link
Member

timja commented Jan 27, 2022

Yes, jenkinsci/packaging#261 should fix JENKINS-67075. My vote would be to ship it in a weekly or two and then go for LTS, but @MarkEWaite likely wants to hold off.

it'll be in the weekly next Tuesday as it's merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants