Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix [JENKINS-34281] by running shutdown as system user so we can see items to persist them in queue - Mk 2 #2276

Conversation

8 participants
@svanoort
Copy link
Member

commented Apr 19, 2016

Fixes JENKINS-34281.

Too long; Didn't read summary: Jenkins wasn't running some forms of shutdown as the system user, and when we removed anonymous read access as part of the secure-out-of-the-box PR (https://github.com/jenkinsci/jenkins/pull/2042/files#diff-f65b8a70854ca1cc6c12397eee54d279R62) then it could no longer see build items to persist them.

Details:

  • Build queue was not persisted on shutdown, when doing a new Jenkins 2.0 install with setup wizard. The file was made, but no Items (builds) were included. Result: disappearing builds.
  • Build queue was not persisted because when saving the queue, it lists the items to save with Queue queue.getItems(), which returned an empty list
  • getItems() returns an empty list, because items are only returned when permissions make them readable, and during shutdown it did not have permissions to read them (permission check failed)
  • During shutdown, read permissions were not present because secure-out-of-the-box removed anonymous read access that allowed it to work before.
  • Solution: run shutdown correctly as system user, which has full permissions
@svanoort

This comment has been minimized.

Copy link
Member Author

commented Apr 19, 2016

@kwhetstone

This comment has been minimized.

Copy link

commented Apr 19, 2016

🐝

@@ -3940,7 +3940,8 @@ public void doExit( StaplerRequest req, StaplerResponse rsp ) throws IOException
w.println("Shutting down");
w.close();
}

// Just in case we have shutdown hooks
ACL.impersonate(ACL.SYSTEM);

This comment has been minimized.

Copy link
@kzantow

kzantow Apr 19, 2016

Contributor

Pretty sure this won't do anything, shutdown hooks should run in a different thread.

This comment has been minimized.

Copy link
@svanoort

svanoort Apr 19, 2016

Author Member

Yeah, I'm not sure it's actually going to do anything for us, just playing it extra-safe here.

This comment has been minimized.

Copy link
@jtnord

jtnord Apr 19, 2016

Member

well you are already checked for ADMINISTER

@@ -367,6 +368,7 @@ public FileAndDescription getHomeDir(ServletContextEvent event) {

public void contextDestroyed(ServletContextEvent event) {
try {
ACL.impersonate(ACL.SYSTEM);

This comment has been minimized.

Copy link
@kzantow

kzantow Apr 19, 2016

Contributor

Why not put this in Jenkins.cleanUp as:

ACL.impersonate(ACL.SYSTEM, new Runnable() {
// body of method here
});

This comment has been minimized.

Copy link
@svanoort

svanoort Apr 19, 2016

Author Member

Not actually doing anything for us here (since Jenkins dies after), but if it'll ease things, sure.

@reviewbybees

This comment has been minimized.

Copy link

commented Apr 19, 2016

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

@kzantow

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2016

🐝 the plethora of ACL.impersonates before Jenkins.cleanUp() is maybe less than ideal, but this seems to be following the same patterns

@svanoort

This comment has been minimized.

Copy link
Member Author

commented Apr 19, 2016

@kohsuke

This comment has been minimized.

Copy link
Member

commented Apr 20, 2016

Does ACL.impersonate work when Jenkins.getInstance() is null?

@kohsuke

This comment has been minimized.

Copy link
Member

commented Apr 20, 2016

I don't have the context of this change, but is this really a P1 that's justified a day before a release? Isn't the impact of this limited?

@daniel-beck

This comment has been minimized.

Copy link
Member

commented Apr 20, 2016

@kohsuke

I asked Sam to file in this branch. Still undecided whether I consider this safe enough to merge. I think it is a severe problem if it's a regression.

Isn't the impact of this limited?

To anyone with a queue they want saved, yes. Remember how "You can restart Jenkins and your pipelines will survive!" is a big feature of Pipeline? It becomes much less impressive if Jenkins forgets what's in the queue at the same time.

@jglick

This comment has been minimized.

Copy link
Member

commented Apr 20, 2016

@jglick

This comment has been minimized.

Copy link
Member

commented Apr 20, 2016

I do not consider this a regression in any sense—any fully secured 1.x installation would have denied anonymous read access, and 2.x users can freely change their security configuration after setup—but anyway, sounds like a bug, 🐝 so far as it goes.

@svanoort

This comment has been minimized.

Copy link
Member Author

commented Apr 20, 2016

Compare this to the OS X Yosemite -> El Capitan, or Windows XP -> Windows Vista/7 : people perceived these upgrades as problematic and buggy, largely due to security improvements which also technically weren't regressions. Also, to be clear: the original issue was introduced between Jenkins 1.609.3 and 1.625.3

Consider also that with 2.0 we're likely to see a far larger than normal influx of users trying out fresh installations, and as mentioned, pipeline durability is likely to be front-and-center. Given the significant 2.x changes, I imagine some of these will be people trying out fresh installs before upgrading their 1.x systems.

With regards to ACL.impersonate: it's present in all the other paths triggering the cleanup method, this was just an oversight, so it is safe to say it works. Given this and the fact that the change is scoped to shutdown behavior only, the risk level is far lower than normal.

Pending a decision from @daniel-beck on this (I can re-cut the PR to apply against master), but for those reasons I would argue against postponing this to 2.1

@reviewbybees

This comment has been minimized.

Copy link

commented Apr 20, 2016

This pull request has completed our internal processes and we now respectfully request the maintainers of this repository to consider our proposal contained within this pull request for merging.

@daniel-beck

This comment has been minimized.

Copy link
Member

commented Apr 20, 2016

I decided to leave this out of 2.0. I do consider this an important fix we need to deliver ASAP, as this does constitute data loss and is at the same time a regression between 1.609.3 and 1.625.3.

However, I was unable to find bug reports about this, indicating it's not perceived as a real problem. This also will only affect new installs, as upgraders likely already have the problem.

Delivering the fix in 2.1 looks like a reasonable compromise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.