JENKINS-50556 don't check local refs if we have remote ones by jacob-keller · Pull Request #579 · jenkinsci/git-plugin

jacob-keller · 2018-04-04T23:57:08Z

JENKINS-50556 don't check local refs if we had remote ones

Fix an issue related to how polling works with remote vs local references. In the DefaultBuildChooser, we currently check branch-like references first, by converting them to remote/name lookups. This is done so that we can check the status of the branches according to the remote references. If we don't find one, then we assume that the branchSpec is actually a tag or tag-like reference.

Unfortunately, during polling, we prune these references that have already been built. This results in an accidental check of local branches, in the case where remote branches exist, but have no new changes. In some cases, this can result in polling reporting a change to build, but which will never be found by the build chooser during run time (as isPollCall would be false then, and we would not report the local branches in this case).

Fix this by instead pruning the list of references at the end, after collecting them. This ensures that the DefaultBuildChooser will work similarly in both the normal and polling cases, and avoids the possibility of permanently polling which can happen if a local branch gets a reference checked out.

Note that the polling bug may not be caught in many cases, because it happens that local branches do not get setup by the git plugin normally. Additionally, the remote polling setup is not susceptible to this issue, as it does not use the workspace for polling, instead using only ls-remote.

Add a test case which forces this sequence of events in order to highlight the issue, and make sure we don't re-introduce it.

jacob-keller · 2018-04-05T20:16:01Z

I have a possible solution to add to this shortly.

jacob-keller · 2018-04-05T22:32:14Z

Oops, the singleton collection returned by getHeadRevision is immutable, so we need to use addAll() when getting the tag-like references. Fixed that.

jacob-keller · 2018-04-06T16:18:50Z

Ok, so there's some tests failing with timeout, and they sometimes fail locally, and other times they don't. Especially if I limit the test run to a single test case, it doesn't seem to fail. I also see similar tests fail locally if I run the test suite against master. Any thoughts on how to debug this?

jacob-keller · 2018-04-06T17:30:01Z

It seems like some of the tests only appear to timeout during the full test run, but if run manually they do not timeout. Additionally, when I run the test suite locally, many tests timeout, but appear to have been marked as "flaky", and get re-run. I'm trying a local test run with the jenkins.test.timeout value set to a large number to see if it's maybe just a race condition for a few specific tests, or if it really is some deadlock that occurs.

jacob-keller · 2018-04-11T22:11:20Z

I'm also pretty sure these tests are timing out due to the lack of entropy on the virtual machine (SSHD fails to start up when generating private keys due to blocking on /dev/random). It fails intermittently. Is there a way to re-run the tests?

MarkEWaite · 2018-04-11T23:49:45Z

The simplest way that I've found to run tests on a pull request again is to git commit --amend the most recent commit, change the text of the commit message, then git push --force to update the pull request.

jacob-keller · 2018-04-12T23:03:01Z

Looks like the same problem again with the failing to generate SSHD server keys in the JenkinsRule timeout.

jacob-keller · 2018-04-13T22:06:42Z

Looks like @jglick tried something to re-run the tests,but they still timeout:

Error test timed out after 180 seconds Stacktrace org.junit.runners.model.TestTimedOutException: test timed out after 180 seconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:267) at jenkins.InitReactorRunner.run(InitReactorRunner.java:44) at jenkins.model.Jenkins.executeReactor(Jenkins.java:916) at jenkins.model.Jenkins.<init>(Jenkins.java:815) at hudson.model.Hudson.<init>(Hudson.java:85) at org.jvnet.hudson.test.JenkinsRule.newHudson(JenkinsRule.java:611) at org.jvnet.hudson.test.JenkinsRule.before(JenkinsRule.java:384) at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:537) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) Standard Output

This code was commented out by 52d75b9 ("Comment out changes from 4f963ad, b55df6b, and 46f610e.", 2011-07-22), which was done 7 years ago. The code remains in the history for those that need it. We don't need to pollute the remaining implementation with code that is not used and probably does not work. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

A future patch is going to migrate the code which ignores already built elements from outside of getHeadRevision directly into the getCandidateRevisions. However, currently when we check for tag-like refs we just assign revisions to the collection returned by the getHeadRevision. This will cause a problem because the Collections.singleton() is immutable and does not support removing elements. Fix this by using .addAll to put the elements into the HashSet rather than continuing to use the singleton collection. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Title: Use addAll instead of replacing the revisions collection Change-type: DefectResolution

When the git plugin attempts to determine if there are new changes using the DefaultBuildChooser it can sometimes accidentally report changes even when there are no new changes. This can occur in rare cases due to the way the code handles non-branch references. Currently, we first check the reference pre-fixed with the remote name, assuming that its a branch. getHeadRevision is used to find all the proper revisions. However, when we are polling, we exclude any already built changes. Since in this scenario, the remote hasn't updated, we'll report that no changes exist. Because we didn't find any valid references, we then check for whether the reference is actually a tag or other specific value (such as "refs/changes/<change number>" from gerrit, by checking the ref directly. In some cases, it is possible for the local work tree to have a checkout of the local branch which is at some commit that maybe hasn't been built yet. (For example, if the job in question creates commits or performs git commands). This results in the polling reporting that a new commit exists, which triggers a build. However, the actual build will not be for that commit, so we end up in an endless loop where polling reports a new commit every time. The issue is somewhat rare for a number of reasons: (a) it requires a local branch to be checked out, which does not happen by default when checking out using the git plugin (b) the local branch must be checked out to a commit that is *not* on the remote This is somewhat unlikely but possible if a job happens to generate new commits, or perform other work as part of its tasks. Once the branch is created, it causes the endless rebuilding. The real root cause is due to how we handle scrubbing already built revisions. We currently discard them directly in getHeadRevision. Instead, move this check outside of getHeadRevision only *after* we check for tags. This ensures, that the check for non-branch refs will only occur in the case where there *are* no matching remote names. This avoids accidentally looking at the local branch name in this case, ensuring that we do not report its commits as candidates for a build. This ensures that we'll match commit selection when polling vs when not polling, and prevent the endless rebuild cycle. Add a test case highlighting the issue so that we can prevent regressions in the future. Fixes: JENKINS-50556 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Now that we don't use the isPollCall or BuildData values, we can drop them from this private helper function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

jacob-keller · 2018-04-23T18:04:34Z

I fixed up the commit messages which needed to be re-worded, and squashed the test case into the same patch which fixes the bug. I also updated the message to clearly indicate the bug, and why we didn't find it in the past.

MarkEWaite · 2018-05-15T04:57:48Z

I believe I've created a verification job that shows the problem in my environment. Unfortunately, that job shows the same "build every poll" whether I'm using this set of changes or not.

I still haven't found the correct set of configuration steps to duplicate the problem without the change and then to see the problem fixed with the change.

jacob-keller · 2018-05-15T05:45:00Z

My test case is to have a job follow a branch "master" (but not explicitly marked as "origin/master"), and then also run a git checkout within the repository.

Specifically, I found this in a job which clones the repos, and makes a new commit on the branch and publishes it to gerrit.

I think your example has the */ which doesn't match a local branch reference.

If you look at my example test case, the only way so far I've found to trigger this is when the local job actually does a git checkout of some kind, as currently we happen to avoid the issue due to how clone never actually checks out the branch but instead just leaves us on a detached head checkout, and local branches aren't fetched.

So, the only way I've actually triggered this is to explicitly checkout something in a shell script or by hand in the workspace outside the job. I suppose there may also be configurations which do result in a checkout of the local branch, but I haven't found one yet.

Additionally, the checkout would need to be to some sha1 which hasn't yet been built by some other job. Otherwise we'll check against build data and discover it was already built by a different build.

MarkEWaite · 2018-05-15T12:06:32Z

I'll experiment further. I'm using the "LocalBranch" extension (checkout to specific local branch) which performs a checkout) and forcing workspace polling. I'll try switching the definition of the branch to checkout from the intentionally wildcard version to a non-wildcard version.

Thanks for investigating!

jacob-keller changed the title ~~Jk fix polling local branches~~ JENKINS-50556 test case highlighting breakage Apr 5, 2018

jacob-keller force-pushed the jk-fix-polling-local-branches branch from 51b11e2 to 28a70a1 Compare April 5, 2018 18:15

jacob-keller force-pushed the jk-fix-polling-local-branches branch from 948ee3e to b6b0a57 Compare April 5, 2018 22:31

jacob-keller changed the title ~~JENKINS-50556 test case highlighting breakage~~ JENKINS-50556 don't check local refs if we have remote ones Apr 6, 2018

jacob-keller force-pushed the jk-fix-polling-local-branches branch from b6b0a57 to 68e469f Compare April 12, 2018 22:20

jacob-keller added 5 commits April 23, 2018 10:57

use the helper objectId2Revision in DefaultBuildChooser

fe9abc1

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

drop unused parameters on getHeadRevisions

f6fb108

Now that we don't use the isPollCall or BuildData values, we can drop them from this private helper function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

jacob-keller force-pushed the jk-fix-polling-local-branches branch from 68e469f to f6fb108 Compare April 23, 2018 18:02

Minor phrasing fix in README

3ba908e

MarkEWaite added the bugfix Fixes a bug - used by Release Drafter label Jul 13, 2019

MarkEWaite added this to the Later milestone Dec 19, 2019

MarkEWaite closed this Dec 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JENKINS-50556 don't check local refs if we have remote ones#579

JENKINS-50556 don't check local refs if we have remote ones#579
jacob-keller wants to merge 6 commits intojenkinsci:masterfrom
jacob-keller:jk-fix-polling-local-branches

jacob-keller commented Apr 4, 2018 •

edited

Loading

Uh oh!

jacob-keller commented Apr 5, 2018

Uh oh!

jacob-keller commented Apr 5, 2018

Uh oh!

jacob-keller commented Apr 6, 2018

Uh oh!

jacob-keller commented Apr 6, 2018

Uh oh!

jacob-keller commented Apr 11, 2018

Uh oh!

MarkEWaite commented Apr 11, 2018

Uh oh!

jacob-keller commented Apr 12, 2018

Uh oh!

jacob-keller commented Apr 13, 2018

Uh oh!

jacob-keller commented Apr 23, 2018

Uh oh!

MarkEWaite commented May 15, 2018

Uh oh!

jacob-keller commented May 15, 2018

Uh oh!

MarkEWaite commented May 15, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jacob-keller commented Apr 4, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacob-keller commented Apr 5, 2018

Uh oh!

jacob-keller commented Apr 5, 2018

Uh oh!

jacob-keller commented Apr 6, 2018

Uh oh!

jacob-keller commented Apr 6, 2018

Uh oh!

jacob-keller commented Apr 11, 2018

Uh oh!

MarkEWaite commented Apr 11, 2018

Uh oh!

jacob-keller commented Apr 12, 2018

Uh oh!

jacob-keller commented Apr 13, 2018

Uh oh!

jacob-keller commented Apr 23, 2018

Uh oh!

MarkEWaite commented May 15, 2018

Uh oh!

jacob-keller commented May 15, 2018

Uh oh!

MarkEWaite commented May 15, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jacob-keller commented Apr 4, 2018 •

edited

Loading