Skip to content

JENKINS-50556 don't check local refs if we have remote ones#579

Closed
jacob-keller wants to merge 6 commits intojenkinsci:masterfrom
jacob-keller:jk-fix-polling-local-branches
Closed

JENKINS-50556 don't check local refs if we have remote ones#579
jacob-keller wants to merge 6 commits intojenkinsci:masterfrom
jacob-keller:jk-fix-polling-local-branches

Conversation

@jacob-keller
Copy link
Contributor

@jacob-keller jacob-keller commented Apr 4, 2018

JENKINS-50556 don't check local refs if we had remote ones

Fix an issue related to how polling works with remote vs local references. In the DefaultBuildChooser, we currently check branch-like references first, by converting them to remote/name lookups. This is done so that we can check the status of the branches according to the remote references. If we don't find one, then we assume that the branchSpec is actually a tag or tag-like reference.

Unfortunately, during polling, we prune these references that have already been built. This results in an accidental check of local branches, in the case where remote branches exist, but have no new changes. In some cases, this can result in polling reporting a change to build, but which will never be found by the build chooser during run time (as isPollCall would be false then, and we would not report the local branches in this case).

Fix this by instead pruning the list of references at the end, after collecting them. This ensures that the DefaultBuildChooser will work similarly in both the normal and polling cases, and avoids the possibility of permanently polling which can happen if a local branch gets a reference checked out.

Note that the polling bug may not be caught in many cases, because it happens that local branches do not get setup by the git plugin normally. Additionally, the remote polling setup is not susceptible to this issue, as it does not use the workspace for polling, instead using only ls-remote.

Add a test case which forces this sequence of events in order to highlight the issue, and make sure we don't re-introduce it.

@jacob-keller jacob-keller changed the title Jk fix polling local branches JENKINS-50556 test case highlighting breakage Apr 5, 2018
@jacob-keller jacob-keller force-pushed the jk-fix-polling-local-branches branch from 51b11e2 to 28a70a1 Compare April 5, 2018 18:15
@jacob-keller
Copy link
Contributor Author

I have a possible solution to add to this shortly.

@jacob-keller jacob-keller force-pushed the jk-fix-polling-local-branches branch from 948ee3e to b6b0a57 Compare April 5, 2018 22:31
@jacob-keller
Copy link
Contributor Author

Oops, the singleton collection returned by getHeadRevision is immutable, so we need to use addAll() when getting the tag-like references. Fixed that.

@jacob-keller
Copy link
Contributor Author

Ok, so there's some tests failing with timeout, and they sometimes fail locally, and other times they don't. Especially if I limit the test run to a single test case, it doesn't seem to fail. I also see similar tests fail locally if I run the test suite against master. Any thoughts on how to debug this?

@jacob-keller
Copy link
Contributor Author

It seems like some of the tests only appear to timeout during the full test run, but if run manually they do not timeout. Additionally, when I run the test suite locally, many tests timeout, but appear to have been marked as "flaky", and get re-run. I'm trying a local test run with the jenkins.test.timeout value set to a large number to see if it's maybe just a race condition for a few specific tests, or if it really is some deadlock that occurs.

@jacob-keller jacob-keller changed the title JENKINS-50556 test case highlighting breakage JENKINS-50556 don't check local refs if we have remote ones Apr 6, 2018
@jacob-keller
Copy link
Contributor Author

I'm also pretty sure these tests are timing out due to the lack of entropy on the virtual machine (SSHD fails to start up when generating private keys due to blocking on /dev/random). It fails intermittently. Is there a way to re-run the tests?

@MarkEWaite
Copy link
Contributor

The simplest way that I've found to run tests on a pull request again is to git commit --amend the most recent commit, change the text of the commit message, then git push --force to update the pull request.

@jacob-keller jacob-keller force-pushed the jk-fix-polling-local-branches branch from b6b0a57 to 68e469f Compare April 12, 2018 22:20
@jacob-keller
Copy link
Contributor Author

Looks like the same problem again with the failing to generate SSHD server keys in the JenkinsRule timeout.

@jacob-keller
Copy link
Contributor Author

Looks like @jglick tried something to re-run the tests,but they still timeout:

Error test timed out after 180 seconds Stacktrace org.junit.runners.model.TestTimedOutException: test timed out after 180 seconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:267) at jenkins.InitReactorRunner.run(InitReactorRunner.java:44) at jenkins.model.Jenkins.executeReactor(Jenkins.java:916) at jenkins.model.Jenkins.<init>(Jenkins.java:815) at hudson.model.Hudson.<init>(Hudson.java:85) at org.jvnet.hudson.test.JenkinsRule.newHudson(JenkinsRule.java:611) at org.jvnet.hudson.test.JenkinsRule.before(JenkinsRule.java:384) at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:537) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) Standard Output

This code was commented out by 52d75b9 ("Comment out changes from
4f963ad, b55df6b, and 46f610e.", 2011-07-22), which was done 7 years
ago. The code remains in the history for those that need it. We don't
need to pollute the remaining implementation with code that is not used
and probably does not work.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
A future patch is going to migrate the code which ignores already built
elements from outside of getHeadRevision directly into the
getCandidateRevisions.

However, currently when we check for tag-like refs we just assign
revisions to the collection returned by the getHeadRevision. This will
cause a problem because the Collections.singleton() is immutable and does
not support removing elements.

Fix this by using .addAll to put the elements into the HashSet rather
than continuing to use the singleton collection.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Title: Use addAll instead of replacing the revisions collection
Change-type: DefectResolution
When the git plugin attempts to determine if there are new changes using
the DefaultBuildChooser it can sometimes accidentally report changes
even when there are no new changes.

This can occur in rare cases due to the way the code handles non-branch
references.

Currently, we first check the reference pre-fixed with the remote name,
assuming that its a branch. getHeadRevision is used to find all the
proper revisions. However, when we are polling, we exclude any already
built changes. Since in this scenario, the remote hasn't updated, we'll
report that no changes exist.

Because we didn't find any valid references, we then check for whether
the reference is actually a tag or other specific value (such as
"refs/changes/<change number>" from gerrit, by checking the ref
directly.

In some cases, it is possible for the local work tree to have a checkout
of the local branch which is at some commit that maybe hasn't been built
yet. (For example, if the job in question creates commits or performs
git commands).

This results in the polling reporting that a new commit exists, which
triggers a build. However, the actual build will not be for that commit,
so we end up in an endless loop where polling reports a new commit every
time.

The issue is somewhat rare for a number of reasons:

 (a) it requires a local branch to be checked out, which does not happen
     by default when checking out using the git plugin
 (b) the local branch must be checked out to a commit that is *not* on
     the remote

This is somewhat unlikely but possible if a job happens to generate new
commits, or perform other work as part of its tasks. Once the branch is
created, it causes the endless rebuilding.

The real root cause is due to how we handle scrubbing already built
revisions. We currently discard them directly in getHeadRevision.

Instead, move this check outside of getHeadRevision only *after* we
check for tags. This ensures, that the check for non-branch refs will
only occur in the case where there *are* no matching remote names. This
avoids accidentally looking at the local branch name in this case,
ensuring that we do not report its commits as candidates for a build.

This ensures that we'll match commit selection when polling vs when not
polling, and prevent the endless rebuild cycle.

Add a test case highlighting the issue so that we can prevent
regressions in the future.

Fixes: JENKINS-50556
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Now that we don't use the isPollCall or BuildData values, we can drop
them from this private helper function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
@jacob-keller jacob-keller force-pushed the jk-fix-polling-local-branches branch from 68e469f to f6fb108 Compare April 23, 2018 18:02
@jacob-keller
Copy link
Contributor Author

I fixed up the commit messages which needed to be re-worded, and squashed the test case into the same patch which fixes the bug. I also updated the message to clearly indicate the bug, and why we didn't find it in the past.

@MarkEWaite
Copy link
Contributor

I believe I've created a verification job that shows the problem in my environment. Unfortunately, that job shows the same "build every poll" whether I'm using this set of changes or not.

I still haven't found the correct set of configuration steps to duplicate the problem without the change and then to see the problem fixed with the change.

@jacob-keller
Copy link
Contributor Author

My test case is to have a job follow a branch "master" (but not explicitly marked as "origin/master"), and then also run a git checkout within the repository.

Specifically, I found this in a job which clones the repos, and makes a new commit on the branch and publishes it to gerrit.

I think your example has the */ which doesn't match a local branch reference.

If you look at my example test case, the only way so far I've found to trigger this is when the local job actually does a git checkout of some kind, as currently we happen to avoid the issue due to how clone never actually checks out the branch but instead just leaves us on a detached head checkout, and local branches aren't fetched.

So, the only way I've actually triggered this is to explicitly checkout something in a shell script or by hand in the workspace outside the job. I suppose there may also be configurations which do result in a checkout of the local branch, but I haven't found one yet.

Additionally, the checkout would need to be to some sha1 which hasn't yet been built by some other job. Otherwise we'll check against build data and discover it was already built by a different build.

@MarkEWaite
Copy link
Contributor

I'll experiment further. I'm using the "LocalBranch" extension (checkout to specific local branch) which performs a checkout) and forcing workspace polling. I'll try switching the definition of the branch to checkout from the intentionally wildcard version to a non-wildcard version.

Thanks for investigating!

@MarkEWaite MarkEWaite added the bugfix Fixes a bug - used by Release Drafter label Jul 13, 2019
@MarkEWaite MarkEWaite added this to the Later milestone Dec 19, 2019
@MarkEWaite MarkEWaite closed this Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Fixes a bug - used by Release Drafter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants