Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JENKINS-50556 don't check local refs if we have remote ones #579

Open
wants to merge 6 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@jacob-keller
Copy link

jacob-keller commented Apr 4, 2018

JENKINS-50556 don't check local refs if we had remote ones

Fix an issue related to how polling works with remote vs local references. In the DefaultBuildChooser, we currently check branch-like references first, by converting them to remote/name lookups. This is done so that we can check the status of the branches according to the remote references. If we don't find one, then we assume that the branchSpec is actually a tag or tag-like reference.

Unfortunately, during polling, we prune these references that have already been built. This results in an accidental check of local branches, in the case where remote branches exist, but have no new changes. In some cases, this can result in polling reporting a change to build, but which will never be found by the build chooser during run time (as isPollCall would be false then, and we would not report the local branches in this case).

Fix this by instead pruning the list of references at the end, after collecting them. This ensures that the DefaultBuildChooser will work similarly in both the normal and polling cases, and avoids the possibility of permanently polling which can happen if a local branch gets a reference checked out.

Note that the polling bug may not be caught in many cases, because it happens that local branches do not get setup by the git plugin normally. Additionally, the remote polling setup is not susceptible to this issue, as it does not use the workspace for polling, instead using only ls-remote.

Add a test case which forces this sequence of events in order to highlight the issue, and make sure we don't re-introduce it.

@jacob-keller jacob-keller changed the title Jk fix polling local branches JENKINS-50556 test case highlighting breakage Apr 5, 2018

@jacob-keller jacob-keller force-pushed the jacob-keller:jk-fix-polling-local-branches branch from 51b11e2 to 28a70a1 Apr 5, 2018

@jacob-keller

This comment has been minimized.

Copy link
Author

jacob-keller commented Apr 5, 2018

I have a possible solution to add to this shortly.

@jacob-keller jacob-keller force-pushed the jacob-keller:jk-fix-polling-local-branches branch from 948ee3e to b6b0a57 Apr 5, 2018

@jacob-keller

This comment has been minimized.

Copy link
Author

jacob-keller commented Apr 5, 2018

Oops, the singleton collection returned by getHeadRevision is immutable, so we need to use addAll() when getting the tag-like references. Fixed that.

@jacob-keller

This comment has been minimized.

Copy link
Author

jacob-keller commented Apr 6, 2018

Ok, so there's some tests failing with timeout, and they sometimes fail locally, and other times they don't. Especially if I limit the test run to a single test case, it doesn't seem to fail. I also see similar tests fail locally if I run the test suite against master. Any thoughts on how to debug this?

@jacob-keller

This comment has been minimized.

Copy link
Author

jacob-keller commented Apr 6, 2018

It seems like some of the tests only appear to timeout during the full test run, but if run manually they do not timeout. Additionally, when I run the test suite locally, many tests timeout, but appear to have been marked as "flaky", and get re-run. I'm trying a local test run with the jenkins.test.timeout value set to a large number to see if it's maybe just a race condition for a few specific tests, or if it really is some deadlock that occurs.

@jacob-keller jacob-keller changed the title JENKINS-50556 test case highlighting breakage JENKINS-50556 don't check local refs if we have remote ones Apr 6, 2018

@jacob-keller

This comment has been minimized.

Copy link
Author

jacob-keller commented Apr 11, 2018

I'm also pretty sure these tests are timing out due to the lack of entropy on the virtual machine (SSHD fails to start up when generating private keys due to blocking on /dev/random). It fails intermittently. Is there a way to re-run the tests?

@MarkEWaite

This comment has been minimized.

Copy link

MarkEWaite commented Apr 11, 2018

The simplest way that I've found to run tests on a pull request again is to git commit --amend the most recent commit, change the text of the commit message, then git push --force to update the pull request.

@jacob-keller jacob-keller force-pushed the jacob-keller:jk-fix-polling-local-branches branch from b6b0a57 to 68e469f Apr 12, 2018

@jacob-keller

This comment has been minimized.

Copy link
Author

jacob-keller commented Apr 12, 2018

Looks like the same problem again with the failing to generate SSHD server keys in the JenkinsRule timeout.

@jacob-keller

This comment has been minimized.

Copy link
Author

jacob-keller commented Apr 13, 2018

Looks like @jglick tried something to re-run the tests,but they still timeout:

Error test timed out after 180 seconds Stacktrace org.junit.runners.model.TestTimedOutException: test timed out after 180 seconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:267) at jenkins.InitReactorRunner.run(InitReactorRunner.java:44) at jenkins.model.Jenkins.executeReactor(Jenkins.java:916) at jenkins.model.Jenkins.<init>(Jenkins.java:815) at hudson.model.Hudson.<init>(Hudson.java:85) at org.jvnet.hudson.test.JenkinsRule.newHudson(JenkinsRule.java:611) at org.jvnet.hudson.test.JenkinsRule.before(JenkinsRule.java:384) at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:537) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) Standard Output

Jacob Keller added some commits Apr 5, 2018

Jacob Keller
remove commented out code in DefaultBuildChooser
This code was commented out by 52d75b9 ("Comment out changes from
4f963ad, b55df6b, and 46f610e.", 2011-07-22), which was done 7 years
ago. The code remains in the history for those that need it. We don't
need to pollute the remaining implementation with code that is not used
and probably does not work.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Jacob Keller
use the helper objectId2Revision in DefaultBuildChooser
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Jacob Keller
fix JENKINS-50556 local branch names cause polling to return true
When the git plugin attempts to determine if there are new changes using
the DefaultBuildChooser it can sometimes accidentally report changes
even when there are no new changes.

This can occur in rare cases due to the way the code handles non-branch
references.

Currently, we first check the reference pre-fixed with the remote name,
assuming that its a branch. getHeadRevision is used to find all the
proper revisions. However, when we are polling, we exclude any already
built changes. Since in this scenario, the remote hasn't updated, we'll
report that no changes exist.

Because we didn't find any valid references, we then check for whether
the reference is actually a tag or other specific value (such as
"refs/changes/<change number>" from gerrit, by checking the ref
directly.

In some cases, it is possible for the local work tree to have a checkout
of the local branch which is at some commit that maybe hasn't been built
yet. (For example, if the job in question creates commits or performs
git commands).

This results in the polling reporting that a new commit exists, which
triggers a build. However, the actual build will not be for that commit,
so we end up in an endless loop where polling reports a new commit every
time.

The issue is somewhat rare for a number of reasons:

 (a) it requires a local branch to be checked out, which does not happen
     by default when checking out using the git plugin
 (b) the local branch must be checked out to a commit that is *not* on
     the remote

This is somewhat unlikely but possible if a job happens to generate new
commits, or perform other work as part of its tasks. Once the branch is
created, it causes the endless rebuilding.

The real root cause is due to how we handle scrubbing already built
revisions. We currently discard them directly in getHeadRevision.

Instead, move this check outside of getHeadRevision only *after* we
check for tags. This ensures, that the check for non-branch refs will
only occur in the case where there *are* no matching remote names. This
avoids accidentally looking at the local branch name in this case,
ensuring that we do not report its commits as candidates for a build.

This ensures that we'll match commit selection when polling vs when not
polling, and prevent the endless rebuild cycle.

Add a test case highlighting the issue so that we can prevent
regressions in the future.

Fixes: JENKINS-50556
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Jacob Keller
drop unused parameters on getHeadRevisions
Now that we don't use the isPollCall or BuildData values, we can drop
them from this private helper function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Jacob Keller
Use addAll instead of replacing the revisions collection
A future patch is going to migrate the code which ignores already built
elements from outside of getHeadRevision directly into the
getCandidateRevisions.

However, currently when we check for tag-like refs we just assign
revisions to the collection returned by the getHeadRevision. This will
cause a problem because the Collections.singleton() is immutable and does
not support removing elements.

Fix this by using .addAll to put the elements into the HashSet rather
than continuing to use the singleton collection.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Title: Use addAll instead of replacing the revisions collection
Change-type: DefectResolution

@jacob-keller jacob-keller force-pushed the jacob-keller:jk-fix-polling-local-branches branch from 68e469f to f6fb108 Apr 23, 2018

@jacob-keller

This comment has been minimized.

Copy link
Author

jacob-keller commented Apr 23, 2018

I fixed up the commit messages which needed to be re-worded, and squashed the test case into the same patch which fixes the bug. I also updated the message to clearly indicate the bug, and why we didn't find it in the past.

@MarkEWaite

This comment has been minimized.

Copy link

MarkEWaite commented May 15, 2018

I believe I've created a verification job that shows the problem in my environment. Unfortunately, that job shows the same "build every poll" whether I'm using this set of changes or not.

I still haven't found the correct set of configuration steps to duplicate the problem without the change and then to see the problem fixed with the change.

@jacob-keller

This comment has been minimized.

Copy link
Author

jacob-keller commented May 15, 2018

My test case is to have a job follow a branch "master" (but not explicitly marked as "origin/master"), and then also run a git checkout within the repository.

Specifically, I found this in a job which clones the repos, and makes a new commit on the branch and publishes it to gerrit.

I think your example has the */ which doesn't match a local branch reference.

If you look at my example test case, the only way so far I've found to trigger this is when the local job actually does a git checkout of some kind, as currently we happen to avoid the issue due to how clone never actually checks out the branch but instead just leaves us on a detached head checkout, and local branches aren't fetched.

So, the only way I've actually triggered this is to explicitly checkout something in a shell script or by hand in the workspace outside the job. I suppose there may also be configurations which do result in a checkout of the local branch, but I haven't found one yet.

Additionally, the checkout would need to be to some sha1 which hasn't yet been built by some other job. Otherwise we'll check against build data and discover it was already built by a different build.

@MarkEWaite

This comment has been minimized.

Copy link

MarkEWaite commented May 15, 2018

I'll experiment further. I'm using the "LocalBranch" extension (checkout to specific local branch) which performs a checkout) and forcing workspace polling. I'll try switching the definition of the branch to checkout from the intentionally wildcard version to a non-wildcard version.

Thanks for investigating!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.