leave oauth on jenkins extended test, use token for http-level access #12440
Conversation
Need the image updated before the extended test will pass.
oooh, nice!
Hm, it is acting like the image wasn't updated. I didn't do a thorough exam of the image on docker.io, but its timestamp indicated it was new. I'll do some manual runs when I get back from PT and see what is up.
…On Wed, Jan 11, 2017 at 2:53 AM, OpenShift Bot wrote:
continuous-integration/openshift-jenkins/testextended FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/970/) (Base Commit: 12b6215) (Extended Tests: core(openshift pipeline build))
OK ... the extended tests pass locally for me using the official docker.io jenkins images (vs. running with my test image, which I had to do during development of this pull). Going to assume there was just a timing issue between when the image was available on docker.io and when the test system was able to update its local copy of the image. Will trigger another run momentarily.
Force-pushed from 5317c6f to c574a74.
Hmm ... now getting an error on the vagrant-openshift setup for this pull:
Did not see any existing flakes ... deleted/reposted the extended test comment in case it somehow was using the earlier commit ID from before I rebased this PR.
Yep - deleting/reposting the test comment seemed to do the trick.
Ah ... there is a new set of tests in test/extended/builds/pipeline.go (it is the one @csrwng introduced recently). My test spec of "openshift pipeline builds" (which I copied/pasted from @csrwng's PR) hits pipeline.go. "openshift pipeline plugin" would have triggered the jenkins_plugin.go tests, which are what I was focused on. In any event, the pipeline.go stuff is still hitting 403's on direct http accesses. Those tests still have ENABLE_OAUTH set to false on the ... Ultimately, the extended test focus should be "openshift pipeline" to capture both pipeline.go and jenkins_plugin.go.
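To make the focus discussion concrete, here is a small sketch of how a ginkgo focus string would map onto the two suites; the Describe labels below are assumptions, not copied from the repo:

```go
package builds

import g "github.com/onsi/ginkgo"

// Hypothetical Describe labels; the real labels in pipeline.go and
// jenkins_plugin.go may differ.
var _ = g.Describe("openshift pipeline builds", func() {
	// pipeline.go test cases would be registered here
})

var _ = g.Describe("openshift pipeline plugin", func() {
	// jenkins_plugin.go test cases would be registered here
})
```

Since the extended test focus is matched against the spec text, a focus of "openshift pipeline builds" selects only the first suite, "openshift pipeline plugin" only the second, and "openshift pipeline" selects both.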
Force-pushed from c574a74 to 2ec0037.
OK, the test is still in flight, but the PR run is still getting 403's on direct http access, even though both pipeline.go and jenkins_plugin.go are passing for me locally now with the docker.io jenkins image. Maybe there is some sort of user permission difference when running in the PR tester vs. running locally? Or something is messed up with docker on these test systems and we have stale jenkins images? In either case, I'm going to have to push some temporary debug up to the PR to better nail down what is going on. I'll report back when I have some findings.
Oooooh ... this time the jenkins pod dump had something I've never seen before, which torpedoes any OAuth-based access to Jenkins:
@enj - FYI - The above exception happened when we try to hit the ...
Sure enough, there is a panic noted in the master log (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/973/s3/download/test-extended/core/logs/openshift.log) wrt the GET on ... It looks like:
@sttts for apiserver rewiring
Slightly easier to read stack trace:
Force-pushed from 2ec0037 to 977d989.
With #12453, the use of OAuth and direct http access via token has already worked successfully multiple times in the currently running extended test. Will report back with any relevant analysis once the test completes.
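For context on what "direct http access via token" means at the HTTP level, here is a minimal sketch; the Jenkins route URL and token are placeholders, and the real test obtains the service account token from the cluster rather than hard-coding it:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder values: the extended test derives the route URL from the
	// running cluster and reads the token from the jenkins service account.
	jenkinsURL := "https://jenkins-myproject.router.example.com/job/sample-pipeline/api/json"
	token := "example-service-account-token"

	req, err := http.NewRequest("GET", jenkinsURL, nil)
	if err != nil {
		panic(err)
	}
	// With the OpenShift OAuth plugin enabled in the Jenkins image, direct
	// HTTP calls authenticate by presenting an OpenShift bearer token.
	req.Header.Set("Authorization", "Bearer "+token)

	client := &http.Client{Transport: &http.Transport{
		// Test-environment shortcut for the self-signed router cert.
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// A 403 here is exactly the symptom discussed in this thread.
	fmt.Println("status:", resp.StatusCode)
}
```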
k8s plugin does not seem happy ...
Could be related to the issue with the extra container in the pod being discussed on the mailing list.
@bparees - That thread could be part of it, but there are some more fundamental issues going on (which are totally independent of the token-based http access this PR is focused on). Several of the tests are encountering permission issues with the jenkins service account. I've seen elements of this both with the openshift-restclient that the jenkins-plugin uses and with the k8s plugin (which uses the fabric8 client under the covers). An example from our jenkins-plugin and the openshift-restclient:
An example from the k8s plugin and fabric8:
I'm seeing similar errors in our overnight test jobs as well ... search for ... Something underneath us has broken recently. Don't know yet if it is extended-test specific or general. I'll try mimicking some of these test cases manually in a local jenkins env I set up and see what transpires. All this said, I think turning on OAuth for the extended tests proves out, and technically speaking this PR could be merged. But I'm fine if you want to wait as well.
Yeah, I'd like to hold this out until we get some stability back into the existing codebase.
Force-pushed from 977d989 to 3eee154.
At least the permission issues in jenkins_plugin.go will get fixed once #12508 merges.
We minimally need openshift/jenkins#231 and https://ci.openshift.redhat.com/jenkins/view/Image%20Verification/job/push_images_s2i/6747/ to complete to see about the k8s plugin tests passing again.
Force-pushed from 3eee154 to e6c8842.
OK, we are getting close. The only failure in this PR's ext test run was #12479. Note, the orchestration pipeline ext test passed locally for me just now, but the blue-green test failed. New regressions beneath us notwithstanding, it is time to focus on pipeline.go and add some full dumps of the jenkins master and slave pods as needed.
Both pipeline.go and jenkins_plugin.go passed overnight, and are passing for me locally this morning. I'm going to push an update momentarily to add some more debug to pipeline.go when failures occur, and kick off another extended test run.
Force-pushed from e6c8842 to 66ab7a3.
So in the run this time, I got an intermittent hiccup with the Orchestration test case from pipeline.go. I'm circling through the added debug to see if I can discern anything. @csrwng - would you have any cycles to help expedite diagnosis of these failures? These failures are certainly intermittent: it passed in the overnight build and for me locally today, but it failed locally for me last night, failed in recent overnight runs (per @PI-Victor, see https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/985/consoleFull#123465095256cbb9a5e4b02b88ae8c2f77), and has failed in this PR so far.
@gabemontero I can take a look at it later today/early tomorrow morning
Force-pushed from c236737 to 60d2c20.
OK, in this last run, only the orchestration pipeline failed. This time for debug I ...
In any event, per the discussion between @csrwng, @bparees, and me after scrum, I'm just going to comment out the orchestration pipeline for now and get enough consistency in the test run for this PR to get it merged. Per our post-scrum discussion, let's use #12479 for @csrwng to spend some time curating this test, sort out these prolonged timing windows, and then re-integrate. I'll trim the debug a bit, comment out the test, and if we get some consistent success, re-ask for the merge.
Force-pushed from db626aa to 711b54d.
OK, the extended tests have passed multiple times in a row now with the orchestration pipeline test commented out. @bparees - please revisit the changes in the PR and let's redo the comments->merge loop. Thanks.
couple questions, mostly looks good.
@@ -13,36 +16,98 @@ import (
	"github.com/openshift/origin/test/extended/util/jenkins"
)

func debugAnyJenkinsFailure(br *exutil.BuildResult, name string, oc *exutil.CLI, dumpMaster bool) {
	if !br.BuildSuccess {
this seems slightly weird. Why not just remove the br argument and only call debugAnyJenkinsFailure when desired (namely, when the build fails)?
The intent was to just code up the if once vs. coding up the if check in all the places I added calls to debugAnyJenkinsFailure. I can switch it up if you like.
I guess it's fine to leave it, but if we end up wanting to reuse this for debugging other failures, we're going to end up refactoring it.
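To make the trade-off concrete, here is a small self-contained sketch of the two shapes being discussed, using stand-in types in place of exutil.BuildResult and exutil.CLI; the bodies are illustrative, not the PR's actual dump logic:

```go
package main

import "fmt"

// Stand-ins for the exutil types referenced in the diff above.
type BuildResult struct{ BuildSuccess bool }
type CLI struct{}

// Current shape: called unconditionally; the helper itself checks the result.
func debugAnyJenkinsFailure(br *BuildResult, name string, oc *CLI, dumpMaster bool) {
	if !br.BuildSuccess {
		fmt.Println("dumping jenkins diagnostics for", name, "dumpMaster:", dumpMaster)
	}
}

// Alternative raised in review: drop the BuildResult argument and have each
// caller invoke the helper only when the build actually failed.
func debugJenkinsFailure(name string, oc *CLI, dumpMaster bool) {
	fmt.Println("dumping jenkins diagnostics for", name, "dumpMaster:", dumpMaster)
}

func main() {
	br := &BuildResult{BuildSuccess: false}
	oc := &CLI{}

	// the check lives inside the helper
	debugAnyJenkinsFailure(br, "sample-pipeline", oc, true)

	// the check lives at every call site
	if !br.BuildSuccess {
		debugJenkinsFailure("sample-pipeline", oc, true)
	}
}
```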
if os.Getenv(jenkins.DisableJenkinsMemoryStats) == "" {
	g.By("start jenkins gc tracking")
	ticker = jenkins.StartJenkinsGCTracking(oc, oc.Namespace())
why does this test only track GC (not memory)?
The memory tracking debug is very verbose. For the purposes of the debug I did the last few days in this pull, I simply needed to prove that the heap was NOT too small and that we were NOT under GC duress.
Hence, I made the choice of which debug you want a bit more granular and selectable.
should this check be based on a different env variable name then?
That is more in line with the whole granular motif I've been espousing. I'll make that change.
update pushed
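A minimal sketch of the granular gating being agreed on here, with each kind of verbose debug opted out of via its own env var; the env var string values and the startTracking helper are stand-ins for the real jenkins.StartJenkinsGCTracking / StartJenkinsMemoryTracking utilities:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Hypothetical env var names; the real constants live in the extended test's
// jenkins util package.
const (
	disableGCStats     = "DISABLE_JENKINS_GC_STATS"
	disableMemoryStats = "DISABLE_JENKINS_MEMORY_STATS"
)

// startTracking stands in for the real tracking helpers, which poll the
// Jenkins pod on a ticker.
func startTracking(kind string) *time.Ticker {
	t := time.NewTicker(10 * time.Second)
	go func() {
		for range t.C {
			fmt.Println("collecting", kind, "stats from the jenkins pod")
		}
	}()
	return t
}

func main() {
	var gcTicker, memTicker *time.Ticker

	// Each kind of verbose debug is opted out of independently.
	if os.Getenv(disableGCStats) == "" {
		gcTicker = startTracking("gc")
	}
	if os.Getenv(disableMemoryStats) == "" {
		memTicker = startTracking("memory")
	}

	// Cleanup mirrors the setup, stopping only what was started.
	defer func() {
		if gcTicker != nil {
			gcTicker.Stop()
		}
		if memTicker != nil {
			memTicker.Stop()
		}
	}()

	time.Sleep(25 * time.Second) // stand-in for the actual test body
}
```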
	}
}()
if os.Getenv(jenkins.DisableJenkinsMemoryStats) == "" {
	ticker = jenkins.StartJenkinsMemoryTracking(oc, jenkinsNamespace)
and this test only tracks memory and not gc?
Correct, and the memory analysis here is really centered more on the native memory aspects of the JVM and not the heap aspects. If I recall correctly, in the original problem @jupierce chased down, the heap itself was not constrained. There was not a GC issue with that one. So yeah, again, I chose to make the debug tools more granular, and have applied only the ones that have so far been deemed necessary for each set of tests.
if this is very verbose as you say above, do we want it on by default?
Yeah, it is a question of confidence that we aren't hitting the native memory issue any more. If it proved to be very intermittent, and we didn't have this on when it happens again, that would be a bummer.
I'll defer to you and/or @jupierce on that.
Fair enough, let's leave it on for now; if we get to a 32-bit JVM and are stable for a while, maybe we can turn it off then.
o.Expect(err).NotTo(o.HaveOccurred())

if os.Getenv(jenkins.DisableJenkinsMemoryStats) == "" {
the assumption is a developer would set this locally when running extended tests? I assume it's not being set in our jenkins extended test runs today?
Correct and correct. And in case it wasn't clear, we already merged in the use of this env var. I simply moved it from a private var in jenkins_plugin.go to a public var in monitor.go, since it is being leveraged in different places now.
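Roughly what that move looks like, as a sketch; the constant's string value is a guess, not the repo's actual value:

```go
// Previously a package-private constant alongside the tests in
// test/extended/builds/jenkins_plugin.go; now exported from
// test/extended/util/jenkins/monitor.go so multiple test files can share it.
package jenkins

// DisableJenkinsMemoryStats names the env var that, when set, skips the
// verbose Jenkins memory tracking during extended test runs.
// The string value below is an assumed placeholder.
const DisableJenkinsMemoryStats = "DISABLE_JENKINS_MEMORY_STATS"
```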
Force-pushed from 711b54d to 70a726d.
}
cleanup = func() {
	if os.Getenv(jenkins.DisableJenkinsGCSTats) == "" {
		g.By("stop jenkins memory tracking")
s/memory/gc/
update pushed
one final nit and lgtm.
Force-pushed from 70a726d to 6582f26.
Force-pushed from 6582f26 to a6b94b4.
[merge]
Evaluated for origin merge up to a6b94b4
[Test]ing while waiting on the merge queue
Evaluated for origin testextended up to a6b94b4
in the test run, ...
@gabemontero please tag flakes anyway, just so we can help identify frequency and get attention on them. In this case it was flake #12667.
Evaluated for origin test up to a6b94b4
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13318/) (Base Commit: adc5ee3)
continuous-integration/openshift-jenkins/testextended FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/1036/) (Base Commit: 59e57b1) (Extended Tests: core(openshift pipeline))
Looks like the extended test hung?
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13327/) (Base Commit: 50300d1) (Image: devenv-rhel7_5783)
Yeah, the blue-green test failed this time. I've seen that fail sometimes, though not as frequently as the orchestration pipeline test we commented out. I suspect it falls under the same sort of flake category as the orchestration tests (a long delay or problem in pods getting started). Perhaps @csrwng's upcoming investigation/rework of the orchestration test will have some carry-over to blue-green. Certainly, if the flakes start coming up more regularly in the overnight runs, temporarily disabling it is a consideration. I do have a theory on the way the test ended. I ended up putting a ... If that theory holds water with others here, I'll open a new pull for reworking the monitoring piece to get invoked from the go threads themselves.
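Purely as a sketch of what "invoked from the go threads themselves" might mean for that monitoring rework (an assumption about the follow-up, not code from this PR): each monitor goroutine would own its ticker and exit via a done channel, so the outer test body never has to tear it down from the outside.

```go
package main

import (
	"fmt"
	"time"
)

// monitorJenkins owns its ticker inside the goroutine and exits when the done
// channel is closed, so nothing outside the goroutine can leave it hanging.
func monitorJenkins(kind string, done <-chan struct{}) {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			fmt.Println("collecting", kind, "stats")
		case <-done:
			return
		}
	}
}

func main() {
	done := make(chan struct{})
	go monitorJenkins("gc", done)
	go monitorJenkins("memory", done)

	time.Sleep(30 * time.Second) // stand-in for the extended test body
	close(done)                  // signal the monitors to exit
}
```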
@bparees PTAL
Of course, this should not merge until the jenkins centos image is updated, but I'm assuming that happens before we finish reviewing this change and the merge queue frees up from the rebase.