
leave oauth on jenkins extended test, use token for http-level access #12440

Merged

Conversation

@gabemontero

@bparees PTAL

of course, this should not merge until the jenkins centos image is updated, but I'm assuming that happens before we finish reviewing this change and the merge queue frees up from the rebase.

@gabemontero

Need the image updated before the extended test will pass

@bparees commented Jan 11, 2017

oooh, nice!
lgtm pending success.

@gabemontero commented Jan 11, 2017 via email

@gabemontero

OK ... the extended tests pass locally for me using the official docker.io jenkins images (vs. running with my test image, which I had to do during development of this pull). Going to assume there was just a timing issue between when the image was available on docker.io and when the test system was able to update its local version of the image. Will trigger another run momentarily.

@gabemontero commented Jan 11, 2017

Hmm ... now getting an error on the vagrant-openshift setup for this pull:

*****Locally Merging Pull Request: https://github.com/openshift/origin/pull/12440
+ test_pull_requests --local_merge_pull_request 12440 --repo origin --config /var/lib/jenkins/.test_pull_requests_origin.json

  Checking if current base repo commit ID matches what we expect
  Deleting comment #271917118
Base repository commit ID 6468143888ef9a8d30cc72b6d3a59be896c8588f doesn't match evaluated commit ID 4330ed72d5baff5578d080771ce2aca2f4c751b6
Build step 'Execute shell' marked build as failure

Did not see any existing flakes ... deleted/reposted the extended test comment in case it somehow was using the earlier commit ID from before I rebased this PR.

@gabemontero

Yep - deleting / reposting the test comment seemed to do the trick.

@gabemontero commented Jan 11, 2017

Ah .... there is a new set of tests in test/extended/builds/pipeline.go (it is the one @csrwng introduced recently). My test spec of "openshift pipeline builds" (which I copied / pasted from @csrwng 's PR) hits pipeline.go. "openshift pipeline plugin" would have triggered the jenkins_plugin.go tests, which are what I was focused on.

In any event, the pipeline.go stuff is still hitting 403's on direct http accesses. Those tests still have ENABLE_OAUTH set to false on the oc new-app <jenkins template> call. I'll push an update for that in a bit.

Ultimately, the extended test focus should be "openshift pipeline" to capture both pipeline.go and jenkins_plugin.go.
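
For illustration, here is a minimal sketch of flipping that template parameter, assuming the jenkins-ephemeral template and an oc binary on the PATH (the extended tests drive this through their own CLI helpers rather than os/exec):

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// ENABLE_OAUTH=true switches the Jenkins image from its fixed
	// admin credentials to the OpenShift OAuth integration; the
	// pipeline.go tests had been passing false here.
	out, err := exec.Command("oc", "new-app", "jenkins-ephemeral",
		"-p", "ENABLE_OAUTH=true").CombinedOutput()
	fmt.Println(string(out))
	if err != nil {
		fmt.Println("new-app failed:", err)
	}
}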

@gabemontero

OK, the test is still in flight, but the PR run is still getting 403's on direct http access, even though both pipeline.go and jenkins_plugin.go are passing for me locally now with the docker.io jenkins image.

Maybe there is some sort of user permission difference when running in the PR tester vs. running locally? Or something is messed up with docker on these test systems and we have stale jenkins images?

In either case, I'm going to have to push some temporary debug up to the PR to better nail down what is going on. I'll report back when I have some findings.

@gabemontero

Oooooh ... this time the jenkins pod dump had something I've never seen before, which torpedoes any OAuth-based access to Jenkins:

com.google.api.client.http.HttpResponseException: 500 Internal Server Error
This request caused apisever to panic. Look in log for details.
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1061)
	at org.openshift.jenkins.plugins.openshiftlogin.OpenShiftOAuth2SecurityRealm.getOpenShiftOAuthProvider(OpenShiftOAuth2SecurityRealm.java:474)
	at org.openshift.jenkins.plugins.openshiftlogin.OpenShiftOAuth2SecurityRealm.populateDefaults(OpenShiftOAuth2SecurityRealm.java:346)
	at org.openshift.jenkins.plugins.openshiftlogin.OpenShiftOAuth2SecurityRealm.<init>(OpenShiftOAuth2SecurityRealm.java:274)
	at org.openshift.jenkins.plugins.openshiftlogin.OpenShiftSetOAuth.setOauth(OpenShiftSetOAuth.java:69)
	at org.openshift.jenkins.plugins.openshiftlogin.OpenShiftSetOAuth.setOauth(OpenShiftSetOAuth.java:46)
	at org.openshift.jenkins.plugins.openshiftlogin.OpenShiftItemListener.onLoaded(OpenShiftItemListener.java:41)
	at jenkins.model.Jenkins.<init>(Jenkins.java:960)
	at hudson.model.Hudson.<init>(Hudson.java:85)
	at hudson.model.Hudson.<init>(Hudson.java:81)
	at hudson.WebAppMain$3.run(WebAppMain.java:231)

@enj - FYI - The above exception happened when we tried to hit the /.well-known/oauth-authorization-server OAuth http endpoint on the master. I'll have to wait for the extended test run to complete and see what appears in the master log. Given I git-rebased this PR against the newly rebased k8s/origin level on master, I wonder if we've hit a newly introduced issue with the just-finished rebase.
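
For reference, a minimal sketch of what that endpoint lookup amounts to, assuming a dev master at 127.0.0.1:8443 with self-signed certs (the plugin itself goes through the google-http-client, not Go):

package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Self-signed certs on a dev master, hence InsecureSkipVerify.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	resp, err := client.Get("https://127.0.0.1:8443/.well-known/oauth-authorization-server")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	// A 500 here is what surfaced as the HttpResponseException above.
	fmt.Println("status:", resp.Status)

	var provider struct {
		Issuer                string `json:"issuer"`
		AuthorizationEndpoint string `json:"authorization_endpoint"`
		TokenEndpoint         string `json:"token_endpoint"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&provider); err == nil {
		fmt.Printf("%+v\n", provider)
	}
}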

@gabemontero

Sure enough, there is a panic noted in the master log (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/973/s3/download/test-extended/core/logs/openshift.log) wrt the GET on /.well-known/oauth-authorization-server. I'll open an origin issue.

It looks like:

E0111 14:05:33.258571   12471 panics.go:37] APIServer panic'd on GET /.well-known/oauth-authorization-server: runtime error: invalid memory address or nil pointer dereference
goroutine 115106 [running]:
runtime/debug.Stack(0x8d3d0e0, 0xc42e9b5f80, 0x42897e6)
	/usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters.WithPanicRecovery.func1.1(0x3a88d80, 0xc4200100d0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters/panics.go:37 +0x74
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime.HandleCrash(0xc42dc09ec8, 0x1, 0x1)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:52 +0xe5
panic(0x3a88d80, 0xc4200100d0)
	/usr/local/go/src/runtime/panic.go:458 +0x243
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).dispatch.func2(0xc422be2fc0, 0xc42dc08e28)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:206 +0x62
panic(0x3a88d80, 0xc4200100d0)
	/usr/local/go/src/runtime/panic.go:458 +0x243
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).dispatch.func4(0xc422be2fc0, 0xc42dc08da8, 0xc42fbbc098, 0xc43112d180, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:241 +0x74
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).dispatch(0xc422be2fc0, 0x8d3ca20, 0xc42fbbc090, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:242 +0x170
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).(github.com/openshift/origin/vendor/github.com/emicklei/go-restful.dispatch)-fm(0x8d3ca20, 0xc42fbbc090, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:120 +0x48
net/http.HandlerFunc.ServeHTTP(0xc4229111f0, 0x8d3ca20, 0xc42fbbc090, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
net/http.(*ServeMux).ServeHTTP(0xc4210b3f50, 0x8d3ca20, 0xc42fbbc090, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:2022 +0x7f
github.com/openshift/origin/pkg/cmd/server/origin.(*MasterConfig).authorizationFilter.func1(0x8d3ca20, 0xc42fbbc090, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/handlers.go:103 +0x171
net/http.HandlerFunc.ServeHTTP(0xc420ef8f60, 0x8d3ca20, 0xc42fbbc090, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/pkg/cmd/server/origin.(*MasterConfig).impersonationFilter.func1(0x8d3ca20, 0xc42fbbc090, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/handlers.go:305 +0x2413
net/http.HandlerFunc.ServeHTTP(0xc420ef8f80, 0x8d3ca20, 0xc42fbbc090, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/apiserver/filters.WithAudit.func1(0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/apiserver/filters/audit.go:124 +0xa04
net/http.HandlerFunc.ServeHTTP(0xc4210aee80, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/pkg/cmd/server/origin.authenticationHandlerFilter.func1(0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/auth.go:786 +0x2ba
net/http.HandlerFunc.ServeHTTP(0xc4210aeec0, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/pkg/cmd/server/origin.namespacingFilter.func1(0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/handlers.go:183 +0xd2
net/http.HandlerFunc.ServeHTTP(0xc420c5fb00, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/pkg/cmd/server/origin.cacheControlFilter.func1(0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/handlers.go:151 +0xc2
net/http.HandlerFunc.ServeHTTP(0xc420c5fce0, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/vendor/github.com/gorilla/context.ClearHandler.func1(0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/gorilla/context/context.go:141 +0x8b
net/http.HandlerFunc.ServeHTTP(0xc42136f660, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
net/http.(*ServeMux).ServeHTTP(0xc42010fd70, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:2022 +0x7f
net/http.(*ServeMux).ServeHTTP(0xc421e02ed0, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:2022 +0x7f
github.com/openshift/origin/pkg/cmd/server/origin.WithPatternsHandler.func1(0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/master.go:945 +0xcd
net/http.HandlerFunc.ServeHTTP(0xc421deb000, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/pkg/cmd/server/origin.WithAssetServerRedirect.func1(0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/handlers.go:297 +0x7f
net/http.HandlerFunc.ServeHTTP(0xc421e03770, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters.WithCORS.func1(0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters/cors.go:77 +0x1a2
net/http.HandlerFunc.ServeHTTP(0xc4207b9f80, 0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters.WithPanicRecovery.func1(0x8d3d0e0, 0xc42e9b5f80, 0xc436b0cff0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters/panics.go:75 +0x24a
net/http.HandlerFunc.ServeHTTP(0xc421e038c0, 0x7f9cd818b180, 0xc4326ef7d0, 0xc436b0cff0)
	/usr/local/go/src/net/http/server.go:1726 +0x44
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters.(*timeoutHandler).ServeHTTP.func1(0xc421e6f840, 0x8d461a0, 0xc4326ef7d0, 0xc436b0cff0, 0xc426baa240)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters/timeout.go:78 +0x8d
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters.(*timeoutHandler).ServeHTTP
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters/timeout.go:80 +0x1db


@liggitt commented Jan 11, 2017

@sttts for apiserver rewiring

@enj commented Jan 11, 2017

Slightly easier to read stack trace:

debug      stack.go:24      Stack(#3, #6, 0x42897e6)
filters    panics.go:37     WithPanicRecovery.func1.1(Handler(#1))
runtime    runtime.go:52    HandleCrash(func(0xc42dc09ec8), func(0x1))
           panic.go:458     panic(#1, #4)
go-restful container.go:206 (*Container).dispatch.func2(*Container(#5), ResponseWriter(0xc42dc08e28))
           panic.go:458     panic(#1, #4)
go-restful container.go:241 (*Container).dispatch.func4(*Container(#5), ResponseWriter(0xc42dc08da8), *Request(0xc43112d180), #9)
go-restful container.go:242 (*Container).dispatch(*Container(#5), ResponseWriter(#2), *Request(#9))
go-restful container.go:120 dispatch)-fm(*Container(#2), *WebService(#7), *ServeMux(#9))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc4229111f0, #2, #7, #9)
http       server.go:2022   (*ServeMux).ServeHTTP(0xc4210b3f50, #2, #7, #9)
origin     handlers.go:103  (*MasterConfig).authorizationFilter.func1(*MasterConfig(#2), Handler(#7))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc420ef8f60, #2, #7, #9)
origin     handlers.go:305  (*MasterConfig).impersonationFilter.func1(*MasterConfig(#2), Handler(#7))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc420ef8f80, #2, #7, #9)
filters    audit.go:124     WithAudit.func1(Handler(#3), RequestAttributeGetter(#9))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc4210aee80, #3, #6, #9)
origin     auth.go:786      authenticationHandlerFilter.func1(Handler(#3), Request(#9))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc4210aeec0, #3, #6, #9)
origin     handlers.go:183  namespacingFilter.func1(Handler(#3), RequestContextMapper(#9))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc420c5fb00, #3, #6, #9)
origin     handlers.go:151  cacheControlFilter.func1(Handler(#3), string(#9, len=0))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc420c5fce0, #3, #6, #9)
context    context.go:141   ClearHandler.func1(Handler(#3), #9)
http       server.go:1726   HandlerFunc.ServeHTTP(0xc42136f660, #3, #6, #9)
http       server.go:2022   (*ServeMux).ServeHTTP(0xc42010fd70, #3, #6, #9)
http       server.go:2022   (*ServeMux).ServeHTTP(0xc421e02ed0, #3, #6, #9)
origin     master.go:945    WithPatternsHandler.func1(Handler(#3), Handler(#9))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc421deb000, #3, #6, #9)
origin     handlers.go:297  WithAssetServerRedirect.func1(Handler(#3), string(#9, len=0))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc421e03770, #3, #6, #9)
filters    cors.go:77       WithCORS.func1(Handler(#3), []string(#9 len=0 cap=0))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc4207b9f80, #3, #6, #9)
filters    panics.go:75     WithPanicRecovery.func1(Handler(#3), RequestContextMapper(#9))
http       server.go:1726   HandlerFunc.ServeHTTP(0xc421e038c0, 0x7f9cd818b180, #8, #9)
filters    timeout.go:78    (*timeoutHandler).ServeHTTP.func1(*timeoutHandler(0xc421e6f840), ResponseWriter(0x8d461a0), *Request(#9), 0xc426baa240)

@gabemontero

With #12453, the use of OAuth and direct HTTP access via token has already worked successfully multiple times with the currently running extended test.

Will report back with any relevant analysis once the test completes.
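
For illustration, a minimal sketch of the token-based HTTP access in question: put a bearer token on the request rather than relying on an interactive OAuth login. The route host and the JENKINS_TOKEN env var are stand-ins:

package main

import (
	"crypto/tls"
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
	"strings"
)

func main() {
	// e.g. the jenkins service account's token
	token := strings.TrimSpace(os.Getenv("JENKINS_TOKEN"))
	req, _ := http.NewRequest("GET", "https://jenkins-myproject.example.com/api/json", nil)
	req.Header.Set("Authorization", "Bearer "+token)

	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // dev certs
	}}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := ioutil.ReadAll(resp.Body)
	// With OAuth left on and a valid token, this should be 200 rather than 403.
	fmt.Println(resp.Status, len(body), "bytes")
}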

@gabemontero

k8s plugin does not seem happy ...

@bparees commented Jan 15, 2017 via email

@gabemontero

@bparees - That thread could be part of it, but there are some more fundamental issues going on (which are totally independent of the token-based http access this PR is focused on).

Several of the tests are encountering permissions issues with the jenkins service account. I've seen elements of this both with the openshift-restclient that our jenkins-plugin uses, as well as with the k8s plugin (which uses the fabric8 client under the covers).

An example from our jenkins-plugin and the openshift-restclient:

com.openshift.restclient.authorization.ResourceForbiddenException: User "system:serviceaccount:extended-test-jenkins-plugin-nr9lo-dtkq8-jenkins:jenkins" cannot "get" on "/swaggerapi/oapi/v1" User "system:serviceaccount:extended-test-jenkins-plugin-nr9lo-dtkq8-jenkins:jenkins" cannot "get" on "/swaggerapi/oapi/v1"

An example from the k8s plugin and fabric8:

SEVERE: Failed to load initial Builds: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default/oapi/v1/namespaces/extended-test-jenkins-plugin-nr9lo-ysea9-jenkins/builds?fieldSelector=status%3DNew. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked..
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default/oapi/v1/namespaces/extended-test-jenkins-plugin-nr9lo-ysea9-jenkins/builds?fieldSelector=status%3DNew. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked..

I'm seeing the similar errors in our overnight test jobs as well ... search for ResourceForbidden in https://ci.openshift.redhat.com/jenkins/job/origin_extended_image_tests/818/consoleFull for example.

Something underneath us has broken recently. I don't know yet if it is extended-test specific or more general. I'll try mimicking some of these test cases manually in a local jenkins env I set up and see what transpires.

All this said, I think turning OAuth on for the extended tests proves out, and technically speaking this PR could be merged. But I'm fine if you want to wait as well.
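
For context, the class of remedy these forbidden errors point at is a role grant to the jenkins service account in the test namespace; a hypothetical sketch only (not necessarily what the eventual fix does), assuming oc is logged in to the affected project:

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// -z targets a service account in the current project; "edit" is just an
	// example role broad enough to cover the GETs failing above.
	out, err := exec.Command("oc", "policy", "add-role-to-user", "edit",
		"-z", "jenkins").CombinedOutput()
	fmt.Println(string(out))
	if err != nil {
		fmt.Println("grant failed:", err)
	}
}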

@bparees commented Jan 15, 2017

yeah i'd like to hold this out until we get some stability back into the existing codebase.

@gabemontero

At least the permission issues in jenkins_plugin.go are getting fixed once #12508 merges

@gabemontero

We minimally need openshift/jenkins#231 and https://ci.openshift.redhat.com/jenkins/view/Image%20Verification/job/push_images_s2i/6747/ to complete to see about the k8s plugin tests passing again.

@gabemontero

OK we are getting close. The only failure with this PR's ext test run was #12479

Note, the orchestration pipeline ext test passed locally for me just now, but the blue-green test failed.

New regressions beneath us notwithstanding, it is time to focus on pipeline.go, and add some full dumps of the jenkins master and slave pods as needed.

@gabemontero

Both pipeline.go and jenkins_plugin.go passed overnight, and are passing for me locally this morning. I'm going to push an update momentarily to add some more debug to pipeline.go when failures occur, and kick off another extended test run.

@gabemontero

So in the run this time, I got an intermittent hiccup with the Orchestration test case from pipeline.go.

I'm circling through the added debug to see if I can discern anything. @csrwng - would you have any cycles to help expedite diagnosis of these failures? They are certainly intermittent: the test passed in the overnight build and for me locally today, but failed locally for me last night, in recent overnight runs (per @PI-Victor, see https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/985/consoleFull#123465095256cbb9a5e4b02b88ae8c2f77), and in this PR so far.

@csrwng commented Jan 17, 2017

@gabemontero I can take a look at it later today/early tomorrow morning

@gabemontero force-pushed the turnOnOAuthJenkinsExtTest branch 3 times, most recently from c236737 to 60d2c20 on January 24, 2017 23:25
@gabemontero

OK, in this last run, only the orchestration pipeline failed. This time for debug I:

  • dumped any failed maven containers .... there were none
  • added a polling memory dump that ran jstat -gcutil to get a sense of GC activity (see the sketch after this list) ... there was no heavy GC activity
  • hit a similar timeout-looking situation with the mapsapp / mlbparks relationship as I saw last night. The mlbparks deployment of mlbparks-1 is still running after 10 minutes. Perhaps we just didn't wait long enough, but that seems like a crazy long wait.
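
A rough sketch of the kind of GC polling described in the second bullet, assuming oc on the PATH; the pod name is hypothetical and the in-tree helper may differ in the details:

package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	const pid = "1" // Jenkins typically runs as PID 1 inside the container
	for i := 0; i < 5; i++ {
		// jstat -gcutil prints heap-occupancy percentages plus GC
		// counts/times for each sample.
		out, err := exec.Command("oc", "exec", "jenkins-1-abcde", "--",
			"jstat", "-gcutil", pid).CombinedOutput()
		if err != nil {
			fmt.Println("jstat failed:", err)
			return
		}
		fmt.Print(string(out))
		time.Sleep(10 * time.Second)
	}
}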

In any event, per the discussion between @csrwng, @bparees, and myself after scrum, I'm just going to comment out the orchestration pipeline for now and get enough consistency in the test run for this PR to get it merged.

Per our post-scrum discussion, let's use #12479 for @csrwng to spend some time curating this test, sort out these prolonged timing windows, and then re-integrate.

I'll trim the debug a bit, comment out the test, and if we get some consistent success, re-ask for the merge.

@gabemontero force-pushed the turnOnOAuthJenkinsExtTest branch 3 times, most recently from db626aa to 711b54d on January 25, 2017 18:09
@gabemontero

OK, the extended tests have passed multiple times in a row now with the orchestration pipeline test commented out.

@bparees - please revisit the changes in the PR and let's redo the comments->merge loop.

thanks

@bparees left a comment

couple questions, mostly looks good.

@@ -13,36 +16,98 @@ import (
"github.com/openshift/origin/test/extended/util/jenkins"
)

func debugAnyJenkinsFailure(br *exutil.BuildResult, name string, oc *exutil.CLI, dumpMaster bool) {
if !br.BuildSuccess {
@bparees:

this seems slightly weird. Why not just remove the br argument and only call debugAnyJenkinsFailure when desired (namely when the build fails)?

@gabemontero:

The intent was to just code up the if once vs. coding up the if check in all the places I added calls to debugAnyJenkinsFailure. I can switch it up if you like.

@bparees:

I guess it's fine to leave it, but if we end up wanting to reuse this for debugging other failures, we're going to end up refactoring it.


if os.Getenv(jenkins.DisableJenkinsMemoryStats) == "" {
g.By("start jenkins gc tracking")
ticker = jenkins.StartJenkinsGCTracking(oc, oc.Namespace())
@bparees:

why does this test only track GC (not memory)?

@gabemontero:

The memory tracking debug is very verbose. For the purposes of the debug I did the last few days in this pull, I simply needed to prove that the heap was NOT too small and that we were NOT under GC duress.

@gabemontero:

Hence, I made the choice of which debug you want a bit more granular and selectable.

@bparees:

should this check be based on a different env variable name then?

@gabemontero:

That is more in line with the whole granular motif I've been espousing. I'll make that change.

@gabemontero:

update pushed

}
}()
if os.Getenv(jenkins.DisableJenkinsMemoryStats) == "" {
ticker = jenkins.StartJenkinsMemoryTracking(oc, jenkinsNamespace)
@bparees:

and this test only tracks memory and not gc?

@gabemontero:

Correct, and the memory analysis here is really centered more on the native memory aspects of the JVM than the heap. If I recall correctly, in the original problem @jupierce chased down, the heap itself was not constrained; there was not a GC issue with that one. So yeah, again, I chose to make the debug tools more granular, and have applied only the ones that have so far been deemed necessary for each set of tests.

@bparees:

if this is very verbose as you say above, do we want it on by default?

@gabemontero:

Yeah, it is a question of confidence that we aren't hitting the native memory issue any more. If it proved to be very intermittent, and we didn't have this on when it happens again, that would be a bummer.

I'll defer to you and/or @jupierce on that.

@bparees:

fair enough, let's leave it on for now, if we get to 32bit JVM and are stable for a while, maybe we can turn it off then.


o.Expect(err).NotTo(o.HaveOccurred())

if os.Getenv(jenkins.DisableJenkinsMemoryStats) == "" {
@bparees:

the assumption is a developer would set this locally when running extended tests? I assume it's not being set in our jenkins extended test runs today?

@gabemontero:

Correct and Correct. And in case it wasn't clear, we already merged in the use of this env var. I simply moved it from a private var in jenkins_plugin.go to a public var in monitor.go since it is being leveraged in different places now.

}
cleanup = func() {
if os.Getenv(jenkins.DisableJenkinsGCSTats) == "" {
g.By("stop jenkins memory tracking")
@bparees:

s/memory/gc/

@gabemontero:

update pushed

@bparees left a comment

one final nit and lgtm.

@bparees commented Jan 25, 2017

[merge]

@openshift-bot

Evaluated for origin merge up to a6b94b4

@openshift-bot

[Test]ing while waiting on the merge queue

@openshift-bot

Evaluated for origin testextended up to a6b94b4

@gabemontero

in the test run, test/cmd/status.sh failed; certainly unrelated to this extended test change

@bparees commented Jan 25, 2017

@gabemontero please tag flakes anyway just so we can help identify frequency and get attention on them.

in this case it was flake #12667
[test]

@openshift-bot

Evaluated for origin test up to a6b94b4

@openshift-bot

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13318/) (Base Commit: adc5ee3)

@openshift-bot

continuous-integration/openshift-jenkins/testextended FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/1036/) (Base Commit: 59e57b1) (Extended Tests: core(openshift pipeline))

@bparees commented Jan 26, 2017

looks like the extended test got hung?

@openshift-bot commented Jan 26, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13327/) (Base Commit: 50300d1) (Image: devenv-rhel7_5783)

@openshift-bot openshift-bot merged commit fa592b8 into openshift:master Jan 26, 2017
@gabemontero gabemontero deleted the turnOnOAuthJenkinsExtTest branch January 26, 2017 15:18
@gabemontero

Yeah, the blue-green test failed this time. I've seen that fail sometimes, though not as frequently as the orchestration pipeline test we commented out. I suspect it falls under the same sort of flake category as the orchestration tests (a long delay or problem in pods getting started). Perhaps @csrwng 's upcoming investigation / rework of the orchestration test will have some carry-over to blue-green. Certainly if the flakes start coming up more regularly in the overnight runs, temporarily disabling it is a consideration.

I do have a theory on the way the test ended. I ended up putting a g.GinkgoRecover() call in the go thread in that test, because of a go panic warning that arose if an assert happened on that thread. I suspect that call is interfering with the defer cleanup() I have on the main thread.

If that theory holds water with others here, I'll open a new pull for reworking the monitoring piece to get invoked from the go threads themselves.
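
For illustration, a minimal sketch of the pattern in question, using ginkgo/gomega as the extended tests do (the spec body itself is invented):

package pipeline_test

import (
	"time"

	g "github.com/onsi/ginkgo"
	o "github.com/onsi/gomega"
)

var _ = g.Describe("sketch: asserting from a spawned goroutine", func() {
	g.It("monitors in the background", func() {
		cleanup := func() { /* stop tickers, dump pods, etc. */ }
		defer cleanup()

		done := make(chan struct{})
		go func() {
			// Without GinkgoRecover, a failed assertion on this goroutine
			// panics the whole process instead of failing the spec; the
			// theory above is that this recover interacts badly with the
			// main goroutine's deferred cleanup.
			defer g.GinkgoRecover()
			defer close(done)
			o.Expect(true).To(o.BeTrue())
		}()

		select {
		case <-done:
		case <-time.After(30 * time.Second):
		}
	})
})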
