
Add extended test for pipeline build #11130

Merged
merged 1 commit into openshift:master from wanghaoran1988:test_pipeline on Oct 21, 2016

Conversation

wanghaoran1988
Member

@wanghaoran1988
Member Author

@bparees Please have a look when you can, thanks.

@gabemontero
Contributor

Generally speaking, LGTM. The various checks, etc. look in line with what we do in other tests. Good use of our exutil helpers.

@bparees - I toyed with the idea of suggesting that some sort of examination of the Jenkins job log be done (like we do in our e), but since the deployment endpoint being available is the culmination of activities in the sample pipeline job, its readiness is sufficient indication that things went well, and examining the job log would be redundant.
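For reference, the readiness check in question is the endpoint wait from the test under review (it is quoted again in the review thread below):

g.By("expecting the frontend service get endpoints")
err = oc.KubeFramework().WaitForAnEndpoint("frontend")
o.Expect(err).NotTo(o.HaveOccurred())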

please post the merge comment @bparees at your convenience - thanks.

@bparees
Contributor

bparees commented Sep 28, 2016

[testextended][extended:core(openshift pipeline build)]

@bparees bparees self-assigned this Sep 28, 2016
@bparees
Contributor

bparees commented Sep 28, 2016

@wanghaoran1988 the new test appears to be failing.

@gabemontero
Contributor

gabemontero commented Sep 28, 2016

So I looked at the failed extended test run.
The sample-pipeline build did not complete successfully.
The debug that is dumped shows that the attempt to start the build occurred, but the build timed out after 10 seconds.
At first blush, it would seem to me the test worked "ok": at minimum it uncovered an environmental issue (perhaps 10 seconds is not long enough in our PR testing env), or a problem with the Jenkinsfile strategy.
@bparees any thoughts / corrections?

@gabemontero
Contributor

Some snippets from the console log:

Waiting for build/sample-pipeline-1 to complete
Done waiting for build/sample-pipeline-1: util.BuildResult{BuildPath:"build/sample-pipeline-1", StartBuildStdErr:"", StartBuildStdOut:"build/sample-pipeline-1", StartBuildErr:error(nil), BuildConfigName:"", Build:(*api.Build)(0xc820e1e000), BuildAttempt:true, BuildSuccess:false, BuildFailure:false, BuildTimeout:true, oc:(*util.CLI)(0xc8202723c0)}

and

Sep 28 12:58:41.582: INFO: Error running &{/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/oc [oc logs --namespace=extended-test-jenkins-pipeline-r2be4-x0986 --config=/tmp/openshift/extended-test-jenkins-pipeline-r2be4-x0986-user.kubeconfig -f build/sample-pipeline-1] []   Error from server: Timeout: timed out waiting for build sample-pipeline-1 to start after 10s
 Error from server: Timeout: timed out waiting for build sample-pipeline-1 to start after 10s
 [] <nil> 0xc820f2f7e0 exit status 1 <nil> true [0xc8201500f8 0xc820150178 0xc820150178] [0xc8201500f8 0xc820150178] [0xc820150120 0xc820150160] [0xaf7f70 0xaf80d0] 0xc821025200}:
Error from server: Timeout: timed out waiting for build sample-pipeline-1 to start after 10s
Error during log retrieval: Error retieving logs for util.BuildResult{BuildPath:"build/sample-pipeline-1", StartBuildStdErr:"", StartBuildStdOut:"build/sample-pipeline-1", StartBuildErr:error(nil), BuildConfigName:"", Build:(*api.Build)(0xc820e1e000), BuildAttempt:true, BuildSuccess:false, BuildFailure:false, BuildTimeout:true, oc:(*util.CLI)(0xc8202723c0)}: exit status 1

@bparees
Contributor

bparees commented Sep 28, 2016

we wait an hour for a build to complete.

the 10s timeout you see is from the code that tries to dump the build logs after we've decided the build has failed/timed out. and that's always going to happen, because pipeline builds don't have logs to dump, so it will always time out waiting for those logs. But the real issue is why the build failed/timed out in the first place.
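Roughly, the helper flow being described is (a sketch, not the exact helper code; the field names come from the BuildResult dump above):

br, _ := exutil.StartBuildAndWait(oc, "sample-pipeline") // waits up to the hour for the build to complete
if !br.BuildSuccess {
	// After a failure/timeout the helper tries to stream the build logs.
	// A pipeline build has no builder pod producing logs, so this "oc logs -f"
	// is what hits the server's 10s "waiting for build ... to start" timeout.
	out, err := oc.Run("logs").Args("-f", br.BuildPath).Output()
	fmt.Fprintf(g.GinkgoWriter, "build logs: %s, err: %v\n", out, err)
}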

@gabemontero
Contributor

Ah - gotcha (the 10s piece).

As to the real issue, I suppose it is complicated by the fact that the Jenkins build strategy does not generate build logs in the classic sense.

Any suggestions on what debug mechanism should be added to the extended test exutil bag of tools to capture what is needed (unless the key data is there in this console already and I'm just missing it)?

@bparees
Contributor

bparees commented Sep 28, 2016

@gabemontero yeah we should locate the jenkins pod and dump its logs, at least. that won't get us the job logs, but at least it will tell us if something went wrong inside jenkins itself.
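A minimal sketch of that with the existing CLI wrapper (the DumpDeploymentLogs helper that is eventually used later in this thread wraps something along these lines):

// "dc/jenkins" resolves to the pods of the latest jenkins deployment,
// so we don't need to look up the generated pod name ourselves
out, err := oc.Run("logs").Args("dc/jenkins").Output()
if err == nil {
	fmt.Fprintf(g.GinkgoWriter, "jenkins pod logs:\n%s\n", out)
}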

@wanghaoran1988
Member Author

wanghaoran1988 commented Sep 30, 2016

Build Description:
Name: sample-pipeline-1
Namespace: extended-test-jenkins-pipeline-r2be4-x0986
Created: About an hour ago
Labels: app=jenkins-pipeline-example
buildconfig=sample-pipeline
name=sample-pipeline
openshift.io/build-config.name=sample-pipeline
openshift.io/build.start-policy=Serial
template=application-template-sample-pipeline
Annotations: openshift.io/build-config.name=sample-pipeline
openshift.io/build.number=1

Status: New
Duration: waiting for 1h0m2s
Build Config: sample-pipeline
Build Pod: sample-pipeline-1-build
By the log we can see that the "sample-pipeline-1" build never starts and keeps status "New" for the whole hour; no idea why the build never starts. @gabemontero, could you please help investigate this?

@bparees
Contributor

bparees commented Sep 30, 2016

@wanghaoran1988 start by looking at the jenkins pod logs.

if you walk through the steps from the test manually on a cluster, does it work?

if the build never starts that means the sync plugin in the jenkins pod is not working correctly (or the jenkins pod itself is not running properly at all)
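A quick way to observe that stuck state from the test side (a hedged sketch; the jsonpath query is illustrative):

// a build the sync plugin has picked up should move out of the New phase;
// if this keeps printing New, the plugin never processed the build
phase, err := oc.Run("get").Args("build/sample-pipeline-1", "-o", "jsonpath={.status.phase}").Output()
o.Expect(err).NotTo(o.HaveOccurred())
fmt.Fprintf(g.GinkgoWriter, "build phase: %s\n", phase)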

@wanghaoran1988
Member Author

@bparees, it works when I manually run the steps, and when I ran it again with devenv-rhel7_5101 on AWS, it passed.

@bparees
Contributor

bparees commented Sep 30, 2016

i'm going to rerun it in this PR, but if it fails again we're going to have to dig deeper into the test.

br.AssertSuccess()

g.By("expecting the frontend service get endpoints")
err = oc.KubeFramework().WaitForAnEndpoint("frontend")
Contributor

@wanghaoran1988 so the latest test run passed, which is good, but given the hiccup we saw, I would suggest adding some debug here in case this test catches the issue again after we merge.

Before calling o.Expect(err).NotTo(o.HaveOccurred()), first test if err is not nil and call one of the debug facilities to "locate the jenkins pod and dump its logs" as @bparees noted earlier. The code would look like this:

if err != nil {
   exutil.DumpDeploymentLogs("jenkins", oc)
}

There are means to dump the contents of the jenkins job logs, but they are much more complicated, and based on the details of the hiccup, I'm not even sure the job got started. If need be I can put it in at a later date.

Put in the minimal debug I've outlined (incorporating, of course, any comments @bparees might have) and then we can merge this PR and go from there.

thanks!!

Contributor

oh, still leave the o.Expect(err).NotTo(o.HaveOccurred()) line in after the if block I outlined above.
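Putting the two pieces together, the end result would look like this (sketch):

g.By("expecting the frontend service get endpoints")
err = oc.KubeFramework().WaitForAnEndpoint("frontend")
if err != nil {
	exutil.DumpDeploymentLogs("jenkins", oc) // capture the jenkins pod logs before failing
}
o.Expect(err).NotTo(o.HaveOccurred())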

@wanghaoran1988
Member Author

@gabemontero Updated to dump the jenkins log when the jenkins deployment fails and when the build fails.

@wanghaoran1988
Member Author

wanghaoran1988 commented Oct 2, 2016

The build stays in status "New" again, and this is the error log from the jenkins pod:

INFO: Updated job sample-pipeline from BuildConfig NamespaceName{extended-test-jenkins-pipeline-fuw6c-i0j6q:sample-pipeline} with revision: 766
java.io.IOException: closed
at okhttp3.internal.ws.WebSocketWriter.writeControlFrameSynchronized(WebSocketWriter.java:119)
at okhttp3.internal.ws.WebSocketWriter.writeClose(WebSocketWriter.java:111)
at okhttp3.internal.ws.RealWebSocket.close(RealWebSocket.java:168)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onClose(WatchConnectionManager.java:229)
at okhttp3.internal.ws.RealWebSocket.peerClose(RealWebSocket.java:197)
at okhttp3.internal.ws.RealWebSocket.access$200(RealWebSocket.java:38)
at okhttp3.internal.ws.RealWebSocket$1$2.execute(RealWebSocket.java:84)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
(the same java.io.IOException: closed stack trace repeats twice more)

@gabemontero
Contributor

@wanghaoran1988 thanks for the debug. The stack trace you posted is in the openshift-sync plugin.

@jimmidyson - could you provide some insight here? The context is that we are running the sample-pipeline job from the origin extended tests, and we intermittently see the test fail where the OpenShift-side build hangs in the New state. When debug was added to dump the jenkins pod, the associated jenkins job's output had a series of Java stack traces.

See @wanghaoran1988 's prior comment.

Is this indicative of an environmental-type issue wrt the watches the sync plugin employs? Or is this an error which should be handled and the operation retried, etc.?

thanks.

@jimmidyson
Contributor

Not sure if Jenkins has had the sync plugin upgraded to 0.0.13 yet? Could someone check?

@gabemontero
Contributor

Not yet, but the move to 0.0.13 is in progress. The RPM has been built. I'm waiting on a couple of other items and then will be submitting a pull that should start the image updates.

Does this look like one of the known items fixed in 0.0.13?

@jimmidyson
Contributor

There are some fixes to reconnection that could be this, but I've not seen this exact issue until now.

@gabemontero
Contributor

OK - we'll still wait until the image gets 0.0.13 and see. Though not consistent, it has happened with enough frequency that we should be able to see if it makes a difference. And of course we've raised your awareness.


@gabemontero
Contributor

Oh and with the debug in now we could in theory merge this testcase. Let's see what @bparees says next time he checks email.


@bparees
Contributor

bparees commented Oct 4, 2016

don't merge broken tests. is what i say :)

we have enough flakes.

@gabemontero
Contributor

Roger that :-)


@bparees bparees changed the title Add extened test for pipeline build Add extended test for pipeline build Oct 10, 2016
@gabemontero
Contributor

Went back and looked at the console output.

The slave pod was listed as running.

However, since the name is maven-2ec09ee38ad, it was not caught by our dump of the pod logs on the error.

@wanghaoran1988 - could you add an exutil.DumpDeploymentLogs("maven", oc) call after the exutil.DumpDeploymentLogs("jenkins", oc) call?
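Concretely, the failure path in the test would then read (a sketch based on the snippet under review below):

if !br.BuildSuccess {
	exutil.DumpDeploymentLogs("jenkins", oc)
	exutil.DumpDeploymentLogs("maven", oc) // pick up the maven slave pod logs as well
}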

We'll try a few more runs and see if we catch either of these new flakes.

@bparees - fyi, based on how this goes, I'm leaning toward NOT moving https://github.com/openshift/origin/blob/master/test/extended/jenkins/kubernetes_plugin.go under image_ecosystem when I move https://github.com/openshift/origin/blob/master/test/extended/jenkins/plugin.go

@bparees
Contributor

bparees commented Oct 14, 2016

can we update this pipeline test to not use a slave image? then we don't need to worry about the kubernetes plugin flake issues.

I think we still want a test that does use the slave launcher, but it can be a separate test. @wanghaoran1988 @gabemontero what do you think?

@gabemontero
Contributor

I'm good with that.

And Michal did create a slave launcher test (the kubernetes_plugin.go test I referenced earlier in this PR), though it creates a slave image vs. using one of the predefined ones (re: the issue you assigned me regarding the master-slave example). Like the plugin test, I'm betting it is not getting invoked consistently by any of the existing ci.openshift jobs.


@wanghaoran1988 wanghaoran1988 force-pushed the test_pipeline branch 2 times, most recently from 62c2e1c to 3fab1b0 on October 18, 2016 01:01
@wanghaoran1988
Member Author

@bparees @gabemontero test updated; added a new template with a Jenkinsfile that does not use the maven node.

g.By("starting a pipeline build")
br, _ := exutil.StartBuildAndWait(oc, "sample-pipeline")
if !br.BuildSuccess {
exutil.DumpDeploymentLogs("jenkins", oc)
Contributor

With your switch to ruby, I would add an

exutil.DumpDeploymentLogs("ruby", oc)

call in case we get slave errors there.

Contributor

Never mind - duh, you moved off of slave images per the earlier comment from @bparees.

Member Author

@gabemontero Yes, I removed the slave images

@gabemontero
Contributor

Running extended tests a few times to see if flakes emerge. Also added 1 new review comment.

[testextended][extended:core(openshift pipeline build)]

@gabemontero
Contributor

OK one successful run.

Second run:

[testextended][extended:core(openshift pipeline build)]

@gabemontero
Contributor

Second run successful.

Third:

[testextended][extended:core(openshift pipeline build)]

@gabemontero
Contributor

Third successful run.

Fourth:

[testextended][extended:core(openshift pipeline build)]

@gabemontero
Contributor

Fourth run successful.

Fifth:
[testextended][extended:core(openshift pipeline build)]

@gabemontero
Contributor

@bparees that is 5 successful runs in a row; with the sync plugin update and the move off of slave images, I think this PR is good to go.

Contributor

@bparees bparees left a comment

one spelling nit and please squash the commits and then i'll merge.

err := exutil.WaitForBuilderAccount(oc.KubeREST().ServiceAccounts(oc.Namespace()))
o.Expect(err).NotTo(o.HaveOccurred())
})
g.Context("Manual deploy the jenkins and triger a jenkins pipeline build", func() {
Contributor

triger->trigger

Member Author

Sorry for the typo, updated

@bparees
Contributor

bparees commented Oct 19, 2016

@wanghaoran1988 i still need you to squash your commits.

@wanghaoran1988
Member Author

@bparees squashed

@bparees
Contributor

bparees commented Oct 20, 2016

[merge]

@openshift-bot
Contributor

Evaluated for origin testextended up to 02bc722

@openshift-bot
Contributor

Evaluated for origin merge up to 02bc722

@openshift-bot
Contributor

[Test]ing while waiting on the merge queue

@openshift-bot
Contributor

Evaluated for origin test up to 02bc722

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/630/) (Base Commit: 44fd91b) (Extended Tests: core(openshift pipeline build))

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10287/) (Base Commit: 44fd91b)

@openshift-bot
Contributor

openshift-bot commented Oct 21, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10380/) (Base Commit: c94f61a) (Image: devenv-rhel7_5214)

@openshift-bot openshift-bot merged commit 87f1f55 into openshift:master Oct 21, 2016
@wanghaoran1988 wanghaoran1988 deleted the test_pipeline branch November 17, 2016 02:22