Add extended test for pipeline build #11130
Conversation
@bparees Please take a look when you can, thanks.
Generally speaking, LGTM. The various checks, etc. look in line with what we do in other tests. Good use of our exutil helpers. @bparees - I toyed with the idea of suggesting that some sort of examination of the Jenkins job log be done (like we do in our e), but since the deployment endpoint becoming available is the culmination of activities in the sample pipeline job, its readiness is sufficient indication that things went well, and examining the job log would be redundant. Please post the merge comment @bparees at your convenience - thanks.
[testextended][extended:core(openshift pipeline build)]
@wanghaoran1988 the new test appears to be failing.
So I looked at the failed extended test run.
Some snippets from the console log:
we wait an hour for a build to complete. the 10s timeout you see is from the code that tries to dump the build logs after we've decided the build has failed/timed out. and that's always going to happen because pipeline builds don't have logs to dump, so it will always time out waiting for those logs. But the real issue is why the build failed/timed out in the first place.
Ah - gotcha (the 10s piece). As to the real issue, I suppose it is complicated by the fact that the jenkins build strategy does not generate build logs in the classic sense. Any suggestions on what debug mechanism should be added to the extended test exutil bag of tools to capture what is needed (unless the key data is already in this console and I'm just missing it)?
@gabemontero yeah we should locate the jenkins pod and dump its logs, at least. that won't get us the job logs, but at least it will tell us if something went wrong inside jenkins itself.
** Build Description: Status: New
@wanghaoran1988 start by looking at the jenkins pod logs. if you walk through the steps from the test manually on a cluster, does it work? if the build never starts that means the sync plugin in the jenkins pod is not working correctly (or the jenkins pod itself is not running properly at all)
@bparees, it works when I manually run the steps, and when I ran it again with devenv-rhel7_5101 on AWS, it passed.
i'm going to rerun it in this PR, but if it fails again we're going to have to dig deeper into the test.
br.AssertSuccess()

g.By("expecting the frontend service get endpoints")
err = oc.KubeFramework().WaitForAnEndpoint("frontend")
@wanghaoran1988 so the latest test run passed, which is good, but given the hiccup we saw, I would suggest adding some debug here in case this test catches the issue again after we merge.
Before calling o.Expect(err).NotTo(o.HaveOccurred()), first test whether err is non-nil and call one of the debug facilities to "locate the jenkins pod and dump its logs" as @bparees noted earlier. The code would look like this:
if err != nil {
    exutil.DumpDeploymentLogs("jenkins", oc)
}
There are means to dump the contents of the jenkins job logs, but they are much more complicated, and based on the details of the hiccup, I'm not even sure the job got started. If need be I can put that in at a later date.
Put in the minimal debug I've outlined (incorporating, of course, any comments @bparees might have) and then we can merge this PR and go from there.
thanks!!
oh, still leave the o.Expect(err).NotTo(o.HaveOccurred()) line in after the if block I outlined above.
Force-pushed 45dd1ef to 9ac8ddb
@gabemontero Updated: the jenkins log is now dumped when the jenkins deployment fails or the build fails.
Force-pushed 9ac8ddb to 8c9d536
The build stays in status "New" again, and this is the error log from the jenkins pod: INFO: Updated job sample-pipeline from BuildConfig NamespaceName{extended-test-jenkins-pipeline-fuw6c-i0j6q:sample-pipeline} with revision: 766
@wanghaoran1988 thanks for the debug. The stack trace you posted is in the openshift-sync plugin. @jimmidyson - could you provide some insight here? The context is that we are running the sample-pipeline job from the origin extended tests, and we intermittently see the test fail where the openshift side build hangs in the New state. When debug was added to dump the jenkins pod, the associated jenkins job's output had a series of Java stack traces. See @wanghaoran1988 's prior comment. Is this indicative of an environmental type issue wrt the watches the sync plugin employs? Or is this an error which should be handled, the operation retried, etc.? thanks.
Not sure if Jenkins has had the sync plugin upgraded to 0.0.13 yet? Could someone check?
Not yet, but the move to 0.0.13 is in progress. The RPM has been built. I'm waiting on a couple of other items and then will be submitting a pull that should start the image updates. Does this look like one of the known items fixed with 0.0.13?
There are some fixes to reconnection that could be this, but I've not seen this exact issue until now.
OK - we'll still wait until the image gets 0.0.13 and see.
Oh, and with the debug in now we could in theory merge this test case.
don't merge broken tests, is what i say :) we have enough flakes.
Roger that :-)
Went back and looked at the console output. The slave pod was listed as running. However, since the name is maven-2ec09ee38ad, it was not caught by our dump of the pod logs on the error. @wanghaoran1988 - could you add a We'll try a few more runs and see if we catch either of these new flakes. @bparees - fyi, based on how this goes, I'm leaning toward NOT moving https://github.com/openshift/origin/blob/master/test/extended/jenkins/kubernetes_plugin.go under
can we update this pipeline test to not use a slave image? then we don't need to worry about the kubernetes plugin flake issues. I think we still want a test that does use the slave launcher, but it can be a separate test. @wanghaoran1988 @gabemontero what do you think?
And Michal did create a slave launcher test (the kubernetes_plugin.go test).
Force-pushed 62c2e1c to 3fab1b0
@bparees @gabemontero test updated; added a new template with a Jenkinsfile that does not use the maven node.
g.By("starting a pipeline build")
br, _ := exutil.StartBuildAndWait(oc, "sample-pipeline")
if !br.BuildSuccess {
    exutil.DumpDeploymentLogs("jenkins", oc)
With your switch to ruby, I would add a
exutil.DumpDeploymentLogs(oc, "ruby")
in case we get slave errors there.
nevermind - duh, you moved off of slave images per earlier comment from @bparees
@gabemontero Yes, I removed the slave images
Running extended tests a few times to see if flakes emerge. Also added 1 new review comment. [testextended][extended:core(openshift pipeline build)]
OK one successful run. Second run: [testextended][extended:core(openshift pipeline build)]
Second run successful. Third: [testextended][extended:core(openshift pipeline build)]
Third successful run. Fourth: [testextended][extended:core(openshift pipeline build)]
Fourth run successful. Fifth: |
@bparees that is 5 successful runs in a row; with the sync plugin update and moving off of slave images, I think this PR is good to go
one spelling nit and please squash the commits and then i'll merge.
err := exutil.WaitForBuilderAccount(oc.KubeREST().ServiceAccounts(oc.Namespace()))
o.Expect(err).NotTo(o.HaveOccurred())
})
g.Context("Manual deploy the jenkins and triger a jenkins pipeline build", func() {
triger->trigger
Sorry for the typo, updated
Force-pushed 3fab1b0 to 31e6475
@wanghaoran1988 i still need you to squash your commits.
Force-pushed 31e6475 to 02bc722
@bparees squashed
[merge]
Evaluated for origin testextended up to 02bc722
Evaluated for origin merge up to 02bc722
[Test]ing while waiting on the merge queue
Evaluated for origin test up to 02bc722
continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/630/) (Base Commit: 44fd91b) (Extended Tests: core(openshift pipeline build))
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10287/) (Base Commit: 44fd91b)
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10380/) (Base Commit: c94f61a) (Image: devenv-rhel7_5214)
trello card
Pass Log