adjust prometheus ext test existence check to query StatefulSets #16984

Merged


gabemontero
Contributor

Fixes #16957

@openshift/devex ptal

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 20, 2017
@gabemontero
Contributor Author

/unassign @stevekuznetsov @dcbw
/assign @bparees @smarterclayton

@bparees
Contributor

bparees commented Oct 20, 2017

Why didn't whatever broke this also break the conformance tests?

@bparees bparees added the kind/bug Categorizes issue or PR as related to a bug. label Oct 20, 2017
@bparees
Contributor

bparees commented Oct 20, 2017

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 20, 2017
@openshift-merge-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, gabemontero


@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 20, 2017
@gabemontero
Contributor Author

prometheus_builds.go does not have the conformance label, only the build label (there was a concern that the build test would add too much to the run time).

@bparees
Contributor

bparees commented Oct 20, 2017

> prometheus_builds.go does not have the conformance label, only the build label (there was a concern about the build test adding to the time spent)

I know, but prometheus.go also depends on this behavior, and it is a conformance test.

I see your explanation in the issue that it would only break when both tests run, but I haven't been able to convince myself that any of our CI jobs would run both tests. (Certainly the extended builds test job is the only one that could possibly run both, and it should not be running the base prometheus conformance test.)

@gabemontero
Contributor Author

I was curious about all that as well, but unless there is a third test I'm unaware of that attempts to instantiate the prometheus template, I can think of no other explanation.

Regardless, the existence test needed to switch from Deployments to StatefulSets, and leaving it unfixed would only muddy the waters if something else is in fact going on.

We'll just have to see what happens after this merges and go from there.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@smarterclayton
Contributor

/retest

@0xmichalis
Contributor

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@gabemontero
Contributor Author

On the conformance_install_update run at https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended_conformance_install_update/8386/console, the extended tests seem to have actually passed:

```
Ran 6 of 806 Specs in 173.172 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 800 Skipped Oct 22 06:27:17.721: INFO: Error running cluster/log-dump/log-dump.sh: fork/exec /data/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/cluster/log-dump/log-dump.sh: no such file or directory
PASS

Ginkgo ran 1 suite in 2m53.806167552s
Test Suite Passed
[INFO] [CLEANUP] Beginning cleanup routines...
[INFO] [CLEANUP] Dumping cluster events to _output/scripts/conformance/artifacts/events.txt
Logged into "https://ip-172-18-9-85.ec2.internal:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

  * default
    kube-public
    kube-system
    logging
    management-infra
    openshift
    openshift-infra
    openshift-node

Using project "default".
[INFO] [CLEANUP] Dumping container logs to _output/scripts/conformance/logs/containers
[INFO] [CLEANUP] Truncating log files over 200M
[INFO] [CLEANUP] Stopping docker containers
[INFO] [CLEANUP] Removing docker containers
Error response from daemon: You cannot remove a running container aed8d1b798dd6974f65bbab6a251c04418547b9176a4164813017be0ffcb403e. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container 415319916f8aa20bcaa4c70fe14cf831c83d46cba3c8d330b1bcad385474a816. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container 732266eba0c0c329f64e2b4d00bbff25e5bd66e11d2a6d2b41019b10ec8c8d41. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container 0bf71f0aebd735577ac7fe42ef4f9c4693e20f0580cb2b7dca7a7181f9f2b954. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container 3f89e3023b71076d77f676ace2606e725a550595d072e9158f50f9524de65630. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container 5f8b64b707a649769b4b55c9a47d1c24b0533d8eae7aeb0b8cc3e14cdc28f45e. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container 8fc784324164ab44fe1b64a73920c163fa705f65ce35d14802f3e280aa694a3e. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container c4314bdfe765882aff91e922801db27ed9b19f2bbda7934e7176b4adbb173134. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container 2745d7d93cce7375ac771abe2cdd2b6cab5f05b224f1381c84483fb13389903d. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container 7d051edc641ed8657698bd211b954975b89bfd4c93a26b43c74124a3d6751566. Stop the container before attempting removal or use -f
Error response from daemon: You cannot remove a running container cec8395a17620c17c121cc653fa53176cc8d60a3e70fc84209bb0f4f841f9835. Stop the container before attempting removal or use -f
[INFO] [CLEANUP] Killing child processes
[INFO] [CLEANUP] Pruning etcd data directory
rm: cannot remove ‘/tmp/etcd/openshift-backup-post-3.0-20171022054520’: Permission denied
rm: cannot remove ‘/tmp/etcd/openshift-backup-pre-upgrade-20171022054456’: Permission denied
rm: cannot remove ‘/tmp/etcd/member’: Permission denied
[INFO] test/extended/conformance.sh exited with code 0 after 00h 38m 47s
+ set +o xtrace
########## FINISHED STAGE: SUCCESS: RUN EXTENDED TESTS [00h 39m 37s] ##########
```

But the download-artifacts phase failed because the Jenkins host ran out of disk space:

```
FATAL: Unable to produce a script file
java.io.IOException: No space left on device
	at java.io.UnixFileSystem.createFileExclusively(Native Method)
	at java.io.File.createTempFile(File.java:2024)
	at hudson.FilePath$17.invoke(FilePath.java:1374)
Caused: java.io.IOException: Failed to create a temporary directory in /tmp
	at hudson.FilePath$17.invoke(FilePath.java:1376)
	at hudson.FilePath$17.invoke(FilePath.java:1364)
	at hudson.FilePath.act(FilePath.java:998)
	at hudson.FilePath.act(FilePath.java:976)
	at hudson.FilePath.createTextTempFile(FilePath.java:1364)
Caused: java.io.IOException: Failed to create a temp file on /var/lib/jenkins/jobs/test_pull_request_origin_extended_conformance_install_update/workspace
	at hudson.FilePath.createTextTempFile(FilePath.java:1387)
	at hudson.tasks.CommandInterpreter.createScriptFile(CommandInterpreter.java:162)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
	at org.jenkinsci.plugins.postbuildscript.PostBuildScript.processBuildSteps(PostBuildScript.java:204)
	at org.jenkinsci.plugins.postbuildscript.PostBuildScript.processScripts(PostBuildScript.java:143)
	at org.jenkinsci.plugins.postbuildscript.PostBuildScript._perform(PostBuildScript.java:105)
	at org.jenkinsci.plugins.postbuildscript.PostBuildScript.perform(PostBuildScript.java:85)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:676)
	at hudson.model.Build$BuildExecution.post2(Build.java:186)
	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:621)
	at hudson.model.Run.execute(Run.java:1760)
	at com.tikal.jenkins.plugins.multijob.MultiJobBuild.run(MultiJobBuild.java:73)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:405)
Build step 'Execute a set of scripts' changed build result to FAILURE
Build step 'Execute a set of scripts' marked build as failure
Archiving artifacts
ERROR: Failed to archive artifacts: .config/origin-ci-tool/logs/junit/*.xml,artifacts/**/*.xml
java.io.IOException: Failed to create a directory at /var/lib/jenkins/jobs/test_pull_request_origin_extended_conformance_install_update/builds/8386/archive/.config/origin-ci-tool/logs/junit
	at hudson.util.IOUtils.mkdirs(IOUtils.java:66)
	at hudson.FilePath.mkdirsE(FilePath.java:2932)
	at hudson.FilePath.access$2000(FilePath.java:197)
	at hudson.FilePath$42$1.visit(FilePath.java:2151)
	at hudson.util.DirScanner.scanSingle(DirScanner.java:49)
	at hudson.FilePath$ExplicitlySpecifiedDirScanner.scan(FilePath.java:2822)
	at hudson.FilePath$42.invoke(FilePath.java:2146)
	at hudson.FilePath$42.invoke(FilePath.java:2139)
	at hudson.FilePath.act(FilePath.java:998)
	at hudson.FilePath.act(FilePath.java:976)
	at hudson.FilePath.copyRecursiveTo(FilePath.java:2139)
	at jenkins.model.StandardArtifactManager.archive(StandardArtifactManager.java:61)
	at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:244)
	at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:81)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:676)
	at hudson.model.Build$BuildExecution.post2(Build.java:186)
	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:621)
	at hudson.model.Run.execute(Run.java:1760)
	at com.tikal.jenkins.plugins.multijob.MultiJobBuild.run(MultiJobBuild.java:73)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:405)
ERROR: Failed to upload files
java.io.IOException: Failed to create a temporary file in /var/lib/jenkins/fingerprints/72/e2
	at hudson.util.AtomicFileWriter.<init>(AtomicFileWriter.java:70)
	at hudson.util.AtomicFileWriter.<init>(AtomicFileWriter.java:57)
	at hudson.model.Fingerprint.save(Fingerprint.java:1256)
	at hudson.model.Fingerprint.save(Fingerprint.java:1245)
	at hudson.model.Fingerprint.<init>(Fingerprint.java:882)
	at hudson.model.FingerprintMap.create(FingerprintMap.java:93)
	at hudson.model.FingerprintMap.create(FingerprintMap.java:47)
	at hudson.util.KeyedDataStorage.get(KeyedDataStorage.java:163)
	at hudson.model.FingerprintMap.get(FingerprintMap.java:82)
	at hudson.model.FingerprintMap.get(FingerprintMap.java:47)
	at hudson.util.KeyedDataStorage.getOrCreate(KeyedDataStorage.java:111)
	at hudson.model.FingerprintMap.getOrCreate(FingerprintMap.java:72)
	at hudson.plugins.s3.FingerprintRecord.addRecord(FingerprintRecord.java:33)
	at hudson.plugins.s3.S3BucketPublisher.fillFingerprints(S3BucketPublisher.java:313)
	at hudson.plugins.s3.S3BucketPublisher.perform(S3BucketPublisher.java:264)
	at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:81)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:676)
```

@gabemontero
Contributor Author

/test extended_conformance_install_update

@0xmichalis
Contributor

cc @stevekuznetsov

@openshift-merge-robot
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot openshift-merge-robot merged commit e73109b into openshift:master Oct 22, 2017
@gabemontero gabemontero deleted the fix-prometheus-ext-tests branch October 22, 2017 22:24
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

9 participants