-
Notifications
You must be signed in to change notification settings - Fork 96
Adding canary 'monitor' mode. #241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| # Validate the generated canary test image. | ||
| # Run the canary tests - in monitor (infinite loop) mode. | ||
| .PHONY: canary-monitor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New monitor mode.
| canary: | ||
| @./hack/test-canary.sh | ||
| # Run the canary tests - in single run mode. | ||
| .PHONY: canary-run-once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Original mode renamed.
| @@ -0,0 +1,29 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This script is for the new entrypoint. The easiest solution (rather than mess with GOPATH) is for the entrypoint script to move to the original project root and execute the canary test from there as before.
| -focus "\[Canary\]" \ | ||
| test/e2e \ | ||
| -- --kubeconfig=${KUBECONFIG} --delete-namespace=false \ | ||
| -- --kubeconfig=${KUBECONFIG} --delete-namespace=true \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a better default for the canary tests.
| function clean() { | ||
| kubectl get pods --all-namespaces | grep ccm | awk '{print $1}' | xargs kubectl delete ns | ||
| rm "${TEST_DIR}/${TEST_PREFIX}*" | ||
| echo "ensuring fresh \$START." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In monitor mode we no longer run once and exit - so we need a function to clean up resources between each iteration.
| } | ||
|
|
||
| # Run the tests in loop with the specified wait period. | ||
| function monitor() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new 'monitor' method just sits in an infinite loop and calls the original 'run-once' method on each iteration. It also sleeps for the specified period. This will give a window of opportunity for any sidecars to grab the results from filesystem. They will be deleted on the next iteration of the tests and the current results lost.
| value: /metrics/output.json | ||
| - name: KUBECONFIG_VAR | ||
| value: $(cat ${KUBECONFIG} | openssl enc -base64 -A) | ||
| - name: KUBECONFIG_VAR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation test also shows expected usage of the canary in a pod.
| image: iad.ocir.io/oracle/oci-cloud-controller-manager-ci-e2e:1.0.1 | ||
| command: ["/bin/bash"] | ||
| args: ["-ec", "touch \$METRICS_FILE; while [ -z \$(cat \$METRICS_FILE | grep 'end_time' | cut -d':' -f 1) ]; do sleep 1; done; cat \$METRICS_FILE"] | ||
| args: ["-ec", "while true; do sleep 10; cat \$METRICS_FILE; done"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The canary validation tests now has a dumb reporter that just sits in an infinite loop and cats the contents of the metrics file to std out.
| else | ||
| exit 0 | ||
| fi | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation test itself is a little more involved now. We wait for canary to run CANARY_RUNS times. If we have any failures we consider it FAILED. If we have all successes then PASSED. On completion we destroy the canary pod.
We let the CI destroy long running tests. This should also destroy any running canaries.
| f.cleanupHandle = AddCleanupAction(f.AfterEach) | ||
|
|
||
| if f.Client == nil { | ||
| // Create an OCI client if the cloudConfig has been specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A series of tweaks to allow users to avoid specifying the cloud config if not required. The canary runner might not want to do this.
wercker.yml
Outdated
| - script: | ||
| name: validate canary tests | ||
| code: make validate-canary | ||
| code: make validate-canary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops will get rid of that trailing ws...
| - script: | ||
| name: configure oci canary entrypoint | ||
| code: mkdir -p /oci/scripts/ && cp ./hack/ccm-canary-entrypoint.sh /oci/scripts/ccm-canary-entrypoint.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now mount in the new entry point into the container.
| - mountPath: /metrics | ||
| name: metrics-volume | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops.
kristenjacobs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good to me.
| local num_pass=$(echo "${logs}"| grep '"create_lb": "1"' | uniq | wc -l) | ||
| local num_fail=$(echo "${logs}"| grep '"create_lb": "0"' | uniq | wc -l) | ||
| if [ "${num_fail}" -gt "0" ]; then | ||
| echo "FAILED" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe dump the test logs here for the failure case, i.e. it might make debugging a failure easier?
|
Currently failing E2E tests so can't merge. |
|
Thanks Owain - I should have been monitoring this. The e2e test 'failed to provision a load balancer within 20m'. We only have 3 (100MB) load balancers available in the the test tenancy region (spinnaker / IAD). I had a look and there were some sick ones sitting around in the 'general' compartment which I deleted. I ran the test again with no code changes and it passed second time around - so I am going to put it down to it correctly failing due to having no load balancers available. I dont know if there is anything we can do to alleviate this issue in future? |
hack/ccm-canary-entrypoint.sh
Outdated
| export CANARY_MODE="monitor" | ||
| fi | ||
|
|
||
| # "/go/src/github.com/oracle/oci-cloud-controller-manager" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment can be removed
hack/ccm-canary-entrypoint.sh
Outdated
| fi | ||
|
|
||
| # "/go/src/github.com/oracle/oci-cloud-controller-manager" | ||
| export CCM_DIR="${GOPATH}/src/github.com/oracle/oci-cloud-controller-manager" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could just simply:
pushd "${GOPATH}/src/github.com/oracle/oci-cloud-controller-manager"
owainlewis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments. OK to merge.
… to allow the test pods to commuicate with worker nodes. This fixes that issues by making the nodePort portion of the tests configurable.
Adding canary monitor mode.
The canary team requested some changes to the CCM (and volume-provisioner and flexvolume-driver) canary. The requested changes were:
This commit attempts to provide this functionality. The default behaviour of the ccm-canary is now to run in an infinite loop with a pause of 30s before each invocation of the ccm e2e canary test. The pause period can be configured via the 'MONITOR_PERIOD' environment variable. The 'mode' can be configured via the 'CANARY_MODE' ('monitor'|'run-once') environment variable.
The canary validation test has been altered to test the result from the canary for 'CANARY_RUNS' invocations - configured via an environment variable in the CI or locally for dev.