Skip to content

Conversation

@templecloud
Copy link
Contributor

@templecloud templecloud commented Sep 6, 2018

The canary team requested some changes to the CCM (and volume-provisioner and flexvolume-driver) canary. The requested changes were:

  1. Provide a mode where the canary test container entrypoint itself runs in an infinite loop and executes the test on each invocation.
  2. The entry point is made available on path: '/oci/scripts/ccm-canary-entrypoint.sh'

This commit attempts to provide this functionality. The default behaviour of the ccm-canary is now to run in an infinite loop with a pause of 30s before each invocation of the ccm e2e canary test. The pause period can be configured via the 'MONITOR_PERIOD' environment variable. The 'mode' can be configured via the 'CANARY_MODE' ('monitor'|'run-once') environment variable.

The canary validation test has been altered to test the result from the canary for 'CANARY_RUNS' invocations - configured via an environment variable in the CI or locally for dev.


# Validate the generated canary test image.
# Run the canary tests - in monitor (infinite loop) mode.
.PHONY: canary-monitor
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New monitor mode.

canary:
@./hack/test-canary.sh
# Run the canary tests - in single run mode.
.PHONY: canary-run-once
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original mode renamed.

@@ -0,0 +1,29 @@
#!/bin/bash
Copy link
Contributor Author

@templecloud templecloud Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script is for the new entrypoint. The easiest solution (rather than mess with GOPATH) is for the entrypoint script to move to the original project root and execute the canary test from there as before.

-focus "\[Canary\]" \
test/e2e \
-- --kubeconfig=${KUBECONFIG} --delete-namespace=false \
-- --kubeconfig=${KUBECONFIG} --delete-namespace=true \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a better default for the canary tests.

function clean() {
kubectl get pods --all-namespaces | grep ccm | awk '{print $1}' | xargs kubectl delete ns
rm "${TEST_DIR}/${TEST_PREFIX}*"
echo "ensuring fresh \$START."
Copy link
Contributor Author

@templecloud templecloud Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In monitor mode we no longer run once and exit - so we need a function to clean up resources between each iteration.

}

# Run the tests in loop with the specified wait period.
function monitor() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new 'monitor' method just sits in an infinite loop and calls the original 'run-once' method on each iteration. It also sleeps for the specified period. This will give a window of opportunity for any sidecars to grab the results from filesystem. They will be deleted on the next iteration of the tests and the current results lost.

value: /metrics/output.json
- name: KUBECONFIG_VAR
value: $(cat ${KUBECONFIG} | openssl enc -base64 -A)
- name: KUBECONFIG_VAR
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation test also shows expected usage of the canary in a pod.

image: iad.ocir.io/oracle/oci-cloud-controller-manager-ci-e2e:1.0.1
command: ["/bin/bash"]
args: ["-ec", "touch \$METRICS_FILE; while [ -z \$(cat \$METRICS_FILE | grep 'end_time' | cut -d':' -f 1) ]; do sleep 1; done; cat \$METRICS_FILE"]
args: ["-ec", "while true; do sleep 10; cat \$METRICS_FILE; done"]
Copy link
Contributor Author

@templecloud templecloud Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The canary validation tests now has a dumb reporter that just sits in an infinite loop and cats the contents of the metrics file to std out.

else
exit 0
fi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation test itself is a little more involved now. We wait for canary to run CANARY_RUNS times. If we have any failures we consider it FAILED. If we have all successes then PASSED. On completion we destroy the canary pod.

We let the CI destroy long running tests. This should also destroy any running canaries.

f.cleanupHandle = AddCleanupAction(f.AfterEach)

if f.Client == nil {
// Create an OCI client if the cloudConfig has been specified.
Copy link
Contributor Author

@templecloud templecloud Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A series of tweaks to allow users to avoid specifying the cloud config if not required. The canary runner might not want to do this.

wercker.yml Outdated
- script:
name: validate canary tests
code: make validate-canary
code: make validate-canary
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops will get rid of that trailing ws...

- script:
name: configure oci canary entrypoint
code: mkdir -p /oci/scripts/ && cp ./hack/ccm-canary-entrypoint.sh /oci/scripts/ccm-canary-entrypoint.sh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now mount in the new entry point into the container.

- mountPath: /metrics
name: metrics-volume
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops.

Copy link
Member

@kristenjacobs kristenjacobs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me.

local num_pass=$(echo "${logs}"| grep '"create_lb": "1"' | uniq | wc -l)
local num_fail=$(echo "${logs}"| grep '"create_lb": "0"' | uniq | wc -l)
if [ "${num_fail}" -gt "0" ]; then
echo "FAILED"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe dump the test logs here for the failure case, i.e. it might make debugging a failure easier?

@owainlewis
Copy link
Member

Currently failing E2E tests so can't merge.

@templecloud
Copy link
Contributor Author

templecloud commented Sep 7, 2018

Thanks Owain - I should have been monitoring this. The e2e test 'failed to provision a load balancer within 20m'. We only have 3 (100MB) load balancers available in the the test tenancy region (spinnaker / IAD). I had a look and there were some sick ones sitting around in the 'general' compartment which I deleted.

I ran the test again with no code changes and it passed second time around - so I am going to put it down to it correctly failing due to having no load balancers available.

I dont know if there is anything we can do to alleviate this issue in future?

@templecloud templecloud changed the title Adding canary 'monitor' node. Adding canary 'monitor' mode. Sep 7, 2018
export CANARY_MODE="monitor"
fi

# "/go/src/github.com/oracle/oci-cloud-controller-manager"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment can be removed

fi

# "/go/src/github.com/oracle/oci-cloud-controller-manager"
export CCM_DIR="${GOPATH}/src/github.com/oracle/oci-cloud-controller-manager"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just simply:

pushd "${GOPATH}/src/github.com/oracle/oci-cloud-controller-manager"

Copy link
Member

@owainlewis owainlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments. OK to merge.

… to allow the test pods to commuicate with worker nodes. This fixes that issues by making the nodePort portion of the tests configurable.
@owainlewis owainlewis removed the request for review from prydie September 14, 2018 08:28
@owainlewis owainlewis merged commit b36103d into master Sep 14, 2018
@owainlewis owainlewis deleted the trjl/infinite-canary branch September 14, 2018 08:32
ayushverma14 pushed a commit to ayushverma14/oci-cloud-controller-manager that referenced this pull request Jul 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants