Fix endless loops in test.sh #692
Can't you just set a hard limit on the job? That is at least what we do on our CI.
automation/test.sh
echo "Waiting for kubevirt pods to enter the Running state ..."
kubectl get pods -n kube-system --no-headers | >&2 grep -v Running || true
sleep 10
if ! kctl_out=$(kubectl get pods -n kube-system --no-headers); then
It is good to have limited retries; however, could you do that in a somewhat more sophisticated and reusable way? Maybe with a backoff-retry like the one here: https://coderwall.com/p/--eiqg/exponential-backoff-in-bash
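A minimal sketch of that suggestion, assuming a hypothetical `retry_with_backoff` helper (the name and argument order are illustrative and not taken from the linked article or from test.sh). It retries a command, doubling the delay after each failure, and gives up after a fixed number of attempts:

```shell
# Hypothetical helper; name and argument order are illustrative only.
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, doubling the delay (in seconds) after each
# failure. After MAX_ATTEMPTS failures it returns COMMAND's last exit status.
retry_with_backoff() {
    local max_attempts="$1" delay="$2" attempt=1 status
    shift 2
    while :; do
        "$@"
        status=$?
        [ "$status" -eq 0 ] && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts: $*" >&2
            return "$status"
        fi
        echo "Attempt $attempt failed (status $status); retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}
```

For example, `retry_with_backoff 6 10 kubectl get pods -n kube-system` would wait 10, 20, 40, 80 and 160 seconds between its six attempts.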
I don't think exponential backoff would be useful here: it won't effectively free up any usable resources, since the run slave would still be held by the job.
Anyway, I refactored the code to make it look a little nicer.
We're going to make the job impose a hard timeout externally too, but it's better to also fix this here so that more accurate output can be shown.
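One way to impose such an external limit from a shell wrapper is GNU coreutils `timeout`; this is an assumption, since the CI job may instead rely on its own job-level timeout setting. In this sketch, `run_with_hard_limit` is a hypothetical name, and the in-script wait loops still provide the more descriptive output:

```shell
# run_with_hard_limit LIMIT COMMAND [ARGS...]
# LIMIT uses timeout(1) syntax, e.g. 30m or 90s. GNU timeout exits with 124
# when it had to kill the command.
run_with_hard_limit() {
    local limit="$1" status
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "Hard limit of $limit exceeded: $*" >&2
    fi
    return "$status"
}

# Example with an illustrative limit:
#   run_with_hard_limit 30m automation/test.sh
```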
Updated from 2c370f4 to af50008.
automation/test.sh
kubectl get pods -n kube-system -o'custom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers | awk '/virt-controller/ && /true/' | wc -l
sleep 10
done
retry_check containers_ready 'KubeVirt virt-controller container' 'virt-controller'
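The `retry_check` call above passes a check function, a description and the check's arguments. A bounded helper with that call shape could look roughly like the sketch below; it is not the patch's actual implementation, `bounded_retry` is a hypothetical name, and the 60-cycle default is an assumption chosen so that 60 × 10 s matches a 10-minute limit:

```shell
# Hypothetical helper modelled on the call shape above:
#   bounded_retry CHECK DESCRIPTION [ARGS...]
# Runs "CHECK ARGS..." up to RETRY_MAX_CYCLES times, sleeping RETRY_SLEEP
# seconds between attempts. Returns 0 once CHECK succeeds, 1 on timeout.
RETRY_MAX_CYCLES="${RETRY_MAX_CYCLES:-60}"  # assumed: 60 x 10 s = 10 minutes
RETRY_SLEEP="${RETRY_SLEEP:-10}"

bounded_retry() {
    local check="$1" description="$2" cycle=1
    shift 2
    while [ "$cycle" -le "$RETRY_MAX_CYCLES" ]; do
        if "$check" "$@"; then
            echo "$description is ready" >&2
            return 0
        fi
        echo "Waiting for $description ($cycle/$RETRY_MAX_CYCLES) ..." >&2
        # No point sleeping after the final attempt.
        if [ "$cycle" -lt "$RETRY_MAX_CYCLES" ]; then
            sleep "$RETRY_SLEEP"
        fi
        cycle=$((cycle + 1))
    done
    echo "Timed out waiting for $description" >&2
    return 1
}
```

Because the condition is passed in as a function name, a slightly different readiness rule only needs a new check function, not a new loop.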
I am not sure this will work. Wouldn't that require all virt-controller instances to be ready? We just need one (and actually only one can be ready at a time).
Hmm, I want to find a way to do that without having to write another function just for a slightly different condition.
Can I assume the line showing the ready virt-controller will not contain the string 'false'?
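Under that assumption, an "at least one ready" check can be written as a single awk filter over the same custom-columns output. This is a sketch: `any_pod_ready` is a hypothetical name, it matches pod names by prefix rather than anywhere in the line, and it also treats `<none>` (no container statuses reported yet) as not ready:

```shell
# Succeeds if at least one pod whose name starts with $1 reports all of its
# containers ready. Reads "status name" lines on stdin, e.g.:
#   true    virt-controller-1234
#   false   virt-controller-5678
# Assumption: a ready pod's status column never contains "false".
any_pod_ready() {
    awk -v name="$1" '
        index($2, name) == 1 && $1 !~ /false/ && $1 != "<none>" { found = 1 }
        END { exit !found }
    '
}

# Possible live usage (not run here):
#   kubectl get pods -n kube-system --no-headers \
#     -o'custom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' \
#     | any_pod_ready virt-controller
```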
Updated from af50008 to 8f28a74.
ci test please
Updated from aef3dce to ac4a63b.
`test.sh` could go into an endless loop waiting for containers to start up. This patch:
- Puts a hard limit on the number of cycles the script can spend in the wait loops
- Detects the scenario where the cluster apparently died, so things would never come up
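Both points can be combined in one loop. The sketch below assumes that "the cluster died" shows up as the pod-listing command itself failing, as in the `if ! kctl_out=$(kubectl get pods ...)` check earlier; `wait_for_pods` and its exit codes are hypothetical, and pods in states such as `Completed` would count as not running here:

```shell
# Hypothetical helper: wait_for_pods MAX_CYCLES SLEEP LIST_CMD [ARGS...]
# LIST_CMD must print one pod per line with STATUS in column 3, as
# `kubectl get pods -n kube-system --no-headers` does.
# Returns 0 when every listed pod is Running, 1 when MAX_CYCLES is exhausted,
# and 2 as soon as LIST_CMD itself fails (the cluster appears to be dead).
wait_for_pods() {
    local max_cycles="$1" sleep_s="$2" cycle=1 out pending
    shift 2
    while [ "$cycle" -le "$max_cycles" ]; do
        if ! out=$("$@"); then
            # Listing failed outright: waiting longer will not help.
            echo "Listing pods failed; the cluster appears to be dead" >&2
            return 2
        fi
        # Non-empty lines whose STATUS column is anything other than Running.
        pending=$(printf '%s\n' "$out" | awk 'NF && $3 != "Running"')
        if [ -n "$out" ] && [ -z "$pending" ]; then
            return 0
        fi
        echo "Pods not running yet ($cycle/$max_cycles):" >&2
        [ -n "$pending" ] && printf '%s\n' "$pending" >&2
        sleep "$sleep_s"
        cycle=$((cycle + 1))
    done
    echo "Gave up after $max_cycles cycles" >&2
    return 1
}
```

A real call might be `wait_for_pods 60 10 kubectl get pods -n kube-system --no-headers`. An empty pod list is treated as "not ready yet" rather than success.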
Updated from ac4a63b to 47c4929.
@rmohr Now the test is finally timing out while starting one of the KubeVirt containers. Is 10m not enough? How long do you think this should take? The container that is failing to start is:
The system seems to be trying to start 3 copies of it and only manages to start two.
@ifireball That sounds like a problem with setting up k8s itself, very likely in kubeadm, k8s or weave... I hope that we can soon start pre-creating k8s clusters and only boot preinstalled clusters in our VMs, to avoid such inconveniences.
@cynepco3hahue What @ifireball says regarding kube-dns seems to be related to the issue you told me about on IRC.
Somewhat depends on the outcome of #710.
#710 is merged; is this patch still blocked?
ci test please
ci test please
ok to test
ci test please
ci test please
CI still has a lot of trouble with Vagrant machines.
retest this please
retest this please
retest this please
ci test please
Any update?
@fabiand Should we abandon this patch? It doesn't seem like it's going anywhere.
Signed-off-by: Federico Gimenez <fgimenez@redhat.com>