
Fix endless loops in test.sh #692

Closed

Conversation

ifireball
Contributor

test.sh could go into an endless loop waiting for containers to start
up. This patch:

  • Puts a hard limit on the number of cycles the script can spend in the
    wait loops (see the sketch below)
  • Detects the scenario where the cluster has apparently died, so things
    would never come up.
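
Roughly, the wait loops now take the following shape (an illustrative sketch only; MAX_WAIT_CYCLES, the messages, and the exact structure are placeholders rather than the literal patch):

# Sketch: bounded wait loop that also bails out if the cluster stops answering
MAX_WAIT_CYCLES=60
cycles=0
while true; do
    # If kubectl itself fails, the cluster has probably died; give up right away
    if ! pods=$(kubectl get pods -n kube-system --no-headers); then
        echo "Cluster seems to have died, aborting" >&2
        exit 1
    fi
    # Done once every pod reports Running
    if ! echo "$pods" | grep -qv Running; then
        break
    fi
    cycles=$((cycles + 1))
    if [ "$cycles" -ge "$MAX_WAIT_CYCLES" ]; then
        echo "Timed out waiting for pods to enter the Running state" >&2
        exit 1
    fi
    sleep 10
done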

@rmohr
Member

rmohr commented Feb 1, 2018

can't you just set a hard limit on the job? That is at least what we do on our CI.
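
For example, something along these lines at the job level (just an illustration; the 30-minute limit and the script path are made-up values, not what our CI actually uses):

# Abort the whole run if it exceeds the job's hard time limit (example value)
timeout 30m ./test.sh
rc=$?
if [ "$rc" -eq 124 ]; then
    echo "test.sh hit the 30 minute job limit" >&2
fi
exit "$rc"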

echo "Waiting for kubevirt pods to enter the Running state ..."
kubectl get pods -n kube-system --no-headers | >&2 grep -v Running || true
sleep 10
if ! kctl_out=$(kubectl get pods -n kube-system --no-headers); then
Member

It is good to have limited retries; however, could you do that in a slightly more sophisticated and reusable way? Maybe with something like a backoff-retry, as is done here: https://coderwall.com/p/--eiqg/exponential-backoff-in-bash ?
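
Something in this direction (a sketch of the pattern from that link; the function name and defaults are illustrative):

# Retry a command with exponential backoff, giving up after a fixed number of attempts
with_backoff() {
    local max_attempts=${MAX_ATTEMPTS:-5}
    local delay=${INITIAL_DELAY:-10}
    local attempt=1

    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up on '$*' after $attempt attempts" >&2
            return 1
        fi
        echo "Attempt $attempt failed, retrying in ${delay}s ..." >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# e.g.: with_backoff kubectl get pods -n kube-system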

Contributor Author

I don't think exponential backoff would be useful here; backing off wouldn't effectively free up any usable resources, since the slave running the job would still be held for the whole run.

Anyway, I refactored the code to make it look a little nicer.

@ifireball
Contributor Author

We're going to make the job impose a hard timeout externally too, but it's better to also fix this here so that more accurate output can be shown.

kubectl get pods -n kube-system -o'custom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers | awk '/virt-controller/ && /true/' | wc -l
sleep 10
done
retry_check containers_ready 'KubeVirt virt-controller container' 'virt-controller'
Member

I am not sure this will work. Wouldn't that require all virt-controller instances to be ready? We only need one (and actually only one can be ready at a time).

Contributor Author

Hmm, I want to find a way to do that without having to write another function just to get a slightly different condition.

Can I assume the line showing the ready virt-controller will not contain the string 'false'?
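
If that assumption holds, the existing check could stay almost the same and just look for the absence of 'false' (a sketch of the idea, not the final code):

# Count virt-controller pods whose container ready column never says 'false'
kubectl get pods -n kube-system \
    -o'custom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' \
    --no-headers | awk '/virt-controller/ && !/false/' | wc -l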

@ifireball
Contributor Author

ci test please

@ifireball force-pushed the fix-tester-infinite-loop branch 3 times, most recently from aef3dce to ac4a63b on February 4, 2018 at 09:17
`test.sh` could go into an endless loop waiting for containers to start
up. This patch:
- Puts a hard limit for the amount of cycles the script can spend in the
  wait loops
- Detects the scenario where the cluster apparently died so things would
  never come up.
@ifireball
Contributor Author

@rmohr now the test is finally timing out on starting one of the KubeVirt containers. Is 10m not enough? How long do you think this should take?

The container that is failing to start is:

kube-dns-6f4fd4bdf-p75cb

The system seems to be trying to start 3 copies of it and managing to start only two.

@rmohr
Member

rmohr commented Feb 5, 2018

@ifireball that sounds like a problem with setting up k8s itself. It is very likely a problem in kubeadm, k8s or weave ...

I hope that we can soon start pre-creating k8s clusters and only boot preinstalled clusters in our VMs, to avoid such inconveniences ...

@rmohr
Member

rmohr commented Feb 5, 2018

@cynepco3hahue what @ifireball says regarding kube-dns seems to be related to the issue you told me about on IRC.

@rmohr
Member

rmohr commented Feb 6, 2018

Somewhat depends on the outcome of #710

@eedri

eedri commented Feb 14, 2018

#710 is merged, is this patch still blocked?

@ifireball
Contributor Author

ci test please

1 similar comment
@ifireball
Contributor Author

ci test please

@rmohr
Member

rmohr commented Feb 14, 2018

ok to test

@ifireball
Contributor Author

ci test please

1 similar comment
@eedri

eedri commented Feb 28, 2018

ci test please

@cynepco3hahue

CI still has a lot of trouble with Vagrant machines:

15:47:16 [check-patch.el7.x86_64] An action 'halt' was attempted on the machine 'node0',
15:47:16 [check-patch.el7.x86_64] but another process is already executing an action on the machine.
15:47:16 [check-patch.el7.x86_64] Vagrant locks each machine for access by only one process at a time.
15:47:16 [check-patch.el7.x86_64] Please wait until the other Vagrant process finishes modifying this
15:47:16 [check-patch.el7.x86_64] machine, then try again.

@rmohr
Member

rmohr commented Feb 28, 2018

retest this please

2 similar comments
@fabiand
Member

fabiand commented Mar 13, 2018

retest this please

@eedri

eedri commented Apr 1, 2018

retest this please

@eedri

eedri commented Apr 1, 2018

ci test please

@eedri

eedri commented Apr 30, 2018

Any update?

@eedri

eedri commented Apr 30, 2018

@fabiand should we abandon this patch? It doesn't seem like it's going anywhere.

@fabiand closed this on Jun 11, 2018
kubevirt-bot pushed a commit to kubevirt-bot/kubevirt that referenced this pull request Nov 6, 2020
kubevirt-bot pushed a commit to kubevirt-bot/kubevirt that referenced this pull request Dec 7, 2021
Signed-off-by: Federico Gimenez <fgimenez@redhat.com>