Issues with health #976
Hey @pintonunes, good to know you're still seeing this in 0.7.2. We shipped 0.7.5, but haven't added any health check changes in the meantime. I'll investigate.
@pintonunes and just confirming, the plan doesn't run and your deployment doesn't update, yeah? Or does it just switch the image on the deployment, the deployment starts to do its rollout, and all the while KUDO is reporting COMPLETED instead of IN_PROGRESS?
Plan runs but is marked as completed while the deployment is still updating.
I also put a readiness probe on the deployment that fails for some time, but KUDO shows the plan as completed after a couple of seconds.
Seems racy. OK, I think I have enough to write a regression test around. Thank you!
@pintonunes yeah, this is potentially a duplicate of #664, right?
I am taking a look at what to do with that, more concretely how to leverage e.g. this: https://github.com/kubernetes/kubernetes/blob/75d51896125b1cafd9e995212b02ccda9b8a0aed/staging/src/k8s.io/kubectl/pkg/polymorphichelpers/rollout_status.go#L38
I need to be able to depend on 1.16 to do that meaningfully, so I'll wait for the version bump we have in progress anyway. cc @gerred
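For context, a minimal sketch of what leveraging that helper could look like, assuming a dependency on k8s.io/kubectl 1.16+. The package and function name here are illustrative, not KUDO's actual code; only `polymorphichelpers.StatusViewerFor` and the `StatusViewer.Status` signature come from kubectl:

```go
package health

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/kubectl/pkg/polymorphichelpers"
)

// isDeploymentHealthy delegates readiness to the same status logic that
// `kubectl rollout status` uses for deployments.
func isDeploymentHealthy(obj runtime.Unstructured) (bool, string, error) {
	viewer, err := polymorphichelpers.StatusViewerFor(schema.GroupKind{
		Group: "apps", Kind: "Deployment",
	})
	if err != nil {
		return false, "", err
	}
	// Revision 0 means "check the latest revision"; see the discussion
	// below about whether that is actually race-free.
	msg, done, err := viewer.Status(obj, 0)
	if err != nil {
		return false, "", err
	}
	return done, msg, nil
}
```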
Yes, this is a dup of #664.
Awesome guys! Will test as soon as 0.8.0 comes out.
@pintonunes lucky day - I cut https://github.com/kudobuilder/kudo/releases/tag/v0.8.0-pre.0 a little bit ago. It's still raw and not everything is documented yet, but please give it a shot. I'll keep this issue open until you confirm it works for your needs (and then please close this).
Guys, I'm afraid that this is still not solved. Using KUDO 0.8.0 now and:

```
2019/11/11 21:12:50 InstanceController: Received Reconcile request for instance "p109"
```

If you need more info from my side, just let me know.
@pintonunes Do you have a repo that we can use to reproduce this?
Let me do some tests on my side to be able to provide you something to work on.
I think so. I think we'll need to write whatever that flaky test might end up being, or run it through a debug build a few times. I'll go look through our deployment health check to see what's happening there.
@pintonunes it would be awesome to have some small thing that will help us reproduce. Right now we're using the same code for deployment readiness as kubectl does, so I would hope that's already pretty solid, but of course everything can be buggy :)
Sure.. Let me find some time for that.
I noticed we're passing
@pintonunes it would be great if you could just dump your operator as-is, censoring out pieces that you are not comfortable sharing. This would allow us to attempt reproducing your issue.
Oh yeah, @porridge, I think you might be right there. After checking the code again and the way kubectl uses it (https://github.com/kubernetes/kubectl/blob/7b01e2757cc74b1145976726b05cc1108ad2911d/pkg/cmd/rollout/rollout_status.go), if we don't put the revision there then we might be asserting health on the previous one. I need to check when the revision is actually created, but I assume it's done asynchronously via some controller, so it will most probably be racy. I originally assumed from https://github.com/kubernetes/kubectl/blob/7b01e2757cc74b1145976726b05cc1108ad2911d/pkg/cmd/rollout/rollout_status.go#L116 that the latest revision is fine, since we're interested in the latest :)
Okay, this is where the new revision gets created: https://github.com/kubernetes/kubernetes/blob/6a19261e96b12c83e6c69d15e2dcca8089432838/pkg/controller/deployment/sync.go#L143. It looks like we might need KUDO to know what revision we are rolling to, so potentially at the time of execution we might store the target revision in some annotation?
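A hedged sketch of that annotation idea: the `deployment.kubernetes.io/revision` annotation is the one the deployment controller actually maintains (it is what kubectl's `deploymentutil.Revision` reads), but the helper itself and the "stored target" workflow are hypothetical:

```go
package health

import (
	"strconv"

	appsv1 "k8s.io/api/apps/v1"
)

// RevisionAnnotation is set by the deployment controller to record the
// current rollout revision.
const RevisionAnnotation = "deployment.kubernetes.io/revision"

// reachedTargetRevision compares a target revision that KUDO would have
// stored at plan-execution time against the revision the controller has
// recorded on the deployment so far.
func reachedTargetRevision(d *appsv1.Deployment, target int64) (bool, error) {
	v, ok := d.Annotations[RevisionAnnotation]
	if !ok {
		// The controller has not recorded any revision yet.
		return false, nil
	}
	current, err := strconv.ParseInt(v, 10, 64)
	if err != nil {
		return false, err
	}
	return current >= target, nil
}
```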
Hm, it's not clear to me how to provide this revision either... Maybe we should actually be looking at the generation instead?
@porridge hmm, I did a little bit more digging. Revision seems to be a number very tightly related to the actual rollout: if your change does not require pod restarts, starting new pods, or killing pods, the revision stays the same and so does the generation. I tested this with updating
As for generation, that is actually a number set by the API server, which is good, because that's synchronous. Also, the generation changes only if the spec changes, but a change in the spec does not mean that some pods will be rolled out (you can test this by changing
So with all that, what I am kind of leaning toward is:
WDYT? :)
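As one reading of that proposal, here is a minimal generation-based completeness check, modeled on kubectl's `deploymentutil.DeploymentComplete`. It is a sketch, not KUDO's eventual implementation:

```go
package health

import appsv1 "k8s.io/api/apps/v1"

// deploymentComplete sketches the generation-aware check discussed above.
// The API server bumps metadata.generation synchronously on any spec
// change; the deployment controller echoes it back in
// status.observedGeneration once it has acted on that spec, so comparing
// the two avoids racing against stale readyReplicas from the old rollout.
func deploymentComplete(d *appsv1.Deployment) bool {
	if d.Generation > d.Status.ObservedGeneration {
		return false // the controller has not observed the latest spec yet
	}
	return d.Spec.Replicas != nil &&
		d.Status.UpdatedReplicas == *d.Spec.Replicas &&
		d.Status.ReadyReplicas == *d.Spec.Replicas &&
		d.Status.Replicas == *d.Spec.Replicas
}
```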
@alenkacz, sorry, but I didn't have time to give you a working example where this happens.
At creation of the deployment I've never seen the issue. Anyway, I'm available to test any build you have that might fix the issue, or for a WebEx if you want to see it happening live.
@alenkacz I had misconfigured my email filters and only saw your message today. What you wrote SGTM. However, it seems it's not just us who need this logic, so perhaps there is code somewhere already in k8s that implements this algorithm?
@porridge yeah, I wonder if that exists... I did not find it though, feel free to fish around :)
@pintonunes can you provide KUDO controller logs from the situation when the health check does not work? I spent some time trying to reproduce this and was not able to. I even tried scaling down:
I was testing on master (which is 0.9.0), but that hopefully should not make a difference, as we did not change the behavior as far as I am aware. The
Hi @alenkacz, let me try with 0.9.0 and send you the logs if it fails.
@pintonunes thanks, that would be helpful. Even 0.8.0 logs might be enough, though; I just wonder what the health module was saying when marking your deployment as done.
Okay, I think with #1187 I definitely fixed some issues we had when evaluating health. Do you by any chance have the operator you can share for testing? If not, that PR will probably merge tomorrow, so you could test it if you have time :) @pintonunes
This is fixed on master and will be released in 0.10.0; please reopen if you can still see the issue. If you have an operator to reproduce it, I would love to try that :-)
Sorry Alena, I didn't have the time to test this yet. Holiday season.. you know.. I will reopen with logs and everything if I find some issue.
No pressure, send it over in January. I am happy to reopen if there are still more issues :)
Hi,
I've been facing some issues with KUDO health checks.
From my understanding, KUDO checks a resource's health right after sending the request to Kubernetes. In the case of a deployment, it checks:

```go
if obj.Spec.Replicas != nil && obj.Status.ReadyReplicas == *obj.Spec.Replicas
```

In my case I'm using a KUDO parameter to update an image tag on a deployment.
I did some watches on the Kubernetes API and, on the first changes, the object already has the new image but spec.replicas is still equal to status.readyReplicas, so KUDO marks the plan as COMPLETED.
Does this make any sense to you guys?
I'm guessing this might happen on any deployment change other than changing the number of replicas.
I think you can replicate this very easily with a simple update plan that changes the image tag of a deployment.
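To make the race concrete, here is a small illustrative Go test, assuming only the k8s.io/api types; the test name, package, and hard-coded field values are hypothetical, not KUDO's actual test suite:

```go
package health_test

import (
	"testing"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// TestImageUpdateRace demonstrates the window described above: right after
// an image change, readyReplicas still counts the old ReplicaSet's pods, so
// the naive check reports healthy, while generation vs. observedGeneration
// shows the controller has not even processed the new spec yet.
func TestImageUpdateRace(t *testing.T) {
	replicas := int32(3)
	d := &appsv1.Deployment{}
	d.Spec.Replicas = &replicas
	d.Spec.Template.Spec.Containers = []corev1.Container{
		{Name: "app", Image: "app:v2"}, // image was just patched
	}
	d.Generation = 2                  // API server bumped generation on the patch
	d.Status.ObservedGeneration = 1   // controller has not reconciled yet
	d.Status.ReadyReplicas = replicas // old pods are still ready

	naive := d.Spec.Replicas != nil && d.Status.ReadyReplicas == *d.Spec.Replicas
	if !naive {
		t.Fatal("expected the naive check to (wrongly) report healthy")
	}
	if d.Generation <= d.Status.ObservedGeneration {
		t.Fatal("expected the generation check to report the rollout as in progress")
	}
}
```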
Environment:
- Kubernetes version (`kubectl version`): 1.14.2
- KUDO version (`kubectl kudo version`): 0.7.2