
Introduce readiness check and proper wait in init --wait #1629

Merged
merged 3 commits on Aug 6, 2020

Conversation

alenkacz
Contributor

@alenkacz alenkacz commented Aug 4, 2020

What this PR does / why we need it:
This is part of the solution for #1625

Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
@kensipe
Member

kensipe commented Aug 4, 2020

One thing to consider: while liveness probes and readiness probes are stable since v1.0, startup probes are alpha as of v1.16. I don't think this is viable for 1.15 or earlier, and I also believe Konvoy is based on 1.15.
We would have to claim a requirement of Kubernetes 1.16+ for any release with this solution.

@kensipe
Member

kensipe commented Aug 4, 2020

…nt on readiness probe

Signed-off-by: Ken Sipe <kensipe@gmail.com>
@kensipe kensipe changed the title Introduce a TCP startup check TCP ReadinessProbe added. Init --wait Waits on Ready. Aug 4, 2020
Member

@kensipe kensipe left a comment


/lgtm

@kensipe kensipe added this to the 0.16.0 milestone Aug 4, 2020
@@ -32,6 +32,8 @@ type Initializer struct {
deployment *appsv1.StatefulSet
}

const Name = "manager"
Contributor


We started to consolidate all the hard-coded names in types.go. Please move this one there too

Member

@kensipe kensipe Aug 4, 2020


I much prefer it here... packages communicate good meaning in Go and manager.Name is very clear and concise IMO

Member


just reviewed types.go... not a fan of what is there. For instance, ManagerName is not accurate... that is the manager pod name. IMO it would be better to have manager.PodName than kudoinit.DefaultManagerName, which isn't correct. Based on this, perhaps manager.Name would be more accurate as manager.ContainerName, but it seems like a reasonable abbreviation. It seems to me that all other uses of a types.go in the more mature Kubernetes ecosystem are for values used in the API. For kudoinit, that would be types/options used in Artifacter or InstallVerifier, Step, etc. The constants defined there now do not adhere to that and thus seem off to me. Without tribal knowledge I wouldn't know, or think, to put them there. What is the rationale for having them there?

Contributor


It's subjective of course, but I do like kudoinit.DefaultManagerName more than manager.Name. The first clearly indicates that this is for the init, while the second could be anything manager-related. I agree that some naming can be improved, but my main argument is about consistency: we can have these constants one way (all in one place) or another (each in its own package), but please not both.

Member


agree on subjectivity... part of my questioning is the scenario you are bringing up. For the current use case this is init, so kudoinit makes sense... but it is likely that we would use these constants in other ways that aren't init. By defining them in the package for which they define state, they can be leveraged by packages that define behavior (if that is a separate package); this doesn't work well the other way around, IMO.
It would be great to agree on some standards and conventions as well, and to get more aligned on this... having constants "all in one place" isn't a goal that makes sense to me, at least not in isolation.

if err == nil && image == expectedImage {
func verifyKudoDeployment(client corev1.PodsGetter, namespace string) (bool, error) {
ready, err := isKUDOPodReady(client, namespace)
if err == nil && ready {
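The body of isKUDOPodReady is not shown in this diff; the usual condition-based check it would perform can be sketched as follows, using simplified local stand-ins for the corev1.Pod and corev1.PodCondition types rather than the real k8s.io/api/core/v1 ones (names and values here are illustrative):

```go
package main

import "fmt"

// Simplified stand-ins for corev1.PodCondition and corev1.Pod.
type PodCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False", or "Unknown"
}

type Pod struct {
	Name       string
	Conditions []PodCondition
}

// podIsReady reports whether the pod carries a Ready condition with
// status True, which kubelet sets once all readiness probes pass.
func podIsReady(p Pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	notReady := Pod{Name: "kudo-controller-manager-0",
		Conditions: []PodCondition{{Type: "Ready", Status: "False"}}}
	ready := Pod{Name: "kudo-controller-manager-0",
		Conditions: []PodCondition{{Type: "Ready", Status: "True"}}}
	fmt.Println(podIsReady(notReady), podIsReady(ready)) // false true
}
```

The real implementation would list pods via the corev1.PodsGetter shown in the signature above and apply this check to the manager pod.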
Contributor


Not that I object but why was the image comparison removed?

Member


good question... I didn't see any value in it

Member


I think the value was being able to establish whether it's the version of pod we expect. Not useful when the pod is being created initially, but when we are rolling out an update to the manager deployment, we need to guard against declaring victory after just checking that the pre-existing pod is healthy, i.e. the following race scenario:

  • apply statefulset change
  • statefulset controller is lazy, doing nothing yet
  • we check the (old!) pod health, hey! it's healthy, consider the rollout done
  • only now does the statefulset controller replace the pod with the new one...

Member


in agreement with the need for awareness of this scenario...

  1. currently we do NOT apply a change to the statefulset... we delete, then apply
  2. I'm not confident that an image version check is sufficient... for instance, is it possible that a user may want to "upgrade" to the latest service accounts, CRDs, etc., but use a previous version of the KUDO manager? The version is controllable via a flag... Assuming it is the same version as the CLI is likely the 80% use case, perhaps more, but that doesn't capture all possibilities.
  3. it would be better IMO to either wait for deletes to complete, or to capture the current statefulset prior to upgrade and check its revision. I'll create an issue to capture this.
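The revision check suggested in point 3 could look roughly like the following sketch. It uses simplified stand-in types (the real implementation would compare the StatefulSet's status.updateRevision against the pod's controller-revision-hash label via client-go); the function name and revision strings are illustrative:

```go
package main

import "fmt"

// Simplified stand-in for the relevant part of appsv1.StatefulSetStatus.
type StatefulSetStatus struct {
	UpdateRevision string // revision of the pod template being rolled out
}

// Simplified stand-in for a pod's rollout-relevant state.
type Pod struct {
	ControllerRevisionHash string // from the "controller-revision-hash" label
	Ready                  bool
}

// rolloutComplete guards against the race described above: the pod counts
// only if it is both ready AND stamped with the revision we just applied,
// so a healthy pre-existing pod cannot be mistaken for the new one.
func rolloutComplete(status StatefulSetStatus, pod Pod) bool {
	return pod.Ready && pod.ControllerRevisionHash == status.UpdateRevision
}

func main() {
	status := StatefulSetStatus{UpdateRevision: "kudo-manager-7d9f"}
	oldPod := Pod{ControllerRevisionHash: "kudo-manager-5b21", Ready: true}
	newPod := Pod{ControllerRevisionHash: "kudo-manager-7d9f", Ready: true}
	fmt.Println(rolloutComplete(status, oldPod), rolloutComplete(status, newPod)) // false true
}
```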

Member


💯

@@ -586,6 +586,9 @@ spec:
- containerPort: 443
name: webhook-server
protocol: TCP
readinessProbe:
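The probe body is cut off in the diff above; a TCP readiness probe of this shape (the port name matches the containerPort above, but the timing values here are illustrative, not necessarily the ones merged in this PR) looks like:

```yaml
readinessProbe:
  tcpSocket:
    port: webhook-server   # the containerPort named above
  initialDelaySeconds: 5
  periodSeconds: 10
```

A tcpSocket probe only verifies that the port accepts connections, which is why it works on 1.15 where a startupProbe would not.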
Member


Looks like it is readiness probe after all? The PR description needs updating, then.

Contributor Author


yeah, the startup probe is 1.16+

Contributor


Haven't we recently introduced something that requires k8s version >= 1.16 anyway?

Contributor Author


we did but I removed it again so we're 1.15 :)

@alenkacz alenkacz changed the title TCP ReadinessProbe added. Init --wait Waits on Ready. Introduce readiness check and proper wait in init --wait Aug 5, 2020
Member

@ANeumann82 ANeumann82 left a comment


lgtm

Signed-off-by: Ken Sipe <kensipe@gmail.com>