-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce readiness check and proper wait in init --wait
#1629
Conversation
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
cfb620f
to
e811dad
Compare
One thing to consider... is while liveness probes and readiness probes are v1.0 stable... startup probes are |
…nt on readiness probe Signed-off-by: Ken Sipe <kensipe@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@@ -32,6 +32,8 @@ type Initializer struct { | |||
deployment *appsv1.StatefulSet | |||
} | |||
|
|||
const Name = "manager" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We started to consolidate all the hard-coded names in types.go. Please move this one there too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I much prefer it here... packages communicate good meaning in Go and manager.Name
is very clear and concise IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just reviewed types.go... not a fan of what is there. for instance ManagerName is not true... that is the manager pod name. IMO it would be better to have manager.PodName
than kudoinit.DefaultManagerName
which isn't correct... Based on this assertion... perhaps manager.Name
could be more accurate as manager.ContainerName
but it seems like a reasonable abbrev. It seems to we that all other uses of types.go
in the maturer kube eco-system is for values used in the API. For kudoinit, that would be types / options used in Artifacter or InstallVerifier, Step, etc. The constants defined here now, do not adhere to that and thus seem off to me. Without tribal nugging I wouldn't know or to think to put them there. What is the position in having them there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's subjective of course but I do like kudoinit.DefaultManagerName
more than manager.Name
. First clearly indicates that this is for the init while the second could be anything manager related. I agree that some naming can be improved but my main argument is about consistency: we can have these constants one way (all in one place) or another (each in its own package) but please not both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree on subjectivity... part of my questioning is the scenario you are bringing up... for the current use case this is init so kudoinit
makes sense... but it is likely that we would use these constants in other ways which isn't init... By defining them in the package for which they define state they can be leverage by packages that define behavior (if it is a separate package). this doesn't work well the other way IMO.
It would be great to meet around some standards and conventions as well. It would be good to get more aligned on this... having constants "all in one place" isn't a goal that makes sense to me... at least not in isolation.
if err == nil && image == expectedImage { | ||
func verifyKudoDeployment(client corev1.PodsGetter, namespace string) (bool, error) { | ||
ready, err := isKUDOPodReady(client, namespace) | ||
if err == nil && ready { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I object but why was the image comparison removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question... I didn't see any value in it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the value was being able to establish whether it's the version of pod we expect. Not useful when the pod is being created initially, but when we are rolling out an update to the manager deployment, we need to guard against declaring victory after just checking that the pre-existing pod is healthy, i.e. the following race scenario:
- apply statefulset change
- statefulset controller is lazy, doing nothing yet
- we check the (old!) pod health, hey! it's healthy, consider the rollout done
- only now does the statefulset controller replace the pod with the new one...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in agreement with the need for awareness of this scenario...
- currently we do NOT apply a change to statefulset... we delete, then apply
- I'm not confident that image version change is sufficient... for instance, is it possible that a user may want to "upgrade" to the latest service-accounts, crds, etc, but use a previous version of kudo manager? The version is a flag controllable thing... Assuming it is the same version of the cli is likely the 80% use case... perhaps more.. but doesn't capture all possibilities.
- it would be better IMO to either wait for deletes to complete... or capture the current statefulset prior to upgrade and check revision. I'll create an Issue to capture this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
@@ -586,6 +586,9 @@ spec: | |||
- containerPort: 443 | |||
name: webhook-server | |||
protocol: TCP | |||
readinessProbe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it is readiness probe after all? The PR description needs updating, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, the startup probe is 1.16+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't we recently introduced something that requires k8s version >= 1.16 anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we did but I removed it again so we're 1.15 :)
if err == nil && image == expectedImage { | ||
func verifyKudoDeployment(client corev1.PodsGetter, namespace string) (bool, error) { | ||
ready, err := isKUDOPodReady(client, namespace) | ||
if err == nil && ready { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the value was being able to establish whether it's the version of pod we expect. Not useful when the pod is being created initially, but when we are rolling out an update to the manager deployment, we need to guard against declaring victory after just checking that the pre-existing pod is healthy, i.e. the following race scenario:
- apply statefulset change
- statefulset controller is lazy, doing nothing yet
- we check the (old!) pod health, hey! it's healthy, consider the rollout done
- only now does the statefulset controller replace the pod with the new one...
init --wait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Signed-off-by: Ken Sipe <kensipe@gmail.com>
What this PR does / why we need it:
This is part of the solution for #1625