Operator may crash on the first start #3321
I fear that Secret propagation times depend not only on the k8s cluster version but also on the size of the cluster and the number of Secrets or other API resources, so any timeout we pick might fail in some cases. That makes the second approach you suggested more compelling, i.e. marking the operator Pod as updated to speed up Secret propagation. If I am not mistaken, the operator already has the necessary permissions (?)
When the operator is deployed for the first time the Secret which contains the webhook certificate is populated by the operator (if the webhook certificate is managed by ECK, which is the case by default).
Unfortunately, it can take some time for the content of the Secret to be propagated into the container. A wait loop was introduced in #2312, but my experience while testing 1.2.0-bc2 suggests that the current timeout (30 seconds) is too low.
I did a few tests to understand what timeout value would be acceptable (ECK version: 1.2.0-bc2 on v1.15.12-gke.2).
A first option would be to increase the timeout to something like 90 to 100 seconds, but that seems like a high value to me.
A second solution would be to update the Pod that runs the operator using the MarkPodsAsUpdated function. Another benefit would be that the certificate is also propagated faster when it is renewed. However, it means that the operator must be able to update its own Pod (see #496 and #568 for more context).