Bug: Wait for the certs to be mounted inside the container #2198
@@ -26,12 +26,15 @@ import (
	"errors"
	"fmt"
	"math/big"
	"os"
	"path/filepath"
	"strings"
	"time"

	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/klog"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"

@@ -53,11 +56,11 @@ type CertGenerator struct {
	namespace string
	webhookServiceName string
	webhookSecretName string
	fullServiceDomain string
	kubeClient client.Client
	certsReady chan struct{}

	certs *certificates
	fullServiceDomain string
	certs *certificates
}

var _ manager.Runnable = &CertGenerator{}

@@ -67,11 +70,50 @@ func (c *CertGenerator) Start(ctx context.Context) error {
	if err := c.generate(ctx); err != nil {
		return err
	}
	klog.Info("Waiting for certs to get ready.")
	if err := wait.ExponentialBackoffWithContext(ctx, wait.Backoff{

Even though we use cert-manager, the controller must wait for certs to be injected into the Secret resource.

@johnugeorge @tenzen-y Does it look confusing that we have this timeout in our generator package even if the Katib Cert Generator isn't used? We can rename

I think locating this timeout here would be better since this timeout is a function for the webhook certs. @andreyvelich WDYT?

I think

If we rename it to

Ah, sorry. I will revert 69e9dfe since we can avoid the timeout error in the Kubeflow installation by removing the katib-webhook-cert Secret from the Kubeflow installation. So, we wouldn't have to change the KatibConfig. WDYT?

So, the

I think, if we are going to remove this timeout from the Kubeflow installation, we need to name it as

@andreyvelich That sounds good to me. Thanks for the great suggestion!

Done.

		Duration: time.Second,
		Factor: 2,
		Jitter: 1,
		Steps: 10,
		Cap: time.Minute * 5,
	}, ensureCertMounted(time.Now())); err != nil {
		return err
	}
	// Sending an empty data to a certsReady means it starts to register controllers to the manager.
	c.certsReady <- struct{}{}
	return nil
}

// ensureCertMounted ensures that the generated certs are mounted inside the container.
func ensureCertMounted(start time.Time) func(context.Context) (bool, error) {
	return func(ctx context.Context) (bool, error) {
		now := time.Now()
		outputLog := false
		if now.Sub(start) >= 15*time.Second {
			start = now
			outputLog = true
		}

		certFile := filepath.Join(consts.CertDir, serverCertName)
		if _, err := os.Stat(certFile); err != nil {
			if outputLog {
				klog.Infof("Public key file %q doesn't exist in the container yet", certFile)
			}
			return false, nil
		}
		keyFile := filepath.Join(consts.CertDir, serverKeyName)
		if _, err := os.Stat(keyFile); err != nil {
			if outputLog {
				klog.Infof("Private key file %q doesn't exist in the container yet", keyFile)
			}
			return false, nil
		}
		klog.Info("Succeeded to be mounted certs inside the container.")
		return true, nil
	}
}

func (c *CertGenerator) NeedLeaderElection() bool {
	return false
}

@@ -82,8 +124,13 @@ func AddToManager(mgr manager.Manager, config configv1beta1.CertGeneratorConfig,
		namespace: consts.DefaultKatibNamespace,
		webhookServiceName: config.WebhookServiceName,
		webhookSecretName: config.WebhookSecretName,
		kubeClient: mgr.GetClient(),
		certsReady: certsReady,
		fullServiceDomain: strings.Join([]string{
			config.WebhookServiceName,
			consts.DefaultKatibNamespace,
			"svc",
		}, "."),
		kubeClient: mgr.GetClient(),
		certsReady: certsReady,
	})
}


@@ -99,8 +146,6 @@ func (c *CertGenerator) generate(ctx context.Context) error {
		return fmt.Errorf("%w: %v", errCertCheckFail, err)
	}
	if !certExist {
		c.fullServiceDomain = strings.Join([]string{c.webhookServiceName, c.namespace, "svc"}, ".")

		if err = c.createCert(); err != nil {
			return fmt.Errorf("%w: %v", errCreateCertFail, err)
		}

Is there a requirement for a timeout? How does the user find out that the controller is waiting for the certs to be ready?

Yes, timeout is 5 minutes.
The controller pod doesn't become ready until the cert is ready.

I meant, how will the user know that the Katib controller is waiting for certs to get ready?

The controller doesn't say, "Waiting for certs to get ready". So, there are two ways that users can indirectly know whether the controller waits for certs to get ready:

Adding a log, "Waiting for certs to get ready", to L72 in this file might be better.

Adding a log is better because the user can then understand the reason behind the issue.

Added.