OCPBUGS-13946: do not use one second timeout when asserting a webhook connection #1510

Merged
13 changes: 10 additions & 3 deletions pkg/operator/webhooksupportabilitycontroller/degraded_webhook.go
@@ -22,6 +22,9 @@ type webhookInfo struct {
Service *serviceReference
CABundle []byte
FailurePolicyIsIgnore bool
+ // TimeoutSeconds specifies the timeout for a webhook.
+ // After the timeout passes, the webhook call will be ignored or the API call will fail
+ TimeoutSeconds *int32
}

// serviceReference generically represents a service reference
@@ -49,7 +52,7 @@ func (c *webhookSupportabilityController) updateWebhookConfigurationDegraded(ctx
serviceMsgs = append(serviceMsgs, msg)
continue
}
- err = c.assertConnect(ctx, webhook.Name, webhook.Service, webhook.CABundle)
+ err = c.assertConnect(ctx, webhook.Name, webhook.Service, webhook.CABundle, webhook.TimeoutSeconds)
if err != nil {
msg := fmt.Sprintf("%s: %s", webhook.Name, err)
if webhook.FailurePolicyIsIgnore {
@@ -94,7 +97,7 @@ func (c *webhookSupportabilityController) assertService(reference *serviceRefere
}

// assertConnect performs a dns lookup of service, opens a tcp connection, and performs a tls handshake.
- func (c *webhookSupportabilityController) assertConnect(ctx context.Context, webhookName string, reference *serviceReference, caBundle []byte) error {
+ func (c *webhookSupportabilityController) assertConnect(ctx context.Context, webhookName string, reference *serviceReference, caBundle []byte, webhookTimeoutSeconds *int32) error {
host := reference.Name + "." + reference.Namespace + ".svc"
port := "443"
if reference.Port != nil {
@@ -104,6 +107,10 @@ func (c *webhookSupportabilityController) assertConnect(ctx context.Context, web
if len(caBundle) > 0 {
rootCAs.AppendCertsFromPEM(caBundle)
}
+ timeout := 10 * time.Second
+ if webhookTimeoutSeconds != nil {
+ timeout = time.Duration(*webhookTimeoutSeconds) * time.Second
+ }
// the last error that occurred in the loop below
var err error
// retry up to 3 times on error
@@ -114,7 +121,7 @@ func (c *webhookSupportabilityController) assertConnect(ctx context.Context, web
case <-time.After(time.Duration(i) * time.Second):
}
dialer := &tls.Dialer{
- NetDialer: &net.Dialer{Timeout: 1 * time.Second},
+ NetDialer: &net.Dialer{Timeout: timeout},
Contributor

This timeout only covers the TCP connect. Since we're using a TLS dialer here, I expect we intend timeouts to cover the handshake too. May need to move the timeout into a context deadline.

Contributor Author

The timeout applies to connection and TLS handshake as a whole.
See https://github.com/golang/go/blob/master/src/crypto/tls/tls.go#L123

Config: &tls.Config{
ServerName: host,
RootCAs: rootCAs,
@@ -27,6 +27,7 @@ func (c *webhookSupportabilityController) updateMutatingAdmissionWebhookConfigur
Name: webhook.Name,
CABundle: webhook.ClientConfig.CABundle,
FailurePolicyIsIgnore: webhook.FailurePolicy != nil && *webhook.FailurePolicy == admissionregistrationv1.Ignore,
+ TimeoutSeconds: webhook.TimeoutSeconds,
Contributor

Does it make sense to perform defaulting here rather than as part of every check?

Contributor Author

The defaulting is cheap. In the future we might consider adding some logs.
It is also consistent with how the port is defaulted.

See

Contributor

Right, I'm not worried about the cost; I think it would be a tidier separation of responsibility. Since we already translate the API object into an internal representation for use by the dial probes, it didn't make sense to me that the internal representation (i.e. webhookInfo) wasn't directly usable by the dial probe.

It's only my preference and I'm satisfied with consistency too.

}
if webhook.ClientConfig.Service != nil {
info.Service = &serviceReference{
@@ -58,6 +59,7 @@ func (c *webhookSupportabilityController) updateValidatingAdmissionWebhookConfig
Name: webhook.Name,
CABundle: webhook.ClientConfig.CABundle,
FailurePolicyIsIgnore: webhook.FailurePolicy != nil && (*webhook.FailurePolicy == v1.Ignore),
+ TimeoutSeconds: webhook.TimeoutSeconds,
}

if webhook.ClientConfig.Service != nil {