Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds request retry attributes to HTTPRoute API Type #184

Closed
wants to merge 1 commit into from

Conversation

danehans
Copy link
Contributor

@danehans danehans commented May 7, 2020

Extends ForwardToTarget of type HTTPRoute to include per-target request timeouts and retries.

Partially fixes #58.

/assign @bowei
/cc @Miciah @robscott

@k8s-ci-robot
Copy link
Contributor

@danehans: GitHub didn't allow me to request PR reviews from the following users: Miciah.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Extends ForwardToTarget of type HTTPRoute to include per-target request timeouts and retries.

Partially fixes #58.

/assign @bowei
/cc @Miciah @robscott

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 7, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: danehans
To complete the pull request process, please assign bowei
You can assign the PR to them by writing /assign @bowei in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 7, 2020
// Support: Core
//
// +optional
Timeout *metav1.Duration `json:"timeout,omitempty" protobuf:"bytes,3,opt,name=timeout"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is read timeout, should we rename this to ReadTimeout?
We should be explicit here if this is TTFB (time to first byte) or not. It seems most proxies have this meaning.

Do we also need Connect Timeout for time to define timeout for the TCP handshake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL at my latest changes and let me know if you feel this is a good starting point for connection timeouts and retries.

// +optional
Timeout *metav1.Duration `json:"timeout,omitempty" protobuf:"bytes,3,opt,name=timeout"`

// RetryPolicy describes a policy for resending a failed request to a targetRef.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What defines a request as failed?
Will it be an L4 failure or L7 failure (idempotent HTTP methods)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the docs. A connection is considered failed when the connection is either refused (5xx) or times out.t

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2020
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 13, 2020
@danehans
Copy link
Contributor Author

@hbagdi I just pushed b76d37b that addresses some of your feedback, PTAL.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 19, 2020
@k8s-ci-robot
Copy link
Contributor

@danehans: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@robscott
Copy link
Member

I think this may overlap with @bowei's Service level config proposal.

@hbagdi
Copy link
Contributor

hbagdi commented Sep 10, 2020

@danehans Should we keep this open or close this and track it as part of #196 and revisit post v1alpha1?

@@ -264,6 +264,59 @@ type ForwardToTarget struct {
//
// +optional
TargetPort *TargetPort `json:"targetPort" protobuf:"bytes,2,opt,name=targetPort"`

// TimeoutPolicy describes a policy for configuring connection timeouts to targetRef.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how much we want to cater to specific implementations but I don't think Envoy supports this, specifically the timeout and retry config is at the route level, so you cannot set it differently for each weighted cluster destination.

@danehans
Copy link
Contributor Author

danehans commented Dec 1, 2020

@hbagdi hbagdi added this to the v1alpha2 milestone Dec 3, 2020
//
// Support: Core
//
Response metav1.Duration `json:"response" protobuf:"bytes,1,opt,name=response"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an inactivity timeout or a deadline to complete the transfer? The field's description is suggests it is a deadline (for example, if the response is large, and it takes too long, a proxy may terminate the transfer even while data are being transferred), in which case it should be named as follows:

Suggested change
Response metav1.Duration `json:"response" protobuf:"bytes,1,opt,name=response"`
ResponseDeadlineSeconds metav1.Duration `json:"response" protobuf:"bytes,1,opt,name=responseDeadlineSeconds"`

On the other hand, if this is an inactivity timeout (for example, if the server doesn't send any data within this amount of time, a proxy may terminate the connection), then it should be named as follows:

Suggested change
Response metav1.Duration `json:"response" protobuf:"bytes,1,opt,name=response"`
ResponseTimeoutSeconds metav1.Duration `json:"response" protobuf:"bytes,1,opt,name=responseTimeoutSeconds"`

See https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#naming-conventions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, maybe "Seconds" should be omitted for metav1.Duration, which is unmarshalled using time.ParseDuration and thus accepts time-unit suffixes (https://github.com/kubernetes/apimachinery/blob/2456ebdaba229616fab2161a615148884b46644b/pkg/apis/meta/v1/duration.go#L31-L45). Oddly, I cannot find any APIs in k8s.io/api that use metav1.Duration. Do you have examples of Kubernetes APIs that do use metav1.Duration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, https://github.com/kubernetes/cloud-provider/blob/master/config/types.go has some examples of fields that use metav1.Duration.

So I think the field should be named either responseDeadline or responseTimeout.

//
// Support: Core
//
Idle metav1.Duration `json:"idle" protobuf:"bytes,2,opt,name=idle"`
Copy link
Contributor

@Miciah Miciah Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Idle metav1.Duration `json:"idle" protobuf:"bytes,2,opt,name=idle"`
IdleTimeout metav1.Duration `json:"idle" protobuf:"bytes,2,opt,name=idleTimeout"`

//
// Support: Core
//
Interval metav1.Duration `json:"interval" protobuf:"bytes,2,opt,name=interval"`
Copy link
Contributor

@Miciah Miciah Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Interval metav1.Duration `json:"interval" protobuf:"bytes,2,opt,name=interval"`
retryPeriod metav1.Duration `json:"interval" protobuf:"bytes,2,opt,name=retryPeriod"`

@k8s-ci-robot
Copy link
Contributor

@danehans: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-service-apis-test b76d37b link /test pull-service-apis-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -658,6 +672,24 @@ message LocalObjectReference {
optional string name = 3;
}

// RetryPolicy describes a policy for retrying a connection until the
// connection is either refused or times out.
Copy link
Member

@nak3 nak3 Mar 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Envoy has more fine-grained policy setting.

And Knative (net-contour) leverages it - https://github.com/knative-sandbox/net-contour/blob/42ebe377c7c9d1e62fe4265ff3ef0cee285e8be3/pkg/reconciler/contour/resources/httpproxy.go#L80-L95
So it would be great if we could have some optional field to customize it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we would like to know the default retry policy.

@robscott robscott added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 14, 2021
@danehans
Copy link
Contributor Author

From 4/14/21 community meeting:

  • Retries are not as portable as heath-checking and timeout connection policies.
  • Consensus around health-check, timeout, retry, etc. policies residing in BackEndPolicy while the ServicePolicy design.

@robscott robscott removed this from the v0.3.0 milestone Apr 23, 2021
@robscott
Copy link
Member

I think that with GEP #713 we've determined that this is a better fit for policy attachment than being built into the route. I'm going to close this in favor of that approach but anyone can feel free to reopen if needed.

/close

@k8s-ci-robot
Copy link
Contributor

@robscott: Closed this PR.

In response to this:

I think that with GEP #713 we've determined that this is a better fit for policy attachment than being built into the route. I'm going to close this in favor of that approach but anyone can feel free to reopen if needed.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Backend Object Traffic Manipulation
8 participants