
Timeout issue on long requests #12564

Closed
nicolasmingo opened this issue Jan 27, 2022 · 23 comments
Labels
area/networking kind/question Further information is requested

Comments

@nicolasmingo

nicolasmingo commented Jan 27, 2022

/area networking

Hi, I'm trying to use Knative functions to make streaming HTTP requests.
It works well, but I have an issue with long requests.
After 5 minutes I receive a timeout error from queue-proxy and my request is closed.

I've tried a workaround, but it caused other issues: modifying the ConfigMap named "config-defaults" (in the knative-serving namespace):

max-revision-timeout-seconds: '21600' #6 hours
revision-timeout-seconds: '21600' # 6 hours
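
Applied, the change looks roughly like this (a sketch of the config-defaults ConfigMap; other keys in the data block are omitted):

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: knative-serving
data:
  revision-timeout-seconds: "21600"      # 6 hours
  max-revision-timeout-seconds: "21600"  # 6 hours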

It works well, BUT when the pod should terminate, it gets stuck in Terminating status.
I've discovered that Knative copies max-revision-timeout-seconds into terminationGracePeriodSeconds.
So I've tried to patch my Knative ksvc YAML to override the terminationGracePeriodSeconds parameter (normally it is part of a PodSpec), but it seems impossible to change it through Knative.

Can you give me some information on how to set up the configuration for my needs, please?

Nicolas

@nicolasmingo nicolasmingo added the kind/question Further information is requested label Jan 27, 2022
@skonto
Contributor

skonto commented Feb 3, 2022

Hi @nicolasmingo!

I've discovered that knative copied max-revision-timeout-seconds to terminationGracePeriodSeconds.

Could you clarify what you mean here a bit? When the revision deployment is created, what is copied is the revision's timeoutSeconds field (more on this below):

pod.TerminationGracePeriodSeconds = rev.Spec.TimeoutSeconds

So, I've tried to patch my knative ksvc yaml to override the terminationGracePeriodSeconds parameter (normally it is in a PodSpec)

To clarify a few points. Copying from config-defaults cm:

# These sample configuration options may be copied out of
# this example block and unindented to be in the data block
# to actually change the configuration.
# revision-timeout-seconds contains the default number of
# seconds to use for the revision's per-request timeout, if
# none is specified.
revision-timeout-seconds: "300" # 5 minutes
# max-revision-timeout-seconds contains the maximum number of
# seconds that can be used for revision-timeout-seconds.
# This value must be greater than or equal to revision-timeout-seconds.
# If omitted, the system default is used (600 seconds).
#
# If this value is increased, the activator's terminationGraceTimeSeconds
# should also be increased to prevent in-flight requests being disrupted.
max-revision-timeout-seconds: "600" # 10 minutes

Could you try updating the default configs first to allow a larger maximum if you need to, and then set the revision's timeoutSeconds accordingly? Also, don't forget about the activator's grace period.
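
For example, once the maximum has been raised, the per-revision timeout can be set in the Service spec. A minimal sketch (the service name and image are hypothetical):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-function                        # hypothetical name
spec:
  template:
    spec:
      timeoutSeconds: 21600                # must not exceed max-revision-timeout-seconds
      containers:
        - image: gcr.io/example/my-function  # hypothetical image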

@little-eyes

Thanks @skonto, your solution works for me.

@nicolasmingo
Author

Hi, thank you for your answers.

I have continued to inspect and I would like to share what I see.

  • If I request a function that sends me (as an HTTP response) a stream that takes more than 15 minutes: it works.
  • If I request a function that receives (in the HTTP request) a stream that I send and only returns a result after 15 minutes, the connection is cut (depending on the timeoutSeconds value), even if I'm sending data (uploading) during that time.
  • If I send and receive streams at the same time, it works, but sometimes I have other reliability problems with envoy-istio (upstream_reset_after_response_started{protocol_error}).

Maybe Knative is only inspecting the response stream to detect the timeout?
Do you know how the second case can be made to work in Knative?

@skonto
Contributor

skonto commented Mar 31, 2022

Hi, for the Istio issue, it seems that at some point the connection is rejected due to a malformed HTTP request. Could you enable debug logs on the Istio side and post them? Also, let's create two separate issues to discuss further (bullets 2, 3).
@nak3 may have more to add here.

@nicolasmingo
Author

nicolasmingo commented Mar 31, 2022

Hi, I've found the issue. It is not related to Istio; we tried the same code without Knative and it worked.

I think it is related to the inability to do bidirectional streaming over HTTP/1.1:
#6146

Originally, I wanted to make long requests with an upload stream. The only way to do that (without bidirectional streaming) is to increase the timeout to 3600s, because the HTTP response can arrive very late. But if I configure a 3600s timeout, I have another problem in Knative: my pod stays in Terminating status for 3600s. If I had a way to override the terminationGracePeriod with a custom value (instead of timeoutSeconds), I would reach my goal.

@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 30, 2022
@henriquevasconcelos

@skonto thank you so much for your clarification on the timeout mechanism!
Is it possible to configure Knative to have no timeout on requests (or a timeout around 1h) and a TerminationGracePeriodSeconds: 0?

@henriquevasconcelos

/remove-lifecycle stale

@knative-prow knative-prow bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2022
@psschwei
Contributor

You might also want to look into the async component.

cc @AngeloDanducci

@henriquevasconcelos

henriquevasconcelos commented Jul 27, 2022

You might also want to look into the async component.

cc @AngeloDanducci

Thanks for the reply @psschwei!

Ideally, we should be using async requests for this. However, the framework we're using requires completion of the request, hence the long timeout :)

@henriquevasconcelos

Just to clarify, my issue is that I have a request that needs 20 minutes to complete. After editing timeoutSeconds I no longer get the timeout error, but now my pods are stuck in Terminating, which did not happen before setting this value.

Is there a way to terminate the pods faster, i.e., allow for a custom timeout and a timely termination of the pods?

[screenshot attached: Screen Shot 2022-07-29 at 10 43 04]

@psschwei
Contributor

psschwei commented Aug 1, 2022

Is there a way to terminate the pods faster, i.e., allow for a custom timeout and a timely termination of the pods?

I think once #12970 lands you may be able to use the max duration without needing to tweak the termination grace period.

@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2022
@knative-prow-robot
Contributor

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close

/lifecycle stale

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2022
@mkyrilov

mkyrilov commented Dec 8, 2022

Hello everyone.

I am also interested in setting max-revision-timeout-seconds to a larger value. However, I can't find how to update the terminationGracePeriod on the activator.

I am using the operator to install Knative Serving:

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  version: 1.8.0 # Same version as the operator
  config:
    defaults:
      max-revision-timeout-seconds: "7200" # 120 minutes
    domain:
      kn.example.com: "" # your DNS
    features:
      kubernetes.podspec-persistent-volume-claim: "enabled"
      kubernetes.podspec-persistent-volume-write: "enabled"
      kubernetes.podspec-affinity: "enabled"
      kubernetes.podspec-tolerations: "enabled"
      kubernetes.podspec-init-containers: "enabled"
    istio:
      gateway.knative-serving.knative-ingress-gateway: istio-ingressgateway.istio-ingress.svc.cluster.local

Does anyone know how to achieve this?

@psschwei
Contributor

psschwei commented Dec 8, 2022

One way is to just edit the Activator deployment, i.e. kubectl edit deploy -n knative-serving activator and then update the terminationGracePeriodSeconds field.
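
For reference, the field to change after running kubectl edit deploy -n knative-serving activator sits in the pod template of the Deployment. A sketch of the relevant excerpt (7200 is just an example value; align it with your max-revision-timeout-seconds):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: activator
  namespace: knative-serving
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 7200   # example value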

cc @houshengbo in case there's a way to do that in the operator

@mkyrilov

mkyrilov commented Dec 8, 2022

I did try editing the activator deployment manually. It updates fine, but then immediately gets reverted. I am assuming the operator does this.

@psschwei
Contributor

psschwei commented Dec 8, 2022

In that case, since it's operator-specific, let's move the discussion over there... I opened knative/operator#1295 with your question.

@Kaiass

Kaiass commented Dec 22, 2022

Hello @psschwei

I think once #12970 lands you may be able to use the max duration without needing to tweak the termination grace period.

It looks like the termination grace period is still taken from timeoutSeconds, which makes it impossible to specify them separately.

I'm one of those who wants to handle long requests :)
Even though our requests may take many minutes, we don't want to wait that same amount of time for the pod to terminate.

Thanks in advance for help!

@psschwei
Contributor

psschwei commented Jan 9, 2023

OK, in that case I think the alternative, as things stand, would be to see if the async component might fit your needs.

@ReToCode
Member

From the history above, I got:

Seems like no more points are open, thus I'm closing this. Feel free to reopen if I'm mistaken.

/close

@knative-prow

knative-prow bot commented Feb 15, 2023

@ReToCode: Closing this issue.

In response to this:

From the history above, I got:

Seems like no more points are open, thus I'm closing this. Feel free to reopen if I'm mistaken.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow knative-prow bot closed this as completed Feb 15, 2023
@nkreiger

Is anybody seeing issues with the latest version of Knative (1.12)?

everity":"ERROR","timestamp":"2023-12-14T00:14:02.568534211Z","logger":"queueproxy","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 10\nTCP: inuse 2 orphan 3 tw 14 alloc 759 mem 253\nUDP: inuse 0 mem 0\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","commit":"2659cc3","knative.dev/key":"dxcm/plugin-github-00003","knative.dev/pod":"plugin-github-00003-deployment-7f9f764f87-8jvt2","error":"context canceled","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/pkg@v0.0.0-20231023151236-29775d7c9e5c/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:475\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\n\tknative.dev/serving/pkg/queue/request_metric.go:199\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\n\tknative.dev/serving/pkg/queue/handler.go:76\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\n\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"}
{"severity":"ERROR","timestamp":"2023-12-14T00:24:04.591008104Z","logger":"queueproxy","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 10\nTCP: inuse 2 orphan 3 tw 12 alloc 595 mem 230\nUDP: inuse 0 mem 0\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","commit":"2659cc3","knative.dev/key":"dxcm/plugin-github-00003","knative.dev/pod":"plugin-github-00003-deployment-7f9f764f87-8jvt2","error":"context canceled","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/pkg@v0.0.0-20231023151236-29775d7c9e5c/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:475\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\n\tknative.dev/serving/pkg/queue/request_metric.go:199\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\n\tknative.dev/serving/pkg/queue/handler.go:76\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\n\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"}
{"severity":"ERROR","timestamp":"2023-12-14T00:34:08.632324508Z","logger":"queueproxy","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 10\nTCP: inuse 2 orphan 2 tw 10 alloc 602 mem 204\nUDP: inuse 0 mem 0\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","commit":"2659cc3","knative.dev/key":"dxcm/plugin-github-00003","knative.dev/pod":"plugin-github-00003-deployment-7f9f764f87-8jvt2","error":"context canceled","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/pkg@v0.0.0-20231023151236-29775d7c9e5c/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:475\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\n\tknative.dev/serving/pkg/queue/request_metric.go:199\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\n\tknative.dev/serving/pkg/queue/handler.go:76\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\n\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"}
apiVersion: v1
data:
  enable-service-links: "false"
  max-revision-timeout-seconds: "60000"
  revision-timeout-seconds: "60000"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"enable-service-links":"false","max-revision-timeout-seconds":"60000","revision-timeout-seconds":"60000"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"config-defaults","namespace":"knative-serving"}}
  creationTimestamp: "2023-11-28T16:59:46Z"
  name: config-defaults
  namespace: knative-serving
  resourceVersion: "24361043"
  uid: 2496e813-166c-4756-aa7f-f1a83f946cdd

Updated the activator as well. Validated that the max is getting picked up. However, I am still seeing timeouts at exactly 10 minutes.
