Timeout issue on long requests #12564
Hi @nicolasmingo!
Could you clarify what you mean here a bit? When the revision deployment is created, what is copied is the revision's timeoutSeconds field (more on this below).
To clarify a few points, copying from the config-defaults cm (see serving/config/core/configmaps/defaults.yaml, lines 40 to 56 at 1d95294):
Could you try updating the default configs first to allow a larger maximum if you need to, and then set a revision timeoutSeconds accordingly? Also, don't forget about the activator's grace period.
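For reference, a minimal sketch of that two-step change (the values, service name, and image are illustrative, not from this thread):

```yaml
# 1) Raise the cap in the config-defaults ConfigMap (knative-serving namespace)
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: knative-serving
data:
  max-revision-timeout-seconds: "3600"  # upper bound a revision may request
---
# 2) Then request a longer timeout on the revision (must not exceed the cap)
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: long-request-svc        # illustrative name
spec:
  template:
    spec:
      timeoutSeconds: 3600
      containers:
        - image: example.com/my-app   # illustrative image
```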
Thanks @skonto, your solution works for me.
Hi, thank you for your answers. I have continued to investigate and would like to share what I see.
Maybe Knative is only inspecting the response stream to detect the timeout?
Hi, for the Istio issue, it seems that at some point the connection is rejected due to a malformed HTTP request. Could you enable debug logs on the Istio side and post them? Also, let's create two separate issues to discuss further (bullets 2, 3).
Hi, I've found the issue. It is not related to Istio; we tried the same code without Knative and it worked. I think it is related to the inability to do bidirectional streaming over HTTP/1.1. Originally, I wanted to make long requests with an upload stream, and the only way to do that (without bidirectional streaming) is to increase the timeout to 3600s, because the HTTP response can arrive very late. But if I configure a 3600s timeout, I have another problem in Knative: my pod stays in Terminating status for 3600s. If I had a way to override the terminationGracePeriod with a custom value (instead of timeoutSeconds), I would reach my goal.
This issue is stale because it has been open for 90 days with no activity.
@skonto thank you so much for your clarification on the timeout mechanism!
/remove-lifecycle stale
You might also want to look into the async component.
Thanks for the reply @psschwei! Ideally, we should be using async requests for this. However, the framework we're using requires completion of the request, hence the long timeout :)
Just to clarify, my issue is that I have a request that needs 20 min to complete. After editing timeoutSeconds I no longer get the timeout error, but now my pods are stuck on Terminating, which did not happen before setting this flag. Is there a way to terminate the pods faster, i.e., allow for a custom timeout and a timely termination of the pods?
I think once #12970 lands you may be able to use the max duration without needing to tweak the termination grace period.
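For context on why the pods linger: per the discussion above, the timeout value also ends up as the generated deployment's terminationGracePeriodSeconds. A hypothetical look at what Knative generates for a revision (values and container list abridged and illustrative):

```yaml
# Deployment generated by Knative for a revision with a 1200s timeout
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 1200  # copied from the timeout, hence slow termination
      containers:
        - name: queue-proxy        # sidecar that enforces the request timeout
        - name: user-container
```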
This issue or pull request is stale because it has been open for 90 days with no activity. This bot triages issues and PRs according to the following rules:
You can:
/lifecycle stale
Hello everyone. I am also interested in setting this. I am using the operator to install Knative Serving:

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  version: 1.8.0 # Same version as the operator
  config:
    defaults:
      max-revision-timeout-seconds: "7200" # 120 minutes
    domain:
      kn.example.com: "" # your DNS
    features:
      kubernetes.podspec-persistent-volume-claim: "enabled"
      kubernetes.podspec-persistent-volume-write: "enabled"
      kubernetes.podspec-affinity: "enabled"
      kubernetes.podspec-tolerations: "enabled"
      kubernetes.podspec-init-containers: "enabled"
    istio:
      gateway.knative-serving.knative-ingress-gateway: istio-ingressgateway.istio-ingress.svc.cluster.local
```

Does anyone know how to achieve this?
One way is to just edit the activator deployment directly; cc @houshengbo for whether there's a way to do that in the operator.
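A sketch of that manual edit, assuming the activator runs in the knative-serving namespace and the grace period should match the configured maximum timeout (the 7200 value is illustrative):

```yaml
# kubectl -n knative-serving edit deployment activator
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 7200  # align with max-revision-timeout-seconds
```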
I did try editing the activator deployment manually. It updates fine, but then immediately gets reverted. I am assuming the operator does this.
In that case, since it's operator-specific, let's move the discussion over there... I opened knative/operator#1295 with your question.
Hello @psschwei,
It looks like the termination grace period is still taken from the revision timeout. I'm one of those who wants to handle long requests :) Thanks in advance for your help!
Ok, in that case I think the alternative, as things stand, would be to see if the async component might fit your needs.
From the history above, I got:
Seems like no more points are open, so I'm closing this. Feel free to reopen if I'm mistaken. /close
@ReToCode: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is anybody seeing issues with the latest version of Knative (1.12)?
Updated the activator as well. Validated that the max is getting picked up. However, I am still seeing timeouts at exactly 10 minutes.
/area networking
Hi, I'm trying to use Knative Functions to make streaming HTTP requests.
It works well, but I have an issue with long requests.
After 5 minutes I receive a timeout error from queue-proxy and my request is closed.
I've tried some solutions, but they led to other issues:
Modify the ConfigMap named "config-defaults" (in the knative-serving namespace).
It works well, BUT when the pod is supposed to terminate, it gets stuck in Terminating status.
I've discovered that Knative copies max-revision-timeout-seconds to terminationGracePeriodSeconds.
So, I've tried to patch my Knative ksvc YAML to override the terminationGracePeriodSeconds parameter (normally it is in a PodSpec), but it seems impossible to change it through Knative.
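For illustration, this is the kind of override attempted (the service name and image are placeholders); Knative's RevisionSpec does not expose the field, so its validation webhook rejects it:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-function                       # placeholder name
spec:
  template:
    spec:
      timeoutSeconds: 3600
      # terminationGracePeriodSeconds: 30 # rejected by Knative's validation webhook
      containers:
        - image: example.com/my-function  # placeholder image
```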
Can you give me information on how I can set up the configuration for my needs, please?
Nicolas