Func deploy does not wait for autoscaler to start new nodes #2086

gabbler97 · 2023-11-17T15:08:08Z

I use Knative, installed as described in the documentation:
https://knative.dev/docs/install/operator/knative-with-operators/

kn version
Version:      v1.12.0
Build Date:   2023-10-25 15:45:39
Git Revision: ae357368
Supported APIs:
* Serving
  - serving.knative.dev/v1 (knative-serving v1.12.0)
* Eventing
  - sources.knative.dev/v1 (knative-eventing v1.12.0)
  - eventing.knative.dev/v1 (knative-eventing v1.12.0)
func version
v0.38.0-99-gcd0bc6ae

I installed Istio with istioctl.
I have EKS 1.26.
I use the cluster autoscaler:
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws
I have one node group without taints and another node group with a taint (reserved-mynodes: true).
When I deploy a function and there are not enough resources in the cluster, the pod cannot be scheduled:

cd hello
func --namespace my-ns deploy --registry my-registry/knative-test-go
Events:
  Type     Reason            Age              From                Message
  ----     ------            ----             ----                -------
  Warning  FailedScheduling  76s              default-scheduler   0/5 nodes are available: 1 Too many pods, 1 node(s) had untolerated taint {reserved-mynodes: true}, 4 Insufficient cpu. preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod..

Sometimes the deploy simply times out after 120 seconds:

func --namespace my-ns deploy --registry my-registry/knative-test-go
Warning: function is in namespace 'my-ns', but requested namespace is 'my-ns'. Continuing with deployment to 'my-ns'.
Warning: namespace chosen is 'my-ns', but currently active namespace is 'default'. Continuing with deployment to 'my-ns'.
function up-to-date. Force rebuild with --build
Pushing function image to the registry "my-registry" using the "my-user" user credentials
⬆️  Deploying function to the cluster

Service output:

deploy error: knative deployer failed to wait for the Knative Service to become ready: timeout: service 'hello' not ready after 120 seconds
Error: knative deployer failed to wait for the Knative Service to become ready: timeout: service 'hello' not ready after 120 seconds

This is clearly caused by the cluster autoscaler: when the cluster lacks resources, it takes 2-3 minutes to bring up a new worker node, which is longer than the 120-second timeout. Once the failed deployment has triggered the new node and that node is up, I can retry the deploy without any issue:

func --namespace my-namespace deploy --registry my-artifactory/knative-test-go
Warning: namespace chosen is 'my-namespace', but currently active namespace is 'default'. Continuing with deployment to 'my-namespace'.
function up-to-date. Force rebuild with --build
Pushing function image to the registry "my-artifactory" using the "my-user" user credentials
⬆️  Deploying function to the cluster
✅ Function updated in namespace "my-namespace" and exposed at URL:
   http://hello.my-namespace.myhostname

How can I increase the timeout? I could not find a --timeout flag or anything similar.
Is the solution a setting in knative-eventing?
Thank you very much in advance!


gabbler97 · 2023-12-11T08:01:40Z

Hello Everyone!
Any clue?

lkingland · 2024-01-10T03:11:39Z

Hello @gabbler97

I am sorry, but this timeout is not currently configurable.

I will add this request to our open issues backlog.

I would suggest posting your question about Knative Serving in the Serving channel on the CNCF Slack. You might get some help there.

In addition to a simple --timeout option, I would prefer that we detect when a new node is being allocated, inform the user, and automatically increase the timeout.

gabbler97 · 2024-01-16T15:12:18Z

Dear @lkingland,
Thank you very much for your answer! :)

Sanket-0510 · 2024-02-19T00:19:33Z

Hey @lkingland, I think we can achieve this by configuring the Kubernetes client to set up a watch on the nodes and looking at the events: if a new worker node is allocated, the timeout is increased.

github-actions · 2024-05-19T01:27:52Z

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

lkingland added the kind/good-first-issue label on Jan 10, 2024 (denotes an issue ready for a new contributor).

github-actions bot added the lifecycle/stale label on May 19, 2024 (denotes an issue or PR that has remained open with no activity and has become stale).
