Func deploy does not wait for autoscaler to start new nodes #2086

Open
gabbler97 opened this issue Nov 17, 2023 · 5 comments
Labels
kind/good-first-issue (Denotes an issue ready for a new contributor.)
lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)

Comments

gabbler97 commented Nov 17, 2023

I installed Knative exactly as described in the documentation (link below). Versions:

 kn version
Version:      v1.12.0
Build Date:   2023-10-25 15:45:39
Git Revision: ae357368
Supported APIs:
* Serving
  - serving.knative.dev/v1 (knative-serving v1.12.0)
* Eventing
  - sources.knative.dev/v1 (knative-eventing v1.12.0)
  - eventing.knative.dev/v1 (knative-eventing v1.12.0)
func version
v0.38.0-99-gcd0bc6ae

Installation guide: https://knative.dev/docs/install/operator/knative-with-operators/
I installed Istio with istioctl.
The cluster runs EKS 1.26.
I use the Cluster Autoscaler (AWS cloud provider): https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws
I have one node group without taints and another node group with a taint (reserved-mynodes: true).
When I deploy my functions and the cluster does not have enough free resources, the pod cannot be scheduled:

cd hello
func --namespace my-ns deploy --registry my-registry/knative-test-go
Events:
  Type     Reason            Age              From                Message
  ----     ------            ----             ----                -------
  Warning  FailedScheduling  76s              default-scheduler   0/5 nodes are available: 1 Too many pods, 1 node(s) had untolerated taint {reserved-mynodes: true}, 4 Insufficient cpu. preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod..

Sometimes the deploy simply times out after 120 seconds:

func --namespace my-ns deploy --registry my-registry/knative-test-go
Warning: function is in namespace 'my-ns', but requested namespace is 'my-ns'. Continuing with deployment to 'my-ns'.
Warning: namespace chosen is 'my-ns', but currently active namespace is 'default'. Continuing with deployment to 'my-ns'.
function up-to-date. Force rebuild with --build
Pushing function image to the registry "my-registry" using the "my-user" user credentials
⬆️  Deploying function to the cluster

Service output:

deploy error: knative deployer failed to wait for the Knative Service to become ready: timeout: service 'hello' not ready after 120 seconds
Error: knative deployer failed to wait for the Knative Service to become ready: timeout: service 'hello' not ready after 120 seconds

This is clearly caused by the cluster autoscaler: when the cluster lacks resources, it takes 2-3 minutes to bring up a new worker node. Once the first deploy has failed and the new node is up, I can retry the deploy without any issue:

func --namespace my-namespace deploy --registry my-artifactory/knative-test-go
Warning: namespace chosen is 'my-namespace', but currently active namespace is 'default'. Continuing with deployment to 'my-namespace'.
function up-to-date. Force rebuild with --build
Pushing function image to the registry "my-artifactory" using the "my-user" user credentials
⬆️  Deploying function to the cluster
✅ Function updated in namespace "my-namespace" and exposed at URL:
   http://hello.my-namespace.myhostname

How can I increase the timeout? I could not find a --timeout flag or anything similar.
Or should I solve this by configuring something in knative-eventing?
Thank you very much in advance!

gabbler97 (Author) commented:

Hello Everyone!
Any clue?

lkingland (Member) commented Jan 10, 2024

Hello @gabbler97

I am sorry, but this timeout is not currently configurable.

I will add this request to our open issues backlog.

I would suggest posting your question about Knative Serving in the Serving channel on the CNCF Slack. You might get some help there.

In addition to a simple --timeout option, I would prefer that we detect when a new node is being allocated, inform the user, and automatically extend the timeout.

lkingland added the kind/good-first-issue label on Jan 10, 2024
gabbler97 (Author) commented:

Dear @lkingland,
Thank you very much for your answer! :)

Sanket-0510 (Contributor) commented:

Hey @lkingland, I think we could achieve this by configuring the Kubernetes client to set up a watcher on the nodes and look for events. If a new worker node is being allocated, we would then increase the timeout.

github-actions bot commented:

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label on May 19, 2024
Projects
Status: 🔖 Next
Development

No branches or pull requests

3 participants