Change transport of direct client for customized MaxIdleConns #8568

Merged Jul 8, 2020 (2 commits)


@xliuxu xliuxu commented Jul 7, 2020

We noticed that the autoscaler's CPU usage is high when hundreds of pods are running in the cluster. The CPU profile shows that a significant share of that time is spent dialing connections.

      flat  flat%   sum%        cum   cum%
         0     0%     0%      0.81s 19.61%  net/http.(*Transport).dialConn
         0     0%     0%      0.81s 19.61%  net/http.(*Transport).dialConnFor
         0     0%     0%      0.75s 18.16%  runtime.mcall
         0     0%     0%      0.74s 17.92%  net.(*Dialer).DialContext
         0     0%     0%      0.74s 17.92%  net/http.(*Transport).dial
     0.01s  0.24%  0.24%      0.74s 17.92%  runtime.schedule
         0     0%  0.24%      0.72s 17.43%  golang.org/x/sync/errgroup.(*Group).Go.func1
     0.70s 16.95% 17.19%      0.72s 17.43%  syscall.Syscall
         0     0% 17.19%      0.70s 16.95%  knative.dev/serving/pkg/autoscaler/metrics.(*serviceScraper).scrapePods.func1
         0     0% 17.19%      0.68s 16.46%  knative.dev/serving/pkg/autoscaler/metrics.(*httpScrapeClient).Scrape

The optimization by @julz in #8367 works, but the default MaxIdleConns value of 100 is too small for pod scraping. The customized transport config from pkg/network, with MaxIdleConns set to 1000, should work better. After applying this change I see a significant CPU drop (54% -> 36%) with 300 pods running in the cluster.

Release Note

None

Signed-off-by: Lance Liu <xuliuxl@cn.ibm.com>
@knative-prow-robot knative-prow-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 7, 2020
@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jul 7, 2020
@knative-prow-robot

Welcome @lanceliuu! It looks like this is your first PR to knative/serving 🎉

@knative-prow-robot

Hi @lanceliuu. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/autoscale labels Jul 7, 2020
@markusthoemmes

/ok-to-test

/hold
until after the release. Caution and stuff.

@knative-prow-robot knative-prow-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 7, 2020
Signed-off-by: Lance Liu <xuliuxl@cn.ibm.com>
@xliuxu xliuxu requested review from julz and vagababov July 8, 2020 00:05

@vagababov vagababov left a comment

/lgtm
/approve

@knative-prow-robot knative-prow-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 8, 2020

@julz julz left a comment

/lgtm


@markusthoemmes markusthoemmes left a comment

/lgtm
/approve
/unhold

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 8, 2020
@knative-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lanceliuu, markusthoemmes, vagababov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/autoscale cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.