Change transport of direct client for customized MaxIdleConns #8568

xliuxu · 2020-07-07T10:44:54Z

We currently see high CPU usage in the autoscaler when hundreds of pods are running in the cluster. From the CPU profile we can tell that the time is being spent dialing new connections.

      flat  flat%   sum%        cum   cum%
         0     0%     0%      0.81s 19.61%  net/http.(*Transport).dialConn
         0     0%     0%      0.81s 19.61%  net/http.(*Transport).dialConnFor
         0     0%     0%      0.75s 18.16%  runtime.mcall
         0     0%     0%      0.74s 17.92%  net.(*Dialer).DialContext
         0     0%     0%      0.74s 17.92%  net/http.(*Transport).dial
     0.01s  0.24%  0.24%      0.74s 17.92%  runtime.schedule
         0     0%  0.24%      0.72s 17.43%  golang.org/x/sync/errgroup.(*Group).Go.func1
     0.70s 16.95% 17.19%      0.72s 17.43%  syscall.Syscall
         0     0% 17.19%      0.70s 16.95%  knative.dev/serving/pkg/autoscaler/metrics.(*serviceScraper).scrapePods.func1
         0     0% 17.19%      0.68s 16.46%  knative.dev/serving/pkg/autoscaler/metrics.(*httpScrapeClient).Scrape

The optimization in #8367 by @julz works, but the default MaxIdleConns value of 100 is too small for pod scraping. The customized transport config from pkg/network, with MaxIdleConns set to 1000, should work better. After applying this change I see a significant CPU drop (54% -> 36%) with 300 pods running in the cluster.

Release Note

None

Signed-off-by: Lance Liu <xuliuxl@cn.ibm.com>

knative-prow-robot · 2020-07-07T10:45:07Z

Welcome @lanceliuu! It looks like this is your first PR to knative/serving 🎉

knative-prow-robot · 2020-07-07T10:45:07Z

Hi @lanceliuu. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

markusthoemmes · 2020-07-07T11:18:07Z

/ok-to-test

/hold
until after the release. Caution and stuff.

pkg/autoscaler/metrics/stats_scraper.go

vagababov

/lgtm
/approve

julz

/lgtm

markusthoemmes

/lgtm
/approve
/unhold

knative-prow-robot · 2020-07-08T08:51:52Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lanceliuu, markusthoemmes, vagababov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/autoscaler/OWNERS~~ [markusthoemmes,vagababov]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Change scraper transport for customized MaxIdleConns

9ca0380

Signed-off-by: Lance Liu <xuliuxl@cn.ibm.com>

knative-prow-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 7, 2020

googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jul 7, 2020

knative-prow-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/autoscale labels Jul 7, 2020

knative-prow-robot requested review from mdemirhan and taragu July 7, 2020 10:45

julz reviewed Jul 7, 2020

pkg/autoscaler/metrics/stats_scraper.go (outdated)

Change scraper transport for customized MaxIdleConns

9fb7a9b

Signed-off-by: Lance Liu <xuliuxl@cn.ibm.com>

xliuxu requested review from julz and vagababov July 8, 2020 00:05

vagababov reviewed Jul 8, 2020

knative-prow-robot assigned vagababov Jul 8, 2020

knative-prow-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 8, 2020

julz reviewed Jul 8, 2020

knative-prow-robot assigned julz Jul 8, 2020

markusthoemmes approved these changes Jul 8, 2020

knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 8, 2020

knative-prow-robot assigned markusthoemmes Jul 8, 2020

knative-prow-robot merged commit 1b5041b into knative:master Jul 8, 2020

skonto mentioned this pull request Jul 13, 2020

Bound Concurrency of Pod Scraping (Probably with a Work Pool) #8377

Open
