Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

karmada-scheduler add disable-scheduler-estimator-in-pull-mode flag #2064

Merged
merged 1 commit into from Jun 30, 2022

Conversation

prodanlabs
Copy link
Member

@prodanlabs prodanlabs commented Jun 25, 2022

Signed-off-by: prodan pengshihaoren@gmail.com

What type of PR is this?

/kind bug
/kind feature

What this PR does / why we need it:

In pull mode, if the network of the member cluster and karmada are not two-way communication, scheduler-estimator is not available.,In this scenario, we need to add a flag to disable scheduler-estimator in pull mode.

For example, my cluster2 is in pull mode, and scheduler-estimator is not installed, the scheduler has been looking for karmada-scheduler-estimator-cluster2.

root@dev-karmada-cluster02:~# kubectl  --kubeconfig /etc/karmada/karmada-apiserver.config get cluster
NAME       VERSION   MODE   READY   AGE
cluster1   v1.23.8   Push   True    72m
cluster2   v1.22.0   Pull   True    69m
I0625 09:19:38.584300       1 cache.go:87] Start dialing estimator server(karmada-scheduler-estimator-cluster2:10352) of cluster(cluster2).
W0625 09:19:38.587111       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
W0625 09:19:39.590520       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
W0625 09:19:41.431697       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
E0625 09:19:43.584807       1 cache.go:90] Failed to dial cluster(cluster2): dial karmada-scheduler-estimator-cluster2:10352 error: context deadline exceeded.
I0625 09:19:43.745008       1 cache.go:87] Start dialing estimator server(karmada-scheduler-estimator-cluster2:10352) of cluster(cluster2).
W0625 09:19:43.748414       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
W0625 09:19:44.752193       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
W0625 09:19:46.094775       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
W0625 09:19:48.258426       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
E0625 09:19:48.745915       1 cache.go:90] Failed to dial cluster(cluster2): dial karmada-scheduler-estimator-cluster2:10352 error: context deadline exceeded.
I0625 09:19:49.066524       1 cache.go:87] Start dialing estimator server(karmada-scheduler-estimator-cluster2:10352) of cluster(cluster2).
W0625 09:19:49.069144       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
W0625 09:19:50.072502       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
W0625 09:19:51.583913       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
E0625 09:19:54.066914       1 cache.go:90] Failed to dial cluster(cluster2): dial karmada-scheduler-estimator-cluster2:10352 error: context deadline exceeded.
I0625 09:19:54.707938       1 cache.go:87] Start dialing estimator server(karmada-scheduler-estimator-cluster2:10352) of cluster(cluster2).
W0625 09:19:54.710377       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
W0625 09:19:55.713421       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
W0625 09:19:57.567846       1 clientconn.go:1322] [core] grpc: addrConn.createTransport failed to connect to {karmada-scheduler-estimator-cluster2:10352 karmada-scheduler-estimator-cluster2:10352 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup karmada-scheduler-estimator-cluster2 on 10.96.0.10:53: no such host"
E0625 09:19:59.708969       1 cache.go:90] Failed to dial cluster(cluster2): dial karmada-scheduler-estimator-cluster2:10352 error: context deadline exceeded.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-scheduler`: Introduced  `--disable-scheduler-estimator-in-pull-mode` flag to disable scheduler-estimator for pull mode clusters.

@karmada-bot karmada-bot added kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jun 25, 2022
@karmada-bot karmada-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 25, 2022
@prodanlabs prodanlabs changed the title fix: remove Scheduler estimator Event in Pull mode fix: remove scheduler estimator event in pull mode Jun 25, 2022
@Garrybest
Copy link
Member

Garrybest commented Jun 25, 2022

Actually scheduler-estimator is also used for pulled clusters. If you enable scheduler estimator option in karmada-scheduler, it will look for all estimators. Refer to

- If the cluster has been joined already, you could use `hack/deploy-scheduler-estimator.sh` to only deploy karmada-scheduler-estimator.

@prodanlabs
Copy link
Member Author

prodanlabs commented Jun 25, 2022

Actually scheduler-estimator is also used for pulled clusters. If you enable scheduler estimator option in karmada-scheduler, it will look for all estimators. Refer to

- If the cluster has been joined already, you could use `hack/deploy-scheduler-estimator.sh` to only deploy karmada-scheduler-estimator.

thank you for your reply.

I think the pull mode is used to solve the one-way network problem between the member cluster and karmada, so deploy scheduler-estimator in the host cluster, scheduler-estimator cannot connect to the kube-apiserver that connects to the member cluster.

In the case where the network can communicate in both directions, karmada should recommend users to use the push mode, because from the details, the push mode is more perfect.

Our private cloud's kube-apiserver is not open to the public network, including proxies, for security reasons,so scheduler-estimator is useless for our pull mode. FYR

@Garrybest
Copy link
Member

Hi @prodanlabs, I got you. But some users choose the pull mode not for network reasons. AFAIK, just for HA or some other performance reasons. So I think we should try to use a tunnel or proxy to fix the network barrier, it may be not appropriate to just disable scheduler-estimator in pull mode. How do you think?

@prodanlabs
Copy link
Member Author

@Garrybest Thanks for the quick response

Yes, I admit that some users still use pull mode even when the network can communicate in both directions.

So I think we should try to use a tunnel or proxy to fix the network barrier

In fact, karmada also provides anp solutions to solve network obstacles. In pull mode, karmada-kubectl logs, exec and other subcommands can also get logs or enter Pod.

But as far as we are concerned, because of security management regulations, the core business systems on the private cloud are not allowed to be accessed through the public network, nor can network agents or network tunnels be deployed.

My idea is that push mode is recommended when there is no network failure. Pull mode does not need to deploy scheduler estimator, or compromise, in pull mode, scheduler adds a flag to close scheduler estimator event.

@Garrybest
Copy link
Member

It makes sense. We could add a flag to avoid establishing the connection with clusters in pull mode. Would you like to revise again?

@prodanlabs
Copy link
Member Author

It makes sense. We could add a flag to avoid establishing the connection with clusters in pull mode. Would you like to revise again?

OK

@prodanlabs
Copy link
Member Author

Hi @Garrybest , do you have any good suggestions for flag names?

@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 27, 2022
@prodanlabs prodanlabs force-pushed the fix-scheduler branch 2 times, most recently from e75e2fc to ee3938e Compare June 27, 2022 05:25
@prodanlabs prodanlabs changed the title fix: remove scheduler estimator event in pull mode karmada-scheduler add enable-pull-scheduler-estimator-event flag Jun 27, 2022
@prodanlabs
Copy link
Member Author

@Garrybest @RainbowMango can you please review?

@Garrybest
Copy link
Member

Hi @prodanlabs, this flag is not just used for disable event. If we don't add pull-mode clusters into schedulerEstimatorWorker, the scheduler will never establish a connection with them. So the flag is to disable scheduler estimator in pull-mode clusters.

The option could be like DisableSchedulerEstimatorInPullMode.

@prodanlabs
Copy link
Member Author

Hi @prodanlabs, this flag is not just used for disable event. If we don't add pull-mode clusters into schedulerEstimatorWorker, the scheduler will never establish a connection with them. So the flag is to disable scheduler estimator in pull-mode clusters.

The option could be like DisableSchedulerEstimatorInPullMode.

Agreed, I'll change it later. thanks

@karmada-bot karmada-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 27, 2022
@prodanlabs prodanlabs changed the title karmada-scheduler add enable-pull-scheduler-estimator-event flag karmada-scheduler add disable-scheduler-estimator-in-pull-mode flag Jun 27, 2022
@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 27, 2022
@prodanlabs
Copy link
Member Author

Hi @prodanlabs, this flag is not just used for disable event. If we don't add pull-mode clusters into schedulerEstimatorWorker, the scheduler will never establish a connection with them. So the flag is to disable scheduler estimator in pull-mode clusters.

The option could be like DisableSchedulerEstimatorInPullMode.

done.

/cc @Garrybest @RainbowMango

@prodanlabs prodanlabs force-pushed the fix-scheduler branch 2 times, most recently from 8c9ace1 to 09358b7 Compare June 28, 2022 08:59
@Garrybest
Copy link
Member

/lgtm

Thanks!

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 29, 2022
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
/approve

/hold
For @Tingtal to confirm the grammar.
and for @prodanlabs update the PR descriptions. (Since this PR has changed a lot after discussion, so please help update the PR descriptions as well. )

@@ -80,6 +82,7 @@ func (o *Options) AddFlags(fs *pflag.FlagSet) {
fs.Float32Var(&o.KubeAPIQPS, "kube-api-qps", 40.0, "QPS to use while talking with karmada-apiserver. Doesn't cover events and node heartbeat apis which rate limiting is controlled by a different set of flags.")
fs.IntVar(&o.KubeAPIBurst, "kube-api-burst", 60, "Burst to use while talking with karmada-apiserver. Doesn't cover events and node heartbeat apis which rate limiting is controlled by a different set of flags.")
fs.BoolVar(&o.EnableSchedulerEstimator, "enable-scheduler-estimator", false, "Enable calling cluster scheduler estimator for adjusting replicas.")
fs.BoolVar(&o.DisableSchedulerEstimatorInPullMode, "disable-scheduler-estimator-in-pull-mode", false, "Disable scheduler estimator in pull mode clusters, only takes effect when enable-scheduler-estimator is true.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fs.BoolVar(&o.DisableSchedulerEstimatorInPullMode, "disable-scheduler-estimator-in-pull-mode", false, "Disable scheduler estimator in pull mode clusters, only takes effect when enable-scheduler-estimator is true.")
fs.BoolVar(&o.DisableSchedulerEstimatorInPullMode, "disable-scheduler-estimator-in-pull-mode", false, "Disable scheduler estimator for pull mode clusters, only takes effect when enable-scheduler-estimator is true.")

I'm not sure if we should use for here. @Tingtal can you help to confirm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fs.BoolVar(&o.DisableSchedulerEstimatorInPullMode, "disable-scheduler-estimator-in-pull-mode", false, "Disable scheduler estimator in pull mode clusters, only takes effect when enable-scheduler-estimator is true.")
fs.BoolVar(&o.DisableSchedulerEstimatorInPullMode, "disable-scheduler-estimator-in-pull-mode", false, "Disable the scheduler estimator for clusters in pull mode, which takes effect only when enable-scheduler-estimator is true.")

@karmada-bot karmada-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 29, 2022
Copy link
Contributor

@Tingtal Tingtal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small touches. Pls check.

@@ -80,6 +82,7 @@ func (o *Options) AddFlags(fs *pflag.FlagSet) {
fs.Float32Var(&o.KubeAPIQPS, "kube-api-qps", 40.0, "QPS to use while talking with karmada-apiserver. Doesn't cover events and node heartbeat apis which rate limiting is controlled by a different set of flags.")
fs.IntVar(&o.KubeAPIBurst, "kube-api-burst", 60, "Burst to use while talking with karmada-apiserver. Doesn't cover events and node heartbeat apis which rate limiting is controlled by a different set of flags.")
fs.BoolVar(&o.EnableSchedulerEstimator, "enable-scheduler-estimator", false, "Enable calling cluster scheduler estimator for adjusting replicas.")
fs.BoolVar(&o.DisableSchedulerEstimatorInPullMode, "disable-scheduler-estimator-in-pull-mode", false, "Disable scheduler estimator in pull mode clusters, only takes effect when enable-scheduler-estimator is true.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fs.BoolVar(&o.DisableSchedulerEstimatorInPullMode, "disable-scheduler-estimator-in-pull-mode", false, "Disable scheduler estimator in pull mode clusters, only takes effect when enable-scheduler-estimator is true.")
fs.BoolVar(&o.DisableSchedulerEstimatorInPullMode, "disable-scheduler-estimator-in-pull-mode", false, "Disable the scheduler estimator for clusters in pull mode, which takes effect only when enable-scheduler-estimator is true.")


enableEmptyWorkloadPropagation bool
}

type schedulerOptions struct {
// enableSchedulerEstimator represents whether the accurate scheduler estimator should be enabled.
enableSchedulerEstimator bool
// disableSchedulerEstimatorInPullMode represents whether to disable the pull mode scheduler estimator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// disableSchedulerEstimatorInPullMode represents whether to disable the pull mode scheduler estimator.
// disableSchedulerEstimatorInPullMode represents whether to disable the scheduler estimator in pull mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 30, 2022
@prodanlabs
Copy link
Member Author

@XiShanYongYe-Chang
Copy link
Member

Hi @prodanlabs, have you rebased the newest code in the master branch?

@prodanlabs
Copy link
Member Author

Hi @prodanlabs, have you rebased the newest code in the master branch?

my master branch is not up to date.

@XiShanYongYe-Chang
Copy link
Member

There is a bug #2072 which we have fixed by #2074, maybe you can rebase the master branch.

@prodanlabs
Copy link
Member Author

How to rerun CI .

@XiShanYongYe-Chang
Copy link
Member

How to rerun CI .

Rebase and push again?

@prodanlabs
Copy link
Member Author

Rebase and push again?

Last time I seemed to see @RainbowMango used the command to rerun CI, I forgot where I saw it

Signed-off-by: prodan <pengshihaoren@gmail.com>
@RainbowMango
Copy link
Member

Last time I seemed to see @RainbowMango used the command to rerun CI, I forgot where I saw it

No command can be used to re-trigger the test. sad...

@RainbowMango
Copy link
Member

/lgtm
/approve
/hold cancel

@karmada-bot karmada-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 30, 2022
@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 30, 2022
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot merged commit dc8177c into karmada-io:master Jun 30, 2022
@prodanlabs prodanlabs deleted the fix-scheduler branch June 30, 2022 08:54
@RainbowMango RainbowMango added this to the v1.3 milestone Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants