refactor cluster controller #3380
Conversation
/cc @zryfish
Force-pushed from da42065 to 81aa760 (Compare)
Codecov Report

@@            Coverage Diff             @@
##           master    #3380      +/-   ##
==========================================
+ Coverage   11.87%   11.89%   +0.02%
==========================================
  Files         226      226
  Lines       42658    42605      -53
==========================================
+ Hits         5065     5068       +3
+ Misses      36809    36757      -52
+ Partials      784      780       -4
==========================================

Flags with carried forward coverage won't be shown.
Continue to review the full report at Codecov.
@@ -348,6 +345,80 @@ func (c *clusterController) reconcileHostCluster() error {
	return err
}

func (c *clusterController) judgeIfClusterIsReady() error {
How about changing the name to probeClusters?
Agreed.
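For context, here is a minimal sketch of what such a probe (whatever its final name) could look like, assuming the controller keeps each member cluster's kubeconfig and builds a short-lived client from it. The package, function and parameter names are illustrative, not the actual implementation:

package cluster

import (
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// probeCluster treats a successful round trip to the member cluster's
// kube-apiserver as "ready". The kubeconfig bytes come from the Cluster object.
func probeCluster(kubeconfig []byte, timeout time.Duration) error {
	config, err := clientcmd.RESTConfigFromKubeConfig(kubeconfig)
	if err != nil {
		return err
	}
	// Bound the probe so one unreachable cluster cannot stall the whole pass.
	config.Timeout = timeout

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		return err
	}
	// ServerVersion is a cheap request that exercises TLS, auth and connectivity.
	_, err = clientset.Discovery().ServerVersion()
	return err
}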
	klog.Error(err)
	continue
}
config.Timeout = 10 * time.Second
What if there are lots of clusters, say 50, and each cluster takes 9s to finish probing? That would be 450s, longer than the resyncPeriod.
10 seconds seems too long. But the case you are describing is very rare; it's unlikely that every cluster connection takes 9s. In most cases a connection takes several milliseconds. How about changing the timeout to 3s?
If there are network issues on the node where the ks-controller-manager pod resides, it's possible.
What I did before was put the cluster back into the work queue every resyncPeriod and check its readiness in the main sync loop.
I didn't see "put cluster back to working queue every resyncPeriod", but I did see "check its readiness on main sync loop". I think we don't need to put the cluster back into the work queue every resyncPeriod manually; the cluster informer does that automatically. The reason I check the cluster readiness separately is to check the readiness of all clusters, not only the proxy connection. What you did before only checks whether the proxied cluster has an agent-available status, then updates the cluster status to ready or not. By using the kubeconfig, I think it's more reliable (e.g. it also catches a direct connection whose kube-apiserver is unreachable).
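To illustrate the kind of readiness check being described, here is a hypothetical sketch of how a probe result could be turned into a Ready-style condition, regardless of whether the cluster connects directly or through the agent proxy. The condition type and reason strings are assumptions; the real Cluster API may define its own condition type:

package cluster

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// clusterReadyCondition maps the outcome of a kubeconfig-based probe onto a
// Ready condition: a reachable kube-apiserver means ready, anything else
// records the error for operators to inspect.
func clusterReadyCondition(probeErr error) metav1.Condition {
	cond := metav1.Condition{
		Type:               "Ready",
		Status:             metav1.ConditionTrue,
		Reason:             "KubeAPIServerReachable",
		LastTransitionTime: metav1.Now(),
	}
	if probeErr != nil {
		cond.Status = metav1.ConditionFalse
		cond.Reason = "KubeAPIServerUnreachable"
		cond.Message = probeErr.Error()
	}
	return cond
}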
If it is put in the main sync loop, the check will run whenever a cluster is created/updated/deleted, which may be too frequent. On the other hand, the check may take too long and affect the sync loop. What is your suggestion?
That's true, but we need to update cluster.status.configz every resyncPeriod too. So I suggest making config.timeout shorter and probing in the main loop.
Currently we update cluster.status.configz every resyncPeriod at the end of the main loop. That update didn't change.
OK, better to make config.timeout shorter.
config.Timeout has been set to 3s by default. We can merge this PR now.
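For reference, a sketch of the agreed default, assuming a helper that builds the REST config from the stored kubeconfig; the helper name and the choice to override only an unset timeout are assumptions:

package cluster

import (
	"time"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// defaultProbeTimeout caps how long a single cluster probe may take so that a
// full pass over many clusters stays well under the resync period.
const defaultProbeTimeout = 3 * time.Second

// restConfigForProbe builds a client config from kubeconfig bytes and applies
// the 3s default only when the kubeconfig does not set its own timeout.
func restConfigForProbe(kubeconfig []byte) (*rest.Config, error) {
	config, err := clientcmd.RESTConfigFromKubeConfig(kubeconfig)
	if err != nil {
		return nil, err
	}
	if config.Timeout == 0 {
		config.Timeout = defaultProbeTimeout
	}
	return config, nil
}

With this bound, the worst case from the earlier example drops from 50 x 9s = 450s to 50 x 3s = 150s for a sequential pass.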
@@ -79,5 +79,5 @@ func (o *Options) AddFlags(fs *pflag.FlagSet, s *Options) {
		"This field is used when generating deployment yaml for agent.")

	fs.DurationVar(&o.ClusterControllerResyncSecond, "cluster-controller-resync-second", s.ClusterControllerResyncSecond,
-		"Cluster controller resync second to sync cluster resource.")
+		"Cluster controller resync second to sync cluster resource. e.g. 30s 60s 120s...")
Better to start with 2m, 5m, 10m; a small resync period increases load.
Agree with that. I will update the comment.
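As a usage sketch of the duration-valued flag, here is a hypothetical way the resync period could drive a periodic probe loop outside the event-driven sync handler; the wiring and names are illustrative, not how the controller is actually structured:

package cluster

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/klog"
)

// runProbeLoop re-runs the cluster probe every resyncPeriod (e.g. 2m, 5m or 10m,
// as parsed from the --cluster-controller-resync-second duration flag) until
// stopCh is closed.
func runProbeLoop(probe func() error, resyncPeriod time.Duration, stopCh <-chan struct{}) {
	wait.Until(func() {
		if err := probe(); err != nil {
			klog.Error(err)
		}
	}, resyncPeriod, stopCh)
}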
Signed-off-by: yuswift <yuswiftli@yunify.com>
Force-pushed from 81aa760 to 194d054 (Compare)
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yuswift, zryfish

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Signed-off-by: yuswift <yuswiftli@yunify.com>
What type of PR is this?
/kind design
What this PR does / why we need it:
Reduce the complexity between the tower server and the cluster-controller. Remove the
- port allocation
- proxy creation
- token generation
steps, and add a cluster ready detection step.

Which issue(s) this PR fixes:
Fixes #3234