
Feat: support sharding in controller #5360

Merged
merged 2 commits into kubevela:master from feat/sharding on Jan 31, 2023

Conversation

Somefive
Collaborator

@Somefive Somefive commented Jan 18, 2023

Signed-off-by: Somefive <yd219913@alibaba-inc.com>

Application Controller Sharding

Background

As more adopters choose KubeVela to manage thousands of Applications in production, performance becomes an increasingly critical issue.

A typical Kubernetes operator runs a single controller to manage all of its custom resources, and its reconciliation is designed to be relatively lightweight. In KubeVela, supporting customized delivery processes and managing the lifecycles of tens of resources per application makes reconciliation considerably heavier than in typical controllers.

Although multiple replicas of the KubeVela controller provide high availability, only one replica can work at a time in order to avoid concurrent reconciliation of the same application (which could lead to conflicts); this is enforced through a shared leader-election lock.

So users usually give the KubeVela controller more resources (more CPU, memory, ...) and configure more worker threads and a larger rate limiter to support more applications in the system. This can lead to problems in several cases:

  1. There are limits to how far a single KubeVela controller can grow. If the control-plane Kubernetes cluster consists of many small machines, a single KubeVela controller can only run on one of them instead of spreading the load out.
  2. A failure of the single KubeVela controller blocks the delivery process of all applications. Failures have various causes, and some frequently seen ones (such as OOM or crashes triggered by unexpected application input) are not recoverable by restarting the controller.
  3. In multi-tenancy scenarios, fairness cannot be guaranteed when some tenants own a huge number of applications. They can slow the controller down, and because the single KubeVela controller is shared, tenants with only a few applications are affected as well.

Therefore, this KEP proposes splitting a single large KubeVela controller into multiple smaller application controllers.

Proposal

[architecture diagram]

When the KubeVela core controller is started with the --enable-sharding arg, it runs in sharding mode.

  • If the --shard-id arg is set to master, this instance runs in master mode.
  • If the --shard-id arg is set to any other value, such as shard-0, this instance runs in slave mode.

master mode

Master mode enables all the controllers, such as ComponentDefinitionController, TraitDefinitionController, ApplicationController, webhooks, etc.

The application controller only handles applications with the label controller.core.oam.dev/scheduled-shard-id: master, and only the applications, applicationrevisions and resourcetrackers that carry this label are watched and cached.

By default, it watches pods in the namespace it runs in. Pods with the label app.kubernetes.io/name: vela-core that also carry the controller.core.oam.dev/shard-id label key are selected and their health status is recorded. The selected ready pods are registered as schedulable shards. The mutating webhook then automatically assigns a shard-id to unassigned applications when requests come in; the label-based filtering is sketched below.
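A minimal controller-runtime predicate sketch of that filtering, as an illustration rather than the actual KubeVela implementation (which also filters at the cache layer so unrelated objects never enter memory):

```go
package sharding

import (
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// scheduledShardIDLabel marks which shard an object is scheduled to.
const scheduledShardIDLabel = "controller.core.oam.dev/scheduled-shard-id"

// shardPredicate only lets events through for objects carrying the shard id
// owned by this controller instance, e.g. "master" or "shard-0".
// (Illustrative sketch; names are not taken from the PR.)
func shardPredicate(shardID string) predicate.Predicate {
	return predicate.NewPredicateFuncs(func(obj client.Object) bool {
		return obj.GetLabels()[scheduledShardIDLabel] == shardID
	})
}
```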

slave mode

A slave mode controller only starts the ApplicationController and does not enable others such as the webhooks or the ComponentDefinitionController. It is dedicated to applications that carry the matching label controller.core.oam.dev/scheduled-shard-id=<shard-id>; a sketch of this conditional setup follows.
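A rough sketch of the conditional controller registration, assuming placeholder setup helpers rather than the real KubeVela functions:

```go
package core

import ctrl "sigs.k8s.io/controller-runtime"

// Placeholder registration helpers; the real functions live in the KubeVela
// controller packages and are only stubbed here for illustration.
func setupDefinitionControllersAndWebhooks(mgr ctrl.Manager) error      { return nil }
func setupApplicationController(mgr ctrl.Manager, shardID string) error { return nil }

// setupControllers wires up controllers according to the shard identity.
func setupControllers(mgr ctrl.Manager, shardID string) error {
	if shardID == "master" {
		// Master shard: definition controllers, webhooks and the shard
		// scheduler are only started here.
		if err := setupDefinitionControllersAndWebhooks(mgr); err != nil {
			return err
		}
	}
	// Every shard, master or slave, runs the application controller,
	// restricted to applications labeled with its own shard id.
	return setupApplicationController(mgr, shardID)
}
```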

Example

First, install KubeVela with sharding enabled. This deploys the KubeVela core controller in master mode.

helm install kubevela kubevela/vela-core -n vela-system --set sharding.enabled=true

Second, deploy the slave mode application controllers.

There are several ways to do this:

  1. Use the addon for installation: vela addon enable vela-core-shard-manager nShards=3. Supported by [Addon] vela core shard manager catalog#606.
  2. Use kubectl to copy the master deployment and modify it: kubectl get deploy kubevela-vela-core -oyaml -n vela-system | sed 's/schedulable-shards=/shard-id=shard-0/g' | sed 's/instance: kubevela/instance: kubevela-shard/g' | sed 's/shard-id: master/shard-id: shard-0/g' | sed 's/name: kubevela/name: kubevela-shard/g' | kubectl apply -f - This creates a copy of the master vela-core and runs a slave replica with shard-id shard-0.
  3. Create a new deployment manually and set its labels to match the requirements mentioned above.

If you do not want dynamic discovery of available application controllers, you can specify which shards are schedulable by adding the arg --schedulable-shards=shard-0,shard-1 to the master mode vela-core.

Future Work

The webhook implemented in this PR only schedules applications when:

  1. The application is going to be mutated (created or updated).
  2. The application has not been scheduled yet (it has no controller.core.oam.dev/scheduled-shard-id label).
  3. There is an available controller to assign (the master mode controller is not automatically schedulable, in order to offload the burden of handling applications onto the slave mode controllers).

So it cannot handle rescheduling and cannot automatically schedule pre-existing applications. A sketch of the scheduling decision follows.
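This decision can be summarized by the following sketch; the function and parameter names are illustrative, not the real handler code:

```go
package webhook

// scheduledShardIDLabel marks which shard owns an application.
const scheduledShardIDLabel = "controller.core.oam.dev/scheduled-shard-id"

// schedule assigns a shard to an application's labels on create/update if it
// has not been scheduled yet. pick is expected to return ("", false) when no
// schedulable shard is currently available.
func schedule(appLabels map[string]string, pick func() (string, bool)) map[string]string {
	if appLabels == nil {
		appLabels = map[string]string{}
	}
	if _, scheduled := appLabels[scheduledShardIDLabel]; scheduled {
		// Already scheduled: leave the existing assignment untouched.
		return appLabels
	}
	if shard, ok := pick(); ok {
		appLabels[scheduledShardIDLabel] = shard
	}
	return appLabels
}
```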

For the next step, we can support:

  1. vela up <app-name> --shard-id <shard-id>: use the vela CLI to help users manually reschedule applications. NOTE: rescheduling an application requires resetting not only the label of that application but also the labels of its related applicationrevisions and resourcetrackers.
  2. Support a background scheduler or a scheduling script that schedules all unscheduled applications.
  3. Support DisableAutoSchedule for the case where users want to disable the automatic scheduling done by the mutating webhook.
  4. Observability tools (Prometheus, Loki, Grafana) should be able to collect data from all vela-core controllers in sharding mode.

Extend Usage

  1. The sharding package could potentially be moved to kubevela/pkg if it later proves helpful for other controllers such as kubevela/workflow.
  2. It is possible to let different shards use different vela-core versions, so that applications at multiple versions are handled by different controllers.
  3. It is possible to let different tenants use different vela-core controllers to improve fairness and reduce the blast radius when one controller fails.
  4. The original multi-replica high-availability setup of the KubeVela core controller is deferred to each shard's deployment, which means each shard can have multiple replicas (only one replica is active at a time within a shard, but multiple shards run in parallel).

Tech Details

  1. Since the caches of Application/ApplicationRevision/ResourceTracker are divided across shards, the memory consumption of each controller is expected to be divided as well. Other resources such as ConfigMaps (used by the workflow context) are not divided.
  2. The validating webhook needs to read the ApplicationRevision when publish-version is set on an application, but since ApplicationRevision is sharded, the cache of the master mode vela-core does not hold it. So a native Kubernetes API request is used to read the ApplicationRevision (see the sketch below). This may cause a performance drop for the validating webhook, and it is therefore disabled by default when sharding is enabled.
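The direct read can be pictured as going through the manager's API reader instead of the shard-filtered cache; a sketch under that assumption, not the exact code in this PR:

```go
package webhook

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"

	"github.com/oam-dev/kubevela/apis/core.oam.dev/v1beta1"
)

// getApplicationRevision reads an ApplicationRevision straight from the API
// server, bypassing the local cache that (under sharding) may not hold it.
func getApplicationRevision(ctx context.Context, mgr ctrl.Manager, ns, name string) (*v1beta1.ApplicationRevision, error) {
	rev := &v1beta1.ApplicationRevision{}
	// GetAPIReader returns a client.Reader backed directly by the API server.
	if err := mgr.GetAPIReader().Get(ctx, types.NamespacedName{Namespace: ns, Name: name}, rev); err != nil {
		return nil, err
	}
	return rev, nil
}
```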

FAQ

Q: What will happen when the master one is down?
A: If the webhook is enabled, it goes down together with the master. The mutating and validating webhooks will then fail, so no new application creation or change will be accepted. Applications already scheduled to master will no longer be processed, while applications scheduled to other shards are not affected. If the webhook is not enabled, only applications scheduled to the master shard are affected; users can still create or change applications in this case.

Q: What will happen when a slave mode controller is down?
A: For applications that are not scheduled to that shard, nothing happens. For applications scheduled to the broken shard, succeeded applications will not run state-keep or garbage collection, but the delivered resources are not touched. Applications that are still running workflows will not be handled, but they recover instantly once the controller is restarted.

Q: What will happen if a user deletes the sharding label while the application is already in the running state?
A: If the webhook is enabled, this behaviour is prevented: the sharding label is inherited from the original one. If the webhook is not enabled, the application will no longer be handled (no state-keep, no gc and no update).

I have:

  • Read and followed KubeVela's contribution process.
  • Related Docs updated properly. In a new feature or configuration option, an update to the documentation is necessary.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

@codecov

codecov bot commented Jan 18, 2023

Codecov Report

Base: 61.20% // Head: 49.29% // Decreases project coverage by -11.92% ⚠️

Coverage data is based on head (a508014) compared to base (f733d74).
Patch coverage: 75.94% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5360       +/-   ##
===========================================
- Coverage   61.20%   49.29%   -11.92%     
===========================================
  Files         308      313        +5     
  Lines       46778    47018      +240     
===========================================
- Hits        28631    23177     -5454     
- Misses      15189    21402     +6213     
+ Partials     2958     2439      -519     
| Flag | Coverage Δ |
| --- | --- |
| apiserver-e2etests | ? |
| apiserver-unittests | 36.23% <ø> (+0.02%) ⬆️ |
| core-unittests | 55.04% <37.71%> (-0.24%) ⬇️ |
| e2e-multicluster-test | 19.17% <59.45%> (+0.18%) ⬆️ |
| e2e-rollout-tests | ? |
| e2etests | 25.78% <16.73%> (-0.80%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| pkg/features/controller_features.go | 100.00% <ø> (ø) |
| pkg/controller/sharding/vars.go | 37.50% <37.50%> (ø) |
| cmd/core/app/server.go | 58.71% <43.47%> (-0.66%) ⬇️ |
| pkg/utils/app/reschedule.go | 64.28% <64.28%> (ø) |
| ...ok/core.oam.dev/v1alpha2/application/validation.go | 85.71% <70.00%> (-1.79%) ⬇️ |
| pkg/controller/sharding/scheduler.go | 80.00% <80.00%> (ø) |
| ...e.oam.dev/v1alpha2/application/mutating_handler.go | 85.13% <90.47%> (+1.80%) ⬆️ |
| cmd/core/app/options/options.go | 92.72% <100.00%> (-0.13%) ⬇️ |
| ...ller/core.oam.dev/v1alpha2/application/revision.go | 70.87% <100.00%> (-0.87%) ⬇️ |
| pkg/controller/sharding/cache.go | 100.00% <100.00%> (ø) |
| ... and 132 more | |


☔ View full report at Codecov.

@Somefive Somefive force-pushed the feat/sharding branch 19 times, most recently from 6563179 to c6ea212 Compare January 20, 2023 08:00
@Somefive Somefive marked this pull request as ready for review January 20, 2023 08:47
return "", false
}
// nolint
return available[rand.Intn(len(available))], true
Collaborator

Advice: use Round Robin as the default schedule rule.

Collaborator Author

Good idea.

Collaborator

I don't have a preference about the scheduling algorithm, but do we expose metrics of how many applications each controller is handling? This is important.

Collaborator Author

This is exposed by the standard metrics endpoint in Prometheus format. We need to add support for Grafana to collect them.

Collaborator Author

Advice: use Round Robin as the default schedule rule.

Added to the dynamicDiscoveryScheduler. The staticDiscoveryScheduler will still use random selection.
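For illustration, a round-robin pick over the available shards could look like the following minimal sketch (hypothetical type and method names, not the actual dynamicDiscoveryScheduler code):

```go
package sharding

import "sync"

// roundRobin cycles through the currently schedulable shards so that new
// applications are spread evenly instead of being assigned at random.
type roundRobin struct {
	mu   sync.Mutex
	next int
}

// pick returns the next shard id, or ("", false) when none is available.
func (r *roundRobin) pick(available []string) (string, bool) {
	if len(available) == 0 {
		return "", false
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	id := available[r.next%len(available)]
	r.next++
	return id, true
}
```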

@wonderflow
Collaborator

We need to discuss the failover case and best configuration:

  • When the master mode controller is down, everything is down, so we still need at least 3 replicas for the master mode controller.
  • When a slave mode controller (say slave-0) is down, what will happen in these two cases?
    1. Applications being scheduled that are marked for slave-0 to handle.
    2. Applications already handled by slave-0 that hit errors and need to be reconciled again.
    3. How can we discover the applications in the above two scenarios?

Another question: what will happen if a user deletes the sharding label while the application is already in the running state?

@@ -153,6 +153,8 @@ helm install --create-namespace -n vela-system kubevela kubevela/vela-core --wai
| `authentication.withUser` | Application authentication will impersonate as the request User | `true` |
| `authentication.defaultUser` | Application authentication will impersonate as the User if no user provided in Application | `kubevela:vela-core` |
| `authentication.groupPattern` | Application authentication will impersonate as the request Group that matches the pattern | `kubevela:*` |
| `sharding.enabled` | Enable sharding for core controller | `false` |
Collaborator

Should the default value be true?

Collaborator Author

Currently no. If the default value were true, then applications would not be handled when no shards have been added.

Collaborator Author

Behaviour changed. The master shard accepts applications by default now.

@Somefive Somefive force-pushed the feat/sharding branch 2 times, most recently from b0243a9 to 55f796b Compare January 29, 2023 03:19
| `authentication.withUser` | Application authentication will impersonate as the request User | `true` |
| `authentication.defaultUser` | Application authentication will impersonate as the User if no user provided in Application | `kubevela:vela-core` |
| `authentication.groupPattern` | Application authentication will impersonate as the request Group that matches the pattern | `kubevela:*` |
| `sharding.enabled` | When sharding enabled, the controller will run as master mode. Refer to https://github.com/kubevela/kubevela/blob/master/design/vela-core/sharding.md for details. | `false` |
Collaborator

Suppose a user did not enable sharding when installing vela. If the user later needs to manage more apps, can they enable sharding afterwards?

Collaborator Author

Yes, users can run helm upgrade --set sharding.enabled=true to enable it. But they need to use the vela up --shard-id command to manually schedule the original applications to the master shard.


## @param sharding.enabled When sharding enabled, the controller will run as master mode. Refer to https://github.com/kubevela/kubevela/blob/master/design/vela-core/sharding.md for details.
## @param sharding.schedulableShards The shards available for scheduling. If empty, dynamic discovery will be used.
sharding:
Collaborator

Can the vela install command install KubeVela with sharding enabled?

Collaborator Author

Yes.

@@ -272,6 +299,7 @@ func NewUpCommand(f velacmd.Factory, order string, c utilcommon.Args, ioStream u
cmd.Flags().StringVarP(&o.File, "file", "f", o.File, "The file path for appfile or application. It could be a remote url.")
cmd.Flags().StringVarP(&o.PublishVersion, "publish-version", "v", o.PublishVersion, "The publish version for deploying application.")
cmd.Flags().StringVarP(&o.RevisionName, "revision", "r", o.RevisionName, "The revision to use for deploying the application, if empty, the current application configuration will be used.")
cmd.Flags().StringVarP(&o.ShardID, "shard-id", "s", o.ShardID, "The shard id assigned to the application. If empty, it will not be used.")
Collaborator

If vela is installed in sharding mode, which controller will manage Applications that do not have the controller.core.oam.dev/scheduled-shard-id label?

Collaborator Author

No controller will handle those applications. They need to be scheduled through vela up --shard-id (provided in this PR) or a customized scheduler.

Member

@charlie0129 charlie0129 left a comment

typo here


Signed-off-by: Somefive <yd219913@alibaba-inc.com>
run: make end-e2e-core
run: |
make end-e2e-core
CORE_NAME=kubevela-shard sh ./hack/e2e/end_e2e_core.sh
Member

Since end-e2e-core is a Makefile target, it'd be better to wrap CORE_NAME=kubevela-shard sh ./hack/e2e/end_e2e_core.sh into another target to keep the style consistent.

Collaborator Author

Added make end-e2e-core-shards.

Signed-off-by: Somefive <yd219913@alibaba-inc.com>
@wonderflow wonderflow merged commit 9efbb72 into kubevela:master Jan 31, 2023
zhaohuiweixiao pushed a commit to zhaohuiweixiao/kubevela that referenced this pull request Mar 7, 2023
* Feat: bootstrap sharding

Signed-off-by: Somefive <yd219913@alibaba-inc.com>

* Chore: refactor end-e2e-core-shards script

Signed-off-by: Somefive <yd219913@alibaba-inc.com>

---------

Signed-off-by: Somefive <yd219913@alibaba-inc.com>
@Somefive Somefive deleted the feat/sharding branch June 20, 2023 13:33