Feat: support sharding in controller #5360

Merged
merged 2 commits into from
Jan 31, 2023
3 changes: 2 additions & 1 deletion .github/workflows/apiserver-test.yml
@@ -180,7 +180,8 @@ jobs:
make e2e-apiserver-test

- name: Stop kubevela, get profile
-run: make end-e2e-core
+run: |
+  make end-e2e-core-shards

- name: Upload coverage report
uses: codecov/codecov-action@d9f34f8cd5cb3b3eb79b3e4b5dae3a16df499a70
3 changes: 2 additions & 1 deletion .github/workflows/e2e-multicluster-test.yml
@@ -114,7 +114,8 @@ jobs:
make e2e-multicluster-test

- name: Stop kubevela, get profile
-run: make end-e2e-core
+run: |
+  make end-e2e-core-shards

- name: Upload coverage report
uses: codecov/codecov-action@d9f34f8cd5cb3b3eb79b3e4b5dae3a16df499a70
46 changes: 25 additions & 21 deletions charts/vela-core/README.md
@@ -100,6 +100,8 @@ helm install --create-namespace -n vela-system kubevela kubevela/vela-core --wai
| `featureGates.gzipApplicationRevision` | compress apprev using gzip (good) before being stored. This is reduces network throughput when dealing with huge apprevs. | `false` |
| `featureGates.zstdApplicationRevision` | compress apprev using zstd (fast and good) before being stored. This is reduces network throughput when dealing with huge apprevs. Note that zstd will be prioritized if you enable other compression options. | `true` |
| `featureGates.preDispatchDryRun` | enable dryrun before dispatching resources. Enable this flag can help prevent unsuccessful dispatch resources entering resourcetracker and improve the user experiences of gc but at the cost of increasing network requests. | `true` |
| `featureGates.validateComponentWhenSharding` | enable component validation in webhook when sharding mode enabled | `false` |
| `featureGates.disableWebhookAutoSchedule` | disable auto schedule for application mutating webhook when sharding enabled | `false` |


### MultiCluster parameters
@@ -132,27 +134,29 @@ helm install --create-namespace -n vela-system kubevela kubevela/vela-core --wai

### Common parameters

| Name | Description | Value |
| ----------------------------- | -------------------------------------------------------------------------------------------------------------------------- | -------------------- |
| `imagePullSecrets` | Image pull secrets | `[]` |
| `nameOverride` | Override name | `""` |
| `fullnameOverride` | Fullname override | `""` |
| `serviceAccount.create` | Specifies whether a service account should be created | `true` |
| `serviceAccount.annotations` | Annotations to add to the service account | `{}` |
| `serviceAccount.name` | The name of the service account to use. If not set and create is true, a name is generated using the fullname template | `nil` |
| `nodeSelector` | Node selector | `{}` |
| `tolerations` | Tolerations | `[]` |
| `affinity` | Affinity | `{}` |
| `rbac.create` | Specifies whether a RBAC role should be created | `true` |
| `logDebug` | Enable debug logs for development purpose | `false` |
| `logFilePath` | If non-empty, write log files in this path | `""` |
| `logFileMaxSize` | Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. | `1024` |
| `kubeClient.qps` | The qps for reconcile clients, default is 100 | `100` |
| `kubeClient.burst` | The burst for reconcile clients, default is 200 | `200` |
| `authentication.enabled` | Enable authentication for application | `false` |
| `authentication.withUser` | Application authentication will impersonate as the request User | `true` |
| `authentication.defaultUser` | Application authentication will impersonate as the User if no user provided in Application | `kubevela:vela-core` |
| `authentication.groupPattern` | Application authentication will impersonate as the request Group that matches the pattern | `kubevela:*` |
| Name | Description | Value |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------- |
| `imagePullSecrets` | Image pull secrets | `[]` |
| `nameOverride` | Override name | `""` |
| `fullnameOverride` | Fullname override | `""` |
| `serviceAccount.create` | Specifies whether a service account should be created | `true` |
| `serviceAccount.annotations` | Annotations to add to the service account | `{}` |
| `serviceAccount.name` | The name of the service account to use. If not set and create is true, a name is generated using the fullname template | `nil` |
| `nodeSelector` | Node selector | `{}` |
| `tolerations` | Tolerations | `[]` |
| `affinity` | Affinity | `{}` |
| `rbac.create` | Specifies whether a RBAC role should be created | `true` |
| `logDebug` | Enable debug logs for development purpose | `false` |
| `logFilePath` | If non-empty, write log files in this path | `""` |
| `logFileMaxSize` | Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. | `1024` |
| `kubeClient.qps` | The qps for reconcile clients, default is 100 | `100` |
| `kubeClient.burst` | The burst for reconcile clients, default is 200 | `200` |
| `authentication.enabled` | Enable authentication for application | `false` |
| `authentication.withUser` | Application authentication will impersonate as the request User | `true` |
| `authentication.defaultUser` | Application authentication will impersonate as the User if no user provided in Application | `kubevela:vela-core` |
| `authentication.groupPattern` | Application authentication will impersonate as the request Group that matches the pattern | `kubevela:*` |
| `sharding.enabled` | When sharding enabled, the controller will run as master mode. Refer to https://github.com/kubevela/kubevela/blob/master/design/vela-core/sharding.md for details. | `false` |
Collaborator

Suppose a user did not enable sharding when installing vela. If they later need it in order to manage more apps, can they enable sharding then?

Collaborator Author

Yes, the user can run `helm upgrade --set sharding.enabled=true` to enable it. They then need to use the `vela up --shard-id` command to manually schedule the existing applications to the master shard.

| `sharding.schedulableShards` | The shards available for scheduling. If empty, dynamic discovery will be used. | `""` |


## Uninstallation
38 changes: 37 additions & 1 deletion charts/vela-core/templates/kubevela-controller.yaml
@@ -51,7 +51,6 @@ rules:
- apiGroups: ["authorization.k8s.io"]
resources: ["subjectaccessreviews"]
verbs: ["*"]

---

apiVersion: rbac.authorization.k8s.io/v1
@@ -85,6 +84,34 @@ subjects:
namespace: {{ .Release.Namespace }}

{{ end }}


{{ if and .Values.sharding.enabled .Values.authentication.enabled }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ include "kubevela.fullname" . }}:shard-scheduler
namespace: {{ .Release.Namespace }}
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ include "kubevela.fullname" . }}:shard-scheduler
namespace: {{ .Release.Namespace }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ include "kubevela.fullname" . }}:shard-scheduler
subjects:
- kind: ServiceAccount
name: {{ include "kubevela.serviceAccountName" . }}
{{ end }}

---
# permissions to do leader election.
apiVersion: rbac.authorization.k8s.io/v1
@@ -183,6 +210,9 @@ spec:
metadata:
labels:
{{- include "kubevela.selectorLabels" . | nindent 8 }}
{{ if .Values.sharding.enabled }}
controller.core.oam.dev/shard-id: master
{{ end }}
annotations:
prometheus.io/path: /metrics
prometheus.io/port: "8080"
@@ -282,6 +312,12 @@ spec:
- "--authentication-default-user={{ .Values.authentication.defaultUser }}"
- "--authentication-group-pattern={{ .Values.authentication.groupPattern }}"
{{ end }}
{{ if .Values.sharding.enabled }}
- "--enable-sharding"
- "--schedulable-shards={{ .Values.sharding.schedulableShards }}"
- "--feature-gates=ValidateComponentWhenSharding={{- .Values.featureGates.validateComponentWhenSharding | toString -}}"
- "--feature-gates=DisableWebhookAutoSchedule={{- .Values.featureGates.disableWebhookAutoSchedule | toString -}}"
{{ end }}
image: {{ .Values.imageRegistry }}{{ .Values.image.repository }}:{{ .Values.image.tag }}
imagePullPolicy: {{ quote .Values.image.pullPolicy }}
resources:
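
The conditional block above only adds the sharding arguments when `sharding.enabled` is true. Helm templates are built on Go's `text/template`, so the expansion can be sketched in plain Go. This is an illustrative sketch, not the chart's rendering code: the `shardingValues` struct, the `renderArgs` helper, and the `shard-0` value are hypothetical, and Go's default bool formatting stands in for the `toString` function that Helm provides.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// shardingValues is a hypothetical stand-in for the chart values that the
// sharding block reads. Fields are exported so the template can access them.
type shardingValues struct {
	Enabled                       bool
	SchedulableShards             string
	ValidateComponentWhenSharding bool
	DisableWebhookAutoSchedule    bool
}

// argsTemplate mirrors the shape of the conditional block in the chart:
// nothing is emitted unless sharding is enabled.
const argsTemplate = `{{ if .Enabled -}}
--enable-sharding
--schedulable-shards={{ .SchedulableShards }}
--feature-gates=ValidateComponentWhenSharding={{ .ValidateComponentWhenSharding }}
--feature-gates=DisableWebhookAutoSchedule={{ .DisableWebhookAutoSchedule }}
{{- end }}`

// renderArgs executes the template and returns one argument per element.
func renderArgs(v shardingValues) ([]string, error) {
	tmpl, err := template.New("args").Parse(argsTemplate)
	if err != nil {
		return nil, err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, v); err != nil {
		return nil, err
	}
	// Each rendered argument sits on its own line and contains no spaces.
	return strings.Fields(sb.String()), nil
}

func main() {
	cases := []shardingValues{
		{Enabled: false},
		{Enabled: true, SchedulableShards: "shard-0"}, // hypothetical shard name
	}
	for _, v := range cases {
		args, err := renderArgs(v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("enabled=%v args=%q\n", v.Enabled, args)
	}
}
```

With sharding disabled the sketch yields no extra arguments. With it enabled, all four arguments are emitted, including the two feature gates at their default of `false`.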
10 changes: 10 additions & 0 deletions charts/vela-core/values.yaml
@@ -114,6 +114,8 @@ optimize:
##@param featureGates.gzipApplicationRevision compress apprev using gzip (good) before being stored. This is reduces network throughput when dealing with huge apprevs.
##@param featureGates.zstdApplicationRevision compress apprev using zstd (fast and good) before being stored. This is reduces network throughput when dealing with huge apprevs. Note that zstd will be prioritized if you enable other compression options.
##@param featureGates.preDispatchDryRun enable dryrun before dispatching resources. Enable this flag can help prevent unsuccessful dispatch resources entering resourcetracker and improve the user experiences of gc but at the cost of increasing network requests.
##@param featureGates.validateComponentWhenSharding enable component validation in webhook when sharding mode enabled
##@param featureGates.disableWebhookAutoSchedule disable auto schedule for application mutating webhook when sharding enabled
##@param
featureGates:
enableLegacyComponentRevision: false
@@ -124,6 +126,8 @@ featureGates:
gzipApplicationRevision: false
zstdApplicationRevision: true
preDispatchDryRun: true
validateComponentWhenSharding: false
disableWebhookAutoSchedule: false

## @section MultiCluster parameters

@@ -268,3 +272,9 @@ authentication:
withUser: true
defaultUser: kubevela:vela-core
groupPattern: kubevela:*

## @param sharding.enabled When sharding enabled, the controller will run as master mode. Refer to https://github.com/kubevela/kubevela/blob/master/design/vela-core/sharding.md for details.
## @param sharding.schedulableShards The shards available for scheduling. If empty, dynamic discovery will be used.
sharding:
Collaborator

Can the `vela install` command install KubeVela with sharding enabled?

Collaborator Author

Yes.

enabled: false
Collaborator

Why not make the default value true?

Collaborator

What should users do if they want to switch to sharding mode after not enabling it at first installation?

Collaborator

I noticed we need a new Role and RoleBinding configuration, which might be hard for users to learn.

Collaborator Author

> Why not make the default value true?

By default, in sharding mode, the master does not work on applications.

Collaborator Author

> What should users do if they want to switch to sharding mode after not enabling it at first installation?

Just run `helm upgrade --set sharding.enabled=true`.

Collaborator Author

> I noticed we need a new Role and RoleBinding configuration, which might be hard for users to learn.

Users do not need to know the Role/RoleBinding configuration.

Collaborator

> By default, in sharding mode, the master does not work on applications.

You should add that to the doc and the KEP; I missed it the first time. This also means users must set up several slaves when sharding is enabled.

Collaborator Author

The master shard now also accepts applications by default, so slaves are not required.

schedulableShards: ""
10 changes: 4 additions & 6 deletions cmd/core/app/options/options.go
@@ -17,19 +17,19 @@ limitations under the License.
package options

import (
- "flag"
"strconv"
"time"

ctrlrec "github.com/kubevela/pkg/controller/reconciler"
pkgmulticluster "github.com/kubevela/pkg/multicluster"
+ utillog "github.com/kubevela/pkg/util/log"
wfTypes "github.com/kubevela/workflow/pkg/types"
utilfeature "k8s.io/apiserver/pkg/util/feature"
cliflag "k8s.io/component-base/cli/flag"
- "k8s.io/klog/v2"

standardcontroller "github.com/oam-dev/kubevela/pkg/controller"
commonconfig "github.com/oam-dev/kubevela/pkg/controller/common"
+ "github.com/oam-dev/kubevela/pkg/controller/sharding"
"github.com/oam-dev/kubevela/pkg/oam"
"github.com/oam-dev/kubevela/pkg/resourcekeeper"

@@ -165,11 +165,9 @@ func (s *CoreOptions) Flags() cliflag.NamedFlagSets {
pkgmulticluster.AddFlags(fss.FlagSet("multicluster"))
ctrlrec.AddFlags(fss.FlagSet("controllerreconciles"))
utilfeature.DefaultMutableFeatureGate.AddFlag(fss.FlagSet("featuregate"))

+ sharding.AddFlags(fss.FlagSet("sharding"))
kfs := fss.FlagSet("klog")
- local := flag.NewFlagSet("klog", flag.ExitOnError)
- klog.InitFlags(local)
- kfs.AddGoFlagSet(local)
+ utillog.AddFlags(kfs)

if s.LogDebug {
_ = kfs.Set("v", strconv.Itoa(int(commonconfig.LogDebug)))
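
The hunk above follows a common pattern: each package exposes an `AddFlags` function that registers its own flags on a flag set supplied by the caller. The sketch below shows that pattern with the standard library `flag` package only. It is an assumption-laden illustration, not the code in `pkg/controller/sharding`: the `shardingOptions` type and `parseSharding` helper are hypothetical, the flag names are taken from the chart template, and their types (bool and string) are guessed. The real code registers flags on the flag sets returned by `cliflag.NamedFlagSets`.

```go
package main

import (
	"flag"
	"fmt"
)

// shardingOptions is a hypothetical holder for the two sharding settings
// that the chart passes on the controller command line.
type shardingOptions struct {
	EnableSharding    bool
	SchedulableShards string
}

// AddFlags registers the sharding flags on the caller's flag set, so the
// caller decides how flags are grouped and parsed.
func (o *shardingOptions) AddFlags(fs *flag.FlagSet) {
	fs.BoolVar(&o.EnableSharding, "enable-sharding", false,
		"run the controller in sharding mode")
	fs.StringVar(&o.SchedulableShards, "schedulable-shards", "",
		"shards available for scheduling; empty means dynamic discovery")
}

// parseSharding builds a dedicated flag set, registers the sharding flags,
// and parses the given arguments.
func parseSharding(args []string) (shardingOptions, error) {
	var opts shardingOptions
	fs := flag.NewFlagSet("sharding", flag.ContinueOnError)
	opts.AddFlags(fs)
	err := fs.Parse(args)
	return opts, err
}

func main() {
	// "shard-0" is a hypothetical shard name used only for illustration.
	opts, err := parseSharding([]string{"--enable-sharding", "--schedulable-shards=shard-0"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", opts)
}
```

Keeping registration inside the owning package means the options file only needs one line per component, which matches the shape of the change in this hunk.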