
Enable etcd cluster scaling for userclusters #5571

Merged: 11 commits merged into kubermatic:master on Jul 16, 2020

Conversation

@moelsayed (Contributor) commented Jun 25, 2020:

What this PR does / why we need it:

  • Enables configurable etcd cluster size in the user cluster spec.
  • Managed scale up/down for the etcd StatefulSet.
  • Automatic etcd member join and reconcile.
  • Adds an HTTP-based liveness probe for etcd pods.

Which issue(s) this PR fixes (will close the issue(s) when the PR gets merged):
Fixes #5545 (Support configurable user cluster etcd cluster size)

Special notes for your reviewer:

Documentation:

Does this PR introduce a user-facing change?:

Adds configurable etcd cluster size to user cluster spec. 
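For illustration, a minimal sketch of what the spec change amounts to, assuming a flat field as discussed later in the review (type name and JSON tag are illustrative, not copied from the PR diff):

	// Hypothetical sketch of the user cluster spec field (names assumed).
	type ClusterSpec struct {
		// EtcdClusterSize is the desired number of etcd members; when left
		// unset it is defaulted to 3 (the minimum supported size).
		EtcdClusterSize int `json:"etcdClusterSize,omitempty"`
		// ... other spec fields elided
	}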

@kubermatic-bot added the do-not-merge/work-in-progress, release-note, dco-signoff: yes, team/lifecycle, size/L, approved, needs-rebase and size/XXL labels, and removed the size/L label (Jun 25, 2020)
@kubermatic-bot removed the needs-rebase label (Jul 1, 2020)
@moelsayed changed the title from "[WIP] Enable etcd cluster scaling for userclusters" to "Enable etcd cluster scaling for userclusters" (Jul 1, 2020)
@kubermatic-bot removed the do-not-merge/work-in-progress label (Jul 1, 2020)
if etcdClusterSize > replicas {
return replicas + 1
}
return replicas - 1
@multi-io (Contributor) commented Jul 1, 2020:

Tested with a pre-existing, running cluster with 3 replicas. The statefulset ended up being reduced to 2 replicas, I assume here (.Spec.EtcdClusterSize not set => etcdClusterSize:=0; replicas:=3; isEtcdHealthy:=true => return 2). The etcd launcher then refuses to launch a 2-node cluster because that's smaller than the defaultClusterSize (3), leading to a permanently non-running cluster. Is the idea that the admin must always set .Spec.EtcdClusterSize?

@moelsayed (Contributor, Author) replied:

No, it's defaulted here, which runs as part of the API server. I am not 100% sure if it will default existing clusters as well. But it is a good point, we should be more defensive here.

@multi-io (Contributor) commented Jul 1, 2020:

Confirmed that an existing cluster is migrated to the new launcher properly if .spec.etcdClusterSize is set to 3 first. Maybe we should just treat an unspecified (==0) value for the field as if the field was set to 3.
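A minimal sketch of the defensive defaulting being suggested here, assuming a helper around the replica computation (function and variable names are illustrative, not the PR's actual code):

	// Treat an unset (0) spec value as the default cluster size before
	// deciding whether to scale the StatefulSet up or down.
	func nextReplicaCount(etcdClusterSize, replicas int, isEtcdHealthy bool) int {
		if etcdClusterSize == 0 {
			etcdClusterSize = defaultClusterSize // assumed to be 3
		}
		if !isEtcdHealthy || etcdClusterSize == replicas {
			return replicas // never resize an unhealthy or already-sized cluster
		}
		if etcdClusterSize > replicas {
			return replicas + 1 // scale up one member at a time
		}
		return replicas - 1 // scale down one member at a time
	}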

@multi-io (Contributor) replied:

> No, it's defaulted here, which runs as part of the API server. I am not 100% sure if it will default existing clusters as well. But it is a good point, we should be more defensive here.

Yeah, I saw the defaulting too, but I'm pretty sure that's only called when new clusters are created.

@moelsayed (Contributor, Author) replied:

@multi-io I just pushed a fix for it. Can you please confirm it's working?

@multi-io (Contributor) replied:

Confirmed. Existing etcd cluster was migrated without having to manually set .spec.etcdClusterSize or anything else in the cluster resource.

@moelsayed (Contributor, Author) replied:

// not required, will leave it for now.
os.Setenv(initialStateEnvName, "new")
os.Setenv(initialClusterEnvName, initialMemberList(config.clusterSize, config.namespace))
os.Setenv(initialStateEnvName, e.config.initialState)
Reviewer (Member) commented:

This line and the line underneath it have no error check. I know the original code had no error check either, but it would make sense to add some checks here.
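A sketch of the kind of check being asked for, mirroring the quoted lines (os.Setenv does return an error in Go; the error wrapping shown is just one option, not the PR's actual fix):

	if err := os.Setenv(initialStateEnvName, "new"); err != nil {
		return fmt.Errorf("failed to set %s: %v", initialStateEnvName, err)
	}
	if err := os.Setenv(initialClusterEnvName, initialMemberList(config.clusterSize, config.namespace)); err != nil {
		return fmt.Errorf("failed to set %s: %v", initialClusterEnvName, err)
	}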

os.Setenv(initialStateEnvName, "new")
os.Setenv(initialClusterEnvName, initialMemberList(config.clusterSize, config.namespace))
os.Setenv(initialStateEnvName, e.config.initialState)
os.Setenv(initialClusterEnvName, strings.Join(initialMembers, ","))
Reviewer (Member) commented:

One other thing regarding this line and the one above: are you using these for debugging? Normally env vars are populated via the job instead of being set explicitly in the code.

return fmt.Sprintf("%s.etcd.%s.svc.cluster.local:2380", e.config.podName, e.config.namespace)
}

func (e *etcdCluster) getConfigFromEnv() error {
Reviewer (Member) commented:

The way we actually pass configs to the bin/cmd is via the args/options pattern. In other words, we pass those configs as flags to the cmd instead of env vars. It works as follows (see the sketch below):

  • From the outside (e.g. the job runner), env vars are created and passed to the container.
  • Inside the container/job, the configs are passed as flags.

For more information, take a look at the user-cluster-controller-manager and how those args are read via the hack script.
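A small, self-contained sketch of that flags/options pattern; the flag names and defaults here are assumptions for illustration, not the launcher's actual interface:

	package main

	import (
		"flag"
		"log"
	)

	func main() {
		var (
			namespace   = flag.String("namespace", "", "user cluster namespace")
			podName     = flag.String("pod-name", "", "name of this etcd pod")
			clusterSize = flag.Int("cluster-size", 3, "desired etcd cluster size")
		)
		flag.Parse()
		if *namespace == "" || *podName == "" {
			log.Fatal("both -namespace and -pod-name are required")
		}
		log.Printf("starting etcd launcher: namespace=%s pod=%s size=%d", *namespace, *podName, *clusterSize)
	}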

cmd/etcd-launcher/main.go (resolved review thread)
etcdCmd(config),
os.Environ())
// setup and start etcd command
cmd := exec.Command("/usr/local/bin/etcd", etcdCmd(e.config)...)
Reviewer (Member) commented:

Shouldn't we make sure that the etcd cmd exists before doing any calls or preparation? E.g. if os.Stat(...) says it does not exist, return an error as early as possible.

cmd/etcd-launcher/main.go (resolved review thread)
// just get a key from etcd, this is how `etcdctl endpoint health` works!
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
_, err = client.Get(ctx, "healthy")
cancel()
Reviewer (Member) commented:

Nit: maybe check the error and then cancel? Or use defer cancel().

// just get a key from etcd, this is how `etcdctl endpoint health` works!
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
_, err := e.client.Get(ctx, "healthy")
cancel()
Reviewer (Member) commented:

Nit: maybe check the error and then cancel? Or use defer cancel().
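A sketch of the deferred-cancel variant being suggested, reusing the names from the quoted code (the method name isHealthy is an assumption):

	func (e *etcdCluster) isHealthy() (bool, error) {
		// just get a key from etcd, this is how `etcdctl endpoint health` works!
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		if _, err := e.client.Get(ctx, "healthy"); err != nil {
			return false, err
		}
		return true, nil
	}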

return cmd
}

func (e *etcdCluster) getClient() error {
if e.client != nil {
Reviewer (Member) commented:

I don't understand this one: when the client is not nil you return nil. Shouldn't you return the client here?

@moelsayed (Contributor, Author) replied:

client is set in the etcdCluster struct. If it's nil, I create it and assign it to e.client instead of passing it around.
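A sketch of the lazy initialization described here; the clientv3 config details and the endpoints helper are assumptions, not the PR's actual code:

	func (e *etcdCluster) getClient() error {
		if e.client != nil {
			return nil // already initialized, nothing to do
		}
		client, err := clientv3.New(clientv3.Config{
			Endpoints:   e.endpoints(), // assumed helper returning member URLs
			DialTimeout: 5 * time.Second,
		})
		if err != nil {
			return err
		}
		e.client = client
		return nil
	}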


// if existing, we need to reconcile
var healthy bool
for i := 0; i < 5; i++ {
Reviewer (Member) commented:

So here you are trying to make sure that the storage is healthy by retrying up to 5 times. In general that's valid of course, but what about adding some semantics, something similar to this:

	var (
		healthy  bool
		err      error
		tries    int
		maxTries = 5
	)

	for tries < maxTries {
		if healthy, err = e.isHealthy(); !healthy || err != nil {
			log.Error("...")
			time.Sleep(500 * time.Millisecond)
			tries++
			continue
		}
		log.Info("...")
		break
	}

	if !healthy || err != nil {
		log.Fatal("never became healthy")
	}

Reviewer (Member) commented:

Or you can try out this lovely package:
https://godoc.org/k8s.io/apimachinery/pkg/util/wait#Poll
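A sketch of what the wait.Poll variant could look like, reusing e.isHealthy() and the sugared zap logger from the launcher (interval and timeout values are illustrative):

	if err := wait.Poll(500*time.Millisecond, 30*time.Second, func() (bool, error) {
		healthy, err := e.isHealthy()
		if err != nil {
			log.Warnw("failed health check", zap.Error(err))
			return false, nil // keep polling instead of aborting on a transient error
		}
		return healthy, nil
	}); err != nil {
		log.Fatalw("etcd never became healthy", zap.Error(err))
	}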

@multi-io (Contributor) commented Jul 3, 2020:

I think the cluster size field should be .spec.etcd.clusterSize instead of .spec.etcdClusterSize, i.e. have a spec.etcd map in there that can accommodate future etcd-related settings.
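For illustration, a sketch of the nested shape being proposed (hypothetical; the field names follow this comment, not the merged PR):

	type ClusterSpec struct {
		Etcd *EtcdSettings `json:"etcd,omitempty"`
		// ... other spec fields elided
	}

	type EtcdSettings struct {
		ClusterSize int `json:"clusterSize,omitempty"`
	}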

for { // reconcile dead members
members, err := e.listMembers()
if err != nil {
time.Sleep(10 * time.Second)
Reviewer (Member) commented:

Shouldn't we log the error in the for loop? On this line and the lines underneath it?

continue
}
}
} else { // new etcd member, need to join hte cluster
Reviewer (Member) commented:

typo: s/hte/the

}

config.clusterSize = defaultClusterSize
if s := os.Getenv("ECTD_CLUSTER_SIZE"); s != "" {
if config.clusterSize, err = strconv.Atoi(s); err != nil {
return nil, fmt.Errorf("failed to read ECTD_CLUSTER_SIZE: %v", err)
return fmt.Errorf("failed to read ECTD_CLUSTER_SIZE: %v", err)
Reviewer (Member) commented:

typo: s/ECTD/ETCD

if config.clusterSize > defaultClusterSize {
return nil, fmt.Errorf("ECTD_CLUSTER_SIZE is smaller then %d", defaultClusterSize)
if config.clusterSize < defaultClusterSize {
return fmt.Errorf("ECTD_CLUSTER_SIZE is smaller then %d", defaultClusterSize)
Reviewer (Member) commented:

typo: s/ECTD/ETCD

typo: s/then/than

if err = e.getLocalClient(); err != nil {
return false, err
}
var resp *clientv3.StatusResponse
Reviewer (Member) commented:

This can be declared inside the for loop instead, since you are not using it outside of it:

resp, err = e.localClient.Status(context.Background(), e.endpoint())

continue
}
if !healthy {
if _, err := e.client.MemberRemove(context.Background(), member.ID); err != nil {
Reviewer (Member) commented:

You don't need the check here, and you are not wrapping the error, so please remove the if statement and return the call directly:

return e.client.MemberRemove(context.Background(), member.ID)

@moelsayed force-pushed the resize_etcd branch 2 times, most recently from e95e6f8 to 5780ffd (July 7, 2020 16:23)
@moelsayed (Contributor, Author):

/retest

1 similar comment
@moelsayed (Contributor, Author):

/retest

"strings"
"time"

kubermaticlog "github.com/kubermatic/kubermatic/pkg/log"
Reviewer (Member) commented:

Please regroup the imports respectively (see the example after this list):

  • go sdk imports("context")
  • third party imports("go.uber.org/zap")
  • kubermatic/machine-controller imports
  • k8s imports
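An example of that grouping, using packages that appear elsewhere in this PR (the exact set of imports is illustrative):

	import (
		// go sdk
		"context"
		"strings"
		"time"

		// third party
		"go.uber.org/zap"

		// kubermatic
		kubermaticlog "github.com/kubermatic/kubermatic/pkg/log"

		// k8s
		"k8s.io/apimachinery/pkg/util/wait"
	)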

@@ -88,30 +89,40 @@ func main() {
for { // reconcile dead members
members, err := e.listMembers()
if err != nil {
log.Warnf("failed to list memebers: %v ", err)
Reviewer (Member) commented:

Please use logging with additional context and error wrapping:
log.Warnw("failed to list memebers ", zap.Error(err))


if _, err := os.Stat(etcdCommandPath); os.IsNotExist(err) {
log.Fatalf("can't find etcd command [%s]: %v", etcdCommandPath, err)
Reviewer (Member) commented:

Please follow the pattern that we use regarding contextual logs:
log.Fatalw("can't find command","command-path", etcdCommandPath, zap.Error(err))

break
}
// to avoide race conditions, we will run only on the cluster leader
leader, err := e.isLeader()
if err != nil || !leader {
if err != nil {
Reviewer (Member) commented:

You don't need to check the error twice; instead, make the log message more general:
log.Warnw("failed to remove member, error occurred or didn't get the current leader", zap.Error(err))

In this case, if there was an error it will be printed out as an error

time.Sleep(10 * time.Second)
continue
}
if err := e.removeDeadMembers(); err != nil {
if err := e.removeDeadMembers(log); err != nil {
log.Warnf("failed to remove member: %v", err)
Reviewer (Member) commented:

Please adjust.
log.Warnw("failed to remove member", zap.Error(err))

if err = wait.Poll(1*time.Second, 30*time.Second, func() (bool, error) {
return e.isEndpointHealthy(member.PeerURLs[0])
}); err != nil {
log.Infof("member [%s] is not responding, removing from cluster", member.Name)
Reviewer (Member) commented:

Please use:
log.Infow("member is not responding, removing from cluster", "member-name", member.Name)

if err != nil {
log.Fatalf("failed to get launcher configuration: %v", err)
}

logOpts := kubermaticlog.NewDefaultOptions()
rawLog := kubermaticlog.New(logOpts.Debug, logOpts.Format)
log := rawLog.Sugar()
Reviewer (Member) commented:

This variable collides with the "log" imported package

@moelsayed (Contributor, Author) replied:

The imported package is aliased, so it should be ok?

@@ -82,6 +82,21 @@ func (r *Reconciler) syncHealth(ctx context.Context, cluster *kubermaticv1.Clust
if err != nil {
return err
}
// set ClusterConditionEtcdClusterInitialized, this should be don't only once
Reviewer (Member) commented:

I don't quite understand the comment, can you please make it clearer :-)

},
// {
Reviewer (Member) commented:

If this is not needed can you please remove it?

return etcdClusterSize
}
replicas := int(*set.Spec.Replicas)
isEtcdHealthy := data.Cluster().Status.ExtendedHealth.Etcd == kubermaticv1.HealthStatusUp
Reviewer (Member) commented:

Can you please move this line to line 341? This line is evaluated but not used if etcdClusterSize == replicas.

@moelsayed (Contributor, Author):

/test pre-kubermatic-e2e-aws-flatcar-1.18

@moelsayed (Contributor, Author):

/retest

1 similar comment
@moelsayed (Contributor, Author):

/retest

@moadqassem (Member):

/approve
/lgtm

@kubermatic-bot added the lgtm label (Jul 16, 2020)
@kubermatic-bot (Contributor):

LGTM label has been added.

Git tree hash: 95798e4abc12a40a47b037e6c9551941f4531c5c

@kubermatic-bot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: moadqassem, moelsayed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubermatic-triage-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs

Review the full test history

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Also, here is a cat.
/meow

@kubermatic-bot (Contributor):

@kubermatic-triage-bot: cat image


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@moelsayed (Contributor, Author):

/retest

@kubermatic-bot merged commit 77fb2bd into kubermatic:master (Jul 16, 2020)
@kubermatic-bot added this to the v2.15 milestone (Jul 16, 2020)
Labels
approved, dco-signoff: yes, lgtm, release-note, size/XXL
Development

Successfully merging this pull request may close these issues.

Support configurable user cluster etcd cluster size
5 participants