Is it possible to use an external etcd cluster #2216

Open
egalano opened this Issue Mar 28, 2017 · 21 comments

@egalano commented Mar 28, 2017

Would it provide additional stability to run etcd outside of the Kubernetes cluster? If so, how can we customize the etcd member list for kops? Would we be able to modify this with a kops edit cluster?

@chrislovecnm (Member) commented May 1, 2017

We do not have that functionality at this point. Always happy to have contributions!

@Cryptophobia (Contributor) commented Jun 29, 2017

@chrislovecnm: This is a nice feature to have and much needed. Lots of people are having problems with etcd filling up to capacity, crashing, and losing all of their data.

Could someone point me to the areas of the code and the files that need to be changed to make this feature possible?

@chrislovecnm (Member) commented Jun 29, 2017

@Cryptophobia if you want to ping me on slack you can.

Quick brain dump.

We would need to do the following items:

  1. Do you want kops to create the cluster as well or just use an external cluster?
  2. We would need to provide the capability for kops / nodeup / protokube to not create an etcd cluster.
    • API changes: /pkg/apis/kops
    • protokube changes: /protokube
    • nodeup changes: /nodeup
  3. Ensure that all components such as kubeAPIServer have needed configurations for connecting to etcd.
  4. Integration testing
  5. Unit tests
  6. Documentation

I would also like kops to be able to maintain and upgrade etcd-only servers.
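
For anyone picking this up, here is a rough sketch of what an external etcd section under /pkg/apis/kops could look like. This is purely hypothetical: the ExternalEtcdSpec type, its fields, and the JSON tags below are illustrative guesses, not the current kops API.

```go
// Hypothetical sketch only: an external etcd section for the kops cluster
// spec. None of these types or fields exist in the kops API today.
package kops

// ExternalEtcdSpec would tell kops / nodeup / protokube to skip creating an
// etcd cluster and instead point the control plane at an existing one.
type ExternalEtcdSpec struct {
	// Endpoints are the client URLs of the external etcd members,
	// e.g. "https://etcd-0.internal.example.com:2379".
	Endpoints []string `json:"endpoints,omitempty"`

	// CAFile, CertFile and KeyFile reference TLS material for client
	// authentication. The actual certificates would need to be distributed
	// through the kops secret store rather than embedded in the spec.
	CAFile   string `json:"caFile,omitempty"`
	CertFile string `json:"certFile,omitempty"`
	KeyFile  string `json:"keyFile,omitempty"`
}
```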

@geekofalltrades (Contributor) commented Oct 12, 2017

I would love to see some priority on this. The Kubernetes documentation explicitly recommends that Kubernetes' state be kept on a separate, dedicated etcd cluster.

Keeping stable etcd clusters is critical to the stability of Kubernetes clusters. Therefore, run etcd clusters on dedicated machines or isolated environments for guaranteed resource requirements.

I was also perplexed to find that the etcd static pods created by kops are not even storing their state in PersistentVolumes, instead storing their state directly on the root volume of the master node. Although I guess that's a chicken-and-egg problem, since you need Kubernetes for PersistentVolumes, and you need etcd for Kubernetes.

I also noticed that kops installed etcd 2.2.1. At some point, etcd 2 will be end-of-lifed, and then everyone will have to migrate their etcd clusters to etcd v3. In my experience, that migration process just doesn't work, so it behooves everyone to start on etcd 3.

@chrislovecnm (Member) commented Oct 12, 2017

@geekofalltrades how do we get PVCs before k8s is started? We have a chicken-and-egg problem with k8s, in both recovery and bootstrapping situations. We already support etcd3, but install etcd2 by default.

We are completely open to having etcd externalized in kops, but so far nobody has wanted to code it. It is not even that complex. The one thing I would love to see is hard data on why external etcd is recommended. I have had many conversations about it, but nobody can explain, from a production or data-driven standpoint, why I should be recommending moving etcd off of the cluster.

@geekofalltrades (Contributor) commented Oct 12, 2017

@chrislovecnm how do I change the etcd version?

@Cryptophobia (Contributor) commented Oct 14, 2017

@chrislovecnm , I would be willing to contribute in my free time and will ping you on slack to follow up on this.

The main argument is that all of k8s depends on the state of the etcd cluster, and when it is filled beyond capacity it crashes. It becomes a single point of failure for the whole cluster. Not to mention that routing, DNS, and other configurations depend on it.

@chrislovecnm (Member) commented Oct 14, 2017

@geekofalltrades we have etcd 3 version material under docs; it is not quite released yet. We do not support upgrading etcd yet.
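
For what it is worth, here is a purely hypothetical sketch of how selecting the etcd version might eventually be expressed in the cluster spec types under pkg/apis/kops, assuming a Version field is added to the etcd cluster spec. The type and field names below are guesses based on the docs mentioned above, not a released API.

```go
// Hypothetical sketch: choosing an etcd version in the kops cluster spec
// types. Field names are assumptions, not the released API.
package kops

type EtcdMemberSpec struct {
	Name          string `json:"name,omitempty"`
	InstanceGroup string `json:"instanceGroup,omitempty"`
}

type EtcdClusterSpec struct {
	Name    string           `json:"name,omitempty"`
	Members []EtcdMemberSpec `json:"etcdMembers,omitempty"`

	// Version would select the etcd release to run, e.g. "3.0.17",
	// instead of the 2.2.1 default discussed above.
	Version string `json:"version,omitempty"`
}
```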

@clockworksoul commented Oct 23, 2017

External etcd is a requirement for my company's K8S setups, so this would be a very convenient little feature to have.

As long as I'm not stepping on any toes, I'll try to carve out some time to take a crack at it as well.

@chrislovecnm (Member) commented Oct 23, 2017

@clockworksoul have at it. A good place to start is an external etcd section in our API. I would just PR that and get feedback.

Any auth information will need to be handled as secrets.
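
To make the secrets point concrete, here is a minimal, hypothetical sketch of how external etcd settings could be turned into kube-apiserver flags. The --etcd-servers, --etcd-cafile, --etcd-certfile and --etcd-keyfile flags are real kube-apiserver flags; the helper function, the endpoints, and the file paths are made up for illustration, and the TLS files are assumed to have already been written to disk by nodeup from the kops secret store.

```go
// Hypothetical sketch: assembling kube-apiserver etcd flags for an external
// cluster. Only the flag names are real; everything else is illustrative.
package main

import (
	"fmt"
	"strings"
)

// buildAPIServerEtcdFlags builds the etcd-related kube-apiserver flags from
// external etcd endpoints and TLS file paths (assumed to already exist on
// disk, distributed as secrets).
func buildAPIServerEtcdFlags(endpoints []string, caFile, certFile, keyFile string) []string {
	flags := []string{"--etcd-servers=" + strings.Join(endpoints, ",")}
	if caFile != "" {
		flags = append(flags, "--etcd-cafile="+caFile)
	}
	if certFile != "" {
		flags = append(flags, "--etcd-certfile="+certFile)
	}
	if keyFile != "" {
		flags = append(flags, "--etcd-keyfile="+keyFile)
	}
	return flags
}

func main() {
	// Example endpoints and paths are placeholders.
	fmt.Println(buildAPIServerEtcdFlags(
		[]string{"https://etcd-0.internal:2379", "https://etcd-1.internal:2379"},
		"/srv/kubernetes/etcd-ca.crt",
		"/srv/kubernetes/etcd-client.crt",
		"/srv/kubernetes/etcd-client.key",
	))
}
```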

@fejta-bot commented Jan 21, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@Cryptophobia (Contributor) commented Jan 27, 2018

/remove-lifecycle stale

@fejta-bot commented Apr 27, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@fejta-bot commented May 27, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@fejta-bot commented Jun 26, 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@idealhack (Member) commented Sep 13, 2018

/remove-lifecycle rotten

I think this is still relevant.

@idealhack (Member) commented Sep 13, 2018

/reopen
/lifecycle frozen

@k8s-ci-robot (Contributor) commented Sep 13, 2018

@idealhack: Reopening this issue.

In response to this:

/reopen
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lonynamer commented Oct 8, 2018

Where is etcd kept? I did an AWS installation; it creates some files on S3. What are they, and where is the real etcd data kept?

@KIVagant commented Oct 19, 2018

Could someone advise what needs to be done to manually switch a fresh K8s cluster created by kops to an external etcd?

  1. Do I need to edit the systemd service?
  2. Do I need to delete/update anything related to the etcd-server-events-master-* and etcd-server-master-* deployments?
  3. What is the purpose of etcd-server-events?
  4. Do I need to delete any volumes (such as EBS in AWS) created by kops after etcd is switched?
  5. Will kops validate cluster fail after these operations, and how can I prevent it from failing?