
etcd operator storage, crd and certificate issues #75

Closed
mjudeikis opened this issue Jul 13, 2018 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@mjudeikis
Contributor

The plan was to use etcd-operator. Here is where we are struggling.

First, we need to know the namespace when generating certificates. This is because of
https://github.com/coreos/etcd-operator/blob/70d3bd74960dc7127870a393affffbe1df94728e/pkg/util/etcdutil/member.go#L38-L40
The result is that etcd advertises itself as name.namespace.svc, and the certificates need to cover that name.
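To make this concrete, here is a sketch of an EtcdCluster with static TLS. The field names follow etcd-operator's v1beta2 cluster TLS docs; the cluster, namespace, and secret names are placeholders I made up for illustration:

```yaml
# Hypothetical example - all names are placeholders.
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: example-etcd
  namespace: customer-ns        # must be known before certs are generated
spec:
  size: 3
  TLS:
    static:
      member:
        # The server cert in this secret has to include SANs like
        #   *.example-etcd.customer-ns.svc
        #   example-etcd-client.customer-ns.svc
        # which is why the namespace must be known up front.
        peerSecret: etcd-peer-tls
        serverSecret: etcd-server-tls
      operatorSecret: etcd-client-tls
```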

Second (and a slightly bigger one) is storage.
The etcd-operator docs and examples online are misleading in places, so we rely on the code.

  1. The etcd pods themselves do not have any persistence.
    https://github.com/coreos/etcd-operator/blob/master/pkg/apis/etcd/v1beta2/cluster.go#L137
    Upstream issue:
    Persistent/Durable etcd cluster coreos/etcd-operator#1323

The idea is we run in memory and back up constantly. In a DR situation, if a single pod is alive, the operator will recover the cluster. If all pods restart, recovery is done with etcd-restore-operator, restoring from backup.
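The restore path would look roughly like the sketch below. The ABS fields here are my assumption, mirrored from the documented S3 restore spec, and may differ from the actual etcd-restore-operator API; all names are placeholders:

```yaml
# Hypothetical sketch - ABS fields assumed by analogy with S3 restore.
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdRestore"
metadata:
  # per the restore design, this must match the EtcdCluster to recreate
  name: example-etcd
spec:
  etcdCluster:
    name: example-etcd
  backupStorageType: ABS
  abs:
    path: backups/example-etcd.backup   # <container>/<blob>
    absSecret: abs-credentials
```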

For this we need the etcd-backup and etcd-restore operators.
The backup operator supports two backup methods (Azure ABS and AWS S3): https://github.com/coreos/etcd-operator/blob/master/pkg/apis/etcd/v1beta2/backup_types.go#L19-L28

Configuration is what causes an issue: we need a secret with the storage account name and key.
https://github.com/coreos/etcd-operator/blob/master/doc/design/abs_backup.md
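A backup CR per the abs_backup design doc would look something like this sketch; field names may have drifted since the design was written, and the endpoint, path, and secret name are illustrative:

```yaml
# Sketch based on the abs_backup design doc - treat as illustrative.
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdBackup"
metadata:
  name: example-etcd-backup
spec:
  etcdEndpoints: ["https://example-etcd-client:2379"]
  storageType: ABS
  abs:
    path: backups/example-etcd.backup   # <container>/<blob>
    absSecret: abs-credentials          # secret holding account name + key
```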

This means the prerequisites are:

  1. Storage account created.
  2. Key available during creation of the secret.
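The secret in question would be shaped roughly like this; the `storage-account`/`storage-key` key names follow the abs_backup design doc, and the values are placeholders:

```yaml
# Illustrative only - values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: abs-credentials
type: Opaque
stringData:
  storage-account: mystorageaccount     # Azure storage account name
  storage-key: "<storage account key>"  # account key, injected from backend
```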

We don't want to create a storage account during ARM deployment, as it is not a client-facing configuration artifact. We could use one storage account with multiple containers per customer, and inject it from the backend.

The last issue is helm ordering for CRDs:
helm/helm#2994
TL;DR: when helm creates a CRD, it takes some time for the cluster to accept it, so creating CRD resources immediately afterwards fails because the CRD is not yet available.

In addition, we don't want to manage global CRDs for all users from the user configuration side. If a CRD is deleted, all etcd clusters are deleted too. It looks like we need to manage them outside azure-helm as part of HCP management.
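For reference, the cluster-scoped object we would be lifecycling outside azure-helm is just the CRD itself; a minimal sketch (apiextensions v1beta1, as current at the time). Deleting this object is exactly what cascades into deleting every EtcdCluster:

```yaml
# Minimal sketch of the etcd-operator CRD managed out-of-band.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: etcdclusters.etcd.database.coreos.com
spec:
  group: etcd.database.coreos.com
  version: v1beta2
  scope: Namespaced
  names:
    kind: EtcdCluster
    listKind: EtcdClusterList
    plural: etcdclusters
    shortNames: ["etcd"]
```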

cc: @jim-minter @Kargakis @pweil-

@mjudeikis mjudeikis mentioned this issue Jul 13, 2018
@0xmichalis
Contributor

0xmichalis commented Jul 13, 2018

> The result is that etcd advertises itself with name.namespace.svc and we need to have this in the certificates.

How about opening a PR in etcd-operator repo to make this configurable?

> Second (and a little bit bigger on) is storage.

How about an init container in the etcd operator deployment, to ensure that both the azure container and storage account are created before etcd comes up? And move backup operator to run as a second container in the etcd operator deployment?

> In addition, we dont want to manage global CRD's for all users from the user configuration side. If CRD is deleted - all etcd cluster are deleted too. It would look like we need to manage them outside azure-helm as part of HCP management.

I like to think of the CRD as a global default in every underlay cluster. Our etcd operators shouldn't need cluster-wide access in order to create/delete the CRD.

@mjudeikis
Contributor Author

> How about an init container in the etcd operator deployment, to ensure that both the azure container and storage account are created before etcd comes up? And move backup operator to run as a second container in the etcd operator deployment?

We could do something like this: run all 3 operators in one pod. One question is whether it is acceptable for the etcd controller to provision storage for itself and create a secret for it (I think yes). Another is which credentials we should use to provision all these "backup storage accounts".

> I like to think of the CRD as a global default in every underlay cluster. Our etcd operators shouldn't need cluster-wide access in order to create/delete the CRD.

Agreed. We just need a nice way to manage and lifecycle them too.

@pweil-
Contributor

pweil- commented Jul 16, 2018 via email

@mjudeikis
Contributor Author

@pweil- It can be used, but we need to agree on how. A sync call would be good. Check the google doc in the email with a small review. Is there any chance I can get access to the azure repo too?

@0xmichalis
Contributor

/kind feature

@openshift-ci-robot openshift-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 27, 2018