etcd charm scale out/scale back breaks API server #43461

Closed
jacekn opened this Issue Mar 21, 2017 · 7 comments

Contributor

jacekn commented Mar 21, 2017

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): No

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): "etcd relation", "etcd charm"


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:52:01Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: OpenStack
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.2 LTS
  • Kernel (e.g. uname -a): 4.4.0-66-generic
  • Install tools: juju
  • Others:

What happened:
I scaled out etcd from 3 to 5 units and then removed 2 older units. This caused the API server to stop working.

What you expected to happen:
I expected everything to stay up after adding/removing etcd units.

How to reproduce it (as minimally and precisely as possible):
Deploy environment with juju and then:

$ juju add-unit -n 2 etcd
# wait for units to finish running hooks, then remove old units. For example:
$ juju remove-unit etcd/0
$ juju remove-unit etcd/1
# Once complete, kubectl stops working:
$ kubectl get nodes
Error from server: client: etcd cluster is unavailable or misconfigured

Anything else we need to know:

This may be related to juju-solutions/bundle-canonical-kubernetes#154

Contributor

jacekn commented Mar 21, 2017

Another piece of info: even when relation data changes, daemon flags will possibly not be updated due to this code in the kubernetes-master layer:

@when('etcd.available', 'kubernetes-master.components.installed',
      'certificates.server.cert.available', 'authentication.setup')
@when_not('kubernetes-master.components.started')
def start_master(etcd, tls):
    '''Run the Kubernetes master components.'''
    hookenv.status_set('maintenance',
                       'Rendering the Kubernetes master systemd files.')
    freeze_service_cidr()
    handle_etcd_relation(etcd)

If I understand the above correctly, handle_etcd_relation will only be triggered during initial setup due to @when_not('kubernetes-master.components.started').
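
For context on why that guard sticks: the handler sets the started state once it finishes, and the @when_not above then keeps it from ever running again, no matter how the relation data changes later. A minimal sketch of the pattern, with the real rendering logic collapsed into a stub and assuming the etcd interface exposes get_connection_string():

```python
from charms.reactive import when, when_not, set_state


def render_api_server_args(connection_string):
    # Stand-in for the real rendering/restart logic (writes --etcd-servers etc.).
    print('would render --etcd-servers={}'.format(connection_string))


@when('etcd.available')
@when_not('kubernetes-master.components.started')
def start_master(etcd):
    # Renders config from whatever connection string is visible right now...
    render_api_server_args(etcd.get_connection_string())
    # ...then sets the state that the @when_not guard checks. Nothing ever
    # removes it on etcd relation changes, so this handler never re-runs.
    set_state('kubernetes-master.components.started')
```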

lazypower added a commit to lazypower/layer-etcd that referenced this issue Mar 23, 2017

Cluster string adjustments
Related to kubernetes/kubernetes#43461
partial fix for not all etcd endpoints being transmitted on the
relationship interface.

lazypower added a commit to lazypower/layer-etcd that referenced this issue Mar 23, 2017

Cluster string logic fix
Related to kubernetes/kubernetes#43461
partial fix for not all etcd endpoints being transmitted on the
relationship interface.
Member

lazypower commented Mar 24, 2017

The actual root of this issue is that there is no clean way to remove the unit during the departed phase if you are the leader. This works well for non-leader units, but if you ever destroy the leader, which you did in the above example, it will unregister all peers and effectively break state.

https://bugs.launchpad.net/juju/+bug/1417874

This bug has been open since 2015 and doesn't have any clear resolution in sight. It's not clear how I can work around this without making it a process-dependent solution such as:

  • Run a deregistration action, and then remove unit

I think that in some scenarios I can reasonably have the units stop de-registering and instead run a reconcile loop where a missing unit gets removed, but this doesn't account for network partitions and other point-in-time failure scenarios that are completely recoverable. This is going to require additional engineering time and planning to see resolution.
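
For illustration only, a reconcile pass along those lines might look roughly like the sketch below. It assumes etcdctl v2 style `member list` output and that the caller can supply the unit names still present on the peer relation; none of this is the layer's actual code.

```python
import subprocess


def reconcile_members(expected_unit_names):
    """Deregister etcd members that no longer map to a Juju peer unit.

    Assumes etcdctl v2 output lines of the form:
      8e9e05c52164694d: name=etcd3 peerURLs=... clientURLs=... isLeader=false
    """
    output = subprocess.check_output(['etcdctl', 'member', 'list']).decode()
    for line in output.splitlines():
        member_id, _, rest = line.partition(':')
        fields = dict(f.split('=', 1) for f in rest.split() if '=' in f)
        name = fields.get('name', '')
        if name and name not in expected_unit_names:
            # No backing unit for this member any more; remove it.
            subprocess.check_call(['etcdctl', 'member', 'remove', member_id])
```

As noted above, a loop like this cannot tell a removed unit from one that is only temporarily unreachable, which is exactly the recoverable-failure case it would get wrong.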

lazypower added a commit to lazypower/layer-etcd that referenced this issue Mar 24, 2017

Resolve issues with cluster scale when removing the leader
This branch removes the erroneous logic when tracking the -departed
relation states to remove all participating peers. Instead we are now
relying on the behavior of the -broken hook to only execute on the unit
that is being removed from the cluster. This may not be perfect but
works 100% better than the current implementation.

Resolves kubernetes/kubernetes#43461

lazypower added a commit to lazypower/layer-etcd that referenced this issue Mar 24, 2017

Resolve issues with cluster scale when removing the leader
This branch removes the erroneous logic when tracking the -departed
relation states to remove all participating peers. Instead we are now
relying on the behavior of the -broken hook to only execute on the unit
that is being removed from the cluster. This may not be perfect but
works 100% better than the current implementation.

Resolves kubernetes/kubernetes#43461

marcoceppi added a commit to juju-solutions/layer-etcd that referenced this issue Mar 24, 2017

#85: Resolve issues with cluster scale when removing the leader (fixes kubernetes/kubernetes#43461)

This branch removes the erroneous logic when tracking the -departed
relation states to remove all participating peers. Instead we are now
relying on the behavior of the -broken hook to only execute on the unit
that is being removed from the cluster. This may not be perfect but
works 100% better than the current implementation.
Member

lazypower commented Mar 25, 2017

After some re-negotiation and exploration, we've settled on this method, which uses the relation-broken context to allow the unit to self-unregister with the currently identified Juju leader. This should resolve the above without much further fuss. Removing units (including the leader) yielded successful test results.
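
Roughly, the shape of that approach as a sketch rather than the layer's actual code; the peer relation name, the unit-to-member-name mapping, and the use of etcdctl v2 on the departing unit are all assumptions here:

```python
import subprocess

from charms.reactive import hook
from charmhelpers.core import hookenv


@hook('cluster-relation-broken')  # assumed peer relation name
def unregister_self():
    # The departing unit removes its own member registration; the request is
    # served by whichever healthy member (normally the current leader) it reaches.
    unit_name = hookenv.local_unit().replace('/', '')  # e.g. etcd/3 -> etcd3
    members = subprocess.check_output(['etcdctl', 'member', 'list']).decode()
    for line in members.splitlines():
        member_id, _, rest = line.partition(':')
        if 'name={}'.format(unit_name) in rest:
            subprocess.check_call(['etcdctl', 'member', 'remove', member_id])
            break
```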

Contributor

jacekn commented Mar 27, 2017

@chuckbutler there is one more thing to consider. What happens if the current leader disappears temporarily, for example due to hardware problems? If that happens, relation-broken hooks will not be triggered.

The solution could be to send all cluster members in the relation data so that clients can talk to the remaining cluster members. This was reported separately here: juju-solutions/bundle-canonical-kubernetes#154
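
For illustration, publishing the full member list over the client relation could look something like the sketch below; the state names, the set_connection_string call, and the helper that knows every peer's client URL are all assumed, not the layer's actual code.

```python
from charms.reactive import when


def cluster_client_urls():
    # Stand-in for however the layer tracks its peers' client URLs.
    return ['https://10.0.0.4:2379', 'https://10.0.0.5:2379', 'https://10.0.0.6:2379']


@when('db.connected', 'etcd.registered')  # illustrative state names
def publish_connection_string(db):
    # Hand clients every member's client URL, not just this unit's address,
    # so they can fail over if any single member (e.g. the leader) disappears.
    db.set_connection_string(','.join(cluster_client_urls()))
```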

Member

lazypower commented Apr 4, 2017

@jacekn this was resolved with the connection string fix in juju-solutions/layer-etcd#84

I'm 98% certain this resolves your concern due to the nature of etcd, raft, and how these systems work with the fully-formed cluster string on each unit. If a unit is not part of the cluster, it should fail over to another member that's reporting healthy. If that's not the case, I'd like to track that as a separate bug.

Contributor

jacekn commented Apr 20, 2017

@chuckbutler I tested the latest layer-etcd and can confirm that connection strings are now handled correctly when units are added/removed.

Thanks!

@jacekn jacekn closed this Apr 20, 2017

Contributor

jacekn commented Apr 21, 2017

@chuckbutler sorry about this, but after more careful testing this is not fully resolved.

Connection strings are now fine in the relation data. However, after extra testing I discovered that the kubernetes-master config is still not updated on relation change. This is the file on the k8s-master that I expect to be updated:
/var/snap/kube-apiserver/current/args

The --etcd-servers option in particular. This is most likely due to this code in kubernetes_master.py, around line 275:

@when('etcd.available', 'tls_client.server.certificate.saved',
      'authentication.setup')
@when_not('kubernetes-master.components.started')
def start_master(etcd):
    '''Run the Kubernetes master components.'''
    hookenv.status_set('maintenance',
                       'Configuring the Kubernetes master services.')
    freeze_service_cidr()
    handle_etcd_relation(etcd)

handle_etcd_relation is only invoked when the kubernetes-master.components.started state is not set.
To verify this I added a 2nd kubernetes-master unit, and the new unit ended up with the correct etcd address.
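
One way to let existing masters react without dropping the guard entirely is to watch the etcd connection string and clear the started state only when it actually changes, which is essentially the shape of the fix merged below. A minimal sketch, assuming the etcd interface exposes get_connection_string() and using the reactive framework's data_changed helper:

```python
from charms.reactive import when, remove_state
from charms.reactive.helpers import data_changed


@when('etcd.available', 'kubernetes-master.components.started')
def etcd_data_change(etcd):
    # Compare the current connection string with the last one we rendered.
    if data_changed('etcd-servers', etcd.get_connection_string()):
        # Dropping the state lets start_master run again, which re-renders
        # /var/snap/kube-apiserver/current/args with the new --etcd-servers
        # value and recycles the daemons.
        remove_state('kubernetes-master.components.started')
```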

@jacekn jacekn reopened this Apr 21, 2017

lazypower added a commit to lazypower/kubernetes that referenced this issue Apr 26, 2017

Fixes kubernetes#43461
The master-components started state triggers a daemon recycle. The guard
was there to prevent the daemons from being cycled too often and interrupting
normal workflow. The additional state check watches the etcd connection
string for changes, keeping the current behavior otherwise but triggering
a re-configure and recycle of the API control plane when etcd units are
scaling up and down.

k8s-merge-robot added a commit that referenced this issue Apr 26, 2017

Merge pull request #44967 from chuckbutler/43461
Automatic merge from submit-queue

Fixes #43461

**What this PR does / why we need it**:
The master-components started state triggers a daemon recycle. The guard
was there to prevent the daemons from being cycled too often and interrupting
normal workflow. This additional state check probes the etcd connection
string and, if it has changed, triggers a re-configure and recycle of the
API control plane when etcd units are scaling up and down.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #43461

**Special notes for your reviewer**:
Check the contents of /var/snap/kube-apiserver/current/args after scaling etcd both up and down: the values will have changed, and kube-apiserver will have recycled to read the new connection string.

**Release note**:

```release-note
kubernetes-master juju charm properly detects etcd-scale events and reconfigures appropriately.
```

lazypower added a commit to lazypower/bundle-canonical-kubernetes that referenced this issue Apr 28, 2017

Cleanup the bundle tests to work post snap refactor
Also adds an etcd scale test to validate that kubernetes/kubernetes#43461 does not surface again.