This repository has been archived by the owner on Apr 30, 2020. It is now read-only.

Automatic replacement of failed nodes #20

Open
JohnStrunk opened this issue Jun 27, 2018 · 0 comments
Labels
epic (Large, multi-issue feature set), needs-subtasks (Issue needs to be sub-divided into smaller items)


JohnStrunk commented Jun 27, 2018

Describe the feature you'd like to have.
When a Gluster pod fails, kube will attempt to restart it; if the failure was a simple crash or another transient problem, that restart (plus automatic heal) should be sufficient to repair the system. However, if the node's state becomes corrupt or is lost, it may be necessary to remove the failed node from the cluster and potentially spawn a new one to take its place.
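
To make the intended flow concrete, here is a minimal, self-contained sketch of that decision (restart-and-wait for transient failures vs. full replacement after prolonged downtime). It is not the operator's implementation; every name in it (`glusterNode`, `decideAction`, the offline fields) is hypothetical:

```go
// Illustrative only: the types and thresholds below are hypothetical and
// not part of the operator or GD2.
package main

import (
	"fmt"
	"time"
)

type glusterNode struct {
	name         string
	offline      bool
	offlineSince time.Time
}

// decideAction sketches the decision described above: do nothing for a
// healthy node, wait out a transient failure, and only replace a node
// that has exceeded the permissible downtime.
func decideAction(n glusterNode, permissibleDowntime time.Duration) string {
	switch {
	case !n.offline:
		return "none" // healthy, or Kubernetes already restarted the pod
	case time.Since(n.offlineSince) < permissibleDowntime:
		return "wait" // transient failure; give the restart + heal a chance
	default:
		// State presumed lost: GD2 migrates the bricks, then the operator
		// removes the node from the TSP and spawns a replacement.
		return "replace"
	}
}

func main() {
	n := glusterNode{name: "gluster-2", offline: true, offlineSince: time.Now().Add(-2 * time.Hour)}
	fmt.Println(decideAction(n, time.Hour)) // "replace"
}
```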

What is the value to the end user? (why is it a priority?)
If a gluster node (pod) remains offline, the associated bricks will have a reduced level of availability & reliability. Being able to automatically repair failures will help increase system availability and protect users' data.

How will we know we have a good solution? (acceptance criteria)

  • Kubernetes will act as the first line of defense, restarting failed Gluster pods
  • A Gluster pod that remains offline from the Gluster cluster for an extended period of time will have its bricks moved to other Gluster nodes (by GD2). The permissible downtime should be configurable.
  • Gluster nodes that have been "abandoned" by GD2 should be removed from the TSP (trusted storage pool) and destroyed by the operator
  • Ability to mark a node via the CR such that it will not be subject to replacement (neither abandonment by GD2 nor destruction by the operator); see the sketch after this list. This is necessary in cases where a Gluster node is expected to be temporarily unavailable (e.g., scheduled downtime or other maintenance).
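
For illustration only, the configurable downtime and the per-node exemption above could be surfaced as CR spec fields. The type and field names below (`GlusterClusterSpec`, `permissibleDowntime`, `doNotReplace`, `autoReplaceNodes`) are hypothetical and not defined by this issue or by any existing CRD:

```go
// Hypothetical sketch only; the real CRD layout is not defined by this issue.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// GlusterClusterSpec sketches cluster-wide settings for automatic node
// replacement.
type GlusterClusterSpec struct {
	// PermissibleDowntime is how long a Gluster pod may remain offline
	// before GD2 starts migrating its bricks to other nodes.
	PermissibleDowntime metav1.Duration `json:"permissibleDowntime,omitempty"`

	// AutoReplaceNodes lets the operator remove abandoned nodes from the
	// TSP and spawn replacements.
	AutoReplaceNodes bool `json:"autoReplaceNodes,omitempty"`
}

// GlusterNodeSpec sketches the per-node exemption from the last bullet.
type GlusterNodeSpec struct {
	// DoNotReplace marks a node that is expected to be temporarily
	// unavailable (e.g., scheduled maintenance) so it is neither
	// abandoned by GD2 nor destroyed by the operator.
	DoNotReplace bool `json:"doNotReplace,omitempty"`
}
```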

Additional context
This relies on the node state machine (#17) and an as-yet-unimplemented GD2 automigration plugin.

@JohnStrunk JohnStrunk added the epic and needs-subtasks labels Jun 27, 2018
@JohnStrunk JohnStrunk added this to the 1.0 milestone Jun 27, 2018
@JohnStrunk JohnStrunk added this to Incoming in Planning via automation Jun 27, 2018
@JohnStrunk JohnStrunk moved this from Incoming to Epics in Planning Jun 28, 2018
@JohnStrunk JohnStrunk removed this from the 1.0 milestone Sep 24, 2018