Skip to content

Commit

Permalink
Merge pull request #14 from sapcc/transition
Browse files Browse the repository at this point in the history
Add support for configurable transition targets and notification schedules
  • Loading branch information
Nuckal777 committed Feb 28, 2022
2 parents ac33ddc + e995437 commit e0e33b8
Show file tree
Hide file tree
Showing 34 changed files with 1,674 additions and 927 deletions.
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ ENVTEST_ASSETS_DIR=$(shell pwd)/testbin
test: manifests generate fmt vet ## Run tests.
mkdir -p ${ENVTEST_ASSETS_DIR}
test -f ${ENVTEST_ASSETS_DIR}/setup-envtest.sh || curl -sSLo ${ENVTEST_ASSETS_DIR}/setup-envtest.sh https://raw.githubusercontent.com/kubernetes-sigs/controller-runtime/v0.8.3/hack/setup-envtest.sh
source ${ENVTEST_ASSETS_DIR}/setup-envtest.sh; fetch_envtest_tools $(ENVTEST_ASSETS_DIR); setup_envtest_env $(ENVTEST_ASSETS_DIR); go test ./controllers ./esx ./event ./kubernikus ./plugin ./plugin/impl ./state -coverprofile=coverage.out -coverpkg github.com/sapcc/maintenance-controller/...
source ${ENVTEST_ASSETS_DIR}/setup-envtest.sh; fetch_envtest_tools $(ENVTEST_ASSETS_DIR); setup_envtest_env $(ENVTEST_ASSETS_DIR); go test ./common ./controllers ./esx ./event ./kubernikus ./plugin ./plugin/impl ./state -coverprofile=coverage.out -coverpkg github.com/sapcc/maintenance-controller/...

##@ Build

Expand Down
256 changes: 156 additions & 100 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,10 @@ A Kubernetes controller to manage node maintenance.
- Concept
- Installation
- Configuration
- General
- Check Plugins
- Notification Plugins
- Notification Schedules
- Trigger Plugins
- Additional integrations
- Example configuration for flatcar update agents
Expand All @@ -28,7 +30,7 @@ It is built with flexibility in mind and should be adaptable to different enviro
This property is achieved with an extensible plugin system.

## Concept
Kubernetes nodes are modelled as finite state machines and can be in one of three states.
Kubernetes nodes are modelled as finite state machines, which can be in one of the following three states:
- Operational
- Maintenance Required
- In Maintenance
Expand All @@ -39,11 +41,10 @@ Such plugin chains can be configured for each state individually via maintenance
Cluster administrators can assign a maintenance profile to a node using the `cloud.sap/maintenance-profile` label.
Before the transition is finished a chain of "trigger plugins" can be invoked, which can perform any action related to termination or startup logic.
While a node is in a certain state, a chain of "notifications plugins" informs the cluster users and administrators regularly about the node being in that state.
Multiple plugins exist.
It is possible to check or alter labels, to be notified via Slack, ...
Multiple plugins exist, so one can check or alter labels, be notified via Slack and so on.

Currently, most actual maintenance actions like Cordoning, Draining and Rebooting nodes are not carried out by the maintenance-controller and are instead delegated to inbuilt or external other controllers.
The maintenance-controller only does the decision making, whether a node can be maintained or not.
Currently, most actual maintenance actions like Cordoning, Draining and Rebooting nodes are not carried out by the maintenance-controller and are instead delegated to inbuilt or external other controllers.
Check out the additional integrations further down.

## Installation
Expand All @@ -53,6 +54,21 @@ Alternatively, execute ```make deploy IMG=sapcc/maintenance-controller```.

## Configuration

### General
The maintenance-controller contains multiple plugins, which are configurable themselves.
`checkLabel` for example needs to know, which label needs to checked for which value.
The combination of a plugin type like `checkLabel` and its specific configuration is referred to as an instance.
Notification instances require a schedule, which describes when and how often to notify about state changes, additionally.
These instances can be chained together to construct more complex check, trigger and notification actions.
In that regard plugin chains refer to instances being used in conjunction.

Profiles describe a single maintenance workflow each by specifying how a node moves through the state machine.
For each state a notification chain can be configured.
Also transitions have to be defined.
These consist of at least of a check chain and the state, which should follow next.
Optionally, a trigger chain can be configured to perform actions, when a node moves from one state into the next one.

### Format
There is a global configuration, which defines some general options, plugin instances and maintenance profiles.
The global configuration should be named ```./config/maintenance.yaml``` and should be placed relative to the controllers working directory preferably via a Kubernetes secret or a config map.
A secret is recommend as some plugins may need authentication data.
Expand All @@ -61,61 +77,71 @@ The basic structure looks like the following:
intervals:
# defines the minimum duration after which a node should be checked again
requeue: 200ms
# defines how frequent to send reminder notifications
# notifications about nodes being operational occur only once
notify: 500ms
# plugin instances are the combination of a plugin and its configuration
instances:
# the are no notification plugins configured here, but their configuration works the same way as for check and trigger plugins
notify: null
# notification plugin instances
notify:
- type: slack # the plugin type
name: somenotificationplugin
config:
hook: slack-webhook
channel: "#the_channel"
message: the message
# notification schedule
schedule:
type: periodic
config:
interval: 24h
# check plugin instances
check:
# the list entries define the chosen plugin type
- hasLabel:
# name of the instance, which is used in the plugin chain configurations
# do not use spaces or other special characters, besides the underscore, which is allowed
name: transition
# the configuration for the plugin. That block depends on the plugin type
config:
key: transition
value: "true"
- type: hasLabel # the plugin type
# name of the instance, which is used in the plugin chain configurations
# do not use spaces or other special characters, besides the underscore, which is allowed
name: transition
# the configuration for the plugin. That block depends on the plugin type
config:
key: transition
value: "true"
# trigger plugin instances
trigger:
- alterLabel:
name: alter
config:
key: alter
value: "true"
remove: false
- type: alterLabel
name: alter
config:
key: alter
value: "true"
remove: false
profiles:
# define a maintenance profile called someprofile
someprofile:
# define the plugin chains for the operational state
operational:
# define a maintenance profile called someprofile
- name: someprofile
# define the plugin chains for the operational state
operational:
# the notification instances to invoke while in the operational state
notify: somenotificationplugin
transitions:
# the exit condition for the operational state refers to the "transition" plugin instance defined in the instances section
check: transition
# the notification instances to invoke while in the operational state
notify: somenotificationplugin
- check: transition
# the trigger instances which are invoked when leaving the operational state
trigger: alter
# define the plugin chains for the maintenance-required state
maintenance-required:
# define chains as shown with the operational state
check: null
notify: null
trigger: null
# define plugin chains for the in-maintenance state
in-maintenance:
# the following state after passing checks and executing triggers
next: maintenance-required
# define the plugin chains for the maintenance-required state
maintenance-required:
# define chains as shown with the operational state
notify: null
transitions: null
# define plugin chains for the in-maintenance state
in-maintenance:
# multiple notification instances can be used
notify: g && h
transitions:
# check chains support boolean operations which evaluate multiple instances
check: transition && !(a || b)
# multiple notification instances can be used also
notify: g && h
- check: "transition && !(a || b)"
# multiple trigger instances can be used also
trigger: t && u
```
Chains be undefined or empty.
Trigger and Notification chains are configured by specifying the desired instance names separated by ```&&```, e.g. ```alter && othertriggerplugin```
Check chains be build using boolean expression, e.g. ```transition && !(a || b)```
Chains can be undefined or empty.
Trigger and Notification chains are configured by specifying the desired instance names separated by ```&&```, e.g. ```alter && othertriggerplugin```.
Check chains are build using boolean expression, e.g. ```transition && !(a || b)```.
To attach a maintenance profile to a node, the label ```cloud.sap/maintenance-profile=NAME``` has to be assigned the desired profile name.
If that label is not present on a node the controller will use the ```default``` profile, which does nothing at all.
The default profile can be reconfigured, if it is defined within the config file.
Expand Down Expand Up @@ -231,6 +257,22 @@ One can get the current profile in a template using `{{ .Profile.Current }}`.
Be careful about using it in an instance that is invoked during the `operational` state, as all profiles attached to a node are considered for notification.
`{{ .Profile.Last }}` can be used instead, which refers to profile that caused the last state transition.

### Notification Schedules
__periodic__: Notifies after a state change and when the specified interval passed since the last notification if the node is currently not in the operational state.
This reflects the old implicit notification behavior.
```yaml
type: periodic
config:
interval: a duration according to the rules of golangs time.ParseDuration(), required
```
__scheduled__: Notifies at a certain time only on specified weekdays.
```yaml
type: scheduled
config:
instant: the point in time, when the notification should be sent, "hh:mm" format, required
weekdays: weekdays when notification should be sent, e.g. [monday, tuesday, wednesday, thursday, friday, saturday, sunday], required
```

### Trigger Plugins
__alterAnnotation:__ Adds, changes or removes an annotation
```yaml
Expand All @@ -255,63 +297,77 @@ config:
This example requires that the Flatcar-Linux-Update-Agent is present on the nodes.
```yaml
intervals:
requeue: 60s
notify: 5h
requeue: 60s
notify: 5h
instances:
notify:
- slack:
name: approval_required
config:
hook: Your hook
channel: Your channel
message: |
The node {{ .Node.Name }} requires maintenance. Manual approval is required.
Approve to drain and reboot this node by running:
`kubectl annotate node {{ .Node.Name }} cloud.sap/maintenance-approved=true`
- slack:
name: maintenance_started
config:
hook: Your hook
channel: Your channel
message: |
Maintenance for node {{ .Node.Name }} has started.
check:
- hasAnnotation:
name: reboot_needed
config:
key: flatcar-linux-update.v1.flatcar-linux.net/reboot-needed
value: "true"
- hasAnnotation:
name: check_approval
config:
key: cloud.sap/maintenance-approved
value: "true"
trigger:
- alterAnnotation:
name: reboot-ok
config:
key: flatcar-linux-update.v1.flatcar-linux.net/reboot-ok
value: "true"
- alterAnnotation:
name: remove_approval
config:
key: cloud.sap/maintenance-approved
remove: true
- alterAnnotation:
name: remove_reboot_ok
config:
key: flatcar-linux-update.v1.flatcar-linux.net/reboot-ok
remove: true
notify:
- type: slack
name: approval_required
config:
hook: Your hook
channel: Your channel
message: |
The node {{ .Node.Name }} requires maintenance. Manual approval is required.
Approve to drain and reboot this node by running:
`kubectl annotate node {{ .Node.Name }} cloud.sap/maintenance-approved=true`
schedule:
type: periodic
config:
interval: 24h
- type: slack
name: maintenance_started
config:
hook: Your hook
channel: Your channel
message: |
Maintenance for node {{ .Node.Name }} has started.
schedule:
type: periodic
config:
interval: 24h
check:
- type: hasAnnotation
name: reboot_needed
config:
key: flatcar-linux-update.v1.flatcar-linux.net/reboot-needed
value: "true"
- type: hasAnnotation
name: check_approval
config:
key: cloud.sap/maintenance-approved
value: "true"
trigger:
- type: alterAnnotation
name: reboot-ok
config:
key: flatcar-linux-update.v1.flatcar-linux.net/reboot-ok
value: "true"
- type: alterAnnotation
name: remove_approval
config:
key: cloud.sap/maintenance-approved
remove: true
- type: alterAnnotation
name: remove_reboot_ok
config:
key: flatcar-linux-update.v1.flatcar-linux.net/reboot-ok
remove: true
profiles:
flatcar:
operational:
check: reboot_needed
maintenance-required:
check: check_approval
notify: approval_required
- name: flatcar
operational:
transitions:
- check: reboot_needed
next: maintenance-required
maintenance-required:
notify: approval_required
transitions:
- check: check_approval
trigger: remove_approval && reboot-ok
in-maintenance:
check: "!reboot_needed"
notify: maintenance_started
next: in-maintenance
in-maintenance:
notify: maintenance_started
transitions:
- check: "!reboot_needed"
trigger: remove_reboot_ok
next: operational
```
51 changes: 51 additions & 0 deletions common/weekday.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
/*******************************************************************************
*
* Copyright 2020 SAP SE
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You should have received a copy of the License along with this
* program. If not, you may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*******************************************************************************/

package common

import (
"fmt"
"strings"
"time"
)

var WeekdayMap = map[string]time.Weekday{
"monday": time.Monday,
"mon": time.Monday,
"tuesday": time.Tuesday,
"tue": time.Tuesday,
"wednesday": time.Wednesday,
"wed": time.Wednesday,
"thursday": time.Thursday,
"thu": time.Thursday,
"friday": time.Friday,
"fri": time.Friday,
"saturday": time.Saturday,
"sat": time.Saturday,
"sunday": time.Sunday,
"sun": time.Sunday,
}

func WeekdayFromString(s string) (time.Weekday, error) {
weekday, ok := WeekdayMap[strings.ToLower(s)]
if !ok {
return time.Monday, fmt.Errorf("'%v' is not a known weekday", s)
}
return weekday, nil
}
Loading

0 comments on commit e0e33b8

Please sign in to comment.