Implement notification scheduling
The triggering of notifications is now handled by an interface, so
additional implementations are possible. The new NotifyScheduled as well
as the old NotifyPeriodic can be configured in the config file.
Nuckal777 committed Feb 22, 2022
1 parent e0b82d7 commit 69a34e2
Showing 18 changed files with 468 additions and 143 deletions.
70 changes: 59 additions & 11 deletions README.md
@@ -10,8 +10,10 @@ A Kubernetes controller to manage node maintenance.
- Concept
- Installation
- Configuration
- General
- Check Plugins
- Notification Plugins
- Notification Schedules
- Trigger Plugins
- Additional integrations
- Example configuration for flatcar update agents
@@ -28,7 +30,7 @@ It is built with flexibility in mind and should be adaptable to different enviro
This property is achieved with an extensible plugin system.

## Concept
Kubernetes nodes are modelled as finite state machines and can be in one of three states.
Kubernetes nodes are modelled as finite state machines, which can be in one of the following three states:
- Operational
- Maintenance Required
- In Maintenance
@@ -39,11 +41,10 @@ Such plugin chains can be configured for each state individually via maintenance
Cluster administrators can assign a maintenance profile to a node using the `cloud.sap/maintenance-profile` label.
Before the transition is finished, a chain of "trigger plugins" can be invoked, which can perform any action related to termination or startup logic.
While a node is in a certain state, a chain of "notification plugins" informs the cluster users and administrators regularly about the node being in that state.
Multiple plugins exist.
It is possible to check or alter labels, to be notified via Slack, ...
Multiple plugins exist, so one can check or alter labels, be notified via Slack and so on.

Currently, most actual maintenance actions like Cordoning, Draining and Rebooting nodes are not carried out by the maintenance-controller and are instead delegated to inbuilt or external other controllers.
The maintenance-controller only decides whether a node can be maintained or not.
Currently, most actual maintenance actions like Cordoning, Draining and Rebooting nodes are not carried out by the maintenance-controller and are instead delegated to inbuilt or external other controllers.
Check out the additional integrations further down.

## Installation
@@ -53,6 +54,21 @@ Alternatively, execute ```make deploy IMG=sapcc/maintenance-controller```.

## Configuration

### General
The maintenance-controller contains multiple plugins, each of which is configurable.
`checkLabel`, for example, needs to know which label to check for which value.
The combination of a plugin type like `checkLabel` and its specific configuration is referred to as an instance.
Additionally, notification instances require a schedule, which describes when and how often to notify about state changes.
These instances can be chained together to construct more complex check, trigger and notification actions.
In that regard, a plugin chain refers to multiple instances being used in conjunction.

Each profile describes a single maintenance workflow by specifying how a node moves through the state machine.
For each state, a notification chain can be configured.
Transitions also have to be defined.
Each transition consists of at least a check chain and the state that should follow next.
Optionally, a trigger chain can be configured to perform actions when a node moves from one state into the next one.

### Format
There is a global configuration, which defines some general options, plugin instances and maintenance profiles.
The global configuration should be named ```./config/maintenance.yaml``` and should be placed relative to the controller's working directory, preferably via a Kubernetes secret or a config map.
A secret is recommended, as some plugins may need authentication data.
@@ -61,13 +77,21 @@ The basic structure looks like the following:
intervals:
# defines the minimum duration after which a node should be checked again
requeue: 200ms
# defines how frequent to send reminder notifications
# notifications about nodes being operational occur only once
notify: 500ms
# plugin instances are the combination of a plugin and its configuration
instances:
# the are no notification plugins configured here, but their configuration works the same way as for check and trigger plugins
notify: null
# notification plugin instances
notify:
- type: slack # the plugin type
name: somenotificationplugin
config:
hook: slack-webhook
channel: "#the_channel"
message: the message
# notification schedule
schedule:
type: periodic
config:
interval: 24h
# check plugin instances
check:
- type: hasLabel # the plugin type
@@ -103,7 +127,7 @@ profiles:
# define the plugin chains for the maintenance-required state
maintenance-required:
# define chains as shown with the operational state
check: null
notify: null
transitions: null
# define plugin chains for the in-maintenance state
in-maintenance:
@@ -115,7 +139,7 @@ profiles:
# multiple trigger instances can be used also
trigger: t && u
```
Chains be undefined or empty.
Chains can be undefined or empty.
Trigger and Notification chains are configured by specifying the desired instance names separated by ```&&```, e.g. ```alter && othertriggerplugin```.
Check chains are built using boolean expressions, e.g. ```transition && !(a || b)```.
To attach a maintenance profile to a node, the label ```cloud.sap/maintenance-profile=NAME``` has to be assigned, where NAME is the desired profile name.
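Trigger and notification chains are plain `&&`-separated lists of instance names. The hypothetical `splitChain` helper below shows one way such a string could be turned into instance names; it is not the controller's parser. It deliberately does not handle check chains, whose boolean expressions (`!`, `||`, parentheses) need a real expression parser.

```go
package main

import (
	"fmt"
	"strings"
)

// splitChain splits a trigger/notification chain such as
// "alter && othertriggerplugin" into its instance names.
// An undefined or empty chain yields no instances.
func splitChain(chain string) ([]string, error) {
	if strings.TrimSpace(chain) == "" {
		return nil, nil
	}
	parts := strings.Split(chain, "&&")
	names := make([]string, 0, len(parts))
	for _, p := range parts {
		name := strings.TrimSpace(p)
		if name == "" {
			return nil, fmt.Errorf("empty instance name in chain %q", chain)
		}
		names = append(names, name)
	}
	return names, nil
}

func main() {
	names, err := splitChain("alter && othertriggerplugin")
	fmt.Println(names, err) // [alter othertriggerplugin] <nil>
}
```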
@@ -233,6 +257,22 @@ One can get the current profile in a template using `{{ .Profile.Current }}`.
Be careful about using it in an instance that is invoked during the `operational` state, as all profiles attached to a node are considered for notification.
`{{ .Profile.Last }}` can be used instead, which refers to the profile that caused the last state transition.
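The `{{ ... }}` placeholders in notification messages use Go template syntax. The sketch below renders such a message with the standard `text/template` package. The `templateData`, `nodeInfo` and `profileInfo` types are hypothetical stand-ins that only mirror the field paths used in this README (`.Node.Name`, `.Node.Labels`, `.Profile.Last`); they are not the controller's real data model.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical data model: it only mirrors the field paths used in the README.
type nodeInfo struct {
	Name   string
	Labels map[string]string
}

type profileInfo struct {
	Current string
	Last    string
}

type templateData struct {
	Node    nodeInfo
	Profile profileInfo
}

// render parses a notification message as a text/template and executes it.
func render(message string, data templateData) (string, error) {
	tmpl, err := template.New("message").Parse(message)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := templateData{
		Node:    nodeInfo{Name: "node-1", Labels: map[string]string{"zone": "a"}},
		Profile: profileInfo{Current: "flatcar", Last: "flatcar"},
	}
	msg := `Node {{ .Node.Name }} in zone {{ index .Node.Labels "zone" }} (profile {{ .Profile.Last }})`
	out, err := render(msg, data)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // Node node-1 in zone a (profile flatcar)
}
```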

### Notification Schedules
__periodic__: Notifies after every state change and, while the node is not in the operational state, again whenever the specified interval has passed since the last notification.
This reflects the old implicit notification behavior.
```yaml
type: periodic
config:
interval: a duration according to the rules of Go's time.ParseDuration(), required
```
__scheduled__: Notifies at a specified time of day, but only on the specified weekdays.
```yaml
type: scheduled
config:
instant: the time of day at which the notification should be sent, "hh:mm" format, required
weekdays: the weekdays on which notifications should be sent, e.g. [monday, tuesday, wednesday, thursday, friday, saturday, sunday], required
```

### Trigger Plugins
__alterAnnotation:__ Adds, changes or removes an annotation
```yaml
@@ -270,13 +310,21 @@ instances:
The node {{ .Node.Name }} requires maintenance. Manual approval is required.
Approve to drain and reboot this node by running:
`kubectl annotate node {{ .Node.Name }} cloud.sap/maintenance-approved=true`
schedule:
type: periodic
config:
interval: 24h
- type: slack
name: maintenance_started
config:
hook: Your hook
channel: Your channel
message: |
Maintenance for node {{ .Node.Name }} has started.
schedule:
type: periodic
config:
interval: 24h
check:
- type: hasAnnotation
name: reboot_needed
51 changes: 51 additions & 0 deletions common/weekday.go
@@ -0,0 +1,51 @@
/*******************************************************************************
*
* Copyright 2020 SAP SE
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You should have received a copy of the License along with this
* program. If not, you may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*******************************************************************************/

package common

import (
"fmt"
"strings"
"time"
)

var WeekdayMap = map[string]time.Weekday{
	"monday":    time.Monday,
	"mon":       time.Monday,
	"tuesday":   time.Tuesday,
	"tue":       time.Tuesday,
	"wednesday": time.Wednesday,
	"wed":       time.Wednesday,
	"thursday":  time.Thursday,
	"thu":       time.Thursday,
	"friday":    time.Friday,
	"fri":       time.Friday,
	"saturday":  time.Saturday,
	"sat":       time.Saturday,
	"sunday":    time.Sunday,
	"sun":       time.Sunday,
}

func WeekdayFromString(s string) (time.Weekday, error) {
	weekday, ok := WeekdayMap[strings.ToLower(s)]
	if !ok {
		return time.Monday, fmt.Errorf("'%v' is not a known weekday", s)
	}
	return weekday, nil
}
10 changes: 3 additions & 7 deletions controllers/config.go
@@ -51,7 +51,6 @@ type TransitionDescriptor struct {
type ConfigDescriptor struct {
Intervals struct {
Requeue time.Duration `config:"requeue" validate:"required"`
Notify time.Duration `config:"notify" validate:"required"`
} `config:"intervals" validate:"required"`
Instances plugin.InstancesDescriptor
Profiles []ProfileDescriptor
@@ -61,8 +60,6 @@ type ConfigDescriptor struct {
type Config struct {
// RequeueInterval defines the duration after which a node is reconciled again by the controller
RequeueInterval time.Duration
// NotificationInterval specifies a duration after which notifications are resend
NotificationInterval time.Duration
// Profiles contains all known profiles
Profiles map[string]state.Profile
// Contains reference to all plugins and their instances
@@ -87,10 +84,9 @@ func LoadConfig(config *ucfg.Config) (*Config, error) {
return nil, err
}
return &Config{
RequeueInterval: global.Intervals.Notify,
NotificationInterval: global.Intervals.Requeue,
Profiles: profileMap,
Registry: registry,
RequeueInterval: global.Intervals.Requeue,
Profiles: profileMap,
Registry: registry,
}, nil
}

2 changes: 1 addition & 1 deletion controllers/node_controller.go
@@ -180,7 +180,7 @@ func reconcileInternal(params reconcileParameters) error {

for _, profile := range profiles {
// construct state
stateObj, err := state.FromLabel(stateLabel, profile.Chains[stateLabel], params.config.NotificationInterval)
stateObj, err := state.FromLabel(stateLabel, profile.Chains[stateLabel])
if err != nil {
return fmt.Errorf("failed to create internal state from unknown label value: %w", err)
}
8 changes: 8 additions & 0 deletions kubernikus/README.md
@@ -61,6 +61,10 @@ instances:
channel: "#channel"
title: "Updating the operating system of nodes."
message: '{{ .Node.Name }} will reboot now to update Flatcar Linux from version {{ index .Node.Labels "flatcar-linux-update.v1.flatcar-linux.net/version" }} to version {{ index .Node.Annotations "flatcar-linux-update.v1.flatcar-linux.net/new-version" }}'
schedule:
type: periodic
config:
interval: 24h
- type: slackThread
name: maintenance_kubelet
config:
@@ -72,6 +76,10 @@ instances:
channel: "#channel"
title: "Updating kubelets."
message: '{{ .Node.Name }} will be replaced for kubelet update.'
schedule:
type: periodic
config:
interval: 24h
check:
- type: hasAnnotation
name: reboot_needed
29 changes: 2 additions & 27 deletions plugin/impl/timewindow.go
@@ -22,30 +22,13 @@ package impl
import (
"errors"
"fmt"
"strings"
"time"

"github.com/elastic/go-ucfg"
"github.com/sapcc/maintenance-controller/common"
"github.com/sapcc/maintenance-controller/plugin"
)

var weekdayMap = map[string]time.Weekday{
"monday": time.Monday,
"mon": time.Monday,
"tuesday": time.Tuesday,
"tue": time.Tuesday,
"wednesday": time.Wednesday,
"wed": time.Wednesday,
"thursday": time.Thursday,
"thu": time.Thursday,
"friday": time.Friday,
"fri": time.Friday,
"saturday": time.Saturday,
"sat": time.Saturday,
"sunday": time.Sunday,
"sun": time.Sunday,
}

const timeFormat = "15:04"
const dayMonthFormat = "Jan 2"

@@ -87,7 +70,7 @@ func (tw *TimeWindow) New(config *ucfg.Config) (plugin.Checker, error) {
}
timewindow := &TimeWindow{Start: start, End: end}
for _, weekdayStr := range conf.Weekdays {
weekday, err := weekdayFromString(weekdayStr)
weekday, err := common.WeekdayFromString(weekdayStr)
if err != nil {
return nil, err
}
@@ -103,14 +86,6 @@ func (tw *TimeWindow) New(config *ucfg.Config) (plugin.Checker, error) {
return timewindow, nil
}

func weekdayFromString(s string) (time.Weekday, error) {
weekday, ok := weekdayMap[strings.ToLower(s)]
if !ok {
return time.Monday, fmt.Errorf("'%v' is not a known weekday", s)
}
return weekday, nil
}

// Check checks whether the current time is within specified time window on allowed weekdays.
func (tw *TimeWindow) Check(params plugin.Parameters) (bool, error) {
return tw.checkInternal(time.Now().UTC()), nil