Call _nodes/shutdown from pre-stop hook #6544

Merged · 18 commits · Mar 22, 2023
Changes from 8 commits
@@ -36,3 +36,10 @@ spec:
- name: PRE_STOP_ADDITIONAL_WAIT_SECONDS
value: "5"
----

The pre-stop lifecycle hook also tries to gracefully shut down the Elasticsearch node in case of a termination that is not caused by the ECK operator, for example during Kubernetes node maintenance or a Kubernetes upgrade. In these cases the script tries to notify Elasticsearch of the impending termination of the node through the Elasticsearch API. The intent is to avoid relocation and recovery of shards while the Elasticsearch node is only temporarily unavailable.

This is done on a best-effort basis. In particular, requests to an Elasticsearch cluster that is already in the process of shutting down might fail if the Kubernetes service has already been removed.
The script tolerates up to `PRE_STOP_MAX_DNS_ERRORS` DNS errors (default 2) before giving up.

When using local persistent volumes a different behaviour might be desirable, because the Elasticsearch node's associated storage will no longer be available on the new Kubernetes node. `PRE_STOP_SHUTDOWN_TYPE` allows overriding the default shutdown type to one of the link:https://www.elastic.co/guide/en/elasticsearch/reference/current/put-shutdown.html[possible values]. Be aware that setting it to anything other than `restart` can cause the pre-stop hook to run longer than the Pod's `terminationGracePeriodSeconds` while data is moved out of the terminating Pod, and it will not be able to complete unless you also adjust that value in the `podTemplate`.
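A minimal sketch of such an override for the local persistent volume case is shown below; the nodeSet name, count, and grace period are illustrative values only, not taken from this change:

[source,yaml]
----
spec:
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      spec:
        # give the pre-stop hook enough time to wait for data to be moved off the node
        terminationGracePeriodSeconds: 3600
        containers:
        - name: elasticsearch
          env:
          - name: PRE_STOP_SHUTDOWN_TYPE
            value: "remove"
----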
8 changes: 7 additions & 1 deletion pkg/controller/elasticsearch/configmap/configmap.go
@@ -17,6 +17,7 @@ import (
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/initcontainer"
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/label"
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/nodespec"
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/services"
"github.com/elastic/cloud-on-k8s/v2/pkg/utils/k8s"
)

@@ -43,11 +44,16 @@ func ReconcileScriptsConfigMap(ctx context.Context, c k8s.Client, es esv1.Elasti
return err
}

preStopScript, err := nodespec.RenderPreStopHookScript(services.InternalServiceURL(es))
if err != nil {
return err
}

scriptsConfigMap := NewConfigMapWithData(
types.NamespacedName{Namespace: es.Namespace, Name: esv1.ScriptsConfigMap(es.Name)},
map[string]string{
nodespec.ReadinessProbeScriptConfigKey: nodespec.ReadinessProbeScript,
nodespec.PreStopHookScriptConfigKey: nodespec.PreStopHookScript,
nodespec.PreStopHookScriptConfigKey: preStopScript,
initcontainer.PrepareFsScriptConfigKey: fsScript,
initcontainer.SuspendScriptConfigKey: initcontainer.SuspendScript,
initcontainer.SuspendedHostsFile: initcontainer.RenderSuspendConfiguration(es),
2 changes: 1 addition & 1 deletion pkg/controller/elasticsearch/nodespec/defaults.go
@@ -45,7 +45,7 @@ var (
func DefaultEnvVars(httpCfg commonv1.HTTPConfig, headlessServiceName string) []corev1.EnvVar {
return defaults.ExtendPodDownwardEnvVars(
[]corev1.EnvVar{
{Name: settings.EnvProbePasswordPath, Value: path.Join(esvolume.ProbeUserSecretMountPath, user.ProbeUserName)},
{Name: settings.EnvProbePasswordPath, Value: path.Join(esvolume.PodMountedUsersSecretMountPath, user.ProbeUserName)},
{Name: settings.EnvProbeUsername, Value: user.ProbeUserName},
{Name: settings.EnvReadinessProbeProtocol, Value: httpCfg.Protocol()},
{Name: settings.HeadlessServiceName, Value: headlessServiceName},
187 changes: 184 additions & 3 deletions pkg/controller/elasticsearch/nodespec/lifecycle_hook.go
@@ -5,10 +5,15 @@
package nodespec

import (
"bytes"
"path"
"path/filepath"
"text/template"

v1 "k8s.io/api/core/v1"

"github.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/label"
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/user"
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/volume"
)

@@ -20,7 +25,8 @@ func NewPreStopHook() *v1.LifecycleHandler {
}

const PreStopHookScriptConfigKey = "pre-stop-hook-script.sh"
const PreStopHookScript = `#!/usr/bin/env bash

var preStopHookScriptTemplate = template.Must(template.New("pre-stop").Parse(`#!/usr/bin/env bash

set -euo pipefail

@@ -36,5 +42,180 @@ set -euo pipefail
# target the Pod IP before Elasticsearch stops.
PRE_STOP_ADDITIONAL_WAIT_SECONDS=${PRE_STOP_ADDITIONAL_WAIT_SECONDS:=50}

sleep $PRE_STOP_ADDITIONAL_WAIT_SECONDS
`
# PRE_STOP_SHUTDOWN_TYPE controls the type of shutdown that will be communicated to Elasticsearch. This should not be
# changed to anything but restart. Specifically setting remove can lead to extensive data migration that might exceed the
# terminationGracePeriodSeconds and lead to an incomplete shutdown.
shutdown_type=${PRE_STOP_SHUTDOWN_TYPE:=restart}

# capture response bodies in a temp file for better error messages and to extract necessary information for subsequent requests
resp_body=$(mktemp)
trap "rm -f $resp_body" EXIT

script_start=$(date +%s)

# compute time in seconds since the given start time
function duration() {
local start=$1
end=$(date +%s)
echo $((end-start))
}

# use DNS errors as a proxy to abort this script early if there is no chance of successful completion
# DNS errors are for example expected when the whole cluster including its service is being deleted
# and the service URL can no longer be resolved even though we still have running Pods.
max_dns_errors=${PRE_STOP_MAX_DNS_ERRORS:=2}
global_dns_error_cnt=0

function request() {
local status exit
status=$(curl -k -sS -o $resp_body -w "%{http_code}" "$@")
exit=$?
if [ "$exit" -ne 0 ] || [ "$status" -lt 200 ] || [ "$status" -gt 299 ]; then
# track curl DNS errors separately
if [ "$exit" -eq 6 ]; then ((global_dns_error_cnt++)); fi
# make sure we have a non-zero exit code in the presence of errors
if [ "$exit" -eq 0 ]; then exit=1; fi
echo $status $resp_body
return $exit
fi
global_dns_error_cnt=0
return 0
}

function retry() {
local retries=$1
shift

local count=0
until "$@"; do
exit=$?
wait=$((2 ** count))
count=$((count + 1))
if [ $global_dns_error_cnt -gt $max_dns_errors ]; then
error_exit "too many DNS errors, giving up"
fi
if [ $count -lt "$retries" ]; then
printf "Retry %s/%s exited %s, retrying in %s seconds...\n" "$count" "$retries" "$exit" "$wait" >&2
sleep $wait
else
printf "Retry %s/%s exited %s, no more retries left.\n" "$count" "$retries" "$exit" >&2
return $exit
fi
done
return 0
}

function error_exit() {
echo $1 1>&2
exit 1
}

function delayed_exit() {
local elapsed=$(duration $script_start)
sleep $(($PRE_STOP_ADDITIONAL_WAIT_SECONDS - $elapsed))
exit 0
}

function is_master(){
labels="{{.LabelsFile}}"
grep 'master="true"' $labels
}

function supports_node_shutdown() {
local version="$1"
version=${version#[vV]}
major="${version%%\.*}"
minor="${version#*.}"
minor="${minor%.*}"
patch="${version##*.}"
# node shutdown is supported as of 7.15.2
if [ "$major" -lt 7 ] || ([ "$major" -eq 7 ] && [ "$minor" -eq 15 ] && [ "$patch" -lt 2 ]); then
return 1
fi
return 0
}

version=""
if [[ -f "{{.LabelsFile}}" ]]; then
# get Elasticsearch version from the downward API
version=$(grep "{{.VersionLabelName}}" {{.LabelsFile}} | cut -d '=' -f 2)
# remove quotes
version=$(echo "${version}" | tr -d '"')
fi

# if ES version does not support node shutdown exit early
if ! supports_node_shutdown $version; then
delayed_exit
fi

# setup basic auth if credentials are available
if [ -f "{{.PreStopUserPasswordPath}}" ]; then
PROBE_PASSWORD=$(<{{.PreStopUserPasswordPath}})
BASIC_AUTH="-u {{.PreStopUserName}}:${PROBE_PASSWORD}"
else
BASIC_AUTH=''
fi

ES_URL={{.ServiceURL}}

if is_master; then
retry 10 request -X POST "$ES_URL/_cluster/voting_config_exclusions?node_names=$POD_NAME" $BASIC_AUTH
# we ignore the error here and try to call at least node shutdown
fi

echo "retrieving node ID"
retry 10 request -X GET "$ES_URL/_cat/nodes?full_id=true&h=id,name" $BASIC_AUTH
if [ "$?" -ne 0 ]; then
error_exit $status
fi

NODE_ID=$(grep $POD_NAME $resp_body | cut -f 1 -d ' ')

# check if there is an ongoing shutdown request
request -X GET $ES_URL/_nodes/$NODE_ID/shutdown $BASIC_AUTH
if grep -q -v '"nodes":\[\]' $resp_body; then
delayed_exit
fi

echo "initiating node shutdown"
retry 10 request -X PUT $ES_URL/_nodes/$NODE_ID/shutdown $BASIC_AUTH -H 'Content-Type: application/json' -d"
{
\"type\": \"$shutdown_type\",
\"reason\": \"pre-stop hook\"
}
"
if [ "$?" -ne 0 ]; then
error_exit "Failed to call node shutdown API" $resp_body
fi

while :
do
echo "waiting for node shutdown to complete"
request -X GET $ES_URL/_nodes/$NODE_ID/shutdown $BASIC_AUTH
if [ "$?" -ne 0 ]; then
continue
fi
if grep -q -v 'IN_PROGRESS\|STALLED' $resp_body; then
break
fi
sleep 10
done

delayed_exit
`))

func RenderPreStopHookScript(svcURL string) (string, error) {
vars := map[string]string{
"PreStopUserName": user.PreStopUserName,
"PreStopUserPasswordPath": filepath.Join(volume.PodMountedUsersSecretMountPath, user.PreStopUserName),
// edge case: protocol change (http/https) combined with external node shutdown might not work out well due to
// script propagation delays. But it is not a legitimate production use case as users are not expected to change
// protocol on production systems
"ServiceURL": svcURL,
"LabelsFile": filepath.Join(volume.DownwardAPIMountPath, volume.LabelsFile),
"VersionLabelName": label.VersionLabelName,
}
var script bytes.Buffer
err := preStopHookScriptTemplate.Execute(&script, vars)
return script.String(), err
}
2 changes: 1 addition & 1 deletion pkg/controller/elasticsearch/nodespec/podspec_test.go
@@ -248,7 +248,7 @@ func TestBuildPodTemplateSpec(t *testing.T) {
initContainerEnv := defaults.ExtendPodDownwardEnvVars(
[]corev1.EnvVar{
{Name: "my-env", Value: "my-value"},
{Name: settings.EnvProbePasswordPath, Value: path.Join(esvolume.ProbeUserSecretMountPath, user.ProbeUserName)},
{Name: settings.EnvProbePasswordPath, Value: path.Join(esvolume.PodMountedUsersSecretMountPath, user.ProbeUserName)},
{Name: settings.EnvProbeUsername, Value: user.ProbeUserName},
{Name: settings.EnvReadinessProbeProtocol, Value: sampleES.Spec.HTTP.Protocol()},
{Name: settings.HeadlessServiceName, Value: HeadlessServiceName(esv1.StatefulSet(sampleES.Name, nodeSet.Name))},
2 changes: 1 addition & 1 deletion pkg/controller/elasticsearch/nodespec/volumes.go
@@ -29,7 +29,7 @@ func buildVolumes(
configVolume := settings.ConfigSecretVolume(esv1.StatefulSet(esName, nodeSpec.Name))
probeSecret := volume.NewSelectiveSecretVolumeWithMountPath(
esv1.InternalUsersSecret(esName), esvolume.ProbeUserVolumeName,
esvolume.ProbeUserSecretMountPath, []string{user.ProbeUserName},
esvolume.PodMountedUsersSecretMountPath, []string{user.ProbeUserName, user.PreStopUserName},
)
httpCertificatesVolume := volume.NewSecretVolumeWithMountPath(
certificates.InternalCertsSecretName(esv1.ESNamer, esName),
7 changes: 5 additions & 2 deletions pkg/controller/elasticsearch/user/predefined.go
@@ -29,10 +29,12 @@ const (

// ControllerUserName is the controller user to interact with ES.
ControllerUserName = "elastic-internal"
// ProbeUserName is used for the Elasticsearch readiness probe.
ProbeUserName = "elastic-internal-probe"
// MonitoringUserName is used for the Elasticsearch monitoring.
MonitoringUserName = "elastic-internal-monitoring"
// PreStopUserName is used for API interactions from the pre-stop Pod lifecycle hook
PreStopUserName = "elastic-internal-pre-stop"
// ProbeUserName is used for the Elasticsearch readiness probe.
ProbeUserName = "elastic-internal-probe"
)

// reconcileElasticUser reconciles a single secret holding the "elastic" user password.
@@ -84,6 +86,7 @@ func reconcileInternalUsers(
existingFileRealm,
users{
{Name: ControllerUserName, Roles: []string{SuperUserBuiltinRole}},
{Name: PreStopUserName, Roles: []string{ClusterManageRole}},
{Name: ProbeUserName, Roles: []string{ProbeUserRole}},
{Name: MonitoringUserName, Roles: []string{RemoteMonitoringCollectorBuiltinRole}},
},
16 changes: 8 additions & 8 deletions pkg/controller/elasticsearch/user/predefined_test.go
@@ -205,8 +205,8 @@ func Test_reconcileInternalUsers(t *testing.T) {
// passwords and hashes should be reused
require.Equal(t, []byte("controllerUserPassword"), u[0].Password)
require.Equal(t, []byte("$2a$10$lUuxZpa.ByS.Tid3PcMII.PrELwGjti3Mx1WRT0itwy.Ajpf.BsEG"), u[0].PasswordHash)
require.Equal(t, []byte("probeUserPassword"), u[1].Password)
require.Equal(t, []byte("$2a$10$8.9my2W7FVDqDnh.E1RwouN5RzkZGulQ3ZMgmoy3CH4xRvr5uYPbS"), u[1].PasswordHash)
require.Equal(t, []byte("probeUserPassword"), u[2].Password)
require.Equal(t, []byte("$2a$10$8.9my2W7FVDqDnh.E1RwouN5RzkZGulQ3ZMgmoy3CH4xRvr5uYPbS"), u[2].PasswordHash)
},
},
{
@@ -229,9 +229,9 @@ func Test_reconcileInternalUsers(t *testing.T) {
require.Equal(t, []byte("controllerUserPassword"), u[0].Password)
require.Equal(t, []byte("$2a$10$lUuxZpa.ByS.Tid3PcMII.PrELwGjti3Mx1WRT0itwy.Ajpf.BsEG"), u[0].PasswordHash)
// password of probe user should be reused, but hash should be re-computed
require.Equal(t, []byte("probeUserPassword"), u[1].Password)
require.Equal(t, []byte("probeUserPassword"), u[2].Password)
require.NotEmpty(t, u[1].PasswordHash)
require.NotEqual(t, "does-not-match-password", u[1].PasswordHash)
require.NotEqual(t, "does-not-match-password", u[2].PasswordHash)
},
},
{
@@ -254,8 +254,8 @@ func Test_reconcileInternalUsers(t *testing.T) {
require.Equal(t, []byte("controllerUserPassword"), u[0].Password)
require.Equal(t, []byte("$2a$10$lUuxZpa.ByS.Tid3PcMII.PrELwGjti3Mx1WRT0itwy.Ajpf.BsEG"), u[0].PasswordHash)
// password of probe user should be reused, and hash should be re-computed
require.Equal(t, []byte("probeUserPassword"), u[1].Password)
require.NotEmpty(t, u[1].PasswordHash)
require.Equal(t, []byte("probeUserPassword"), u[2].Password)
require.NotEmpty(t, u[2].PasswordHash)
},
},
}
@@ -265,9 +265,9 @@ func Test_reconcileInternalUsers(t *testing.T) {
got, err := reconcileInternalUsers(context.Background(), c, es, tt.existingFileRealm, testPasswordHasher)
require.NoError(t, err)
// check returned users
require.Len(t, got, 3)
require.Len(t, got, 4)
controllerUser := got[0]
probeUser := got[1]
probeUser := got[2]
// names and roles are always the same
require.Equal(t, ControllerUserName, controllerUser.Name)
require.Equal(t, []string{SuperUserBuiltinRole}, controllerUser.Roles)
6 changes: 3 additions & 3 deletions pkg/controller/elasticsearch/user/reconcile_test.go
@@ -89,13 +89,13 @@ func Test_aggregateFileRealm(t *testing.T) {
require.NoError(t, err)
require.NotEmpty(t, controllerUser.Password)
actualUsers := fileRealm.UserNames()
require.ElementsMatch(t, []string{"elastic", "elastic-internal", "elastic-internal-probe", "elastic-internal-monitoring", "user1", "user2", "user3"}, actualUsers)
require.ElementsMatch(t, []string{"elastic", "elastic-internal", "elastic-internal-pre-stop", "elastic-internal-probe", "elastic-internal-monitoring", "user1", "user2", "user3"}, actualUsers)
}

func Test_aggregateRoles(t *testing.T) {
c := k8s.NewFakeClient(sampleUserProvidedRolesSecret...)
roles, err := aggregateRoles(context.Background(), c, sampleEsWithAuth, initDynamicWatches(), record.NewFakeRecorder(10))
require.NoError(t, err)
require.Len(t, roles, 52)
require.Contains(t, roles, ProbeUserRole, "role1", "role2")
require.Len(t, roles, 53)
require.Contains(t, roles, ProbeUserRole, ClusterManageRole, "role1", "role2")
}
9 changes: 6 additions & 3 deletions pkg/controller/elasticsearch/user/roles.go
@@ -7,10 +7,10 @@ package user
import (
"fmt"

"gopkg.in/yaml.v3"

beatv1beta1 "github.com/elastic/cloud-on-k8s/v2/pkg/apis/beat/v1beta1"
esclient "github.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/client"

"gopkg.in/yaml.v3"
)

const (
@@ -19,6 +19,8 @@ const (

// SuperUserBuiltinRole is the name of the built-in superuser role.
SuperUserBuiltinRole = "superuser"
// ClusterManageRole is the name of a custom role to manage the cluster.
ClusterManageRole = "elastic-internal_cluster_manage"
// ProbeUserRole is the name of the role used by the internal probe user.
ProbeUserRole = "elastic_internal_probe_user"
// RemoteMonitoringCollectorBuiltinRole is the name of the built-in remote_monitoring_collector role.
@@ -58,7 +60,8 @@ var (
var (
// PredefinedRoles to create for internal needs.
PredefinedRoles = RolesFileContent{
ProbeUserRole: esclient.Role{Cluster: []string{"monitor"}},
ProbeUserRole: esclient.Role{Cluster: []string{"monitor"}},
ClusterManageRole: esclient.Role{Cluster: []string{"manage"}},
ApmUserRoleV6: esclient.Role{
Cluster: []string{"monitor", "manage_index_templates"},
Indices: []esclient.IndexRole{