Skip to content

Commit

Permalink
Remove outdated kill method to stop sidecar
Browse files Browse the repository at this point in the history
Sending the kill signal was needed for Istio < v1.3 and 1.3 has been
released 4 years ago.

If the `ENVOY_ADMIN_API` is set to the default port `15000` the
`ISTIO_QUIT_API` configured correctly by default.
On deployments that restart the containers the shutdown behaviour might
be undesirable.
The Istio sidecar will be restarted but when the pod has multiple
containers that means that meaningful work in the containers that did
not crash could be interrupted.

So for pods that have multiple containers that need traffic through the
service mesh, it is recommended to set `NEVER_KILL_ISTIO` to `true`.
  • Loading branch information
kvij committed Oct 1, 2023
1 parent 263551b commit 5069fe7
Show file tree
Hide file tree
Showing 4 changed files with 75 additions and 54 deletions.
27 changes: 2 additions & 25 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ will poll indefinitely with backoff, waiting for envoy to report itself as live,

All signals are passed to the underlying application. Be warned that `SIGKILL` cannot be passed, so this can leave behind a orphaned process.

When the application exits, unless `NEVER_KILL_ISTIO_ON_FAILURE` has been set and the exit code is non-zero, `scuttle` will instruct envoy to shut down immediately.
When the application exits, unless `NEVER_KILL_ISTIO_ON_FAILURE` or `NEVER_KILL_ISTIO` have been set to `true`, `scuttle` will instruct envoy to shut down immediately.

## Environment variables

Expand All @@ -19,34 +19,11 @@ When the application exits, unless `NEVER_KILL_ISTIO_ON_FAILURE` has been set an
| `SCUTTLE_LOGGING` | If provided and set to `true`, `scuttle` will log various steps to the console which is helpful for debugging |
| `START_WITHOUT_ENVOY` | If provided and set to `true`, `scuttle` will not wait for envoy to be LIVE before starting the main application. However, it will still instruct envoy to exit. |
| `WAIT_FOR_ENVOY_TIMEOUT` | If provided and set to a valid `time.Duration` string greater than 0 seconds, `scuttle` will wait for that amount of time before starting the main application. By default, it will wait indefinitely. If `QUIT_WITHOUT_ENVOY_TIMEOUT` is set as well, it will take precedence over this variable |
| `ISTIO_QUIT_API` | If provided `scuttle` will send a POST to `/quitquitquit` at the given API. Should be in format `http://127.0.0.1:15020`. This is intended for Istio v1.3 and higher. When not given, Istio will be stopped using a `pkill` command. |
| `ISTIO_QUIT_API` | This is the path to envoy's pilot agent interface, in the format `http://127.0.0.1:15020`. If not provided and the `ENVOY_ADMIN_API` is configured with the default port `15000`, the setting is configured automatically. If present (configured or deducted) `scuttle` will send a POST to `/quitquitquit` at the url. |
| `GENERIC_QUIT_ENDPOINTS` | If provided `scuttle` will send a POST to the URL given. Multiple URLs are supported and must be provided as a CSV string. Should be in format `http://myendpoint.com` or `http://myendpoint.com,https://myotherendpoint.com`. The status code response is logged (if logging is enabled) but is not used. A 200 is treated the same as a 404 or 500. `GENERIC_QUIT_ENDPOINTS` is handled before Istio is stopped. |
| `QUIT_REQUEST_TIMEOUT` | A deadline provided as a valid `time.Duration` string for requests to the `/quitquitquit` and/or the generic endpoints. If the deadline is exceeded `scuttle` gives up and exits cleanly. The default value is `5s`. |
| `QUIT_WITHOUT_ENVOY_TIMEOUT` | If provided and set to a valid duration, `scuttle` will exit if Envoy does not become available before the end of the timeout and not continue with the passed in executable. If `START_WITHOUT_ENVOY` is also set, this variable will not be taken into account. Also, if `WAIT_FOR_ENVOY_TIMEOUT` is set, this variable will take precedence. |

## How Scuttle stops Istio

Scuttle has two methods to stop Istio. You should configure Scuttle appropriately based on the version of Istio you are using.

| Istio Version | Method |
|---------------|--------|
| 1.3 and higher| `/quitquitquit` endpoint |
| 1.2 and lower | `pkill` command |

### Istio 1.3 and higher

Version 1.3 of Istio introduced an endpoint `/quitquitquit` similar to Envoy. By default this endpoint is available at `http://127.0.0.1:15020` which is the Pilot Agent service, responsible for managing envoy. ([Source](https://github.com/istio/istio/issues/15041))

To enable this, set the environment variable `ISTIO_QUIT_API` to `http://127.0.0.1:15020`.

### Istio 1.2 and lower

Versions 1.2 and lower of Istio have no supported method to stop Istio Sidecars. As a workaround Scuttle stops Istio using the command `pkill -SIGINT pilot-agent`.

To enable this, you must add `shareProcessNamespace: true` to your **Pod** definition in Kubernetes. This allows Scuttle to stop the service running on the sidecar container.

*Note:* This method is used by default if `ISTIO_QUIT_API` is not set

## Example usage in your Job's `Dockerfile`

```dockerfile
Expand Down
29 changes: 2 additions & 27 deletions main.go
Original file line number Diff line number Diff line change
Expand Up @@ -127,11 +127,6 @@ func kill(exitCode int) {
case config.NeverKillIstioOnFailure && exitCode != 0:
log(fmt.Sprintf(logLineUnformatted, "Skipping Istio kill", "NEVER_KILL_ISTIO_ON_FAILURE is true", exitCode))
os.Exit(exitCode)
case config.IstioQuitAPI == "":
// No istio API sent, fallback to Pkill method
log(fmt.Sprintf(logLineUnformatted, "Stopping Istio with pkill", "ISTIO_QUIT_API is not set", exitCode))
killGenericEndpoints()
killIstioWithPkill()
default:
// Stop istio using api
log(fmt.Sprintf(logLineUnformatted, "Stopping Istio with API", "ISTIO_QUIT_API is set", exitCode))
Expand Down Expand Up @@ -167,35 +162,15 @@ func killGenericEndpoints() {
func killIstioWithAPI() {
log(fmt.Sprintf("Stopping Istio using Istio API '%s' (intended for Istio >v1.2)", config.IstioQuitAPI))

responseSuccess := false
ctx, cancel := context.WithTimeout(context.Background(), config.QuitRequestTimeout)
defer cancel()
url := fmt.Sprintf("%s/quitquitquit", config.IstioQuitAPI)
code, err := postKill(ctx, url)
if err != nil {
log(fmt.Sprintf("Sent quitquitquit to Istio, error: %d", err))
} else {
log(fmt.Sprintf("Sent quitquitquit to Istio, status code: %d", code))
responseSuccess = code >= 200 && code < 300
}

if !responseSuccess && config.IstioFallbackPkill {
log(fmt.Sprintf("quitquitquit failed, will attempt pkill method"))
killIstioWithPkill()
}
}

func killIstioWithPkill() {
log("Stopping Istio using pkill command (intended for Istio <v1.3)")

cmd := exec.Command("sh", "-c", "pkill -SIGINT pilot-agent")
_, err := cmd.Output()
if err == nil {
log("Process pilot-agent successfully stopped")
} else {
errorMessage := err.Error()
log("pilot-agent could not be stopped, err: " + errorMessage)
return
}
log(fmt.Sprintf("Sent quitquitquit to Istio, status code: %d", code))
}

func waitForEnvoy() context.Context {
Expand Down
22 changes: 20 additions & 2 deletions scuttle_config.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,15 @@ package main

import (
"fmt"
"net/url"
"os"
"strings"
"time"
)

const defaultAdminAPIPort = 15000
const defaultQuitAPIPort = 15020

// ScuttleConfig ... represents Scuttle's configuration based on environment variables or defaults.
type ScuttleConfig struct {
LoggingEnabled bool
Expand All @@ -15,7 +19,6 @@ type ScuttleConfig struct {
WaitForEnvoyTimeout time.Duration
IstioQuitAPI string
NeverKillIstio bool
IstioFallbackPkill bool
NeverKillIstioOnFailure bool
GenericQuitEndpoints []string
QuitRequestTimeout time.Duration
Expand All @@ -38,13 +41,16 @@ func getConfig() ScuttleConfig {
WaitForEnvoyTimeout: getDurationFromEnv("WAIT_FOR_ENVOY_TIMEOUT", time.Duration(0), loggingEnabled),
IstioQuitAPI: getStringFromEnv("ISTIO_QUIT_API", "", loggingEnabled),
NeverKillIstio: getBoolFromEnv("NEVER_KILL_ISTIO", false, loggingEnabled),
IstioFallbackPkill: getBoolFromEnv("ISTIO_FALLBACK_PKILL", false, loggingEnabled),
NeverKillIstioOnFailure: getBoolFromEnv("NEVER_KILL_ISTIO_ON_FAILURE", false, loggingEnabled),
GenericQuitEndpoints: getStringArrayFromEnv("GENERIC_QUIT_ENDPOINTS", make([]string, 0), loggingEnabled),
QuitRequestTimeout: getDurationFromEnv("QUIT_REQUEST_TIMEOUT", time.Second*5, loggingEnabled),
QuitWithoutEnvoyTimeout: getDurationFromEnv("QUIT_WITHOUT_ENVOY_TIMEOUT", time.Duration(0), loggingEnabled),
}

if config.IstioQuitAPI == "" {
config.IstioQuitAPI = replacePort(config.EnvoyAdminAPI, defaultAdminAPIPort, defaultQuitAPIPort)
}

return config
}

Expand Down Expand Up @@ -124,3 +130,15 @@ func getDurationFromEnv(name string, defaultVal time.Duration, logEnabled bool)

return defaultVal
}

// replacePort returns a URL with the port replaced when the sourceURL is valid and has the original port set.
// If the original port does not match or the sourceURL is invalid an empty string is returned.
func replacePort(sourceURL string, original, replacement int) string {
u, err := url.Parse(sourceURL)
if err != nil || (u.Port() != fmt.Sprintf("%d", original)) {
return ""
}

u.Host = fmt.Sprintf("%s:%d", u.Hostname(), replacement)
return u.String()
}
51 changes: 51 additions & 0 deletions scuttle_config_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
package main

import "testing"

func Test_replacePort(t *testing.T) {
type args struct {
sourceURL string
original int
replacement int
}
tests := []struct {
name string
args args
want string
}{
{
name: "Happy flow",
args: args{
sourceURL: "http://localhost:15000",
original: 15000,
replacement: 15020,
},
want: "http://localhost:15020",
},
{
name: "Invalid URL",
args: args{
sourceURL: "notaurl^^ :15000",
original: 15000,
replacement: 15020,
},
want: "",
},
{
name: "Port not matching",
args: args{
sourceURL: "http://localhost:14000",
original: 15000,
replacement: 15020,
},
want: "",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := replacePort(tt.args.sourceURL, tt.args.original, tt.args.replacement); got != tt.want {
t.Errorf("replacePort() = %v, want %v", got, tt.want)
}
})
}
}

0 comments on commit 5069fe7

Please sign in to comment.