-
Notifications
You must be signed in to change notification settings - Fork 91
Add proxy watcher and auto-configuration for Envoy #32
Conversation
Just a note: this will change completely once Envoy has CDS with polling. We won't need to do hot restarts. Will review tomorrow AM. |
Yeah, the plan is that CDS/RDS would fill out most of the config at runtime. We still need first-time config generation in the proxy container, so the code will probably stay. |
|
||
// Spin up a new Envoy process. | ||
cmd := exec.Command("envoy", "-c", s.config, "--restart-epoch", fmt.Sprint(restartEpoch)) | ||
cmd := exec.Command(s.binary, "-c", fname, "--restart-epoch", fmt.Sprint(s.epoch), | ||
"--drain-time-s", "60", "--parent-shutdown-time-s", "90") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also recently discovered the --drain-time-s
and --parent-shutdown-time-s
flags were needed for quicker envoy config file change propagation in amalgam8 . . . what factors contributed to you deciding on 60 and 90 seconds for these values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommendation from Envoy documentation.
Ultimately, this delay should be determined by the frequency of restarts, and we do not know that yet.
60s seems like reasonable time.
|
||
// Add the new Envoy process to the known set of running Envoy processes. | ||
s.cmdMap[cmd] = restartEpoch | ||
s.cmdMap[cmd] = s.epoch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To successfully launch a new Envoy process that will replace the running Envoy processes, the restart epoch of the new process must be exactly 1 greater than the highest restart epoch of the currently running Envoy processes. The original restart epoch logic handled the case where envoy may have crashed. This PR always increments the restart epoch but does not handle the case where the last process exited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are gaps in the the restart epoch, it will throw a core dump
# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 130 54 0 18:54 ? 00:00:00 envoy -c /etc/envoy/envoy.json --restart-epoch 2
# envoy -c /etc/envoy/envoy.json --restart-epoch 4 &
[4] 140
root@06314a4f87f1:/# [2017-01-10 18:54:56.513][140][warning][main] initializing epoch 4 (hot restart version=3.2490504)
[2017-01-10 18:54:56.514][140][critical][assert] assert failure: rc != -1: /source/source/exe/hot_restart.cc:251
[4]+ Aborted (core dumped) envoy -c /etc/envoy/envoy.json --restart-epoch 4
# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 130 54 0 18:54 ? 00:00:00 envoy -c /etc/envoy/envoy.json --restart-epoch 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out!
This was explained in the code but I was looking at the Envoy documentation which is not as explicit.
There is still a race condition possible if the latest Envoy dies in the middle of Reload().
I added a short delay to avoid stressing this race. I really think Envoy should relax that epoch+1 requirement; otherwise, I just don't see how to solve this race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Add proxy watcher and auto-configuration for Envoy * Make the hot restart faster: 90s instead of 15m * Correct restart epoch and active config determination
Complete the proxy supervising process basic implementation.
In this configuration, proxy container has a watcher process that watches for changes to CDS and triggers a hot-reload in Envoy. The watcher creates an Envoy config file from CDS service definitions and inserts a reference to another manager instance acting as SDS.
Eventually, once CDS API is available, we do not need the hot reload. Hot reload would still be necessary for RDS or anything else that's not captured by Envoy runtime APIs.