Heartbeats #4

ethanrowe · 2014-05-30T15:44:32Z

This addresses:

The unit tests are a tad coupled to the implementation. This stuff would be better done with an integration test, which is something I'm happy to work out a little later; we're still in an experimental phase, and I'm okay with using TDD at the unit level to make sure things are pieced together right, using manual runs to verify the full end-to-end stuff.

This adds an AgentProcess class to facilitate better control of the consul agent, monitor its state, and give more flexibility in what we do with it concurrently. (It would be nicer to put something like this within a god or bluepill app, but I think the various processes are too interdependent for either to be a good choice. Plus I want a single entry point that stays in the foreground, ideal for running in a container.)

The AgentProcess class is instantiated with the args one wants passed to "consul agent", and:
* It launches the thing and waits in a separate thread; we get immediate notification on exit.
* It attempts to verify that the agent is up by running "consul info" against it, which will give a non-zero exit code if the agent isn't up.
* It provides for "on up" callbacks and invokes them when verification succeeds.

Next things:
* handle stopping
* handle signals
* revise launch_and_join to use the new object.
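The launch, watch, and verify steps described above can be sketched roughly as follows. This is a minimal illustration, not the auto-consul source: the `WatchedProcess` name, its constructor arguments, the `:starting` status, and the retry settings are all assumptions made for the example. The check command stands in for "consul info".

```ruby
# Hypothetical stand-in for the AgentProcess described above. The class name,
# constructor arguments, and retry settings are assumptions for illustration.
class WatchedProcess
  attr_reader :status, :exit_code, :thread

  # command: argv array for the long-running process (e.g. "consul agent ...").
  # check:   argv array whose zero exit status means "up" (e.g. "consul info").
  def initialize(command, check)
    @command = command
    @check = check
    @status = :not_started
    @on_up = []
    @lock = Mutex.new
  end

  # Register a block to run once verification succeeds.
  def on_up(&block)
    @on_up << block
  end

  # Spawn the process and wait on it in a separate thread, so an exit is
  # noticed as soon as it happens.
  def launch!
    pid = Process.spawn(*@command)
    @status = :starting
    @thread = Thread.new do
      _, result = Process.wait2(pid)
      @lock.synchronize do
        @exit_code = result.exitstatus
        @status = :down
      end
    end
  end

  # Poll the check command until it exits zero, the process dies, or the
  # attempts run out. Returns true only if the process was verified as up.
  def verify_up!(attempts = 10, pause = 0.2)
    attempts.times do
      return false if @status == :down
      if system(*@check, out: File::NULL, err: File::NULL)
        verified = @lock.synchronize do
          @status = :up if @status == :starting
          @status == :up
        end
        @on_up.each(&:call) if verified
        return verified
      end
      sleep pause
    end
    false
  end
end
```

Keeping the waiting thread separate from verification means a process that dies during startup is reported as `:down` rather than being polled until the attempts run out.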

AutoConsul::Runner::AgentProcess#stop! will signal the running agent with SIGINT to cause a clean shutdown. It'll fire :on_stopping callbacks first; these are registered with AgentProcess#on_stopping. It'll put itself into :stopping status before signaling.

When the process actually goes down, the AgentProcess#thread, which is already waiting on that process, will see it and flip the status to :down. Switching to :down will run callbacks registered with the AgentProcess#on_down method.

Tweaks:
- fixed mistake in formatting of :spawn command
- AgentProcess#launch! puts the agent runner thread into :abort_on_exception mode, so a failed call to consul will blow everything up.
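A rough sketch of that stop sequence, under stated assumptions: `StoppableProcess` is a hypothetical name, the "consul info" verification is skipped so the process is treated as `:up` right after spawning, and the command is any long-running process rather than a real consul agent.

```ruby
# Hypothetical sketch of the stop sequence; method names mirror the
# description above, but this is not the auto-consul implementation.
class StoppableProcess
  attr_reader :status, :thread

  def initialize(command)
    @command = command
    @status = :not_started
    @callbacks = { stopping: [], down: [] }
  end

  def on_stopping(&block)
    @callbacks[:stopping] << block
  end

  def on_down(&block)
    @callbacks[:down] << block
  end

  def launch!
    @pid = Process.spawn(*@command)
    @status = :up # simplification: verification is skipped in this sketch
    @thread = Thread.new do
      Process.wait(@pid)
      # The waiting thread notices the exit and flips the status.
      @status = :down
      @callbacks[:down].each(&:call)
    end
    # A failure in the waiting thread takes the whole program down.
    @thread.abort_on_exception = true
    self
  end

  def stop!
    return unless @status == :up
    @callbacks[:stopping].each(&:call) # callbacks fire first
    @status = :stopping                # then the status changes
    begin
      Process.kill('INT', @pid)        # then the process is signaled
    rescue Errno::ESRCH
      # The process already exited; the waiting thread handles :down.
    end
  end
end
```

Note that `stop!` never sets `:down` itself. Only the waiting thread does, so the status reflects what the process actually did rather than what was requested.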

AutoConsul::Runner::AgentProcess:
- #run! will just #launch! and #verify_up!, purely for convenience.
- #wait will wait for the agent to stop (by joining the waiting thread) and return the exit code. Also for convenience.

So a basic usage pattern would be:

    runner = AutoConsul::Runner::AgentProcess.new(%w[ -server -bootstrap -node my-name ])
    runner.run!
    # Exit with status code returned by the agent process.
    exit(runner.wait)

AutoConsul::Runner refactored methods to use the new AgentProcess class for execution:
- Runner::run_agent! -> Runner::agent_runner
- Runner::run_server! -> Runner::server_runner
- Same params as before
- Both return an AgentProcess with cluster join logic added in an :on_up callback.

Revised the main entry point to account for this, and to use exit codes for the various commands.

To facilitate concurrent operations more easily, the AutoConsul::Runner::AgentProcess#while_up helper lets you supply a block that you want to run in parallel while the agent is running. When the AgentProcess moves to an :up status, your block will start in a new thread. When the AgentProcess moves to either :stopping or :down states, that thread will be killed. If you want to poll things or write out periodic metrics or whatever, this is a reasonable way to hook into the lifecycle of the agent.
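One way such a helper could be built on top of status callbacks is sketched below. `LifecycleSketch`, `on`, and `change_status` are hypothetical names for illustration only; the real class drives status changes from the agent process rather than from explicit calls.

```ruby
# Hypothetical sketch of a while_up helper built on status callbacks.
class LifecycleSketch
  attr_reader :status

  def initialize
    @status = :not_started
    @callbacks = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a block to run whenever the given status is entered.
  def on(status, &block)
    @callbacks[status] << block
  end

  # Stand-in for the transitions the real class makes as the agent
  # starts, stops, and exits.
  def change_status(status)
    @status = status
    @callbacks[status].each(&:call)
  end

  # Run the block in its own thread while the status is :up, and kill that
  # thread when the status moves to :stopping or :down.
  def while_up(&block)
    worker = nil
    on(:up) { worker = Thread.new(&block) }
    reap = lambda do
      worker.kill if worker
      worker = nil
    end
    on(:stopping, &reap)
    on(:down, &reap)
  end
end
```

Because the worker is killed rather than asked to finish, a block passed to `while_up` should be safe to interrupt at any point, for example a loop that polls and sleeps.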

Add a "-t" or "--ticks" option that, when used with the "run" command, causes heartbeats to issue at the specified interval in a background thread as long as the agent is in the "up" state. This means we get a single entry point for running the full discoverable agent. Additionally, the "-d" option was broken and only used the default of "/tmp/consul/state". Now it's properly honored, which means it'll blow up if you don't have permissions to write to the requested path.
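As an illustration of the "--ticks" idea, here is a small sketch using Ruby's standard OptionParser. Whether auto-consul itself uses OptionParser, and whether the interval is expressed in seconds, are assumptions; `parse_ticks` and `heartbeat_loop` are hypothetical helper names.

```ruby
require 'optparse'

# Parse a "-t"/"--ticks" interval from an argv-style array.
# Treating the value as seconds is an assumption of this sketch.
def parse_ticks(argv)
  options = { ticks: nil }
  parser = OptionParser.new do |opts|
    opts.on('-t', '--ticks SECONDS', Float, 'Heartbeat interval') do |value|
      options[:ticks] = value
    end
  end
  parser.parse(argv)
  options
end

# Issue a heartbeat at a fixed interval in a background thread.
# The caller is responsible for killing the thread when the agent stops.
def heartbeat_loop(interval, &beat)
  Thread.new do
    loop do
      beat.call
      sleep interval
    end
  end
end
```

When no "-t" is given, `options[:ticks]` stays `nil`, which a caller could treat as "do not start a heartbeat thread".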

AutoConsul::Runner::AgentProcess:
- "consul agent" is spawned into its own process group, so we can control signals through the parent process.
- This means a SIGKILL to the parent won't go to the agent process, which is by design.
- SIGTERM and SIGINT are trapped and result in a stop! call to cause a clean shutdown.
- To avoid thread deadlocks, the signal handler puts the received signal onto a Queue; a separate thread pops from the queue and attempts the stop! call.
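The signal handling described above might look something like this sketch. The helper names `install_signal_queue` and `spawn_in_own_group` are hypothetical; `Signal.trap`, `Queue`, and the `pgroup:` option of `Process.spawn` are standard Ruby. In the real code the block would call stop! on the AgentProcess.

```ruby
# Trap the given signals, but do no real work inside the handler: just push
# the signal name onto a queue. A separate thread pops from the queue and
# runs the supplied block, which avoids doing lock-dependent work in trap
# context.
def install_signal_queue(signals = %w[INT TERM], &on_signal)
  queue = Queue.new
  signals.each do |name|
    Signal.trap(name) { queue << name }
  end
  Thread.new do
    loop do
      on_signal.call(queue.pop)
    end
  end
end

# Spawn a command as the leader of a new process group, so signals aimed at
# the parent's group (for example Ctrl-C in a terminal) do not reach it
# directly.
def spawn_in_own_group(*command)
  Process.spawn(*command, pgroup: true)
end
```

With this arrangement the parent decides when and how the child is told to stop, instead of both receiving the same terminal signal at once.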

Addresses issues: * #1 * #2 * #3

ethanrowe added 8 commits May 28, 2014 12:35

Update README to reflect new -t option.

1547500

ethanrowe added a commit that referenced this pull request May 30, 2014

Merge pull request #4 from ethanrowe/heartbeats

6c63e95

Addresses issues: * #1 * #2 * #3

ethanrowe merged commit 6c63e95 into master May 30, 2014
