"wait" command in einhornsh #14

Closed
wants to merge 3 commits

Conversation

@ConradIrwin
Contributor

ConradIrwin commented Dec 6, 2012

Hi guys,

This seems like a pretty useful thing to have for two reasons:

  1. It's nice if the upgrade command doesn't exit until the upgrade is done.
  2. It's nice to get an idea of what einhorn is intending (the logs tell you only what it's doing).

In particular I want to integrate the below script into our deploy so that we can see what the problem is if something is taking a surprising amount of time:

echo "wait upgrade" | einhornsh "$@" & to_wait="$to_wait $!"
tail -f /var/log/app.log & trap "kill $!" EXIT
wait $to_wait

(It sounds like something like this would have helped in Issue #2, and this solves Issue #6 to some extent)

Before I had "wait upgrade", I was just looping on echo state | einhornsh and waiting for the number of unique versions to reach 1.

That said, it introduces quite a lot of extra clunk to the einhornsh interface, so it may not be the best change to enable this kind of thing. A simpler interface might be to have a "ready" command so that I can loop on 'echo "ready" | einhornsh' instead.

Are you interested in having something like this in Einhorn, and if so, is this a sane implementation, or should I rethink?

Conrad

ConradIrwin added some commits Dec 6, 2012

Allow einhornsh to wait for state to stabilize
This lets the deployment process wait until the upgrade is finished
before returning, which should make it easier to notice problems.
@gdb


Contributor

gdb commented Dec 7, 2012

I like this functionality a lot.

I think we can make the implementation a lot more elegant (and possibly reusable for other potential features, such as attaching a client to stream the einhorn logs). I'd love to see better abstraction here.

One possible answer is implementing this as another class under event/ (kind of like Timer) -- let's say WorkerStreamer. New clients could either attach themselves to a global WorkerStreamer, or possibly get their own instance. Not sure which is cleaner, but I suspect you'll want the latter since that way you can e.g. retain a buffer of messages destined for a particular client, whereas (I believe) the current implementation can miss some state changes (in between shell:wait messages).
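
(For concreteness, a minimal sketch of the per-client flavor — the class name, buffer API, and the way it hooks into the event loop are all hypothetical here:)

    require 'set'

    # Hypothetical sketch of a per-client streamer under event/, kept in a
    # registry so einhorn can publish worker-state changes to every waiter.
    module Einhorn
      module Event
        class WorkerStreamer
          @@streamers = Set.new

          # Broadcast a state-change message to all attached clients.
          def self.publish(message)
            @@streamers.each { |streamer| streamer.queue(message) }
          end

          def initialize(conn)
            @conn = conn   # the client's command connection
            @buffer = []   # per-client buffer, so no state change is missed
            @@streamers << self
          end

          def queue(message)
            @buffer << message
            @conn.write(@buffer.shift) until @buffer.empty?
          end

          # Called when the client's connection goes away.
          def close
            @@streamers.delete(self)
          end
        end
      end
    end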

What do you think?

@@ -5,6 +5,8 @@ module Einhorn::Command
module Interface
@@commands = {}
@@command_server = nil
+ @@waiting = Set.new


@gdb

gdb Dec 7, 2012

Contributor

As a general rule, you'll probably not want to add any new variables here, but instead use Einhorn::State.

Einhorn::State persists across reloads, whereas all of these variables will be blown away.
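
(To illustrate the difference — the waiting accessor below is hypothetical, just following the pattern of Einhorn::State's other fields:)

    # A class variable is recreated empty when Einhorn re-execs itself,
    # silently dropping every waiting client:
    @@waiting = Set.new

    # State hung off Einhorn::State instead survives the reload, because
    # that structure is serialized and handed to the new process.
    # ("waiting" is a hypothetical field here.)
    Einhorn::State.waiting ||= Set.new
    Einhorn::State.waiting << conn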

@gdb


Contributor

gdb commented Dec 7, 2012

Oh wait, I just realized I misread how you're doing the waiting -- I think I was wrong, and you won't actually lose state changes. I guess right now you're effectively queuing them in the connection's write buffer, and each time you send a shell:wait you pop another line?

- reconnect
-
- retry
+ if command == 'wait'


@gdb

gdb Dec 7, 2012

Contributor

I think the shell can be made unaware of "wait", but instead just always interpret the wait flag on the response message. That seems like a much more general strategy.

+ puts update['message']
+ end while update['wait']
+ rescue Interrupt => e
+ send_command('shell:unwait')


@gdb

gdb Dec 7, 2012

Contributor

This shouldn't be necessary -- the connection should be automatically removed from the waiting list when it's closed.


@ConradIrwin

ConradIrwin Dec 7, 2012

Contributor

I wanted to be able to stop waiting and then run other commands. Without this, I think subsequent commands would see messages intended for the waiter (as they're both multiplexed on the same socket). Now that I think harder about it, this solution is a little racy, but it should be fixable.


@gdb

gdb Dec 7, 2012

Contributor

I see. That seems reasonable. I suspect that to support that, the wire protocol should grow message ids so you can correlate requests and responses. And then you could theoretically do things like leave a wait running in the background while issuing other commands.
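
(A rough sketch of the idea, assuming a line-oriented JSON framing — the actual einhorn wire format and field names may differ:)

    require 'json'

    # Hypothetical client-side framing: every request carries an id, and
    # every response (including streamed wait updates) echoes it back, so
    # a background wait can share the socket with ordinary commands.
    class MultiplexedClient
      def initialize(socket)
        @socket = socket
        @next_id = 0
        @handlers = {}   # request id => callback for its responses
      end

      def send_command(command, &handler)
        id = (@next_id += 1)
        @handlers[id] = handler
        @socket.puts(JSON.generate('id' => id, 'command' => command))
        id
      end

      # Read one response and route it to the request it answers.
      def pump
        response = JSON.parse(@socket.gets)
        @handlers[response['id']].call(response)
      end
    end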


+ command 'shell:wait' do |conn, _|
+ unless @@waiting.include?(conn)
+ @@waiting << conn
+ notify_waiters :always => true


@gdb

gdb Dec 7, 2012

Contributor

If I'm reading this right, whenever a new client runs shell:wait, all clients will get a state message. That's probably not desirable (and another reason to create a WorkerStreamer instance per client).

+ return if @@waiting.length == 0
+
+ bits = []
+ if Einhorn::WorkerPool.signaled_workers.length > 0


@gdb

gdb Dec 7, 2012

Contributor

If we end up going down the WorkerStreamer-per-connection route, probably still want to make sure that this code gets invoked at most once per tick.
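
(One hedged way to arrange that, assuming the event loop exposes some notion of the current tick — the names below are made up:)

    # Hypothetical guard: coalesce wake-ups so the waiter-notification code
    # runs at most once per event-loop tick, however many streamers exist.
    def self.notify_waiters_for(tick)
      return if @last_notified_tick == tick
      @last_notified_tick = tick
      notify_waiters
    end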


@ConradIrwin

ConradIrwin Dec 7, 2012

Contributor

In particular we have to be careful because sending a message to the shell in the current scheme causes a tick. Avoiding that busy loop is the main reason for tracking @@last_message.
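
(In other words, something along these lines — status_message is a stand-in for however the state summary gets built:)

    # Each write to a shell wakes the event loop, which re-runs this code;
    # without the @@last_message check, every notification would trigger
    # another identical notification -- a busy loop.
    def self.notify_waiters(options = {})
      message = status_message   # hypothetical summary of worker state
      return if message == @@last_message && !options[:always]
      @@last_message = message
      @@waiting.each { |conn| conn.write(message) }
    end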

@ConradIrwin


Contributor

ConradIrwin commented Dec 7, 2012

Thanks for the detailed feedback. I like the idea of a WorkerStream abstraction, particularly if it can be made to 'just work' across reloads. (I have to confess to having no experience with programs that can reload themselves :)

I'll try to have a go at implementing this over the weekend, though if it's all clear in your head, feel free to steal my thunder.

@gdb


Contributor

gdb commented Dec 7, 2012

Cool. Yeah, the fact that Einhorn knows how to reload itself means you have to think a bit about where your state lives. Including Persistent in an event class will mark it for serialization / deserialization across the reload. I use Einhorn::TransientState for most other internal state, though there are a few class variables where that's cleaner.
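
(So, roughly, reusing the WorkerStreamer sketch from earlier — Persistent comes from the description above; the instance variables are just for illustration:)

    module Einhorn
      module Event
        class WorkerStreamer
          include Persistent   # marked for serialization across the re-exec

          # Instance state here is dumped before the exec and restored in
          # the fresh process; anything not marked Persistent is rebuilt.
          def initialize(conn)
            @conn = conn
            @buffer = []
          end
        end
      end
    end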


@gdb


Contributor

gdb commented Dec 12, 2012

Did you have a chance to put together a revised patch yet? Happy to talk things through if you have any questions.

@ConradIrwin


Contributor

ConradIrwin commented Dec 13, 2012

I began having a go (ConradIrwin/einhorn@65c4254 — doesn't make sense yet), but got tied up over whether to multiplex the stream across the same socket or open a new server for each stream that clients can then connect to.

The main problem with opening a new server for each stream is that it's not obvious whether the servers should always exist, with clients subscribing to them directly (perhaps via a command that returns a map of stream name to socket path), or whether we should lazily create servers when clients ask to subscribe and delete them when the last client disconnects.

The main problem with multiplexing is that you need an explicit unsubscribe to tell the stream to stop while keeping the connection open, which gets a bit fiddly.

Do you have opinions?

I think the model of having a small set of stream names (say, a 'stabilising' stream for this stuff, or a 'debug' stream for streaming the debug-level logs), each of which has multiple subscribers, seems about right. Einhorn code can then publish to the named stream whenever it wants (i.e. Einhorn.log_info can be updated to also publish to the 'debug' and 'info' streams), and the stream can deal with queuing messages to any active subscribers (at the moment I'm just using the existing connection for that).
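
(A minimal sketch of that model — Stream and its API are hypothetical; only the shape comes from the paragraph above:)

    require 'set'

    # Hypothetical named-stream registry: a small fixed set of streams,
    # each with multiple subscribers, multiplexed over existing connections.
    module Einhorn
      class Stream
        @@streams = Hash.new { |streams, name| streams[name] = new(name) }

        def self.[](name)
          @@streams[name]
        end

        def initialize(name)
          @name = name
          @subscribers = Set.new
        end

        def subscribe(conn)
          @subscribers << conn
        end

        def unsubscribe(conn)
          @subscribers.delete(conn)
        end

        # Queue a message to every active subscriber.
        def publish(message)
          @subscribers.each { |conn| conn.write("[#{@name}] #{message}\n") }
        end
      end
    end

    # e.g. Einhorn.log_info could additionally call:
    #   Einhorn::Stream['debug'].publish(msg)
    #   Einhorn::Stream['info'].publish(msg)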

@gdb


Contributor

gdb commented Dec 13, 2012

What about tweaking the model to be along the lines of the following:

  • Internally, there's a global River singleton which exposes a River.publish(event, tags) method. Things like log_info publish their events directly.
  • You can subscribe to any set of tags via stream:subscribe and stream:unsubscribe.
  • Whenever a message is published for a tag you are subscribed to, that event is written to your connection.
  • The Einhorn protocol grows request/response IDs, so it's easy to multiplex commands over a single connection. The client may have to become a bit smarter than it is, but that's ok.

In particular, multiplexing over the single socket seems like the clear right answer to me. I don't think it'll actually be a ton of code, though it might be slightly tricky code.

One feature that might be cool for something like stabilization would be the ability to replay an old part of the stream, so that if, e.g., I attach my shell late I can still see what Einhorn has been up to. I think that's not needed for a v0 though.
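
(A minimal sketch of the River half, under the same caveats — this is the proposal from the list above, not existing Einhorn code:)

    require 'set'

    # Hypothetical global publish/subscribe hub keyed by tags.
    module Einhorn
      module River
        @subscriptions = Hash.new { |subs, tag| subs[tag] = Set.new }

        def self.subscribe(conn, *tags)
          tags.each { |tag| @subscriptions[tag] << conn }
        end

        def self.unsubscribe(conn, *tags)
          tags.each { |tag| @subscriptions[tag].delete(conn) }
        end

        # Deliver an event once to each connection subscribed to any tag.
        def self.publish(event, *tags)
          targets = tags.reduce(Set.new) { |acc, tag| acc | @subscriptions[tag] }
          targets.each { |conn| conn.write(event) }
        end
      end
    end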

@gdb


Contributor

gdb commented Oct 15, 2013

Closed via #28

@gdb gdb closed this Oct 15, 2013
