Initial implementation of StatusStream #917

Merged: 25 commits, Jul 13, 2016

mwildehahn
Contributor

Split out setting up events from: #916

Calling these "Scheduler Events" was too confusing given that we also have "Empire Events" that publish to various sources. I went with calling this a "StatusStream" that can be created and passed to an action (like Deploy) and can be published and read from throughout the lifecycle of the action.

I ended up creating a separate package status because otherwise we would have had to import the scheduler within deployments.go which felt off to me.

@@ -78,5 +80,22 @@ func (s *deployerService) Deploy(ctx context.Context, db *gorm.DB, opts DeployOp
return r, err
}

if s, ok := opts.Updates.(status.SubscribableStream); ok {
for update := range s.Subscribe() {
msg := fmt.Sprintf("Status: %s", update.String())
Contributor

I think this will already get prefixed with Status: when the jsonmessage is written.

Contributor

Scratch that, maybe not.

@ejholmes
Contributor

ejholmes commented Jul 8, 2016

I ended up creating a separate package status because otherwise we would have had to import the scheduler within deployments.go which felt off to me.

remind101/empire already depends on remind101/empire/scheduler, so it would be fine to put StatusStream within remind101/empire/scheduler. I actually think that would be better until there's a reason to extract it.

@ejholmes
Contributor

ejholmes commented Jul 8, 2016

This is looking great so far.

@mwildehahn
Contributor Author

i'm fine moving within remind101/empire/scheduler. just to confirm, that is just moving remind101/empire/status/status.go to remind101/empire/scheduler/status.go or should we just include everything within remind101/empire/scheduler/scheduler.go?

@ejholmes
Contributor

ejholmes commented Jul 8, 2016

I think everything in remind101/empire/scheduler/scheduler.go would be best.

@mwildehahn
Contributor Author

ok cool

Done(error)
}

type SubscribableStream interface {
Contributor

Just food for thought. Another possibility here would be to forgo channels entirely and make the interface:

// StatusStream is an interface for publishing status updates while executing
// an Empire action.
type StatusStream interface {
        // Publish publishes an update to the status stream
        Publish(Status) error

        // Done finalizes the status stream
        Done(error)

        // Wait returns a channel that receives once Done() is called.
        // Consumers should call the Err() method to determine if an error
        // occurred.
        Wait() <-chan struct{}

        // Err returns the error from calling Done().
        Err() error
}

Then we just pass in an implementation that writes to the logstream:

type jsonmessageStatusStream struct {
    sync.Mutex
    done chan struct{}
    err  error
    w    io.Writer
}

// Publish writes the status to the underlying writer as a jsonmessage.
func (s *jsonmessageStatusStream) Publish(status scheduler.Status) error {
    select {
    case <-s.done:
        panic("Publish called on finalized stream")
    default:
    }

    return json.NewEncoder(s.w).Encode(jsonmessage.JSONMessage{Status: status.Message})
}

// Done records err and then closes the done channel, so anything woken by
// Wait() sees the error when it calls Err().
func (s *jsonmessageStatusStream) Done(err error) {
    s.err = err
    close(s.done)
}

func (s *jsonmessageStatusStream) Err() error {
    return s.err
}

func (s *jsonmessageStatusStream) Wait() <-chan struct{} {
    return s.done
}

The main advantage would be that we don't need to worry about buffering the channel: calls to Publish will immediately write to the stream, and it's easy to wrap with middleware (e.g. writing to the app's Kinesis stream, etc).

Contributor Author

i like that.

another thing i'm coming across is having to do: https://github.com/remind101/empire/pull/917/files#diff-48d76aef8c283046a79db32824933290R523.

not the end of the world, but definitely annoying.

i was thinking of adding an active flag within the stream, if it never gets activated, then the subscriber never blocks. there could either be a specific activate method or as soon as you publish the stream would get activated. but that defeats the purpose of being able to write to the stream at any point within a go routine and having the client subscribe.

also, i'm not a fan of throwing: panic("Publish called on finalized stream"). given these are just status updates, it feels aggressive to throw a panic. what would be the proper way to warn instead? i couldn't find any examples in the code.

Contributor

@ejholmes Jul 8, 2016

also, i'm not a fan of throwing: panic("Publish called on finalized stream"). given these are just status updates, it feels aggressive to throw a panic. what would be the proper way to warn instead? i couldn't find any examples in the code.

Agreed. If we change the interface to Publish(context.Context, Status), then you can do a logger.Warn(ctx, "message").

Contributor Author

ah ok cool

@mwildehahn
Contributor Author

ok, i implemented the jsonmessageStatusStream as well as the other pieces of feedback.

the only part i'm not happy with is having to do: https://github.com/remind101/empire/pull/917/files#diff-48d76aef8c283046a79db32824933290R523

var msg jsonmessage.JSONMessage

r, err := s.deploy(ctx, db, opts)
tx := s.db.Begin()
Contributor

I think we'd also wanna rollback here. Might be easier to wrap the transaction logic in a separate method:

func (s *deployerService) deployInTransaction(ctx context.Context, stream scheduler.StatusStream, opts De
        tx := s.db.Begin()
        r, err := s.deploy(ctx, tx, stream, opts)
        if err != nil {
                tx.Rollback()
                return r, err
        }

        if err := tx.Commit().Error; err != nil {
                return r, err
        }

        return r, nil
}

Contributor Author

i was rolling back here: https://github.com/remind101/empire/pull/917/files#diff-ff47559c0383dd1959cdbe50cd2c036dR76 which is roughly equivalent, but +1 on the separation, i was thinking the same

@ejholmes
Contributor

ejholmes commented Jul 8, 2016

I just hacked together a couple changes in the CloudFormation backend to use this. So awesome seeing more context in the deployment stream:

https://asciinema.org/a/b6vnccpg2u2uykf9wyyb1ijej

This looks gtm 👍. We can make changes to Scheduler implementations to use this in a separate PR (or if you wanna just open PR's against this branch, that's cool too).

@phobologic
Contributor

raw


// StatusStream is an interface for publishing status updates while a scheduler
// is executing.
type StatusStream interface {
Contributor

@ejholmes Jul 8, 2016

I was thinking some more about this last night and I think there's one more small change we might wanna do that will make this much simpler. We can give semantic meaning to when the StatusStream is nil, and change the signature of scheduler.Submit to:

// When StatusStream is nil, Submit should return as quickly as possible,
// usually when the new version has been received, and validated. If
// StatusStream is not nil, it's recommended that the method not return until
// the deployment has fully completed.
scheduler.Submit(context.Context, *App, StatusStream) error

This would mean that we can remove the Done, Wait, and Err methods on the StatusStream interface so it's just Publish. For the CloudFormation backend, we'd just change this line to be if ss != nil.

The primary advantage will be in cases like the migration scheduler where we'd call Submit on two different schedulers, that might both call Done.
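The nil-means-return-early idea can be sketched as follows. This is an illustration under assumptions, not the scheduler's actual code: `submit`, `waitForCompletion`, and `recorder` are hypothetical names, and the nil-safe `Publish` helper is modeled loosely on a package-level helper that takes the stream as its first argument.

```go
package main

import "fmt"

// Status is an assumed stand-in for scheduler.Status.
type Status struct {
	Message string
}

// StatusStream is the reduced interface: only Publish remains.
type StatusStream interface {
	Publish(Status) error
}

// Publish is a nil-safe helper so callers never have to check ss themselves.
func Publish(ss StatusStream, msg string) error {
	if ss == nil {
		return nil
	}
	return ss.Publish(Status{Message: msg})
}

// recorder is a hypothetical stream that keeps messages in memory.
type recorder struct {
	messages []string
}

func (r *recorder) Publish(s Status) error {
	r.messages = append(r.messages, s.Message)
	return nil
}

// submit is a hypothetical scheduler method. waitForCompletion stands in for
// whatever blocking wait the backend performs.
func submit(release string, ss StatusStream, waitForCompletion func() error) error {
	if err := Publish(ss, fmt.Sprintf("Submitted %s", release)); err != nil {
		return err
	}
	if ss == nil {
		// Nobody is listening: return as soon as the release is accepted.
		return nil
	}
	if err := waitForCompletion(); err != nil {
		return err
	}
	return Publish(ss, fmt.Sprintf("Finished %s", release))
}
```

One Go detail to watch with this design: a nil pointer stored in a StatusStream variable is not equal to nil, so callers should pass a literal nil (or leave the interface value unset) when they want the fast path.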

Contributor Author

i like this. i think it helps differentiate EventStream as async empire notifications and StatusStream as synchronous notifications during the execution of an action.

@ejholmes
Contributor

Awesome 👍 (sorry for the churn :)).

@mwildehahn
Contributor Author

@ejholmes c6441b7 is actually the commit I was talking about, I forgot I had done it in a different branch.

Without this the db transaction surrounds both the creation and the submission of the release, which breaks concurrency.

@mwildehahn
Contributor Author

@ejholmes ok i think this is good to go. i added an -s flag so we'll only setup the status stream if someone enables it. This means people can opt in to the new UX.

@mwildehahn
Contributor Author

Deploy now looks like:

$ ./build/emp deploy remind101/acme-inc:master -s
master: Pulling from remind101/acme-inc

420890c9e918: Already exists
a3ed95caeb02: Already exists
1e14aab0083d: Already exists
4393485f2bb3: Already exists
a2a051cf14f7: Already exists
1f9177248efa: Already exists
Digest: sha256:d05682347b56191a950f55ab06f11a6c6e34041921af32281b01f4a6a9a0f75f
Status: Image is up to date for remind101/acme-inc:master
Status: Created cloudformation template: https://empire-0b96756d-templatebucket-fzr5jqii49z4.s3.amazonaws.com/acme-inc/e904d241-a249-4b13-aa77-9dc0a3639617/57c5f02918c0fd1714d4689075924b42523c4fb6
Status: Stack update submitted
Status: Waiting for stack update to complete
Status: Stack update complete
Status: Finished processing events for release v9 of acme-inc

When a deploy has been superseded:

$ ./build/emp deploy remind101/acme-inc:master -s
master: Pulling from remind101/acme-inc

420890c9e918: Already exists
a3ed95caeb02: Already exists
1e14aab0083d: Already exists
4393485f2bb3: Already exists
a2a051cf14f7: Already exists
1f9177248efa: Already exists
Digest: sha256:d05682347b56191a950f55ab06f11a6c6e34041921af32281b01f4a6a9a0f75f
Status: Image is up to date for remind101/acme-inc:master
Status: Created cloudformation template: https://empire-0b96756d-templatebucket-fzr5jqii49z4.s3.amazonaws.com/acme-inc/86874600-74f0-4ee4-9517-51c13a3e7240/3c156706ec606aa39375c6dfd7182f2f8a1d0ee4
Status: Waiting for existing stack operation to complete
Status: Operation superseded by newer release
Status: Finished processing events for release v4 of acme-inc

While waiting for another deploy to finish:

$ ./build/emp deploy remind101/acme-inc:master -s
master: Pulling from remind101/acme-inc

420890c9e918: Already exists
a3ed95caeb02: Already exists
1e14aab0083d: Already exists
4393485f2bb3: Already exists
a2a051cf14f7: Already exists
1f9177248efa: Already exists
Digest: sha256:d05682347b56191a950f55ab06f11a6c6e34041921af32281b01f4a6a9a0f75f
Status: Image is up to date for remind101/acme-inc:master
Status: Created cloudformation template: https://empire-0b96756d-templatebucket-fzr5jqii49z4.s3.amazonaws.com/acme-inc/86874600-74f0-4ee4-9517-51c13a3e7240/d5e7525e4fb98ee49bd3aceceed7e726a9029e64
Status: Waiting for existing stack operation to complete
Status: Stack update submitted
Status: Waiting for stack update to complete
Status: Stack update complete
Status: Finished processing events for release v5 of acme-inc

Create looks like:

$ ./build/emp deploy remind101/acme-inc:master -s
master: Pulling from remind101/acme-inc

420890c9e918: Already exists
a3ed95caeb02: Already exists
1e14aab0083d: Already exists
4393485f2bb3: Already exists
a2a051cf14f7: Already exists
1f9177248efa: Already exists
Digest: sha256:d05682347b56191a950f55ab06f11a6c6e34041921af32281b01f4a6a9a0f75f
Status: Image is up to date for remind101/acme-inc:master
Status: Created cloudformation template: https://empire-0b96756d-templatebucket-fzr5jqii49z4.s3.amazonaws.com/acme-inc/04a75f65-c7c4-4e36-943b-0a09614db1b4/67f96646de01688a45cc9d8a5f25ff9cb1ce479c
Status: Creating stack
Status: Stack created
Status: Finished processing events for release v2 of acme-inc

I wanted to get the actual cloudformation url, but i didn't see that we had easy access to the stack arn.

@@ -208,6 +208,10 @@ func (s *Scheduler) submit(ctx context.Context, tx *sql.Tx, app *scheduler.App,
return err
}

if err := scheduler.Publish(ss, fmt.Sprintf("Created cloudformation template: %v", *t.URL)); err != nil {
Contributor

Part of me feels like we should just ignore errors from Publish. If, for example, the emp client disconnected from the network, this would result in err == io.EOF and would cause the deployment to fail.

Contributor Author

yea i was thinking the same thing initially but also didn't want to just swallow errors. maybe we just log these?

Contributor

sounds good

@ejholmes
Contributor

We should also add Stream: true to

_, err = d.empire.Deploy(ctx, empire.DeployOpts{
        Image:  img,
        Output: p,
        User:   &empire.User{Name: event.Deployment.Creator.Login},
})


// Schedule the new release onto the cluster.
return r, s.Release(ctx, r)
return r, err
Contributor

Minor, you can just return releasesCreate(db, r).

@ejholmes
Contributor

Couple of minor things, which we can follow up with later. This is awesome 👍.

@ejholmes
Contributor

Oh, can you update the changelog too?

We'll log these as warnings but we don't want it to disrupt a deploy
@mwildehahn mwildehahn merged commit d240a32 into remind101:master Jul 13, 2016
@mwildehahn mwildehahn deleted the status-stream branch July 13, 2016 01:59
3 participants