Start LXD containers up after migration #13

Merged
merged 3 commits into juju:master on Aug 9, 2017

Conversation

Member

axw commented Aug 7, 2017

After migrating LXC containers to LXD, start the LXD containers up so that other commands can SSH into them to run the remaining parts of the upgrade, such as upgrade-agents. After starting the LXD containers, the Juju agents are stopped.

The "serviceCall" and "serviceCommand" functions are consolidated into a new function, agentServiceCommand. This function returns an error if any of the executions exit with a non-zero code. Failed executions have their output written to stderr, prefixed with the machine ID.


This is mostly great, but could you please either flatten down the two nested layers of parallelism in waitLXDContainersReady to one or pull the outer closure out to a function so that the two loops can be understood separately?

commands/exec.go
+ w.buf.Write(data)
+ for {
+ line, err := w.buf.ReadBytes('\n')
+ if err != nil {

babbageclunk Aug 9, 2017

Member

I found the behaviour of Write for data that doesn't end with a '\n' confusing, but looking at how it's used, I guess it's because io.Copy will break the data at arbitrary points and this prevents doubling up the prefix? Probably worth a comment?


axw Aug 9, 2017

Member

It was also broken; I've fixed it and simplified it by removing the buffer.

+ }
+ if err := waitLXDContainersReady(
+ lxcByHost, containerNames,
+ time.Minute, // should be long enough for anyone

babbageclunk Aug 9, 2017

Member

Ha, sounds like famous last words.


axw Aug 9, 2017

Member

tongue in cheek :)

commands/migratelxc.go
+ lxdByHost map[*state.Machine]map[string]*lxdContainer,
+ containerNames map[*state.Machine]containerNames,
+) error {
+ running := make(map[string]bool) // keyed by LXD container name

babbageclunk Aug 9, 2017

Member

set.Strings here? Not much difference since you're only doing contains checks below.


axw Aug 9, 2017

Member

done

+ }
+ }
+ var group errgroup.Group
+ for host, containers := range lxcByHost {

babbageclunk Aug 9, 2017

Member

What about containers that were already migrated and renamed but not yet restarted? Oh, they'll still be in lxcByHost since that's populated from state, not from the actual machines, right?


axw Aug 9, 2017

Member

yup. added a NOTE to hopefully clarify that

commands/migratelxc.go
+
+ ctx, cancel := context.WithTimeout(context.Background(), timeout)
+ defer cancel()
+ group, ctx := errgroup.WithContext(ctx)

babbageclunk Aug 9, 2017

Member

Ha, finally! Also, the context stuff with timeouts is really neat.

commands/migratelxc.go
+ group, ctx := errgroup.WithContext(ctx)
+ const interval = time.Second // time to wait between checks
+
+ waitHostLXDContainersReady := func(host *state.Machine) error {

babbageclunk Aug 9, 2017

Member

I feel like two nested levels of closures and parallel execution are too many. Would it be difficult to extract waitHostLXDContainersReady to a separate method/function so that it can be understood in isolation? Then I can get my head around the fanout that you do there (from a host to its containers), so I can ignore it when understanding the fanout to the hosts.

Actually, does the parallelism need to be two levels? Can't you just flatten them down to a list of containers (wherever they're hosted) and then parallelize the sshing to them? I think that would be easier to follow.


axw Aug 9, 2017

Member

must've been on crack, waitHostLXDContainersReady was iterating over all containers, regardless of host. I've greatly simplified the code - good call!

+ lxcByHost map[*state.Machine][]*state.Machine,
+) error {
+ // Stop the Juju agents on the LXD containers.
+ var flatMachines []FlatMachine

babbageclunk Aug 9, 2017

Member

Yeah, like this does.

axw added some commits Aug 9, 2017

Simplify waitLXDContainersReady
and address other review comments

This is way nicer, thanks!

@axw axw merged commit 1dec50c into juju:master Aug 9, 2017
