migrations: implement log transfer phase #6622

Merged
merged 16 commits into from Nov 29, 2016

Conversation

Projects
None yet
4 participants
Member

babbageclunk commented Nov 28, 2016

Streams model logs from the source controller using the debug-log endpoint, and forwards them on to the /migrate/logtransfer endpoint in the target controller.

At the moment this isn't restartable - if the migration worker is killed and restarted while transferring logs the target model will end up with duplicates of all of the records that had already been transferred. This will be added in a follow-up PR.

QA steps:

  • bootstrap a source controller
  • add a model to be migrated
  • deploy something to the model
  • take note of the number of log messages, and the first and last messages: juju debug-log --replay
  • bootstrap a destination controller
  • migrate the model: juju migrate the-model dest-controller
  • verify that the log messages for the migrated model include all of the expected messages
Member

babbageclunk commented Nov 28, 2016

I haven't squished the commits for this - should I?

@babbageclunk babbageclunk reopened this Nov 28, 2016

Member

babbageclunk commented Nov 28, 2016

Gah, those buttons are so close.

Member

babbageclunk commented Nov 28, 2016

!!build!!

worker/migrationmaster/worker.go
+func (w *Worker) transferLogs(targetInfo coremigration.TargetInfo, modelUUID string) error {
+ sent := 0
+ reportProgress := func(sent int) {
+ w.setInfoStatus("successful, transferring logs to target controller (%d sent)", sent)
@babbageclunk

babbageclunk Nov 28, 2016

Member

The progress reporting was useful during my testing, not sure whether it's too spammy. Should I remove it?

@howbazaar

howbazaar Nov 28, 2016

Owner

If it writes to the log file, then maybe. Can we reduce the number of times we write it out? Some models may have a lot of logs, and writing each thousand may be too often. Can we report percentages? Do we have a count at the start?

@mjs

mjs Nov 28, 2016

Contributor

This will write to the logs - as well as feeding the migration status shown in the "juju status" output. I think the progress reporting is useful but agree that 1000 is probably too frequent. Making it percentage based or logging an update every N seconds might be better.

FWIW the migrationmaster already reports progress every 30s when waiting for minions to report back at various migration phases. Maybe that const should be shared?

@babbageclunk

babbageclunk Nov 28, 2016

Member

This is streaming the records from the debug-log endpoint, so there's no easy way to get a percentage indicator without having the MigrationMaster facade grow another method to find out the total. Doing it by time is pretty straightforward - I'll do that (although I guess testing it will be a bit fiddly).

Member

babbageclunk commented Nov 28, 2016

Build was killed with a "no more instances allowed" error.

Member

babbageclunk commented Nov 28, 2016

!!build!!

api/base/testing/apicaller.go
@@ -48,6 +49,10 @@ func (APICallerFunc) ConnectStream(path string, attrs url.Values) (base.Stream,
return nil, errors.New("stream connection unimplemented")
}
+func (APICallerFunc) ConnectControllerStream(path string, attrs url.Values, headers http.Header) (base.Stream, error) {
+ return nil, errors.New("stream connection unimplemented")
@howbazaar

howbazaar Nov 28, 2016

Owner

We have a NotImplementedError.

+ headers := http.Header{}
+ headers.Add("thomas", "cromwell")
+ headers.Add("anne", "boleyn")
+ s.PatchValue(api.WebsocketDialConfig, catcher.recordLocation)
api/common/logs.go
+ "github.com/juju/juju/apiserver/params"
+)
+
+// TODO(ericsnow) Fold DebugLogParams into params.LogStreamConfig.
@howbazaar

howbazaar Nov 28, 2016

Owner

Can you please see if this makes sense to do, and if not, remove the comment?

@babbageclunk

babbageclunk Nov 29, 2016

Member

LogStreamConfig has AllModels, Sink (basically a label), MaxLookbackDuration and MaxLookbackRecords
DebugLogParams has 4 fields for including/excluding entities/modules, + Limit, Backlog, Replay, NoTail

So the only overlap is MaxLookbackRecords/Backlog - the other fields would need some amount of dev work to support them in logstream or debug-log.

I think it would make sense to do it at some point, but probably not right now. Want me to get rid of the comment?

@mjs

mjs Nov 29, 2016

Contributor

If the suggested change still makes sense then leave it

+ // the goroutine will be leaked. This is OK when used from the command
+ // line, but is a problem if it happens in jujud. Change it to accept
+ // a stop channel and use a read deadline so that the client can stop
+ // it. https://pad.lv/1644084
@howbazaar

howbazaar Nov 28, 2016

Owner

Good catch. As a general rule, bare channel reads or writes should not be in production code, just tests (and even then, not really).

@@ -248,3 +249,10 @@ func groupTagIds(tagStrs []string) ([]string, []string, error) {
}
return machines, units, nil
}
+
+func (c *Client) StreamModelLog() (<-chan common.LogMessage, error) {
+ return common.StreamDebugLog(c.caller.RawAPICaller(), common.DebugLogParams{
@howbazaar

howbazaar Nov 28, 2016

Owner

Aint reuse great?

- st, err := s.ctxt.stateForMigration(req)
+ // Require MigrationModeNone because logtransfer happens after the
+ // model proper is completely imported.
+ st, err := s.ctxt.stateForMigration(req, state.MigrationModeNone)
@howbazaar

howbazaar Nov 28, 2016

Owner

I would have assumed that the mode wouldn't be None, but instead there would be some indicator that the migration was still in progress. Do we not have anything in the target?

@mjs

mjs Nov 28, 2016

Contributor

Unfortunately not at present. Once the migration hits SUCCESS, the MigrationTarget.Active API is called on the target which sets the migration mode to MigrationModeNone, allowing workers to start and the model to function.

I guess we could add a "post import" phase but that will require some updates to various bits of the code which currently only trigger on MigrationModeNone. It is a bit unfortunate that the current situation means the /migrate/logtransfer import is available during normal operation but I'm not too worried about it at this stage.

Xtian: can you please add a card to the board in the model migration section which covers this? We can tidy it up later.

@mjs

mjs Nov 29, 2016

Contributor

Thanks

worker/migrationmaster/worker.go
+func (w *Worker) transferLogs(targetInfo coremigration.TargetInfo, modelUUID string) error {
+ sent := 0
+ reportProgress := func(sent int) {
+ w.setInfoStatus("successful, transferring logs to target controller (%d sent)", sent)
@babbageclunk

babbageclunk Nov 28, 2016

Member

The progress reporting was useful during my testing, not sure whether it's too spammy. Should I remove it?

@howbazaar

howbazaar Nov 28, 2016

Owner

If it writes to the log file, then maybe. Can we reduce the number of times we write it out? Some models may have a lot of logs, and writing each thousand may be too often. Can we report percentages? Do we have a count at the start?

@mjs

mjs Nov 28, 2016

Contributor

This will write to the logs - as well as feeding the migration status shown in the "juju status" output. I think the progress reporting is useful but agree that 1000 is probably too frequent. Making it percentage based or logging an update every N seconds might be better.

FWIW the migrationmaster already reports progress every 30s when waiting for minions to report back at various migration phases. Maybe that const should be shared?

@babbageclunk

babbageclunk Nov 28, 2016

Member

This is streaming the records from the debug-log endpoint, so there's no easy way to get a percentage indicator without having the MigrationMaster facade grow another method to find out the total. Doing it by time is pretty straightforward - I'll do that (although I guess testing it will be a bit fiddly).

Member

babbageclunk commented Nov 28, 2016

Looks like @axw's metrics work has added a test with an APICaller implementation that now needs a ConnectControllerStream method. I'll add that.

Member

babbageclunk commented Nov 28, 2016

Ugh, I guess I can't push that change without it marking all of your comments old (since I had to rebase to pull in the test).

This is very close. Just need to address the remaining comments.

- st, err := s.ctxt.stateForMigration(req)
+ // Require MigrationModeNone because logtransfer happens after the
+ // model proper is completely imported.
+ st, err := s.ctxt.stateForMigration(req, state.MigrationModeNone)
@howbazaar

howbazaar Nov 28, 2016

Owner

I would have assumed that the mode wouldn't be None, but instead there would be some indicator that the migration was still in progress. Do we not have anything in the target?

@mjs

mjs Nov 28, 2016

Contributor

Unfortunately not at present. Once the migration hits SUCCESS, the MigrationTarget.Active API is called on the target which sets the migration mode to MigrationModeNone, allowing workers to start and the model to function.

I guess we could add a "post import" phase but that will require some updates to various bits of the code which currently only trigger on MigrationModeNone. It is a bit unfortunate that the current situation means the /migrate/logtransfer import is available during normal operation but I'm not too worried about it at this stage.

Xtian: can you please add a card to the board in the model migration section which covers this? We can tidy it up later.

@mjs

mjs Nov 29, 2016

Contributor

Thanks

worker/migrationmaster/worker.go
+func (w *Worker) transferLogs(targetInfo coremigration.TargetInfo, modelUUID string) error {
+ sent := 0
+ reportProgress := func(sent int) {
+ w.setInfoStatus("successful, transferring logs to target controller (%d sent)", sent)
@babbageclunk

babbageclunk Nov 28, 2016

Member

The progress reporting was useful during my testing, not sure whether it's too spammy. Should I remove it?

@howbazaar

howbazaar Nov 28, 2016

Owner

If it writes to the log file, then maybe. Can we reduce the number of times we write it out? Some models may have a lot of logs, and writing each thousand may be too often. Can we report percentages? Do we have a count at the start?

@mjs

mjs Nov 28, 2016

Contributor

This will write to the logs - as well as feeding the migration status shown in the "juju status" output. I think the progress reporting is useful but agree that 1000 is probably too frequent. Making it percentage based or logging an update every N seconds might be better.

FWIW the migrationmaster already reports progress every 30s when waiting for minions to report back at various migration phases. Maybe that const should be shared?

@babbageclunk

babbageclunk Nov 28, 2016

Member

This is streaming the records from the debug-log endpoint, so there's no easy way to get a percentage indicator without having the MigrationMaster facade grow another method to find out the total. Doing it by time is pretty straightforward - I'll do that (although I guess testing it will be a bit fiddly).

babbageclunk added some commits Nov 22, 2016

Extract common.StreamDebugLog from api.Client
So we can use it from the migrationmaster worker.
Add api.state.ConnectControllerStream
And add it to the api.base.APICaller interface.

This will be used by the migrationmaster worker to transfer log messages to the
/migrate/logtransfer stream endpoint.
Change logtransfer endpoint required migration mode
The other /migrate/* endpoints require the target model to be imported,
but logtransfer happens after the model has finished importing, so it
should be in MigrationModeNone.
Extracted transferLogs from doLOGTRANSFER
Added a progress report (was useful for testing, not sure whether it's
that useful for actual use).
Added stubAPICaller.ConnectControllerStream method
These tests landed while I was working on migration logtransfer.
Remove TODO
Merging DebugLogParams and params.LogStreamConfig is probably a sensible
thing to do, but since they're almost entirely disjoint it's really
equivalent to merging the logstream and debug-log endpoints.

mjs approved these changes Nov 29, 2016

Good stuff

+func (s *Suite) TestLogTransferReportsProgress(c *gc.C) {
+ s.facade.queueStatus(s.makeStatus(coremigration.LOGTRANSFER))
+ messages := []common.LogMessage{
+ {Message: "captain beefheart"},
@mjs

mjs Nov 29, 2016

Contributor

:)

worker/migrationmaster/worker_test.go
+ s.facade.logMessages = func(d chan<- common.LogMessage) {
+ for _, message := range messages {
+ safeSend(c, d, message)
+ s.clock.WaitAdvance(15*time.Second, coretesting.ShortWait, 1)
@mjs

mjs Nov 29, 2016

Contributor

What does this do to the test run time?

@babbageclunk

babbageclunk Nov 29, 2016

Member

The individual test is reported as taking 0.000s, there's no measurable increase to the suite runtime.

Use LongWait instead of ShortWait
Since we expect this to happen.
Member

babbageclunk commented Nov 29, 2016

$$merge$$

Contributor

jujubot commented Nov 29, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit c8ffb68 into juju:develop Nov 29, 2016

1 check passed

github-check-merge-juju Built PR, ran unit tests, and tested LXD deploy. Use !!.*!! to request another build. IE, !!build!!, !!retry!!
Details

@babbageclunk babbageclunk deleted the babbageclunk:logtransfer-worker branch Nov 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment