A number of fixes for restore-backup. #6282
Conversation
howbazaar added some commits Sep 19, 2016
@@ -242,14 +269,17 @@ func (b *backups) Restore(backupId string, dbInfo *DBInfo, args RestoreArgs) (na
 	if err != nil {
 		return nil, errors.Trace(err)
 	}
-	machines := []*state.Machine{}
+	machines := []machineModel{}
@@ -5,11 +5,11 @@ package apiaddressupdater
 import (
 	"github.com/juju/errors"
-	"github.com/juju/juju/api/machiner"
@@ -216,10 +216,6 @@ do
 	sed -i.old -r "/^(stateaddresses|apiaddresses):/{
 		n
 		s/- .*(:[0-9]+)/- {{.Address}}\1/
-	}" $agent/agent.conf
-	sed -i.old -r "/^(stateaddresses|apiaddresses):/{
howbazaar
Sep 20, 2016
Owner
I tested the sed script interactively and it works with both one and two addresses.
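For readers who want to reason about the address rewrite outside of sed, here is a minimal Go sketch of the same idea. It is not the script from this branch: the function name `rewriteAddresses` and the sample configuration are hypothetical, and rewriting every list entry that follows the key is an assumption about the intended behaviour with one or two addresses. The port suffix is preserved, as in the `s/- .*(:[0-9]+)/.../` expression.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Matches the keys the sed script looks for.
	keyRe = regexp.MustCompile(`^(stateaddresses|apiaddresses):`)
	// Matches a YAML list entry that ends in ":<port>".
	itemRe = regexp.MustCompile(`^(\s*)- .*(:[0-9]+)\s*$`)
)

// rewriteAddresses replaces the host part of every list entry that directly
// follows a stateaddresses/apiaddresses key, keeping the port.
// Hypothetical helper for illustration only.
func rewriteAddresses(conf, newHost string) string {
	lines := strings.Split(conf, "\n")
	inList := false
	for i, line := range lines {
		switch {
		case keyRe.MatchString(line):
			inList = true
		case inList && itemRe.MatchString(line):
			lines[i] = itemRe.ReplaceAllString(line, "${1}- "+newHost+"${2}")
		default:
			// Any other line ends the address list.
			inList = false
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	conf := "apiaddresses:\n- 10.0.0.1:17070\n- 10.0.0.2:17070\nother: value"
	fmt.Println(rewriteAddresses(conf, "10.9.9.9"))
}
```

The sketch assumes `newHost` contains no `$` characters, since those would be interpreted by the replacement template.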
$$merge$$
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju
jujubot merged commit 6b9824c into juju:master Sep 20, 2016
howbazaar deleted the howbazaar:restore-debugging branch Sep 20, 2016
howbazaar referenced this pull request Sep 20, 2016
Merged Revert "A number of fixes for restore-backup." #6288
added a commit that referenced this pull request Sep 20, 2016
howbazaar commented Sep 20, 2016
On xenial, systemd does not automatically start a service when it is added, unlike upstart on trusty. The restore function ensures that the juju-db service exists, which causes it to be stopped, removed, and re-added. Unfortunately the start was missing, which sometimes caused the restore to hang.
Also, it is best to stop the juju-db service before we (possibly) remove all the files from under it.
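The stop, remove, re-add, start sequence described above can be sketched roughly as follows. This is an illustration only: the `Service` interface, `recreateAndStart`, and `fakeService` are hypothetical names, not the juju service wrapper. The fake's `startOnInstall` flag models the upstart-like behaviour, and leaving it false models the systemd-like one.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Service is a hypothetical stand-in for an init-system service wrapper.
type Service interface {
	Installed() bool
	Running() bool
	Stop() error
	Remove() error
	Install() error
	Start() error
}

// recreateAndStart stops and removes an existing service, installs it again,
// and then starts it explicitly if the init system did not do so.
func recreateAndStart(svc Service) error {
	if svc.Installed() {
		if svc.Running() {
			if err := svc.Stop(); err != nil {
				return fmt.Errorf("stopping service: %w", err)
			}
		}
		if err := svc.Remove(); err != nil {
			return fmt.Errorf("removing service: %w", err)
		}
	}
	if err := svc.Install(); err != nil {
		return fmt.Errorf("installing service: %w", err)
	}
	// Do not rely on the init system to start a freshly added service.
	if !svc.Running() {
		if err := svc.Start(); err != nil {
			return fmt.Errorf("starting service: %w", err)
		}
	}
	if !svc.Running() {
		return errors.New("service is not running after start")
	}
	return nil
}

// fakeService records calls so the sequence can be inspected.
type fakeService struct {
	installed, running bool
	startOnInstall     bool // true models upstart-like auto start
	calls              []string
}

func (f *fakeService) Installed() bool { return f.installed }
func (f *fakeService) Running() bool   { return f.running }
func (f *fakeService) Stop() error {
	f.calls = append(f.calls, "stop")
	f.running = false
	return nil
}
func (f *fakeService) Remove() error {
	f.calls = append(f.calls, "remove")
	f.installed = false
	return nil
}
func (f *fakeService) Install() error {
	f.calls = append(f.calls, "install")
	f.installed = true
	f.running = f.startOnInstall
	return nil
}
func (f *fakeService) Start() error {
	f.calls = append(f.calls, "start")
	f.running = true
	return nil
}
func (f *fakeService) log() string { return strings.Join(f.calls, ",") }

func main() {
	systemdLike := &fakeService{installed: true, running: true}
	if err := recreateAndStart(systemdLike); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(systemdLike.log())
}
```

Checking `Running()` after the install keeps the helper safe on both kinds of init system: it issues a start only when one is actually needed.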
Now we only remove the files for mongo 2.x, and leave them for 3.x. This is due to the different mongo restore methods: 2.x requires that mongo is not running, while 3.x requires a running mongo service.
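The version-dependent behaviour can be summarised in a small decision function. This is a sketch of the rule as stated in this description, not code from the branch; `restorePlan` and `planForMongo` are hypothetical names, and treating any other major version as an error is an assumption.

```go
package main

import "fmt"

// restorePlan captures the two decisions that differ between mongo versions.
// Hypothetical type for illustration only.
type restorePlan struct {
	RemoveDBFiles               bool // wipe the database directory before restoring
	ServiceRunningDuringRestore bool // whether mongo must be up while restoring
}

// planForMongo maps a mongo major version to the restore preparation.
func planForMongo(major int) (restorePlan, error) {
	switch major {
	case 2:
		// 2.x restores with mongo not running, so the old files are removed.
		return restorePlan{RemoveDBFiles: true, ServiceRunningDuringRestore: false}, nil
	case 3:
		// 3.x restores through a running mongo service, so the files stay.
		return restorePlan{RemoveDBFiles: false, ServiceRunningDuringRestore: true}, nil
	default:
		return restorePlan{}, fmt.Errorf("unsupported mongo major version %d", major)
	}
}

func main() {
	for _, v := range []int{2, 3} {
		plan, err := planForMongo(v)
		fmt.Printf("mongo %d.x: %+v, err=%v\n", v, plan, err)
	}
}
```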
There was also a race between the peergrouper setting the API host ports, and the machine and unit agents of the other machines and units being managed by the controller getting restarted. This is fixed by the restore method explicitly setting the API host ports before moving on to the other machines.
Extra logging is added around the updating of the other machines.
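The ordering fix and the added logging can be pictured with the following sketch. It only illustrates the sequencing idea: `updateAfterRestore` and its callback parameters are hypothetical, and stopping at the first failing machine is an assumption rather than the behaviour of the branch.

```go
package main

import (
	"fmt"
	"log"
)

// updateAfterRestore sets the API host ports first and only then moves on to
// the other machines, so restarted agents never read stale addresses.
// Hypothetical helper for illustration only.
func updateAfterRestore(
	setAPIHostPorts func() error,
	machines []string,
	updateMachine func(id string) error,
) error {
	// Do this explicitly instead of waiting for the peergrouper.
	if err := setAPIHostPorts(); err != nil {
		return fmt.Errorf("setting API host ports: %w", err)
	}
	for _, id := range machines {
		log.Printf("updating agent configuration on machine %s", id)
		if err := updateMachine(id); err != nil {
			return fmt.Errorf("updating machine %s: %w", id, err)
		}
		log.Printf("machine %s updated", id)
	}
	return nil
}

func main() {
	var order []string
	err := updateAfterRestore(
		func() error { order = append(order, "set-api-host-ports"); return nil },
		[]string{"1", "2"},
		func(id string) error { order = append(order, "update-"+id); return nil },
	)
	fmt.Println(order, err)
}
```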
The multiple different ways of stopping an agent are unneeded, since trusty, xenial, and centos all have the service wrapper. It is worth noting that there is a problem if there are any Windows workload machines, as the method to update the agent configuration won't work for them. But this isn't new with this branch.
QA
A LXD controller was set up with an ubuntu unit deployed in the default model.
A backup was taken, and then restored with a new bootstrap node.
All agents came back online and started properly.
Runs have also been done on the CI machines testing the CI restore tests.
This branch appears to fix http://pad.lv/1606265, although I have to be honest and say I don't know why.