A number of fixes for restore-backup. #6282

Merged
merged 4 commits into juju:master from howbazaar:restore-debugging on Sep 20, 2016

5 participants
Owner

howbazaar commented Sep 20, 2016

On xenial, systemd does not automatically start a service when it is added, unlike upstart on trusty. The restore function ensures that the juju-db service exists, which causes it to be stopped, removed, and re-added. Unfortunately there was no explicit start afterwards, which sometimes caused the restore to hang.
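The missing start can be sketched as follows. This is only an illustration: the `Service` interface, `fakeService`, and `reinstallAndStart` are hypothetical names and not the juju service package API. The fake's `autoStart` flag stands in for the upstart-like behaviour of starting a service on install.

```go
package main

import (
	"errors"
	"fmt"
)

// Service is a hypothetical stand-in for an init-system service wrapper.
type Service interface {
	Stop() error
	Remove() error
	Install() error
	Start() error
	Running() bool
}

// fakeService is an in-memory service. With autoStart set, Install also
// starts the service; without it, Install leaves the service stopped.
type fakeService struct {
	autoStart bool
	running   bool
}

func (s *fakeService) Stop() error   { s.running = false; return nil }
func (s *fakeService) Remove() error { return nil }
func (s *fakeService) Install() error {
	if s.autoStart {
		s.running = true
	}
	return nil
}
func (s *fakeService) Start() error  { s.running = true; return nil }
func (s *fakeService) Running() bool { return s.running }

// reinstallAndStart stops, removes, and re-adds the service, then starts it
// explicitly if the init system did not already do so.
func reinstallAndStart(svc Service) error {
	if err := svc.Stop(); err != nil {
		return fmt.Errorf("stopping service: %w", err)
	}
	if err := svc.Remove(); err != nil {
		return fmt.Errorf("removing service: %w", err)
	}
	if err := svc.Install(); err != nil {
		return fmt.Errorf("installing service: %w", err)
	}
	if !svc.Running() {
		if err := svc.Start(); err != nil {
			return fmt.Errorf("starting service: %w", err)
		}
	}
	if !svc.Running() {
		return errors.New("service is not running after start")
	}
	return nil
}

func main() {
	for _, svc := range []*fakeService{{autoStart: true}, {autoStart: false}} {
		if err := reinstallAndStart(svc); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("autoStart=%v running=%v\n", svc.autoStart, svc.Running())
	}
}
```

Checking `Running()` before calling `Start()` is a design choice in this sketch so that the same code path is safe on both kinds of init system.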

Also, it is best to stop the juju-db service before we (possibly) remove all the files from under it.
Now we only remove the files for mongo 2.x, and leave them for 3.x. This is due to the different mongo restore methods: 2.x requires that mongo is not running, whereas 3.x requires a running mongo service.
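A minimal sketch of that version-dependent decision, assuming the only input is the mongo major version. The `restorePlan` type and `planRestore` function are hypothetical and only restate the rules described above; they are not the code in this branch.

```go
package main

import "fmt"

// restorePlan is a hypothetical summary of what the restore does to juju-db.
type restorePlan struct {
	StopServiceFirst          bool // stop juju-db before touching its files
	RemoveDBFiles             bool // wipe the database directory
	MongoRunningDuringRestore bool // whether the restore needs a live mongo
}

// planRestore encodes the rules from the description: 2.x restores with mongo
// stopped and the old files removed, 3.x keeps the files and restores into a
// running mongo service.
func planRestore(mongoMajor int) restorePlan {
	plan := restorePlan{StopServiceFirst: true}
	if mongoMajor < 3 {
		plan.RemoveDBFiles = true
		plan.MongoRunningDuringRestore = false
	} else {
		plan.RemoveDBFiles = false
		plan.MongoRunningDuringRestore = true
	}
	return plan
}

func main() {
	for _, major := range []int{2, 3} {
		fmt.Printf("mongo %d.x: %+v\n", major, planRestore(major))
	}
}
```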

There was also a race between the peergrouper setting the API host ports, and the machine and unit agents of the other machines and units being managed by the controller getting restarted. This is fixed by the restore method explicitly setting the API host ports before moving on to the other machines.
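The ordering fix can be illustrated with a toy model. The `controller` and `agent` types and `restoreAgents` are hypothetical, and the peergrouper is not modelled at all. The point is only that the addresses are written before any agent restarts and reads them.

```go
package main

import "fmt"

// controller holds the API host ports that restarted agents will read.
type controller struct {
	apiHostPorts []string
}

// agent is a hypothetical machine or unit agent managed by the controller.
type agent struct {
	name     string
	apiAddrs []string
}

// restart simulates an agent coming back up and reading the API addresses.
func (a *agent) restart(c *controller) {
	a.apiAddrs = append([]string(nil), c.apiHostPorts...)
}

// restoreAgents sets the API host ports explicitly first, and only then
// restarts the agents, so no agent can observe the stale addresses.
func restoreAgents(c *controller, newAddrs []string, agents []*agent) {
	c.apiHostPorts = append([]string(nil), newAddrs...)
	for _, a := range agents {
		fmt.Printf("restarting %s with API addresses %v\n", a.name, c.apiHostPorts)
		a.restart(c)
	}
}

func main() {
	c := &controller{apiHostPorts: []string{"10.0.0.1:17070"}} // stale address
	agents := []*agent{{name: "machine-1"}, {name: "unit-ubuntu-0"}}
	restoreAgents(c, []string{"10.0.0.9:17070"}, agents)
}
```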

Extra logging is added around the updating of the other machines.

The multiple different ways of stopping an agent are unneeded, since trusty, xenial, and CentOS all have the service wrapper. It is worth noting that there is a problem if there are any Windows workload machines, as the method used to update the agent configuration won't work for them. But this isn't new with this branch.

QA

A LXD controller was set up with an ubuntu unit deployed in the default model.
A backup was taken, and then restored with a new bootstrap node.
All agents came back online and started properly.

The CI restore tests have also been run on the CI machines.

This branch appears to fix http://pad.lv/1606265, although I have to be honest and say I don't know why.

howbazaar added some commits Sep 19, 2016

Don't delete the mongo db files for mongo 3.x.
Lots of extra logging.
Make sure the APIHostPorts are set before restarting any agents.
Make sure that mongo is restarted after the new service file is written.
@@ -242,14 +269,17 @@ func (b *backups) Restore(backupId string, dbInfo *DBInfo, args RestoreArgs) (na
 	if err != nil {
 		return nil, errors.Trace(err)
 	}
-	machines := []*state.Machine{}
+	machines := []machineModel{}
@anastasiamac

anastasiamac Sep 20, 2016

Member

Niiice \o/

@@ -5,11 +5,11 @@ package apiaddressupdater
import (
"github.com/juju/errors"
- "github.com/juju/juju/api/machiner"
@anastasiamac

anastasiamac Sep 20, 2016

Member

thank you :)

@@ -216,10 +216,6 @@ do
sed -i.old -r "/^(stateaddresses|apiaddresses):/{
n
s/- .*(:[0-9]+)/- {{.Address}}\1/
- }" $agent/agent.conf
- sed -i.old -r "/^(stateaddresses|apiaddresses):/{
@perrito666

perrito666 Sep 20, 2016

Contributor

You are sure that there will never be only one address?

@howbazaar

howbazaar Sep 20, 2016

Owner

I tested the sed script interactively and it works with both one and two addresses.
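For readers who want to try this without a shell, here is a Go sketch that mirrors only the fragment of the sed script visible in the hunk above: on a `stateaddresses:` or `apiaddresses:` line it moves to the next line (sed's `n`) and replaces the host there while keeping the port. The hunk is truncated, so this is not the full script. The function name `rewriteFirstAddress` and the sample addresses are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// keyRe matches the sed address /^(stateaddresses|apiaddresses):/.
	keyRe = regexp.MustCompile(`^(stateaddresses|apiaddresses):`)
	// addrRe matches the sed pattern "- .*(:[0-9]+)", capturing the port.
	addrRe = regexp.MustCompile(`- .*(:[0-9]+)`)
)

// rewriteFirstAddress replaces the host on the line that follows each key
// line, keeping the captured port, as the visible sed fragment does.
func rewriteFirstAddress(conf, newHost string) string {
	lines := strings.Split(conf, "\n")
	for i := 0; i < len(lines)-1; i++ {
		if !keyRe.MatchString(lines[i]) {
			continue
		}
		i++ // like sed's `n`: operate on the next line
		line := lines[i]
		loc := addrRe.FindStringSubmatchIndex(line)
		if loc == nil {
			continue
		}
		port := line[loc[2]:loc[3]]
		lines[i] = line[:loc[0]] + "- " + newHost + port + line[loc[1]:]
	}
	return strings.Join(lines, "\n")
}

func main() {
	one := "apiaddresses:\n- 10.0.0.1:17070\nother: value"
	two := "apiaddresses:\n- 10.0.0.1:17070\n- 10.0.0.2:17070"
	fmt.Println(rewriteFirstAddress(one, "192.168.1.5"))
	fmt.Println(rewriteFirstAddress(two, "192.168.1.5"))
}
```

In this emulation only the line directly after the key is rewritten. How the full script treats any later entries cannot be read from the excerpt shown here.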

Owner

mitechie commented Sep 20, 2016

$$merge$$

Contributor

jujubot commented Sep 20, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 6b9824c into juju:master Sep 20, 2016

@howbazaar howbazaar deleted the howbazaar:restore-debugging branch Sep 20, 2016

jujubot added a commit that referenced this pull request Sep 20, 2016

Merge pull request #6288 from juju/revert-6282-restore-debugging
Revert "A number of fixes for restore-backup."

Reverts juju/juju#6282 due to OSX build failure.