Update from 2.2 #7786

Merged 38 commits into juju:develop on Aug 24, 2017


howbazaar commented Aug 24, 2017

Merge in fixes from the 2.2 branch.

howbazaar and others added some commits Aug 7, 2017

Merge pull request #7709 from howbazaar/2.2-save-logging-config
Remember the last logging config in the agent.conf file.

## Description of change

Currently, when agents restart they all start with --debug. This has two problems: there is significant noise whenever an agent restarts when you are only interested in INFO or greater, and it is impossible to trace the dependency engine startup.

## QA steps

Bootstrap a new controller, and update the logging-config for the controller model.
SSH into one of the machines and look at the agent.conf file. It will have a logging-config value.
Restart the agent and observe that it uses that config to set up logging on startup.
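
A hedged sketch of those steps (controller name, log targets, and file paths are assumptions, not from the PR):

```sh
# Minimal sketch, assuming an lxd controller called "test".
juju bootstrap lxd test
juju model-config -m controller logging-config="<root>=INFO;juju=DEBUG"
# agent.conf should now record the logging config.
juju ssh -m controller 0 'sudo grep -i logging /var/lib/juju/agents/machine-0/agent.conf'
# Restart the machine agent and confirm the saved config is applied at startup.
juju ssh -m controller 0 'sudo systemctl restart jujud-machine-0'
juju ssh -m controller 0 'sudo tail -n 50 /var/log/juju/machine-0.log'
```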
Add CheckMachines method to MigrationTarget facade
This will be used by 1.25 upgrades as a sanity check that the provider
and state agree about which machines are in the model. It will also be
called as part of normal model migration.
Merge pull request #7736 from wallyworld/cloud-name-underscore
Allow underscores in cloud names

## Description of change

Cloud names currently do not support having "_" in them. This breaks bootstrap if the user has chosen to create a cloud with an "_" in the name. This PR pulls in an updated names.v2 dependency which fixes the issue.

Also add extra validation to add-cloud to reject invalid cloud names.

Also do a drive-by cleanup of unused code.

## QA steps

Bootstrap using a cloud with "_" in the name.
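
A hedged example of that step (the MAAS endpoint and file name are placeholders; any cloud type accepted by add-cloud will do):

```sh
# Define a cloud whose name contains an underscore and add it.
cat > my-clouds.yaml <<EOF
clouds:
  my_cloud:
    type: maas
    auth-types: [oauth1]
    endpoint: http://10.0.0.1:5240/MAAS
EOF
juju add-cloud my_cloud my-clouds.yaml   # should now accept the "_" in the name
juju add-credential my_cloud
juju bootstrap my_cloud
```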

## Documentation changes

From what I can see, no existing doc mentions cloud names having "_".

## Bug reference

https://bugs.launchpad.net/juju/+bug/1709665
Merge pull request #7728 from wallyworld/remove-machine-keep
Add new option to remove-machine to tell the provisioner not to stop the instance

## Description of change

Sometimes MAAS will fail a machine deployment after already telling Juju it will succeed. This can result in Juju having 2 machines in the model with the same instance id (one running, one not). Removing the bad machine in Juju then kills the good machine in MAAS.

Without significant MAAS/Juju changes, this issue cannot be fully fixed. Instead this PR adds some tooling to make things easier to deal with:
1. new "machine-id" instance tag to reflect the juju machine id (enables the user to correlate a cloud instance with the juju machine record)
2. log a warning when we write a duplicate instance id
3. a new argument to juju remove-machine: --keep-instance

When --keep-instance is used, the machine is removed from Juju but the provisioner will not call StopInstance() in the cloud, hence the cloud machine will be left running. This allows a failed machine which happens to have a duplicate instance id to be removed from Juju without stopping a cloud instance belonging to a different Juju machine.

## QA steps

* bootstrap
* add some machines (with and without units)
* check the machine tags to see they include the new machine-id tag
* on an empty machine: `juju remove-machine --keep-instance`, then verify that the machine is removed from juju but not killed in the cloud
* on a machine with units: `juju remove-machine --keep-instance --force`, then verify that the machine is removed from juju but not killed in the cloud

* bootstrap juju 2.2
* add a machine
* `juju remove-machine --keep-instance` -> error that juju doesn't support --keep-instance
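
A hedged walk-through of the first scenario (cloud, application, and machine numbers are illustrative):

```sh
juju bootstrap aws keep-test
juju add-machine           # empty machine, e.g. 0
juju deploy ubuntu         # machine with a unit, e.g. 1
juju remove-machine 0 --keep-instance
juju remove-machine 1 --keep-instance --force
# Both machines should disappear from `juju status`, while the underlying
# cloud instances keep running (check the cloud console or CLI).
```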

## Documentation changes

The new --keep-instance argument to remove-machine needs to be documented.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1671588
Merge pull request #7721 from ExternalReality/bug_1705628
Fix incorrect agent provisioning. (1705628)

## Description of change

When adding a unit, the unit should run the same juju agent version as its
model. However, units were being added with the agent version of their
controller not their model. This fix corrects the error.

## QA steps

Bootstrap a controller with juju version 2.1.x.

```sh
juju bootstrap lxd bug_1705628
```

Deploy an application to the default model

```sh
juju deploy ubuntu -m default
```

Check the agent version of the running application

```sh
juju status --format=yaml | grep version -a5
```
The ubuntu application unit should be running version 2.1.x of jujud.

Upgrade the controller to version 2.2.x or greater, then add an additional unit

```sh
juju upgrade-juju -m controller

juju add-unit ubuntu
```
Check the agent version of the newly added unit. It should be running the agent
version of the default model and not the agent version of the controller model.

## Documentation changes

N/A

## Bug reference

[lp#1705628](https://bugs.launchpad.net/juju/+bug/1705628)
Merge pull request #7737 from babbageclunk/check-machines
migrations: Add CheckMachines method to MigrationTarget facade

## Description of change

This adds a sanity-check method on the target controller so that we can confirm after the migration that the machines recorded in state match up with the provider instances. It's being added primarily for use in 1.25 upgrades, but I've added a call to it from the migration master worker to ensure that it works.

## QA steps

* Bootstrap two controllers A and B.
* Create model A:m, and deploy some applications to it.
* Migrate m to controller B.
* The migration should succeed - the log will include a message in the validation phase reporting "checking machines in migrated model".
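
A hedged sketch of those steps, assuming two local LXD controllers named "a" and "b":

```sh
juju bootstrap lxd a
juju bootstrap lxd b
juju add-model m -c a
juju deploy ubuntu -m a:m
juju migrate a:m b
# Look for the validation-phase message in the controller logs, e.g.:
juju debug-log -m a:controller --replay | grep -i "checking machines"
```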
Merge pull request #7742 from howbazaar/2.2-logger-worker-race
Fix test race, and remove JujuConnSuite dependency.

This branch fixes a test race that was introduced in the recent PR #7709.

As a drive-by, I removed the JujuConnSuite dependency and introduced an interface for the API calls.
Merge pull request #7743 from wallyworld/new-azure-regions
Add Korean regions to Azure cloud

## Description of change

The Azure provider was missing support for 2 Korean regions.

## QA steps

Try to bootstrap using the new regions.
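
For example (region names assumed; confirm with `juju regions azure` first):

```sh
juju update-clouds
juju regions azure | grep -i korea
juju bootstrap azure/koreacentral korea-test
```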

## Documentation changes

Not sure if we have docs stating which regions are supported for each cloud.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1710650
Merge pull request #7747 from anastasiamac/add-user-help
Updated add-user help to clarify where the user is stored.

## Description of change

`add-user` help had outdated information.

## QA steps

Check that `juju help add-user` shows the updated information.

## Documentation changes

Command help was updated.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1709800
Merge pull request #7753 from howbazaar/2.2-fix-destroy-model
Remove the user/name index entry when model becomes dead.

There are two fixes here:
 - the state pool was not returning the right type of error; it now returns a not-found error
 - the user/name index value was being removed too late; it is now removed when the model becomes dead rather than when all the docs are removed

## QA steps
```
juju bootstrap lxd foo
juju add-model test
for i in $(seq 1 100); do echo $i; juju destroy-model test -y && echo $? && juju add-model test; done
```
The model is always re-added immediately afterwards, and no errors are shown.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1709324
Remove io.Reader parameter from Resources.AddPendingResource
It was only ever passed as nil and muddied the waters considerably
(especially given the existence of both AddPendingResources and
AddPendingResource at the client level).
Rename Client.AddPendingResource to UploadPendingResource
The old name was very unclear, especially in conjunction with the
AddPendingResources method (which really does add pending resources, and
then this method populates them).
Driveby: removed vestigial CombineErrors code
It was being called from a method that used to need to combine 2
potential errors, but no longer does.
Remove pending resources when Deploy fails
The deploy command creates pending resources for an application before
creating the application, and links them to the application using
pending IDs. These are used to resolve the resources during application
creation. If the application creation fails for some reason, these
pending resources need to be cleaned up.

This cleanup was initially done at the state level, but manual testing
showed that if a deployment failed due to validation at the API level,
the pending resources would still be left. Doing the cleanup in the
application API instead prevents this.

Part of the fix for https://bugs.launchpad.net/juju/+bug/1705730
Don't consider pending resources in migration precheck
We won't migrate them, and if their application exists then they are
left over from a previous failed deployment of an application with the
same name.

Fixes https://bugs.launchpad.net/juju/+bug/1705730
Merge pull request #7759 from babbageclunk/pending-resources
resources: Clean up pending resources after a failed deployment

## Description of change

The deploy command creates pending resources for an application before creating the application, and links them to the application using pending IDs. These are used to resolve the resources during 
application creation. If the application creation fails for some reason (like validation), these pending resources need to be removed.

Since there is validation at the API level as well as in `state.AddApplication`, the cleanup needs to be done in the API code.

The pending resources were causing migration prechecks to fail. I've removed the check - pending resources for an application that exists indicate that there was a previous failed deployment, but that shouldn't prevent migration. The pending resources won't be migrated.

Includes some driveby changes to make some of the resources code a little easier to follow.
* Distinguish adding a pending charmstore resource and uploading a local pending resource.
* Remove an `io.Reader` argument that is only ever passed nil.
* Remove some vestigial code left from an earlier refactoring.

## QA steps

* Create a new model.
* Deploy an application that uses resources (e.g. ~cmars/mattermost) with options that will cause the deployment to fail - using `--to <nonexistent machine>` works.
* Check that there are no pending resources in the resources collection: `db.resources.find().pretty()` should return nothing.
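
A possible way to trigger the failure, assuming machine 99 does not exist in the model:

```sh
juju add-model resources-test
juju deploy cs:~cmars/mattermost --to 99   # placement validation should fail
# Afterwards, in the controller's mongo shell, db.resources.find().pretty()
# should return no leftover pending documents.
```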

## Bug reference

Fixes https://bugs.launchpad.net/juju/+bug/1705730
worker/undertaker: don't ignore all errors
The undertaker was ignoring all errors when attempting
to transition a Dying model to Dead. This is wrong; we
should only ignore a couple of specific errors relating
to the contents of the model. After those errors are
received, we'll retry. Other errors should cause the
worker to restart.

Possibly fixes https://bugs.launchpad.net/juju/+bug/1695894
Merge pull request #7761 from axw/lp1695894-undertaker-dont-ignore-errors

worker/undertaker: don't ignore all errors

## Description of change

The undertaker was ignoring all errors when attempting
to transition a Dying model to Dead. This is wrong; we
should only ignore a couple of specific errors relating
to the contents of the model. After those errors are
received, we'll retry. Other errors should cause the
worker to restart.

## QA steps

Smoke test: bootstrap, add-machine, destroy-controller --destroy-all-models.
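
A hedged version of that smoke test, assuming an lxd controller named "smoke":

```sh
juju bootstrap lxd smoke
juju add-machine
juju destroy-controller smoke --destroy-all-models -y
```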

## Documentation changes

None.

## Bug reference

Possibly fixes https://bugs.launchpad.net/juju/+bug/1695894
Merge pull request #7766 from howbazaar/2.2-faster-status
Make status faster (in theory).

juju status can be slow on big models in big controllers.

This branch attacks the problem from two directions. Firstly, it adds the missing indices on the applications and machines collections. Both of these are fully loaded when status is executed, and in controllers with many models the query would involve table scans.

Secondly, when the apiserver code was iterating over the machines and units, each one made multiple calls for values stored in the statuses collection. Instead of executing these queries one at a time, all the statuses for the model are now loaded at once at the start of the status call.

This reduces the number of calls for the status values by 2 per machine, and 3 per unit (or 4 per unit if SetStatus has never been called on the application). It also reduces the queries on status history by 1 per unit by making the full workload version status document available through the cache. As well as the status caching, a single query is made to the db for all the units in the model rather than one call per application.

So, for a model with 50 applications, 2500 units and 300 machines, this branch reduces the query count by 10648 (or 13148 if no app called SetStatus).

## QA steps

Run `juju status`

Looks ok to me with respect to my changes :D
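
For a rough before/after comparison (timings are environment-specific):

```sh
time juju status --format=tabular > /dev/null
```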


anastasiamac commented Aug 24, 2017

But there are conflicts :D


howbazaar commented Aug 24, 2017

$$merge$$


jujubot commented Aug 24, 2017

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

@jujubot merged commit aacdf44 into juju:develop on Aug 24, 2017

1 check passed

continuous-integration/jenkins/pr-merge: This commit looks good

@howbazaar deleted the howbazaar:update-from-2.2 branch on Aug 24, 2017
