Update from 2.2 #7786

Merged 38 commits into juju:develop on Aug 24, 2017


howbazaar commented Aug 24, 2017

Merge in fixes from the 2.2 branch.

howbazaar and others added some commits Aug 7, 2017

Merge pull request #7709 from howbazaar/2.2-save-logging-config
Remember the last logging config in the agent.conf file.

## Description of change

Currently, when agents restart they all start with --debug. This has two problems: there is significant noise whenever an agent restarts when you are only interested in INFO or greater, and it is impossible to trace the dependency engine startup.

## QA steps

Bootstrap a new controller, and update the logging-config for the controller model.
SSH into one of the machines and look at the agent.conf file. It will have a logging-config value.
Restart the agent and observe that it uses that config to set up logging on startup.
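
A hedged sketch of those steps (controller name, log targets, and file paths are assumptions, not from the PR):

```sh
# Minimal sketch, assuming an lxd controller called "test".
juju bootstrap lxd test
juju model-config -m controller logging-config="<root>=INFO;juju=DEBUG"
# agent.conf should now record the logging config.
juju ssh -m controller 0 'sudo grep -i logging /var/lib/juju/agents/machine-0/agent.conf'
# Restart the machine agent and confirm the saved config is applied at startup.
juju ssh -m controller 0 'sudo systemctl restart jujud-machine-0'
juju ssh -m controller 0 'sudo tail -n 50 /var/log/juju/machine-0.log'
```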
Add CheckMachines method to MigrationTarget facade
This will be used by 1.25 upgrades as a sanity check that the provider
and state agree about which machines are in the model. It will also be
called as part of normal model migration.
Merge pull request #7736 from wallyworld/cloud-name-underscore
Allow underscores in cloud names

## Description of change

Cloud names currently do not support having "_" in them. This breaks bootstrap if the user has chosen to create a cloud with an "_" in the name. This PR pulls in an updated names.v2 dependency which fixes the issue.

Also add extra validation to add-cloud to reject invalid cloud names.

Also do a drive-by cleanup of unused code.

## QA steps

Bootstrap using a cloud with "_" in the name.
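
A hedged example of that step (the MAAS endpoint and file name are placeholders; any cloud type accepted by add-cloud will do):

```sh
# Define a cloud whose name contains an underscore and add it.
cat > my-clouds.yaml <<EOF
clouds:
  my_cloud:
    type: maas
    auth-types: [oauth1]
    endpoint: http://10.0.0.1:5240/MAAS
EOF
juju add-cloud my_cloud my-clouds.yaml   # should now accept the "_" in the name
juju add-credential my_cloud
juju bootstrap my_cloud
```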

## Documentation changes

From what I can see, no existing doc mentions cloud names having "_".

## Bug reference

https://bugs.launchpad.net/juju/+bug/1709665
Merge pull request #7728 from wallyworld/remove-machine-keep
Add new option to remove-machine to tell the provisioner not to stop the instance

## Description of change

Sometimes MAAS will fail a machine deployment after already telling Juju it will succeed. This can result in Juju having 2 machines in the model with the same instance id (one running, one not). Removing the bad machine in Juju then kills the good machine in MAAS.

Without significant MAAS/Juju changes, this issue cannot be fully fixed. Instead this PR adds some tooling to make things easier to deal with:
1. new "machine-id" instance tag to reflect the juju machine id (enables the user to correlate a cloud instance with the juju machine record)
2. log a warning when we write a duplicate instance id
3. a new argument to juju remove-machine: --keep-instance

When --keep-instance is used, the machine is removed from Juju but the provisioner will not call StopInstance() in the cloud, hence the cloud machine will be left running. This allows a failed machine which happens to have a duplicate instance id to be removed from Juju without stopping a cloud instance belonging to a different Juju machine.

## QA steps

* bootstrap
* add some machines (with and without units)
* check the machine tags to see they include the new machine-id tag
* on an empty machine: `juju remove-machine --keep-instance`, then verify that the machine is removed from juju but not killed in the cloud
* on a machine with units: `juju remove-machine --keep-instance --force`, then verify that the machine is removed from juju but not killed in the cloud

* bootstrap juju 2.2
* add a machine
* `juju remove-machine --keep-instance` -> error that juju doesn't support --keep-instance
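
A hedged walk-through of the first scenario (cloud, application, and machine numbers are illustrative):

```sh
juju bootstrap aws keep-test
juju add-machine           # empty machine, e.g. 0
juju deploy ubuntu         # machine with a unit, e.g. 1
juju remove-machine 0 --keep-instance
juju remove-machine 1 --keep-instance --force
# Both machines should disappear from `juju status`, while the underlying
# cloud instances keep running (check the cloud console or CLI).
```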

## Documentation changes

The new --keep-instance argument to remove-machine needs to be documented.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1671588
Merge pull request #7721 from ExternalReality/bug_1705628
Fix incorrect agent provisioning. (1705628)

## Description of change

When adding a unit, the unit should run the same juju agent version as its
model. However, units were being added with the agent version of their
controller not their model. This fix corrects the error.

## QA steps

Bootstrap a controller with juju version 2.1.x.

```sh
juju bootstrap lxd bug_1705628
```

Deploy an application to the default model

```sh
juju deploy ubuntu -m default
```

Check the agent version of the running application

```sh
juju status --format=yaml | grep version -a5
```
The ubuntu application unit should be running version 2.1.x of jujud.

Upgrade the controller to version 2.2.x or greater, then add an additional unit

```sh
juju upgrade-juju -m controller

juju add-unit ubuntu
```
Check the agent version of the newly added unit. It should be running the agent
version of the default model and not the agent version of the controller model.

## Documentation changes

N/A

## Bug reference

[lp#1705628](https://bugs.launchpad.net/juju/+bug/1705628)
Merge pull request #7737 from babbageclunk/check-machines
migrations: Add CheckMachines method to MigrationTarget facade

## Description of change

This adds a sanity-check method on the target controller so that we can confirm after the migration that the machines recorded in state match up with the provider instances. It's being added primarily for use in 1.25 upgrades, but I've added a call to it from the migration master worker to ensure that it works.

## QA steps

* Bootstrap two controllers A and B.
* Create model A:m, and deploy some applications to it.
* Migrate m to controller B.
* The migration should succeed - the log will include a message in the validation phase reporting "checking machines in migrated model".
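
A hedged sketch of those steps, assuming two local LXD controllers named "a" and "b":

```sh
juju bootstrap lxd a
juju bootstrap lxd b
juju add-model m -c a
juju deploy ubuntu -m a:m
juju migrate a:m b
# Look for the validation-phase message in the controller logs, e.g.:
juju debug-log -m a:controller --replay | grep -i "checking machines"
```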
Merge pull request #7742 from howbazaar/2.2-logger-worker-race
Fix test race, and remove JujuConnSuite dependency.

This branch fixes a test race that was introduced in the recent PR #7709.

As a drive-by, I removed the JujuConnSuite dependency and introduced an interface for the API calls.
Merge pull request #7743 from wallyworld/new-azure-regions
Add Korean regions to Azure cloud

## Description of change

The Azure provider was missing support for 2 Korean regions.

## QA steps

Try to bootstrap using the new regions.
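
For example (region names assumed; confirm with `juju regions azure` first):

```sh
juju update-clouds
juju regions azure | grep -i korea
juju bootstrap azure/koreacentral korea-test
```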

## Documentation changes

Not sure if we have docs stating which regions are supported for each cloud.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1710650
Merge pull request #7747 from anastasiamac/add-user-help
Updated add-user help to clarify where the user is stored.

## Description of change

`add-user` help had outdated information.

## QA steps

Check that `juju help add-user` shows the updated information.

## Documentation changes

Command help was updated.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1709800
Merge pull request #7753 from howbazaar/2.2-fix-destroy-model
Remove the user/name index entry when model becomes dead.

There are two fixes here:
 - the state pool was not returning the right type of error; it now returns a not-found error
 - the user/name index value was being removed too late; it is now removed when the model becomes dead rather than when all the docs are removed

## QA steps
```
juju bootstrap lxd foo
juju add-model test
for i in $(seq 1 100); do echo $i; juju destroy-model test -y && echo $? && juju add-model test; done
```
The model is always re-added immediately afterwards, and no errors are shown.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1709324
Remove io.Reader parameter from Resources.AddPendingResource
It was only ever passed as nil and muddied the waters considerably
(especially given the existence of both AddPendingResources and
AddPendingResource at the client level).
Rename Client.AddPendingResource to UploadPendingResource
The old name was very unclear, especially in conjunction with the
AddPendingResources method (which really does add pending resources, and
then this method populates them).
Driveby: removed vestigial CombineErrors code
It was being called from a method that used to need to combine 2
potential errors, but no longer does.
Remove pending resources when Deploy fails
The deploy command creates pending resources for an application before
creating the application, and links them to the application using
pending IDs. These are used to resolve the resources during application
creation. If the application creation fails for some reason, these
pending resources need to be cleaned up.

This cleanup was initially done at the state level, but manual testing
showed that if a deployment failed due to validation at the API level,
the pending resources would still be left. Doing the cleanup in the
application API instead prevents this.

Part of the fix for https://bugs.launchpad.net/juju/+bug/1705730
Don't consider pending resources in migration precheck
We won't migrate them, and if their application exists then they are
left over from a previous failed deployment of an application with the
same name.

Fixes https://bugs.launchpad.net/juju/+bug/1705730
Merge pull request #7759 from babbageclunk/pending-resources
resources: Clean up pending resources after a failed deployment

## Description of change

The deploy command creates pending resources for an application before creating the application, and links them to the application using pending IDs. These are used to resolve the resources during 
application creation. If the application creation fails for some reason (like validation), these pending resources need to be removed.

Since there is validation at the API level as well as in `state.AddApplication`, the cleanup needs to be done in the API code.

The pending resources were causing migration prechecks to fail. I've removed the check - pending resources for an application that exists indicate that there was a previous failed deployment, but that shouldn't prevent migration. The pending resources won't be migrated.

Includes some driveby changes to make some of the resources code a little easier to follow.
* Distinguish adding a pending charmstore resource and uploading a local pending resource.
* Remove an `io.Reader` argument that is only ever passed nil.
* Remove some vestigial code left from an earlier refactoring.

## QA steps

* Create a new model.
* Deploy an application that uses resources (e.g. ~cmars/mattermost) with options that will cause the deployment to fail - using `--to <nonexistent machine>` works.
* Check that there are no pending resources in the resources collection: `db.resources.find().pretty()` should return nothing.
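
A possible way to trigger the failure, assuming machine 99 does not exist in the model:

```sh
juju add-model resources-test
juju deploy cs:~cmars/mattermost --to 99   # placement validation should fail
# Afterwards, in the controller's mongo shell, db.resources.find().pretty()
# should return no leftover pending documents.
```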

## Bug reference

Fixes https://bugs.launchpad.net/juju/+bug/1705730
worker/undertaker: don't ignore all errors
The undertaker was ignoring all errors when attempting
to transition a Dying model to Dead. This is wrong; we
should only ignore a couple of specific errors relating
to the contents of the model. After those errors are
received, we'll retry. Other errors should cause the
worker to restart.

Possibly fixes https://bugs.launchpad.net/juju/+bug/1695894
Merge pull request #7761 from axw/lp1695894-undertaker-dont-ignore-errors

worker/undertaker: don't ignore all errors

## Description of change

The undertaker was ignoring all errors when attempting
to transition a Dying model to Dead. This is wrong; we
should only ignore a couple of specific errors relating
to the contents of the model. After those errors are
received, we'll retry. Other errors should cause the
worker to restart.

## QA steps

Smoke test: bootstrap, add-machine, destroy-controller --destroy-all-models.
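
A hedged version of that smoke test, assuming an lxd controller named "smoke":

```sh
juju bootstrap lxd smoke
juju add-machine
juju destroy-controller smoke --destroy-all-models -y
```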

## Documentation changes

None.

## Bug reference

Possibly fixes https://bugs.launchpad.net/juju/+bug/1695894
Merge pull request #7766 from howbazaar/2.2-faster-status
Make status faster (in theory).

juju status can be slow on big models in big controllers.

This branch attacks the problem from two directions. Firstly, it adds the missing indices on the applications and machines collections. Both of these are fully loaded when status is executed, and in controllers with many models the query would involve table scans.

Secondly, when the apiserver code was iterating over the machines and units, each one made multiple calls for values stored in the statuses collection. Instead of executing these queries one at a time, all the statuses for the model are now loaded at once at the start of the status call.

This reduces the number of calls for the status values by 2 per machine, and 3 per unit (or 4 per unit if SetStatus has never been called on the application). It also reduces the queries on status history by 1 per unit by making the full workload version status document available through the cache. As well as the status caching, a single query is made to the db for all the units in the model rather than one call per application.

So, for a model with 50 applications, 2500 units and 300 machines, this branch reduces the query count by 10648 (or 13148 if no app called SetStatus).

## QA steps

Run `juju status`

Looks ok to me with respect to my changes :D
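
For a rough before/after comparison (timings are environment-specific):

```sh
time juju status --format=tabular > /dev/null
```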


anastasiamac commented Aug 24, 2017

But there are conflicts :D


howbazaar commented Aug 24, 2017

$$merge$$


jujubot commented Aug 24, 2017

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

@jujubot merged commit aacdf44 into juju:develop on Aug 24, 2017

1 check passed

continuous-integration/jenkins/pr-merge: This commit looks good

@howbazaar deleted the howbazaar:update-from-2.2 branch on Aug 24, 2017
