
Update from 2.2 #7786

Merged: 38 commits merged into juju:develop from howbazaar:update-from-2.2 on Aug 24, 2017

Conversation

howbazaar
Contributor

Merge in fixes from the 2.2 branch.

howbazaar and others added 30 commits August 7, 2017 17:41
Remember the last logging config in the agent.conf file.

## Description of change

Currently, when agents restart they all start with --debug. This has two problems: there is significant noise whenever an agent restarts when you are only interested in INFO or greater, and it is impossible to trace the dependency engine startup.

## QA steps

Bootstrap a new controller and update the logging-config for the controller model.
SSH into one of the machines and look at the agent.conf file; it will now have a logging-config value.
Restart the agent and observe that it uses that config to set up logging when it starts.
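
A concrete version of those steps, assuming an lxd controller, the standard 2.x agent paths, and a systemd host (the logging value itself is illustrative):

```sh
juju bootstrap lxd test
juju model-config -m controller logging-config="<root>=INFO;juju.worker.dependency=TRACE"
juju ssh -m controller 0
# on the machine: the saved value should appear in the agent config
grep logging-config /var/lib/juju/agents/machine-0/agent.conf
# restart the agent; it should come up logging at the configured levels, not --debug
sudo systemctl restart jujud-machine-0
```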
This will be used by 1.25 upgrades as a sanity check that the provider
and state agree about which machines are in the model. It will also be
called as part of normal model migration.
Allow underscores in cloud names

## Description of change

Cloud names do not support having "_" in them. This messes up bootstrap if the user has chosen to create a cloud with an "_" in the name. The PR pulls in an updated names.v2 dep which fixes the issue.

Also add extra validation to add-cloud to reject invalid cloud names.

Also do a drive-by cleanup of unused code.

## QA steps

Bootstrap using a cloud with "_" in the name.
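
For example (a sketch; `my_cloud.yaml` is a hypothetical cloud definition file):

```sh
# the underscore in the name previously broke bootstrap
juju add-cloud my_cloud my_cloud.yaml
juju bootstrap my_cloud test
```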

## Documentation changes

From what I can see, no existing doc mentions cloud names having "_".

## Bug reference

https://bugs.launchpad.net/juju/+bug/1709665
Add new option to remove-machine to tell the provisioner not to stop the instance

## Description of change

Sometimes MAAS will fail a machine deployment after already telling Juju it will succeed. This can result in Juju having 2 machines in the model with the same instance id (one running, one not). Removing the bad machine in Juju then kills the good machine in MAAS.

Without significant MAAS/Juju changes, this issue cannot be fully fixed. Instead this PR adds some tooling to make things easier to deal with:
1. new "machine-id" instance tag to reflect the juju machine id (enables the user to correlate a cloud instance with the juju machine record)
2. log a warning when we write a duplicate instance id
3. a new argument to juju remove-machine: --keep-instance

When --keep-instance is used, the machine is removed from Juju but the provisioner will not call StopInstance() in the cloud, hence the cloud machine will be left running. This allows a failed machine which happens to have a duplicate instance id to be removed from Juju without stopping a cloud instance belonging to a different Juju machine.

## QA steps

* Bootstrap.
* Add some machines, with and without units.
* Check the machine tags to see that they include the new machine-id tag.
* On an empty machine, run `juju remove-machine --keep-instance` and verify that the machine is removed from Juju but not killed in the cloud.
* On a machine with units, run `juju remove-machine --keep-instance --force` and verify the same.
* Bootstrap juju 2.2, add a machine, and run `juju remove-machine --keep-instance`; expect an error that this version of Juju doesn't support --keep-instance.
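
A concrete version of these steps (a sketch; the cloud, machine numbers, and verification method are illustrative):

```sh
juju bootstrap aws test
juju add-machine                           # becomes machine 0 (no units)
juju deploy ubuntu                         # unit lands on machine 1
juju remove-machine 0 --keep-instance      # removed from juju, instance keeps running
juju remove-machine 1 --keep-instance --force
# verify in the cloud console or CLI that both instances are still running
```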

## Documentation changes

The new --keep-instance argument to remove-machine needs to be documented.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1671588
Fix incorrect agent provisioning. (1705628)

## Description of change

When adding a unit, the unit should run the same juju agent version as its
model. However, units were being added with the agent version of their
controller, not their model. This fix corrects the error.

## QA steps

Bootstrap a controller with juju version 2.1.x.

```sh
juju bootstrap lxd bug_1705628
```

Deploy an application to the default model

```sh
juju deploy ubuntu -m default
```

Check the agent version of the running application

```sh
juju status --format=yaml | grep version -a5
```
The ubuntu application unit should be running version 2.1.x of jujud.

Upgrade the controller to version 2.2.x or greater, then add an additional unit

```sh
juju upgrade-juju -m controller

juju add-unit ubuntu
```
Check the agent version of the newly added unit. It should be running the agent
version of the default model and not the agent version of the controller model.

## Documentation changes

N/A

## Bug reference

[lp#1705628](https://bugs.launchpad.net/juju/+bug/1705628)
migrations: Add CheckMachines method to MigrationTarget facade

## Description of change

This adds a sanity-check method on the target controller so that we can confirm after the migration that the state machines match up with the provider instances. It's being added for use primarily in 1.25 upgrades, but I've added a call to it from the migration master worker to ensure that it works.

## QA steps

* Bootstrap two controllers A and B.
* Create model A:m, and deploy some applications to it.
* Migrate m to controller B.
* The migration should succeed - the log will include a message in the validation phase reporting "checking machines in migrated model".
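
Sketched as commands (controller and model names are illustrative):

```sh
juju bootstrap lxd a
juju bootstrap lxd b
juju add-model m -c a
juju deploy ubuntu -m a:m
juju migrate a:m b
# during the validation phase the log should report
# "checking machines in migrated model"
juju debug-log -m b:controller | grep "checking machines"
```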
Fix test race, and remove JujuConnSuite dependency.

This branch fixes a test race that was introduced in the recent PR juju#7709.

As a drive-by, I removed the JujuConnSuite dependency and introduced an interface for the API calls.
Add Korean regions to Azure cloud

## Description of change

The Azure provider was missing support for 2 Korean regions.

## QA steps

Try to bootstrap using the new regions.
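
For example, assuming the two new regions are Korea Central and Korea South:

```sh
juju bootstrap azure/koreacentral test-kc
juju bootstrap azure/koreasouth test-ks
```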

## Documentation changes

I'm not sure whether we have docs stating which regions are supported for each cloud.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1710650
Updated add-user help to clarify where the user is stored.

## Description of change

The `add-user` help contained outdated information.

## QA steps

`juju help add-user` shows the updated information.

## Documentation changes

Command help was updated.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1709800
Remove the user/name index entry when model becomes dead.

There are two fixes here:
 - the state pool was not returning the right type of error; it now returns a not-found error
 - the user/name index value was being removed too late; it is now removed when the model becomes dead rather than when all the docs are removed

## QA steps
```
juju bootstrap lxd foo
juju add-model test
for i in $(seq 1 100); do echo $i; juju destroy-model test -y && echo $? && juju add-model test; done
```
The add-model always succeeds immediately after the destroy, and no errors are shown.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1709324
It was only ever passed as nil and muddied the water
considerably (additionally because of the existence of
AddPendingResources and AddPendingResource at the client level).
The old name was very unclear, especially in conjunction with the
AddPendingResources method (which really does add pending resources, and
then this method populates them).
It was being called from a method that used to need to combine 2
potential errors, but no longer does.
The deploy command creates pending resources for an application before
creating the application, and links them to the application using
pending IDs. These are used to resolve the resources during application
creation. If the application creation fails for some reason, these
pending resources need to be cleaned up.

This cleanup was initially done at the state level, but manual testing
showed that if a deployment failed due to validation at the API level,
the pending resources would still be left. Doing the cleanup in the
application API instead prevents this.

Part of the fix for https://bugs.launchpad.net/juju/+bug/1705730
We won't migrate them, and if their application exists then they are
remnants of a previous failed deployment of an application with the
same name.

Fixes https://bugs.launchpad.net/juju/+bug/1705730
resources: Clean up pending resources after a failed deployment

## Description of change

The deploy command creates pending resources for an application before creating the application, and links them to the application using pending IDs. These are used to resolve the resources during application creation. If the application creation fails for some reason (like validation), these pending resources need to be removed.

Since there is validation at the API level as well as in `state.AddApplication` the cleanup needs to be done in the API code.

The pending resources were causing migration prechecks to fail. I've removed the check - pending resources for an application that exists indicate that there was a previous failed deployment, but that shouldn't prevent migration. The pending resources won't be migrated.

Includes some driveby changes to make some of the resources code a little easier to follow.
* Distinguish adding a pending charmstore resource and uploading a local pending resource.
* Remove an `io.Reader` argument that is only ever passed nil.
* Remove some vestigial code left from an earlier refactoring.

## QA steps

* Create a new model.
* Deploy an application that uses resources (e.g. ~cmars/mattermost) with options that will cause the deployment to fail - using `--to <nonexistent machine>` works.
* Check that there are no pending resources in the resources collection: `db.resources.find().pretty()` should return nothing.
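
For instance (a sketch; inspecting `db.resources` requires direct access to the controller's MongoDB, which is environment-specific):

```sh
juju add-model resource-test
juju deploy cs:~cmars/mattermost --to 99   # machine 99 does not exist, so the deploy fails
# then, from a mongo shell attached to the controller's juju database:
#   db.resources.find().pretty()           # should return nothing
```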

## Bug reference

Fixes https://bugs.launchpad.net/juju/+bug/1705730
axw and others added 7 commits August 18, 2017 16:17
…e-errors

worker/undertaker: don't ignore all errors

## Description of change

The undertaker was ignoring all errors when attempting
to transition a Dying model to Dead. This is wrong; we
should only ignore a couple of specific errors relating
to the contents of the model. After those errors are
received, we'll retry. Other errors should cause the
worker to restart.

## QA steps

Smoke test: bootstrap, add-machine, destroy-controller --destroy-all-models.
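
That is, roughly (the cloud and controller name are illustrative):

```sh
juju bootstrap lxd smoke
juju add-machine
juju destroy-controller smoke --destroy-all-models -y
```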

## Documentation changes

None.

## Bug reference

Possibly fixes https://bugs.launchpad.net/juju/+bug/1695894
Make status faster (in theory).

juju status can be slow on big models in big controllers.

This branch attacks the problem from two directions. Firstly, it adds the missing indices on the applications and machines collections. Both of these collections are fully loaded when status is executed, and in controllers with many models those queries would involve table scans.

Secondly, when the apiserver code was iterating over the machines and units, each one made multiple calls for values stored in the statuses collection. Instead of executing these queries one at a time, all the statuses for the model are now loaded at once at the start of the status call.

This reduces the number of calls for status values by 2 per machine and 3 per unit (or 4 per unit if SetStatus has never been called on the application). It also reduces the queries on status history by 1 per unit, by making the full workload version status document available through the cache. In addition to the status caching, a single query is made to the db for all the units in the model rather than one call per application.

So, for a model with 50 applications, 2500 units and 300 machines, this branch reduces the query count by 10648 (or 13148 if no app called SetStatus).

## QA steps

Run `juju status`
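
To make the comparison measurable, time the command before and after the change against a reasonably large model (a sketch; absolute numbers will vary with the deployment):

```sh
time juju status --format=tabular > /dev/null
```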
@anastasiamac anastasiamac left a comment (Contributor)


Looks ok to me with respect to my changes :D

@anastasiamac
Contributor

But there are conflicts :D

@howbazaar
Contributor Author

$$merge$$

@jujubot
Collaborator

jujubot commented Aug 24, 2017

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

@jujubot jujubot merged commit aacdf44 into juju:develop Aug 24, 2017
@howbazaar howbazaar deleted the update-from-2.2 branch August 24, 2017 10:07