Merge develop into feature branch #8126

Merged
merged 60 commits into from Nov 24, 2017

Conversation

Projects
None yet
10 participants
Member

axw commented Nov 24, 2017

Merges in develop, but also makes some small changes to stop trampolining upgrade and restore status through the agent. We now have the restore-watcher worker output a function to get the restore status, and pass the upgrade gate into the apiserver manifold.

externalreality and others added some commits Nov 10, 2017

Quick improvement for bug #1732509.
Model.Users() does a lookup based on model-uuid, but we don't have an
index. So on controllers with many models (but maybe not many users in
any given model), it has to do a collection scan over all model*users
for every model, leading to model*(model*users) = M^2. This at least
improves that bit, though we can do much better with some rewrites.
Merge pull request #8085 from jameinel/2.3-quick-index-1732509
Quick improvement for bug #1732509.

## Description of change

Model.Users() does a lookup based on model-uuid, but we don't have an
index. So on controllers with many models (but maybe not many users in
any given model), it has to do a collection scan over all model*users
for every model, leading to model*(model*users) = M^2. This at least
improves that bit, though we can do much better with some rewrites.

I'm working on a much more complete rewrite and fix that shows significant promise, but this is a *very* quick win that will still help.

https://github.com/jameinel/juju/tree/2.3-direct-models-1732384

## QA steps

```
$ juju bootstrap lxd
$ for i in `seq 200`; do juju add-model m$i; done
$ time juju models # repeat a few times
$ # create the index
$ time juju models # much better
```

## Documentation changes

No

## Bug reference

[lp:1732509](https://bugs.launchpad.net/juju/+bug/1732509)
Merge pull request #8090 from howbazaar/rename-bundle-config
Rename --bundle-config to --overlay

When --bundle-config was first introduced it was literally that, configuration for the primary bundle where all the applications had to already be defined in the main bundle.

Now the overlay bundles are full bundles in their own right and are overlayed over the primary bundle in the order that they appear on the command line.

## Documentation changes

Yes the docs will need to be updated.
environs.tools.PreferredStream -> PreferredStreams
It now returns a slice of fallback streams in preference order. For the
moment callers have been updated to just use the first one, they'll be
updated to do the fallback next.
Fix a couple indices.
For Migrations, if we just want the most recent,
we can put that information directly into the index,
and it will still use the model-uuid portion for other
queries that don't touch attempts.

Similarly we don't need two indexes that both cover
the same prefix.
Make environs.tools.FindTools accept multiple streams
It will query the metadata for each stream in turn, combining the
binaries found.

Update callers that already use PreferredStreams to pass them.
Merge pull request #8048 from ExternalReality/bug#1726226_rework
Ensure credential paths are "Finalized"  to their contents.

## Description of change

When bootstrapping a controller on `GCE` with `JUJU_DATA` set to the path of an empty directory, Juju will try to look for a path to a file holding `GCE` credentials in the environment variable `GOOGLE_APPLICATION_CREDENTIALS`. Juju incorrectly tries to use the file path itself as the credentials instead of using the contents of that file. This PR fixes that error.  

## Why is this change needed?

Fixes referenced bug.

## QA steps

1. set `GOOGLE_APPLICATION_CREDENTIALS` to the path of your application credentials.
2. set `JUJU_DATA` to an empty directory (This way `Juju` ignores its current set of credentials.)
3. You should see your controller successfully bootstrap on `GCE` whereas without this patch it would error expecting the file path string to be valid `JSON`.

## Documentation changes

N/A

## Does it affect current user workflow? CLI? API?

No

## Bug reference

https://bugs.launchpad.net/juju/+bug/1726226
Merge pull request #8092 from babbageclunk/stream-fallback
Find agent binaries using fallback across streams

### Description of change

If someone bootstraps with a beta or with an explicitly set devel stream, we weren't then considering released agent binaries as possible candidates for upgrades (because they are on the released stream). 
Change the metadata lookup so that we check the released stream if a controller is on the devel stream.

This is another go at #8080 without trying to change the simplestreams code - handling mirrors and resolve info was too involved. This version combines the results just above `tools/simplestreams/Fetch`, which means it handles mirrors right for each stream and doesn't need to combine resolve info because it's thrown away at that point anyway.

## QA steps

Bootstrap a controller with 2.2-beta3 (using a released 2.2 binary). Then run `juju upgrade-juju -m controller`. The controller should be upgraded to 2.2.6.

## Documentation changes

It's possible this needs documentation? I couldn't find any documentation for how we find agent binaries in streams, but if there is we should add a mention of the fallback to it.
Merge pull request #8094 from jameinel/2.3-migration-index
Fix a couple indices.

## Description of change

For Migrations, if we just want the most recent,
we can put that information directly into the index,
and it will still use the model-uuid portion for other
queries that don't touch attempts.

Similarly we don't need two indexes that both cover
the same prefix.

## QA steps

I found this by looking at mongotop load when doing `juju models` calls on an LXD system that had 200 models. With the change to this index on migrations the load dropped from being the highest cost to being nonexistant.

## Documentation changes

None

## Bug reference

None (though there are plenty of bugs around 'juju models' being slow, and this should help.)
Merge pull request #8095 from hmlanigan/bug1732614
fix bug 1732614

## Description of change

Prevent panics in firewaller, check pointers in neutron security group rules are not nil before using.

## QA steps

new unit tests

## Documentation changes

n/a

## Bug reference

https://bugs.launchpad.net/juju/+bug/1732614
Merge pull request #8079 from Veebers/ci-adding-initial-cmr-testing
CI: Initial CMR functional tests for offers and consumption.

Initial feature tests for the Cross Model Relation (CMR) feature.

Tested in this commit:
  - Creating, listing and deleting an offer of an endpoint (both default and explicitly set name)
  - Consumption of an offered endpoint from models in:
      a. The same controller
      b. Different controllers (can be in different providers, script argument provided.)
storageprovisioner: ignore incoherent volumes
In Juju, volumes cannot be provisioned without
an attachment to a machine. This is because
volumes are typically required to exist in the
same availability zone as the machine; so we
create the volume in the same AZ as the machine
to which it is intitially attached.

Because of this, the storage provisioner and
storage providers assume that an unprovisioned,
Alive, volume will have an attachment. It has
been observed in the wild that this does not
always hold true.

This change prevents the storage provisioner
from panicking upon seeing such volumes, but
does nothing to prevent the incoherent state
from occurring in the first place. It is not
clear how this can happen yet; it is possibly
an artifact of upgrading from an older version.

Partially fixes https://bugs.launchpad.net/juju/+bug/1722818
Merge pull request #8098 from axw/lp1732616-volume-events-incoherent
storageprovisioner: ignore incoherent volumes

## Description of change

In Juju, volumes cannot be provisioned without
an attachment to a machine. This is because
volumes are typically required to exist in the
same availability zone as the machine; so we
create the volume in the same AZ as the machine
to which it is intitially attached.

Because of this, the storage provisioner and
storage providers assume that an unprovisioned,
Alive, volume will have an attachment. It has
been observed in the wild that this does not
always hold true.

This change prevents the storage provisioner
from panicking upon seeing such volumes, but
does nothing to prevent the incoherent state
from occurring in the first place. It is not
clear how this can happen yet; it is possibly
an artifact of upgrading from an older version.

## QA steps

(does not mimic a realistic scenario, as we have no reproduction steps; --disks is not an advertised feature)
1. juju bootstrap openstack
2. juju add-machine --disks cinder,1G && juju remove-machine 0
3. juju debug-log -m controller
(should not be any panics)

## Documentation changes

None.

## Bug reference

Partially fixes https://bugs.launchpad.net/juju/+bug/1732616
Merge pull request #8102 from wallyworld/upgrade-juju-fixes
Fix upgrade-juju --dry-run  and upgrades from custom binaries

## Description of change

There were several small issues with upgrade-juju. The main one occurred when running a custom build (with a build number > 0). --dry-run would report n upgrade available, but then running the upgrade would not use the available version and instead upload a new custom build.
Other fixes include some message tweaks and not actually building a local jujud when using dry run.

## QA steps

Run up a custom build, eg 2.3-beta2.1
Use juju upgrade-juju with no args and ensure it upgrades to 2.3-beta3
(it would fail before)

## Documentation changes

None, just fixing bad behaviour.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1732879
Merge pull request #8103 from nskaggs/inc-2.3-rc2
Increment juju to 2.3-rc2

Bump to 2.3-rc2
Merge pull request #8105 from jameinel/2.3-get-model-status-1732779
If we are reporting an error object, report the right one.

## Description of change

We just checked that 'err' is nil, and then check that status[0].Error is not nil, and then we report the former instead of the latter.

## QA steps

Failures during model destruction should report the error that we get, instead of 'nil' as reported in the original bug.

## Documentation changes

None.

## Bug reference

[lp:1732779](https://bugs.launchpad.net/juju/+bug/1732779)
Merge pull request #8107 from howbazaar/pubsub-forwarder-increase-tim…
…eout

Longer api.Open timeout for pubsub/forwarder

Since all agent connections can get rate limited, make the api.Open longer than the default rate limit backoff + max delay time for the rate limiter.
Prevent upgrade-juju from trying to downgrade
Add a check to upgrade-juju that the target version that build-agent
wouldn't be a downgrade, rather than letting the API catch it after
we've built the agent and uploaded it to the controller.
environs: introduce AvailabilityZoneError
Introduce the environs.AvailabilityZoneError
interface, and an accompanying helper function
IsAvailabilityZoneIndependent. Providers'
StartInstance methods may return an error
implementing environs.AvailabilityZoneError
to indicate that an error was not influenced
by the specified availability zone. This is
used by the provisioner and bootstrap code
to decide whether or not to attempt with an
alternative availability zone.

The existing ErrAvailabilityZoneFailed error
value has been removed, and all providers
have been updated to use the new interface
where appropriate. An error type,
StartInstanceError, has been added to the
provider/common package which implements the
interface.
Merge pull request #8111 from wallyworld/use-stable-charm-repos
Update to use stable charm.v6 and charmrepo.v2 upstreams

## Description of change

The upstream charm and charmrepo repositories have gone stable so we use those versions.
Sadly, we still need to keep using charmstore.v5-unstable since the newer v5 pulls in old dependencies incompatible with Juju, eg idmclient. It also pulls in bakery.v-unstable. But the charmstore repo is only used  in tests.

## QA steps

bootstrap and deploy a charm
cmd/jujud/agent: don't run mongoupgrader
Unused, just getting in the way.
Merge pull request #8108 from babbageclunk/invalid-upgrade
upgrade-juju: Reject a version downgrade when using --build-agent

## Description of change

If you bootstrapped a controller with one version of Juju and then ran `juju upgrade-juju --build-agent` on the controller model with a juju binary with a lower version (e.g. from changing branches), it would successfully upload the binary and set the agent-version. Thenthe upgrade would stall when the worker actually tried to do the upgrade, with the error: `cannot sanely upgrade from 2.3-beta4.1 to 2.3-beta3.1`. The controller would then be stuck in the upgrade-in-progress state.

Prevent this by checking that the version is an upgrade in state before setting the new AgentVersion, and also check for it in the `upgrade-juju` command for quicker feedback.

## QA steps

Bootstrap a controller with 2.3-rc2, then change the version number to be 2.3-rc1, by editing `version/version.go` and rebuilding juju.
Running `juju upgrade-juju -m controller --build-agent` should refuse the downgrade without building and uploading the binary.

## Bug reference

Fixes https://bugs.launchpad.net/juju/+bug/1732156
Merge pull request #8112 from wallyworld/stop-metrics-listeners-on-error
If metrics workers error, ensure socket listeners are stopped

## Description of change

Bad data, eg zero length files in /var/lib/juju/metricspool/, cause the metric workers to bounce. But upon doing so, they don't close their socket. We add error handling to the "run" func passed to the periodic workers so that if there's an error, stop is called on the particular metrics structs, which stops the listener.

Also, if there's an error creating the content of the metric file, don't try and use it.

## QA steps

https://bugs.launchpad.net/juju/+bug/1733469/comments/6
Create zero length files and check for leaks.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1733469
provider/common: ZoneIndependentError
Remove StartInstanceError, and replace it with
the function ZoneIndependentError. This new
function wraps the given error such that it
will satisfy environs.IsAvailabilityZoneIndependent.
Merge pull request #8109 from axw/lp1732564-start-instance-error
environs: introduce AvailabilityZoneError

## Description of change

Introduce the environs.AvailabilityZoneError
interface, and an accompanying helper function
IsAvailabilityZoneIndependent. Providers'
StartInstance methods may return an error
implementing environs.AvailabilityZoneError
to indicate that an error was not influenced
by the specified availability zone. This is
used by the provisioner and bootstrap code
to decide whether or not to attempt with an
alternative availability zone.

The existing ErrAvailabilityZoneFailed error
value has been removed, and all providers
have been updated to use the new interface
where appropriate. A helper function,
ZoneIndependentError, has been added to the
provider/common package which wraps a given
function such that it satisfies
environs.IsAvailabilityZoneIndependent.

## QA steps

bootstrap
deploy ubuntu -n 3
(check units spread over AZs)
destroy-controller

repeat in a cloud with broken AZs, check that the provisioner retries after failing in one AZ

## Documentation changes

None.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1732564
Romulus doesn't ever talk to the plain juju api.
It only uses the Wallet Client api, so we don't need this interface.
Consider model in upgrade/restore login checks
When a controller machine agent is running
upgrade steps, or is restoring, we now check
the model when allowing through agents with
the same tag as the controller machine agent.

Move the upgrade/restore checks into the
apiserver package itself, and get rid of
the "LoginValidator". The agent just passes
in functions that the API server can call
to check the current upgrade or restore
status.

Removed some tests from the machine agent
related to restore status, which weren't really
testing anything meaningful. We should move the
restore watching code to the API server in 2.4,
and test it properly. See:
  https://bugs.launchpad.net/juju/+bug/1733552

Fixes https://bugs.launchpad.net/juju/+bug/1733250
Merge pull request #8114 from jameinel/2.3-driveby
Romulus doesn't ever talk to the plain juju api.

## Description of change

It only uses the Wallet Client api, so we don't need this interface. Just a trivial cleanup for something that was importing params and referencing ModelInfo that doesn't actually do anything with it.

## QA steps

Nothing should change.

## Documentation changes

None

## Bug reference

None
Merge pull request #8113 from axw/lp1733250-login-validation
Consider model in upgrade/restore login checks

## Description of change

When a controller machine agent is running
upgrade steps, or is restoring, we now check
the model when allowing through agents with
the same tag as the controller machine agent.

Move the upgrade/restore checks into the
apiserver package itself, and get rid of
the "LoginValidator". The agent just passes
in functions that the API server can call
to check the current upgrade or restore
status. The "mongoupgrade" worker is no
longer run by the machine agent; it is no
longer required, and allows us to simplify
the login logic.

Removed some tests from the machine agent
related to restore status, which weren't really
testing anything meaningful. We should move the
restore watching code to the API server in 2.4,
and test it properly. See:
  https://bugs.launchpad.net/juju/+bug/1733552

## QA steps

1. juju bootstrap localhost
2. juju enable-ha
3. juju deploy ubuntu
4. juju upgrade-juju -m controller --build-agent
(tail logs, check for errors indicating that agents can't log into each other)

## Documentation changes

None.

## Bug reference

Fixes https://bugs.launchpad.net/juju/+bug/1733250
Change Status.get_unit to find subordinates
Since subordinate units don't hang off their application but off their
principal unit, get_unit wasn't finding them. Make it also check the
subordinates of each unit.
Merge pull request #8115 from babbageclunk/really-fix-migration-test
acceptancetests: Really make migration test wait for subordinates to be ready

## Description of change

The previous PR #8073 didn't work - the test would still sometimes fail on the precheck for missing scopes, because the ntp units hadn't finished entering the relation scope. Waiting for all existing units to be idle didn't work because the ntp units hadn't been created at that point.

Explicitly wait for the ntp/0 and ntp/1 units to be idle instead.

## QA steps

Run the model migration acceptance test (ideally multiple times) and see that it doesn't fail because of prechecks.
Merge pull request #8118 from anastasiamac/remove-unwanted-recursion
Remove unwanted recursion.

## Description of change

Oversight...
Merge pull request #8119 from wallyworld/quiter-metrics-workers
Do not bounce metrics workers if there's a data error

## Description of change

If the metrics collection or sender workers error due to a data encoding or other issue like that, there's no point in bouncing the workers as it won't correct the problem. To make the logs quieter, log at debug level and don't return an error.

## QA steps

Create an empty metrics file and check logs.
apiserver: tidy up API root restrictions
Move around code to tidy up the server-side
authentication procedures. Move the API root
restriction after authentication and rate-limiting,
which will enable us (in a proceeding commit)
to permit all controller machines to log in
during maintenance.
apiserver: allow all controllers logins
Do not restrict any controller agents from
logging in during a maintenance event.

Incidentally fixes an issue where we inform
login observers that a login to the controller
model is not from a controller agent when it is.

Fixes https://bugs.launchpad.net/juju/+bug/1733259
Merge pull request #8120 from axw/apiserver-restrict-login-allow-cont…
…rollers

apiserver: allow all controller logins

## Description of change

Do not restrict any controller agents from
logging in during a maintenance event.
API root restriction has been moved to after
rate limiting and authentication.

Incidentally fixes an issue where we inform     
login observers that a login to the controller  
model is not from a controller agent when it is.

## QA steps

1. juju bootstrap localhost
2. juju enable-ha
(wait)
3. juju upgrade-model -m controller --build-agent
(juju debug-log -m controller; there should be no login errors)

## Documentation changes

None.

## Bug reference

Fixes https://bugs.launchpad.net/juju/+bug/1733259
Member

axw commented Nov 24, 2017

See a6af200 for the non-merge changes.

Member

axw commented Nov 24, 2017

$$merge$$

Contributor

jujubot commented Nov 24, 2017

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

Contributor

jujubot commented Nov 24, 2017

Build failed: Tests failed
build url: http://ci.jujucharms.com/job/github-merge-juju/577

axw added some commits Nov 24, 2017

Member

axw commented Nov 24, 2017

$$merge$$

Contributor

jujubot commented Nov 24, 2017

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

@jujubot jujubot merged commit 38586e0 into juju:state-controller-refactor Nov 24, 2017

1 check failed

continuous-integration/jenkins/pr-merge This commit cannot be built
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment