
[dev] merge 2.9 #13329

Merged
merged 74 commits into juju:develop from dev-merge-2.9 on Sep 13, 2021

Conversation

achilleasa
Contributor

@achilleasa achilleasa commented Sep 13, 2021

This PR forward-ports 2.9 into develop. The following PRs are included in this port:

The following files had merge conflicts that had to be resolved (please double-check the changes in last commit):

  • caas/kubernetes/provider/bootstrap_test.go
  • feature/flags.go
  • scripts/win-installer/setup.iss
  • snap/snapcraft.yaml
  • state/pool.go
  • version/version.go
  • worker/uniter/relation/state_test.go

Juan Tirado and others added 30 commits July 26, 2021 17:01
The following changes ensure that the eth0 device gets a unique name, which is given to us via LXD. Unfortunately, we're currently unsure why you would have interfaces the first time around and then devices another time around, which seems to be the only way to trip this up.

More information from the host machine would be required to find the
underlying bug: the output of `ip a` and a list of all the NICs.

For now this seems like it has the potential to unblock containers being
created.
…shell-2.9

juju#13309

See also juju#13299

/etc/profile.d/juju-introspection.sh contains a few bash arrays, breaking POSIX shell compatibility and, since it is loaded as a profile script, at least some packages' post-installation scripts.

For example, on a model deployed with Juju 2.9.11, installing spamassassin yields a _Syntax error_ when the spamassassin post-installation script attempts to run sa-update as the debian-spamd user, which has /bin/sh as its shell.

The change includes variable name changes (and a comment) to avoid future updates breaking POSIX compatibility.
Feel free to update the comment to whatever you see fit.

## QA steps

* Check it builds properly
* Deploy a model with a spamassassin installation (e.g. the charm cs:mailman3-core) and check it doesn't fail during SA install.

## Bug reference

Fixes https://bugs.launchpad.net/juju/+bug/1942430
Probable iteration of https://bugs.launchpad.net/juju/+bug/1770437
juju#13312

As seen in the linked bug, there can be a panic if the txn watcher worker bounces and closes a "watcherStarted" channel twice. However, that channel and its associated pubsub topic are unnecessary in the first place, so this PR removes them both.

## QA steps

Bootstrap, deploy a couple of charms, and relate them.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1942421
juju#13221

A very common user behaviour is to watch the status of deployments for a while. In the documentation this is done using the `watch` command in conjunction with `juju status`. I have added a `--watch` flag to the `status` command that takes a duration value and re-runs `status` at that interval until interrupted.

## QA steps

To test, simply deploy any application and watch the repeated status calls.

```sh
juju deploy hello-juju
# print status and exit
juju status
# print the status every two seconds
juju status --watch=2s
```
The operation continues until the program is killed.

## Documentation changes

The flag is documented in the command docs.
SimonRichardson and others added 23 commits September 9, 2021 16:04
Until we can correctly pull in the LXD upstream project, we have to use
a workaround to check for the error status. This is an abridged version
of how the LXD code base does it internally[1].

 1. https://github.com/lxc/lxd/blob/master/shared/api/error.go#L32-L58
juju#13320

The following PR reverts the 2.9 changes around LXD. As the LXD project is
using Go 1.16 and we're using 1.14, there are some issues around the exact
packages we're building with. Running `go mod tidy` on the tip of 2.9
returns an error (see below) which indicates something is using the new
`io/fs` packages that we're unable to consume, so it's finding alternative
dependencies to use from the sum file (at a guess). As stated before,
go.mod and the sum file are not lock files; they only describe what was
downloaded and ensure it isn't corrupt.

The fix is simple: revert the LXD changes and work around the LXD status error
by creating an interface to validate against.

```
go mod tidy
github.com/juju/juju/apiserver/logsink imports
 gopkg.in/natefinch/lumberjack.v2 tested by
 gopkg.in/natefinch/lumberjack.v2.test imports
 github.com/BurntSushi/toml imports
 io/fs: package io/fs is not in GOROOT (/usr/local/go/src/io/fs)
github.com/juju/juju/apiserver/logsink imports
 gopkg.in/natefinch/lumberjack.v2 tested by
 gopkg.in/natefinch/lumberjack.v2.test imports
 github.com/BurntSushi/toml tested by
 github.com/BurntSushi/toml.test imports
 testing/fstest: package testing/fstest is not in GOROOT (/usr/local/go/src/testing/fstest)
github.com/juju/juju/apiserver/logsink imports
 gopkg.in/natefinch/lumberjack.v2 tested by
 gopkg.in/natefinch/lumberjack.v2.test imports
 github.com/BurntSushi/toml tested by
 github.com/BurntSushi/toml.test imports
 github.com/BurntSushi/toml/internal/toml-test imports
 embed: package embed is not in GOROOT (/usr/local/go/src/embed)
```

## QA steps


```sh
$ snap install juju --classic
$ /snap/bin/juju bootstrap lxd test
$ juju upgrade-controller --build-agent
$ juju debug-log -m controller
```
Ensure that the single controller settles and has started.

```sh
$ juju ssh -m controller 0
$ juju_engine_report | less
```

## Bug reference

https://bugs.launchpad.net/juju/+bug/1943075
juju#13319

Some time ago we did a significant overhaul of the payload delivery for updating machine address configuration. Included were changes to use members of the `NetworkConfig.Addresses` entries instead of those on the top-level `NetworkConfig`. One of these was the address subnet CIDR.

What we missed at the time was to change `NetworkConfigAPI.fixUpFanSubnets` so that it applied the Fan updates to the `Addresses` entries and not the top-level `CIDR` member.

This meant that Fan addresses were being stored with subnets indicating the /8 overlay instead of the zone-specific /12 segments in AWS, resulting in the observed bug.

Here we fix `fixUpFanSubnets` so that it sets the corrected Fan segment subnets on the incoming addresses.

## QA steps

- `juju bootstrap aws patch-test --debug --no-gui --build-agent`
- `juju add-space space-1 172.31.0.0/20`
- `juju add-machine --constraints "spaces=space-1" --series bionic`
- `juju deploy cs:ubuntu --to lxd:0 --series focal --constraints "spaces=space-1" --bind "space-1"`
- Observe that the container is provisioned.

## Documentation changes

None.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1942950
juju#13321

The secret-create and secret-update hook commands can now set additional secret metadata:
- description
- tags
- status

Secret status defaults to 'active' and can be set to 'pending' via the CLI option `--pending`. It can be used when rotating secrets.

The secret URL format has also changed. The main change is the removal of the version from the URL and the introduction of an 'app' namespace component (more to come). Also, the use of '.' as a path separator has been replaced by '/'.

## QA steps

Bootstrap and deploy the ubuntu charm:

```
$ juju deploy ubuntu
$ juju exec --unit ubuntu/0 "secret-create password --description 'my secret' --tag foo=bar --tag hello=world data=s3cret!"
secret://app/ubuntu/password

$ juju exec --unit ubuntu/0 "secret-get secret://app/ubuntu/password"
s3cret!

$ juju exec --unit ubuntu/0 "secret-update password --rotate 5h --description 'ssshhh'"
secret://app/ubuntu/password?revision=1

$ juju secrets --format yaml
- ID: 1
  URL: secret://app/ubuntu/password
  revision: 1
  path: app/ubuntu/password
  status: active
  rotate-interval: 5h0m0s
  version: 1
  description: ssshhh
  tags:
    foo: bar
    hello: world
  backend: juju
  create-time: 2021-09-10T00:48:49Z
  update-time: 2021-09-10T00:50:26Z

$ juju exec --unit ubuntu/0 "secret-update password --pending data=another"
secret://app/ubuntu/password?revision=2

$ juju secrets --format yaml --show-secrets
- ID: 1
  URL: secret://app/ubuntu/password
  revision: 2
  path: app/ubuntu/password
  status: pending
  rotate-interval: 5h0m0s
  version: 1
  description: ssshhh
  tags:
    foo: bar
    hello: world
  backend: juju
  create-time: 2021-09-10T00:48:49Z
  update-time: 2021-09-10T00:52:42Z
  value:
    data: another
```
The host_name should be generated directly by LXD, instead of by Juju.
We only set the host_name to identify the devices/NICs when looking at
them via `ip a`. As this was just a convenience, we should fall back to
letting LXD do it. This will prevent naming collisions at the kernel
level, where it hasn't gc'd the devices in time.
It is no longer required to pass the machine ID through to generate the
device name.
This test is meant to catch regressions for LP1942328.
…s-config-host-name

juju#13274

The following changes ensure that the eth0 device gets a unique name, which is given to us via LXD. Unfortunately, we're currently unsure why you would have interfaces the first time around and then devices another time around, which seems to be the only way to trip this up.

More information from the host machine would be required to find the
underlying bug: the output of `ip a` and a list of all the NICs.

For now, this seems like it has the potential to unblock containers being
created.

## QA steps

Run this branch and, if you encounter the issue, please give us the
output of `ip a` and a list of all the NICs.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1932180
In order to improve the readability of why a raft worker is restarting, the
following introduces logging of error messages.

The problem isn't helped by the masked ErrStartTimeout waiting for a new
raft channel. As the catacomb is killed by the ErrStartTimeout, we lose
the original error message.

We should revisit if this is the best way to do this!
juju#13325

In order to improve the readability of why a raft worker is restarting, the
following introduces logging of error messages.

The problem isn't helped by the masked ErrStartTimeout waiting for a new
raft channel. As the catacomb is killed by the ErrStartTimeout, we lose
the original error message.

## QA steps

```sh
$ snap install juju --classic
$ /snap/bin/juju bootstrap lxd test
$ go get github.com/lxc/lxd@172613bf35d31b8e788f00fe93069b8d0d19b947
$ juju upgrade-controller --build-agent
$ juju debug-log -m controller
```

You should see the following error message:

```
machine-0: 16:26:49 ERROR juju.worker.raft Failed to setup raft instance, err: failed to get last log at index 52: msgpack decode error [pos 13]: invalid length of bytes for decoding time - expecting 4 or 8 or 12, got 15
```

## Bug reference

https://bugs.launchpad.net/juju/+bug/1943075
Remove unused azure network/storage imports in py test
juju#13328

When running the series-upgrade acceptance test, just ignore whatever return code comes back from rebooting the machine via SSH. That exit code is not what we should be hinging test failure on.

## QA steps

- With the appropriate virtualenv activated: `(cd acceptancetests ; ./assess upgrade_series --juju $YOUR-JUJU --logging-config '<root>=DEBUG')`
- You might see this output in the test logging:
```
ERROR subprocess encountered error code 255
2021-09-13 11:38:22 INFO Ignoring `juju ssh` exit code 1 after triggering reboot
```

## Documentation changes

None.

## Bug reference

N/A
…-test

juju#13324

This PR adds a regression test for ensuring that Juju correctly retains empty `openvswitch` blocks (such as the ones injected by MAAS to composed KVM machines with OVS bridges) when merging netplan configurations.
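For reference, an empty `openvswitch` block of the kind MAAS injects looks roughly like this in a netplan configuration (illustrative fragment; interface names and addresses are made up):

```yaml
network:
  version: 2
  bridges:
    br-ex:
      interfaces: [bond0]
      openvswitch: {}   # empty block MAAS adds for OVS bridges; must survive merging
      addresses: [10.0.0.2/24]
```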

## QA steps

The test is wired to use a dual-spaced machine with an OVS bridge on our internal finfolk MAAS.
To run the QA steps, define a MAAS cloud and point it to finfolk. Then:

```sh
$ cd tests
$ ./main.sh -v -p maas -c $finfolk-cloud-name ovs_maas
```

### Setting up a suitable machine for this test on your local MAAS instance

The following steps will allow you to prepare a machine with the right NIC devices for running the above test. The network setup is based on [this](https://openstackdevops.wordpress.com/2021/05/03/converged-networking-with-charmed-openstack/) blog post about converged networking.

Prior to running the following steps you must:
 - be running MAAS 2.9.2+.
 - define two spaces named **space1** and **space2**.

Using the MAAS UI:
1) Compose a new KVM machine (1 core, 4G RAM)
2) **Abort** the commissioning attempt for the new machine.
3) Connect to the KVM host and edit the machine definition using `virsh`. For example, if your machine is called 'foo' run:
 `virsh edit foo`.
4) Locate the XML block that describes the one and only NIC assigned to the machine. It should look something like:
 ```xml
 <interface type='network'>
   <mac address='52:54:00:24:a7:07'/>
   <source network='maas'/>
   <model type='virtio'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
 </interface>
 ```
5) Clone this block two times (changing the **slot** attribute to **0x08** and **0x0a** respectively). Use [this](https://gist.github.com/viz3/6591201) script to generate a MAC address for each new NIC.
6) Go back to the MAAS UI and commission the machine.
7) Edit the machine and attach the `ovs` tag to it.
8) In the networking tab for the machine you should now see three NICs.
 - Select the first two NICs and click the `create bond` button; leave the default values as-is and save the bond.
 - Select the bond and click the `create bridge` button. Under bridge type select `open vswitch`. Then, select the appropriate fabric / subnet values for `space1` and `Auto assign` for the IP configuration option.
 - Select the remaining NIC from the list, edit it and select the appropriate fabric/subnet values for `space2` and `Auto assign` for the IP configuration option.
9) Deploy the machine and ensure that the operation succeeds. If the machine gets stuck in `Rebooting`, ask the MAAS folks for help.
Resolved Conflicts:
  caas/kubernetes/provider/bootstrap_test.go
  feature/flags.go
  scripts/win-installer/setup.iss
  snap/snapcraft.yaml
  state/pool.go
  version/version.go
  worker/uniter/relation/state_test.go
Member

@hmlanigan hmlanigan left a comment


Double-checked the files mentioned - LGTM.

@hmlanigan
Member

$$merge$$

@jujubot jujubot merged commit a0f25a5 into juju:develop Sep 13, 2021
@achilleasa achilleasa deleted the dev-merge-2.9 branch September 14, 2021 14:03