
2.9 into develop #13143

Merged
merged 47 commits into from Jul 8, 2021

Conversation

SimonRichardson
Member

Merge 2.9 into develop

6ad00ba (upstream/2.9, origin/2.9, 2.9) Merge pull request #13142 from SimonRichardson/2.8-into-2.9-cache
f4b4ab4 Merge pull request #13140 from hpidcock/pull-secrets
185fac6 Merge pull request #13135 from wallyworld/cass-image-repo-port
2b67b0a Merge pull request #13134 from achilleasa/2.9-backport-equinix-provider-support
43b8197 (achilleasa/2.9) Merge pull request #13129 from ycliuhw/fix/lp-1934180
25e8028 Merge pull request #13128 from tlm/oauth-cred-k8s-lp1929910
7b18431 Merge pull request #13133 from SimonRichardson/pod-ready-timeout
c139c11 Merge pull request #13130 from SimonRichardson/set-config-empty-string

Conflicts:

CONFLICT (content): Merge conflict in cmd/juju/commands/bootstrap_test.go
CONFLICT (content): Merge conflict in cloudconfig/podcfg/podcfg_test.go
CONFLICT (content): Merge conflict in cloudconfig/podcfg/image.go
CONFLICT (content): Merge conflict in cloud/clouds_test.go
CONFLICT (content): Merge conflict in apiserver/facades/client/application/application_unit_test.go
CONFLICT (content): Merge conflict in apiserver/facades/agent/uniter/uniter.go

achilleasa and others added 30 commits June 22, 2021 13:37
This improves visibility in cases such as LP1930899, where an underprovisioned mongodb instance prevented log entries from being persisted, with no additional error output.

As the DB error was returned to the caller (the log sender worker), the worker would keep restarting, preventing models from logging anything.

While this commit does not fix the underlying problem (underprovisioned mongo), it should at least make the cause more obvious to users who are trying to diagnose similar issues by inspecting the logsink.log file contents.
…rsisting-logs-to-db-fails

juju#13097

This improves visibility in cases such as LP1930899, where an underprovisioned mongodb instance prevented log entries from being persisted, with no additional error output.

As the DB error was returned to the caller (the log sender worker), the worker would keep restarting, preventing models from logging anything.

While this commit does not fix the underlying problem (underprovisioned mongo), it should at least make the cause more obvious to users who are trying to diagnose similar issues by inspecting the logsink.log file contents.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1930899

(NOTE: the PR does not fix ^ but helps diagnose the root cause of the issue)
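
A minimal sketch of the idea (hypothetical names, not the actual logsink code): when the DB write fails, record the error in logsink.log before returning it to the sender.

```go
package logsink

import (
	"fmt"
	"log"
)

// logRecord and dbLogger are hypothetical stand-ins for Juju's types.
type logRecord struct{ Message string }

type dbLogger interface {
	Log([]logRecord) error
}

// writeRecords persists records to the DB and, on failure, also writes the
// error to the local logsink.log logger so the cause is visible to anyone
// inspecting the file, instead of only bouncing back to the sender worker.
func writeRecords(db dbLogger, sink *log.Logger, records []logRecord) error {
	if err := db.Log(records); err != nil {
		sink.Printf("unable to persist log records to the database: %v", err)
		return fmt.Errorf("persisting log records: %w", err)
	}
	return nil
}
```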
The juju ssh --proxy command uses the controller as a jumpbox and
invokes 'nc' to forward ssh traffic to the target machine. While this
approach works fine for regular machines, it fails (for substrates such as
ec2) for container machines (lxd/kvm) when they only have a FAN address.

This is generally acceptable, as Juju only requires machines to be
able to connect to the controller and not vice versa. In particular, ec2
does not allow FAN connectivity from the controller model to any other
Juju model (and vice versa).

To work around this limitation, we now first check whether the machine we
are trying to connect to is a container. If that's the case, we identify
the address of the machine that hosts the container and modify the ssh
proxy command to use a 2-level proxying scheme: we ssh to the
controller, then ssh to the machine hosting the container, and then run
'nc' to route the ssh traffic to the intended machine.
…oxy-to-fan-subnets

juju#13108

The juju ssh --proxy command uses the controller as a jumpbox and
invokes 'nc' to forward ssh traffic to the target machine. While this
approach works fine for regular machines, it fails (for substrates such as
ec2) for container machines (lxd/kvm) when they only have a FAN address.

This is generally acceptable, as Juju only requires machines to be
able to connect to the controller and not vice versa. In particular, ec2
does not allow FAN connectivity from the controller model to any other
Juju model (and vice versa).

To work around this limitation, we now first check whether the machine we
are trying to connect to is a container. If that's the case, we identify
the address of the machine that hosts the container and modify the ssh
proxy command to use a 2-level proxying scheme: we ssh to the
controller, then ssh to the machine hosting the container, and then run
'nc' to route the ssh traffic to the intended machine.
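
In rough shape (addresses and flags are illustrative placeholders, not the exact generated command), the proxy command goes from one hop to two:

```sh
# Regular machine: one hop via the controller.
ssh -o ProxyCommand="ssh -q ubuntu@<controller-ip> nc <machine-ip> 22" ubuntu@<machine-ip>

# Container with only a FAN address: hop through the controller to the
# host machine, which runs 'nc' to reach the container.
ssh -o ProxyCommand="ssh -q ubuntu@<controller-ip> ssh -q ubuntu@<host-ip> nc <container-fan-ip> 22" ubuntu@<container-fan-ip>
```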

Caveats: this workaround cannot be applied in a scenario where we 
`juju scp` to copy files between two different **container remotes**
as each remote in this case would need a custom ssh proxy command.

## QA steps

Bootstrap to ec2. Then:

```sh
$ juju deploy mysql --to lxd

# should fail
$ juju ssh mysql/0
ERROR cannot connect to any address: [252.40.132.122:22]

# should work with this patch
$ juju ssh --proxy mysql/0 hostname
juju-12d453-0-lxd-0
Connection to 252.40.132.122 closed.

# try scp
$ echo "BTC" > wallet
$ juju scp --proxy -- -3 wallet mysql/0:
$ juju scp --proxy -- -3 wallet 0:

# Verify the files have been uploaded
$ juju ssh --proxy 0 cat wallet
BTC
Connection to 172.31.40.132 closed.

$ juju ssh --proxy 0/lxd/0 cat wallet
BTC
Connection to 252.40.132.122 closed.
```

## Bug reference
https://bugs.launchpad.net/juju/+bug/1932547
The following changes allow both the charm config as a map and the charm
config as YAML to work as intended. The code previously assumed that
there was always a YAML document and that the charm config map would always
overwrite it.
The problem occurred when there wasn't a charm YAML: the code would blindly
overwrite the YAML's values anyway. This caused empty strings to be converted
to nil, which then caused the --reset flow to be triggered. The --reset
flow essentially says: when you see nil, fall back to the config defaults by
deleting any user config. That's something that we didn't want to
trigger. The change causes us to exit early if there isn't a YAML.

There is a much deeper problem, and one that we will need to understand.
API versioning should essentially not touch the old code OR at the very
least have enough testing to ensure that any new changes keep the old
behaviour. 100% code coverage in these scenarios would be beneficial to
ensure that we didn't trip up here. The problem with attempting to get
to that level is the number of layers you have to test to get that
far. From the API server all the way down to state, it's a large task and
maybe not worth it?
To correctly parse other values that aren't strings, we should parse via
the config using ParseSettingsStrings.
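
A minimal sketch of the shape of the fix (a hypothetical wrapper, not the actual facade code; the import path of the charm package varies by release):

```go
package application

import charm "github.com/juju/charm/v8"

// parseConfigSettings sketches the fix: when no charm YAML was supplied,
// parse the string map directly so "" stays "" instead of being merged
// through a YAML document, converted to nil, and triggering --reset.
func parseConfigSettings(cfg *charm.Config, appName, yamlDoc string, settings map[string]string) (charm.Settings, error) {
	if yamlDoc == "" {
		// Exit early: nothing to merge, and no chance of nil-ing "" values.
		return cfg.ParseSettingsStrings(settings)
	}
	parsed, err := cfg.ParseSettingsYAML([]byte(yamlDoc), appName)
	if err != nil {
		return nil, err
	}
	overrides, err := cfg.ParseSettingsStrings(settings)
	if err != nil {
		return nil, err
	}
	for k, v := range overrides {
		parsed[k] = v
	}
	return parsed, nil
}
```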
…tring

juju#13130

The following changes allow the charm config as a map and the charm
config as a YAML to work as intended. The code previously assumed that
there was always a YAML and that the charm config map would always
overwrite it.

The problem occurs when there isn't a charm YAML: the code would blindly
overwrite values anyway. This caused empty strings to be converted
to `nil`, causing the --reset flow to be triggered. The --reset
flow sets the config defaults when it sees `nil`.

There is a much deeper problem, and one that we will need to understand.
API versioning should essentially not touch old code _or_ at the very
least have enough testing to ensure that any new changes keep the old
behaviour. 100% code coverage in these scenarios would be beneficial to
ensure that we didn't trip up here. The problem with attempting to get
to that level is the number of layers you have to test to get that
far. From the API server all the way down to state, it's a large task and
maybe not worth it?

## QA steps

Save bundle.yaml

```sh
cat >bundle.yaml<<EOF
applications:
  haproxy:
    charm: cs:haproxy
    channel: stable
    options:
      services: ""
EOF
```

```sh
$ juju bootstrap lxd test
$ juju deploy ./bundle.yaml
$ juju config haproxy services

$ juju config haproxy services=""
WARNING the configuration setting "services" already has the value ""
$ juju config haproxy services


```

Notice that the default config shouldn't be shown.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1934151
The following moves the timeout deadline from 1 minute to 10 minutes. If
a pod hasn't become ready in that time, I think it's safe to say it
won't ever!

The change is simple: just use a longer duration.
juju#13133

The following moves the timeout deadline from 1 minute to 10 minutes. If
a pod hasn't become ready in that time, I think it's safe to say it
won't ever!

The change is simple: just use a longer duration.
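
In shape, the change is just a bigger constant on the wait deadline (names here are hypothetical, not the actual Juju identifiers):

```go
package caas

import (
	"context"
	"time"
)

// podReadyTimeout is a hypothetical name for the deadline this PR bumps:
// previously 1 * time.Minute, now 10 * time.Minute.
const podReadyTimeout = 10 * time.Minute

// waitContext returns the context used while waiting for a pod to become
// ready; the wait gives up when the deadline expires.
func waitContext(parent context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(parent, podReadyTimeout)
}
```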

## QA steps

See: https://discourse.charmhub.io/t/error-getting-started-with-juju-microk8s/3855/3

## Bug reference

https://bugs.launchpad.net/juju/+bug/1934494
This change updates the Juju facades to return Kubernetes credentials
that pre-2.9 releases can understand. This has been done through a
facade bump for all users of the CloudSpec facade.
juju#13128

OAuth 2.8 credentials support in 2.9.

This change updates the Juju facades to return Kubernetes credentials that pre-2.9 releases can understand. This has been done through a facade bump for all users of the CloudSpec facade.
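
As a generic illustration of what a facade bump enables (not Juju's actual facade code), the server can pick a credential shape the caller's facade version understands:

```go
package facades

// cloudSpec is an illustrative payload; pre-2.9 clients only understand
// the legacy credential shape.
type cloudSpec struct {
	AuthType   string
	Credential map[string]string
}

// cloudSpecForVersion sketches version-gated responses: older facade
// versions are served the legacy shape, newer ones the current shape.
func cloudSpecForVersion(version int, modern, legacy cloudSpec) cloudSpec {
	if version < 2 { // pre-2.9 client on an old facade version
		return legacy
	}
	return modern
}
```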


## Checklist

 - [x] Requires a [pylibjuju](https://github.com/juju/python-libjuju) change
 - [x] Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR
 - [x] Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed
 - [x] Comments answer the question of why design decisions were made

## QA steps

1. Deploy a Juju 2.8 controller with a new model and deploy a workload.
2. Upgrade the controller to this PR.
3. Check that the workload agent and support components have no errors and are communicating with the Kubernetes API.
4. Check the controller logs for errors.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1929910
juju#13129

*Delete cluster role bindings before creating new ones, and ensure the resource has been deleted completely before creating.*

Drive-by: refactor all caas code to use client-go pointer helper functions.

## Checklist

 - [ ] ~Requires a [pylibjuju](https://github.com/juju/python-libjuju) change~
 - [ ] ~Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR~
 - [ ] ~Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed~
 - [x] Comments answer the question of why design decisions were made

## QA steps

```console
$ juju deploy /tmp/charm-builds/mariadb-k8s/ --debug --resource mysql_image=mariadb

$ yml2json /tmp/charm-builds/mariadb-k8s/reactive/k8s_resources.yaml | jq '.kubernetesResources.serviceAccounts'
[
  {
    "name": "rbac-foo",
    "roles": [
      {
        "rules": [
          {
            "apiGroups": [
              ""
            ],
            "verbs": [
              "get",
              "watch",
              "list"
            ],
            "resources": [
              "pods"
            ]
          }
        ],
        "global": true,
        "name": "pod-cluster-role"
      }
    ],
    "automountServiceAccountToken": true
  }
]

$ mkubectl -nt1 get clusterrolebinding.rbac.authorization.k8s.io/rbac-foo-t1-pod-cluster-role -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    controller.juju.is/id: 1bb4c340-1536-4281-815b-ad893b5fcd73
    model.juju.is/id: b3bcd724-b4e0-4bb1-80e2-af346cbba787
  creationTimestamp: "2021-07-02T06:11:17Z"
  labels:
    app.juju.is/created-by: controller
    app.kubernetes.io/managed-by: juju
    app.kubernetes.io/name: mariadb-k8s
    model.juju.is/name: t1
  managedFields:
  - apiVersion: rbac.authorization.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:controller.juju.is/id: {}
          f:model.juju.is/id: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:model.juju.is/name: {}
      f:roleRef:
        f:apiGroup: {}
        f:kind: {}
        f:name: {}
      f:subjects: {}
    manager: juju
    operation: Update
    time: "2021-07-02T06:11:17Z"
  name: rbac-foo-t1-pod-cluster-role
  resourceVersion: "166248"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/rbac-foo-t1-pod-cluster-role
  uid: ace7f3f1-6d2f-4cd3-aca1-b9211126d1d0
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: t1-pod-cluster-role
subjects:
- kind: ServiceAccount
  name: rbac-foo
  namespace: t1

# change podspec a bit then build and upgrade charm
$ juju upgrade-charm mariadb-k8s --path /tmp/charm-builds/mariadb-k8s/ --debug --resource mysql_image=mariadb

$ mkubectl -nt1 get clusterrolebinding.rbac.authorization.k8s.io/rbac-foo-t1-pod-cluster-role -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    controller.juju.is/id: 1bb4c340-1536-4281-815b-ad893b5fcd73
    model.juju.is/id: b3bcd724-b4e0-4bb1-80e2-af346cbba787
  creationTimestamp: "2021-07-02T06:12:30Z"
  labels:
    app.juju.is/created-by: controller
    app.kubernetes.io/managed-by: juju
    app.kubernetes.io/name: mariadb-k8s
    model.juju.is/name: t1
  managedFields:
  - apiVersion: rbac.authorization.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:controller.juju.is/id: {}
          f:model.juju.is/id: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:model.juju.is/name: {}
      f:roleRef:
        f:apiGroup: {}
        f:kind: {}
        f:name: {}
      f:subjects: {}
    manager: juju
    operation: Update
    time: "2021-07-02T06:12:30Z"
  name: rbac-foo-t1-pod-cluster-role
  resourceVersion: "166401"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/rbac-foo-t1-pod-cluster-role
  uid: 6ccb09f0-f931-408a-a478-f2c7a5ad817c
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: t1-pod-cluster-role
subjects:
- kind: ServiceAccount
  name: rbac-foo
  namespace: t1

```

## Documentation changes

No

## Bug reference

https://bugs.launchpad.net/juju/+bug/1934180
This commit adds the Equinix Metal API entrypoint to the cloud package and
generates the updated Go file.

Signed-off-by: root <jasmin.gacic@gmail.com>
Signed-off-by: Gianluca Arbezzano <ciao@gianarb.it>
This commit contains the struct and functions needed to correctly hook
Equinix Metal API authentication up to Juju.

Signed-off-by: root <jasmin.gacic@gmail.com>
This is an empty stub for the Equinix Metal Environ and instance
implementation. It returns panics and notImplemented errors.

Signed-off-by: Gianluca Arbezzano <ciao@gianarb.it>
This commit contains the device implementation for Equinix Metal devices.
This commit implements a couple of functions needed to bridge Juju
requirements against the Equinix Metal API.
This commit contains the logic to render valid user data for an Equinix
Metal device.
This commit bridges the Equinix Metal API to the Juju environ interface.
This commit hooks the Equinix Metal provider up to the gocheck test suite.

Signed-off-by: Gianluca Arbezzano <ciao@gianarb.it>
This commit contains the init file to register the Equinix Metal
provider with Juju.

Signed-off-by: root <jasmin.gacic@gmail.com>
Signed-off-by: Gianluca Arbezzano <ciao@gianarb.it>
This commit adds support for Juju subnets to the Equinix Metal
provider.

Signed-off-by: root <jasmin.gacic@gmail.com>
gianarb and others added 16 commits July 5, 2021 17:03
Allow the disabling of public IP addresses if the constraint is set.
For now, this is a workaround for the lack of a firewall.
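
If this is the `allocate-public-ip` constraint (an assumption on my part; check `juju help constraints`), usage would look like:

```console
$ juju deploy mysql --constraints "allocate-public-ip=false"
```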
…ovider-support

juju#13134

This PR backports juju#12983 to 2.9 so it can be made available as part of the next point release.

## QA steps

```console
# clone this branch and compile binaries
$ cp cloud/fallback-public-cloud.yaml ~/.local/share/juju/public-clouds.yaml
# This command shows Equinix as a cloud provider now
$ juju clouds --client --all
# Add credentials
$ juju add-credential equinix
# bootstrap the controller
$ juju bootstrap equinix/am test-equinix 
```

At this point it should be Juju business as usual; what I do is enable HA and check the controller status with:

```console
$ juju enable-ha
$ juju status -m controller
```
The cache would fail to notify if you changed from a value and back to
the original value. This is because the underlying hash value was never
written. This has two consequences:

 1. The cache could never emit a change of the original value
 2. Because we never cached the new value, if the new value was the same
 as the old one, we would still emit the event.

The fix is easy: just save the hash after the notify, and we get
better cache coherency.
juju#13135

A small fix for an issue found by QA folks when setting up a custom docker image repo. If the repo URL configured using `caas-image-repo` has a port, the parsing fails.
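
The pitfall, roughly (illustrative code, not Juju's actual parser): splitting the image path on the first `:` treats the registry port as a tag, so only a `:` after the last `/` can start a tag:

```go
package main

import (
	"fmt"
	"strings"
)

// splitNameTag sketches port-safe tag parsing: a ':' before the last '/'
// belongs to the registry port, not the tag.
func splitNameTag(path string) (name, tag string) {
	lastSlash := strings.LastIndex(path, "/")
	if i := strings.LastIndex(path, ":"); i > lastSlash {
		return path[:i], path[i+1:]
	}
	return path, ""
}

func main() {
	name, tag := splitNameTag("docker.io:5000/jujusolutions/jujud-operator:2.9.8")
	fmt.Println(name, tag) // docker.io:5000/jujusolutions/jujud-operator 2.9.8
}
```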

## QA steps

There's a unit test which ensures the expected behaviour. There are no 2.9.8 images yet, but do this:

` juju bootstrap microk8s test --config caas-image-repo=docker.io:5000/jujusolutions`

and it won't find the image, but you can check that the image URL is correct:

```
microk8s.kubectl -n controller-test get pod/controller-0 -o json | jq .spec.containers[0].image
"docker.io:5000/jujusolutions/jujud-operator:2.9.8"
```

Also bootstrap a standard microk8s controller to regression test default behaviour.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1934707
juju#13137

The cache would fail to notify if you changed from a value and back to
the original value (flip flopping). This is because the underlying hash value 
was never written. 

This has two consequences:

 1. The cache could never emit a change of the original value
 2. Because we never cached the new value, if the new value was the same
 as the old one, we would still emit the event.

The fix is easy: just save the hash after the notify, and we get
better cache coherency.
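
A minimal sketch of the fix (hypothetical names, not the actual cache code): record the hash after notifying, so reverting to the original value still compares as a change.

```go
package cache

import (
	"crypto/sha256"
	"encoding/hex"
)

// entity is a stand-in for the cached entity that notifies subscribers
// when its value changes.
type entity struct {
	lastHash string
	notify   func(value string)
}

func (e *entity) set(value string) {
	sum := sha256.Sum256([]byte(value))
	hash := hex.EncodeToString(sum[:])
	if hash == e.lastHash {
		return // genuinely unchanged: emit nothing
	}
	e.notify(value)
	// The fix: record the hash *after* notifying. Previously it was never
	// written, so flipping back to the original value compared equal to
	// the stale hash and no event was emitted.
	e.lastHash = hash
}
```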

## QA steps

If you change the model-config logging-config to a new value and back to the 
old value, it should now correctly go back to the old value. Flip-flopping of values 
in the logging-config exposes this relatively easily.

```sh
$ juju bootstrap lxd test
$ juju model-config logging-config="<root>=WARN"
$ juju model-config logging-config="<root>=DEBUG"
# flip back to the original value
$ juju model-config logging-config="<root>=WARN"
```

View `$ juju debug-log -m controller` to ensure that the flip-flop happens.
juju#13140

Adds image pull secrets for sidecar charms with private images, such as those from Charmhub.

## QA steps

Deploy charm with private repo images.
Check image pull secrets exist and images pull.
Refresh charm with public repo images.
Check image pull secrets are cleaned up.
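
For example (namespace and pod names are placeholders; the actual secret naming may differ):

```console
# After deploying with a private repo image, a pull secret should exist in
# the model namespace and be referenced by the pod:
$ microk8s.kubectl -n <model-name> get secrets
$ microk8s.kubectl -n <model-name> get pod <unit-pod> -o jsonpath='{.spec.imagePullSecrets}'
```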

## Documentation changes

N/A

## Bug reference

https://bugs.launchpad.net/juju/+bug/1934416
juju#13142

Merge forward 2.8 into 2.9

2ecb859 (upstream/2.8, origin/2.8, 2.8) Merge pull request juju#13137 from SimonRichardson/cache-coherence
c973d15 (achilleasa/2.8) Merge pull request juju#13108 from achilleasa/2.8-support-ssh-with-proxy-to-fan-subnets
f505b12 Merge pull request juju#13097 from achilleasa/2.8-logsink-error-if-persisting-logs-to-db-fails

Conflicts:

CONFLICT (content): Merge conflict in cmd/juju/commands/ssh_unix_test.go
CONFLICT (content): Merge conflict in cmd/juju/commands/ssh_machine.go
CONFLICT (content): Merge conflict in cmd/juju/commands/ssh.go
CONFLICT (content): Merge conflict in cmd/juju/commands/scp.go
SetApplicationsConfig has been removed in favour of SetConfigs, so
the tests are no longer required.
2.9 introduced different return types for GetControllerImagePath, and I
missed that in the merge; this fixes that.
@SimonRichardson
Member Author

!!build!!

@manadart manadart left a comment
Member

Looks like we need to update tests for the anticipated Mongo version.

13:02:34 FAIL: podcfg_test.go:75: podcfgSuite.TestOperatorImagesCustomRepo
13:02:34 
13:02:34 podcfg_test.go:90:
13:02:34     c.Assert(podConfig.GetJujuDbOCIImagePath(), gc.Equals, "path/to/my/repo/juju-db:4.0")
13:02:34 ... obtained string = "path/to/my/repo/juju-db:4.4"
13:02:34 ... expected string = "path/to/my/repo/juju-db:4.0"
13:02:34 
13:02:34 
13:02:34 ----------------------------------------------------------------------
13:02:34 FAIL: podcfg_test.go:58: podcfgSuite.TestOperatorImagesDefaultRepo
13:02:34 
13:02:34 podcfg_test.go:72:
13:02:34     c.Assert(podConfig.GetJujuDbOCIImagePath(), gc.Equals, "jujusolutions/juju-db:4.0")
13:02:34 ... obtained string = "jujusolutions/juju-db:4.4"
13:02:34 ... expected string = "jujusolutions/juju-db:4.0"

I have no idea why this wasn't picked up; probably a CI day activity.
For now, fix this so we can land it.
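
Presumably the fix is just to bump the expected juju-db tags in those assertions to the anticipated Mongo version, e.g.:

```go
// podcfg_test.go: expect the 4.4 juju-db image rather than 4.0.
c.Assert(podConfig.GetJujuDbOCIImagePath(), gc.Equals, "jujusolutions/juju-db:4.4")
```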
@SimonRichardson
Member Author

$$merge$$

2 similar comments
@SimonRichardson
Member Author

$$merge$$

@SimonRichardson
Member Author

$$merge$$

@jujubot jujubot merged commit 4ae860b into juju:develop Jul 8, 2021
@SimonRichardson SimonRichardson deleted the 2.9-into-develop-cache branch July 9, 2021 08:09