Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.8 into 2.9 #12456

Merged
merged 17 commits into from Dec 14, 2020
Merged

2.8 into 2.9 #12456

merged 17 commits into from Dec 14, 2020

Conversation

jameinel
Copy link
Member

This is a merge of 2.8 into 2.9 bringing in several of the 2.8 bugfixes.

Bringing in:

 * Merge pull request #12447 from jameinel/2.8-statuses-upgrade-1907685
 * Merge pull request #12451 from benhoyt/fix-centos-tests
 * Merge pull request #12435 from tlm/lp-1905320 (wrong text in juju
   help bootstrap)
 * Merge pull request #12431 from achilleasa/2.8-set-unit-error-state-when-relation-created-hooks-fail
 * Merge pull request #12429 from manadart/2.8-housekeeping
 * Merge pull request #12416 from tlm/lp-1905320 (juju help bootstrap)

Checklist

  • Requires a pylibjuju change
  • Added integration tests for the PR
  • Added or updated doc.go related to packages changed
  • Comments answer the question of why design decisions were made

QA steps

See individual PRs.

Documentation changes

See individual PRs.

Bug reference

tlm and others added 17 commits December 9, 2020 14:22
- Adds help information for bootstrapping to unknown k8s cluster around
  setting the correct service type.
- Adds a warning message for the user during bootstrap for when the
  above condition happens
juju#12416

- Adds help information for bootstrapping to unknown k8s cluster around
 setting the correct service type.
- Adds a warning message for the user during bootstrap for when the
 above condition happens

## Checklist

 - [x] Requires a [pylibjuju](https://github.com/juju/python-libjuju) change
 - [x] Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR
 - [x] Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed
 - [x] Comments answer the question of why design decisions were made

## QA steps

- Run juju help bootstrap
- Bootstrap to a minikube cluster

## Bug reference

https://bugs.launchpad.net/juju/+bug/1905320
juju#12429

Small patch eked out while looking at other things. It just removes some unused test code and checks some test errors.

## QA steps

Run unit tests.
Wrong values are using in Bootstrap text for controller service type.
…e-when-relation-created-hooks-fail

juju#12431

The way that the uniter worker handles RunHook errors is by dropping errors to the floor in the resolver main loop and then entering the loop once again under the assumption that the resolver will check the **local state** for a pending RunHook operation (Operation == RunHook and Step == Pending) and move the unit into an error state.

This design assumes that the individual resolvers check that the current local state is in `Continue` mode (meaning that each resolver should go ahead and figure out if it needs to emit an operation) and return `ErrNoOperation` otherwise so that the main resolver can eventually apply the above logic and set the required error state.

Unfortunately, this particular check was not applied by the recently (2.8) added CreatedRelation resolver. As a result, if a `xxx-relation-created` hook failed, the uniter would keep invoking the aforementioned resolver which would happily return a `xxx-relation-created` hook operation causing the unit to get stuck in an infinite loop.

This PR addresses this problem thus allowing the unit to reach an error state as expected. 

## QA steps

*Please replace with how we can verify that the change works.*

```sh
# apply this patch
diff --git a/acceptancetests/repository/charms/ubuntu/metadata.yaml b/acceptancetests/repository/charms/ubuntu/metadata.yaml
index 1846048417..e5b7b19648 100644
--- a/acceptancetests/repository/charms/ubuntu/metadata.yaml
+++ b/acceptancetests/repository/charms/ubuntu/metadata.yaml
@@ -12,4 +12,6 @@ series:
 - bionic
 - eoan
 - focal
-
+peers:
+ peer:
+ interface: http


# Then create this file with the specified contents
$ cat acceptancetests/repository/charms/ubuntu/hooks/peer-relation-created
#!/bin/bash
set -e
# This should fail when executed by a non-leader unit!
relation-set -r $JUJU_RELATION_ID --app foo=bar

# Finally, run the following command and check that the non-leader unit reaches an error state
$ juju deploy -n2 ./acceptancetests/repository/charms/ubuntu

$ juju status
Model Controller Cloud/Region Version SLA Timestamp
default test lxd-with-cache/default 2.8.7.1 unsupported 18:33:58Z

App Version Status Scale Charm Store Rev OS Notes
ubuntu error 2 ubuntu local 1 ubuntu

Unit Workload Agent Machine Public address Ports Message
ubuntu/0* unknown idle 0 10.59.233.117
ubuntu/1 error idle 1 10.59.233.252 hook failed: "peer-relation-created" <---- look for this

Machine State DNS Inst id Series AZ Message
0 started 10.59.233.117 juju-be2af9-0 bionic Running
1 started 10.59.233.252 juju-be2af9-1 bionic Running
```

## Bug reference
https://bugs.launchpad.net/juju/+bug/1906706
On Prodstack, we have 160k statuses documents (which seems too big but is
a different issue.)

This at least should allow upgrades to progress, even if it dies in the
middle, it won't have to load as many documents the next time.
juju#12435

Wrong values are using in Bootstrap text for controller service type.

## Checklist

 - [x] Requires a [pylibjuju](https://github.com/juju/python-libjuju) change
 - [x] Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR
 - [x] Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed
 - [x] Comments answer the question of why design decisions were made

## QA steps

Bootstrap to a kubernetes cluster using the bootstrap examples.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1905320
Rather than have one 140k record transaction, have lots of smaller ones.
Prodstack has shown that having a giant transaction breaks things.
Accidentally using spaces vs tabs.
juju#12451

@manadart [pointed out](juju#12341 (comment)) that PR juju#12341 broke the unit tests on CentOS. Turned out to be a trivial logic bug that I'd introduced. Typed `if err != nil` one too any times, I guess. :-)

I've tested this locally by using gohack on `github.com/juju/os/v2` and overriding `hostSeries()` to return `centos8` -- breaks before the fix and works fine after, so hopefully it'll fix CentOS CI.
juju#12447

On Prodstack, we have 160k statuses documents (which seems too big but is
a different issue.)

This at least should allow upgrades to progress, even if it dies in the
middle, it won't have to load as many documents the next time.

## Checklist

 - [ ] Requires a [pylibjuju](https://github.com/juju/python-libjuju) change
 - [ ] Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR
 - [ ] Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed
 - [ ] Comments answer the question of why design decisions were made

## QA steps

I was able to get a copy of the statuses.bson that was in prodstack (160k statuses). However, it appears that we upgrade each model's status docs one model at a time. And we won't upgrade status documents that aren't in models that we know about. So we had to massage the data to match our current models. Also, there will be some txns that no longer exist, so we need to mgopurge them.

I then did:
```
$ sudo snap install juju --channel=2.7/stable
$ /snap/bin/juju bootstrap lxd lxd
$ juju switch controller
$ juju scp ./statuses.bson 0:.
$ juju ssh 0
$ agent=$(cd /var/lib/juju/agents; echo machine-*)
$ pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | cut '-d ' -sf2)
$ sudo systemctl stop jujud-machine-0
$ mongorestore --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates --host localhost --port 37017 --db juju -c statuses ./statuses.bson
...
2020-12-14T15:03:47.276+0000 finished restoring juju.statuses (161023 documents)
...
$ mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju

juju:PRIMARY> db.statuses.find({"neverset": {$exists:1}}).count()
161023
juju:PRIMARY> db.statuses.count()
161038
juju:PRIMARY> target_uuid = db.models.find().limit(1).next()._id
# find the biggest models
juju:PRIMARY> db.statuses.aggregate([{$group: {_id: {uuid: "$model-uuid"}, count: {$sum: 1}}}, {$sort: {count:-1}}])
{ "_id" : { "uuid" : "3159938e-bdb4-461c-8891-6a2cfc3d4e6a" }, "count" : 47373 }
{ "_id" : { "uuid" : "844969a0-e800-4047-887e-70119d1a0b82" }, "count" : 44435 }
...
juju:PRIMARY> db.statuses.find({"model-uuid": "3159938e-bdb4-461c-8891-6a2cfc3d4e6a"}).forEach(function(doc){
 var cp = doc; cp["model-uuid"] = target_uuid;
 cp._id = target_uuid + doc._id.substring(36);
 db.statuses.insert(cp);
})
juju:PRIMARY> db.statuses.find({"model-uuid": "844969a0-e800-4047-887e-70119d1a0b82"}).forEach(function(doc){
 var cp = doc; cp["model-uuid"] = target_uuid;
 cp._id = target_uuid + doc._id.substring(36);
 db.statuses.insert(cp);
})
# confirm the existing model is now the biggest
juju:PRIMARY> db.statuses.aggregate([{$group: {_id: {uuid: "$model-uuid"}, count: {$sum: 1}}}, {$sort: {count:-1}}])
{ "_id" : { "uuid" : "f92733d6-a9c3-4d90-8a78-f995d23b0085" }, "count" : 91814 }
{ "_id" : { "uuid" : "3159938e-bdb4-461c-8891-6a2cfc3d4e6a" }, "count" : 47373 }
{ "_id" : { "uuid" : "844969a0-e800-4047-887e-70119d1a0b82" }, "count" : 44435 }
...
$ sudo snap install mgopurge
$ mgopurge --username ${agent} --password ${pw} --stages purgemissing,resume
$ sudo systemctl start jujud-machine-0
$ juju upgrade-controller --build-agent
```

With this patch in juju debug-log you should see statements like:
```
2020-12-14 17:12:19 INFO juju.state.upgrade upgrades.go:2950 flushing updated 2001 statuses (2001 total)
...
2020-12-14 17:29:25 INFO juju.state.upgrade upgrades.go:2964 updating 1769 statuses (91814 total)
```

## Documentation changes

None.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1907685
This brings in:
 * Merge pull request juju#12447 from jameinel/2.8-statuses-upgrade-1907685
 * Merge pull request juju#12451 from benhoyt/fix-centos-tests
 * Merge pull request juju#12435 from tlm/lp-1905320 (wrong text in juju
   help bootstrap)
 * Merge pull request juju#12431 from achilleasa/2.8-set-unit-error-state-when-relation-created-hooks-fail
 * Merge pull request juju#12429 from manadart/2.8-housekeeping
 * Merge pull request juju#12416 from tlm/lp-1905320 (juju help bootstrap)
@wallyworld
Copy link
Member

$$merge$$

@jujubot jujubot merged commit 6bd766c into juju:2.9 Dec 14, 2020
jujubot added a commit that referenced this pull request Dec 17, 2020
#12472

This is a merge from 2.9 into develop.

This wasn't a clean merge:
 - version numbers conflict: moved to 3.0-beta1
 - conflict around where caas/metadata.go file was located, this wasn't picked up in the merge but a build after-wards.

c1cb4aa (upstream/2.9, origin/2.9, 2.9) Merge pull request #12471 from wallyworld/merge-2.8-20201217a
551e1be Merge pull request #12469 from jujubot/increment-to-2.9-rc4
c5305d8 Merge pull request #12468 from wallyworld/merge-2.8-20201217
0f9a029 Merge pull request #12465 from SimonRichardson/charmhub-hide-local-schema-in-errors-messages
4b2d0f8 Merge pull request #12271 from SimonRichardson/wait-for-summary
2002602 Merge pull request #12461 from SimonRichardson/charmhub-bundle-descriptions
b4037cb Merge pull request #12460 from hmlanigan/bundle-output
97933e4 Merge pull request #12455 from SimonRichardson/charmhub-ensure-charmorigin-output
9f56bc9 Merge pull request #12457 from hmlanigan/unit-resource-doc-not-local
9c119e3 (manadart/2.9) Merge pull request #12448 from SimonRichardson/acceptance-tests-architecture
5b260db Merge pull request #12449 from hmlanigan/charmhub-no-all
236efd7 Merge pull request #12458 from hmlanigan/identify-charms-deployed
6bd766c Merge pull request #12456 from jameinel/2.8-into-2.9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants