-
Notifications
You must be signed in to change notification settings - Fork 495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure cross model app proxy is removed with the offer #13349
Conversation
03daa47
to
edcab56
Compare
edcab56
to
d8fb101
Compare
I managed to get into a state where I think I did things too fast and had a disconnected SAAS.
The good news is that I could go through the steps again and the relation was re-established OK.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transaction logic here is horrific. I think this patch is OK, but I'd like another review on it.
state/applicationoffers.go
Outdated
continue | ||
} | ||
logger.Debugf("destroy consumer proxy %v for offer %v", remoteApp.Name(), op.offerName) | ||
remoteAppOps, err := remoteApp.DestroyOperation(true).Build(attempt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model operations are nice for applying transactionality to a chunk of imperative logic. Side-loading them into another slice of ops is on the smelly side of clever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it's not ideal. I'll see if I can add support for composing nested ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a change to improve the implementation. Not perfect, but better.
A major root cause for the messiness is that we don't support 2 phase commit of operations across models, And even if we did, there's still the scenario where one model has permanently gone away and you need to forcibly sanitise the other model. It's exacerbated by the less than ideal ref count numbers we use on various parent objects; we should instead be using reference objects which can be properly tracked, allowing sensible cleanup logic to be applied. As it is now, across models, without txn guarantees, we can easily get wedged due to ref counts getting out of sync and there's little option right now but to force through the clean up. The ref counting used is also somewhat forced by the limitation of the client side txns used with mongo - moving to serer side txns would allow things to be done better for sure but that's way beyond the scope of this PR. |
d686c7c
to
48a6c68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of questions and a few import ordering issues but LGTM
|
889ee1b
to
26a79ee
Compare
…nted cmr txn logic which is not needed
26a79ee
to
96b7327
Compare
|
|
#13353 When removing and re-consuming a cross model offer, it's possible that the remote proxy might be removed in the middle of the remote relation worker updating things like the macaroon to use. This PR handles that sort of case and make the worker more resilient to not found errors. Also, when a consuming saas application is dying, we ignore status updates from the offering side as these can interfere with a clean status history. On the offering side, we use "force" to destroy the consuming app proxy to ensure it is removed, otherwise it's possible a dying entitiy can remain and wedge the offering processing of events. ## QA steps Deploy a cmr scenario like in #13349 and on the consuming side, remove the saas app and check logs for errors. ## Bug reference https://bugs.launchpad.net/charm-ceph-mon/+bug/1940983
#13354 The offering model creates a proxy for the consuming app. If the consuming side does a `remove-saas` and then consumes the same offer again, the offering model might retain stale info, especially if it is down then the consuming side does a force delete. This PR uses an incrementing version on the consuming side. If the offering side gets a new connection and the version is newer than what it has, it force deletes the consuming proxy and starts again. This is analogous to what is done on the consuming side then the offer is force removed and then consumed again. This is same to do without a facade bump because the default version will be 0. ## QA steps I tested this on top of #13353 but rebased against 2.9 to make this PR. Deploy a cmr scenario like in #13349 and on the consuming side, remove the saas app and consume again. Ensure relation hooks are run on both sides of the relation. ## Bug reference https://bugs.launchpad.net/charm-ceph-mon/+bug/1940983
#13361 Merge from 2.9 to bring forward: - #13360 from wallyworld/simplestreams-compression - #13359 from manadart/2.9-lxd-container-images - #13352 from tlm/aws-instance-profile - #13358 from jujubot/increment-to-2.9.16 - #13354 from wallyworld/refresh-consume-proxy - #13353 from wallyworld/cmr-consume-fixes - #13346 from SimonRichardson/raft-api-client - #13349 from wallyworld/remove-orphaned-cmrapps - #13348 from benhoyt/fix-secretrotate-tests - #13119 from SimonRichardson/pass-context - #13342 from SimonRichardson/raft-facade - #13341 from ycliuhw/feature/quay.io Conflicts (easy resolution): - apiserver/common/crossmodel/interface.go - apiserver/errors/errors.go - apiserver/params/apierror.go - apiserver/testserver/server.go - scripts/win-installer/setup.iss - snap/snapcraft.yaml - version/version.go
Hopefully the final fix for cross model relations issues observed on site.
When an offer is removed, any remote proxies for the consuming side (in the offering model) also need to be deleted. This wasn't always happening, partly because the removal depended on there being relations in play. The logic to remove the app proxies has been moved so it is always run even if there are no relations.
Because there could be orphaned remote proxies, an upgrade step is added to delete them.
Finally, to guard against the possibility that reference counting across models might get out of sync and block relation and unit removal (in a single model things are done inside a txn but that's not possible across models), if the txn slice fails to be applied the first time, cross model relations are more forcibly removed the second time.
QA steps
I used the repro scenario from site, primarily this bundle deployed to a model called
ceph
:Then set up the relations:
You can now repeatedly
For each iteration, you van see the relation created, joined, departed, broken hooks have run:
Bug reference
https://bugs.launchpad.net/charm-ceph-mon/+bug/1940983