APIServer: Raft context #13342
Conversation
Adds a new error type, Not Leader, to the api server error types. This will be used to tell the client that the server cannot process the raft application, without trying to move the request to the leader. Additionally, I've moved the error types to a new file to make the code a lot easier to understand.
The raft context is a way to mediate between the RaftOpQueue, with a worker listening on the other end, and the facades, which enqueue commands onto the queue. It's done like this so it doesn't expose the raft instance arbitrarily to the facades, limiting abuse and side effects. Additionally, having a queue in between the instance and the facade allows us to apply back pressure and deadlines to enqueuing. The code is simple, but _very_ mechanical, as we have to thread the queue from the machine manifolds all the way to the facade context.
The facade will interact with the underlying raft queue that is exposed via the facade context. The facade is ultra-lightweight, as we've designed it to not have access to the underlying raft instance.
```diff
@@ -319,6 +320,9 @@ func AllFacades() *facade.Registry {
 	reg("ProxyUpdater", 1, proxyupdater.NewFacadeV1)
 	reg("ProxyUpdater", 2, proxyupdater.NewFacadeV2)
```
Why have extra whitespace here?
Readability more than anything.
```go
// topic).
func (facade *Facade) ApplyLease(args params.LeaseOperations) (params.ErrorResults, error) {
	results := make([]params.ErrorResult, len(args.Operations))
	for k, op := range args.Operations {
```
As the operations are applied sequentially, if the leader changes while an operation is being applied you may end up in a situation where ops [0, m] are applied and (m, n] fail.
It could be worth investigating whether we could work around this issue by creating a batch apply command which essentially is a list of the individual op.Command byte-streams and applying that in one go.
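A sketch of what such a batch command could look like, assuming a simple length-prefix framing (this is not Juju's actual wire format): the individual `op.Command` byte-streams are combined into one payload that the FSM can split and apply as a single raft log entry, avoiding the partially-applied state described above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// EncodeBatch frames each command with a uint32 length prefix, producing
// a single byte-stream that can be applied in one go.
func EncodeBatch(cmds [][]byte) []byte {
	var buf bytes.Buffer
	for _, c := range cmds {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(c)))
		buf.Write(n[:])
		buf.Write(c)
	}
	return buf.Bytes()
}

// DecodeBatch is the inverse, run inside the FSM's apply path.
func DecodeBatch(b []byte) ([][]byte, error) {
	var cmds [][]byte
	for len(b) > 0 {
		if len(b) < 4 {
			return nil, fmt.Errorf("truncated batch header")
		}
		n := binary.BigEndian.Uint32(b[:4])
		b = b[4:]
		if uint32(len(b)) < n {
			return nil, fmt.Errorf("truncated command")
		}
		cmds = append(cmds, b[:n])
		b = b[n:]
	}
	return cmds, nil
}

func main() {
	batch := EncodeBatch([][]byte{[]byte("claim"), []byte("extend")})
	cmds, _ := DecodeBatch(batch)
	fmt.Println(len(cmds), string(cmds[0]), string(cmds[1]))
}
```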
We can, but I wasn't going to do it for this PR, as we don't consolidate operations yet.
```diff
@@ -395,3 +396,7 @@ func IsCodeCloudRegionRequired(err error) bool {
 func IsCodeQuotaLimitExceeded(err error) bool {
 	return ErrCode(err) == CodeQuotaLimitExceeded
 }
+
+func IsCodeNotLeader(err error) bool {
```
I really think we should invest some time to create a juju-dev-helper tool to automate scaffolding work for repeated tasks. E.g.

```sh
juju-dev-helper add-server-error
juju-dev-helper add-facade
...
```
The dream.
The following is just a mechanical change, moving the queue from worker to core. As rightly pointed out, importing workers from the apiserver seems a little bit off.
The rehydration of the not-leader error didn't need to jump through if statements to check for values; instead we can do it in one line.
As logging parameters aren't lazily evaluated, we have to check whether trace is enabled before we try to trace.
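Since Go evaluates call arguments eagerly, the guard looks like this; the `Logger` here is a stub standing in for the real loggo logger:

```go
package main

import "fmt"

// Logger is a stub with loggo-like methods.
type Logger struct{ trace bool }

func (l Logger) IsTraceEnabled() bool { return l.trace }
func (l Logger) Tracef(format string, args ...interface{}) {
	fmt.Printf("TRACE "+format+"\n", args...)
}

// expensive simulates a costly argument we don't want to compute unless
// tracing is on; without the guard it would run regardless.
func expensive() string { return "rendered operation" }

func main() {
	logger := Logger{trace: false}
	if logger.IsTraceEnabled() { // guard: expensive() is never called
		logger.Tracef("applying op: %s", expensive())
	}
	fmt.Println("done")
}
```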
The following handles the not-leader error differently from a normal error. If we get a not-leader error, we should stop processing any more leases, mark all subsequent errors as the same, and then exit. For normal errors, we do want to keep iterating, as we just don't know if we can bail out early unless we have more typed error cases.
LGTM
#13346 ~~**Requires #13342 to land first**~~

The following adds a raftlease client. The aim is to provide a robust way of sending requests whilst being honest about the errors that occur. In particular, we drop requested commands when we don't have a connection, and it's up to the caller to retry. Previous implementations just dropped them on the floor.

## QA steps

Current tests pass. As this hasn't been turned on yet, we just need to ensure that existing implementations work.

```sh
$ juju bootstrap lxd test
$ juju model-config -m controller logging-config="<root>=INFO;juju.worker.lease.raft=TRACE;juju.core.raftlease=TRACE;juju.worker.globalclockupdater.raft=TRACE;juju.worker.raft=TRACE"
$ juju enable-ha
$ juju debug-log -m controller
```
```go
// If no information is supplied, it is expected that the client performs their
// own algorithm to locate the leader (roundrobin or listen to the apidetails
// topic).
func (facade *Facade) ApplyLease(args params.LeaseOperations) (params.ErrorResults, error) {
```
`ApplyLeaseOp` might be a better name. A naive glance at this might suggest a lease claim, but they can be claims/extensions/revocations.
#13361 Merge from 2.9 to bring forward:

- #13360 from wallyworld/simplestreams-compression
- #13359 from manadart/2.9-lxd-container-images
- #13352 from tlm/aws-instance-profile
- #13358 from jujubot/increment-to-2.9.16
- #13354 from wallyworld/refresh-consume-proxy
- #13353 from wallyworld/cmr-consume-fixes
- #13346 from SimonRichardson/raft-api-client
- #13349 from wallyworld/remove-orphaned-cmrapps
- #13348 from benhoyt/fix-secretrotate-tests
- #13119 from SimonRichardson/pass-context
- #13342 from SimonRichardson/raft-facade
- #13341 from ycliuhw/feature/quay.io

Conflicts (easy resolution):

- apiserver/common/crossmodel/interface.go
- apiserver/errors/errors.go
- apiserver/params/apierror.go
- apiserver/testserver/server.go
- scripts/win-installer/setup.iss
- snap/snapcraft.yaml
- version/version.go