Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APIServer: Raft context #13342

Merged
merged 7 commits into from
Sep 22, 2021
Merged

APIServer: Raft context #13342

merged 7 commits into from
Sep 22, 2021

Conversation

SimonRichardson
Copy link
Member

The raft context is a way to mediate between the RaftOpQueue with a
worker listening on the other end and a way for facades to enqueue
commands onto the queue. It's done like this, so it doesn't expose the
raft instance arbitrarily to the facades, limiting abuse and side effects.

Additionally, having a queue in between the instance and the facade
allows us to apply back pressure and deadlines to enqueuing.

The code is simple, but very mechanical, as we have to thread the
queue from the machine manifolds all the way to the facade context.

QA steps

$ juju bootstrap lxd test
$ juju model-config -m controller logging-config="<root>=INFO;juju.worker.lease.raft=TRACE;juju.core.raftlease=TRACE;juju.worker.globalclockupdater.raft=TRACE;juju.worker.raft=TRACE"
$ juju enable-ha
$ juju debug-log -m controller

Adds a new error type - Not Leader to the api server error types. This
will be used to tell the client that it can not process the raft
application without trying to move the request to the leader.

Additionally, I've moved the error types to a new file to make it a lot
easier to understand the code.
The raft context is a way to mediate between the RaftOpQueue with a
worker listening on the other end and a way for facades to enqueue
commands onto the queue. It's done like this, so it doesn't expose the
raft instance arbitrary to the facades, limiting abuse and side effects.

Additionally, having a queue in between the instance and the facade
allows us to apply back pressure and deadlines to enqueuing.

The code is simple, but _very_ mechanical, as we have to thread the
queue from the machine manifolds all the way to the facade context.
The facade will interact with the underlying raft queue that is exposed
via the facade context.

The facade is ultra lightweight as we've designed the facade to not have
access to the underlying raft instance.
@@ -319,6 +320,9 @@ func AllFacades() *facade.Registry {

reg("ProxyUpdater", 1, proxyupdater.NewFacadeV1)
reg("ProxyUpdater", 2, proxyupdater.NewFacadeV2)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have extra whitespace here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability more than anything.

apiserver/apiserver_test.go Outdated Show resolved Hide resolved
apiserver/errors/errors.go Outdated Show resolved Hide resolved
// topic).
func (facade *Facade) ApplyLease(args params.LeaseOperations) (params.ErrorResults, error) {
results := make([]params.ErrorResult, len(args.Operations))
for k, op := range args.Operations {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the operations are applied sequentially, if the leader change while an operation is being applied you may end up in a situation where ops [0-m] are applied and (m, n] fail.

It could be worth investigating whether we could work around this issue by creating a batch apply command which essentially is a list of the individual op.Command byte-streams and applying that in one go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but I wasn't going to do it for this PR. As we don't consolidate operations yet.

@@ -395,3 +396,7 @@ func IsCodeCloudRegionRequired(err error) bool {
func IsCodeQuotaLimitExceeded(err error) bool {
return ErrCode(err) == CodeQuotaLimitExceeded
}

func IsCodeNotLeader(err error) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think we should invest some time to create a juju-dev-helper tool to automate scaffolding work for repeated tasks. E.g

juju-dev-helper add-server-error
juju-dev-helper add-facade
...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dream.

apiserver/raft.go Outdated Show resolved Hide resolved
The following is just a mechanical change, moving the queue from worker
to core. As rightly pointed out, importing workers from the apiserver
seems a little bit off.
The rehydration of not leader error didn't need to jump through if
statements to check for values, instead we can do it in one line.
As logging parameters aren't lazy, we have to check if trace is enabled
before we try and trace.
The following handles the not leader error differently from a normal
error. If we get a not leader error, we should stop processing any more
leases and mark all subsequent errors as the same and then exit out.
Normal errors, we do want to keep iterating on, as we just don't if we
can bail out early unless we have more typed error cases.
Copy link
Contributor

@achilleasa achilleasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SimonRichardson
Copy link
Member Author

$$merge$$

@jujubot jujubot merged commit a282d03 into juju:2.9 Sep 22, 2021
@SimonRichardson SimonRichardson deleted the raft-facade branch September 22, 2021 14:08
jujubot added a commit that referenced this pull request Sep 23, 2021
#13346

~~**Requires #13342 to land first**~~

The following adds a raftlease client. The aim is to provide a robust
way of sending requests whilst being honest about the error messages
that fail. In particular, we drop requested commands when we don't have a
connection and it's up to the callee to retry again. Previous
implementations just dropped them on the floor.

## QA steps

Current tests pass. As this hasn't been turned on yet, we just need
to ensure that existing implementations work.

```sh
$ juju bootstrap lxd test
$ juju model-config -m controller logging-config="<root>=INFO;juju.worker.lease.raft=TRACE;juju.core.raftlease=TRACE;juju.worker.globalclockupdater.raft=TRACE;juju.worker.raft=TRACE"
$ juju enable-ha
$ juju debug-log -m controller
```
// If no information is supplied, it is expected that the client performs their
// own algorithm to locate the leader (roundrobin or listen to the apidetails
// topic).
func (facade *Facade) ApplyLease(args params.LeaseOperations) (params.ErrorResults, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ApplyLeaseOp might be a better name. A naive glace at this might suggest a lease claim, but they can be claims/extensions/revocations.

jujubot added a commit that referenced this pull request Sep 28, 2021
#13361

Merge from 2.9 to bring forward:
- #13360 from wallyworld/simplestreams-compression
- #13359 from manadart/2.9-lxd-container-images
- #13352 from tlm/aws-instance-profile
- #13358 from jujubot/increment-to-2.9.16
- #13354 from wallyworld/refresh-consume-proxy
- #13353 from wallyworld/cmr-consume-fixes
- #13346 from SimonRichardson/raft-api-client
- #13349 from wallyworld/remove-orphaned-cmrapps
- #13348 from benhoyt/fix-secretrotate-tests
- #13119 from SimonRichardson/pass-context
- #13342 from SimonRichardson/raft-facade
- #13341 from ycliuhw/feature/quay.io

Conflicts (easy resolution):
- apiserver/common/crossmodel/interface.go
- apiserver/errors/errors.go
- apiserver/params/apierror.go
- apiserver/testserver/server.go
- scripts/win-installer/setup.iss
- snap/snapcraft.yaml
- version/version.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants