-
Notifications
You must be signed in to change notification settings - Fork 499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JUJU-1944] DB-backed lease store implementation #14918
Conversation
fa4cf30
to
148f6f1
Compare
4ff9be8
to
7810906
Compare
Getting build failures with:
But confusingly the unit test are succeeding |
7810906
to
21a13f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just going through it and raising some points for conversation.
core/lease/storedb.go
Outdated
// Leases (lease.Store) returns all leases in the database, | ||
// optionally filtering using the input keys. | ||
func (s *DBStore) Leases(keys ...Key) (map[Key]Info, error) { | ||
// TODO (manadart 2022-11-30): We expect the variadic `keys` argument to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because it always was only 0 or 1 (and we should turn it into a *Key instead)?
IIRC I did implement the listing with N>=2 when doing it elsewhere (and you could do that for the DB by doing an IN query)
I guess is starts getting weird if you have multiple model uuids/namespaces, etc.
You could probably do a few IN queries and then filter at the end. We'd have to see if we need it. (we could also obviously create a composite key with all 3 and just search based on that.)
core/lease/storedb.go
Outdated
AND l.name = ?` | ||
|
||
key := keys[0] | ||
args = []any{key.Namespace, key.ModelUUID, key.Lease} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did check that we have an index (and even a unique index on lease_type_id, model_uuid, name).
Do we have any checking to see that the index itself is actually being used in this case? (given vagaries around t.id vs t.type, etc, I can see that a query optimizer may or may not actually get the index right)
core/lease/storedb.go
Outdated
go func() { | ||
q := ` | ||
INSERT INTO lease (uuid, lease_type_id, model_uuid, name, holder, start, expiry) | ||
SELECT ?, id, ?, ?, ?, datetime('now'), datetime('now', ?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting that sqlite uses datetime
, and pgsql seems to express this differently:
https://www.postgresqltutorial.com/postgresql-date-functions/postgresql-now/ (now()::timestamp
).
I'm surprised that you have to express it differently by backend.
core/lease/storedb.go
Outdated
FROM lease_type | ||
WHERE type = ?`[1:] | ||
|
||
d := fmt.Sprintf("+%d seconds", int64(math.Ceil(request.Duration.Seconds()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that sqlite does support floating point seconds according to the spec. so rather than doing int64(Ceil), we could just do "+%.3f seconds"
core/lease/storedb.go
Outdated
|
||
select { | ||
case <-stop: | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about this request response design. If we do trigger cancel(), shouldn't we still wait on <-err to see that the request has actually been stopped and the goroutine returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you made a chance to call cancel() if we got an error, but I don't see a change that you wait for the cancel to trigger in the goroutine and then wait for the error to come back again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was always calling cancel
, but we wait after cancellation everywhere with:
case <-stop:
cancel()
return errors.Trace(<-errCh)
core/lease/storedb.go
Outdated
|
||
go func() { | ||
q := ` | ||
DELETE FROM lease_pin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be an interesting exercise to think about what these complicated joins look like with sqlair. Since it is primarily input parameters, it should be relatively straightforward, I think. Though you do end up with a "uuid" that doesn't exist on your Key, and an entity that is just a string. So unless we craft a custom type for input, there isn't a simple named thing for all the parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a UUID extension for SQLite, though I don't know the feasibility of compiling it in when building Dqlite.
Then it would be easy then to trigger if-null creation upon insert.
I would like to see us using UUID on our domain objects though - it would be great for traceability to pass them with requests into the API and have objects uniquely identified in the wider system whether-or-not they have made to the the point of persistence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for this particular delete/join the UUID is looked up with a subselect anyway, so type, model_uuid, name, are all part of the key, but the "entity_id" is not. Did we have something which was just ".str" as a way to pass a simple value? I know we supported "Map.name"
but something like:
query := sqlair.Query(`
DELETE FROM lease_pin
WHERE uuid = (
SELECT p.uuid
FROM lease_pin p
JOIN lease l on l.uuid = p.lease_uuid
JOIN lease_type t ON l.lease_type_id = t.id
WHERE t.type = $Key.Namespace
AND l.model_uuid = $Key.model_uuid
AND l.name = $Key.name
AND p.entity_id = Map.entity_id
)`[1:]
err := query.Exec(lease, sqlair.M{"entity_id": entity})
Does that look about right to you for where sqlair would be going?
32cf637
to
9f9a6f1
Compare
// it is not held by the input holder, constituting an invalid request. | ||
if err == nil { | ||
var affected int64 | ||
affected, err = result.RowsAffected() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI:
If a separate thread makes changes on the same database connection while sqlite3_changes() is running then the value returned is unpredictable and not meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not necessarily be the mechanism for populating RowsAffected
. LXD appear to use it in a few places.
return errors.Trace(err) | ||
} | ||
|
||
ctx, cancel := context.WithCancel(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be obsolete when we start passing context.Context through the API server stack.
|
||
select { | ||
case <-stop: | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer cancel()
under the ctx, cancel := context.WithCancel(context.Background())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need to, as there is no path where it isn't called.
We need to record entities that require a lease to remain pinned, not just the fact that it is.
lease query variants.
implementation.
than one filter key.
database-backed lease store.
core/lease package and into worker/lease, where it will be used by the lease manager. The constraint error detection function was moved into the database package. This rearrangement fixes build issues encountered previously.
context cancellation.
is prepared exactly once for the store's lifetime.
9f9a6f1
to
98b70fc
Compare
s.cacheMu.RLock() | ||
if cachedStmt, ok := s.cache[name]; ok { | ||
s.cacheMu.RUnlock() | ||
return cachedStmt, nil | ||
} | ||
s.cacheMu.RUnlock() | ||
|
||
s.cacheMu.Lock() | ||
defer s.cacheMu.Unlock() | ||
|
||
prepared, err := s.db.PrepareContext(ctx, stmt) | ||
if err != nil { | ||
return nil, errors.Trace(err) | ||
} | ||
|
||
s.cache[name] = prepared | ||
return prepared, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure if this will work, I'll investigate, but if you have a leadership change do we keep the prepared statement for all followers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do, and it seems to work OK. LXD looks like they prepare up-front too.
#15177 The following brings the 3.0-dqlite feature branch into the develop branch. ### Changes This brings in the dqlite database to sit along side the mongo database. Currently, only leases are implemented in Juju using dqlite, more controller base configuration and data will be subsequently moved over to dqlite once this branch has landed. #### Leases/Raft The whole raft implementation has been removed from Juju completely. This includes the following workers: - raft backstop - raft clusterer - raft log - raft transport - global clock updater In addition, the raft API implementation has also been removed. Instead, the lease has changed to handle the store (dqlite db) directly, improving readability and complexity. ### Jujud The `jujud` agent is now built using musl (specifically musl-gcc). This allows `juju` to be built statically embedding `dqlite` in the same process. There are still some rough edges when building and testing and when this lands, we expect to see some churn to polish any of those issues. Using `go test` is expected to still work as is, this is a last-minute change so that we can utilize sqlite directly for local tests. If you require to test with dqlite (linux only), then running `-tags="dqlite"` with builds/tests/installs is required. All CI jobs are required to run with the dqlite tag. Some notes: 1. `CGO_ENABLED=1` and `CGO_LDFLAGS_ALLOW="(-Wl,-wrap,pthread_create)|(-Wl,-z,now)"` are required if you're using dqlite directly. 2. You are expected to install musl directly on your system if you want to build, using `make musl-install`. This will require sudo. 3. For development purposes we will download dqlite `.a` files from an s3 bucket to facilitate the setup process. The tar file is sha256 summed to ensure no MITM. You can build these locally if you want to bypass s3 using `make dqlite-build-lxd`. This will spin up an lxd container to build. **Do not attempt** to run `make dqlite-build` locally, unless you know what you're doing. 4. To access dqlite from a controller, use `make repl`, this will open up a pseudo repl when you can then explore the database with. `.open <db name>` and then you can use SQL from there. 5. Cross compilation to other architectures can be done using `GOARCH` and `GOOS` before `make install` or `make build`. There are probably some things I've forgotten, expect a discourse post soon, which will highlight the development flow. ---- Two conflicts when merging. The resolution was to bring in the secret backends for the manifold tests and the controller config type changed for `DefaultMigrationMinionWaitMax`. ``` CONFLICT (content): Merge conflict in cmd/jujud/agent/machine/manifolds_test.go CONFLICT (content): Merge conflict in controller/config.go ``` c141b2e (upstream/3.0-dqlite) Merge pull request #15159 from SimonRichardson/system-install-musl-by-default 83656e2 Merge pull request #15156 from SimonRichardson/change-log-ddl 125c19d Fix static-analysis pipeline (#15168) 5abfa24 Merge pull request #15140 from SimonRichardson/allow-testing-on-mac 1dc60f6 (3.0-dqlite) Merge pull request #15153 from SimonRichardson/content-addressable-deps 5a1cd24 Merge pull request #15150 from jack-w-shaw/JUJU-2615_symlink_sudo 4502d63 Merge pull request #15148 from SimonRichardson/better-install-method 88941dd Merge pull request #15134 from SimonRichardson/bootstrap-dqlite-agent-tests 2551ffc Merge pull request #15130 from SimonRichardson/build-jujud-snap 0180a53 (origin/3.0-dqlite, manadart/3.0-dqlite) Merge pull request #15123 from SimonRichardson/fix-manifold-lease-expiry-tests fdf9cc7 Merge pull request #15115 from SimonRichardson/remove-jujud-main-test-file bf58843 Merge pull request #15113 from SimonRichardson/remove-api-raftlease-api-client f9419c0 Merge pull request #15112 from SimonRichardson/fix-initializing-state-twice 334d557 Merge pull request #15108 from SimonRichardson/github-action-go-build 2ee6e1a Merge pull request #15107 from SimonRichardson/cross-building-jujud 5a93305 Merge pull request #15087 from SimonRichardson/ensure-placement-of-file da95dc0 Merge pull request #15086 from SimonRichardson/more-sudo-changes 7b86376 Merge pull request #15085 from SimonRichardson/sudo-apt-get c4d4eb6 Merge pull request #15057 from SimonRichardson/dqlite-local-build 0ac79b3 Merge pull request #15061 from manadart/develop-into-3.0-dqlite adc20f7 Merge pull request #15043 from SimonRichardson/allow-overriding-arch-machine 8c02f22 Merge pull request #15048 from SimonRichardson/static-analysis-fix 4547c06 Merge pull request #15050 from manadart/dqlite-address-option d51b324 Merge pull request #15049 from manadart/dqlite-bootstrap-options 3801b78 Merge pull request #15047 from manadart/develop-into-3.0-dqlite 22d5247 Merge pull request #15037 from SimonRichardson/standardise-dqlite-build 25640a2 Merge pull request #15036 from SimonRichardson/remove-batch-fsm-controller-config dfa4cb1 Merge pull request #15041 from manadart/dqlite-fix-mock caf9481 Merge pull request #15034 from manadart/develop-into-3.0-dqlite c91985d Merge pull request #15035 from SimonRichardson/remove-typed-lease-error 42d17be Merge pull request #15009 from SimonRichardson/allow-repl-via-juju-ssh d798238 Merge pull request #15002 from manadart/dqlite-use-lease-store e4f0d39 Merge pull request #14918 from manadart/3.0-dqlite-lease-store 8315fb7 Merge pull request #14986 from manadart/dqlite-build-from-tags a73b947 Merge pull request #14927 from manadart/3.0-dqlite-lease-store-interface 1657a1d Merge pull request #14910 from manadart/3.0-dqlite-db-supply 27b23f3 Merge pull request #14909 from manadart/3.0-into-3.0-dqlite 6adff35 Merge pull request #14756 from manadart/develop-into-3.0-dqlite 37d81ff Merge pull request #14717 from manadart/develop-into-3.0-dqlite fe2edb8 Merge pull request #14671 from manadart/3.0-simplify-dbaccessor 1a09836 Merge pull request #14604 from manadart/3.0-bootstrap-controller-db 5ad011e Merge pull request #14652 from manadart/develop-into-3.0-dqlite 1c3d250 Merge pull request #14591 from manadart/develop-into-3.0-dqlite 229cd3e Merge pull request #14578 from manadart/3.0-dqlite-simplify 9d715ba Merge pull request #14565 from manadart/develop-into-3.0-dqlite 92ffd88 Merge pull request #14466 from manadart/develop-into-3.0-dqlite 57f67ce Merge pull request #14336 from manadart/develop-into-3.0-dqlite 648d354 Merge pull request #14364 from manadart/update-musl 198621d Merge pull request #14241 from manadart/develop-into-3.0-dqlite 0360db6 Merge pull request #14153 from manadart/develop-into-3.0-dqlite 17950b2 Merge pull request #14053 from manadart/develop-into-3.0-dqlite 9452026 Merge pull request #14016 from manadart/develop-into-3.0-dqlite 741baca Merge pull request #13963 from manadart/develop-into-3.0-dqlite 5449603 Merge pull request #13969 from manadart/dqlite-manifolds 7b612a0 Merge pull request #13944 from SimonRichardson/dqlite-develop
Implementing leases in Dqlite requires a DB-backed implementation of the
lease.Store
indirection.Here we add such an implementation.
This constitutes the core functionality, which will be subject to review and iteration upon threading it into the lease manager worker.
QA steps
No functional changes yet, but changes can be verified by running tests:
make dqlite-deps-pull
.TEST_PACKAGE_LIST=test.list make test
and ensure tests pass.Documentation changes
None.
Bug reference
N/A