Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Improve the way migrations handle transactions #1737

Merged

Conversation

AndrewSisley
Copy link
Contributor

@AndrewSisley AndrewSisley commented Jul 31, 2023

Relevant issue(s)

Resolves #1649 #1592

Description

Improves the way migrations handle transactions, as well as fixing a couple of concurrency issues:

  • Adds locks around the various registry properties, these maps can be accessed concurrently and need to be protected.
  • Removes the transaction continuity issue in the client.LenRegistry interface, where db.LensRegistry() returns an object that does not respect the transactionality of the parent store, and takes txns as input parameters to some of its functions. It does this by following the same pattern as db.db. (Sort out client.LensRegistry transaction stuff #1649)
  • Fixes the bugs in the lens package where migrations set were not visible/accessible until after commit. They are now visible within the transaction scope. (LensRegistry does not allow registered items to be accessed within current txn #1592)

It still does not provide transaction snapshot isolation, I see that issue as relatively high effort low reward at the moment.

@AndrewSisley AndrewSisley added bug Something isn't working area/schema Related to the schema system action/no-benchmark Skips the action that runs the benchmark. labels Jul 31, 2023
@AndrewSisley AndrewSisley added this to the DefraDB v0.7 milestone Jul 31, 2023
@AndrewSisley AndrewSisley requested a review from a team July 31, 2023 19:13
@AndrewSisley AndrewSisley self-assigned this Jul 31, 2023
@codecov
Copy link

codecov bot commented Jul 31, 2023

Codecov Report

Patch coverage: 73.66% and project coverage change: -0.05% ⚠️

Comparison is base (07380d8) 75.69% compared to head (cfa6983) 75.64%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1737      +/-   ##
===========================================
- Coverage    75.69%   75.64%   -0.05%     
===========================================
  Files          208      209       +1     
  Lines        21735    21900     +165     
===========================================
+ Hits         16451    16566     +115     
- Misses        4140     4186      +46     
- Partials      1144     1148       +4     
Flag Coverage Δ
all-tests 75.64% <73.66%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
lens/txn_registry.go 44.05% <44.05%> (ø)
lens/fetcher.go 67.94% <45.45%> (-1.71%) ⬇️
datastore/txn.go 95.45% <87.50%> (-4.55%) ⬇️
db/db.go 71.63% <88.89%> (+0.62%) ⬆️
lens/registry.go 76.68% <92.66%> (+4.66%) ⬆️
api/http/handlerfuncs.go 80.35% <100.00%> (+1.72%) ⬆️
datastore/concurrent_txn.go 100.00% <100.00%> (ø)
db/fetcher/versioned.go 58.20% <100.00%> (+0.45%) ⬆️
db/txn_db.go 58.82% <100.00%> (+0.38%) ⬆️
lens/lens.go 73.74% <100.00%> (+0.27%) ⬆️

... and 7 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 07380d8...cfa6983. Read the comment docs.

@fredcarle
Copy link
Collaborator

It still does not provide transaction snapshot isolation, I see that issue as relatively high effort low reward at the moment.

Shouldn't this be a function of the underlying datastore and not something we should handle?

@AndrewSisley
Copy link
Contributor Author

It still does not provide transaction snapshot isolation, I see that issue as relatively high effort low reward at the moment.

Shouldn't this be a function of the underlying datastore and not something we should handle?

The datastore cannot handle wasm modules loaded into memory

@@ -311,7 +311,7 @@ func setMigrationHandler(rw http.ResponseWriter, req *http.Request) {
return
}

err = db.LensRegistry().SetMigration(req.Context(), txn, cfg)
err = db.LensRegistry().SetMigration(req.Context(), cfg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I think this is not setting the migration with the above txn. You probably need a call to db.WithTxn for the registry to be using it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good spot! It will now be using an implicit txn, and the explicit one here can be deleted completely

  • Remove pointless txn object

Comment on lines +57 to +61
// Writable transaction contexts by transaction ID.
//
// Read-only transaction contexts are not tracked.
txnCtxs map[uint64]*txnContext
txnLock sync.RWMutex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: This seems to complicate things more than it needs too. With it, we now need to keep track of transaction IDs and pass that ID in a bunch of places. The benefit is not clear to me at the moment.

question: Why not simply keep the context within implicitTxnLensRegistry and explicitTxnLensRegistry?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'd actually only need to keep it in explicitTxnLensRegistry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We couldn't delete any code if we moved this to explicitTxnLensRegistry, it would just move. What do you think would get simpler?

If either one of implicitTxnLensRegistry or explicitTxnLensRegistry needs the txnCtx then most of the functions in registry require it in order to keep the func signatures the same (or we'd have to duplicate a bunch of code)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCtx gets much simpler and no need to worry about transaction IDs: fredcarle@b059e91

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have made the mistake I made at first, doing it like that means that the same txn cannot scope across many different LensRegistry instances, and I belive the tests do catch this and the changes you linked should fail the build.

For example:

txn := db.NewTxn()
reg1 := db.WithTxn(txn).LensRegistry()
reg2 := db.WithTxn(txn).LensRegistry()

// reg1 and reg2 do not share scope with your proposed method. But they absolutely have to, as they are scoped to the same txn
// for example the below will fail:
reg1.SetMigration(foo)
assert.NotEmpty(reg2.Config()) // this is empty, as reg2 is not sharing reg1's scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the current (registry) solution should still work when we allow external management of transaction, whereas yours would require us to revert your suggestion and re-introduce the one currently within the PR.

Datastore transactions are not restricted to a single in-memory instance. Long term we will need to change the ID logic to properly reflect this (e.g. for badger we should perhaps use the Badger txn id/ts), but the lens registry stuff should be able to stay the same.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of the lensRegistry struct. This one is the same but yes the explicitTxnLensRegistry is different.

re2.Config will be different because the txnContext is different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re2.Config will be different because the txnContext is different

This is the first problem I am talking about, they shouldn't be different, it is the same txn.

(the second problem being that we'll have to revert back to roughly the solution in PR as soon as we properly expose/track transactions)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, thanks for your engagement here. I think I can see the bigger picture more clearly.

Side thought: Having external management of transactions will make DefraDB quite in a league of it's own where software interfacing with DefraDB won't need a driver to be able to do that kind of interaction. Which is pretty cool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

external management of transactions

I am really really looking forward to this, and any gotchas we discover along the way. Seems like a tiny thing, but I would have used it to death as an app dev 😁

Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good forward thinking on this one. Thanks again for helping me understand the approach more clearly. Just one todo before merging 👍

db/db.go Outdated
// Warning: we currently rely on this prop being 64-bit aligned in memory
// relative to the start of the `db` struct (for atomic.Add calls). The
// easiest way to ensure this alignment is to declare it at the top.
previousTxnID uint64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: go 1.19 introduced new atomic type to help make these variables safer. In this case I would encourage using atomic.Uint64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh - nice - thanks, I saw the note in the atomic go-doc, but never clocked on that it had introduced a Uint64 wrapper-type, I though it meant the normal uint64 and gave up looking. Will change!

  • atomic.Uint64

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also remove the warning as the new type cover for 32bit systems

Int64 and Uint64 are automatically aligned to 64-bit boundaries in structs and allocated data, even on 32-bit systems.

This can be used to identify transactions.  The lens registry will make use of this shortly, we may wish to use it in some form at a later date to allow external transaction management via stuff like GQL.

It is very basic at the moment and can be improved upon later.
Will be used shortly from within the lens registry in order to clean up state that is no longer wanted (and that is no longer correct).
This commit ended up doing quite a lot, as I figured out various intertwined issues, it does:
- Adds locks around the various registry properties, these maps can be accessed concurrently and need to be protected.
- Removes the transaction continuity issue in the client.LenRegistry interface, where db.LensRegistry() returns an object that does not respect the transactionality of the parent store, and takes `txn`s as input parameters to some of its functions. It does this by following the same pattern as `db.db`.
- Fixes the bugs in the lens package where migrations set were not visible/accessible until after commit.  They are now visible within the transaction scope.
@AndrewSisley AndrewSisley merged commit 9d75c45 into sourcenetwork:develop Aug 3, 2023
12 checks passed
@AndrewSisley AndrewSisley deleted the 1649-lens-txn-stuff branch August 3, 2023 18:57
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
## Relevant issue(s)

Resolves sourcenetwork#1649 sourcenetwork#1592

## Description

Improves the way migrations handle transactions, as well as fixing a
couple of concurrency issues:

- Adds locks around the various registry properties, these maps can be
accessed concurrently and need to be protected.
- Removes the transaction continuity issue in the client.LenRegistry
interface, where db.LensRegistry() returns an object that does not
respect the transactionality of the parent store, and takes `txn`s as
input parameters to some of its functions. It does this by following the
same pattern as `db.db`. (sourcenetwork#1649)
- Fixes the bugs in the lens package where migrations set were not
visible/accessible until after commit. They are now visible within the
transaction scope. (sourcenetwork#1592)

It still does not provide transaction snapshot isolation, I see that
issue as relatively high effort low reward at the moment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/schema Related to the schema system bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sort out client.LensRegistry transaction stuff
2 participants