UTxO-HD Ledger DB lock tweaks #43

Closed
Tracked by #4487
jasagredo opened this issue Feb 23, 2023 · 1 comment · Fixed by #74

@jasagredo
Contributor

After today's meeting, several next steps came up to reduce the locking of the Ledger DB in the UTxO-HD implementation.

The reason for the lock

The LedgerDB lock is introduced to make sure that whenever we acquire a DbChangelog and read the BackingStore, these two values are in agreement, in particular that the anchor of the DbChangelog is indeed the slot that was flushed to the BackingStore.

LMDB provides a consistent view of the database for as long as a read transaction is kept open. We can therefore use this feature to "acquire" a transaction and then release the lock of the DB.
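
A minimal sketch of this "acquire, then release" idea, under loud assumptions: every name here (`LedgerDB`, `acquire`, `flush`, `ValueHandle`, ...) is hypothetical and not the ouroboros-consensus API, a plain `MVar ()` mutex stands in for the real read/write lock, and an immutable `Map` stands in for an open LMDB read transaction. The point it illustrates is that the changelog and the store view are captured together under the lock, so they keep agreeing even if a flush happens afterwards.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import qualified Data.Map.Strict as Map

type Slot = Int
type Values = Map.Map String Int

-- | Differences not yet flushed, on top of the store state at 'anchor'.
data Changelog = Changelog
  { anchor  :: Slot
  , tip     :: Slot
  , pending :: [(String, Int)] -- oldest first
  }

-- | A frozen view of the store; stands in for an open LMDB read transaction.
data ValueHandle = ValueHandle
  { vhSlot   :: Slot
  , vhValues :: Values
  }

data LedgerDB = LedgerDB
  { dbLock      :: MVar ()               -- stand-in for the LedgerDB lock
  , dbChangelog :: IORef Changelog
  , dbStore     :: IORef (Slot, Values)  -- stand-in for the BackingStore
  }

newLedgerDB :: Slot -> Values -> IO LedgerDB
newLedgerDB slot vals =
  LedgerDB <$> newMVar () <*> newIORef (Changelog slot slot []) <*> newIORef (slot, vals)

-- | Record new differences in the changelog without touching the store.
extend :: LedgerDB -> Slot -> [(String, Int)] -> IO ()
extend db slot kvs = withMVar (dbLock db) $ \_ ->
  modifyIORef' (dbChangelog db) $ \c -> c { tip = slot, pending = pending c ++ kvs }

-- | Hold the lock only long enough to capture both values together.
acquire :: LedgerDB -> IO (Changelog, ValueHandle)
acquire db = withMVar (dbLock db) $ \_ -> do
  chlog        <- readIORef (dbChangelog db)
  (slot, vals) <- readIORef (dbStore db)
  pure (chlog, ValueHandle slot vals)

-- | Write the pending differences to the store and move the anchor to the tip.
flush :: LedgerDB -> IO ()
flush db = withMVar (dbLock db) $ \_ -> do
  chlog     <- readIORef (dbChangelog db)
  (_, vals) <- readIORef (dbStore db)
  writeIORef (dbStore db) (tip chlog, Map.union (Map.fromList (pending chlog)) vals)
  writeIORef (dbChangelog db) chlog { anchor = tip chlog, pending = [] }

-- | The invariant the lock protects: the changelog anchor matches the store view.
consistent :: (Changelog, ValueHandle) -> Bool
consistent (chlog, vh) = anchor chlog == vhSlot vh

-- | Read through the pending differences first, then fall back to the view.
readKey :: (Changelog, ValueHandle) -> String -> Maybe Int
readKey (chlog, vh) k =
  case lookup k (reverse (pending chlog)) of
    Just v  -> Just v
    Nothing -> Map.lookup k (vhValues vh)
```

A pair returned by `acquire` can be used with `readKey` for as long as needed without holding the lock; a concurrent `flush` only affects pairs acquired after it.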

Places that need to change

  • Flushing: Right now, flushing differences in caught-up mode only happens when taking snapshots. We therefore wait ~72 minutes between flushes on mainnet, which is too long. Flushing should happen regularly, perhaps every 100 blocks or so (make it configurable). Either do this on a background thread (maybe even the copyAndSnapshotRunner can sometimes flush?) or synchronously with the logic that advances the chain.
  • Flushing will be the only moment at which a write lock is acquired; taking a snapshot will no longer need one, since it in fact only needs to read the db.
  • In the places where we currently acquire the read lock and do the processing while holding it, we must instead acquire the read lock, capture the relevant state of the mutable variables (i.e. read the DbChangelog tvar and open a read tx to the db), release the lock, and then do the processing without holding it.
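
The "flush every N blocks, configurable" idea from the first bullet could look roughly like the sketch below. This is only an illustration: `FlushFrequency`, `shouldFlush` and `flushPoints` are hypothetical names, and the default of 100 is simply the figure floated above, not a measured value.

```haskell
-- | Hypothetical configuration knob: how many unflushed blocks to tolerate.
newtype FlushFrequency = FlushFrequency { blocksBetweenFlushes :: Word }

-- | The "every 100 blocks or so" suggestion, as an assumed default.
defaultFlushFrequency :: FlushFrequency
defaultFlushFrequency = FlushFrequency 100

-- | Policy: flush once the number of unflushed blocks reaches the threshold.
shouldFlush :: FlushFrequency -> Word -> Bool
shouldFlush (FlushFrequency n) unflushed = unflushed >= n

-- | Simulate applying blocks 1..total and report after which blocks a flush
-- would be triggered, resetting the unflushed counter after each flush.
flushPoints :: FlushFrequency -> Word -> [Word]
flushPoints freq total = go 0 [1 .. total]
  where
    go _ [] = []
    go unflushed (b : bs)
      | shouldFlush freq (unflushed + 1) = b : go 0 bs
      | otherwise                        = go (unflushed + 1) bs
```

Keeping the policy a pure function of the unflushed count means either option above (a background thread or the chain-advancing logic) can consult it in the same way.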

Done?

To consider this done, the system-level benchmarks should show a reasonable amount of locking after the changes above are implemented (perhaps even minimal if things go according to plan).

The plan is to start this once the cleanup branch of UTxO-HD is complete.

@jasagredo
Contributor Author

The code has been ported, but I'm finding issues in many test suites. Perhaps I introduced a logic bug.

In any case, since the BackingStore is accessed in several places, I think we should document somewhere where those accesses happen.

@jasagredo jasagredo linked a pull request May 10, 2023 that will close this issue
@dnadales dnadales moved this from 🏗 In progress to 👀 In review in Consensus Team Backlog May 16, 2023
jasagredo added a commit that referenced this issue Jun 2, 2023
# Description

Rework the locking logic for the ledger DB RAW lock. There are mainly 4 places where locking happens:
- background thread that flushes regularly: uses a **write** lock while writing the differences only.
- background thread that creates snapshots: holds a **read** lock for the duration of the snapshot
- forging loop: **quick read** locking to acquire a ledger db and a value handle to get a snapshot
- queries: **quick read** locking to acquire a ledger db and a value handle

This should reduce locking issues.

There are also some side effects in the PR:
- the forging loop is again a `WithEarlyExit` block
- getting a snapshot can no longer fail as you provide the chlog and the value handle
- new policy function `onDiskShouldFlush`

Closes #43
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Consensus Team Backlog Jun 2, 2023
2 participants