
fix: Add time slicing to splitstore purging to reduce lock congestion #11269

Merged
fridrik01 merged 4 commits into master on Sep 18, 2023

Conversation

fridrik01
Contributor

@fridrik01 fridrik01 commented Sep 14, 2023

Related Issues

See #11251

Context

When splitstore is enabled we perform compaction every 7h30m. During this compaction (which takes several minutes on my fast SSD) there is a relatively small purging step which takes approx. 15 sec. The purging step chunks the CIDs which should be purged and operates on them in batches. For each batch it acquires an exclusive lock which it shares with other I/O operations (for example, the FVM when applying messages).

So when purging happens, other operations like MpoolSelect block for the whole 15 sec duration when they need to compute tipset state (TipSetState).

Proposed Changes

I added simple time slicing to the purging step: it purges for 1 sec, then pauses for 1 sec, and repeats until purging is done. This effectively makes the purge 2x slower, but I don't think we care much, as it only happens every 7h30m and purging is already relatively fast compared to the whole splitstore compaction.
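
A minimal sketch of the idea (illustrative only: the purger type, deleteBatch helper, and mutex name below are placeholders rather than the actual Lotus splitstore code, and it assumes the exclusive lock is taken per batch as described above):

package splitstoresketch

import (
	"sync"
	"time"
)

// purger is a hypothetical stand-in for the splitstore.
type purger struct {
	mx sync.Mutex
}

// deleteBatch stands in for the per-batch deletion work done under the lock.
func (p *purger) deleteBatch(batch []string) error { return nil }

// purgeTimeSliced deletes batches under the exclusive lock, but once it has
// spent roughly one second working it sleeps (with the lock released) for
// about as long as it worked, so other lock users such as message
// application are only blocked for about a second at a time.
func (p *purger) purgeTimeSliced(batches [][]string) error {
	workStart := time.Now()
	for _, batch := range batches {
		p.mx.Lock()
		err := p.deleteBatch(batch)
		p.mx.Unlock()
		if err != nil {
			return err
		}

		if elapsed := time.Since(workStart); elapsed > time.Second {
			time.Sleep(elapsed) // pause about as long as we worked: ~50% duty cycle
			workStart = time.Now()
		}
	}
	return nil
}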

Test plan

Using the same testing method as described in this comment, I observed with this fix that during splitstore compaction the TipSetState duration went down from 15 sec to ~2 sec.

Additional Info

See #11251 (comment)

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • If the PR affects users (e.g., new feature, bug fix, system requirements change), update the CHANGELOG.md and add details to the UNRELEASED section.
  • New features have usage guidelines and / or documentation updates
  • Tests exist for new functionality or change in behavior
  • CI is green

@fridrik01 fridrik01 changed the title Add time slicing ot splitstore purging to reduce lock congestion fix: Add time slicing to splitstore purging to reduce lock congestion Sep 14, 2023
@fridrik01 fridrik01 marked this pull request as ready for review September 14, 2023 13:59
@fridrik01 fridrik01 requested a review from a team as a code owner September 14, 2023 13:59
@Stebalien
Member

We likely either need to:

  1. Take some kind of lock when computing tipsets (and/or mining blocks). 1s isn't quite enough to compute a tipset so we'll end up pausing multiple times.
  2. Pause longer. IMO, there's no rush here unless I'm missing something.

That's not to say that the current patch won't help. Even if we only pause for 1s, that means we'll be able to spend 50% of the time computing the tipset which means it'll take ~6s on average instead of 3. But I'd still consider pausing for longer.

@Stebalien
Member


I.e., I'd consider pausing for 4s every 1s, or something like that. That gives us 20% utilization, which means it should only make blocks take 20% longer.

Ideally we'd look at datastore metrics and determine that the datastore is "busy". But that's more complicated.
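
(Back-of-the-envelope illustration of the tradeoff, using the numbers above: with the 1s-work / 1s-sleep cycle in this PR the purge holds the lock roughly half the time, so a tipset computation that normally takes ~3s stretches to about 3 / 0.5 = 6s while purging runs; with a 1s-work / 4s-sleep cycle the purge holds it roughly 20% of the time, and the same computation stretches to about 3 / 0.8 ≈ 3.75s.)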

@Stebalien
Member

LGTM but this one needs an approval from @arajasek.

@arajasek
Contributor

I don't see any harm in doing this, but I'm not sure I fully believe the logic here.

When splitstore is enabled we perform compaction every 7h30m. During this compaction (which takes several minutes on my fast SSD) there is a relatively small purging step which takes approx. 15 sec. The purging step chunks the CIDs which should be purged and operates on them in batches. For each batch it acquires an exclusive lock which it shares with other I/O operations (for example, the FVM when applying messages).

We're saying that the purging step takes 15 seconds, but operates in batches. The batch size is 16k, which seems relatively small to me. We take the lock for each batch.

So when purging happens, other operations like MpoolSelect block for the whole 15 sec duration when they need to compute tipset state (TipSetState).

In that case, we shouldn't be hogging the lock for the entire 15s, no? We're aggressively trying to grab it between batches, but we should still be releasing the lock for other operations in between batches. Is it the case that a single batch is taking 15s?

Contributor

@arajasek arajasek left a comment

See comment about the premise itself. If we're landing, let's make the "work" time and "sleep" time constants (no need to be configurable yet).

elapsed := time.Since(now)
if elapsed > time.Second {
	// work 1 second, sleep 4, or 20% utilization
	time.Sleep(4 * elapsed)
}
Contributor

4 * elapsed? Do we not want 4 * time.Second here?

Contributor Author

If the work slice runs slightly over a second (which it usually will), sleeping 4 * elapsed means we also sleep a bit longer, guaranteeing at most 20% utilization.
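
For concreteness (illustrative numbers only): if a work slice runs for 1.25s before the check fires, time.Sleep(4 * elapsed) sleeps for 5s, giving exactly 1.25 / 6.25 = 20% utilization for that cycle, whereas time.Sleep(4 * time.Second) would sleep only 4s and give about 1.25 / 5.25 ≈ 24%.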

Contributor

Yup, that makes sense.

@fridrik01
Contributor Author

In that case, we shouldn't be hogging the lock for the entire 15s, no? We're aggressively trying to grab it between batches, but we should still be releasing the lock for other operations in between batches. Is it the case that a single batch is taking 15s?

Yes, we release the lock between batches, but when applying messages we fight for that lock independently for each message, so if purging takes 15 sec we keep contending for the lock on every message until the whole purging step is done.
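
To illustrate the contention pattern (a sketch with placeholder names, not the actual Lotus code): the message applier takes the shared lock once per message, so while the purger is holding it for most of a 15s window, every single message application can queue behind an in-flight purge batch.

package contentionsketch

import "sync"

// applyMessages re-acquires the shared lock for every message, so each
// acquisition can stall behind whichever purge batch currently holds it.
func applyMessages(mx *sync.Mutex, msgs []string, applyOne func(string)) {
	for _, m := range msgs {
		mx.Lock() // contends with the purger once per message
		applyOne(m)
		mx.Unlock()
	}
}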

@arajasek
Contributor

@fridrik01 Aaah, the lock contention is for every single message application. Got it, that makes complete sense. Thank you!

Contributor

@arajasek arajasek left a comment

Make time.Second a constant, but LGTM!

@fridrik01 fridrik01 merged commit 8aaa8de into master Sep 18, 2023
87 checks passed
@fridrik01 fridrik01 deleted the throttle-splitstore-purging branch September 18, 2023 16:54