Garbage collection is unacceptably slow #2806

Closed
bowenwang1996 opened this issue Jun 7, 2020 · 2 comments · Fixed by #2875
Labels
A-storage Area: storage and databases P-critical Priority: critical

Comments

@bowenwang1996
Collaborator

bowenwang1996 commented Jun 7, 2020

When the node garbage collects data, it invokes `clear_data`, which deletes old block and chunk data. In practice, this function is extraordinarily slow. Under the current setup, each call to `clear_data` garbage collects 100 heights' worth of data and, once the network has been running for a while, takes 30-60s to execute, which is unacceptable. I suspect this is related to rocksdb compaction: deleting data may trigger compaction, which can take a while to run depending on the size of the data. To address this issue, we should:

  • In the short term, find a temporary solution (better than #2807, "fix: mitigate gc slowness by reducing step size") that minimizes the time spent on garbage collection.
  • After the temporary fix, develop comprehensive benchmarks to test and fine-tune rocksdb options and our garbage collection parameters, or find some other way to fully fix this issue.
@bowenwang1996
Collaborator Author

bowenwang1996 commented Jun 7, 2020

@mikhailOK @frol @SkidanovAlex please help if you know anything related to rocksdb options.

@bowenwang1996 bowenwang1996 assigned Kouprin and unassigned chefsale Jun 7, 2020
bowenwang1996 added a commit that referenced this issue Jun 7, 2020
A hot fix that mitigates #2806 by reducing the garbage collection step to the absolute minimum. However, even if we advance the tail by just 1 height at each step, `clear_data` still takes about 0.5s to execute, which is still far from optimal.

Test plan
----------
Deploy on betanet and observe that garbage collection speeds up from more than 30s to 0.5s at each step.
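
For illustration, the step-size idea in this hot fix can be sketched as follows. This is a minimal sketch, not the code from the commit: the function `next_tail` and its parameters `gc_stop_height` and `max_heights_per_step` are hypothetical names, and only the "advance the tail by a bounded number of heights per call" behavior is taken from the description above.

```rust
/// Hypothetical helper (not nearcore's API): compute the new tail after one
/// garbage collection step.
///
/// * `tail` is the lowest height whose data is still stored.
/// * `gc_stop_height` is the first height that must be kept.
/// * `max_heights_per_step` bounds how many heights one call may collect
///   (100 before the hot fix, 1 after it, per the description above).
fn next_tail(tail: u64, gc_stop_height: u64, max_heights_per_step: u64) -> u64 {
    // Advance by at most `max_heights_per_step`, guarding against overflow.
    let target = tail.saturating_add(max_heights_per_step);
    // Never collect past the stop height, and never move the tail backwards.
    target.min(gc_stop_height).max(tail)
}
```

A smaller step bounds the work done per call, but as noted above it does not address the per-call cost itself.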
@Kouprin Kouprin removed their assignment Jun 9, 2020
@Kouprin
Member

Kouprin commented Jun 9, 2020

It seems I can't resolve this right now. Please assign it to me when it becomes relevant.

@bowenwang1996 bowenwang1996 assigned mikhailOK and unassigned ailisp Jun 9, 2020
bowenwang1996 added a commit that referenced this issue Jun 22, 2020
Garbage collection is slow because we do not persist the chunk tail, and therefore `clear_chunk_data` iterates from 0 to `min_chunk_height` every time it is called. Fixes #2806.
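
To make the cost difference concrete, here is a minimal sketch of the two behaviors described in this commit message. It is not the actual fix: `ChunkStore`, its fields, and both method names are hypothetical, and an in-memory `BTreeMap` stands in for the real store. The only thing taken from the commit message is the contrast between scanning from height 0 on every call and resuming from a remembered chunk tail.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the chunk store (not nearcore's types).
struct ChunkStore {
    /// height -> chunk payload
    chunks: BTreeMap<u64, Vec<u8>>,
    /// Remembered chunk tail: every height below it has already been collected.
    chunk_tail: u64,
    /// Number of heights visited so far, to make the cost visible.
    heights_visited: u64,
}

impl ChunkStore {
    /// Create a store holding one dummy chunk at each height in `0..heights`.
    fn new(heights: u64) -> Self {
        let chunks = (0..heights).map(|h| (h, vec![0u8])).collect();
        ChunkStore { chunks, chunk_tail: 0, heights_visited: 0 }
    }

    /// Slow variant: with no remembered tail, every call rescans from height 0,
    /// so the work per call grows with the chain height.
    fn clear_chunk_data_from_zero(&mut self, min_chunk_height: u64) {
        for h in 0..min_chunk_height {
            self.heights_visited += 1;
            self.chunks.remove(&h);
        }
    }

    /// Faster variant: resume from the remembered tail, so each call only
    /// visits heights that have not been collected yet.
    fn clear_chunk_data_from_tail(&mut self, min_chunk_height: u64) {
        for h in self.chunk_tail..min_chunk_height {
            self.heights_visited += 1;
            self.chunks.remove(&h);
        }
        // Record progress; the tail never moves backwards.
        self.chunk_tail = self.chunk_tail.max(min_chunk_height);
    }
}
```

Advancing the target by one height per call for 100 calls visits 1 + 2 + ... + 100 = 5050 heights with the first variant, but only 100 with the second. In a real node the tail would have to be written to the database so that it survives restarts.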
5 participants