Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exp/ingest/io: Skip storing live entries seen in the oldest bucket #2618

Merged

Conversation

bartekn
Copy link
Contributor

@bartekn bartekn commented May 22, 2020

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

This commit updates SingleLedgerStateReader to skip updating a temp ledger key store when processing the oldest bucket. This lowered memory usage of SingleLedgerStateReader from 550MB to 325MB when processing recent pubnet ledgers.

Why

After checking memory usage of SingleLedgerStateReader and BucketEntryType distribution in public network buckets:

1488c97c7b1f7d04f6475d874979439c7695df954dcf65612c790166972f4298 meta 1 live 294 dead 62 init 62
6706387e4069cdd1c728d59bc07dec8bed5c2179ad64d3331db5405ddc542661 meta 1 live 285 dead 64 init 75
7de5d543ca7ad31cafb624a2b1fd332c0f5e8a32626b0140d4d812411b3393a9 meta 1 live 632 dead 94 init 81
a813478b35328244e94e90b2f998b80177ed09b06020817bc11b5293a6e671ff meta 1 live 589 dead 74 init 93
2c97b86df1d36f5b5d2f418cabcd1049bc31c32180bc131edc63efa42ff04b47 meta 1 live 1249 dead 147 init 152
d01d5997ff8402ce75eab80f7771a334a60a7829e1ecba41c546a6c834303104 meta 1 live 1579 dead 175 init 170
2952c6ccb006c5339791b2ddab6333b3204c7c2b2f8d82b9baaf51b208d930d9 meta 1 live 2464 dead 186 init 229
efc96e77c5f9541e1698051aa0cbcd58bbbec0cd0b00eb525f9402408a82e438 meta 1 live 7752 dead 290 init 321
f29ddfcfe312e39233055986eb54d0cde0c37f1b0de73b3bdda0189609bf5742 meta 1 live 7462 dead 335 init 327
da91df47e3516d4f9e3d539e6206f1c310de291ff1a3a8b9e2a0f6f0c5fb6453 meta 1 live 11101 dead 594 init 608
0661c27a3682fdf9dc288bedaf71e7cb114dca0c44d282d9c396c3d81ff7b130 meta 1 live 10645 dead 476 init 656
447acb0cf05db29581b393fd81120bcda674e9e329252bf0f8f5be6e52b10911 meta 1 live 15264 dead 919 init 1732
60553007ea09b34ad385e97540fd5a6e0c461b8d49fb21c556c6eefce3ef4622 meta 1 live 15450 dead 3840 init 1365
de2170fd68ffc12c464b5d5bcb8bb515d9a00d79efa7f75106d993ced1fcf4c0 meta 1 live 21963 dead 4437 init 4500
a25139f2600d39b0857d6060e8faa3c4240def42f30faba2d1ae46009ded5445 meta 1 live 86064 dead 7772 init 19527
42e8c8aab93ab365b5b7e06098452e5c9afcdc69076c9179c31b19d65ce84bcb meta 1 live 192682 dead 13555 init 20042
6f9dac938ccccfa1bfd82041087f6912271fffeae7dd84e95d038bedb0603a3e meta 1 live 44340 dead 25491 init 18752
9eab1d1af68cdad21721bb004e8f247ff04cf445ce735cadb9d4318f062e5e37 meta 1 live 323546 dead 14726 init 47435
dea92c0c3b0ebc841d6043a0ea57637379b8cf2ccce82f0e8592ee37cc22252e meta 1 live 468104 dead 271894 init 265515
95e79159a96955be5f95f1fb1d8ad8391003d6f4855814ce20ae68631b067447 meta 1 live 715201 dead 643781 init 164183
3433eea1a40d0c8fce784f09e571dbef65677f3c368e624136fa622b9b892aff meta 1 live 4284497 dead 0 init 3206695
sum meta 21 live 6211163 dead 988912 init 3752520

I observed that the last bucket contains a large number of live entries (~4.3M) but there's no point in storing them in tempSet as seen because this is the last bucket being processed and ledger keys are unique in each bucket (confirmed by @jonjove).

@bartekn bartekn requested review from MonsieurNicolas, jonjove and a team May 22, 2020 14:28
@cla-bot cla-bot bot added the cla: yes label May 22, 2020
// 1. Ledger keys are unique within a single bucket.
// 2. This is the last bucket we process so there's no need to track
// seen last entries in this bucket.
if oldestBucket {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can apply the same logic to the entry.Type == xdr.BucketEntryTypeDeadentry case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The oldest bucket does not contain any dead entries. @jonjove is it always true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to call msr.tempStore.Exist(h) on the last bucket?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because it's possible that live or dead entry for a given ledger key has been seen in one of the previous buckets.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartekn It is always true that the oldest bucket does not contain any dead entries.

@bartekn
Copy link
Contributor Author

bartekn commented May 22, 2020

Confirm the change works correctly for the latest checkpoint using dump-ledger-state.

@tomquisel
Copy link
Contributor

@bartekn amazing job! This is such a good optimization. 🎉 🎉

@MonsieurNicolas
Copy link
Contributor

cool. @bartekn you also need to smoke test this with older protocol versions (before 11), before "INIT" entries got added

@bartekn
Copy link
Contributor Author

bartekn commented May 27, 2020

Works correctly for pre-11 ledger: 20406335.

@bartekn bartekn merged commit 4b6180a into stellar:master May 27, 2020
@bartekn bartekn deleted the ingest-skip-storing-seen-keys-in-last-bucket branch May 27, 2020 15:52
@tamirms tamirms added this to the Horizon 1.4.0 milestone Jun 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants