As a node operator, I want to control the database size in bytes #1449

Open
jakubcech opened this issue May 13, 2019 · 3 comments · May be fixed by #1653
Contributor

@jakubcech jakubcech commented May 13, 2019

Description

The initial local snapshot integration uses milestones to track and control the snapshotting mechanism and node behavior. This was a first iteration of the implementation, and it should evolve into a more user-friendly solution. Users are not really (that) interested in how many milestones are stored in their DBs. They instead want to control their DB size as an absolute size in bytes.

Also requested in #1275 by Nuriel.

Motivation

Improved user experience of operating a node. The settings should represent what users expect in respective scenarios.

Requirements

  • I'm able to set a maximum size X I want the node database to have.
    • We set a minimum DB size roughly equivalent to the current minimum of 10,000 milestones. Is that what, 10-20GB?
  • The node prunes the database when reaching the size X.
  • The mainnet DB doesn't grow beyond the size specified in the config property.
  • We keep the existing milestone option in for now. Whichever value is hit first is used for the decision to prune the DB.
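
The "whichever value is hit first" rule can be sketched as below. This is only an illustration: the class, method, and parameter names are hypothetical and not taken from the node's code base.

```java
// Hypothetical helper; none of these names exist in the node software.
final class PruneDecision {
    private PruneDecision() {}

    /**
     * Returns true as soon as either configured limit is reached.
     *
     * @param storedMilestones number of milestones currently kept in the DB
     * @param maxMilestones    the existing milestone-based limit
     * @param dbSizeBytes      current (measured or estimated) DB size in bytes
     * @param maxDbSizeBytes   the new byte-based limit X
     */
    static boolean shouldPrune(long storedMilestones, long maxMilestones,
                               long dbSizeBytes, long maxDbSizeBytes) {
        boolean milestoneLimitHit = storedMilestones >= maxMilestones;
        boolean sizeLimitHit = dbSizeBytes >= maxDbSizeBytes;
        return milestoneLimitHit || sizeLimitHit;
    }
}
```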

Implementation possibilities:

  • We can use the transactionCounter; we should benchmark its error rate.
  • Or we can query the size of the folder :)
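
For the folder-size option, a minimal sketch using only the Java standard library could look like this. The class name `FolderSize` is made up for illustration, and the result only reflects what is on disk at the moment of the walk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper; not part of the node software.
final class FolderSize {
    private FolderSize() {}

    /** Sums the sizes of all regular files below dir, in bytes. */
    static long sizeInBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .mapToLong(FolderSize::fileSize)
                    .sum();
        }
    }

    private static long fileSize(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            // A file can disappear while we walk (e.g. after a compaction); count it as 0.
            return 0L;
        }
    }
}
```

Calling `FolderSize.sizeInBytes(Paths.get("mainnetdb"))` would return the total for that folder; the path here is only an example.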

Separate follow-up issues:

  • I'm able to programmatically and remotely retrieve the node's operation size (all DBs and files added up). Can be just the mainnet DB in the first iteration.
  • Remove (?) the minimum limit on the DB size?

We're not addressing the minimum 'allowed' DB size as part of this issue.

@no8ody no8ody commented May 13, 2019

Perhaps you could include the corresponding statement in getNodeInfo, so you can see how big the history of a node is approximately.

@jakubcech jakubcech removed the L-Groom label Sep 12, 2019
@kwek20 kwek20 self-assigned this Oct 31, 2019
@kwek20 kwek20 linked a pull request that will close this issue Nov 6, 2019
6 of 6 tasks complete
Member

@kwek20 kwek20 commented Dec 9, 2019

These are the results of my research on this.
The main issue is obtaining a reasonably accurate size of the total database.

An accurate size can be obtained with the following sequence: start the node, send/receive transactions, stop the node, start it again, stop it again. After that, the size of the folder containing the db files is correct.
Without the second start/stop, some data is still held by the system, in WAL files, or in a buffer.

Besides this, another problem exists: compression and compaction. Sometimes files get written to disk and are compressed or compacted at a later stage.

Both of these together lead to very inaccurate db size estimation.
Sometimes a database of 100MB suddenly grows to 400+MB and then shrinks to 80MB.
If we were to set the limit at 300MB, we would suddenly need to prune one third of the database.

Real-world scenarios probably use ~80GB, and the inaccuracy only affects recent data. However, with a spam event, or if we ever want to sustain a regular 100tps, this (lack of) accuracy is not enough.

A couple of technical solutions proposed:
db.getTotalSize(): Returns the size of the database on disk in bytes. Excludes anything else, like caches and buffers.
Bonus: NUMBER_KEYS_WRITTEN can be used to get a more exact size when multiplied by the average bytes per key. However, this also changes for any non-transaction write to the db (meta, state, etc.).
Storing/writing to disk more often: Slower DB, more files on disk. Would increase the number of times db.getTotalSize() changes.

Test run over 1M 0-value tx, average of 10 attempts. 32MB is ~12,500 tx of raw data.
Setting the column family buffer size (when to write to disk; could at most double, per CF):
2MB: 22 sec
32MB: 18 sec
64MB: 15 sec

CompactRange: Perfect, as it compacts and writes to disk, but takes ages (~6 hours for a 150GB database). Works exactly as required in a 150MB node :=)
Estimating: Calculate the db size from the db count of transactions. db.count is only exact after writing to disk, and can vary by hundreds of thousands.
rocksdb.estimate-live-data-size: doesn't work
db.getApproximateMemTableStat: uses ranges; I have not been able to run it without a crash or null pointer. Very little documentation, all of it for C
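
To make the count-based estimate and its uncertainty concrete, here is a small sketch. The class and method names are hypothetical, and the average bytes per entry is a parameter the caller has to measure; nothing here is an existing database API.

```java
// Hypothetical helper for count-based size estimation.
final class CountBasedEstimate {
    private CountBasedEstimate() {}

    /**
     * Returns {low, high} bounds in bytes for a DB holding roughly `count`
     * entries, when the count itself may be off by up to `countError`.
     */
    static long[] estimateRange(long count, long countError, long avgBytesPerEntry) {
        long lowCount = Math.max(0L, count - countError);
        long highCount = Math.addExact(count, countError);
        long low = Math.multiplyExact(lowCount, avgBytesPerEntry);
        long high = Math.multiplyExact(highCount, avgBytesPerEntry);
        return new long[] {low, high};
    }
}
```

For example, assuming 1,000,000 transactions, a count error of 100,000, and 650 bytes per transaction, `estimateRange(1_000_000, 100_000, 650)` gives a range of 585,000,000 to 715,000,000 bytes, i.e. roughly ±65MB around the estimate.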

Current best solution: check db.getTotalSize(). When it updates, check that it did not suddenly increase by more than 10%; if it did not, check whether it exceeds the configured size. If it does, count the tx in the lowest milestone and multiply by 650 bytes (the average size over 1,000,000 0-value tx, plus 10%). If that is not enough yet, add the milestone with index +1, and loop until the size estimation is reached.
Assuming that the node has a reasonably correct size when it starts up, this will work-ishhhh.
This will however ensure the DB is actually always OVER your set amount 📦. Maybe add a +1 ms buffer once the threshold is reached?
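
A rough sketch of that loop, under several assumptions: the per-milestone transaction counts are supplied by the caller as a sorted map, "ms" is read as "milestone", and all class, method, and parameter names are invented for illustration. Only the 650 bytes per tx and the 10% jump threshold come from the description above.

```java
import java.util.Map;
import java.util.SortedMap;

// Hypothetical sketch; not the implementation in the linked pull request.
final class SizeBasedPruning {
    static final long BYTES_PER_TX = 650L; // average 0-value tx size + 10%, as quoted above
    static final double MAX_JUMP = 1.10;   // readings that grow more than 10% are ignored

    private SizeBasedPruning() {}

    /** True if the new size reading grew by more than 10% over the previous one. */
    static boolean isSuddenJump(long previousBytes, long currentBytes) {
        return currentBytes > previousBytes * MAX_JUMP;
    }

    /**
     * Returns the highest milestone index to prune (inclusive), or -1 if nothing
     * should be pruned. Milestones are consumed from the lowest index upwards until
     * the estimated freed bytes cover the excess, plus `extraMilestones` as a buffer.
     * If the map runs out first, the last available index is returned.
     */
    static int pruneUpTo(long previousBytes, long currentBytes, long limitBytes,
                         SortedMap<Integer, Long> txCountByMilestone, int extraMilestones) {
        if (isSuddenJump(previousBytes, currentBytes) || currentBytes <= limitBytes) {
            return -1;
        }
        long bytesToFree = currentBytes - limitBytes;
        long freed = 0L;
        int last = -1;
        int extra = 0;
        for (Map.Entry<Integer, Long> e : txCountByMilestone.entrySet()) {
            if (freed >= bytesToFree) {
                if (extra >= extraMilestones) {
                    break;
                }
                extra++; // take one more milestone as a safety buffer
            }
            freed += e.getValue() * BYTES_PER_TX;
            last = e.getKey();
        }
        return last;
    }
}
```

Passing `extraMilestones = 1` corresponds to the "+1 buffer" idea, so the DB ends up slightly under the limit rather than always over it.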

Member

@GalRogozinski GalRogozinski commented Dec 9, 2019

Since it is very hard to estimate, I propose a small change to:

The node prunes the database when reaching the size X

We do your proposed "best" solution, but make it even simpler: less smart counting and more dumb logic. E.g. if we are close enough to X (can be measured in GB), just prune a predetermined number of milestones.

As long as we stay in the range between max and min it should be fine.
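
The "dumb logic" variant could be sketched like this. The names, the margin, and the batch size are all assumptions chosen for illustration.

```java
// Hypothetical sketch of the simpler rule: prune a fixed batch when close to X.
final class SimplePruning {
    private SimplePruning() {}

    /**
     * Returns how many milestones to prune: `batchSize` when the measured size
     * is within `marginBytes` of the limit (or above it), otherwise 0.
     */
    static int milestonesToPrune(long dbSizeBytes, long maxBytes,
                                 long marginBytes, int batchSize) {
        boolean closeToLimit = dbSizeBytes >= maxBytes - marginBytes;
        return closeToLimit ? batchSize : 0;
    }
}
```

With a limit of 100GB and a 5GB margin, a reading of 96GB would trigger one batch, while 90GB would not.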
