As a node operator, I want to control the database size in bytes #1449
The initial local snapshot integration uses milestones as the means of tracking and controlling the snapshotting mechanism and node behavior. This was a first iteration of the implementation, and it should evolve into a more user-friendly solution. Users are not really that interested in how many milestones are stored in their DBs; they want to control their DB size as an absolute byte size instead.
Also requested in #1275 by Nuriel.
This improves the user experience of operating a node: the settings should represent what users actually expect in the respective scenarios.
Separate follow-up issue:
We're not addressing the minimum 'allowed' DB size as part of this issue.
Here are the results of some research on this.
The ideal size can only be obtained by the following sequence: start the node, send/receive transactions, stop the node, start it again, and stop it again. Only then does the size of the folder containing the DB files reflect the actual size.
Besides this, another problem exists: compression and compaction. Sometimes files get written to disk and are only compressed or compacted at a later stage.
Both of these together lead to a very inaccurate DB size estimation.
Real-world scenarios probably use ~80 GB, and the inaccuracy only affects recent data. However, with a spam event, or if we ever want sustained 100 TPS, this level of accuracy is not enough.
A couple of technical solutions were proposed:
CompactRange: perfect, as it compacts and writes to disk, but it takes ages (~6 hours for a 150 GB database). Works exactly as required on a 150 MB node :=)
Current best solution: check db.getTotalSize(); when it updates, verify that it didn't suddenly increase by more than 10%. If not, check whether it exceeds the size limit. If it does, count the number of transactions in the lowest milestone and multiply by 650 bytes (the average size of 1,000,000 zero-value transactions, plus 10%). If that is not enough yet, add in the milestone with index +1 and loop until the size estimate is reached.
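The estimation loop above could be sketched roughly as follows. This is a minimal sketch, not the actual implementation: `milestones_to_prune` and the `tx_counts_by_milestone` mapping are hypothetical names, and the 650-byte average comes straight from the estimate in the comment above.

```python
AVG_TX_BYTES = 650  # avg size of a 0-value tx plus 10%, per the estimate above

def milestones_to_prune(db_size, max_size, tx_counts_by_milestone, lowest_milestone):
    """Estimate which milestones to prune so the DB drops below max_size.

    tx_counts_by_milestone: hypothetical mapping of milestone index -> tx count.
    Returns the list of milestone indexes to prune, lowest first.
    """
    excess = db_size - max_size
    if excess <= 0:
        return []  # still within the limit, nothing to do
    pruned, reclaimed = [], 0
    index = lowest_milestone
    # walk upward from the lowest milestone until the size estimate is reached
    while reclaimed < excess and index in tx_counts_by_milestone:
        reclaimed += tx_counts_by_milestone[index] * AVG_TX_BYTES
        pruned.append(index)
        index += 1
    return pruned
```

For example, with a 2 MB DB, a 1 MB limit, and milestones 100–102 holding 1000, 2000, and 500 transactions, pruning milestones 100 and 101 already reclaims an estimated ~1.95 MB, so milestone 102 is kept.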
Since it is very hard to estimate, I propose a small change:
We do your proposed "best" solution, but make it even simpler: less smart counting and more dumb logic. E.g. if we are close enough to X (which can be measured in GB), just prune a predetermined # of milestones.
As long as we are in the range between max and min it should be fine.
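The "dumb" variant could look something like this sketch. All names here are hypothetical: `PRUNE_BATCH` stands for the predetermined number of milestones, and `prune_fn` is an assumed callback that actually deletes milestones and reports the new DB size.

```python
PRUNE_BATCH = 50  # predetermined number of milestones to prune per pass (hypothetical)

def maybe_prune(current_size, max_size, min_size, lowest_milestone, prune_fn):
    """If the DB has reached the max size, prune a fixed batch of milestones.

    prune_fn(start, count): hypothetical callback that deletes `count`
    milestones starting at `start` and returns the new DB size in bytes.
    """
    if current_size < max_size:
        return current_size  # anywhere between min and max is fine
    new_size = prune_fn(lowest_milestone, PRUNE_BATCH)
    if new_size < min_size:
        raise ValueError("pruned below the minimum allowed DB size")
    return new_size
```

No per-milestone byte counting is needed: the pass just repeats on the next size check if the DB is still over the limit, which keeps the size oscillating inside the min/max range.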