Big Server Performance

Lucas Betschart edited this page Feb 8, 2018 · 37 revisions

See also Small Server Performance.

Platform

The block data below was collected on the following platform.

Ubuntu 14.04 LTS (64 bit)
2.10 GHz (32 cores, 64 with HT)
512 GB SSD, 256 GB RAM
300 Mb/s Internet
Bitcoin Node v3.0.0

Configuration

Default configuration was used with the exception of the following settings.

blockchain.use_libconsensus = true
node.byte_fee_satoshis = 10

Block Data

Console output with preambles removed; see interpretation for the meaning of each field.

Block [458008] 1841 txs 5388 ins    0 wms  215 vms   40 vµs    2 rµs    5 cµs   21 pµs   11 aµs    1 sµs    5 dµs 1.000000
Block [458009] 1081 txs 5103 ins    0 wms  179 vms   35 vµs    2 rµs    4 cµs   17 pµs   11 aµs    1 sµs    4 dµs 0.999074
Block [458010] 1132 txs 3760 ins    0 wms  155 vms   41 vµs    2 rµs    5 cµs   18 pµs   11 aµs    6 sµs    3 dµs 0.960212
Block [458011] 2437 txs 4113 ins    0 wms  187 vms   45 vµs    1 rµs    5 cµs   21 pµs   16 aµs    2 sµs    6 dµs 0.999590
Block [458012] 1932 txs 4211 ins    0 wms  200 vms   47 vµs    2 rµs    6 cµs   20 pµs   15 aµs    4 sµs    5 dµs 0.992232
Block [458013] 2843 txs 4377 ins    0 wms  203 vms   46 vµs    3 rµs    6 cµs   20 pµs   15 aµs    2 sµs    6 dµs 1.000000
Block [458014] 2313 txs 4318 ins    0 wms  200 vms   46 vµs    3 rµs    5 cµs   21 pµs   15 aµs    3 sµs    6 dµs 0.996972
Block [458015] 1635 txs 4257 ins    0 wms  174 vms   41 vµs    2 rµs    4 cµs   20 pµs   13 aµs    1 sµs    7 dµs 0.985924
Block [458016] 2556 txs 4398 ins    0 wms  199 vms   45 vµs    3 rµs    5 cµs   20 pµs   15 aµs    2 sµs    5 dµs 1.000000
Block [458017] 2753 txs 4095 ins    0 wms  198 vms   48 vµs    2 rµs    6 cµs   21 pµs   17 aµs    2 sµs    5 dµs 1.000000

Transformation

The block data can be transformed into a table and averaged as follows.

block    txs   ins   wms  vms  vµs  rµs  cµs  pµs  aµs  sµs  dµs  efficiency
458008   1841  5388  0    215  40   2    5    21   11   1    5    1.000000
458009   1081  5103  0    179  35   2    4    17   11   1    4    0.999074
458010   1132  3760  0    155  41   2    5    18   11   6    3    0.960212
458011   2437  4113  0    187  45   1    5    21   16   2    6    0.999590
458012   1932  4211  0    200  47   2    6    20   15   4    5    0.992232
458013   2843  4377  0    203  46   3    6    20   15   2    6    1.000000
458014   2313  4318  0    200  46   3    5    21   15   3    6    0.996972
458015   1635  4257  0    174  41   2    4    20   13   1    7    0.985924
458016   2556  4398  0    199  45   3    5    20   15   2    5    1.000000
458017   2753  4095  0    198  48   2    6    21   17   2    5    1.000000
average  2052  4402  0    191  43   2    5    20   14   2    5    0.993400
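The transformation can be reproduced with a short script. The following is a minimal sketch, not the tooling used to build this page: the names `parse_line` and `average` are hypothetical, and it assumes every console line has the same "value label" layout followed by a trailing efficiency figure.

```python
import re

# Console lines copied from the Block Data section (preambles removed).
LINES = [
    "Block [458008] 1841 txs 5388 ins    0 wms  215 vms   40 vµs    2 rµs    5 cµs   21 pµs   11 aµs    1 sµs    5 dµs 1.000000",
    "Block [458009] 1081 txs 5103 ins    0 wms  179 vms   35 vµs    2 rµs    4 cµs   17 pµs   11 aµs    1 sµs    4 dµs 0.999074",
    "Block [458010] 1132 txs 3760 ins    0 wms  155 vms   41 vµs    2 rµs    5 cµs   18 pµs   11 aµs    6 sµs    3 dµs 0.960212",
    "Block [458011] 2437 txs 4113 ins    0 wms  187 vms   45 vµs    1 rµs    5 cµs   21 pµs   16 aµs    2 sµs    6 dµs 0.999590",
    "Block [458012] 1932 txs 4211 ins    0 wms  200 vms   47 vµs    2 rµs    6 cµs   20 pµs   15 aµs    4 sµs    5 dµs 0.992232",
    "Block [458013] 2843 txs 4377 ins    0 wms  203 vms   46 vµs    3 rµs    6 cµs   20 pµs   15 aµs    2 sµs    6 dµs 1.000000",
    "Block [458014] 2313 txs 4318 ins    0 wms  200 vms   46 vµs    3 rµs    5 cµs   21 pµs   15 aµs    3 sµs    6 dµs 0.996972",
    "Block [458015] 1635 txs 4257 ins    0 wms  174 vms   41 vµs    2 rµs    4 cµs   20 pµs   13 aµs    1 sµs    7 dµs 0.985924",
    "Block [458016] 2556 txs 4398 ins    0 wms  199 vms   45 vµs    3 rµs    5 cµs   20 pµs   15 aµs    2 sµs    5 dµs 1.000000",
    "Block [458017] 2753 txs 4095 ins    0 wms  198 vms   48 vµs    2 rµs    6 cµs   21 pµs   17 aµs    2 sµs    5 dµs 1.000000",
]

# An integer, whitespace, then a label made only of letters (e.g. "1841 txs").
FIELD = re.compile(r"(\d+)\s+([^\W\d_]+)")
HEIGHT = re.compile(r"\[(\d+)\]")


def parse_line(line):
    """Turn one console line into a dict of column name -> number."""
    row = {"block": int(HEIGHT.search(line).group(1))}
    for value, label in FIELD.findall(line):
        row[label] = int(value)
    # The last whitespace-separated token is the efficiency figure.
    row["efficiency"] = float(line.split()[-1])
    return row


def average(rows):
    """Arithmetic mean of every column except the block height."""
    columns = [name for name in rows[0] if name != "block"]
    return {name: sum(r[name] for r in rows) / len(rows) for name in columns}


ROWS = [parse_line(line) for line in LINES]
AVERAGE = average(ROWS)

for name, value in AVERAGE.items():
    # Efficiency keeps six decimals; the other columns are rounded to integers.
    text = f"{value:.6f}" if name == "efficiency" else f"{round(value)}"
    print(f"{name:<10} {text}")
```

Rounding each mean to the nearest integer (six decimals for efficiency) gives the figures in the average row of the table.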

Summary

The example shows an average of 191 ms (43 µs per input) to validate a typical block, with an additional 22 ms to store it. Storage costs for Bitcoin Server are moderately higher given its additional indexing.
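The per-block figures appear to follow from the per-input averages multiplied by the average input count. The sketch below shows that conversion; the helper name `per_block_ms` is hypothetical, and the result differs slightly from the table because the per-input averages are already rounded.

```python
# Averages taken from the table in the Transformation section.
AVG_INPUTS = 4402   # inputs per block
VALIDATE_US = 43    # vµs: validation microseconds per input
DEPOSIT_US = 5      # dµs: deposit (store) microseconds per input


def per_block_ms(us_per_input, inputs):
    """Convert a per-input cost in microseconds to a per-block cost in milliseconds."""
    return us_per_input * inputs / 1000


validate_ms = per_block_ms(VALIDATE_US, AVG_INPUTS)
store_ms = per_block_ms(DEPOSIT_US, AVG_INPUTS)

print(f"validate: {validate_ms:.1f} ms, store: {store_ms:.1f} ms")
```

The validation result lands near 189 ms rather than the 191 ms in the table, presumably because 43 µs is itself a rounded average; the storage result is about 22 ms, matching the text.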

Data population (pµs) and deposit (dµs) costs are roughly half those of the small server. These are affected by both the number of cores and the amount of RAM.

Big and small server performance is very similar for deserialization (rµs) and block check (cµs). These operations are not parallelized and so are not improved by multiple cores. They are also context-free and so do not benefit from the improved memory-mapped file performance associated with increased RAM.

Notice that blocks 458010, 458012 and 458014 have higher than average script costs (sµs), and more so for the small server. This results from lower efficiency, which causes more scripts to be validated for these blocks than for others. In the small server example the inefficiencies are nearly identical to those of the big server, but the script costs are roughly four times higher for these blocks due to the lower core count.
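The three blocks named above can be picked out of the table mechanically. This is a small sketch using only the big server figures from this page; the variable names are hypothetical.

```python
# (block height, efficiency, script cost in µs per input) copied from the table.
SCRIPT_DATA = [
    (458008, 1.000000, 1),
    (458009, 0.999074, 1),
    (458010, 0.960212, 6),
    (458011, 0.999590, 2),
    (458012, 0.992232, 4),
    (458013, 1.000000, 2),
    (458014, 0.996972, 3),
    (458015, 0.985924, 1),
    (458016, 1.000000, 2),
    (458017, 1.000000, 2),
]

# Unrounded mean script cost across the ten blocks.
mean_script_us = sum(cost for _, _, cost in SCRIPT_DATA) / len(SCRIPT_DATA)

# Blocks whose script cost exceeds that mean.
above_average = [row for row in SCRIPT_DATA if row[2] > mean_script_us]

for height, efficiency, cost in above_average:
    print(f"{height}  efficiency {efficiency:.6f}  script {cost} µs per input")
```

Each block selected this way has an efficiency below 1.000000.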

In the case of block accept (aµs) the small server significantly outperforms the big server. This is the result of excessive parallelism: the cost of fan-out exceeds the savings from parallel execution. These checks do not hit the store and are not computationally costly. There is some benefit to parallelization, but diminishing returns set in well before 64 cores. Similarly, script cost for the small server is zero for 100% efficiency blocks, besting the big server for the same reason.

Configuring a limit of blockchain.cores = 32 can significantly improve performance on systems with more than 32 cores (by preventing excess parallelism). The big server averages 150 ms per block with that additional constraint.

Reducing the configured value of node.minimum_byte_fee_satoshis will increase efficiency in exchange for an increase in disk space. A zero value is most efficient but exposes the node to a cost-free disk exhaustion attack.

Configuring blockchain.use_libconsensus = true generally increases script validation cost. Given the very low level of abstraction in comparison to libbitcoin native consensus checks, the opposite might be expected. However, the libbitcoinconsensus library interface requires that each transaction and previous output be passed in serialized form, which adds serialization work to each script check.

Futures

  • Excess parallelism will be tuned in future releases, eliminating at least 11 µs per input.
  • Introduction of a transaction index will amortize most of the remaining block accept (aµs) and data population (pµs) costs, saving 3 µs and 19 µs per input respectively.
  • Integration of compact blocks will eliminate most of the redundant deserialization (rµs) cost, saving 2 µs per input.
  • The above will shave a total of 35 µs per input from the big server, leaving 8 µs per input.
  • This reduces total block validation for the big server to 35 ms (58 ms including storage), and for the small server to 70 ms (120 ms including storage).
  • With 100% efficiency block validation time will be reduced to 22 ms for the big server and 26 ms for the small.
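The big server projections in the list above can be checked with simple arithmetic. The sketch below uses the rounded averages from the Transformation table; the dictionary labels and variable names are hypothetical, and small differences from the quoted figures are presumably due to rounding of the per-input averages.

```python
# Rounded averages from the Transformation table.
AVG_INPUTS = 4402   # inputs per block
CURRENT_US = 43     # vµs: current validation cost per input
DEPOSIT_US = 5      # dµs: storage cost per input

# Projected per-input savings, as listed under Futures.
SAVINGS_US = {
    "excess parallelism tuning": 11,
    "transaction index, block accept (aµs)": 3,
    "transaction index, data population (pµs)": 19,
    "compact blocks, deserialization (rµs)": 2,
}

total_saved_us = sum(SAVINGS_US.values())
remaining_us = CURRENT_US - total_saved_us

# Convert per-input microseconds to per-block milliseconds.
projected_ms = remaining_us * AVG_INPUTS / 1000
with_storage_ms = projected_ms + DEPOSIT_US * AVG_INPUTS / 1000

print(f"saved {total_saved_us} µs, remaining {remaining_us} µs per input")
print(f"validation {projected_ms:.1f} ms, with storage {with_storage_ms:.1f} ms")
```

This reproduces the 35 µs total saving, the 8 µs remainder and the 35 ms validation estimate. The storage-inclusive figure comes out near 57 ms rather than 58 ms when the rounded 5 µs deposit cost is used.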