Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atlas fixes #2453

Merged
merged 54 commits into from
Mar 23, 2021
Merged

Atlas fixes #2453

merged 54 commits into from
Mar 23, 2021

Conversation

lgalabru
Copy link
Member

Addressing #2428, #2427 and #2425.
I'm still working on some tooling for #2142, which I believe could help revealing unexpected situations.

src/net/atlas/download.rs Outdated Show resolved Hide resolved
@lgalabru
Copy link
Member Author

lgalabru commented Feb 26, 2021

Pretty happy with the result I'm seeing on my local setup, simulating 2000 users pre-ordering + registering names, so a total of 6000 transactions being dispatched on 10 follower nodes and 1 miner, involving attachments replication.
Will keep conducting test, but happy to start getting some reviews.

@lgalabru lgalabru requested a review from kantai February 26, 2021 00:24
@lgalabru lgalabru marked this pull request as ready for review February 26, 2021 00:25
@lgalabru lgalabru self-assigned this Feb 26, 2021
@jcnelson
Copy link
Member

jcnelson commented Mar 8, 2021

Notes for performance tests:

  • Verify that the time taken per attachment request doesn't grow unbound (and is under 100ms)
  • Verify that there are no event ID leaks
  • Prometheus data

@lgalabru
Copy link
Member Author

lgalabru commented Mar 9, 2021

I've spent some time benchmarking performances on my local machine, and with:
1 miner, 9 followers, 2000 attachments (requiring 6000 transactions), I'm getting the following prometheus data from the miner:

# HELP stacks_node_active_miners_total Total number of active miners.
# TYPE stacks_node_active_miners_total gauge
stacks_node_active_miners_total{handler="all"} 1
# HELP stacks_node_btc_blocks_received_total Total number of blocks processed from the burnchain.
# TYPE stacks_node_btc_blocks_received_total counter
stacks_node_btc_blocks_received_total{handler="all"} 1916
# HELP stacks_node_btc_ops_sent_total Total number of ops (key registrations, block commits, user burn supports) submitted to the burnchain.
# TYPE stacks_node_btc_ops_sent_total counter
stacks_node_btc_ops_sent_total{handler="all"} 19
# HELP stacks_node_mempool_outstanding_txs Number of still-unprocessed transactions received by this node since it started
# TYPE stacks_node_mempool_outstanding_txs gauge
stacks_node_mempool_outstanding_txs{handler="all"} 0
# HELP stacks_node_mempool_tx_confirm_times Time (seconds) between when a tx was received by this node's mempool and when a tx was first processed in a block
# TYPE stacks_node_mempool_tx_confirm_times histogram
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="300"} 2083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="600"} 2083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="900"} 3775
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="1200"} 3775
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="1500"} 5462
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="1800"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="2100"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="2400"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="2700"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="3000"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="3600"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="4200"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="4800"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="6000"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="+Inf"} 6083
stacks_node_mempool_tx_confirm_times_sum{handler="all"} 4687816
stacks_node_mempool_tx_confirm_times_count{handler="all"} 6083
# HELP stacks_node_p2p_msg_authenticated_handshake_received_total Total number of authenticated Handshake messages received.
# TYPE stacks_node_p2p_msg_authenticated_handshake_received_total counter
stacks_node_p2p_msg_authenticated_handshake_received_total{handler="all"} 876
# HELP stacks_node_p2p_msg_get_blocks_inv_received_total Total number of GetBlocksInv messages received.
# TYPE stacks_node_p2p_msg_get_blocks_inv_received_total counter
stacks_node_p2p_msg_get_blocks_inv_received_total{handler="all"} 1841
# HELP stacks_node_p2p_msg_get_neighbors_received_total Total number of GetNeighbors messages received.
# TYPE stacks_node_p2p_msg_get_neighbors_received_total counter
stacks_node_p2p_msg_get_neighbors_received_total{handler="all"} 5847
# HELP stacks_node_p2p_msg_nat_punch_request_received_total Total number of NatPunchRequest messages received.
# TYPE stacks_node_p2p_msg_nat_punch_request_received_total counter
stacks_node_p2p_msg_nat_punch_request_received_total{handler="all"} 9
# HELP stacks_node_p2p_msg_unauthenticated_handshake_received_total Total number of authenticated Handshake messages received.
# TYPE stacks_node_p2p_msg_unauthenticated_handshake_received_total counter
stacks_node_p2p_msg_unauthenticated_handshake_received_total{handler="all"} 11
# HELP stacks_node_rpc_call_latencies_histogram Time (seconds) measuring RPC calls latency
# TYPE stacks_node_rpc_call_latencies_histogram histogram
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.005"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.01"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.025"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.05"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.1"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.25"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.5"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="1"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="2.5"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="5"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="10"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="+Inf"} 315
stacks_node_rpc_call_latencies_histogram_sum{path="/v2/attachments/:hash"} 0.019561402999999998
stacks_node_rpc_call_latencies_histogram_count{path="/v2/attachments/:hash"} 315
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.005"} 27
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.01"} 81
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.025"} 109
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.05"} 112
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.1"} 113
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.25"} 113
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.5"} 113
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="1"} 113
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="2.5"} 113
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="5"} 113
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="10"} 113
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="+Inf"} 113
stacks_node_rpc_call_latencies_histogram_sum{path="/v2/attachments/inv"} 1.061132588
stacks_node_rpc_call_latencies_histogram_count{path="/v2/attachments/inv"} 113
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.005"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.01"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.025"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.05"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.1"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.25"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.5"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="1"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="2.5"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="5"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="10"} 68
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="+Inf"} 68
stacks_node_rpc_call_latencies_histogram_sum{path="/v2/blocks/:hash"} 0.0022735649999999995
stacks_node_rpc_call_latencies_histogram_count{path="/v2/blocks/:hash"} 68
# HELP stacks_node_rpc_requests_total Total number of RPC requests made.
# TYPE stacks_node_rpc_requests_total counter
stacks_node_rpc_requests_total{handler="all"} 496
# HELP stacks_node_stx_blocks_mined_total Total number of stacks blocks mined by node.
# TYPE stacks_node_stx_blocks_mined_total counter
stacks_node_stx_blocks_mined_total{handler="all"} 11
# HELP stacks_node_stx_blocks_processed_total Total number of stacks blocks processed.
# TYPE stacks_node_stx_blocks_processed_total counter
stacks_node_stx_blocks_processed_total{handler="all"} 11
# HELP stacks_node_stx_blocks_served_total Total number of Stacks blocks served.
# TYPE stacks_node_stx_blocks_served_total counter
stacks_node_stx_blocks_served_total{handler="all"} 68
# HELP stacks_node_transactions_received_total Total number of transactions received and relayed.
# TYPE stacks_node_transactions_received_total counter
stacks_node_transactions_received_total{handler="all"} 48416

@lgalabru
Copy link
Member Author

lgalabru commented Mar 9, 2021

Same setup and same miner, after killing and restarting the 9 followers a few times:

# HELP stacks_node_active_miners_total Total number of active miners.
# TYPE stacks_node_active_miners_total gauge
stacks_node_active_miners_total{handler="all"} 1
# HELP stacks_node_btc_blocks_received_total Total number of blocks processed from the burnchain.
# TYPE stacks_node_btc_blocks_received_total counter
stacks_node_btc_blocks_received_total{handler="all"} 3314
# HELP stacks_node_btc_ops_sent_total Total number of ops (key registrations, block commits, user burn supports) submitted to the burnchain.
# TYPE stacks_node_btc_ops_sent_total counter
stacks_node_btc_ops_sent_total{handler="all"} 49
# HELP stacks_node_mempool_outstanding_txs Number of still-unprocessed transactions received by this node since it started
# TYPE stacks_node_mempool_outstanding_txs gauge
stacks_node_mempool_outstanding_txs{handler="all"} 0
# HELP stacks_node_mempool_tx_confirm_times Time (seconds) between when a tx was received by this node's mempool and when a tx was first processed in a block
# TYPE stacks_node_mempool_tx_confirm_times histogram
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="300"} 2083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="600"} 2083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="900"} 3775
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="1200"} 3775
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="1500"} 5462
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="1800"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="2100"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="2400"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="2700"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="3000"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="3600"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="4200"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="4800"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="6000"} 6083
stacks_node_mempool_tx_confirm_times_bucket{handler="all",le="+Inf"} 6083
stacks_node_mempool_tx_confirm_times_sum{handler="all"} 4687816
stacks_node_mempool_tx_confirm_times_count{handler="all"} 6083
# HELP stacks_node_p2p_msg_authenticated_handshake_received_total Total number of authenticated Handshake messages received.
# TYPE stacks_node_p2p_msg_authenticated_handshake_received_total counter
stacks_node_p2p_msg_authenticated_handshake_received_total{handler="all"} 3254
# HELP stacks_node_p2p_msg_get_blocks_inv_received_total Total number of GetBlocksInv messages received.
# TYPE stacks_node_p2p_msg_get_blocks_inv_received_total counter
stacks_node_p2p_msg_get_blocks_inv_received_total{handler="all"} 13299
# HELP stacks_node_p2p_msg_get_neighbors_received_total Total number of GetNeighbors messages received.
# TYPE stacks_node_p2p_msg_get_neighbors_received_total counter
stacks_node_p2p_msg_get_neighbors_received_total{handler="all"} 25494
# HELP stacks_node_p2p_msg_nat_punch_request_received_total Total number of NatPunchRequest messages received.
# TYPE stacks_node_p2p_msg_nat_punch_request_received_total counter
stacks_node_p2p_msg_nat_punch_request_received_total{handler="all"} 70
# HELP stacks_node_p2p_msg_unauthenticated_handshake_received_total Total number of authenticated Handshake messages received.
# TYPE stacks_node_p2p_msg_unauthenticated_handshake_received_total counter
stacks_node_p2p_msg_unauthenticated_handshake_received_total{handler="all"} 30
# HELP stacks_node_rpc_call_latencies_histogram Time (seconds) measuring RPC calls latency
# TYPE stacks_node_rpc_call_latencies_histogram histogram
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.005"} 37995
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.01"} 38023
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.025"} 38035
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.05"} 38041
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.1"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.25"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.5"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="1"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="2.5"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="5"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="10"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="+Inf"} 38042
stacks_node_rpc_call_latencies_histogram_sum{path="/v2/attachments/:hash"} 2.92804312500001
stacks_node_rpc_call_latencies_histogram_count{path="/v2/attachments/:hash"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.005"} 46
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.01"} 154
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.025"} 212
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.05"} 216
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.1"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.25"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.5"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="1"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="2.5"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="5"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="10"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="+Inf"} 219
stacks_node_rpc_call_latencies_histogram_sum{path="/v2/attachments/inv"} 2.057386508
stacks_node_rpc_call_latencies_histogram_count{path="/v2/attachments/inv"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.005"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.01"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.025"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.05"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.1"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.25"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="0.5"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="1"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="2.5"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="5"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="10"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/blocks/:hash",le="+Inf"} 754
stacks_node_rpc_call_latencies_histogram_sum{path="/v2/blocks/:hash"} 0.02572605699999997
stacks_node_rpc_call_latencies_histogram_count{path="/v2/blocks/:hash"} 754
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="0.005"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="0.01"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="0.025"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="0.05"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="0.1"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="0.25"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="0.5"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="1"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="2.5"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="5"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="10"} 19
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/microblocks/confirmed/:hash",le="+Inf"} 19
stacks_node_rpc_call_latencies_histogram_sum{path="/v2/microblocks/confirmed/:hash"} 0.0033207849999999997
stacks_node_rpc_call_latencies_histogram_count{path="/v2/microblocks/confirmed/:hash"} 19
# HELP stacks_node_rpc_requests_total Total number of RPC requests made.
# TYPE stacks_node_rpc_requests_total counter
stacks_node_rpc_requests_total{handler="all"} 39034
# HELP stacks_node_stx_blocks_mined_total Total number of stacks blocks mined by node.
# TYPE stacks_node_stx_blocks_mined_total counter
stacks_node_stx_blocks_mined_total{handler="all"} 41
# HELP stacks_node_stx_blocks_processed_total Total number of stacks blocks processed.
# TYPE stacks_node_stx_blocks_processed_total counter
stacks_node_stx_blocks_processed_total{handler="all"} 41
# HELP stacks_node_stx_blocks_served_total Total number of Stacks blocks served.
# TYPE stacks_node_stx_blocks_served_total counter
stacks_node_stx_blocks_served_total{handler="all"} 754
# HELP stacks_node_stx_confirmed_micro_blocks_served_total Total number of Stacks blocks served.
# TYPE stacks_node_stx_confirmed_micro_blocks_served_total counter
stacks_node_stx_confirmed_micro_blocks_served_total{handler="all"} 19
# HELP stacks_node_transactions_received_total Total number of transactions received and relayed.
# TYPE stacks_node_transactions_received_total counter
stacks_node_transactions_received_total{handler="all"} 48416

So zooming on the Atlas histograms:

stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.005"} 37995
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.01"} 38023
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.025"} 38035
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.05"} 38041
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.1"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.25"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="0.5"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="1"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="2.5"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="5"} 38042
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/:hash",le="10"} 38042

and

stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.005"} 46
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.01"} 154
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.025"} 212
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.05"} 216
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.1"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.25"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="0.5"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="1"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="2.5"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="5"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="10"} 219
stacks_node_rpc_call_latencies_histogram_bucket{path="/v2/attachments/inv",le="+Inf"} 219

We're meeting the acceptance criteria (no response time > 100ms):

  • /v2/attachments/inv: 219 requests, 212 handled in less than 0.025s
  • /v2/attachments/:hash: 38042 requests, 38041 handled in less than 0.05s

@jcnelson
Copy link
Member

jcnelson commented Mar 9, 2021

Thanks for gathering the data! How big were these inventories in these tests? Were they maximally sized -- i.e. is this data the worst-case possible performance?

@jcnelson
Copy link
Member

Got a panic while testing this. I stopped my testnet node and restarted it, and it immediately crashed with this trace:

Process abort due to thread panic
   0: stacks_node::main::{{closure}}
             at testnet/stacks-node/src/main.rs:57
   1: std::panicking::rust_panic_with_hook
   2: std::panicking::begin_panic_handler::{{closure}}
   3: std::sys_common::backtrace::__rust_end_short_backtrace
   4: rust_begin_unwind
   5: std::panicking::begin_panic_fmt
   6: blockstack_lib::net::atlas::download::AttachmentsBatch::track_attachment
   7: blockstack_lib::net::atlas::download::AttachmentsDownloader::enqueue_new_attachments
             at src/net/atlas/download.rs:220
   8: blockstack_lib::net::atlas::download::AttachmentsDownloader::run
             at src/net/atlas/download.rs:61
   9: blockstack_lib::net::p2p::PeerNetwork::do_attachment_downloads::{{closure}}
             at src/net/p2p.rs:3099
      blockstack_lib::net::p2p::PeerNetwork::with_attachments_downloader
             at src/net/p2p.rs:485
      blockstack_lib::net::p2p::PeerNetwork::do_attachment_downloads
             at src/net/p2p.rs:3095
      blockstack_lib::net::p2p::PeerNetwork::dispatch_network
             at src/net/p2p.rs:4111
  10: blockstack_lib::net::p2p::PeerNetwork::run
             at src/net/p2p.rs:4314
  11: stacks_node::neon_node::spawn_peer::{{closure}}
             at testnet/stacks-node/src/neon_node.rs:640
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /home/buildozer/aports/community/rust/src/rustc-1.47.0-src/library/std/src/sys_common/backtrace.rs:137
  12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
             at /home/buildozer/aports/community/rust/src/rustc-1.47.0-src/library/std/src/thread/mod.rs:458
      <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /home/buildozer/aports/community/rust/src/rustc-1.47.0-src/library/std/src/panic.rs:308
      std::panicking::try::do_call
             at /home/buildozer/aports/community/rust/src/rustc-1.47.0-src/library/std/src/panicking.rs:373
      std::panicking::try
             at /home/buildozer/aports/community/rust/src/rustc-1.47.0-src/library/std/src/panicking.rs:337
      std::panic::catch_unwind
             at /home/buildozer/aports/community/rust/src/rustc-1.47.0-src/library/std/src/panic.rs:379
      std::thread::Builder::spawn_unchecked::{{closure}}
             at /home/buildozer/aports/community/rust/src/rustc-1.47.0-src/library/std/src/thread/mod.rs:457
      core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /home/buildozer/aports/community/rust/src/rustc-1.47.0-src/library/core/src/ops/function.rs:227
  13: std::sys::unix::thread::Thread::new::thread_start
  14: <unknown>

Dumping core for pid 14949

I did get this warning from Atlas just before the crash: WARN [1615452411.056211] [src/net/p2p.rs:4293] [p2p] Atlas: updating attachment inventory failed Not connected to peer network

@lgalabru
Copy link
Member Author

@jcnelson thanks for reporting, I think I see what's going on, this is with a working_dir set?

@jcnelson
Copy link
Member

Yes, there's a working_dir set.

@lgalabru
Copy link
Member Author

@jcnelson bug was fixed, the latest branch is doing a better job at dealing with graceful working dir recovery.

@jcnelson
Copy link
Member

jcnelson commented Mar 13, 2021

Found another potential problem -- Atlas attachment instances are inserted multiple times across node re-starts:

$ sqlite3 -header /tmp/stacks-chainstate-testnet/xenon/atlas.sqlite 'select attachment_index,block_height,tx_id from attachment_instances order by block_height'
attachment_index|block_height|tx_id
0|194|b96391199ec41457cb5ba8850d001e3ccc6e6ff43a8ab8b8a3d43ac872068916
0|194|b96391199ec41457cb5ba8850d001e3ccc6e6ff43a8ab8b8a3d43ac872068916
0|194|b96391199ec41457cb5ba8850d001e3ccc6e6ff43a8ab8b8a3d43ac872068916
0|194|b96391199ec41457cb5ba8850d001e3ccc6e6ff43a8ab8b8a3d43ac872068916
1|340|7ef022bb31ea4479dd315e5d3771854bc2e9748d050a0cafb69638c4b35637d9
1|340|7ef022bb31ea4479dd315e5d3771854bc2e9748d050a0cafb69638c4b35637d9
1|340|7ef022bb31ea4479dd315e5d3771854bc2e9748d050a0cafb69638c4b35637d9
2|514|d8ec73abc3031cb188783e265bb999405ceaa2f70e0140f99d2220423c209940
2|514|d8ec73abc3031cb188783e265bb999405ceaa2f70e0140f99d2220423c209940
2|514|d8ec73abc3031cb188783e265bb999405ceaa2f70e0140f99d2220423c209940
3|613|28ad691caf48ef9676ff5c4324b8c342c9a15ac62428b26ad4b2cb410b61cfba
3|613|28ad691caf48ef9676ff5c4324b8c342c9a15ac62428b26ad4b2cb410b61cfba
3|613|28ad691caf48ef9676ff5c4324b8c342c9a15ac62428b26ad4b2cb410b61cfba
4|2121|68f35741fbe3913618defba628d1f10fe2f1af3fa4d3b9f0c58787a5a1dc8b5c
4|2121|5228bc04a6530ed57d209bba0dbe369e47bd49be8165f90c33bafa6f8575977c
4|2121|68f35741fbe3913618defba628d1f10fe2f1af3fa4d3b9f0c58787a5a1dc8b5c
4|2121|5228bc04a6530ed57d209bba0dbe369e47bd49be8165f90c33bafa6f8575977c
4|2121|68f35741fbe3913618defba628d1f10fe2f1af3fa4d3b9f0c58787a5a1dc8b5c
4|2121|5228bc04a6530ed57d209bba0dbe369e47bd49be8165f90c33bafa6f8575977c

Is this supposed to happen?

Also, I'm running a testnet follower on Xenon, and it's not getting any attachments at all. Does the Xenon node have attachments to serve, or should I go and make some?

@jcnelson jcnelson self-requested a review March 23, 2021 17:29
Copy link
Member

@jcnelson jcnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Once this deploys to Xenon, let's register a bunch of names with attachments (e.g. on the order of 1,000) and verify that the network remains stable.

# Conflicts:
#	src/monitoring/mod.rs
#	src/monitoring/prometheus.rs
@lgalabru
Copy link
Member Author

@jcnelson Sounds good, will do.

@lgalabru lgalabru merged commit 75b57ec into develop Mar 23, 2021
@lgalabru lgalabru deleted the fix/atlas-fixes branch March 23, 2021 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants