
SIMD-0132: Dynamic Block Limits #132

Open · wants to merge 12 commits into base: main

Conversation


@cavemanloverboy cavemanloverboy commented Mar 25, 2024

SIMD Proposal: Dynamic Block Limits

Summary

This proposal introduces dynamic adjustments to the compute unit (CU) limit of Solana blocks based on network utilization, evaluated at the end of each epoch. If the average block utilization exceeds 75%, the CU limit increases by 20%; if it falls below 25%, the limit decreases by 20%. A second metric based on vote slot latency is used to preserve protocol liveness and a responsive UX. The goal is to optimize network performance by adapting to demonstrated compute capacity and demand, without centralized decisions about limits and without voting. Although the adjustment rate is arbitrary (and can be discussed in this PR), the block limit will be determined by the demonstrated capacity of the network.
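For concreteness, a minimal sketch of the end-of-epoch rule described above. The thresholds and step mirror the 75% / 25% / 20% figures in the summary; the function and constant names are illustrative, and the vote-latency guard is omitted.

```rust
// Illustrative constants; the SIMD treats the step size as open for discussion.
const UTILIZATION_INCREASE_THRESHOLD: f64 = 0.75;
const UTILIZATION_DECREASE_THRESHOLD: f64 = 0.25;
const ADJUSTMENT_RATE: f64 = 0.20;

/// Next epoch's block CU limit, given this epoch's average block utilization
/// (CUs consumed per block divided by the current limit, averaged over the epoch).
fn next_block_cu_limit(current_limit: u64, avg_utilization: f64) -> u64 {
    if avg_utilization > UTILIZATION_INCREASE_THRESHOLD {
        (current_limit as f64 * (1.0 + ADJUSTMENT_RATE)) as u64
    } else if avg_utilization < UTILIZATION_DECREASE_THRESHOLD {
        (current_limit as f64 * (1.0 - ADJUSTMENT_RATE)) as u64
    } else {
        current_limit
    }
}
```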

@blasrodri

Shouldn't this also have a mechanism to ensure that block producers are able to keep the pace during spike times?

@cavemanloverboy
Author

> Shouldn't this also have a mechanism to ensure that block producers are able to keep the pace during spike times?

CU limits will only move up if the vast majority of the network is nearly filling blocks, and move down if the vast majority of the network is struggling to fill blocks. Outliers will either be throttled or pruned.

@Tamgros

Tamgros commented Mar 25, 2024

There are other considerations:

  • Delinquency: the protocol should only increase the limit if the delinquency rate is below x%. I.e., if validators are already struggling, we don't want to make it even harder to validate.
  • If this update can happen every epoch, the CU changes can be a much smaller % and still produce pretty significant movement in a short period.

I think it'd also be worth thinking about incentives long term. Validators want more tx fees, but they also face hardware switching costs. This is why I think an incremental approach is better: it allows validators to plan and see how their current setups are or aren't performing.
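A minimal sketch of the variant suggested above: gate increases on a healthy delinquency rate and use a smaller per-epoch step. The "x%" threshold and the 5% step are placeholder values, not numbers from the thread.

```rust
// Placeholder values; "x%" in the comment above is deliberately unspecified.
const MAX_DELINQUENT_STAKE_FRACTION: f64 = 0.05;
const SMALL_ADJUSTMENT_RATE: f64 = 0.05; // smaller than the 20% in the draft

fn next_limit_with_delinquency_gate(
    current_limit: u64,
    avg_utilization: f64,
    delinquent_stake_fraction: f64,
) -> u64 {
    if avg_utilization > 0.75 && delinquent_stake_fraction < MAX_DELINQUENT_STAKE_FRACTION {
        // Only raise the limit when the cluster is healthy.
        (current_limit as f64 * (1.0 + SMALL_ADJUSTMENT_RATE)) as u64
    } else if avg_utilization < 0.25 {
        // Decreases are not gated, so a struggling cluster can still back off.
        (current_limit as f64 * (1.0 - SMALL_ADJUSTMENT_RATE)) as u64
    } else {
        current_limit
    }
}
```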

@cavemanloverboy
Author

> There are other considerations:
>
>   • Delinquency: the protocol should only increase the limit if the delinquency rate is below x%. I.e., if validators are already struggling, we don't want to make it even harder to validate.
>   • If this update can happen every epoch, the CU changes can be a much smaller % and still produce pretty significant movement in a short period.
>
> I think it'd also be worth thinking about incentives long term. Validators want more tx fees, but they also face hardware switching costs. This is why I think an incremental approach is better: it allows validators to plan and see how their current setups are or aren't performing.

  • Love the delinquency check. Gets at the previous concern about potatoes.
  • 20% was arbitrary and chosen as a conversation starter. Can definitely go for something smaller.

@7layermagik

7layermagik commented Mar 25, 2024

> There are other considerations:
>
>   • Delinquency: the protocol should only increase the limit if the delinquency rate is below x%. I.e., if validators are already struggling, we don't want to make it even harder to validate.
>   • If this update can happen every epoch, the CU changes can be a much smaller % and still produce pretty significant movement in a short period.
>
> I think it'd also be worth thinking about incentives long term. Validators want more tx fees, but they also face hardware switching costs. This is why I think an incremental approach is better: it allows validators to plan and see how their current setups are or aren't performing.

Using vote latency might be better than delinquency? Hopefully it doesn't get to the point where you have high delinquency... if you have a target vote latency range, that measure will always be pretty directly tied to UX and confirmation-time consistency. You could take the median vote latency of the top 80% of validators, for example, and just make sure it falls within a certain range. You could even use vote latency instead of CUs as the way to know when to increase or decrease block size. With timely vote credits, validators will already be searching for ways to improve their APY by improving vote latency.

Skip rate is also something worth considering.
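A rough sketch of the vote-latency gate floated above, assuming "top 80% of validators" means dropping the worst 20% by latency and taking the median of the rest; the target band and helper names are placeholders.

```rust
/// Median vote latency (in slots) of the best 80% of validators by latency.
fn median_latency_of_best_80_pct(mut latencies_in_slots: Vec<f64>) -> Option<f64> {
    if latencies_in_slots.is_empty() {
        return None;
    }
    latencies_in_slots.sort_by(|a, b| a.total_cmp(b));
    let kept = ((latencies_in_slots.len() * 8) / 10).max(1); // drop the worst 20%
    Some(latencies_in_slots[kept / 2]) // median of the retained set
}

/// True if the retained median falls inside the target range [min, max].
fn latency_within_target(latencies_in_slots: Vec<f64>, min: f64, max: f64) -> bool {
    match median_latency_of_best_80_pct(latencies_in_slots) {
        Some(median) => median >= min && median <= max,
        None => false,
    }
}
```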

@cavemanloverboy
Author

Things I was hoping to discuss that need to be specified more precisely:

  • Upper and lower bounds. The initial 48M CU limit seems like a natural suggestion for the lower bound. Setting the upper bound too low renders this SIMD useless, because a centralized/arbitrary decision would often need to be made to raise the upper bound. Perhaps the upper bound can instead be a maximum increase over some number of epochs, i.e. the limit cannot be raised more than 2x within 10 epochs ≈ O(1 month) (see the sketch after this list).
  • Whether per-account CU limits or other block parameters are to be included in this SIMD, or whether it is best to leave them for a future SIMD. I am in favor of the latter.
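On the first bullet, a sketch of one way the bounds could be expressed: a 48M CU floor plus a cap on how fast the limit may grow, here no more than 2x over any 10-epoch window. The window length and factor come from the bullet above; the names and data shape are invented for illustration.

```rust
const LOWER_BOUND_CU: u64 = 48_000_000;
const GROWTH_WINDOW_EPOCHS: usize = 10;
const MAX_GROWTH_FACTOR: u64 = 2;

/// `recent_limits` holds the limits of recent epochs, oldest first;
/// `proposed` is the output of the utilization rule.
fn clamp_limit(recent_limits: &[u64], proposed: u64) -> u64 {
    // The smallest limit seen in the last window bounds how far we may have grown.
    let window_floor = recent_limits
        .iter()
        .rev()
        .take(GROWTH_WINDOW_EPOCHS)
        .copied()
        .min()
        .unwrap_or(LOWER_BOUND_CU);
    let growth_cap = window_floor
        .saturating_mul(MAX_GROWTH_FACTOR)
        .max(LOWER_BOUND_CU);
    proposed.clamp(LOWER_BOUND_CU, growth_cap)
}
```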

@bji
Contributor

bji commented Mar 25, 2024

> Shouldn't this also have a mechanism to ensure that block producers are able to keep the pace during spike times?

> CU limits will only move up if the vast majority of the network is nearly filling blocks, and move down if the vast majority of the network is struggling to fill blocks. Outliers will either be throttled or pruned.

Why does it make sense to adjust CU based on utilization? Shouldn't it be adjusted based on capacity? CU should always be set to the greatest value the network supports, not whatever happens to be in use.

@cavemanloverboy
Author

cavemanloverboy commented Mar 25, 2024

> Shouldn't this also have a mechanism to ensure that block producers are able to keep the pace during spike times?

> CU limits will only move up if the vast majority of the network is nearly filling blocks, and move down if the vast majority of the network is struggling to fill blocks. Outliers will either be throttled or pruned.

> Why does it make sense to adjust CU based on utilization? Shouldn't it be adjusted based on capacity? CU should always be set to the greatest value the network supports, not whatever happens to be in use.

Utilization is both demonstrated capacity and demand. Increasing CU limits far beyond demonstrated capacity adds risk to the system because it opens a vector for validators to create fat blocks that the rest of the network may struggle to replay and which the network has not yet demonstrated it is capable of handling.

@ripatel-fd
Contributor

This proposal needs a lot more research. Do you have evidence that increasing limits without validator interaction won't just introduce irrecoverable instability and crash the network?

@cavemanloverboy
Author

cavemanloverboy commented Apr 9, 2024

> This proposal needs a lot more research. Do you have evidence that increasing limits without validator interaction won't just introduce irrecoverable instability and crash the network?

What form of evidence would you like to see?

The mechanism is self-correcting: if the network demonstrates that it cannot keep up with a higher CU limit while preserving low latency for the entire supermajority, the CU limit decreases. If the blocks (whose schedule is sampled by stake weight) are full, replayed, and voted on, and the supermajority of the network is highly responsive, why would the network go down?

@bw-solana

A couple of things:

  1. I think using the median (or some percentile, like the OC %) would be better/safer than using the average. E.g. we move up the CU limit when 67% of the blocks were packed >= 80% (a sketch follows this list). I'm thinking of some diabolical case where half the cluster stake is super nodes and half is potatoes. The super nodes are packing 100% and the potatoes are struggling just to pack 60%. We average to 80% and move up the limits again. The potatoes die even more, skipped slots go wild, machines go delinquent, we don't have enough stake to confirm anything. Much RIP. If we use the OC %ile, that should guarantee OC% of the stake is able to keep up (+/- some small std deviation).
  2. We'll need some way of computing the average/median CU cost per block for nodes that come online in the middle of an epoch. This probably means adding a small amount of metadata to the snapshot.
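A sketch of the stake-weighted trigger from item 1: raise the limit only if at least 67% of the stake produced blocks packed to at least 80%. Both numbers are the examples from the comment, and the data shape is an assumption.

```rust
const TRIGGER_STAKE_FRACTION: f64 = 0.67; // roughly an optimistic-confirmation quorum
const TRIGGER_FILL: f64 = 0.80;

/// Each entry is (leader's stake weight, average fill of that leader's blocks this epoch).
fn should_raise_limit(per_leader_fill: &[(u64, f64)]) -> bool {
    let total_stake: u64 = per_leader_fill.iter().map(|(stake, _)| *stake).sum();
    if total_stake == 0 {
        return false;
    }
    let packed_stake: u64 = per_leader_fill
        .iter()
        .filter(|(_, fill)| *fill >= TRIGGER_FILL)
        .map(|(stake, _)| *stake)
        .sum();
    packed_stake as f64 / total_stake as f64 >= TRIGGER_STAKE_FRACTION
}
```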

@bw-solana

For testing, I bet we could start with just a local cluster running with shortened epochs (maybe 5 minutes) to prove out the idea. Set the starting CU limit to something like 1M and see what terminal value is hit when spamming bench-tps. Then kill bench-tps and see what the CU limit falls to.
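A toy offline analogue of that experiment (not the proposed test harness): start at a low limit and iterate the 75% / 25% / 20% rule under constant demand to see where it settles, then do the same with the demand removed.

```rust
fn simulate(start_limit: u64, demand_cus: u64, epochs: usize) -> u64 {
    let mut limit = start_limit;
    for _ in 0..epochs {
        // Blocks fill with whatever demand exists, capped by the current limit.
        let utilization = demand_cus.min(limit) as f64 / limit as f64;
        limit = if utilization > 0.75 {
            (limit as f64 * 1.2) as u64
        } else if utilization < 0.25 {
            (limit as f64 * 0.8) as u64
        } else {
            limit
        };
    }
    limit
}

fn main() {
    // Under saturating spam the limit grows until demand no longer fills 75% of it.
    println!("with spam:    {}", simulate(1_000_000, 48_000_000, 50));
    // With the spam killed it keeps shrinking, which is why a lower bound matters.
    println!("without spam: {}", simulate(48_000_000, 0, 50));
}
```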

@CriesofCarrots changed the title from SIMD-0130: Dynamic Block Limits to SIMD-0132: Dynamic Block Limits on May 21, 2024
@0xSol

0xSol commented May 21, 2024

@cavemanloverboy to share testing outcomes with metrics once ready.
