Potential array-out-of-bound runtime error in get_attesting_indices #2500

booleanfunction · 2021-06-30T00:50:04Z

What the problem is and where it is located

The following relates to eth2.0-specs/specs/phase0/beacon-chain.md.

It appears that the committee created in get_attesting_indices can exceed length of MAX_VALIDATORS_PER_COMMITTEE and hence this will cause an array-out-of-bound runtime error when accessing bits to return the set of attesting indices. The problem occurs when the number of active validators exceeds 2**22 = 4,194,304.

I would note that strictly speaking, the SSZ definition of a bitlist means that |bits| <= MAX_VALIDATORS_PER_COMMITTEE and so a more accurate requirement for get_attesting_indices is that:

|committee| <= |bits| <= MAX_VALIDATORS_PER_COMMITTEE (i.e.2048)

however, the purpose of this analysis is to understand when
|committee| > MAX_VALIDATORS_PER_COMMITTEE.

How was the problem discovered

The problem was discovered while using Dafny to formally verify the Beacon Chain Phase 0 spec (eth2.0-dafny).

As I think the issue can be difficult to detect, particularly with the use of floor division and the dependency of parameter calculations between functions, I thought it would be useful to step through the calculations involved, which I do below.

Results of the analysis

The key results are that:

when 32 <= number of active validators <= 4,194,304, the length of committee will range between 1 and MAX_VALIDATORS_PER_COMMITTEE (inclusive) for all data.slot and data.index combinations
when 4,194,304 < number of active validators < 4,196,352, committee will have length greater than MAX_VALIDATORS_PER_COMMITTEE for at least one data.slot and data.index combination
when 4,196,352 <= number of active validators, committee will have length greater than MAX_VALIDATORS_PER_COMMITTEE for all data.slot and data.index combinations

And for completeness, please note that:

when number of active validators < 32, committee will have length 0 for some data.slot and data.index combinations

Possible solution

If in get_attesting_indiceswe assume that |bits| = MAX_VALIDATORS_PER_COMMITTEE then a basic solution is to modify compute_committee such that:

end = min(start + MAX_VALIDATORS_PER_COMMITTEE, (len(indices) * (index+1)) // count).

Detailed analysis

Context

get_attesting_indices is used as part of the rewards and penalties, justification and finalisation, as well as process_attestion.

For context I thought it was useful to show the function call paths.

Tracing the problem

To determine how the length of committee is calculated we must look at get_beacon_committee.

The output of get_beacon_committee is then determined by the call to compute_committee.

If we start with the return statement in compute_committee, the returned sequence will have a length determined by range(start, end). Specifically, range(start, end) will yield a sequence of length end-start and hence:

|compute_committee(indices, seed, index, count)| 
= end-start
= (len(indices) * (index+1)) // count - (len(indices) * index) // count

When compute_committee is called from get_beacon_committee we see that:

indices = get_active_validator_indices
seed = get_seed
index = (slot % SLOTS_PER_EPOCH) * committees_per_slot + index
count = committees_per_slot * SLOTS_PER_EPOCH

(Note that the index parameter in get_beacon_committee is different to the index parameter in compute_committee).

Making these substitutions, at the get_beacon_committee level we determine end and start as follows:

end = (len(get_active_validator_indices) * (((slot % SLOTS_PER_EPOCH) * committees_per_slot + index)+1)) // (committees_per_slot * SLOTS_PER_EPOCH)
start = (len(get_active_validator_indices) * ((slot % SLOTS_PER_EPOCH) * committees_per_slot + index)) // (committees_per_slot * SLOTS_PER_EPOCH)

This is also a good time to mention that committees_per_slot is determined by a call to get_committees_per_slot and is calculated as follows:

committees_per_slot = max(1, min(MAX_COMMITTEES_PER_SLOT, len(get_active_validator_indices) // SLOTS_PER_EPOCH // TARGET_COMMITTEE_SIZE))

committees_per_slot is therefore a function of the number of active validators and 0 < committees_per_slot <= 64.

We could now substitute the formula for committees_per_slot into end and start but for the purpose of this analysis we can conclude that end - start is a function of:

len(get_active_validator_indices),
slot % SLOTS_PER_EPOCH, and
index (i.e. the committee index parameter of get_beacon_committee).

It is also useful to note the following bounds on these values:

0 <= slot % SLOTS_PER_EPOCH < 32
0 <= index < committees_per_slot

Putting all of this together, when get_beacon_committee(state, data.slot, data.index) is called from get_attesting_indices for a state of a particular number of active validators, it can be done so for multiple (data.slot,data.index) combinations according to the above (slot % SLOTS_PER_EPOCH,index) bounds.

Further in relation to the index bounds, as committees_per_slot depends on the number of activate validators, the range of index also depends on the number of active validators. Once the number of active validators is sufficiently large, there will be a maximum of 32 * 64 = 2048 of these (slot % SLOTS_PER_EPOCH,index) combinations to consider for a state with a particular number of active validators.

To determine for which (len(get_active_validator_indices), slot % SLOTS_PER_EPOCH,index) combinations we find that end - start > MAX_VALIDATORS_PER_COMMITTEE we need to analyse the above formulas for end and start. Of course, the use of floor division (i.e. //) makes this more difficult than regular division and hence here I will just show the overall results but I will include several specific examples where a bound breach occurs (i.e. where end - start > MAX_VALIDATORS_PER_COMMITTEE).

I thought some visualisations might assist.

Let's firstly consider how many committees_per_slot there will be for 0 < len(get_active_validator_indices) < 4,250,000.

We see that once there are 2**18=262,144 or more active validators, committees_per_slot will reach its upper bound of 64.

The value of committees_per_slot consequently determines the number of (slot % SLOTS_PER_EPOCH,index) combinations to consider for a state with a particular number of active validators.

In the next graph we plot the number of (slot % SLOTS_PER_EPOCH,index) combinations, as well as how many of these input combinations will produce end - start > MAX_VALIDATORS_PER_COMMITTEE and hence ultimately how many |committee| > MAX_VALIDATORS_PER_COMMITTEE breaches occur at a given number of active validators.

We observe that the number of (slot % SLOTS_PER_EPOCH,index) combinations reaches its maximum of 32 * 64 = 2048 at 2**18=262144 active validators (i.e. as is expected given the Figure 1 in which committees_per_slot reaches its maximum at that number of active validators).

We can also observe that the number of these combinations where |committee| = end - start > MAX_VALIDATORS_PER_COMMITTEE is zero until the number of active validators exceeds 2**22 = 4,194,304.

If we focus on this particular area of the graph, we see:

Hence, once the number of active validators exceeds 2**22 = 4,194,304 we can observe that there are (slot % SLOTS_PER_EPOCH,index) combinations such that |committee| > MAX_VALIDATORS_PER_COMMITTEE. Further, once the number of active validators reaches 2**22 + 2**11 = 4,196,352 we can observe that all (slot % SLOTS_PER_EPOCH,index) combinations will result in |committee| > MAX_VALIDATORS_PER_COMMITTEE.

Examples

We can now look at a couple of examples to see which (slot % SLOTS_PER_EPOCH,index) combinations cause the problem for a set number of active validators.

For example, when the number of active validators is 4,194,305 we can see that (slot % SLOTS_PER_EPOCH=31,index=63) will be the only committee such that |committee| > MAX_VALIDATORS_PER_COMMITTEE. i.e. as indicated by the red block.

Where as when the number of active validators is for example 4,194,355, we can see that numerous (slot % SLOTS_PER_EPOCH,index) combinations result in |committee| > MAX_VALIDATORS_PER_COMMITTEE. i.e. as indicated by the red blocks.

I was able to prove the generalised results using formal verification in Dafny (eth2.0-dafny) but to create the visualisations shown here I simply created a Jupyter notebook so as to explore the equations numerically.

The text was updated successfully, but these errors were encountered:

djrtwo · 2021-07-06T16:07:23Z

This is a known bound chosen and based on the instantiation of the beacon chain on mainnet Ethereum, assuming a maximum suppy of ~130M ETH and thus a maximum number of active validators.

This is noted briefly in the annotated spec under MAX_VALIDATORS_PER_COMMITTEE.

Two quick comments:

if there is a simple way to protect against this edge case, we can soft for it it without any issue. That said, any even moderate complexity on this edge is likely not worth it today given the bounds of mainnet ETH
The most dangerous thing is likely if these values get naively parroted to another network instantiation that has different minimum ETH requirements or a different ETH supply. OR if ethereum beacon chain mainnet was later upgraded to do a validator split to require much less ETH per validator which could unexpectedly hit this threshold. At the very least this should be well documented somewhere.

[side note: there are many configuration relationships and details that are not well documented and would be weird foot-guns if trying to change configs for other networks. there is likely a whole project here on documenting these relationships]

hwwhww added the scope:security General protocol security-related items label Jul 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential array-out-of-bound runtime error in get_attesting_indices #2500

Potential array-out-of-bound runtime error in get_attesting_indices #2500

booleanfunction commented Jun 30, 2021

djrtwo commented Jul 6, 2021

Navigation Menu

Potential array-out-of-bound runtime error in get_attesting_indices #2500

Potential array-out-of-bound runtime error in get_attesting_indices #2500

Comments

booleanfunction commented Jun 30, 2021

What the problem is and where it is located

How was the problem discovered

Results of the analysis

Possible solution

Detailed analysis

Context

Tracing the problem

Examples

djrtwo commented Jul 6, 2021