New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffling tests #1

Open
mratsim opened this Issue Jul 23, 2018 · 3 comments

Comments

Projects
None yet
1 participant
@mratsim
Member

mratsim commented Jul 23, 2018

The cutoffs function description is the followinf:

An algorithm that we use to split up validators into groups at the start of every epoch, determining at what height they can make attestations and what shard they are making crosslinks for.

The schema:

image

A full Python implementation:

SHARD_COUNT = 1024
ETH_SUPPLY_CAP = 2**27
DEPOSIT_SIZE = 32
MAX_VALIDATOR_COUNT = ETH_SUPPLY_CAP // DEPOSIT_SIZE
EPOCH_LENGTH = 64
SLOT_DURATION = 8
MIN_COMMITTEE_SIZE = 128
END_EPOCH_GRACE_PERIOD = 8

def get_cutoffs(validator_count):
    height_cutoffs = [0]
    # EPOCH_LENGTH // phi
    cofactor = 39
    STANDARD_COMMITTEE_SIZE = MAX_VALIDATOR_COUNT // SHARD_COUNT
    # If there are not enough validators to fill a minimally
    # sized committee at every height, skip some heights
    if validator_count < EPOCH_LENGTH * MIN_COMMITTEE_SIZE:
        height_count = validator_count // MIN_COMMITTEE_SIZE or 1
        heights = [(i * cofactor) % EPOCH_LENGTH
                   for i in range(height_count)]
    # If there are enough validators, fill all the heights
    else:
        height_count = EPOCH_LENGTH
        heights = list(range(EPOCH_LENGTH))

    filled = 0
    for i in range(EPOCH_LENGTH - 1):
        if not i in heights:
            height_cutoffs.append(height_cutoffs[-1])
        else:
            filled += 1
            height_cutoffs.append(filled * validator_count // height_count)
    height_cutoffs.append(validator_count)

    # For the validators assigned to each height, split them up
    # into committees for different shards. Do not assign the
    # last END_EPOCH_GRACE_PERIOD heights in an epoch to any shards.
    shard_cutoffs = [0]
    for i in range(EPOCH_LENGTH - END_EPOCH_GRACE_PERIOD):
        size = height_cutoffs[i+1] - height_cutoffs[i]
        shards = (size + STANDARD_COMMITTEE_SIZE - 1) // STANDARD_COMMITTEE_SIZE
        pre = shard_cutoffs[-1]
        for j in range(1, shards+1):
            shard_cutoffs.append(pre + size * j // shards)

    return height_cutoffs, shard_cutoffs

(h, s) = get_cutoffs(16)

# Note: Python 2 because Apple/Mac ...
print height # [0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
print len(height) # 65
print shard # [0, 16]

The Python implementation does not split like in the example
The example may be a simplification that shows the spirit of the spec but no sharding occurs because 16 validators is too small.

Nim implementation:

func getCutoffs*(validatorCount: int): tuple[height, shard: seq[int]] {.noInit.} =
## Split up validators into groups at the start of every epoch,
## determining at what height they can make attestations and what shard they are making crosslinks for
## Implementation should do the following: http://vitalik.ca/files/ShuffleAndAssign.png
## TODO: It doens't work.
result.height = @[0]
let cofactor = 39 # EpochLength / phi
const StandardCommitteeSize = MaxValidatorCount div ShardCount
var heightCount: int
var heights: seq[int]
if validatorCount < EpochLength * MinCommitteeSize:
# If there are not enough validators to fill a minimally
# sized committee at every height, skip some heights
heightCount = validatorCount div MinCommitteeSize or 1 # TODO div/or precedence ?
for i in 0 ..< heightCount:
heights.add (i * cofactor) mod EpochLength
else:
# If there are enough validators, fill all the heights
heightCount = EpochLength
heights = toSeq(0 ..< EpochLength)
var filled = 0
for i in 0 ..< EpochLength - 1:
if i notin heights: # TODO, this will be slow for seq, use intsets instead?
result.height.add result.height[^1]
else:
inc filled
result.height.add filled * validatorCount div heightCount
result.height.add validatorCount
# For the validators assigned to each height, split them up
# into committees for different shards. Do not assign the
# last END_EPOCH_GRACE_PERIOD heights in an epoch to any shards.
result.shard = @[0]
for i in 0 ..< EpochLength - EndEpochGracePeriod:
let
size = result.height[i+1] - result.height[i]
shards = (size + StandardCommitteeSize - 1) div StandardCommitteeSize
pre = result.shard[^1]
for j in 1 .. shards:
result.shard.add pre + size * j div shards

# to run from ./build
import ../beacon_chain/private/helpers
import ../beacon_chain/datatypes

let a = 16
var seed: Blake2_256_Digest

echo getShuffling(seed, a) # @[14, 9, 13, 10, 3, 6, 12, 1, 2, 0, 5, 15, 8, 7, 11, 4]
let (height, shard) = getCutoffs(a)

echo height # @[0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
echo height.len # 65
echo shard # @[0, 16]

@mratsim mratsim changed the title from Cutoff split function spec or example issue to Shuffling tests Nov 26, 2018

@mratsim

This comment has been minimized.

Member

mratsim commented Nov 26, 2018

Renaming to shuffling tests as the specs changed significantly and cutoffs are now just shuffling.

See also comments in #20.

The most comprehensive shuffling tests are by Sigp: https://github.com/sigp/lighthouse/blob/ba548e49a52687a655c61b443b6835d79c6d4236/beacon_chain/validator_shuffling/src/shuffle.rs

@mratsim mratsim referenced this issue Nov 26, 2018

Merged

spec updates #20

@mratsim

This comment has been minimized.

Member

mratsim commented Nov 30, 2018

Also a topic for reference on the current shuffling algorithm design: ethereum/beacon_chain#57

@mratsim

This comment has been minimized.

Member

mratsim commented Dec 3, 2018

As of today it seems like besides us no one is in line with the specs, which states:

Specs

shuffle

def shuffle(values: List[Any], seed: Hash32) -> List[Any]:
    """
    Returns the shuffled ``values`` with ``seed`` as entropy.
    """
    values_count = len(values)

    # Entropy is consumed from the seed in 3-byte (24 bit) chunks.
    rand_bytes = 3
    # The highest possible result of the RNG.
    rand_max = 2 ** (rand_bytes * 8) - 1

    # The range of the RNG places an upper-bound on the size of the list that
    # may be shuffled. It is a logic error to supply an oversized list.
    assert values_count < rand_max

    output = [x for x in values]
    source = seed
    index = 0
    while index < values_count - 1:
        # Re-hash the `source` to obtain a new pattern of bytes.
        source = hash(source)
        # Iterate through the `source` bytes in 3-byte chunks.
        for position in range(0, 32 - (32 % rand_bytes), rand_bytes):
            # Determine the number of indices remaining in `values` and exit
            # once the last index is reached.
            remaining = values_count - index
            if remaining == 1:
                break

            # Read 3-bytes of `source` as a 24-bit big-endian integer.
            sample_from_source = int.from_bytes(source[position:position + rand_bytes], 'big')

            # Sample values greater than or equal to `sample_max` will cause
            # modulo bias when mapped into the `remaining` range.
            sample_max = rand_max - rand_max % remaining

            # Perform a swap if the consumed entropy will not cause modulo bias.
            if sample_from_source < sample_max:
                # Select a replacement index for the current index.
                replacement_position = (sample_from_source % remaining) + index
                # Swap the current index with the replacement index.
                output[index], output[replacement_position] = output[replacement_position], output[index]
                index += 1
            else:
                # The sample causes modulo bias. A new sample should be read.
                pass

    return output

split

def split(values: List[Any], split_count: int) -> List[Any]:
    """
    Splits ``values`` into ``split_count`` pieces.
    """
    list_length = len(values)
    return [
        values[(list_length * i // split_count): (list_length * (i + 1) // split_count)]
        for i in range(split_count)
    ]

clamp

def clamp(minval: int, maxval: int, x: int) -> int:
    """
    Clamps ``x`` between ``minval`` and ``maxval``.
    """
    if x <= minval:
        return minval
    elif x >= maxval:
        return maxval
    else:
        return x

get_new_shuffling

def get_new_shuffling(seed: Hash32,
                      validators: List[ValidatorRecord],
                      crosslinking_start_shard: int) -> List[List[ShardAndCommittee]]:
    """
    Shuffles ``validators`` into shard committees using ``seed`` as entropy.
    """
    active_validator_indices = get_active_validator_indices(validators)

    committees_per_slot = clamp(
        1,
        SHARD_COUNT // EPOCH_LENGTH,
        len(active_validator_indices) // EPOCH_LENGTH // TARGET_COMMITTEE_SIZE,
    )

    # Shuffle with seed
    shuffled_active_validator_indices = shuffle(active_validator_indices, seed)

    # Split the shuffled list into epoch_length pieces
    validators_per_slot = split(shuffled_active_validator_indices, EPOCH_LENGTH)

    output = []
    for slot, slot_indices in enumerate(validators_per_slot):
        # Split the shuffled list into committees_per_slot pieces
        shard_indices = split(slot_indices, committees_per_slot)

        shard_id_start = crosslinking_start_shard + slot * committees_per_slot

        shards_and_committees_for_slot = [
            ShardAndCommittee(
                shard=(shard_id_start + shard_position) % SHARD_COUNT,
                committee=indices,
                total_validator_count=len(active_validator_indices),
            )
            for shard_position, indices in enumerate(shard_indices)
        ]
        output.append(shards_and_committees_for_slot)

    return output

Testing

Is a bit tricky due to the random seed, the test vectors should be produced by a literal copy of the python specs with the random seed.

Our implementation

func shuffle*[T](values: seq[T], seed: Eth2Digest): seq[T] =
## Returns the shuffled ``values`` with seed as entropy.
## TODO: this calls out for tests, but I odn't particularly trust spec
## right now.
let values_count = values.len
const
# Entropy is consumed from the seed in 3-byte (24 bit) chunks.
rand_bytes = 3
# The highest possible result of the RNG.
rand_max = 2^(rand_bytes * 8) - 1
# The range of the RNG places an upper-bound on the size of the list that
# may be shuffled. It is a logic error to supply an oversized list.
assert values_count < rand_max
result = values
var
source = seed
index = 0
while index < values_count - 1:
# Re-hash the `source` to obtain a new pattern of bytes.
source = eth2hash source.data
# Iterate through the `source` bytes in 3-byte chunks.
for pos in countup(0, 29, 3):
let remaining = values_count - index
if remaining == 1:
break
# Read 3-bytes of `source` as a 24-bit big-endian integer.
let sample_from_source =
source.data[pos].Uint24 shl 16 or
source.data[pos+1].Uint24 shl 8 or
source.data[pos+2].Uint24
# Sample values greater than or equal to `sample_max` will cause
# modulo bias when mapped into the `remaining` range.
let sample_max = rand_max - rand_max mod remaining
# Perform a swap if the consumed entropy will not cause modulo bias.
if sample_from_source < sample_max:
# Select a replacement index for the current index.
let replacement_position = sample_from_source mod remaining + index
swap result[index], result[replacement_position]
inc index
func split*[T](lst: openArray[T], N: Positive): seq[seq[T]] =
## split lst in N pieces, with each piece having `len(lst) div N` or
## `len(lst) div N + 1` pieces
# TODO: implement as an iterator
result = newSeq[seq[T]](N)
for i in 0 ..< N:
result[i] = lst[lst.len * i div N ..< lst.len * (i+1) div N] # TODO: avoid alloc via toOpenArray

func get_new_shuffling*(seed: Eth2Digest,
validators: openArray[ValidatorRecord],
crosslinking_start_shard: int
): array[CYCLE_LENGTH, seq[ShardAndCommittee]] =
## Split up validators into groups at the start of every epoch,
## determining at what height they can make attestations and what shard they are making crosslinks for
## Implementation should do the following: http://vitalik.ca/files/ShuffleAndAssign.png
let
active_validators = get_active_validator_indices(validators)
committees_per_slot = clamp(
len(active_validators) div CYCLE_LENGTH div TARGET_COMMITTEE_SIZE,
1, SHARD_COUNT div CYCLE_LENGTH)
# Shuffle with seed
shuffled_active_validator_indices = shuffle(active_validators, seed)
# Split the shuffled list into cycle_length pieces
validators_per_slot = split(shuffled_active_validator_indices, CYCLE_LENGTH)
assert validators_per_slot.len() == CYCLE_LENGTH # what split should do..
for slot, slot_indices in validators_per_slot:
let
shard_indices = split(slot_indices, committees_per_slot)
shard_id_start = crosslinking_start_shard + slot * committees_per_slot
var committees = newSeq[ShardAndCommittee](shard_indices.len)
for shard_position, indices in shard_indices:
committees[shard_position].shard =
uint64(shard_id_start + shard_position) mod SHARD_COUNT
committees[shard_position].committee = indices
result[slot] = committees

Ethereum/beacon_chain repo

It is still using the old dynasty, and seems like development is now down in Py-EVM.

def shuffle(lst: List[Any],
            seed: Hash32,
            config: Dict[str, Any]=DEFAULT_CONFIG) -> List[Any]:
    lst_count = len(lst)
    assert lst_count <= 16777216
    o = [x for x in lst]
    source = seed
    i = 0
    while i < lst_count:
        source = blake(source)
        for pos in range(0, 30, 3):
            m = int.from_bytes(source[pos:pos+3], 'big')
            remaining = lst_count - i
            if remaining == 0:
                break
            rand_max = 16777216 - 16777216 % remaining
            if m < rand_max:
                replacement_pos = (m % remaining) + i
                o[i], o[replacement_pos] = o[replacement_pos], o[i]
                i += 1
    return o


def split(lst: List[Any], N: int) -> List[Any]:
    list_length = len(lst)
    return [
        lst[(list_length * i // N): (list_length * (i+1) // N)] for i in range(N)
    ]


def get_new_shuffling(seed: Hash32,
                      validators: List['ValidatorRecord'],
                      dynasty: int,
                      crosslinking_start_shard: ShardId,
                      config: Dict[str, Any]=DEFAULT_CONFIG) -> List[List[ShardAndCommittee]]:
    cycle_length = config['cycle_length']
    min_committee_size = config['min_committee_size']
    shard_count = config['shard_count']
    avs = get_active_validator_indices(dynasty, validators)
    if len(avs) >= cycle_length * min_committee_size:
        committees_per_slot = len(avs) // cycle_length // (min_committee_size * 2) + 1
        slots_per_committee = 1
    else:
        committees_per_slot = 1
        slots_per_committee = 1
        while (len(avs) * slots_per_committee < cycle_length * min_committee_size and
               slots_per_committee < cycle_length):
            slots_per_committee *= 2
    o = []

    shuffled_active_validator_indices = shuffle(avs, seed, config)
    validators_per_slot = split(shuffled_active_validator_indices, cycle_length)
    for slot, slot_indices in enumerate(validators_per_slot):
        shard_indices = split(slot_indices, committees_per_slot)
        shard_id_start = crosslinking_start_shard + (
            slot * committees_per_slot // slots_per_committee
        )
        o.append([ShardAndCommittee(
            shard_id=(shard_id_start + j) % shard_count,
            committee=indices
        ) for j, indices in enumerate(shard_indices)])
    return o

Py-EVM

also uses dynasty in get_new_shuffling

https://github.com/ethereum/py-evm/blob/f2d0d5d187400ba46a6b8f5b1f1c9997dc7fbb5a/eth/beacon/utils/random.py#L27-L86

def shuffle(values: Sequence[Any],
            seed: Hash32) -> Iterable[Any]:
    """
    Returns the shuffled ``values`` with seed as entropy.
    Mainly for shuffling active validators in-protocol.
    Spec: https://github.com/ethereum/eth2.0-specs/blob/0941d592de7546a428066c0473fd1000a7e3e3af/specs/beacon-chain.md#helper-functions  # noqa: E501
    """
    values_count = len(values)

    if values_count > SAMPLE_RANGE:
        raise ValueError(
            "values_count (%s) should less than or equal to SAMPLE_RANGE (%s)." %
            (values_count, SAMPLE_RANGE)
        )

    output = [x for x in values]
    source = seed
    index = 0
    while index < values_count:
        # Re-hash the source
        source = blake(source)
        for position in range(0, 30, 3):  # gets indices 3 bytes at a time
            # Select a 3-byte sampled int
            sample_from_source = int.from_bytes(source[position:position + 3], 'big')
            # `remaining` is the size of remaining indices of this round
            remaining = values_count - index
            if remaining == 0:
                break

            # Set a random maximum bound of sample_from_source
            rand_max = SAMPLE_RANGE - SAMPLE_RANGE % remaining

            # Select `replacement_position` with the given `sample_from_source` and `remaining`
            if sample_from_source < rand_max:
                # Use random number to get `replacement_position`, where it's not `index`
                replacement_position = (sample_from_source % remaining) + index
                # Swap the index-th and replacement_position-th elements
                (output[index], output[replacement_position]) = (
                    output[replacement_position],
                    output[index]
                )
                index += 1
            else:
                pass

    return output


@to_tuple
def split(seq: Sequence[TItem], pieces: int) -> Iterable[Any]:
    """
    Returns the split ``seq`` in ``pieces`` pieces in protocol.
    Spec: https://github.com/ethereum/eth2.0-specs/blob/0941d592de7546a428066c0473fd1000a7e3e3af/specs/beacon-chain.md#helper-functions  # noqa: E501
    """
    list_length = len(seq)
    return [
        seq[(list_length * i // pieces): (list_length * (i + 1) // pieces)]
        for i in range(pieces)
    ]

https://github.com/ethereum/py-evm/blob/f2d0d5d187400ba46a6b8f5b1f1c9997dc7fbb5a/eth/beacon/helpers.py#L272-L344

@to_tuple
def get_new_shuffling(*,
                      seed: Hash32,
                      validators: Sequence['ValidatorRecord'],
                      dynasty: int,
                      crosslinking_start_shard: int,
                      cycle_length: int,
                      min_committee_size: int,
                      shard_count: int) -> Iterable[Iterable[ShardAndCommittee]]:
    """
    Return shuffled ``shard_and_committee_for_slots`` (``[[ShardAndCommittee]]``) of
    the given active ``validators``.
    Two-dimensional:
    The first layer is ``slot`` number
        ``shard_and_committee_for_slots[slot] -> [ShardAndCommittee]``
    The second layer is ``shard_indices`` number
        ``shard_and_committee_for_slots[slot][shard_indices] -> ShardAndCommittee``
    Example:
        validators:
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
        After shuffling:
            [6, 0, 2, 12, 14, 8, 10, 4, 9, 1, 5, 13, 15, 7, 3, 11]
        Split by slot:
            [
                [6, 0, 2, 12, 14], [8, 10, 4, 9, 1], [5, 13, 15, 7, 3, 11]
            ]
        Split by shard:
            [
                [6, 0], [2, 12, 14], [8, 10], [4, 9, 1], [5, 13, 15] ,[7, 3, 11]
            ]
        Fill to output:
            [
                # slot 0
                [
                    ShardAndCommittee(shard_id=0, committee=[6, 0]),
                    ShardAndCommittee(shard_id=1, committee=[2, 12, 14]),
                ],
                # slot 1
                [
                    ShardAndCommittee(shard_id=2, committee=[8, 10]),
                    ShardAndCommittee(shard_id=3, committee=[4, 9, 1]),
                ],
                # slot 2
                [
                    ShardAndCommittee(shard_id=4, committee=[5, 13, 15]),
                    ShardAndCommittee(shard_id=5, committee=[7, 3, 11]),
                ],
            ]
    NOTE: The spec might be updated to output an array rather than an array of arrays.
    """
    active_validators = get_active_validator_indices(dynasty, validators)
    active_validators_size = len(active_validators)
    committees_per_slot = clamp(
        1,
        shard_count // cycle_length,
        active_validators_size // cycle_length // (min_committee_size * 2) + 1,
    )
    shuffled_active_validator_indices = shuffle(active_validators, seed)

    # Split the shuffled list into cycle_length pieces
    validators_per_slot = split(shuffled_active_validator_indices, cycle_length)
    for index, slot_indices in enumerate(validators_per_slot):
        # Split the shuffled list into committees_per_slot pieces
        shard_indices = split(slot_indices, committees_per_slot)
        shard_id_start = crosslinking_start_shard + index * committees_per_slot
        yield _get_shards_and_committees_for_shard_indices(
            shard_indices,
            shard_id_start,
            shard_count,
        )

Sigp

Sigp is still using the old min_committe_size instead of target committee size

https://github.com/sigp/lighthouse/blob/ba548e49a52687a655c61b443b6835d79c6d4236/beacon_chain/validator_shuffling/src/shuffle.rs

/// Given the validator list, delegates the validators into slots and comittees for a given cycle.
fn generate_cycle(
    validator_indices: &[usize],
    shard_indices: &[usize],
    crosslinking_shard_start: usize,
    cycle_length: usize,
    min_committee_size: usize,
) -> Result<DelegatedCycle, ValidatorAssignmentError> {
    let validator_count = validator_indices.len();
    let shard_count = shard_indices.len();

    if shard_count / cycle_length == 0 {
        return Err(ValidatorAssignmentError::TooFewShards);
    }

    let (committees_per_slot, slots_per_committee) = {
        if validator_count >= cycle_length * min_committee_size {
            let committees_per_slot = min(
                validator_count / cycle_length / (min_committee_size * 2) + 1,
                shard_count / cycle_length,
            );
            let slots_per_committee = 1;
            (committees_per_slot, slots_per_committee)
        } else {
            let committees_per_slot = 1;
            let mut slots_per_committee = 1;
            while (validator_count * slots_per_committee < cycle_length * min_committee_size)
                & (slots_per_committee < cycle_length)
            {
                slots_per_committee *= 2;
            }
            (committees_per_slot, slots_per_committee)
        }
    };

    let cycle = validator_indices
        .honey_badger_split(cycle_length)
        .enumerate()
        .map(|(i, slot_indices)| {
            let shard_start =
                crosslinking_shard_start + i * committees_per_slot / slots_per_committee;
            slot_indices
                .honey_badger_split(committees_per_slot)
                .enumerate()
                .map(|(j, shard_indices)| ShardAndCommittee {
                    shard: ((shard_start + j) % shard_count) as u16,
                    committee: shard_indices.to_vec(),
                }).collect()
        }).collect();
    Ok(cycle)
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment