Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug/fuzzing] Incorrect validation of pre-state attestation & malformed block signature during state transition (in pcli) #5658

Closed
pventuzelo opened this issue Apr 28, 2020 · 16 comments
Assignees
Labels
Enhancement New feature or request Priority: Low Low priority item

Comments

@pventuzelo
Copy link

pventuzelo commented Apr 28, 2020

馃悶 Bug Report

Description

During fuzzing with beacon-fuzz, I provided a set of invalid SSZ state and block to prysm using pcli tool.
Prysm consider both pre-state and block as valid and try to process the state transition anyway.

Other client/tool like lighthouse and zcli reject either the state or the block.

NOTES:

  • mainnet preset (work also with minimal)
  • In case of zcli, process panic but @protolambda told us it should because this case should never happened.
  • Quick output from @protolambda,
sounds like a badly formatted attestation. The attestation bitfield has N bits as determined by deserialization, but the state always needs some specific amount of bits to match the committee size
FilterParticipants does that sanity check in zrnt
and it's called for attestations IN the state, not the block.

Additional info

lighthouse (mainnet and minimal):

./lcli transition-blocks incorrect_ssz_validation_state_prysm.ssz incorrect_ssz_validation_block_prysm.ssz
2020-04-27 23:35:33,850 INFO  [lcli::transition_blocks] Using minimal spec
2020-04-27 23:35:33,850 INFO  [lcli::transition_blocks] Pre-state path: "incorrect_ssz_validation_state_prysm.ssz"
2020-04-27 23:35:33,850 INFO  [lcli::transition_blocks] Block path: "incorrect_ssz_validation_block_prysm.ssz"
Failed to run lcli: Failed to transition blocks: Ssz decode failed: BytesInvalid("Invalid Signature bytes: [186, 1, 17, 234, 51, 124, 150, 23, 84, 33, 184, 135, 189, 162, 39, 125, 121, 105, 76, 65, 49, 22, 197, 47, 148, 200, 37, 254, 10, 95, 164, 10, 43, 76, 146, 180, 219, 155, 194, 48, 19, 251, 198, 223, 156, 190, 197, 116, 188, 102, 187, 248, 0, 32, 158, 46, 109, 253, 193, 8, 161, 82, 81, 246, 215, 198, 36, 107, 127, 166, 8, 48, 99, 125, 8, 154, 157, 143, 233, 101, 207, 76, 100, 93, 138, 217, 193, 217, 196, 206, 24, 238, 77, 162, 41, 245]")

zcli (mainnet):

$ go get -tags preset_mainnet github.com/protolambda/zcli

$ ./bin/zcli transition blocks incorrect_ssz_validation_block_prysm.ssz -i incorrect_ssz_validation_state_prysm.ssz 
panic: committee mismatch, bitfield length does not match

馃敩 Minimal Reproduction

$ cd prysm

# build pcli tool
$ bazel build //tools/pcli:pcli
$ cp bazel-bin/tools/pcli/linux_amd64_stripped/pcli .

# timeout requiered otherwise you will need to kill manually the process
$ timeout -k 5 15  ./pcli -- state-transition --pre-state-path incorrect_ssz_validation_state_prysm.ssz --block-path incorrect_ssz_validation_block_prysm.ssz 
[2020-04-28 11:47:38]  INFO Performing state transition with a block root of 0x0af54b869712cdd5ca8569f8bdde228b8aa6c6b15edb8d28fef2ee1a7ba5d377 and pre state root of 0x2e8e47e652c946397ca090e6b7a907c0369a1bd6f86c2097e0204b40b90af291 blockSlot=1970324836982784 preStateSlot=95

Notes: Don't know if it is related yet, but blockSlot value is particularly hight.

馃實 Your Environment

branch: master
commit: e30349d

@nisdas
Copy link
Member

nisdas commented Apr 28, 2020

This seems like an issue where prysm is regarding an invalid ssz object as valid whereas lcli,zcli regard it as invalid an error accordingly.

zcli: sounds like a badly formatted attestation. The attestation bitfield has N bits as determined by deserialization, but the state always needs some specific amount of bits to match the committee size

lcli: Failed to run lcli: Failed to transition blocks: Ssz decode failed: BytesInvalid("Invalid Signature bytes: [186, 1, 17, 234, 51, 124, 150, 23, 84, 33, 184, 135, 189, 162, 39, 125, 121, 105, 76, 65, 49, 22, 197, 47, 148, 200, 37, 254, 10, 95, 164, 10, 43, 76, 146, 180, 219, 155, 194, 48, 19, 251, 198, 223, 156, 190, 197, 116, 188, 102, 187, 248, 0, 32, 158, 46, 109, 253, 193, 8, 161, 82, 81, 246, 215, 198, 36, 107, 127, 166, 8, 48, 99, 125, 8, 154, 157, 143, 233, 101, 207, 76, 100, 93, 138, 217, 193, 217, 196, 206, 24, 238, 77, 162, 41, 245]")

Whereas in both mainnet/minimal there were no ssz errors. That doesn't seem correct, we should fail when unmarshalling invalid ssz objects.

@nisdas nisdas added Bug Something isn't working Priority: High High priority item labels Apr 28, 2020
@pventuzelo
Copy link
Author

I've modify lighthouse lcli to use mainnet preset and result is the same.
Same from zcli in both mainnet and minimal.

Some new output from @protolambda:

The pre-state has a bad attestation, causing the committee indices to not match the bits. And lighthouse instead fails earlier, because of a bad signature in the block. (Zrnt only parses that right before verification, not before doing any transition at all)
So zrnt and lighthouse are not contradictory, it's just that both the state and the block are bad

@protolambda
Copy link

It's not the SSZ that is invalid, it's the bitlength of some attestation in the state that is incorrect (not matching committee size), and some malformed signature in the block. It just happens that lighthouse does the signature check earlier, while zrnt deserializes it right before verification, at which point it already hit the state format problem first.

That said, if the block actually were valid, Lighthouse should still check that their state is not invalid, unlike Prysm, which seemed to catch neither of the problems. We do have invalid-signature tests though, and the invalid pre-state is not something within reach to begin with, so I don't see this as a critical bug. Still, hardening committee retrieval from prysm/lighthouse state wouldn't hurt.

@pventuzelo pventuzelo changed the title [bug/fuzzing] Incorrect validation of SSZ block during state transition [bug/fuzzing] Incorrect validation of pre-state attestation & malformed block signature during state transition Apr 28, 2020
@pventuzelo
Copy link
Author

Just updated the issue name, hope it is more clear.
Thanks @protolambda

@pventuzelo
Copy link
Author

pventuzelo commented Apr 28, 2020

nim-beacon-chain also validate the state and block.

I just opened an issue for them as well: status-im/nimbus-eth2#944

@nisdas
Copy link
Member

nisdas commented Apr 28, 2020

@protolambda
lcli did output this though Ssz decode failed: BytesInvalid("Invalid Signature bytes: [186, 1, 17, 234, 51, 124, 150, 23, 84, 33, 184, 135, 189, 162, 39, 125, 121, 105, 76, 65, 49, 22, 197, 47, 148, 200, 37, 254, 10, 95, 164, 10, 43, 76, 146, 180, 219, 155, 194, 48, 19, 251, 198, 223, 156, 190, 197, 116, 188, 102, 187, 248, 0, 32, 158, 46, 109, 253, 193, 8, 161, 82, 81, 246, 215, 198, 36, 107, 127, 166, 8, 48, 99, 125, 8, 154, 157, 143, 233, 101, 207, 76, 100, 93, 138, 217, 193, 217, 196, 206, 24, 238, 77, 162, 41, 245]"
wouldn't that point to it being a ssz issue for lighthouse ?

If it was an invalid ssz object this should have failed for us first.

@pventuzelo
Copy link
Author

Block is an invalid ssz for zcli

./bin/zcli pretty block incorrect_ssz_validation_block_prysm.ssz
cannot load input
cannot decode ssz: expected to be at 3417631015 bytes, but currently at 84

@paulhauner
Copy link

wouldn't that point to it being a ssz issue for lighthouse ?

We enforce valid BLS points at the SSZ decoding level (except for deposits). Not technically part of the SSZ spec but it saves us from processing things that will never pass signature verification.

@pventuzelo
Copy link
Author

teku also have the same issue.

Reported here: ConsenSys/teku#1686

@nisdas
Copy link
Member

nisdas commented Apr 28, 2020

We enforce valid BLS points at the SSZ decoding level (except for deposits). Not technically part of the SSZ spec but it saves us from processing things that will never pass signature verification.

Gotcha, thanks for confirming this. So its a sig verification issue then. So trying to think of action items for this:
@terencechain any ideas ? Of the top of my head.

  1. Perform signature verification before processing block
  2. Verify Pre-State

On 1) I think we are moving towards decoupling signature verification from block processing, so we could take this under that. This should prevent us from processing any bad blocks.

On 2) , not too sure about how to approach this. it's the bitlength of some attestation in the state that is incorrect (not matching committee size) . In restrospect we should never have some state with invalid attestations ever saved to db. This would fail the previous state transition anyway. Not sure how common this would be to come up in runtime. It's not like we receive state over the wire or anything, so I guess this may not be as important of a check.

@pventuzelo
Copy link
Author

pventuzelo commented Apr 28, 2020

Quick recap:

lighthouse:

  • decode the state properly
  • detect bls signature is invalid in the block during ssz decoding: Ssz decode failed: BytesInvalid("Invalid Signature bytes:
    notes: lighthouse is checking BLS during ssz decoding

zcli:

  • decode the state as well
  • detect the block is invalid running zcli pretty with error: decode ssz: expected to be at 3417631015 bytes, but currently at 84
  • when running zcli transition blocks, we get the commitee mismatch error

prysm/teku/nim:

  • decode the state as well
  • consider the block as valid and tried to process the state transition

Some extra tests using states from the spec (pre.ssz and post.ssz) show that the results on prysm are the same whatever state we are using.

@terencechain
Copy link
Member

@pventuzelo to echo what @protolambda said:

... the invalid pre-state is not something within reach to begin with, so I don't see this as a critical bug. Still, hardening committee retrieval from prysm/lighthouse state wouldn't hurt.

Given this is strictly a client implementation detail, it may be better to start harden it from spec's point of view (ie add this check to the beacon chain spec) then us the client implementations can follow along. What do you think?

@terencechain terencechain added Priority: Low Low priority item Need-Info Need more information from author Enhancement New feature or request and removed Priority: High High priority item Bug Something isn't working labels Apr 29, 2020
@prestonvanloon
Copy link
Member

How long did this take to run? The input data here would run almost 2 quadrillion empty state transitions before running validation.

This isn't the normal beacon block pipeline for Prysm. This method has assumed the following:

  • The block is not from the future
  • The block has already be verified by p2p spec requirements
  • The state transition processing has a reasonable deadline (pcli has no deadline).

I don't think your findings are correct. There is no way prysm processed 2 quadrillion empty state transitions in 15 seconds.

@prestonvanloon
Copy link
Member

We have implemented the spec state_transition pretty much exactly as it is written. If Lighthouse or other clients are doing additional behavior, then this isn't a proper test setup. Let us know what you need, otherwise this is a no-op issue.

A quick recap of what pcli is doing:

  • Deserialize ssz objects
  • Take determine the roots of these objects before processing
  • Run state_transition as defined in the spec
  • Return results
def state_transition(state: BeaconState, signed_block: SignedBeaconBlock, validate_result: bool=True) -> BeaconState:
    block = signed_block.message
    # Process slots (including those with no blocks) since block
    process_slots(state, block.slot)
    # Verify signature
    if validate_result:
        assert verify_block_signature(state, signed_block)
    # Process block
    process_block(state, block)
    # Verify state root
    if validate_result:
        assert block.state_root == hash_tree_root(state)
    # Return post-state
    return state

With your input data, process_slots(state, block.slot) will take a very long time as it has to run nearly 2 quadrillion state transitions. (1970324836982784-95-1 transitions). The log you pasted in the original issue shows that Prysm is starting process_slots and you terminated it in 15s. Based on the spec, we have implemented this correctly and are doing exactly what is intended. Like i said in the previous comment, blocks would not have reached this state transition core function from any layer of the application. pcli is calling state transition directly, as intended.

@zedt3ster
Copy link

I think the main issue here is that we've used utilities such as pcli, zcli, and ncli with the assumption that they're identical to how the related client actually processes the SSZ containers, thinking that these tools include all checks and verification steps, which isn't the case as outlined by @prestonvanloon.

Feel free to close this issue, we'll investigate a better/more effective way of testing these edge cases :)

@pventuzelo pventuzelo changed the title [bug/fuzzing] Incorrect validation of pre-state attestation & malformed block signature during state transition [bug/fuzzing] Incorrect validation of pre-state attestation & malformed block signature during state transition (in pcli) Apr 30, 2020
@pventuzelo
Copy link
Author

I've update the issue name to make it clear that it is related to pcli.

@no-response no-response bot removed the Need-Info Need more information from author label Apr 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or request Priority: Low Low priority item
Projects
None yet
Development

No branches or pull requests

8 participants