Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gossip verify block to return block if no parent #1028

Closed
AgeManning opened this issue Apr 20, 2020 · 17 comments
Closed

Gossip verify block to return block if no parent #1028

AgeManning opened this issue Apr 20, 2020 · 17 comments
Assignees
Labels
good first issue Good for newcomers optimization Something to make Lighthouse run more efficiently.

Comments

@AgeManning
Copy link
Member

Description

When verifying a block for gossip propagation, one error that can occur is ParentUnknown. In this case, we want to store the block and lookup its parents before processing a chain of blocks.

Currently this return type doesn't return the block and we must clone the block before checking the verification in case we need to do a parent lookup.

To avoid the clone, the load_parent function should return the SignedBeaconBlock itself in the error message.

@paulhauner
Copy link
Member

paulhauner commented Apr 20, 2020

Could you just clone the block.parent_root instead of the whole thing?

EDIT: oh you need to store the whole block to process it later.

@paulhauner
Copy link
Member

paulhauner commented Apr 20, 2020

I'm going to un-assign myself from this one since it's a relatively minor optimization. Someone else can come pick it up if they like, otherwise I'll get to it at some point.

Returning the parent block means making BlockError generic over EthSpec.

@paulhauner paulhauner removed their assignment Apr 20, 2020
@paulhauner paulhauner added good first issue Good for newcomers optimization Something to make Lighthouse run more efficiently. labels Apr 20, 2020
@AgeManning
Copy link
Member Author

Sounds good to me

@thor314
Copy link
Contributor

thor314 commented May 5, 2020

Working on this now.

@thor314
Copy link
Contributor

thor314 commented May 5, 2020

@AgeManning Haven't contributed to your repo before, so excuse the dumb questions.
"Store the block/Lookup its parents" - store/lookup where
"before processing chain of blocks" - Unclear what chain of blocks referred to
"processing" - unclear, what is meant by process in this instance?
"Don't want to clone the block" - Seems right
return signedBeaconBlock in error message - Seems right

I assume the team is generally busy and don't expect fast response

@AgeManning
Copy link
Member Author

Hey @thor314. Thanks for the interest.
Let me explain this issue in a little more detail.

We get a SignedBeaconBlock in from the network and we need to make a decision about whether we should propagate it to the rest of the network or not (to filter out spam and bad blocks etc). The function that deals with this is here:
https://github.com/sigp/lighthouse/blob/master/beacon_node/network/src/router/processor.rs#L463

As you can see, we then call this function to determine if we should propagate the block or not.

Now it can happen that the block we received references a block that we do not know about. In this case the verify_for_gossip function returns a ParentUnknown variant as you can see here

In this case, we want to go and find the parent block. As you can see, the returned error ParentUnknown variant only returns the block hash. However we want the actual block in the case the parent is unknown. So, what we want to do is let the ParentUnknown variant actually return the block itself, to avoid the block clone when sending to the verify_block_for_gossip.

You'll have to track down the processing in verify_block_for_gossip and try to keep ownership of the block and if the parent is unknown return it in the variant. As paul mentions, this will mean making the BlockError generic over EthSpec, i.e it will become BlockError<T: EthSpec> and this will need to be propagated through the code base.

Hopefully this is enough information to get you started :)

@thor314
Copy link
Contributor

thor314 commented May 14, 2020

Thanks for bearing with me @AgeManning. I'm going to talk to you like a rubber duck for a bit, sprinkling questions as they arise, then restate the questions at the end. Commence rubber ducking:
The predicate should_forward_block runs verify_block_for_gossip.
If verify_block_for_gossip fails, it propagates back an error.
Back in should_forward_block, we pattern match for ParentUnknown errors. This line suggests to me that someone believes:
ParentUnknown is the only error we need to handle in should_forward_block, and any other possible errors are going to be further propagated to any caller of should_forward_block. You explained this a little, as if ParentUnknown comes back "we want to go and find the parent block". Makes the sense, it does.
Finally, if the propagated error has type ParentUnknown, we call method send_to_sync().
So here's the issue, as you explain: we want to let the ParentUnknown BlockError, return the block itself.
0. You reference Paul's mentioning that BlockError needs to become generic over EthSpec; I don't quite understand why, but I can do that, and likewise wherever else it pops up in the code base. Sanity check: I'm going to be changing a bunch of instances of BlockError to BlockError<T>, where I'm going to bound T:EthSpec, this is what is meant by generic over EthSpec yes?

  1. Now s'pose we have some impl block for ChainSegmentResult:
impl ChainSegmentResult {
    pub fn to_block_error(self) -> Result<(), BlockError> {
        match self {
            ChainSegmentResult::Failed { error, .. } => Err(error),
            ChainSegmentResult::Successful { .. } => Ok(()),
        }
    }
}

We want to make BlockError generic over EthSpec, but to change this line, BlockError needs to become BlockError<T>. To do that we have to make ChainSegmentResult generic over EthSpec too. Unless I'm misunderstanding?

Now once that's handled, we skedaddle back on over to should_forward_block, where this line appears:
let result = self.chain.verify_block_for_gossip(*block.clone());
SignedBeaconBlock is wrapped in a Box, presumably because:
2. we don't know how many messages a BeaconBlock will contain?

After we clear up making BlockError generic over Ethspec, we get to bop the clone straight out of existence:
let result = self.chain.verify_block_for_gossip(*block);
And rustc is probably going to have some things to say, and I'll fiddle with the verify_block_for_gossip function to see if I can't fix them. That about right?

  1. I looked at the EthSpec trait, and it's pretty huge. What is the property of EthSpec that we are benefitting from by making BlockError generic over Ethspec? And then,
  2. Do I need to make ChainSegmentResult generic over EthSpec too?
  3. *skippable - do you pass SignedBeaconBlock references inside a Box because the number of messages contained is not known at compile time?

@AgeManning
Copy link
Member Author

Let me answer these in some arbitrary high level order and then go back to some more specifics.

The whole EthSpec thing seems to be a point of confusion, and i don't blame you, it's pretty funky. At the crux of it, a BeaconBlock (and many other types) change their form depending on a specification. More concretely, we have a number of different specs, like a mainnet spec, a minimal spec an interop spec. Each of these specs change constants about eth2 consensus which affect the structure of a block. A minimal spec has less attestations allowed in it than say a block for mainnet.
So when we start a client we specify which spec we using. I.e BeaconBlock<T> where T is EthSpec which concretely could be a mainnet spec or a minimal spec, and these kinds of blocks would be different types/sizes etc.

Now, BlockError only returns things like hashes and slots IIRC. If BlockError were to return a BeaconBlock also (or it's signed equivalent) then it too would have it's type change based on the spec because the variant BlockError::ParentUnknown(BeaconBlock<T>) would be a different type dependent on the Spec being used.

Hopefully this answers your question 0 above.

  1. From a generic rust point of view, the answer is no. For any function, you can do things like:
    pub fn to_block_error<T:EthSpec>(self) -> Result<(), BlockError<T>> {
        match self {
            ChainSegmentResult::Failed { error, .. } => Err(error),
            ChainSegmentResult::Successful { .. } => Ok(()),
        }
    }

However depending on the logic, perhaps you want to make ChainSegmentResult generic over T as well. I'm not sure where abouts this is, so can't comment on that right now.

  1. A number of objects come from the network are sent through a channel via an Enum. If we had a enum like:
enum Thing {
   A(BeaconBlock),
   B(usize),

Thing would occupy the max size of either variant. So if we had B, it would consume in memory as much as a BeaconBlock, which could be quite large. If we Box BeaconBlock, then we are only storing a pointer to some heap memory and the total enum is small for all variants. So we have it as a box.

Now, I think the core of this issue, is finding out where the block is being consumed when it fails with parent unknown. A quick look points me to the check starting here
Then this thing happens, which looks suspect: load_parent but its only taking a borrow of the block.

It's here

and it errors here

It seems to me, the solution would be to set load_parent to take a full block, not a borrow.. do its calculations and return the block on error and also on success so the rest of the calculations can be done on the block, if you follow

@thor314
Copy link
Contributor

thor314 commented May 19, 2020

Okay, so I'm struggling in tracking down all the changes needed for making BlockError generic.
Here's what I've done so far (I have a more detailed worklog file if details are important, this is a summary of that):

  1. *block.clone() -> *block
let result = self.chain.verify_block_for_gossip(*block.clone()); // to
let result = self.chain.verify_block_for_gossip(*block);
  1. Let BlockError be generic over EthSpec
pub enum BlockError<T> {
    /// The parent block was unknown.
    ParentUnknown(Hash256),
// change to
pub enum BlockError<T:EthSpec> {
    /// The parent block was unknown.
    ParentUnknown(Hash256),
  1. deal with all 25 resulting instances of error0107 (replace BlockError with BlockError<T> in a bunch of places, and trait bound T with EthSpec where not already bound
  2. deal with E0207 in BlockProcessingOutcome:
pub enum BlockProcessingOutcome {
...
    /// The parent block was unknown.
    ParentUnknown,
...
impl BlockProcessingOutcome {
    pub fn<T> shim(
        result: Result<Hash256, BlockError>,
   ) -> Result<BlockProcessingOutcome, BeaconChainError> {
//to
pub enum<T:EthSpec> BlockProcessingOutcome<T> {
...
    /// The parent block was unknown.
    ParentUnknown(T),
...
impl BlockProcessingOutcome {
    pub fn<T> shim(
        result: Result<Hash256, BlockError<T>>,
   ) -> Result<BlockProcessingOutcome<T>, BeaconChainError> {
  1. Deal with E0277 (easy) and E0392 (confusing)
    E0277: tried use a trait in a type that doesn't expect that trait.
// Changing:
pub fn signature_verify_chain_segment<T: BeaconChainTypes>(
// to 
pub fn signature_verify_chain_segment<T: BeaconChainTypes + EthSpec>(

done again in check_block_against_finalized_slot
check_block_relevancy
load_parent
and

E0392: A type or lifetime parameter has been declared but not used.

// Errors: 
beacon_chain.rs
76 | pub enum ChainSegmentResult<T: EthSpec> {
   |                             ^ unused parameter

block_verification.rs
88 | pub enum BlockError<T: EthSpec> {
   |                     ^ unused parameter

I'm interpreting this to mean that I need to change the following:

pub enum BlockError<T:EthSpec> {
    /// The parent block was unknown.
    ParentUnknown(Hash256),
// change to
pub enum BlockError<T:EthSpec> {
    /// The parent block was unknown.
    ParentUnknown(T),

and in fact that does solve both instances of E0392, but then dumps 44 new errors (some rather involved ones too) out, and by Occam's razor, I deduce that I may have simply done something dumb and/or misunderstood something.

Some other questions that came up along the way:

  1. Is there a reason cheap_state_advance_to_obtain_committees (in block_verification.rs) is non-uniform to the rest of the codebase: generic over the generic over the letter E rather than T?
  2. I made ChainSegmentResult (in beacon_chain.rs) generic over EthSpec, so that I BlockError would have a generic argument available for BlockError. This increments the number of args ChainSegmentResult takes from 0 to 1.
  3. In BlockProcessingOutcome (block_processing_outcome.rs), I think I need to increase the number of arguments that BlockProcessingOutcome takes from 0 to 1, or else the enumerated ParentUnknown(T) type can't use the generic type for an argument
  4. If I'm making BlockError generic over EthSpec, while trying to keep ownership of the block in ParentUnknown, does ParentUnknown(Hash256) become ParentUnknown(T) or stay the same? Or use an entirely different type?

@AgeManning
Copy link
Member Author

Hey, to resolve all the issues with the generic type parameter, I threaded the generic parameter through to help you start. Its in #1174.
I've made a ParentUnknownCorrect variant placeholder to show you what I was expecting. It should replace the ParentUnknown variant once complete (but with the same name).

Hope this gives a helping hand.

@thor314
Copy link
Contributor

thor314 commented May 26, 2020

Sorry on the delay, balancing this with finals.

ParentUnknownCorrect expects a SignedBeaconBlock. The argument to load_parent is just a BeaconBlock, ie. the message without the signature of the SignedBeaconBlock. Shouldn't ParentUnknownCorrect should expect a BeaconBlock then, or should I wire up a new load_parent_signed function for this use case?

More explicitly:
GossipVerifiedBlock::new takes a SignedBeaconBlock, and calls load_parent to borrow the block's message field, but not the signature field.
then, load_parent takes a borrowed BeaconBlock (NOT a SignedBeaconBlock), and if the Parent is unknown, returns a ParentUnknown, which used to wrap the block root, but we're modifying to return ParentUnknownCorrect, which wraps a SignedBeaconBlock.

Edit: ignore just now, still sorting myself out
Edit2: feeling sorted again. See above :)

@AgeManning
Copy link
Member Author

Yes you're right. Make the load_parent accept a SignedBeaconBlock instead of just a BeaconBlock

@thor314
Copy link
Contributor

thor314 commented May 29, 2020

Okay, I'm getting stuck, but I think I can resolve that with some input. Here's what I've done:
I copied load_parent into a separate function, load_parent_correct, updated in 2 places:

    block: BeaconBlock<T::EthSpec>,
...
    if !chain.fork_choice.contains_block(&block.parent_root) {
        return Err(BlockError::ParentUnknownCorrect(Box::new(block)));

This compiles. Now if I plug in load_parent_correct to GossipVerifiedBlock::new:
let mut parent = load_parent_correct(block.message, chain)?;
Then when block.message is used 50 lines later, we're fighting the borrow checker.

Here's the problem:
My version of load_parent_correct is currently returning type Result<BeaconSnapshot<T::EthSpec>, BlockError<T::EthSpec>>

where BeaconSnapshot is defined

pub struct BeaconSnapshot<E: EthSpec> {
    pub beacon_block: SignedBeaconBlock<E>,
    pub beacon_block_root: Hash256,
    pub beacon_state: BeaconState<E>,
    pub beacon_state_root: Hash256,
}

And presently is only looking at the parent of the borrowed block, so it's dropping the block I just gave it ownership of when it returns the snapshot.

Here's what I think I need to do:
Update the signature of load_parent_correct from
Result<BeaconSnapshot<T::EthSpec>, BlockError<T::EthSpec>>
to a Result containing a tuple of BeaconSnapshot and BeaconBlock:
Result<(BeaconSnapshot<T::EthSpec>, BeaconBlock<T::EthSpec), BlockError<T::EthSpec>>
This looks ugly to me, to have a Result wrapping a tuple, but I'm not sure what a better solution would be.
Q1: Is there something more idiomatic/consistent with the existing codebase?

If I do that, I'm only using load_parent_correct in the GossipVerifiedBlock::new, but load_parent is used in some other places.
Q2: Should I only replace this one instance of load_parent, or all of them with load_parent_correct?
Q2.1: if only this one instance, I should rename "blah_correct" to something like "blah_takes_and_returns", yeah?

@divagant-martian
Copy link
Collaborator

Hi @thor314 , where are you working on this if may I ask? went lurking to your fork but I couldn't get to a branch, and most links on your last comment go to the stable-futures branch which no longer exists

@thor314
Copy link
Contributor

thor314 commented Jun 2, 2020

@divagant-martian Ah, I'm still getting used to a git workflow. It should be up to date now. I'm updating all the links from my prev comment now, to point to the right stuff. K, links updated to be in my fork of the repo.

@divagant-martian
Copy link
Collaborator

Hi, answering your questions:

    1. Making the result's happy path to be a tuple is ok. I think this is what you should do
    1. Update all load_parent appearances. I don't think you need to rename it, just update the docs.

Also, if you check, all calls to load_parent are over block.message where block is of type SignedBeaconBlock and message, of type BeaconBlock. To avoid moving the block/message out of the signed block if would be easier to make load_parent receive the whole signed block instead of just the block. Otherwise, if the signed block is used later you would need to reconstruct it from the block and signature of the original signed block. Maybe check where it's called and decide if it's better to send it as a whole or not. In any case, since the ParentUnknown(Correct) error is constructed from the block that load_parent receives, their types (of the block) should match.

Let me know if you run into issues

@divagant-martian
Copy link
Collaborator

Also probably wrap the block in a Box since those are thick boys

bors bot pushed a commit that referenced this issue Aug 6, 2020
## Issue Addressed

#1028 

A bit late, but I think if `BlockError` had a kind (the current `BlockError` minus everything on the variants that comes directly from the block) and the original block, more clones could be removed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers optimization Something to make Lighthouse run more efficiently.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants