
[Epoch Sync] EpochInfoAggregator treatment is awkward at the epoch sync boundary #11931

Open
Tracked by #73
robin-near opened this issue Aug 13, 2024 · 0 comments

The EpochInfoAggregator is a cache of the aggregated info of an epoch. At the beginning of an epoch, the EpochInfoAggregator is empty; each block of the epoch then adds some information to it. That way, we accumulate very important info such as all the proposals submitted during the epoch.

The EpochManager maintains the EpochInfoAggregator as both an in-memory aggregator and an on-disk aggregator. In both cases, the aggregator refers to a block hash H, and its data represents fold_left(aggregator_append, empty aggregator state, all blocks from the first block of H's epoch to H inclusive). To compute the EpochInfoAggregator for any given block hash B, we walk back from B until we either hit H (in which case we combine with the current aggregator) or hit the beginning of the epoch (in which case we effectively recompute the whole result).
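The walk-back computation described above can be sketched as follows. These are simplified, hypothetical types; the real nearcore structures carry much more state (chunk stats, slashing info, etc.) and use block hashes rather than integers:

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for nearcore's real types.
#[derive(Clone, Default, Debug, PartialEq)]
struct EpochInfoAggregator {
    // In reality this holds proposals, chunk production stats, and more.
    proposals: Vec<String>,
}

impl EpochInfoAggregator {
    // The fold step: absorb one block's contribution.
    fn append(&mut self, block: &Block) {
        self.proposals.extend(block.proposals.iter().cloned());
    }
}

#[derive(Clone)]
struct Block {
    prev_hash: u64,
    epoch_first_block: u64, // first block of this block's epoch
    proposals: Vec<String>,
}

/// Compute the aggregator for block `b`, given a cached aggregator that is
/// valid up to `cached_hash` (a fold of all blocks from epoch start to it).
fn aggregator_for(
    blocks: &HashMap<u64, Block>,
    cached_hash: u64,
    cached: &EpochInfoAggregator,
    b: u64,
) -> EpochInfoAggregator {
    // Walk back from `b`, collecting per-block contributions, until we hit
    // either the cached hash or the first block of the epoch.
    let mut suffix = Vec::new();
    let mut cur = b;
    let mut result = loop {
        if cur == cached_hash {
            break cached.clone(); // combine with the current aggregator
        }
        let block = &blocks[&cur];
        suffix.push(block);
        if cur == block.epoch_first_block {
            break EpochInfoAggregator::default(); // recompute from epoch start
        }
        cur = block.prev_hash;
    };
    // Replay the collected suffix in chain order.
    for block in suffix.into_iter().rev() {
        result.append(block);
    }
    result
}

fn example_chain() -> HashMap<u64, Block> {
    // Chain 1 -> 2 -> 3, all in an epoch whose first block is 1.
    let mut blocks = HashMap::new();
    for (hash, prev) in [(1u64, 0u64), (2, 1), (3, 2)] {
        blocks.insert(
            hash,
            Block {
                prev_hash: prev,
                epoch_first_block: 1,
                proposals: vec![format!("p{hash}")],
            },
        );
    }
    blocks
}

fn main() {
    let blocks = example_chain();
    // Cached aggregator at block 2 already covers blocks 1 and 2.
    let mut cached = EpochInfoAggregator::default();
    cached.append(&blocks[&1]);
    cached.append(&blocks[&2]);
    // Hitting the cache: only block 3 is replayed on top of it.
    let agg = aggregator_for(&blocks, 2, &cached, 3);
    assert_eq!(agg.proposals, vec!["p1", "p2", "p3"]);
    // No usable cache: walk all the way back to the epoch start.
    let agg = aggregator_for(&blocks, 999, &EpochInfoAggregator::default(), 3);
    assert_eq!(agg.proposals, vec!["p1", "p2", "p3"]);
    println!("ok");
}
```

Note that the second call illustrates exactly the expensive case the cache exists to avoid: with no usable cached hash on the path, the whole epoch prefix is re-folded from an empty aggregator.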

This does not sound like a big issue for epoch sync, because as we sync to the first block of an epoch, our EpochInfoAggregator would begin empty. However, it is not that simple. We maintain the current EpochInfoAggregator by always updating it to the last final block. That makes sense, as the last final block is as far as we can advance without ever having to rewind the aggregator computation (rewind as in: when updating the aggregator, we would end up having to traverse all the way back to the epoch start). This presents a very awkward situation for epoch sync, because right after epoch sync, our header_head is at the first block F of the current epoch T:

L3 --> L2 --> L1 --> F
       Epoch T-1  | Epoch T

but our last final block may be either L2 or L3 (depending on whether F is an endorsement or a skip). Either way, when we do header sync afterwards, we are going to apply a block S on top of F:

L3 --> L2 --> L1 --> F --> S
       Epoch T-1  | Epoch T

Now, the last final block is either L1, L2, or L3. If the last final block does advance, say from L2 to L1, the EpochManager will have to update the EpochInfoAggregator to L1, and this is where the problem happens: we have no way to compute it. We are missing the entire epoch T-1, so even if we wanted to, we couldn't aggregate all the way back to the beginning of epoch T-1. In other words, there is no way for us to correctly compute the EpochInfoAggregator at L1.

So to solve this, we have a few options:

  • Forcefully initialize the EpochInfoAggregator at L3 to an empty state. This is incorrect, and as a result the computed aggregators at L2 and L1 will also be incorrect, but that is okay: we don't need the aggregator for epoch T-1 to compute anything, since the EpochInfo for T+1 has already been initialized by epoch sync. It also doesn't affect the aggregator computation for epoch T, since we would restart the aggregation at F anyway. The problem with this approach is that it's hacky, and we still need some additional conditionals in the code (not too bad) because we don't have the BlockInfo for L3, which the current aggregator code needs.
  • Make the EpochManager aware that L1 is our epoch sync boundary, so that when calculating the last final block of F or S, rather than returning L3 or L2, it always returns L1. This is fine for correctness because we will not be rewinding or encountering forks during epoch sync; we already know, as part of epoch sync, that L1 is final. This approach avoids having to compute the aggregator for anything in T-1. The disadvantage is that we now have a weird dependency just for this very subtle situation: no code other than epoch sync itself currently needs to know about an "epoch sync boundary".
  • Some other solution perhaps?
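As a rough illustration of the second option, the last-final-block computation could clamp its result at the epoch-sync boundary. The name, signature, and use of bare heights here are all hypothetical; the real EpochManager works with block hashes and BlockInfo:

```rust
/// Hypothetical sketch of option 2: never report a last final block older
/// than the epoch-sync boundary (L1 above), which epoch sync already
/// established as final.
fn clamped_last_final_height(computed_final: u64, epoch_sync_boundary: Option<u64>) -> u64 {
    match epoch_sync_boundary {
        // After an epoch sync, the boundary's height acts as a floor.
        Some(boundary) => computed_final.max(boundary),
        // Normal operation: no boundary, behavior unchanged.
        None => computed_final,
    }
}

fn main() {
    // Right after epoch sync (boundary at height 100), finality info that
    // points into epoch T-1 (e.g. height 98) is clamped to the boundary.
    assert_eq!(clamped_last_final_height(98, Some(100)), 100);
    // Once finality naturally advances past the boundary, it is unaffected.
    assert_eq!(clamped_last_final_height(105, Some(100)), 105);
    // Without epoch sync, nothing changes.
    assert_eq!(clamped_last_final_height(98, None), 98);
    println!("ok");
}
```

The appeal of this shape is that the aggregator code never sees a final block inside epoch T-1 at all, so the missing-history problem from above simply cannot arise; the cost is plumbing the boundary into the EpochManager.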