Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consensus: require multiple receipts to seal #450

Merged
merged 32 commits into from
Mar 16, 2021

Conversation

AlexHentschel
Copy link
Member

@AlexHentschel AlexHentschel commented Feb 23, 2021

production version of MainNet fix: #444 (Require multiple receipts to seal)

Overview of Changes

  • logic for requiring two receipts from different ENs before sealing;
  • logic to ensure that we re-request a second receipt (as a single one is insufficient now)
  • additional unit tests

and

  • general modularization and restructuring of sealing logic (matching.Core):
    • in sealableResults we now have distinct methods for that implement the various sealing checks or conditions:
      // Can we seal following the happy-path protocol?
      sealingStatus, err := c.hasEnoughApprovals(incorporatedResult, sealingTracker)
      // Emergency Sealing Fallback: only kicks in if we can't seal following the happy-path sealing
      emergencySealable := false
      if !happySealable {
      emergencySealable, err = c.emergencySealable(incorporatedResult, lastFinalized)
      hasMultipleReceipts := c.resultHasMultipleReceipts(incorporatedResult)
    • consolidated the logic for checking the happy path sealing conditions into the single method hasEnoughApprovals and removed a lot of duplicated requests / queries
    • consolidated logic for tracking approvals. Specifically optimized set-membership implementation which is now using map-based lookups O(1) instead of iterating over all elements to search for a match O(n):
      // BySigner returns a signer's signature if it exists
      func (c *SignatureCollector) BySigner(signerID Identifier) (*crypto.Signature, bool) {
      idx, found := c.signerIDSet[signerID]
      if !found {
      return nil, false
      }
      return &c.verifierSignatures[idx], true
      }

Thanks @jordanschalm and @zhangchiqing for your comments on the previous version of this PR: #444. I have addressed all your comments here.

Ran this on localnet with disabled receipts broadcast for the Execution Nodes, to confirm sealing is live.

Alexander Hentschel added 2 commits February 22, 2021 21:46
…re sealing;

* fixed logic for re-requesting second receipt (as a single one is insufficient now);
* added unit tests
Comment on lines +233 to +234
requester.WithRetryInitial(2*time.Second),
requester.WithRetryMaximum(30*time.Second),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default for the RetryMaximum was 2min, which is way too slow as it directly impacts time to seal , which is critical for NBA

// However, as soon as we have at least 2 receipts, the matching Engine will produce a candidate
// seal and put it in the mempool. To prevent spamming the logs, we log only when there is
// no seal in the IncorporatedResultSeals mempool:
_, sealAlreadyGenerated := c.seals.ByID(incorporatedResult.ID()) // IncorporatedResultSeal has the same ID as incorporatedResult
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caused some confusion in the previous version of this PR (see comments). I have revised the comment in response, specifically this part:

// seal and put it in the mempool. To prevent spamming the logs, we log only when there is
// no seal in the IncorporatedResultSeals mempool:
_, sealAlreadyGenerated := c.seals.ByID(incorporatedResult.ID()) // IncorporatedResultSeal has the same ID as incorporatedResult

Additional context:

Please note the naming convention for matching.Core:

  • the persistent storage components have names ending in DB, such as:
    headersDB storage.Headers // used to check sealed headers
  • Components without DB refer to mempools. Specifically, c.seals has the following type:
    seals mempool.IncorporatedResultSeals // holds candidate seals for incorporated results that have acquired sufficient approvals; candidate seals are constructed without consideration of the sealability of parent results
    It stores IncorporatedResultSeals, whose ID method is defined as
    // ID implements flow.Entity.ID for IncorporatedResultSeal to make it capable of
    // being stored directly in mempools and storage.
    func (s *IncorporatedResultSeal) ID() Identifier {
    return s.IncorporatedResult.ID()
    }

Alexander Hentschel and others added 2 commits February 23, 2021 14:43
engine/consensus/matching/core.go Outdated Show resolved Hide resolved
return false
}

l := c.log.Info().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this generate lots of info log?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change the log level. Let me know what you prefer.
The logging should essentially be almost identical to the initial version you built: #363 (comment) (including logging on info Level). The only noteworthy differences between your version and the revised behaviour as of this PR:

  • FirstUnmatchedChunkIndex: -1 is now omitted.
  • SealedByEmergency is only included when we actually evaluated the emergency sealing condition (as opposed to before, where the default value was false even if we didn't check whether the result qualified for emergency sealing)
  • we also have information whether 2 receipts for the result are known

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to use debug log.

The log I built only print logs for next unsealed result, whereas the added log here will print log for all unsealed results.

Since you've added extensive logging for the next unsealed result, I think this logging could be debug level

Alexander Hentschel added 8 commits March 4, 2021 12:25
ExecutedBlock *flow.Header // block the incorporated Result is for
IncorporatedResult *flow.IncorporatedResult // the incorporated result

// SufficientApprovalsForSealing: True iff all chunks in the result have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// SufficientApprovalsForSealing: True iff all chunks in the result have
// SufficientApprovalsForSealing: True if all chunks in the result have

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is intentional and not a typo 😉 :

[Wiki]: In logic and related fields such as mathematics and philosophy, "if and only if" (shortened as "iff") is a biconditional logical connective between statements, where either both statements are true or both are false.

// FirstUnmatchedChunkIndex: Index of first chunk that hasn't received
// sufficient approval (ordered by chunk index). Optional value: only set
// if SufficientApprovalsForSealing == False and nil otherwise.
FirstUnmatchedChunkIndex *uint64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have pointers here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a proper value for FirstUnmatchedChunkIndex in some cases. It only has a well-defined value if one of the chunks is missing approvals. If all chunks have sufficient approvals, what should we set FirstUnmatchedChunkIndex to? Previously, we used to set it to -1. Personally, I prefer to have this expressed more explicitly.

In Java, I would have implemented this using Optional. Did some looking around, and it seems people tend to use pointer types for this in Go (-> stackoverflow) ... open to alternative suggestions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To main purpose of the SealingTracker is to provide logging information. We don't include the values if they haven't been set by the sealing logic:

if record.firstUnmatchedChunkIndex != nil {
kvps["first_unmatched_chunk_index"] = *record.firstUnmatchedChunkIndex
}
if record.qualifiesForEmergencySealing != nil {
kvps["qualifies_for_emergency_sealing"] = *record.qualifiesForEmergencySealing
}
if record.hasMultipleReceipts != nil {
kvps["has_multiple_receipts"] = *record.hasMultipleReceipts
}

@durkmurder
Copy link
Member

Looks good to me, just a few comments related to tests and some minor stuff.

Comment on lines 69 to 71
if rs == nil {
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if rs is nil, then rs.SetHasMultipleReceipts would hit nil pinter first

Suggested change
if rs == nil {
return
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, having nil as legitimate value for SealingRecord is just too confusing and error-prone. Now working only with non-nil values SealingRecords. Hope that is clearer now 😅

engine/consensus/matching/sealingtracker/sealing_record.go Outdated Show resolved Hide resolved
ExecutedBlock: executedBlock,
IncorporatedResult: incorporatedResult,
}
st.records = append(st.records, record)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when do we remove the record from it, would the memory keeps growing as we record and hold more?

// have a child yet. Then, the chunk assignment cannot be computed.
// - All other errors are unexpected and symptoms of internal bugs, uncovered edge cases,
// or a corrupted internal node state. These are all fatal failures.
func (c *Core) hasEnoughApprovals(incorporatedResult *flow.IncorporatedResult, sealingTracker *sealingtracker.SealingTracker) (*sealingtracker.SealingRecord, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is too complex as it takes in a stateful sealingTracker type and returns another sealingRecord type.

I would suggest not to depend on those types in order to keep it simple.

This caller seems only care about whether there is enough approvals for the given result, and the FirstUnmatchedChunkIndex if not enough. In other words, we don't care about the chunk that had enough approvals, right?

So we could let this function return (FirstUnmatchedChunkIndex int, hasEnoughApprovals bool).

If we had enough approvals, we return (-1, true); if we the 3rd chunk didn't have enough approval, we return (2, false)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Agree. I was concerned about the complexity here as well.

I have refactored hasEnoughApprovals to:

func (c *Core) hasEnoughApprovals(incorporatedResult *flow.IncorporatedResult) (*sealingtracker.SealingRecord, error) {

While I still return the SealingRecord for the incorporatedResult, I have made several modifications which hopefully address the heart of your concern:

  • removed the stateful sealingTracker from the arguments
  • Regarding the usage of SealingRecord, I have applied significant simplifications:
    • SealingRecord is now directly generated by hasEnoughApprovals (i.e. I removed SealingTracker as a factory for SealingRecords)
    • SealingRecord is never nil and always reflects accurately the sealing status of the incorporated Result
    • Essentially, the SealingRecord holds the values (FirstUnmatchedChunkIndex int, hasEnoughApprovals bool) plus some other flags for later sealing checks.

Please let me know whether the code is clear enough with these revisions.

engine/consensus/matching/core.go Outdated Show resolved Hide resolved
Int("sealable_incorporated_results", len(sealedBlockIDs)).
Int64("duration_ms", time.Since(startTime).Milliseconds()).
Str("next_unsealed_results", sealingTracker.String()).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does sealingTracker.String() include all records, or just the next unsealed result?

Could you provide a sample log?

Better not all records, because there will be too much noise, as we only care about the next unsealed when debugging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only track the sealing status of the IncorporatedResult(s) for the next unsealed finalized block:

// Track tracks the given SealingRecord, provided the it should be tracked
// according to the SealingTracker's internal policy.
func (st *SealingTracker) Track(sealingRecord *SealingRecord) {
executedBlockID := sealingRecord.IncorporatedResult.Result.BlockID
if st.isRelevant(executedBlockID) {
st.records = append(st.records, sealingRecord)
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logging behaviour should be almost identical to the initial version you built: #363 (comment)

When missing approval for the next unsealed block

[{
"executed_block_height":2190957487,
"executed_block_id":"9efa9526c0ab51db9c5a904bfc589f95955c1659f6b439196c0146dc8e2a3694",
"first_unmatched_chunk_index":2,
"incorporated_result_id":"b159f2623bd9d660fa1125d05487d7658e5725fcba479568da139e880df8422e",
"number_chunks":17,
"qualifies_for_emergency_sealing":false,
"result_id":"53c8d0dd81bf4aef407331c2388bcdd53f4bd5ae5cb81ad5242a9ee6655706db",
"sufficient_approvals_for_sealing":false}]

When sealed by emergency

The SealedByEmergency becomes true

[{
"executed_block_height":2190957487,
"executed_block_id":"9efa9526c0ab51db9c5a904bfc589f95955c1659f6b439196c0146dc8e2a3694",
"first_unmatched_chunk_index":2,
"has_multiple_receipts":true,
"incorporated_result_id":"b159f2623bd9d660fa1125d05487d7658e5725fcba479568da139e880df8422e",
"number_chunks":17,
"qualifies_for_emergency_sealing":true,
"result_id":"53c8d0dd81bf4aef407331c2388bcdd53f4bd5ae5cb81ad5242a9ee6655706db",
"sufficient_approvals_for_sealing":false}]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for providing the log

// contains a receipt that commits to this result.
assignment, err := c.assigner.Assign(incorporatedResult.Result, incorporatedResult.IncorporatedBlockID)
// Can we seal following the happy-path protocol?
sealingStatus, err := c.hasEnoughApprovals(incorporatedResult, sealingTracker)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This behavior is different from what I expected.

What I expected was that:

  1. we will require multiple receipts to produce a seal
  2. in other words, we will never produce a seal if there is less two receipts.

Now the new behavior is that:

  1. we will require multiple receipts and either enough approvals or emergency sealing to produce a seal
  2. in other words, we won't produce a seal even if we've received multiple receipts, but haven't received enough approvals yet, and emergency sealing has not been triggered.

I'm not comfortable to have the sealing logic requiring enough approvals, because we still have one unsolved issue, which is verification node can't resume the verification on results which are produced when it's off. So I'm afraid the emergency sealing will often be triggered, and makes the sealing sometimes fast and sometimes slow, making it hard to debug.

I would suggest to keep the original plan to just seal with multiple receipts until we fix the sealing liveness issue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that the logic seals under the condition

we will require multiple receipts and either enough approvals or emergency sealing to produce a seal

This is also stated here in the code comment:

// Determine sealability:
// (i) the incorporatedResult must qualify for happy path sealing
// or qualify for emergency sealing
// AND
// (ii) there must be at least 2 receipts from _different_ ENs
// committing to the result
// comment: we evaluate condition (ii) only if (i) is true
if happySealable || emergencySealable {

Essentially, the consensus node already implements the general case, where we could require some approvals to seal (or wait until emergency sealing kicks in). With the parametrization DefaultRequiredApprovalsForSealConstruction = 0, the function hasEnoughApprovals always returns true (for any incorporatedResult). Therefore, (i) is always true and the sealing condition

 (i) the incorporatedResult must qualify for happy path sealing 
      or qualify for emergency sealing 
 AND 
 (ii) there must be at least 2 receipts from _different_ ENs 
      committing to the result 

reduces to

there must be at least 2 receipts from _different_ ENs  committing to the result 

which is exactly what you expected.

Note:

From my perspective, the current logic should be the same as what we are currently running on Mainnet (PR #444):

if matched || emergencySealed {
if e.resultHasMultipleReceipts(incorporatedResult) {
// add the result to the results that should be sealed
results = append(results, incorporatedResult)
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the following explicit shortcut in hasEnoughApprovals

// shortcut: if we don't require any approvals, any incorporatedResult has enough approvals
if c.requiredApprovalsForSealConstruction == 0 {
return sealingTracker.SufficientApprovals(incorporatedResult), nil
}

  • which hopefully makes the logic clearer
  • it shortcuts some work of determining assignments if we don't require any approvals to seal

Copy link
Member

@zhangchiqing zhangchiqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, nice work!

return false
}

l := c.log.Info().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to use debug log.

The log I built only print logs for next unsealed result, whereas the added log here will print log for all unsealed results.

Since you've added extensive logging for the next unsealed result, I think this logging could be debug level

// (ii) there must be at least 2 receipts from _different_ ENs
// committing to the result
// comment: we evaluate condition (ii) only if (i) is true
if sealableWithEnoughApprovals || emergencySealable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] there are quite a few nested if and || conditions in this function, would be great to keep the logic flat, however I haven't found a cleaner way. Maybe something for later in the next phase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flattened the logic a bit:

// Determine sealability:
// (i) the incorporatedResult must qualify for happy path sealing
// or qualify for emergency sealing
// AND
// (ii) there must be at least 2 receipts from _different_ ENs
// committing to the result
// comment: we evaluate condition (ii) only if (i) is true
if !(sealableWithEnoughApprovals || emergencySealable) { // condition (i) is false
continue
}
hasMultipleReceipts := c.resultHasMultipleReceipts(incorporatedResult)
sealingStatus.SetHasMultipleReceipts(hasMultipleReceipts)
if !hasMultipleReceipts { // condition (ii) is false
continue
}
results = append(results, incorporatedResult) // add the result to the results that should be sealed

not sure whether this is simpler or just as complex (only in a different way) 😅

Alexander Hentschel added 4 commits March 16, 2021 11:45
// (ii) there must be at least 2 receipts from _different_ ENs
// committing to the result
// comment: we evaluate condition (ii) only if (i) is true
if !(sealableWithEnoughApprovals || emergencySealable) { // condition (i) is false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !(sealableWithEnoughApprovals || emergencySealable) { // condition (i) is false
if sealableWithEnoughApprovals == false && emergencySealable == false { // condition (i) is false

@AlexHentschel AlexHentschel merged commit f6e12e8 into master Mar 16, 2021
@AlexHentschel AlexHentschel deleted the alex/5341_two-receipts-to-seal branch March 16, 2021 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants