[Consensus] Log verifier ids for missing approvals #1324
Conversation
Hey Leo,
sorry, I am too late with my feedback on your PR #847.
I am really worried about the “unsealed reason”, because there are more subtle reasons why we might not seal the next result beyond the ones covered in your PR:
- Each individual result has its own reason for not being sealed. There might be multiple results for the same block, each unsealed for a different reason.
- In addition to the few reasons you cover in your PR, a result might remain unsealed because:
  - there is a mismatch between the end state of the last sealed result and the start state of the result pending sealing
  - we have only seen a single execution result (but we require results from two execution nodes)
  - results might be incorporated in a different fork, but not yet in the fork which we are extending, so we can’t seal
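To illustrate the point that each pending result can fail sealing for its own reason, here is a minimal sketch. The type and constant names are purely illustrative and do not come from the flow-go codebase:

```go
package main

import "fmt"

// UnsealedReason enumerates hypothetical reasons why a single execution
// result may not be sealable. Names are illustrative only.
type UnsealedReason int

const (
	InsufficientApprovals   UnsealedReason = iota // not enough approvals for some chunk
	StateMismatch                                 // last sealed end state != pending result's start state
	SingleExecutionResult                         // only one execution node produced this result
	IncorporatedOnOtherFork                       // result lives on a fork we are not extending
)

func (r UnsealedReason) String() string {
	switch r {
	case InsufficientApprovals:
		return "insufficient approvals"
	case StateMismatch:
		return "state mismatch with last sealed result"
	case SingleExecutionResult:
		return "result from only one execution node"
	case IncorporatedOnOtherFork:
		return "incorporated on a different fork"
	default:
		return "unknown"
	}
}

func main() {
	// Multiple results for the same unsealed block, each failing a
	// different sealing condition: a single "unsealed reason" string
	// cannot capture this situation.
	reasons := []struct {
		resultID string
		reason   UnsealedReason
	}{
		{"result-a", InsufficientApprovals},
		{"result-b", StateMismatch},
	}
	for _, r := range reasons {
		fmt.Printf("%s: %s\n", r.resultID, r.reason)
	}
}
```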
If you take a look at the logic in the block builder for determining which seals can be included, you will see that it is quite complex. To accurately describe why a specific result is not sealed, we would probably need similar logic in the sealing observation.
To be clear:
- The sealing observation gives you all the data you need to diagnose a sealing halt. I paid very close attention to that when refactoring it.
- What you need to add as a human is a detailed understanding of the sealing conditions, so you can identify the specific condition that is not satisfied.
There is a reason why it is hard to diagnose why sealing halted: because the sealing conditions are complex.
Oversimplifying the conditions where sealing halts really worries me:
Imagine we have a mainnet fire because sealing stopped. What would you prefer: a wrong diagnosis leading you to investigate an aspect that is not the problem, or no diagnosis, requiring you to draw your own conclusions from the data given?
I have a very strong opinion on this topic: if we want algorithmic help to diagnose a sealing halt, we need to make sure that the diagnosis is accurate. Otherwise, it is counter-productive. My preference would be to revert the merge and either implement it properly or remove the diagnosis.
```go
return fmt.Sprintf("unsealed block at height %v has been executed, the result has been incorporated on %v fork(s), "+
	"but no result has received enough approvals, check the chunks_with_insufficient_approvals field for more details",
	len(st.records),
```
The placeholder in `unsealed block at height %v` is filled with the parameter `len(st.records)`, i.e. with a record count rather than the block height.
`result has been incorporated on %v fork(s)`
This statement is only valid if there is one result. But we can have execution forks. Two results might be incorporated in the same fork.
```go
indices := make([]string, 0, len(chunksWithMissingApprovals))
for i, list := range chunksWithMissingApprovals {
	indices = append(indices, fmt.Sprintf("chunk_index: %v, verifier_ids: %v", i, list))
}
r.entries["chunks_with_insufficient_approvals"] = indices
```
So far, the sealing observation is structured as nested JSON, which allows post-processing the message. I feel we are flattening the data here, which unnecessarily removes structure:
- Conceptually, we have pairs: (`chunk_index`, IDs of verifiers from which we miss approvals)

I feel it would be beneficial to preserve this structure in the JSON:
```go
chunksInfo := make([]map[string]interface{}, 0, len(chunksWithMissingApprovals))
for i, list := range chunksWithMissingApprovals {
	chunk := make(map[string]interface{})
	chunk["chunk_index"] = i
	chunk["missing_approvals_from_verifiers"] = list
	chunksInfo = append(chunksInfo, chunk)
}
bytes, err := json.Marshal(chunksInfo)
if err != nil {
	bytes = []byte("failed to marshal data about chunks with missing approvals")
}
r.entries["chunks_with_insufficient_approvals"] = string(bytes)
```
this PR has been reverted in favour of https://github.com/onflow/flow-go/pull/1332/files
For https://github.com/dapperlabs/flow-go/issues/5851
This PR improves the sealing observation log to include the IDs of verifiers who didn't send approvals for the chunks they were assigned.