[Consensus] Log verifier ids for missing approvals #1324
Conversation
Hey Leo,
sorry, I am too late with my feedback on your PR #847.
I am really worried about the “unsealed reason”, because there are more subtle reasons why we might not seal the next result beyond the ones covered in your PR:
- Each individual result has its own reason for not being sealed. There might be multiple results for the same block, each unsealed for a different reason.
- In addition to the few reasons you cover in your PR, a result might remain unsealed because:
  - there is a mismatch between the end state of the last sealed result and the start state of the result pending sealing
  - we have only seen a single execution result (but we require results from two execution nodes)
  - results might be incorporated in a different fork, but not yet in the fork which we are extending, so we can’t seal
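To illustrate the point that each pending result can fail sealing for its own reason, here is a minimal sketch. The type and constant names are purely illustrative and do not come from the flow-go codebase:

```go
package main

import "fmt"

// UnsealedReason enumerates hypothetical reasons why a single execution
// result may not be sealable. Names are illustrative only.
type UnsealedReason int

const (
	InsufficientApprovals   UnsealedReason = iota // not enough approvals for some chunk
	StateMismatch                                 // last sealed end state != pending result's start state
	SingleExecutionResult                         // only one execution node produced this result
	IncorporatedOnOtherFork                       // result lives on a fork we are not extending
)

func (r UnsealedReason) String() string {
	switch r {
	case InsufficientApprovals:
		return "insufficient approvals"
	case StateMismatch:
		return "state mismatch with last sealed result"
	case SingleExecutionResult:
		return "result from only one execution node"
	case IncorporatedOnOtherFork:
		return "incorporated on a different fork"
	default:
		return "unknown"
	}
}

func main() {
	// Multiple results for the same unsealed block, each failing a
	// different sealing condition: a single "unsealed reason" string
	// cannot capture this situation.
	reasons := []struct {
		resultID string
		reason   UnsealedReason
	}{
		{"result-a", InsufficientApprovals},
		{"result-b", StateMismatch},
	}
	for _, r := range reasons {
		fmt.Printf("%s: %s\n", r.resultID, r.reason)
	}
}
```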
If you take a look at the logic in the block builder for determining which seals can be included, you will see that it is quite complex. To accurately describe why a specific result is not sealed, we would probably need similar logic in the sealing observation.
To be clear:
- The sealing observation gives you all the data you need to diagnose a sealing halt. I paid very close attention to that when refactoring it.
- What you need to add as a human is a detailed understanding of the sealing conditions, so you can identify the specific condition that is not satisfied.
There is a reason why it is hard to diagnose why sealing halted: because the sealing conditions are complex.
Oversimplifying the conditions where sealing halts really worries me:
Imagine we have a mainnet fire because sealing stopped. What would you prefer: a wrong diagnosis leading you to investigate an aspect that is not the problem, or no diagnosis, requiring you to draw your own conclusions from the data given?
I have a very strong opinion on this topic: if we want algorithmic help to diagnose a sealing halt, we need to make sure that the diagnosis is accurate. Otherwise, it is counter-productive. My preference would be to revert the merge and either implement it properly or remove the diagnosis.
```go
return fmt.Sprintf("unsealed block at height %v has been executed, the result has been incorporated on %v fork(s), "+
	"but no result has received enough approvals, check the chunks_with_insufficient_approvals field for more details",
	len(st.records),
```
The placeholder in `unsealed block at height %v` is filled with the parameter `len(st.records)`, i.e. with a record count rather than the block height.
`result has been incorporated on %v fork(s)`
This statement is only valid if there is one result. But we can have execution forks. Two results might be incorporated in the same fork.
```go
indices := make([]string, 0, len(chunksWithMissingApprovals))
for i, list := range chunksWithMissingApprovals {
	indices = append(indices, fmt.Sprintf("chunk_index: %v, verifier_ids: %v", i, list))
}
r.entries["chunks_with_insufficient_approvals"] = indices
```
So far, the sealing observation is structured as nested JSON, which allows post-processing the message. I feel we are flattening the data here, which unnecessarily removes structure:
- Conceptually, we have pairs: (`chunk_index`, IDs of verifiers from which we miss approvals)

I feel it would be beneficial to preserve this structure in the JSON:
```go
chunksInfo := make([]map[string]interface{}, 0, len(chunksWithMissingApprovals))
for i, list := range chunksWithMissingApprovals {
	chunk := make(map[string]interface{})
	chunk["chunk_index"] = i
	chunk["missing_approvals_from_verifiers"] = list
	chunksInfo = append(chunksInfo, chunk)
}
bytes, err := json.Marshal(chunksInfo)
if err != nil {
	bytes = []byte("failed to marshal data about chunks with missing approvals")
}
r.entries["chunks_with_insufficient_approvals"] = string(bytes)
```
this PR has been reverted in favour of https://github.com/onflow/flow-go/pull/1332/files
For https://github.com/dapperlabs/flow-go/issues/5851
This PR improves the sealing observation log to include the IDs of verifiers who didn't send approvals for the chunks they were assigned.