Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Consensus] Log verifier ids for missing approvals #1324

Merged
merged 2 commits into from Sep 22, 2021

Conversation

zhangchiqing
Copy link
Member

For https://github.com/dapperlabs/flow-go/issues/5851

This PR improves the sealing observation log to include the verifiers who didn't send approvals for chunks they were assigned.

@zhangchiqing zhangchiqing merged commit 4342cd4 into master Sep 22, 2021
@zhangchiqing zhangchiqing deleted the leo/log-verifier-ids-for-missing-approvals branch September 22, 2021 17:49
Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Leo,
sorry I am too late with my feedback of your PR #847.
I am really worried about the “unsealed reason”, because there are more subtle reasons why we might not seal the next result in addition to the ones covered in your PR:

  • There is a reason why an individual result is not sealed. But there might be multiple results, each with a different reason why it is not sealed.
  • In addition to the few reasons you cover in your PR, additional reasons for not sealing might be:
    • there is a miss-match between the end state of the sealed result and the start state of the result pending sealing
    • we have only seen a single execution result (but we require results from two execution nodes)
    • results might be incorporated in a different fork, but not yet in the fork which we are extending, so we can’t seal

If you take a look at the logic in the block builder for determining which seals can be included, the logic is quite complex. To accurately describe why a specific result is not sealed, we would probably need to have similar logic in the sealing observation.

To be clear

  • The sealing observation gives you all the data you need to diagnose a sealing halt. I have payed very close attention to that when refactoring it.
  • What you need to add as a human is a detailed understanding of the sealing conditions, so you can identify the specific condition that is not satisfied.

There is a reason why it is hard to diagnose why sealing halted: because the sealing conditions are complex.

Oversimplifying the conditions where sealing halts really worries me:
Imagine we have a mainnet fire because sealing stoped. What would you prefer: a wrong diagnosis leading you to investigate an aspect that is not a problem? No diagnosis, requiring you draw your own conclusion from the data given.

I have very strong opinion about this topic: If we want algorithmic help to diagnose a sealing halt, we need to make sure that the diagnosis is accurate. Otherwise, it is counter-productive. My preference would be to revert the merge and either implement it properly or remove the diagnosis.


return fmt.Sprintf("unsealed block at height %v has been executed, the result has been incorporated on %v fork(s), "+
"but no result has received enough approvals, check the chunks_with_insufficient_approvals field for more details",
len(st.records),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
unsealed block at height %v is filled with parameter len(st.records)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
result has been incorporated on %v fork(s) This statement is only valid if there is one result. But we can have execution forks. Tow results might be incorporated in the same fork

Comment on lines +35 to 39
indices := make([]string, 0, len(chunksWithMissingApprovals))
for i, list := range chunksWithMissingApprovals {
indices = append(indices, fmt.Sprintf("chunk_index: %v, verifier_ids: %v", i, list))
}
r.entries["chunks_with_insufficient_approvals"] = indices
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, the sealing observation is structured as a nested json, which allows to post-process the message. I feel, we are flattening the data here, which unnecessarily removes structure:

  • Conceptually, we have pairs: (chunk_index, IDs of verifiers from which we miss approvals)

I feel it would be beneficial to preserve this structure in the json:

chunksInfo := make([]map[string]interface{}, 0, len(chunksWithMissingApprovals))
for i, list := range chunksWithMissingApprovals {
	chunk := make(map[string]interface{})
	chunk["chunk_index"] = i
	chunk["missing_approvals_from_verifiers"] = list
	chunksInfo = append(chunksInfo, chunk)
}
bytes, err := json.Marshal(r)
if err != nil {
	bytes = []byte("failed to marshal data about chunks with missing approvals")
}
r.entries["chunks_with_insufficient_approvals"] = string(bytes)

@AlexHentschel
Copy link
Member

this PR has been reverted in favour of https://github.com/onflow/flow-go/pull/1332/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants