-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Back Error/Debug Logs #7922
Conversation
@@ -92,6 +93,7 @@ func (s *Service) validateBeaconBlockPubSub(ctx context.Context, pid peer.ID, ms | |||
s.pendingQueueLock.RUnlock() | |||
|
|||
if err := helpers.VerifySlotTime(uint64(s.chain.GenesisTime().Unix()), blk.Block.Slot, params.BeaconNetworkConfig().MaximumGossipClockDisparity); err != nil { | |||
log.WithError(err).WithField("blockSlot", blk.Block.Slot).Debug("Rejecting incoming block.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we revert this line?
1.) It's odd to have Rejecting incoming block
just for this time check. If we do want this then we should add this rejecting incoming back to every validation step. (Even for debug)
2.) This hasn't been relevant since we had the time bug
3.) Maybe we also can downgrade it to trace? and something like this for all the validation steps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its pretty important to know if a block has been rejected, unlike attestations blocks occur only once every 12 seconds. Node operators will have very little idea if their local clocks are in sync with the rest of the network if these logs disappear. They would see poor attestation performance, voting on the wrong head etc but wouldn't be able to tie it in with an out of sync clock if this does happen. Rejecting a block happens rarely enough that it is fine to keep it in debug. On 2) , that isn't really true, a lot of users still see that log locally when their clock is out of sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. My point is we should value consistency here. Not just have debug logging for 1-2 of the steps (due to previous incidents) and not all the steps. What do you think about?
1.) Add debug logging for rest of the steps (e.g. block descendants, signature... etc)
2.) Conform to the same prefix. (e.g. Could not validate incoming back:
)
Let me know if you want to do that here? I'm happy to do it as well either push it to this PR or open a new PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good 👍 , agree with standardizing logs for all our debug messages here otherwise its ugly looking. I can fix this tomorrow too, or you are welcome to also clean it up if you are free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. I can help. Thank you!
@@ -80,6 +80,7 @@ func (s *Service) validateBeaconBlockPubSub(ctx context.Context, pid peer.ID, ms | |||
} | |||
// Check if parent is a bad block and then reject the block. | |||
if s.hasBadBlock(bytesutil.ToBytes32(blk.Block.ParentRoot)) { | |||
log.Debugf("Received block with root %#x that has an invalid parent %#x", blockRoot, blk.Block.ParentRoot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we revert this line?
1.) It's odd to have a particular debug just for hasBadBlock
. If we do want to keep it then we should add it for the rest of the validation step
2.) I think this was only needed for debugging a bug, this hasn't shown up in a while
3.) Maybe we also can downgrade it to trace? and something like this for all the validation steps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still relevant, users have reported seeing this log and right after their node being stuck before resync
saves it. We still have an unresolved issue with the proposer index cache which leads to any of its descendant blocks being rejected despite all blocks being valid. Removing this debug log, makes it hard to really know what is happening.
What type of PR is this?
UX
What does this PR do? Why is it needed?
-[x] In #7914 some important sync logs were removed or pushed down to debug. This PR adds them back.
Which issues(s) does this PR fix?
N.A
Other notes for review