Basics of Cert-Count Non-Locking Telemetry #16676
…to capture (and test for) duplicates.
For anyone looking in: after discussion internally, we've settled on it being more important not to lock (and block all issuance and revocation) on startup than to have perfectly correct metrics here. Instead, we'll do a "best effort" metric, which might include some double counting of certificate entries. If we do miscount, we want to make sure we 'overcount' rather than undercount the number of certs in storage. (https://hashicorp.slack.com/archives/C0386B7KPHR/p1660332542341629)
Good stuff, I do like where this is headed. Sorry though I went a little nuts on the comments...
}

func (b *backend) decrementTotalRevokedCertificatesCount() {
	atomic.AddUint32(b.revokedCertCount, ^uint32(0))
I'm pretty sure this can't happen in the current code/usage I'm seeing with the two decrements, but should we add a guard here for b.revokedCertCount being 0, so that we don't underflow the uint32? The only safe way I can think of doing this is this ugly code block:
testInt := uint32(1)
for i := 0; i <= 5; i++ {
	oldVal := atomic.LoadUint32(&testInt)
	if oldVal == 0 {
		break
	}
	newVal := oldVal - 1
	if atomic.CompareAndSwapUint32(&testInt, oldVal, newVal) {
		break
	}
}
I admit I don't like this. If someone ever saw an underflow it would be solvable by "turning off and turning on" and would be a really really helpful report that we aren't honouring the promise of "always overcount".
If someone else agrees with this solution, I'll go ahead and add it, but in general I don't think we can protect against people writing bad code in the future with uglier code now.
Looks good to me, thanks for addressing all my concerns overall.
This is the basics of the Certificate Count Telemetry.
Lock is Removed. See Comment.
When we store a cert now, if the existing certs haven't been counted yet, that cert is added to a slice which will later be checked for double counting.
I'm pretty sure I got every place that a cert is stored: