-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulkheads in Metrics Only mode #72
Conversation
This is for greater diagnostics on Mjolnir breakers and bulkheads.
Looks good to me. The gauge logger has a few gnarly bits, but given that it's a logger that's probably okay as long as it doesn't deadlock or otherwise impact a thread that's doing the actual work. |
@robhruska is there something specific about the logger/ConcurrentDictionary in there that you feel could be avoided? |
Yeah, I'll comment inline. |
{ | ||
_concurrencyExceededLog.Debug($"BulkheadRejections since last gauge - [Bulkhead={bulkheadName}, BulkheadType={bulkheadType}, MaxConcurrent={maxConcurrent}, CountAvailable={countAvailable}, Rejections={currentCount}]"); | ||
// Remove the rejections we've just logged from the current rejection count | ||
_currentBulkheadsRejected.AddOrUpdate(bulkheadName, 0, (b, c) => c - currentCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be able to get rid of some of the code here that calculates deltas, and instead just keep a running incrementing count of rejections. The logging aggregator/search (e.g. Sumo Logic, Splunk) could possibly be used to find the per-window counts, depending on its capabilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm yeh but it could be tricky to write that kind of query in something like Sumo Logic. The context of a log message is only valuable with past data. It would also reset to 0 on a restart/redeploy of the app so you would need to some how factor that into an higher level log aggregation.
_metricEvents.RejectedByBulkhead(bulkhead.Name, command.Name); | ||
throw new BulkheadRejectedException(); | ||
if (!onlyMetrics) throw new BulkheadRejectedException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will change the flow of semaphore.
We'll be in this code when TryEnter() will return false. That will indicate that we don't have space in a semaphore. Still then later on in that function we'll call Release() function on that semaphore which will increase it's CurrentValue
. This will potentially cause to increase maximum allowed value in the semaphore.
Scenario:
Assuming that the semaphore.CurrentCount = 2
and at the very same time we'll have 5 threads try to enter it, 2 will get inside (with above if not being hit) resulting with CurrentCount to be set to 0 and after execution increased to 2 again. The problem starts with remaining 3. Normally (without onlyMetrics config) those would throw and then would not call.Release() code below. But because we are not returning function here, Release() will be call and we'll end up with CurrentCount = 5.
So I think that what we created here will be a self-regulated semaphore ;p. Which over time have CurrentCount equal to number of maximum concurrent calls. The think is I guess that was not the intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good critique @damtur, you're totally correct there. I'll refactor a bit more and cc you again for another pass.
cc @damtur. I think this should be good now. I'm going to add a couple of unit tests around this now. |
@@ -123,7 +125,7 @@ public TResult ExecuteWithBulkhead<TResult>(SyncCommand<TResult> command, Cancel | |||
} | |||
finally | |||
{ | |||
bulkhead.Release(); | |||
if (!onlyMetrics) bulkhead.Release(); | |||
|
|||
_metricEvents.LeaveBulkhead(bulkhead.Name, command.Name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more question. Do we want the LeveBulkhead metrics for those calls which would have tripped bulkhead if only metrics was not set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do, as it will give the full picture. In reality the call has gone through the bulkhead successfully, because it's disabled. But I could see it skewing metrics if they were being collected in a different way to logs, so I could be swayed to not make the Enter and Leave calls in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, as long as it's not confusing and clear what we want to achieve here I'm ok with anything. Just wanted to highlight how it works now, and make sure this is the intention.
v3.2.0
This adds the ability to put Bulkheads into a metrics only mode, through configuration at a per bulkhead or global level.
If bulkhead concurrency isn't properly tuned in a production scenario then it's possible that you get a lot of
BulkheadRejectionExceptions
that actually would have been safe to execute without a throttle. These config keys add the ability to quickly turn off bulkhead rejections during a production scenario like this.We felt like it was still important to get metrics for rejections during a time when they're effectively not rejecting, as this may help with tuning the concurrency levels in the future.
This PR also adds an
IMetricEvent
implementation, withGaugeLogMetrics
to help with some of the diagnostic actions mentioned above.The wiki is being updated with relevant information in the
MARM-UpdateDocsForBulkheadMetricsOnly
branch of the wiki repo.