-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Circuit breaker spike march16 added windows #1
Circuit breaker spike march16 added windows #1
Conversation
…s ensuring that the windows are being used correctly by the circuit breaker.
…mplement the interface. This allows for changing the implementation of the health metrics without changing the code in the controller. Implemented a new strategy of health metrics which does not have rolling windows, but instead a single window. At the moment the decision on which 'strategy' to use is made in the controller, but this can be moved out of the controller and be handled in the syntax along with checks on the parameters.
…ice duration value.
@kristianhald Agree. Aiming to complete a read of this later today. No big issues so far 👍 |
// therefore not be removed from the queue. | ||
if (now - _windows.Peek().StartedAt < _timesliceDuration) | ||
break; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kristianhald Are you happy if, around here, we simply do the following:
while (_windows.Count > 0 && (now - _windows.Peek().StartedAt >= _timesliceDuration))
_windows.Dequeue();
It keeps the logic of this and code above similar, and seems straightforwardly enough comprehensible: "while there are out-of-date slices at the front of the queue, remove them".
Either you can commit this (becomes part of the PR) or I can just do it after merging.
@kristianhald Thanks again for this great contribution! Having read over the code, my only detail comment is the one above around the Reading over the tests next... |
@reisenberger Thank you. Regarding the while loop, 👍. |
Hey @kristianhald. Here is a conceptual issue. I observe that, from our implementation, the windows we create within the timeslice are not necessarily adjacent. For instance, imagining windows of size 1, if the first action records at t = 0, and the second action at t = 1.5, we will be running two windows from 0->1, and 1.5->2.5, with a 0.5 gap in between them. Reasoning suggests this has no effect on the statistics, as the gaps only represent 'empty' times with no results. I considered an outside case: imagining windows of size 1 in slice size 10, we could have a window at t = 0 and another at t = 9.9, this latter in theory extending +1 to t = 10.9 ... Again, reasoning suggests that if any statistic comes in at t >= 10, the t = 0 window will be dropped, so at no point will we be considering statistics for a period >10. However, it might be nice (for regression value, in case anyone refactors...) to have a test proving this. It looks like this could be done by duplicating the test Really liking the look of the tests. In general: Do you agree that the non-adjacent windows issue has no effect on statistics, or am I missing something? Perhaps controversially, my first instinct is not to code to make windows adjacent before merging. It adds complication and doesn't add anything to solving the problem at hand. Following the mantra 'Keep the code as simple as possible to deliver the job needed', we don't need to correct for this, and the current creation/disposal of window code reads intuitively. However, there would be future scenarios (eg emitting regular statistics, as Hystrix does) where it would become important. And interested in your thoughts... Note: I will document this also in the main thread on App-vNext/Polly, since it's important enough to have in the history for future maintenance. Let me have your thoughts. Thanks! |
... unless you already have such a test ... (not quite read all; called away from kbd now) |
I am under the impression, like you state, that as long as we remove buckets/windows that are older than Did a little thinking and it might actually be quite easy to add the logic for adjacent buckets/windows. Lets take the following scenario: At the current moment in time, DateTime.Now() reads 26.457 seconds and an error occurs. As the 26.5 bucket is in the future, then this cannot be created. Therefore the bucket we are in is 25.8. The question is then, how do we get from 26.457 to 25.8 without having to create all buckets from 23 seconds. The answer (I think 😄. Just throwing a quick idea here) is the following calculation: If we take the above numbers we should get: It will still have empty buckets as holes, but it should align the buckets to each other. Question: Would aligning the buckets actually solve the statistics issue?
When you say 'regular' do you mean bucket/window duration regular, every 1 second regular or a delegate call, when an event occurs (Success or Error)? |
@reisenberger Don't have such a test and I believe that you are correct that we should have one to ensure that it works as expected. I think that the test would ensure the outer case is valid. |
If statistics should be provided, what is the time interval given? If we have a window duration of 0.7 seconds will that match the time interval of the statistics Polly gives or should Polly give statistics for every second or every five seconds? It is an entirely different discussion, so do not believe we should try to solve it here. I just wanted to point out that if the window durations do not align with the statistical interval that is being emitted, then it will not be possible for Polly to give the correct statistics based on the data from the windows, as the statistical interval can begin or end in the middle of a window, thereby not knowing which of the events from the window belongs in the statistical interval and which are outside. I think? |
…her, then having a bucket that starts just before the timeslice duration ends, does not result in the calculation of the HealthCount being different, than if the windows were aligned.
At regular intervals (every call = too frequent). But the frequency of emitting these statistics would absolutely be aligned to either Options for adding this (later) to Polly:
OR (more timely, fractionally more code)
Absolutely. See the (ambitious but exciting!) roadmap (do add feedback under the placeholder issue if you have thoughts!). My long-term goal is to allow variety of Polly resilience strategies to be chained together in a kind of wrap / pipeline syntax (see the roadmap), but with a more flexible model than Hystrix. So that users can choose which elements they want to combine (rather than have to learn/consider the whole lot) - either choose just one strategy, or make a more complicated wrap, as they choose. Rather than emitting statistics now, my (current) vision is of adding emitting more broadly-encompassing statistics to Polly after more of the pipeline approach/more of the resilience strategies are in place ... not yet got around to adding this to the roadmap ... |
Re:
Nice! Building on your suggestion, maybe:
Heh, we even have I marginally prefer this (smaller subtraction, modulus; easier comprehension), but it leaves an issue [another code branch needed] when the circuit is reset and there isn't a previous window ... if aligning across circuit resets matters ... |
Although re:
if users have complete flexibility, eg of setting circuit-breaker and retry intervals entirely unrelated, then wrapping those policies together and emitting time-regulated statistics from the combined [but different time-intervalled] wrap could (would) be problematic. Not got a final view on this ... but currently seeing stats in context of bigger vision (rather than, eg, adding stats only for breaker now). |
Weeeeeeeell, we could be a little lazy and define a reset as the 'StartTime' change: public void Reset_NeedsLock()
{
_currentWindow = null;
_windows.Clear();
} to public void Reset_NeedsLock()
{
_currentWindow = null;
_windows.Clear();
long now = SystemClock.UtcNow().Ticks;
_currentWindow = new HealthCount { StartedAt = now };
_windows.Enqueue(_currentWindow);
} That would change the start time of the entire timeslice, but allow the logic to still use the current windows StartedAt. If the system should still have windows aligned acrossed resets, then something like this would probably solve it (Haven't thought through all iterations): public void Reset_NeedsLock()
{
ActualiseCurrentMetric_NeedsLock();
_windows.Clear();
_currentWindow = new HealthCount { StartedAt = _currentWindow.StartedAt };
_windows.Enqueue(_currentWindow);
} |
This code assumes that the ActualiseCurrentMetric aligns the windows next to each other. |
Hey @kristianhald Briefly to say, don't add this (aligning) to current PR yet. It's great stuff, just want to step back reflect whether we're adding it or not ... (tho the more concise we punch it down, the better candidate it gets ...) |
No problem. I believe that both would work in ensuring alignment of windows even across resets, but it would only ensure alignment of windows and not provide any more or less exactness of the breaker. In regards to overhead of the calculation, I do not believe it so much, since it would only be done every window duration. But it should be performance tested and we should have a baseline to compare against. So if we believe the current implementation solves the feature, then I suggest that should be the baseline for any additional implementation. I also would like to have the feature packaged, so I can begin using it 😃. |
Ok, back to possible solutions: how about this? (building again on yours) private void ActualiseCurrentMetric_NeedsLock(long alignTo = 0) // or a separate overload rather than optional parameter, probably clearer
{
if (alignTo == 0) alignTo = _currentWindow.StartedAt;
long now = SystemClock.UtcNow().Ticks;
if (_currentWindow == null || now - _currentWindow.StartedAt >= _windowDuration)
{
_currentWindow = new HealthCount { StartedAt = now - ((now - alignTo) % windowDuration) };
_windows.Enqueue(_currentWindow);
}
while (_windows.Count > 0 && (now - _windows.Peek().StartedAt >= _timesliceDuration))
_windows.Dequeue();
} with: public void Reset_NeedsLock()
{
long alignWindowAfterReset = _currentWindow != null ? _currentWindow.StartedAt : SystemClock.UtcNow();
_windows.Clear();
_currentWindow = null;
ActualiseCurrentMetric_NeedsLock(alignWindowAfterReset);
} (I slightly prefer this, as expresses clearly what the alignment about, and avoids enqueuing, clearing, then enqueuing ... easier later to understand why ... but all building on your suggestion) Happy for views. Will have a think, your original 'start of universe' had simplicity too |
could be I'm making it too complicated. Could just maintain a separate private long alignTo on the class, and update it periodically (if necessary, even). Switching back to getting merge-down in hand without this, for now |
I actually also prefer the 'start of universe' setting. It would mean that we might start the circuit breaker a bit unaligned (When the circuit breaker is created first time, then the first window might be cut of). However, since it normally takes a little time for an application to start up, then I do not think this would be noticed. Having the 'start of universe' setting would mean that windows are always aligned even when reset, since it does not change. What would be the benefit of setting alignTo different from zero? I am not certain, but I would believe it takes the same amount of computation to calculate Should reset change the alignment of the windows or just continue as if the windows have been cleared? The latter would signify that how the timeslice is started is not affected, only the contents of the timeslice is affected, when doing a reset. EDIT: Its late. Going to bed 😄 Also this might be a talk on the more public forum. |
Excellent tests. Merged to my fork. |
Agree. Do you have any objection if I copy important parts of this conversation (correctly attributed to each person) over to the main issue 41 on Polly? |
Nope, I think it will be a very good idea to do it, so that we can get input from other people 😃 |
I think that doing a merge with your fork is probably a good idea, if your review doesn't uncover any big issues.
Any future changes, I can just create a new branch and another pull request.