Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: optimise rate calculation #810

Merged
merged 22 commits into from
Jul 7, 2023
Merged

chore: optimise rate calculation #810

merged 22 commits into from
Jul 7, 2023

Conversation

yhl25
Copy link
Contributor

@yhl25 yhl25 commented Jun 28, 2023

Closes #788

  • Optimized rate calculation as a slope ~ delta(count)/delta(t), instead of summing the count deltas between each of the timestamp interval pairs
  • Optimized the search for startIndex of TimestampedCounts from linear to binary
  • Rate calculation remains linear in case of pod restart, which is a rare case

veds-g and others added 15 commits June 22, 2023 18:23
Signed-off-by: veds-g <guptavedant2312@gmail.com>
Signed-off-by: Yashash H L <yashashhl25@gmail.com>
Signed-off-by: veds-g <guptavedant2312@gmail.com>
Signed-off-by: Yashash H L <yashashhl25@gmail.com>
Signed-off-by: veds-g <guptavedant2312@gmail.com>
Signed-off-by: Yashash H L <yashashhl25@gmail.com>
Signed-off-by: Yashash H L <yashashhl25@gmail.com>
Signed-off-by: Yashash H L <yashashhl25@gmail.com>
Signed-off-by: Yashash H L <yashashhl25@gmail.com>
Signed-off-by: veds-g <guptavedant2312@gmail.com>
Signed-off-by: veds-g <guptavedant2312@gmail.com>
Signed-off-by: veds-g <guptavedant2312@gmail.com>
Signed-off-by: veds-g <guptavedant2312@gmail.com>
Signed-off-by: veds-g <guptavedant2312@gmail.com>
@yhl25 yhl25 marked this pull request as ready for review July 6, 2023 06:08
@yhl25 yhl25 requested review from whynowy and vigith as code owners July 6, 2023 06:08
@whynowy
Copy link
Member

whynowy commented Jul 6, 2023

Can anybody explain to me why we introduced the concept of WidowClosed?

@vigith
Copy link
Member

vigith commented Jul 6, 2023

Can anybody explain to me why we introduced the concept of WidowClosed?

#795. Ideally, this should be removed by #788

@whynowy
Copy link
Member

whynowy commented Jul 6, 2023

Can anybody explain to me why we introduced the concept of WidowClosed?

#795. Ideally, this should be removed by #788

I was expecting the delta calculation happens during GetRate() which is memory operation only, would not cause any performance issue, there's no need to introduce the complexity to calculate it during metrics fetching.

@vigith
Copy link
Member

vigith commented Jul 6, 2023

I was expecting the delta calculation happens during GetRate() which is memory operation only, would not cause any performance issue, there's no need to introduce the complexity to calculate it during metrics fetching.

Won't GetRate be called frequently for auto-scaling?

@whynowy
Copy link
Member

whynowy commented Jul 6, 2023

I was expecting the delta calculation happens during GetRate() which is memory operation only, would not cause any performance issue, there's no need to introduce the complexity to calculate it during metrics fetching.

Won't GetRate be called frequently for auto-scaling?

3 times in 30 seconds.


// finalize the window by setting isWindowClosed to true and delta to the calculated value
// CloseWindow closes the window
func (tc *TimestampedCounts) CloseWindow() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this since the window is not closed only for the latest timestamp (last element of the queue) which we skip for rate calculation for the same reason.

Comment on lines 49 to 52
items[0].CloseWindow()
default:
// if the queue has more than one element, we close the window for the most recent element
items[n-1].CloseWindow(items[n-2])
items[n-1].CloseWindow()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are calling CloseWindow for all the cases? is it required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@veds-g
Copy link
Contributor

veds-g commented Jul 7, 2023

Can anybody explain to me why we introduced the concept of WidowClosed?

#795. Ideally, this should be removed by #788

I was expecting the delta calculation happens during GetRate() which is memory operation only, would not cause any performance issue, there's no need to introduce the complexity to calculate it during metrics fetching.

Delta calculation only happens once GetRates() is called, prior to that just the counts are fetched and updated in the queue.

yhl25 added 2 commits July 7, 2023 10:10
Signed-off-by: Yashash H L <yashashhl25@gmail.com>
@yhl25 yhl25 requested a review from vigith July 7, 2023 05:28
Signed-off-by: veds-g <guptavedant2312@gmail.com>
@veds-g veds-g merged commit 5639e5c into main Jul 7, 2023
14 checks passed
@veds-g veds-g deleted the optimise-rate branch July 7, 2023 15:04
// TODO: revisit this logic, we can just use the slope (counts[endIndex] - counts[startIndex] / timeDiff) to calculate the rate.
rate := getDeltaBetweenTimestampedCounts(counts[startIndex], counts[endIndex], partitionName) / float64(timeDiff)

// positive slope, meaning there was no restart in the last lookback seconds
Copy link
Member

@KeranYang KeranYang Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

positive slope, meaning there was no restart in the last lookback seconds

@yhl25 @veds-g , does this statement hold true when a pod restarted and then processed more messages than before restart?

Consider a single pod single partition scenario and let's assume the pod process 5 messages per second.
t1 -> 5
t2-> 10
t3 -> 0 (restart)
t4 -> 5
t5 -> 10
t6 -> 15

In this case, getDeltaBetweenTimestampedCounts(t1, t6) returns a positive delta but that doesn't mean no restart, right? It will also return us an incorrect rate of (15-5)/5 = 2 messages/second ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is indeed a problem; for now, i overlooked it because this can happen only if the pods are flapping, which will cause the prev value before restart to be closer to the current value.

please open an issue to track this, this has to be fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is, still use the approach of calculating each delta, again, it's just memory operation, won't have performance issue.

Copy link
Member

@KeranYang KeranYang Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whynowy +1. I will use a separate PR to address. #847

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Processing Rate calculation optimization
5 participants