Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INDY-1468] move avgLatency computing to EMA algorithm #812

Merged
merged 5 commits into from
Jul 19, 2018

Conversation

lampkin-diet
Copy link

Signed-off-by: Andrew Nikitin andrew.nikitin@dsr-corporation.com

Andrew Nikitin added 2 commits July 13, 2018 17:15
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
self._moving_average(curr_avg_lat,
duration))

def _moving_average(self, p0, p1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming p0 to something like accum or old_accum and p1 back to next_val will make intentions more clear


def add_request(self, ordered_ts):
self.update_time(ordered_ts)
self.reqs_in_window += 1

def _moving_average(self, next_val):
def add_duration(self, identifier, duration):
if identifier not in self.avg_latencies:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setdefault can be used here to avoid triple lookup


def add_request(self, ordered_ts):
self.update_time(ordered_ts)
self.reqs_in_window += 1

def _moving_average(self, next_val):
def add_duration(self, identifier, duration):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need separate add_duration and add_request? Why not single add_request(id, timestamp, tto/time_to_order/latency)?

Also can we come up with some name for identifier so it can be more clear what it identifies?

total_reqs, curr_avg_lat = self.avg_latencies[identifier]
total_reqs += 1
self.avg_latencies[identifier] = (total_reqs,
self._moving_average(curr_avg_lat,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using moving average function to calculate average latency here, and it's using alpha which is based on throughput_min_cnt. Either "throughput" part should be removed from names of variables that control alpha, or separate set for latencies should be created.

Also here we're calculating moving average over latencies taking into account only their number, not their temporal placement. When we do the same with throughput we get measurements at well defined intervals, so we can really say that throughput is averaged over window with well defined duration. Yet this is not a case with latencies now, do we really intend that here?

pass
def get_avg_latency(self, identifier):
if identifier not in self.avg_latencies:
return .0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not None?

return None
self.update_time(request_time)
return self._moving_average(self.reqs_in_window / self.inner_window)
return self._moving_average(self.throughput, self.reqs_in_window / self.throughput_window_size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some thinking I came to conclusion that if we want to get smooth averages between window boundaries it's not that simple, and what's done here is not helping in any way. Imagine you had throughput of 1, then you measure it before you get another request and this gives you values less than 1, which is yet constant until you get a request, after which throughput jumps back to 1 and remains here until next window starts, at which point it jumps below 1 again. So, I believe right thing to do here is to either just return self.throughput or come up with better solution (which can turn out quite complicated). I would just stick with simple options for now.

assert rm.avg_latencies['some_client_identifier'][1] != 0


def test_avg_latency_accuracy(request_measurement):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests with different duration values

… max latency check

Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
plenum/config.py Outdated
@@ -137,6 +137,11 @@
LatencyWindowSize = 30
LatencyGraphDuration = 240

# This parameter defines minimal count of accumulated latencies for each client
MIN_LATENCY_COUNT = 50
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think requiring 50 for each client is too much

plenum/config.py Outdated
# This parameter defines minimal count of accumulated latencies for each client
MIN_LATENCY_COUNT = 50
# This parameter defines coefficient alpha, which represents the degree of weighting decrease.
LATENCY_ALPHA = 0.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better than the Alpha based on MinCount?

…LATENCY_COUNT

Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
self.reqs_in_window = 0
self.throughput = 0
self.inner_window = inner_window
self.min_cnt = min_cnt
self.throughput_window_size = throughput_window_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since class now is named ThroughputMeasurement probably there's no longer need in adding "thoughput" to some fields and parameter names. What about thoughput_window_size -> window_size, throughput_min_cnt -> min_window_count?


def add_request(self, ordered_ts):
self.update_time(ordered_ts)
self.reqs_in_window += 1

def _moving_average(self, next_val):
def _accumulate(self, old_accum, next_val):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always called with old_accum= self.throughput, probably this parameter can be removed

self.inner_window = inner_window
self.min_cnt = min_cnt
self.throughput_window_size = throughput_window_size
self.throughput_min_cnt = throughput_min_cnt
self.first_ts = time.perf_counter()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timestamps are injected everywhere except here. This is both inconsistency and can lead to problems if someone decides to use some other timing function instead of time.perf_counter.

def add_duration(self, identifier, duration):
if identifier not in self.avg_latencies:
self.avg_latencies[identifier] = (0, .0)
total_reqs, curr_avg_lat = self.avg_latencies[identifier]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 dict lookups can be replaced with just one:

total_reqs, curr_avg_lat = self.avg_latencies.get(identifier, (0, .0))

if avg_lat:
if cid not in avgLatencies:
avgLatencies[cid] = []
avgLatencies[cid].append(avg_lat)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about replacing triple lookup with

avgLatencies.setdefault(cid, []).append(avg_lat)

Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
@ashcherbakov ashcherbakov merged commit af19746 into hyperledger:master Jul 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants