
Fix zero-filled and wrongly emitted metrics #710

Merged
merged 3 commits into master from rt-samples-fix on Jul 23, 2018

Conversation

4 participants
@na--
Member

na-- commented Jul 12, 2018

The fix is not to emit metric samples when a VU's context is canceled. This should fix #708, but it still needs unit tests and more checks in general. For example, are there some metrics we actually want to emit even from VUs that are canceled?
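
As a rough illustration of that approach, here is a minimal sketch of a context-aware sample push. The package name, the SampleContainer placeholder, and the exact signature are illustrative assumptions, not the actual k6 code:

package metricsfix // hypothetical package name, for illustration only

import "context"

// SampleContainer stands in for k6's real sample container type.
type SampleContainer interface{}

// pushIfNotCancelled sends a sample to the output channel only while the VU's
// context is still alive; canceled VUs drop the sample instead of emitting it.
// This mirrors the idea behind the helper used in this PR, but the real
// signature and behavior may differ.
func pushIfNotCancelled(ctx context.Context, output chan<- SampleContainer, sample SampleContainer) bool {
	if ctx.Err() != nil {
		return false // the VU was already canceled (scaled down, or the test ended)
	}
	select {
	case <-ctx.Done():
		return false // canceled while waiting for the engine to receive the sample
	case output <- sample:
		return true
	}
}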

@na-- na-- requested review from luizbafilho and robingustafsson Jul 12, 2018

@na--
Member

na-- commented Jul 12, 2018

Here's a specific example where I'm not sure if we should "fix" the old behavior: the data_sent, data_received, and iteration_duration metrics that are emitted at the end of an iteration. If a VU is canceled halfway through its final iteration, don't we want those metrics?

@codecov-io

codecov-io commented Jul 12, 2018

Codecov Report

Merging #710 into master will decrease coverage by <.01%.
The diff coverage is 46.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #710      +/-   ##
==========================================
- Coverage   64.39%   64.39%   -0.01%     
==========================================
  Files         101      101              
  Lines        8277     8302      +25     
==========================================
+ Hits         5330     5346      +16     
- Misses       2599     2608       +9     
  Partials      348      348
Impacted Files                         Coverage Δ
stats/stats.go                         55.5%  <0%>      (-1.85%) ⬇️
lib/netext/dialer.go                   36.23% <0%>      (-1.65%) ⬇️
js/modules/k6/ws/ws.go                 72.13% <100%>    (+0.7%)  ⬆️
js/runner.go                           79.52% <100%>    (+0.6%)  ⬆️
js/modules/k6/k6.go                    88.09% <100%>    (+0.14%) ⬆️
js/modules/k6/http/http_request.go     80%    <100%>    (ø)      ⬆️
js/modules/k6/metrics/metrics.go       93.33% <100%>    (ø)      ⬆️
core/local/local.go                    77.89% <71.42%>  (-1.22%) ⬇️
core/engine.go                         91.94% <0%>      (+2.36%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4875fb...48e84ce. Read the comment docs.

@luizbafilho
Contributor

luizbafilho commented Jul 12, 2018

In my opinion, it is OK to lose a single iteration's metrics.

@na--
Member

na-- commented Jul 12, 2018

Maybe, but it won't be just a single iteration; it will be one iteration for every VU that is scaled down, so anywhere from zero to hundreds. Also, imagine that when a VU's context is canceled in the middle of its iteration due to scaling the number of VUs down, some of the HTTP requests in that iteration have already completed and had their metrics sent to the engine. So we might want to send at least data_sent and data_received, and maybe even the iteration duration...

@luizbafilho
Contributor

luizbafilho commented Jul 12, 2018

Yeah, but hundreds out of many thousands is not that much. I fear that trying not to lose a single metric will lead to another huge refactoring.

@na--
Member

na-- commented Jul 12, 2018

No, in this case the fix is very simple: for metrics we want to preserve even when a VU's context is canceled, I just leave them the old way instead of using the helper function I added in this PR. That is, revert to using state.Samples <- someSampleContainer instead of the new stats.PushIfNotCancelled(ctx, state.Samples, someSampleContainer). In essence, un-"fix" them 😄
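
To make that split concrete, here is a rough sketch of how an end-of-iteration call site could combine the two paths. Every name is a stand-in rather than the actual k6 call site, and interface{} stands in for the real sample container type:

package metricsfix // hypothetical, for illustration only

import "context"

// emitEndOfIteration sketches the selective revert described above: metrics
// we want to preserve (data_sent / data_received) keep the old unconditional
// channel send, while cancellation-sensitive ones (iteration_duration) are
// dropped if the VU's context was canceled.
func emitEndOfIteration(ctx context.Context, samples chan<- interface{}, netTrail, iterationSample interface{}) {
	// data_sent / data_received: always emit, even from a canceled VU.
	samples <- netTrail

	// iteration_duration: only emit while the VU's context is still alive.
	if ctx.Err() != nil {
		return
	}
	select {
	case <-ctx.Done():
	case samples <- iterationSample:
	}
}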

@robingustafsson
Member

robingustafsson commented Jul 16, 2018

We should definitely not lose data on the number of requests sent, or on data sent and received, since we use those in the cloud execution functionality (along with IP address info) to track which systems we're hitting with traffic and how much, for abuse prevention and audit trails.

@na--
Member

na-- commented Jul 16, 2018

Hmm, after thinking about this a bit more, I realized that if we send the data_sent and data_received metrics, with the current implementation we'll also send iteration_duration, which we definitely shouldn't do for VUs stopped in the middle of an iteration, because it could quite heavily skew the rest of the iteration_duration stats downward... So either we leave things as they currently are in this pull request (don't send any metric samples once a VU/iteration is cancelled), or I can add an extra parameter to netext/Dialer.GetTrail() to exclude the iteration_duration metric.
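
To illustrate the second option only: the real netext.Dialer.GetTrail() has a different signature, so this hypothetical sketch just shows the extra-flag idea, with all names invented for the example:

package metricsfix // hypothetical, for illustration only

import "time"

// Sample is a stand-in for a single emitted metric sample.
type Sample struct {
	Metric string
	Value  float64
}

// getTrail sketches the proposed extra parameter: a caller can collect the
// data_sent / data_received totals of a partially finished iteration without
// also emitting iteration_duration for it.
func getTrail(duration time.Duration, bytesSent, bytesReceived int64, includeIterationDuration bool) []Sample {
	samples := []Sample{
		{Metric: "data_sent", Value: float64(bytesSent)},
		{Metric: "data_received", Value: float64(bytesReceived)},
	}
	if includeIterationDuration {
		// Only full, uncanceled iterations should contribute here, so their
		// shortened durations don't skew the iteration_duration stats down.
		samples = append(samples, Sample{
			Metric: "iteration_duration",
			Value:  float64(duration) / float64(time.Millisecond),
		})
	}
	return samples
}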

@na--
Member

na-- commented Jul 17, 2018

While finishing this up and writing the test, I realized that with the real-time metrics I'd also inadvertently reverted the decision we made in #652... That is, now even unfinished iterations emit an iterations metric 😑 ... I'll fix that as well and also test for it.
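
For reference, a minimal sketch of the kind of guard #652 calls for, with the function arguments standing in for the real k6 code:

package metricsfix // hypothetical, for illustration only

import "context"

// runIteration sketches the guard: the iterations counter is only emitted
// when the iteration body ran to completion and the VU was not canceled, so
// interrupted iterations don't inflate the count.
func runIteration(ctx context.Context, iterate func(context.Context) error, emitIterationsSample func()) error {
	if err := iterate(ctx); err != nil {
		return err // the iteration failed or was interrupted: emit nothing
	}
	if ctx.Err() != nil {
		return ctx.Err() // the VU was canceled mid-iteration: don't count it
	}
	emitIterationsSample() // pushes a single sample for the iterations counter
	return nil
}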

@na--
Member

na-- commented Jul 17, 2018

Slight correction to the previous statement: things work as expected when duration is specified, and any metrics emitted after the specified duration are discarded, including iterations. The problem with iterations happens when there are stages that scale down the number of VUs. The VU iterations that are canceled in the middle of their execution (but still before the actual test ends) due to that scaling down emit the unexpected iterations metric.

na-- added some commits Jul 17, 2018

@na-- na-- changed the title from [WIP] Fix zero-filled metrics to Fix zero-filled and wrongly emitted metrics Jul 17, 2018

@na--
Member

na-- commented Jul 17, 2018

With the latest commit this should hopefully be done. I'll take another look tomorrow just in case, and at least another pair of eyes would be very helpful, but I think I've fixed all of the issues that I know of. And not only are the bugs fixed; with this patch, metrics emission when scaling down VUs should actually be a lot better than it was before we had real-time metrics.

@na-- na-- merged commit 9531175 into master Jul 23, 2018

5 checks passed

ci/circleci: build-docker-images Your tests passed on CircleCI!
ci/circleci: lint Your tests passed on CircleCI!
ci/circleci: test Your tests passed on CircleCI!
continuous-integration/appveyor/pr AppVeyor build succeeded
license/cla Contributor License Agreement is signed.

@na-- na-- deleted the rt-samples-fix branch Jul 23, 2018
