Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix VariableArrivalRate not working well #1285

Merged
merged 2 commits into from
Apr 14, 2020

Conversation

mstoykov
Copy link
Collaborator

@mstoykov mstoykov commented Dec 19, 2019

The previous implementation worked basically as follows:
A list of changes to the arrival rate is generated where there is a
minimum time between this changes (currently 250ms). This means that for
any ramp-up/down there is a step every 250ms (more or less).
After this is generated a goroutine will read that list and send the
change on a channel at the appropriate time. During ramp-up/down this is
every 250ms (more or less).
Another goroutine will be receiving those changes and resetting a timer
to the new value. And here is where the problem lies:
If the arrival rate is more then 1 iteration each 250ms this means that
it will never trigger a start of a VU.
The extreme example is having ramp-up to 4 iteration per second - this
is exactly 1 iteration each 250ms, which means that for the whole
duration up to end of the ramp up there will be zero iterations started.

In order to completely remove this I went the step further. The way we
seperate the interval in small pieces reminded me of integrals and
is one of the very easy cases for using integrals.
So I just calculate the integral of the function that is the number of
VUs over time. The answer is how many ioterations need to be done there,
so if I reverse this and instead calculate for how much time I will get
1 I get when the first iteration should start. I can do that for any
iteration number and if there is no result then obviously we can get
that :D.

This also has the awesome sideeffect that if we split the executib in 10
we only need to calculate 1/10th of the integrals on each instance.

@mstoykov mstoykov force-pushed the fix/VariableArrivalRate/wrongStepping branch from 052120d to eb65fed Compare December 19, 2019 07:53
@codecov-io
Copy link

codecov-io commented Dec 19, 2019

Codecov Report

Merging #1285 into new-schedulers will decrease coverage by 0.05%.
The diff coverage is 97.22%.

Impacted file tree graph

@@                Coverage Diff                 @@
##           new-schedulers    #1285      +/-   ##
==================================================
- Coverage           76.41%   76.36%   -0.06%     
==================================================
  Files                 162      162              
  Lines               12838    12809      -29     
==================================================
- Hits                 9810     9781      -29     
  Misses               2511     2511              
  Partials              517      517              
Impacted Files Coverage Δ
lib/executor/variable_arrival_rate.go 95.55% <97.14%> (-0.66%) ⬇️
lib/execution_segment.go 91.11% <100.00%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6758233...d68ebe3. Read the comment docs.

@mstoykov
Copy link
Collaborator Author

a recording showing an extreme case

@mstoykov mstoykov force-pushed the fix/VariableArrivalRate/wrongStepping branch from eb65fed to 9b59904 Compare January 22, 2020 09:59
lib/executor/variable_arrival_rate.go Outdated Show resolved Hide resolved
lib/executor/variable_arrival_rate.go Outdated Show resolved Hide resolved
@mstoykov mstoykov force-pushed the fix/VariableArrivalRate/wrongStepping branch from cf6d82a to 32ec8ce Compare March 27, 2020 10:54
@mstoykov mstoykov changed the base branch from new-schedulers to GetStripedOffsetsImplementation March 27, 2020 10:55
@mstoykov mstoykov marked this pull request as ready for review March 27, 2020 10:55
@mstoykov mstoykov force-pushed the GetStripedOffsetsImplementation branch from 9fa61d9 to 005b855 Compare March 30, 2020 13:05
The previous implementation worked basically as follows:
A list of changes to the arrival rate is generated where there is a
minimum time between this changes (currently 250ms). This means that for
any ramp-up/down there is a step every 250ms (more or less).
After this is generated a goroutine will read that list and send the
change on a channel at the appropriate time. During ramp-up/down this is
every 250ms (more or less).
Another goroutine will be receiving those changes and resetting a timer
to the new value. And here is where the problem lies:
If the arrival rate is more then 1 iteration each 250ms this means that
it will never trigger a start of a VU.
The extreme example is having ramp-up to 4 iteration per second - this
is exactly 1 iteration each 250ms, which means that for the whole
duration up to end of the ramp up there will be zero iterations started.

In order to completely remove this I went the step further. The way we
seperate the interval in small pieces reminded me of integrals and
is one of the very easy cases for using integrals.
So I just calculate the integral of the function that is the number of
VUs over time. The answer is how many ioterations need to be done there,
so if I reverse this and instead calculate for how much time I will get
1 I get when the first iteration should start. I can do that for any
iteration number and if there is no result then obviously we can get
that :D.

This also has the awesome sideeffect that if we split the executib in 10
we only need to calculate 1/10th of the integrals on each instance.
@mstoykov mstoykov force-pushed the fix/VariableArrivalRate/wrongStepping branch from 32ec8ce to 56f03c7 Compare April 1, 2020 13:19
@mstoykov mstoykov requested review from imiric and na-- April 1, 2020 13:20
@mstoykov mstoykov changed the base branch from GetStripedOffsetsImplementation to new-schedulers April 1, 2020 13:25
Comment on lines +314 to +316
var ch = make(chan time.Duration, 10) // buffer 10 iteration times ahead
var prevTime time.Duration
go varr.config.cal(varr.executionState.ExecutionTuple, ch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an excellent thing to put in the Init() method. The only problem that I can see is that this goroutine is going to leak. Not a big deal, but still not really great, given that test runs can have multiple executors, some of which working only at the start

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to leave it for when I rewrite the calculation as a state machine - maybe then it will be a bigger problem.

I don't think this takes enough time to warrant be moved and adding ch as a field to the executor's struct.

Do you mean that it leaks until it finishes ? is needed? because - yes that it does. But it will end when it goes through all the stages and emits all the iterations needed

Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. There are some minor nitpicks I've left as inline comments, and I'll repeat the VU allocation comment in the issue.

lib/executor/variable_arrival_rate.go Outdated Show resolved Hide resolved
lib/executor/variable_arrival_rate.go Outdated Show resolved Hide resolved
for _, stage := range varc.Stages {
to = float64(stage.Target.ValueOrZero()) / float64(timeUnit)
dur = float64(time.Duration(stage.Duration.Duration).Nanoseconds())
if from != to { // ramp up/down
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not a big deal, but given that from and to are float64s, this comparison probably should be done with math.Abs(from-to) < tolerance and some fairly large epsilon/tolerance value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some of the earlier iterations, I was using directly the stage.Target int value but then I released that both from and to are always just stage.Target / timeUnit where one of them is a constant for the whole calculation and stage.Target is the only moving part ... I highly doubt that there is a way for this to not be equal... also timeUnit is, in general, some number such as 1000000000(1 second in nanoseconds) so if we are going from 1iterations/s to 2 iterations/s the difference will be something very small so the epsilon should also be something very small ...
Again I highly doubt that golang does anything that can make this math wrong ...

p.s. thinking about this I can probably move the timeUnit outside of the calculations .. somehow ... should try it when I finish with my code reviews ....

lib/executor/variable_arrival_rate.go Outdated Show resolved Hide resolved
lib/executor/variable_arrival_rate.go Outdated Show resolved Hide resolved
varr.logger.Warningf("Insufficient VUs, reached %d active VUs and cannot allocate more", maxVUs)
continue
}
vu, err = varr.executionState.GetUnplannedVU(maxDurationCtx, varr.logger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized this (and my original) implementation of initializing unplanned VUs in the middle of the iteration loop is very wrong... This synchronous operation can significantly derail the actual iteration pace... Another reason to implement the first part of #1386, or just allocation of the VU in a goroutine. But then again, if a few iterations in a row lack a VU to run on, we probably shouldn't allocate more than 1 VU concurrently...

@mstoykov mstoykov requested a review from na-- April 14, 2020 11:00
Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I like that the VAR implementation is conceptually simpler now, though I can't comment much on the math and trust you guys (and the passing tests) that it's working properly now. Hats off :)

@mstoykov mstoykov deleted the fix/VariableArrivalRate/wrongStepping branch June 9, 2020 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants