HTTP: response time measurement omits time to connect and send request #2077
In 2.0 RC2 (likely the same for all previous versions):
E.g. I run the following wrk test:
and freeze the server for 6 seconds:
We can see from the above that the median is calculated correctly.
When running the following workload, similar to the wrk test above:
the measurements do not capture the server freeze.
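For reference, a workload along these lines would look roughly like the Gatling 2.x simulation below (the base URL, request name and user count are placeholders, not the script actually used in this report):

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class FreezeRepro extends Simulation {
  val httpConf = http.baseURL("http://localhost:8080")   // placeholder target

  // Closed-loop workload, similar in spirit to wrk: each virtual user keeps
  // issuing GET requests back to back for the duration of the test.
  val scn = scenario("freeze-repro")
    .during(10 seconds) {
      exec(http("ping").get("/"))
    }

  setUp(scn.inject(atOnceUsers(8))).protocols(httpConf)
}
```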
I suspect this may be because the measurement is not started until after the connection is initiated? I haven't looked at the code yet.
However, this would change the whole meaning of those timestamps: it would no longer be the time when the first byte is received, but when we started waiting for a response.
However, this does not show up in the console output.
And thanks for your kind words, I'm doing my best ;)
I'd still call it a bug in how percentiles and other latency distribution information are reported. This is a case of a known server behavior (an induced 6 second freeze) that is simply not reported in the measurements. E.g. the 90% number is clearly off by 2+ orders of magnitude, and the reported percentage of t < 800 ms (98%) is also clearly very wrong. Whatever the mechanism that causes this bug is, the measurements for all latencies and percentiles are carving out that 6 second event and acting like it never happened.
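To make the order-of-magnitude claim concrete, here is a back-of-the-envelope computation (illustrative numbers, not the actual data from this run): if the freeze is carved out of the recording, the 90th percentile looks like a fast response, whereas a workload that kept measuring through the freeze would report thousands of milliseconds.

```scala
// Rough arithmetic only: 990 fast responses of ~5 ms, plus the ~200 slow responses
// an open workload paced at 30 ms would have observed during a 6 s freeze.
object PercentileDistortion extends App {
  def p(q: Double, xs: Seq[Double]): Double = {
    val s = xs.sorted
    s(math.min(s.size - 1, (q * s.size).toInt))
  }

  val fast = Seq.fill(990)(5.0)                                     // normal responses
  val recorded = fast                                               // freeze never recorded
  val expected = fast ++ Seq.tabulate(200)(i => 6000.0 - i * 30.0)  // freeze included

  println(f"recorded p90 = ${p(0.90, recorded)}%.0f ms")            // 5 ms
  println(f"expected p90 = ${p(0.90, expected)}%.0f ms")            // ~2460 ms
}
```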
I agree this is a bug, and certainly needs to be fixed before 2.0 is released.
I disagree that latency (or whatever we want to call the time the client had to wait) is firstByteReceived - lastByteSent.
This would omit exactly the time at issue in the test above, namely the time to connect.
Where to start the timing is not obvious; see, for example:
Therefore, I would strongly advise changing the latency to
which should start (I am currently assuming) before the HTTP request is sent, i.e. before the TCP connection is initiated.
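As a rough, hand-rolled illustration of timing that starts before the connect (plain sockets, nothing to do with Gatling's actual internals):

```scala
import java.net.Socket
import java.nio.charset.StandardCharsets

object FullLatencySketch {
  // Returns the elapsed time in ms from the decision to send until the first
  // response byte, so TCP connect time and request write time are both included.
  def timedRequest(host: String, port: Int, path: String): Long = {
    val start = System.nanoTime()              // start BEFORE connecting
    val socket = new Socket(host, port)        // connect time is included
    try {
      val out = socket.getOutputStream
      out.write(s"GET $path HTTP/1.1\r\nHost: $host\r\nConnection: close\r\n\r\n"
        .getBytes(StandardCharsets.US_ASCII))
      out.flush()
      socket.getInputStream.read()             // block until the first byte arrives
      (System.nanoTime() - start) / 1000000L
    } finally socket.close()
  }
}
```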
This is a critical bug in 2.0.
So I have been trying a simple change to eliminate this risk of the latency ignoring the impact on user experience by omitting the connect time:
Good and not-so-good news -
the ^Z is for 5 seconds out of the 10-second test.
We can see some bunching in the response time distribution, though this could be caused by my MBP; better to run the next test on a Linux machine.
It's a first stab in any case.
@slandelle yes looks good.
I think there's more work to do, like the comment above about the bunching in the percentile distribution, but as a first step it does what it is supposed to do. LGTM.
I'll have to have a proper second look at CO, but it looks to me that in our case, it's more a matter of accuracy caused by the temporal bucket width: events much shorter than the bucket width might go unnoticed because they will be outweighed. If the motto is "percentiles are not a silver bullet", I agree.
CO is a (load test design / load test tool implementation) issue.
This happens when you try to execute a bunch of unrelated requests sequentially in a loop (or multiple concurrent loops). If the system under test starts lagging, the next request won't be started in due time.
IMO, the root cause here is: "why were those unrelated requests executed sequentially in the first place???". Those should have been scheduled independently!
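A toy timeline makes the mechanism visible (made-up numbers, not Gatling internals): during a 6 s freeze, a sequential loop only ever observes one slow sample, while independently scheduled requests observe one slow sample per intended send slot.

```scala
object SchedulingToy extends App {
  val freezeStart = 2000L; val freezeEnd = 8000L; val pacingMs = 100L
  // The server answers in 5 ms, except that nothing is answered during the freeze.
  def responseAt(sentAt: Long): Long =
    if (sentAt >= freezeStart && sentAt < freezeEnd) freezeEnd + 5 else sentAt + 5

  // Closed loop: the next request starts only once the previous response has arrived.
  val closed = Iterator.iterate(0L)(t => responseAt(t) + pacingMs).takeWhile(_ < 10000).toSeq
  // Open schedule: requests keep being sent every pacingMs whatever the latency.
  val open = (0L until 10000L by pacingMs).toSeq

  // Count samples slower than 100 ms under each scheduling strategy.
  def slow(sendTimes: Seq[Long]) = sendTimes.map(t => responseAt(t) - t).count(_ > 100)
  println(s"slow samples, closed loop: ${slow(closed)}")   // 1
  println(s"slow samples, open model : ${slow(open)}")     // 60
}
```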
Beware that it's your job to figure out if your requests are unrelated or not. For example:
This CO issue looks very similar to me to the difference between closed and open workload models.
In a closed model, you only have a given number of users in your system, and a new user can't enter the system until another one exits.
In most systems, you want an open workload, where new users keep arriving regardless of your latency, possibly causing your system under test to crash.
Back to Gatling: Gatling emphasizes the open workload model: you schedule the virtual users' arrival rate.
If you really want a closed workload model, you can do what most other tools do: wrap your scenario in a loop, and reset your virtual users on each iteration.
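For illustration, both models expressed with the Gatling 2.x injection DSL (the URL, rates and durations are placeholders):

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class WorkloadModels extends Simulation {
  val httpConf = http.baseURL("http://localhost:8080")   // placeholder target

  // Open model: arrivals are scheduled; each virtual user runs the scenario once.
  val open = scenario("open").exec(http("req").get("/"))

  // Closed model: a fixed pool of users, each looping and therefore gated by the
  // previous response before it can send the next request.
  val closed = scenario("closed").during(60 seconds) {
    exec(http("req").get("/"))
  }

  setUp(
    open.inject(constantUsersPerSec(50) during (60 seconds)),
    closed.inject(atOnceUsers(50))
  ).protocols(httpConf)
}
```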