
Speed up calibration? #16

Closed
tobz opened this issue May 13, 2020 · 6 comments · Fixed by #19

Comments

@tobz
Member

tobz commented May 13, 2020

In magnet/metered-rs#21, it was noted that calibration for Clock takes one second, which may come as a surprise to users. This is totally fair, especially since it's not documented.

Are there alternative calibration approaches we can take to avoid spending a full second of undocumented time while still achieving an accurate calibration?

One idea is to loop until we have a statistically significant number of measurements and the deviation between them has stabilized, falling back to the current calibration logic after a maximum amount of time if it never does.
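
Something like this, roughly (the names, thresholds, and clock-reading closures here are hypothetical, not actual quanta internals):

```rust
use std::time::{Duration, Instant};

const MAX_CALIBRATION: Duration = Duration::from_secs(1); // current worst case
const MIN_SAMPLES: u32 = 100;
const STABLE_DELTA: f64 = 0.0001; // ratio considered stable below 0.01% change

/// Repeatedly sample the reference-clock/source-clock ratio until it
/// stabilizes, accepting the latest estimate once the full time budget
/// (the existing one-second window) is spent.
fn calibrate(reference_ns: impl Fn() -> u64, source_ticks: impl Fn() -> u64) -> f64 {
    let deadline = Instant::now() + MAX_CALIBRATION;
    let (ref_start, src_start) = (reference_ns(), source_ticks());
    let mut last_ratio = f64::NAN;

    for samples in 0u32.. {
        let ref_delta = reference_ns().wrapping_sub(ref_start) as f64;
        let src_delta = source_ticks().wrapping_sub(src_start) as f64;
        let ratio = ref_delta / src_delta;

        // NaN on the first pass never compares below STABLE_DELTA, so we
        // always take at least two samples before considering stability.
        let change = ((ratio - last_ratio) / ratio).abs();
        if (samples >= MIN_SAMPLES && change < STABLE_DELTA) || Instant::now() >= deadline {
            return ratio;
        }
        last_ratio = ratio;
    }
    unreachable!()
}
```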

@antifuchs
Contributor

Hah! We're (in governor) one of the users stumbling across that undocumented startup time. I'd love to upgrade to 0.5, but as it currently works, a new clock gets created for every rate limiter, incurring a 1s delay each time... Not great (especially since construction previously took only 40µs).

As a work-around, would it be safe to keep a static clock around and only initialize it once, the first time somebody needs it? Or do calibrations differ across threads / processor boundaries?
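
Concretely, I'm imagining something like this (a sketch assuming `Clock` is cheap to clone, which is really the question):

```rust
use once_cell::sync::Lazy;
use quanta::Clock;

// Pay the calibration cost once, the first time anyone needs a clock.
static SHARED_CLOCK: Lazy<Clock> = Lazy::new(Clock::new);

fn clock_for_new_limiter() -> Clock {
    // Assumes a clone shares the original's calibration, and that one
    // calibration is valid process-wide (the cross-socket question).
    SHARED_CLOCK.clone()
}
```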

@tobz
Member Author

tobz commented May 16, 2020

@antifuchs So this brings up some interesting tidbits!

At a high level, modern CPUs (roughly 2011 and later) have an invariant/non-stop TSC: it ticks at a fixed rate regardless of clock speed, ACPI P-/C-/T-states, and so on. We implicitly depend on this feature, because otherwise our timing would drift over time without frequent recalibration.
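
(As an aside: the CPU advertises this capability via CPUID, so a sanity check is possible. A minimal x86_64 sketch:)

```rust
#[cfg(target_arch = "x86_64")]
fn has_invariant_tsc() -> bool {
    use core::arch::x86_64::__cpuid;
    // CPUID leaf 0x8000_0007 ("Advanced Power Management Information"):
    // EDX bit 8 set means the TSC is invariant, i.e. it ticks at a
    // constant rate across P-, C-, and T-state transitions.
    let max_extended_leaf = unsafe { __cpuid(0x8000_0000) }.eax;
    if max_extended_leaf < 0x8000_0007 {
        return false;
    }
    unsafe { __cpuid(0x8000_0007) }.edx & (1 << 8) != 0
}
```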

The biggest thing we don't account for is SMP systems. From a bunch of reading, Linux, Windows, and presumably others synchronize the starting TSC value across cores. From the little I do know about SMP systems, there's inherent latency in going cross-socket compared to cross-core on the same socket. Extrapolating from data I've seen on things like the latency of AMD's Infinity Fabric, cross-socket latency might be 10-20x higher than same-socket latency.

This is all to say: the chance of reading the TSC on a different socket, with a different TSC offset, grows with the number of CPUs in the machine, which naturally leads to potential mismatches. However, the calibration itself should still be consistent over time: if mismatches came up, they would be off by a fixed amount depending on which socket the measurement was taken on relative to the socket used for calibration.

This gets into the part of the convo where I admit I'm not a subject matter expert here. I haven't read all of the TSC initialization code in the Linux kernel, so maybe it already compensates for NUMA-node latency differences? Maybe the Linux scheduler fights very hard not to reschedule threads onto a different NUMA node, so we're unlikely to see this in practice? I honestly just don't know.

One solution I can offer, at least, is a potential "fallback" feature flag where all of the niceties of quanta are preserved (mocking, the "recent" time, etc.), but we simply use the normal OS facilities under the hood, i.e. the same source as std::time::Instant.
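
Very roughly, something like this (the `os-time` feature name and shape are hypothetical, just to illustrate the idea):

```rust
use std::time::Instant;

// Hypothetical `os-time` feature: raw readings come from the OS monotonic
// clock instead of a calibrated TSC. Zero calibration cost at startup, at
// the price of a slower individual read.
#[cfg(feature = "os-time")]
fn raw_elapsed_ns(epoch: Instant) -> u64 {
    // Same source as std::time::Instant, so no startup calibration needed.
    epoch.elapsed().as_nanos() as u64
}

#[cfg(not(feature = "os-time"))]
fn raw_elapsed_ns(_epoch: Instant) -> u64 {
    // Default path: the calibrated TSC read would live here.
    todo!("TSC-based reading")
}
```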

For your use case in governor, I think this would be an acceptable solution because it would:

  1. entirely avoid calibration time/overhead
  2. let you keep being able to mock the clock (do you even mock it? I have no idea :P)
  3. still let you take advantage of Clock::recent

Admittedly, though, I'm not sure this approach would work if two versions of quanta are present in the same process with different feature flags enabled for each?

@tobz
Member Author

tobz commented May 16, 2020

I created #17 to explore some of the thoughts I jotted down in the comment above.

bors bot added a commit to boinkor-net/governor that referenced this issue May 17, 2020
33: Updates dependencies. r=antifuchs a=azriel91

Updates the following dependency versions.

```
parking_lot  v0.10.0 -> v0.10.2
proptest      v0.9.4 -> v0.9.6
criterion     v0.3.0 -> v0.3.2
futures       v0.3.1 -> v0.3.5
rand          v0.7.2 -> v0.7.3
libc          v0.2.4 -> v0.2.70
dashmap       v3.1.0 -> v3.11.1
quanta        v0.4.1 -> v0.5.2
tynm          v0.1.1 -> v0.1.4
```

Notably, updating `quanta` from `0.4` to `0.5` runs into metrics-rs/quanta#16, where initializing a `quanta::Clock` takes 1 second while calibration runs. That's why some tests move `Instant::now()` to after the `RateLimiter` is instantiated.


Co-authored-by: Azriel Hoh <azriel91@gmail.com>
@antifuchs
Contributor

That's very reasonable thinking! I'd appreciate having a feature flag like that, but knowing that calibration is important for accurate readings from quanta can also influence our API design: we could take an already-initialized Clock as an input parameter instead of (as we do now) constructing one via Default::default().
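
Sketching what I mean (hypothetical signatures, not our actual API):

```rust
use quanta::Clock;

pub struct RateLimiter {
    clock: Clock,
    // ... quota and state fields elided
}

impl RateLimiter {
    /// The caller supplies an already-calibrated clock, so constructing
    /// many rate limiters pays the calibration cost at most once.
    pub fn with_clock(clock: Clock) -> Self {
        RateLimiter { clock }
    }
}

// let clock = Clock::new();                       // calibrate once
// let a = RateLimiter::with_clock(clock.clone()); // reuse it everywhere
// let b = RateLimiter::with_clock(clock);
```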

While we don't currently use mocking (it interacts badly with things that require actual time to pass, like futures), I think that would still work. Letting our users pick how accurate their readings need to be seems like the right call.

That leaves the TSC/SMP issue, of course, but I'm neither a hardware hacker nor a kernel person, so I'll leave this to more knowledgeable people (:

This might actually be something that kernel folks understand really well - I'd recommend getting in touch with LWN, possibly!

@tobz
Member Author

tobz commented May 26, 2020

@antifuchs I've released an alpha version of quanta that includes my work to speed up calibration, as well as allowing the first calibration to be shared: quanta@0.5.3-alpha.1
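
If you want to try it from a branch, pinning the pre-release should just be:

```toml
[dependencies]
quanta = "=0.5.3-alpha.1"
```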

Not sure if you want to give it a spin in a side branch just to see if it fixes the issues you were seeing. I still plan to do some more work on testing quanta, so I'm not quite ready to cut a non-alpha release... yet. :)

@antifuchs
Contributor

Oh man, after more than a year, I've finally managed to get quanta upgraded past 0.4: boinkor-net/governor#83 - the main difficulty was in integration tests that use real time, where clock initialization caused delays longer than the tests expected (you can't fake out time in those tests, unfortunately!). boinkor-net/governor@b0481e6 has the details.

Other than that, I think this works great (and is much more performant & correct!) - so, thank you all for continuing to improve quanta (:
