Speed up calibration? #16
Hah! We're (in governor) one of the users stumbling across that undocumented startup time. I would love to upgrade to 0.5, but the way it works currently, a new clock is created for every rate limiter, incurring a 1s delay every time... not great (especially since construction previously took only 40µs). As a work-around, would it be safe to keep a static clock around and only initialize it once, the first time somebody needs it? Or are calibrations different across threads / processor boundaries?
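A minimal sketch of the work-around being asked about, assuming `quanta::Clock` is `Clone` and that one calibrated clock can safely be shared across callers (which is exactly the open question): the calibration cost is paid once, lazily, via a process-wide static.

```rust
use std::sync::OnceLock;

use quanta::Clock;

// Process-wide clock: the ~1s calibration happens only on first access; later
// callers just clone the already-calibrated clock, which is cheap.
static SHARED_CLOCK: OnceLock<Clock> = OnceLock::new();

fn shared_clock() -> Clock {
    SHARED_CLOCK.get_or_init(Clock::new).clone()
}

fn main() {
    let first = shared_clock();  // pays the calibration cost once
    let second = shared_clock(); // cheap: reuses the cached clock
    let start = first.now();
    let end = second.now();
    println!("elapsed: {:?}", end.duration_since(start));
}
```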
@antifuchs So this brings up some interesting tidbits!

At a high level, modern CPUs (roughly 2011 and later) have an invariant/non-stop TSC: it ticks along at a fixed rate regardless of clock rate, ACPI P-/C-/T-states, etc. We implicitly depend on this feature, because otherwise our timing would drift over time without frequent recalibration.

The biggest thing we don't account for is SMP systems. From a bunch of reading, Linux, Windows, and presumably others synchronize the starting TSC value across cores. From the little I do know about SMP systems, there's an inherent latency involved in going cross-socket compared to cross-core on the same socket. Extrapolating from data I've seen on things like the latency of AMD's Infinity Fabric, we might expect this cross-socket latency to be 10-20x higher than the same-socket latency. This is all to say: the chance of reading the TSC on a different socket, with a different TSC offset, gets higher with the number of CPUs your machine has, which naturally leads to potential mismatches. However, the calibration itself should still be consistent over time: if mismatches came up, they would be off by a fixed amount depending on which socket the measurement was taken on relative to the socket used for calibration.

This gets into the part of the convo where we talk about how I'm not a subject matter expert here. I haven't read all of the TSC initialization code for the Linux kernel, so maybe they're already estimating for NUMA node latency differences? Maybe the Linux scheduler fights very hard not to reschedule threads on a different NUMA node, so we're very unlikely to see this in practice? I honestly just don't know.

One solution I think I can offer, at least, is some potential "fallback" feature flag where all of the niceties of `quanta` are kept while the up-front calibration is skipped.
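For illustration only (this is not quanta's actual calibration code), here is a rough sketch of the kind of measurement calibration performs: sample the raw TSC against a monotonic reference clock over a window and derive a ticks-per-nanosecond ratio. It inherits the assumptions discussed above, namely an invariant TSC and the thread staying on one core/socket for the duration of the window.

```rust
// Illustrative only: estimate the TSC tick rate by sampling the raw counter
// against a monotonic reference clock over a fixed window.
#[cfg(target_arch = "x86_64")]
fn main() {
    use std::arch::x86_64::_rdtsc;
    use std::time::{Duration, Instant};

    // Take paired (wall clock, TSC) readings at the start and end of the window.
    let wall_start = Instant::now();
    let tsc_start = unsafe { _rdtsc() };

    std::thread::sleep(Duration::from_millis(50));

    let wall_end = Instant::now();
    let tsc_end = unsafe { _rdtsc() };

    // Ticks per nanosecond ~= TSC frequency in GHz. Valid only if the TSC is
    // invariant and this thread stayed on the same core/socket for the window.
    let ticks = (tsc_end - tsc_start) as f64;
    let nanos = wall_end.duration_since(wall_start).as_nanos() as f64;
    println!("estimated TSC rate: {:.3} GHz", ticks / nanos);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```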
Admittedly, though, I'm not sure if this approach would work if two versions of
I created #17 to explore some of the thoughts I jotted down in the comment above.
33: Updates dependencies. r=antifuchs a=azriel91

Updates the following dependency versions.

```
parking_lot v0.10.0 -> v0.10.2
proptest    v0.9.4  -> v0.9.6
criterion   v0.3.0  -> v0.3.2
futures     v0.3.1  -> v0.3.5
rand        v0.7.2  -> v0.7.3
libc        v0.2.4  -> v0.2.70
dashmap     v3.1.0  -> v3.11.1
quanta      v0.4.1  -> v0.5.2
tynm        v0.1.1  -> v0.1.4
```

Notably, updating `quanta` from `0.4` to `0.5` runs into metrics-rs/quanta#16, where initializing a `quanta::Clock` takes 1 second, as calibration is done. That's why some tests require `Instant::now()` to be moved after instantiating the `RateLimiter`.

Co-authored-by: Azriel Hoh <azriel91@gmail.com>
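A self-contained sketch of the test-ordering fix mentioned in that commit message (using `quanta::Clock` directly rather than governor's actual test code): the quanta-backed component is constructed before the wall-clock reference is captured, so the one-time calibration does not count against the timing assertion.

```rust
use std::time::{Duration, Instant};

use quanta::Clock;

fn main() {
    // Construct the quanta-backed component first: the one-time calibration
    // (~1s in quanta 0.5) is paid here, before any timing reference is taken.
    let clock = Clock::new();

    // Only now capture the wall-clock reference used by the timing assertion,
    // so calibration time is not counted against it.
    let start = Instant::now();

    let t0 = clock.now();
    std::thread::sleep(Duration::from_millis(10));
    let t1 = clock.now();

    assert!(t1.duration_since(t0) >= Duration::from_millis(5));
    // This would likely fail if `start` had been taken before `Clock::new()`.
    assert!(start.elapsed() < Duration::from_millis(500));
}
```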
That's very reasonable thinking! I'd appreciate having a feature flag like that, but knowing that calibration is important for an accurate reading from quanta can also influence our API design: we could just take an already-initialized Clock as an input parameter, instead of (what we do now) constructing one ourselves. While we don't currently use mocking (it interacts badly with the things that require actual time to pass, like Futures), I think that would still work. Letting our users pick how accurate they need readings to be seems like the right call. That leaves the TSC/SMP issue, of course, but I'm neither a hardware hacker nor a kernel person, so I'll leave this to more knowledgeable people (: This might actually be something that kernel folks would understand really well - I'd recommend getting in touch with lwn, possibly!
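A hypothetical sketch of that API shape (the `RateLimiter` type and `with_clock` constructor here are illustrative names, not governor's real API): the caller builds and calibrates one `Clock` and hands clones of it to every limiter, so constructing a limiter itself stays cheap.

```rust
use quanta::Clock;

// Illustrative only: the caller owns the (already calibrated) clock and the
// limiter just stores a clone of it, instead of building its own internally.
struct RateLimiter {
    clock: Clock,
    // ...capacity, refill rate, and so on would live here...
}

impl RateLimiter {
    // Caller-supplied clock: no calibration cost in here.
    fn with_clock(clock: Clock) -> Self {
        RateLimiter { clock }
    }

    fn check(&self) -> quanta::Instant {
        // A real limiter would compare this reading against its token state.
        self.clock.now()
    }
}

fn main() {
    let clock = Clock::new(); // one calibration for the whole process
    let a = RateLimiter::with_clock(clock.clone()); // cheap
    let b = RateLimiter::with_clock(clock);         // cheap
    let _ = (a.check(), b.check());
}
```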
@antifuchs I've released an alpha version of `quanta`. Not sure if you want to give it a spin in a side branch just to see if it fixes the issues you were seeing. I still plan to do some more work related to testing.
Oh man, after more than a year, I've finally managed to get quanta upgraded past 0.4: boinkor-net/governor#83 - the main difficulty I had was with integration tests that use real time, where the clock initialization caused delays greater than the tests were expecting (you can't fake out time in that test, unfortunately!). boinkor-net/governor@b0481e6 has details. Other than that, I think this works great (and is much more performant & correct!) - so, thank you all for continuing to improve quanta (:
In magnet/metered-rs#21, it was noted that calibrating a `Clock` takes one second, which may come as a surprise to users. This is totally fair, especially since it's not documented. Are there alternative calibration approaches we can take that avoid spending a full second of undocumented time while still achieving an accurate calibration?
One idea is to loop until we have a statistically significant number of measurements and the deviation between them has stabilized, while falling back to a maximum amount of time spent, where we would use the current calibration logic.
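A rough sketch of that idea, assuming an x86_64 TSC and using `std::time::Instant` as the reference clock (this is not quanta's implementation): keep sampling until the spread of the measurements stabilizes, with a hard cap on total calibration time as the fallback. The constants here are illustrative tuning knobs.

```rust
use std::time::{Duration, Instant};

// Not quanta's implementation: keep sampling the ratio between the raw counter
// and a reference clock, stop early once the samples stabilize, and fall back
// to a hard cap on total calibration time if they never converge.
#[cfg(target_arch = "x86_64")]
fn calibrate() -> f64 {
    use std::arch::x86_64::_rdtsc;

    const MIN_SAMPLES: usize = 32;
    const MAX_TOTAL: Duration = Duration::from_millis(200); // fallback time budget
    const TARGET_SPREAD: f64 = 0.0005; // 0.05% maximum relative spread

    let overall_start = Instant::now();
    let mut samples: Vec<f64> = Vec::new();

    loop {
        // One sample: TSC ticks per nanosecond over a short window.
        let (w0, t0) = (Instant::now(), unsafe { _rdtsc() });
        std::thread::sleep(Duration::from_millis(1));
        let (w1, t1) = (Instant::now(), unsafe { _rdtsc() });
        samples.push((t1 - t0) as f64 / w1.duration_since(w0).as_nanos() as f64);

        if samples.len() >= MIN_SAMPLES {
            let mean = samples.iter().sum::<f64>() / samples.len() as f64;
            let spread = samples
                .iter()
                .map(|s| (s - mean).abs() / mean)
                .fold(0.0_f64, f64::max);
            if spread < TARGET_SPREAD {
                return mean; // converged early: no need to spend the full budget
            }
        }
        if overall_start.elapsed() >= MAX_TOTAL {
            // Fallback: accept the average of whatever we collected so far.
            return samples.iter().sum::<f64>() / samples.len() as f64;
        }
    }
}

#[cfg(target_arch = "x86_64")]
fn main() {
    println!("estimated TSC rate: {:.3} GHz", calibrate());
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```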