
Speed up calibration? #16

Closed
tobz opened this issue May 13, 2020 · 6 comments · Fixed by #19

Comments

@tobz
Member

tobz commented May 13, 2020

In magnet/metered-rs#21, it was noted that calibration for Clock takes one second, which may come as a surprise to users. This is totally fair, especially since it's not documented.

Are there alternative calibration approaches we can take to avoid spending a full second of undocumented time while still achieving an accurate calibration?

One idea is to loop until we have a statistically significant number of measurements and the deviation between them has stabilized, falling back to the current calibration logic after a maximum amount of time if it never does.
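
Something like this, roughly (the names, thresholds, and clock-reading closures here are hypothetical, not actual quanta internals):

```rust
use std::time::{Duration, Instant};

const MAX_CALIBRATION: Duration = Duration::from_secs(1); // current worst case
const MIN_SAMPLES: u32 = 100;
const STABLE_DELTA: f64 = 0.0001; // ratio considered stable below 0.01% change

/// Repeatedly sample the reference-clock/source-clock ratio until it
/// stabilizes, accepting the latest estimate once the full time budget
/// (the existing one-second window) is spent.
fn calibrate(reference_ns: impl Fn() -> u64, source_ticks: impl Fn() -> u64) -> f64 {
    let deadline = Instant::now() + MAX_CALIBRATION;
    let (ref_start, src_start) = (reference_ns(), source_ticks());
    let mut last_ratio = f64::NAN;

    for samples in 0u32.. {
        let ref_delta = reference_ns().wrapping_sub(ref_start) as f64;
        let src_delta = source_ticks().wrapping_sub(src_start) as f64;
        let ratio = ref_delta / src_delta;

        // NaN on the first pass never compares below STABLE_DELTA, so we
        // always take at least two samples before considering stability.
        let change = ((ratio - last_ratio) / ratio).abs();
        if (samples >= MIN_SAMPLES && change < STABLE_DELTA) || Instant::now() >= deadline {
            return ratio;
        }
        last_ratio = ratio;
    }
    unreachable!()
}
```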

@antifuchs
Contributor

Hah! We're (in governor) one of the users stumbling across that undocumented startup time. I'd love to upgrade to 0.5, but as it currently works, a new clock gets created for every rate limiter, incurring a 1s delay each time... Not great (especially since construction previously took only 40µs).

As a work-around, would it be safe to keep a static clock around and only initialize it once, the first time somebody needs it? Or do calibrations differ across threads / processor boundaries?
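
Concretely, I'm imagining something like this (a sketch assuming `Clock` is cheap to clone, which is really the question):

```rust
use once_cell::sync::Lazy;
use quanta::Clock;

// Pay the calibration cost once, the first time anyone needs a clock.
static SHARED_CLOCK: Lazy<Clock> = Lazy::new(Clock::new);

fn clock_for_new_limiter() -> Clock {
    // Assumes a clone shares the original's calibration, and that one
    // calibration is valid process-wide (the cross-socket question).
    SHARED_CLOCK.clone()
}
```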

@tobz
Member Author

tobz commented May 16, 2020

@antifuchs So this brings up some interesting tidbits!

At a high level, modern CPUs (roughly 2011 and later) have an invariant/non-stop TSC: it ticks at a fixed rate regardless of clock speed, ACPI P-/C-/T-states, and so on. We implicitly depend on this feature, because otherwise our timing would drift over time without frequent recalibration.
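
(As an aside: the CPU advertises this capability via CPUID, so a sanity check is possible. A minimal x86_64 sketch:)

```rust
#[cfg(target_arch = "x86_64")]
fn has_invariant_tsc() -> bool {
    use core::arch::x86_64::__cpuid;
    // CPUID leaf 0x8000_0007 ("Advanced Power Management Information"):
    // EDX bit 8 set means the TSC is invariant, i.e. it ticks at a
    // constant rate across P-, C-, and T-state transitions.
    let max_extended_leaf = unsafe { __cpuid(0x8000_0000) }.eax;
    if max_extended_leaf < 0x8000_0007 {
        return false;
    }
    unsafe { __cpuid(0x8000_0007) }.edx & (1 << 8) != 0
}
```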

The biggest thing we don't account for is SMP systems. From a bunch of reading, Linux, Windows, and presumably others synchronize the starting TSC value across cores. From the little I do know about SMP systems, there's inherent latency in going cross-socket compared to cross-core on the same socket. Extrapolating from data I've seen on things like the latency of AMD's Infinity Fabric, cross-socket latency might be 10-20x higher than same-socket latency.

This is all to say: the chance of reading the TSC on a different socket, with a different TSC offset, grows with the number of CPUs in the machine, which naturally leads to potential mismatches. However, the calibration itself should still be consistent over time: if mismatches came up, they would be off by a fixed amount depending on which socket the measurement was taken on relative to the socket used for calibration.

This gets into the part of the convo where I admit I'm not a subject matter expert here. I haven't read all of the TSC initialization code in the Linux kernel, so maybe it already compensates for NUMA-node latency differences? Maybe the Linux scheduler fights very hard not to reschedule threads onto a different NUMA node, so we're unlikely to see this in practice? I honestly just don't know.

One solution I can offer, at least, is a potential "fallback" feature flag where all of the niceties of quanta are preserved (mocking, the "recent" time, etc.), but we simply use the normal OS facilities under the hood, i.e. the same source as std::time::Instant.
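
Very roughly, something like this (the `os-time` feature name and shape are hypothetical, just to illustrate the idea):

```rust
use std::time::Instant;

// Hypothetical `os-time` feature: raw readings come from the OS monotonic
// clock instead of a calibrated TSC. Zero calibration cost at startup, at
// the price of a slower individual read.
#[cfg(feature = "os-time")]
fn raw_elapsed_ns(epoch: Instant) -> u64 {
    // Same source as std::time::Instant, so no startup calibration needed.
    epoch.elapsed().as_nanos() as u64
}

#[cfg(not(feature = "os-time"))]
fn raw_elapsed_ns(_epoch: Instant) -> u64 {
    // Default path: the calibrated TSC read would live here.
    todo!("TSC-based reading")
}
```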

For your use case in governor, I think this would be an acceptable solution because it would:

  1. entirely avoid calibration time/overhead
  2. let you keep being able to mock the clock (do you even mock it? I have no idea :P)
  3. still let you take advantage of Clock::recent

Admittedly, though, I'm not sure this approach would work if two versions of quanta are present in the same process with different feature flags enabled for each?

@tobz
Member Author

tobz commented May 16, 2020

I created #17 to explore some of the thoughts I jotted down in the comment above.

bors bot added a commit to boinkor-net/governor that referenced this issue May 17, 2020
33: Updates dependencies. r=antifuchs a=azriel91

Updates the following dependency versions.

```
parking_lot  v0.10.0 -> v0.10.2
proptest      v0.9.4 -> v0.9.6
criterion     v0.3.0 -> v0.3.2
futures       v0.3.1 -> v0.3.5
rand          v0.7.2 -> v0.7.3
libc          v0.2.4 -> v0.2.70
dashmap       v3.1.0 -> v3.11.1
quanta        v0.4.1 -> v0.5.2
tynm          v0.1.1 -> v0.1.4
```

Notably, updating `quanta` from `0.4` to `0.5` runs into metrics-rs/quanta#16, where initializing a `quanta::Clock` takes 1 second while calibration runs. That's why some tests move `Instant::now()` to after the `RateLimiter` is instantiated.


Co-authored-by: Azriel Hoh <azriel91@gmail.com>
@antifuchs
Contributor

That's very reasonable thinking! I'd appreciate having a feature flag like that, but knowing that calibration is important for accurate readings from quanta can also influence our API design: we could take an already-initialized Clock as an input parameter instead of (as we do now) constructing one via Default::default().
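
Sketching what I mean (hypothetical signatures, not our actual API):

```rust
use quanta::Clock;

pub struct RateLimiter {
    clock: Clock,
    // ... quota and state fields elided
}

impl RateLimiter {
    /// The caller supplies an already-calibrated clock, so constructing
    /// many rate limiters pays the calibration cost at most once.
    pub fn with_clock(clock: Clock) -> Self {
        RateLimiter { clock }
    }
}

// let clock = Clock::new();                       // calibrate once
// let a = RateLimiter::with_clock(clock.clone()); // reuse it everywhere
// let b = RateLimiter::with_clock(clock);
```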

While we don't currently use mocking (it interacts badly with things that require actual time to pass, like futures), I think that would still work. Letting our users pick how accurate their readings need to be seems like the right call.

That leaves the TSC/SMP issue, of course, but I'm neither a hardware hacker nor a kernel person, so I'll leave this to more knowledgeable people (:

This might actually be something that kernel folks understand really well - I'd recommend getting in touch with LWN, possibly!

@tobz
Member Author

tobz commented May 26, 2020

@antifuchs I've released an alpha version of quanta that includes my work to speed up calibration, as well as allowing the first calibration to be shared: quanta@0.5.3-alpha.1
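
If you want to try it from a branch, pinning the pre-release should just be:

```toml
[dependencies]
quanta = "=0.5.3-alpha.1"
```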

Not sure if you want to give it a spin in a side branch just to see if it fixes the issues you were seeing. I still plan to do some more work on testing quanta, so I'm not quite ready to cut a non-alpha release... yet. :)

@antifuchs
Contributor

Oh man, after more than a year, I've finally managed to get quanta upgraded past 0.4: boinkor-net/governor#83 - the main difficulty was in integration tests that use real time, where clock initialization caused delays longer than the tests expected (you can't fake out time in those tests, unfortunately!). boinkor-net/governor@b0481e6 has the details.

Other than that, I think this works great (and is much more performant & correct!) - so, thank you all for continuing to improve quanta (:
