This repository has been archived by the owner on Nov 7, 2019. It is now read-only.
forked from illumos/illumos-gate
9234 reduce apic calibration error by taking multiple measurements #578
Closed
brad-lewis changed the title from "DLPX-50219 reduce apic calibration error by taking multiple measurements" to "reduce apic calibration error by taking multiple measurements" on Mar 6, 2018
ikozhukhov reviewed on Mar 8, 2018
I would prefer to put 'void' on this line, the same as in the prototype, where we already have it.
ikozhukhov reviewed on Mar 8, 2018
Put 'void' here too.
brad-lewis changed the title from "reduce apic calibration error by taking multiple measurements" to "9234 reduce apic calibration error by taking multiple measurements" on Mar 9, 2018
brad-lewis force-pushed the openzfs-DLPX-50219 branch from 988b77d to a75c4f8 on April 13, 2018 20:32
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>

The APIC is used as a timer in illumos. Specifically, it is used by the callout and cyclic frameworks to generate an interrupt around the time that the closest timer would expire. Once in interrupt context, those frameworks call `gethrtime()` to determine which timers have expired, so the system doesn't rely solely on the accuracy of the APIC.

If the APIC lags behind real time, we will have more jitter and shorter timeouts will tend to be late. If the APIC runs faster than it should, we will generate an excessive number of interrupts, as the APIC will fire an interrupt before any timers expire. In any case, I've tested what happens if the APIC is severely miscalibrated (10% or 1000% of target speed) and it doesn't seem to create any instability on the system.

At 1000% of the speed, we see a significant increase in the number of interrupts fired, especially when the system is idle:

CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys dt idl
  0   41   0    5 9711  247  343    6   20    3   0   527   1   3  0  96
  1   79   0   14 9366  409 1046    8   20    4   0  2894   1   3  0  96

vs. normally:

CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys dt idl
  0  120   0   10  797  254 1082    9   20    3   0  2564   1   2  0  97
  1   80   0   11  830  387  385    7   19    4   0  1175   1   1  0  98

The APIC is calibrated using the 8254 fixed-frequency timer (PIT). We wait for the PIT to count a certain number of ticks and then check how many ticks the APIC counted in the same time interval. The main issue is that on some hypervisors, notably Hyper-V, both the 8254 and the APIC are emulated and thus can sometimes be inconsistent.

I've done an experiment to measure how much of an effect those inconsistencies have on the APIC calibration factor (which determines how many APIC ticks pass in a certain number of nanoseconds), and here are the results for about 15000 measurements (done by performing 1000 measurements at a time on each boot).

The main observation is that calibration doesn't seem to change from boot to boot and that the accuracy of a measurement doesn't seem to correlate with the time at which it was taken, which means that very inaccurate measurements happen randomly. Most measurements are quite accurate, except for some rare outliers (as can be seen in the graph). It was determined that a 5-value median filter would significantly reduce the worst-case calibrations.

In the results below, `stdev %` is the standard deviation divided by the average; `min %` is how far the lowest calibration value measured is from the average; and `max %` is how far the highest calibration value measured is from the average.

Base Results:

          stdev %   min %   max %
dcenter      0.02     0.2     0.2
AWS          0.02     1.4     0.1
hyperv       0.79     6.4     5.5
Azure        2.87    35.1   331.1

Using 5-value Median Filter:

          stdev %   min %   max %
dcenter      0.01    0.02    0.04
AWS          0.01    0.01    0.03
hyperv       0.47    1.47    1.76
Azure        0.50    2.67    1.39

As we can see, using the median filter significantly reduces the worst-case (min/max) miscalibrations on all platforms, and seems to be a necessity on Azure to ensure a proper worst-case calibration.

Closes openzfs#578
prakashsurya force-pushed the openzfs-DLPX-50219 branch from a75c4f8 to 2e9d99f on May 16, 2018 18:25
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Calibration of the APIC timer is currently performed by taking a single measurement at boot time. On some hypervisors, such as Azure, the calibration can be quite far off because the timers are emulated. By taking the median of 5 measurements instead, we can significantly reduce the worst-case calibration error and slightly improve the average calibration error, without introducing any invasive code changes.
The APIC is used as a timer in illumos. Specifically, it is used by the callout and cyclic frameworks to generate an interrupt around the time that the closest timer would expire. Once in interrupt context, those frameworks call `gethrtime()` to determine which timers have expired, so the system doesn't rely solely on the accuracy of the APIC.
If the APIC lags behind real time, we will have more jitter and shorter timeouts will tend to be late. If the APIC runs faster than it should, we will generate an excessive number of interrupts, as the APIC will fire an interrupt before any timers expire. In any case, I've tested what happens if the APIC is severely miscalibrated (10% or 1000% of target speed) and it doesn't seem to create any instability on the system.
At 1000% of the speed, we see a significant increase in the number of interrupts fired, especially when the system is idle:
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys dt idl
  0   41   0    5 9711  247  343    6   20    3   0   527   1   3  0  96
  1   79   0   14 9366  409 1046    8   20    4   0  2894   1   3  0  96
vs. normally:
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys dt idl
  0  120   0   10  797  254 1082    9   20    3   0  2564   1   2  0  97
  1   80   0   11  830  387  385    7   19    4   0  1175   1   1  0  98
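The early-interrupt and late-timeout effects described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how the cyclic or callout frameworks actually re-arm the timer: it assumes a single one-shot timer, a handler that compares a trustworthy clock (standing in for `gethrtime()`) against the deadline, and a re-arm for the remaining time whenever nothing has expired yet. The names `simulate_timer`, `sim_result`, and `speed_pct` are hypothetical.

```c
#include <stdint.h>

/* Result of the toy simulation (hypothetical type, for illustration). */
struct sim_result {
    uint64_t interrupts; /* interrupts taken until the timer expired */
    uint64_t late_ns;    /* how far past the deadline expiry was seen */
};

/*
 * Simulate one timer with a deadline deadline_ns in the future.
 * speed_pct is the real APIC speed as a percentage of the calibrated
 * speed: 100 is perfect, 1000 is 10x too fast, 10 is 10x too slow.
 */
struct sim_result
simulate_timer(uint64_t deadline_ns, uint64_t speed_pct)
{
    struct sim_result r = { 0, 0 };
    uint64_t now = 0; /* stands in for the trusted clock */

    if (speed_pct == 0)
        return (r);

    while (now < deadline_ns) {
        uint64_t remaining = deadline_ns - now;
        /* A fast APIC counts down early; a slow one counts down late. */
        uint64_t fire_after = remaining * 100 / speed_pct;

        if (fire_after == 0)
            fire_after = 1; /* assumed minimum programmable delay */
        now += fire_after;
        r.interrupts++;
        /* The handler re-checks the trusted clock on the next pass. */
    }
    r.late_ns = now - deadline_ns;
    return (r);
}
```

In this model a perfectly calibrated timer takes one interrupt, a timer running at 1000% takes many interrupts but still expires on time, and a timer running at 10% takes one interrupt that arrives late.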
The APIC is calibrated using the 8254 fixed-frequency timer (PIT). We wait for the PIT to count a certain number of ticks and then check how many ticks the APIC counted in the same time interval. The main issue is that on some hypervisors, notably Hyper-V, both the 8254 and the APIC are emulated and thus can sometimes be inconsistent.
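The arithmetic behind such a paired measurement can be sketched as follows. This is a minimal illustration, not the kernel's actual calibration code: the function names are hypothetical, and the only outside fact relied on is the nominal 8254 input clock of 1,193,182 Hz.

```c
#include <stdint.h>

#define PIT_HZ  1193182ULL    /* nominal 8254 input clock, in Hz */
#define NANOSEC 1000000000ULL

/*
 * Derive the APIC timer frequency from one paired measurement:
 * apic_ticks APIC ticks were observed while the PIT counted pit_ticks.
 */
uint64_t
apic_ticks_per_sec(uint64_t apic_ticks, uint64_t pit_ticks)
{
    if (pit_ticks == 0)
        return (0); /* no valid measurement */
    return (apic_ticks * PIT_HZ / pit_ticks);
}

/*
 * Convert a delay in nanoseconds into APIC ticks using the calibrated
 * frequency. The product can overflow 64 bits for very long delays;
 * a real implementation would have to guard against that.
 */
uint64_t
ns_to_apic_ticks(uint64_t ns, uint64_t apic_hz)
{
    return (ns * apic_hz / NANOSEC);
}
```

If either emulated timer misreports its count during the measurement window, the error goes straight into the derived frequency, which is why a single sample can be badly off.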
I've done an experiment to measure how much of an effect those inconsistencies have on the APIC calibration factor (which determines how many APIC ticks pass in a certain number of nanoseconds), and here are the results for about 15000 measurements (done by performing 1000 measurements at a time on each boot).
The main observation is that calibration doesn't seem to change from boot to boot and that the accuracy of a measurement doesn't seem to correlate with the time at which it was taken, which means that very inaccurate measurements happen randomly. Most measurements are quite accurate, except for some rare outliers (as can be seen in the graph). It was determined that a 5-value median filter would significantly reduce the worst-case calibrations.
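A 5-value median filter of this kind could look roughly like the sketch below. It is not taken from the patch itself; the name `cal_median5` is hypothetical, and the caller is assumed to have already collected five calibration samples.

```c
#include <stddef.h>
#include <stdint.h>

#define CAL_SAMPLES 5 /* sample count taken from the description */

/*
 * Return the median of CAL_SAMPLES calibration values. The input is
 * copied so the caller's array is left untouched; an insertion sort
 * is plenty for five elements.
 */
uint64_t
cal_median5(const uint64_t samples[CAL_SAMPLES])
{
    uint64_t v[CAL_SAMPLES];

    for (size_t i = 0; i < CAL_SAMPLES; i++)
        v[i] = samples[i];

    for (size_t i = 1; i < CAL_SAMPLES; i++) {
        uint64_t key = v[i];
        size_t j = i;

        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
    return (v[CAL_SAMPLES / 2]);
}
```

Because outliers are rare and appear at random, up to two wild samples out of five (too high, too low, or one of each) cannot move the median away from the cluster of accurate values.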
In the results below, stdev % is the standard deviation divided by the average; min % is how far the lowest calibration value measured is from the average; and max % is how far the highest calibration value measured is from the average.
Base Results:
          stdev %   min %   max %
AWS          0.02     1.4     0.1
hyperv       0.79     6.4     5.5
Azure        2.87    35.1   331.1
Using 5-value Median Filter:
          stdev %   min %   max %
AWS          0.01    0.01    0.03
hyperv       0.47    1.47    1.76
Azure        0.50    2.67    1.39
As we can see, using the median filter significantly reduces the worst-case (min/max) miscalibrations on all platforms, and seems to be a necessity on Azure to ensure a proper worst-case calibration.
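For reference, the three columns reported above could be computed from a set of calibration values along these lines. This is a sketch, not the script used for the experiment: the names are hypothetical, the population (rather than sample) standard deviation is an assumption, and a small Newton iteration replaces `sqrt()` only to keep the example free of a libm dependency.

```c
#include <stddef.h>

/* Spread of a set of calibration values, as percentages of the mean. */
struct cal_spread {
    double stdev_pct; /* standard deviation / mean */
    double min_pct;   /* (mean - lowest value) / mean */
    double max_pct;   /* (highest value - mean) / mean */
};

/* Square root by Newton's method, starting at or above the root. */
static double
sqrt_newton(double x)
{
    double r;

    if (x <= 0.0)
        return (0.0);
    r = (x > 1.0) ? x : 1.0;
    for (int i = 0; i < 64; i++)
        r = 0.5 * (r + x / r);
    return (r);
}

struct cal_spread
cal_spread_pct(const double *v, size_t n)
{
    struct cal_spread s = { 0.0, 0.0, 0.0 };
    double sum = 0.0, sq = 0.0, lo, hi, mean;

    if (n == 0)
        return (s);
    lo = hi = v[0];
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
        if (v[i] < lo)
            lo = v[i];
        if (v[i] > hi)
            hi = v[i];
    }
    mean = sum / (double)n;
    if (mean == 0.0)
        return (s);
    for (size_t i = 0; i < n; i++)
        sq += (v[i] - mean) * (v[i] - mean);

    s.stdev_pct = 100.0 * sqrt_newton(sq / (double)n) / mean;
    s.min_pct = 100.0 * (mean - lo) / mean;
    s.max_pct = 100.0 * (hi - mean) / mean;
    return (s);
}
```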
Upstream bug: DLPX-50219