Commit 974850b

Like Xu authored and sean-jc committed
KVM: x86/pmu: Add PDIR++ and PDist support for SPR and later models
The PEBS capability on SPR is basically the same as on Ice Lake Server, with the exception of two special facilities that have been enhanced and require special handling.

Upon triggering a PEBS assist, there will be a finite delay between the time the counter overflows and when the microcode starts to carry out its data collection obligations. Even if the delay is constant in core clock space, it invariably manifests as variable "skids" in instruction address space.

On Ice Lake Server, the Precise Distribution of Instructions Retired (PDIR) facility mitigates the "skid" problem by providing an early indication of when the counter is about to overflow. On SPR, the PDIR counter available (Fixed 0) is unchanged, but the capability is enhanced to Instruction-Accurate PDIR (PDIR++), where PEBS is taken on the next instruction after the one that caused the overflow.

SPR also introduces a new Precise Distribution (PDist) facility, available only on general programmable counter 0. Per the Intel SDM, PDist eliminates any skid or shadowing effects from PEBS. With PDist, the PEBS record will be generated precisely upon completion of the instruction or operation that causes the counter to overflow (there is no "wait for next occurrence" by default).

In terms of KVM handling, when the guest accesses those special counters, KVM needs to request the same index counters via the perf_event kernel subsystem to ensure that the guest uses the correct PEBS hardware counter (PDIR++ or PDist). This is mainly achieved by adjusting the event precise level to the maximum. The semantics of this magic number are mainly defined by the internal software context of perf_event, and it is also backwards compatible as part of the user space interface.

Opportunistically, refine confusing comments on TNT+, as the only platforms that currently support pebs_ept are Ice Lake Server and SPR (GLC+).
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20221109082802.27543-3-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
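
For readers less familiar with the "precise level" mentioned above: it is the 2-bit precise_ip field of struct perf_event_attr, which user space can also set. The uapi header documents level 0 as arbitrary skid, 1 as constant skid, 2 as zero skid requested, and 3 as zero skid required. The sketch below only fills in an attribute structure from user space; the helper name init_precise_attr and the sample period are illustrative assumptions and are not part of this patch.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Hypothetical helper: prepare a sampling event on retired instructions
 * with the requested precise level (0..3). Nothing is opened here; the
 * caller would pass the result to the perf_event_open() system call.
 */
static void init_precise_attr(struct perf_event_attr *attr,
			      unsigned int precise_level)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_INSTRUCTIONS;
	attr->sample_period = 100000;		/* arbitrary example period */
	attr->sample_type = PERF_SAMPLE_IP;
	attr->precise_ip = precise_level & 3;	/* field is only 2 bits wide */
	attr->disabled = 1;			/* start disabled, enable later */
}
```

Calling init_precise_attr(&attr, 3) requests the strictest level, which is the same value KVM passes for the PDIR/PDIR++/PDist counters in this change.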
1 parent 2de154f commit 974850b

1 file changed: +33 -12 lines changed


arch/x86/kvm/pmu.c

Lines changed: 33 additions & 12 deletions
@@ -29,9 +29,18 @@
 struct x86_pmu_capability __read_mostly kvm_pmu_cap;
 EXPORT_SYMBOL_GPL(kvm_pmu_cap);
 
-static const struct x86_cpu_id vmx_icl_pebs_cpu[] = {
+/* Precise Distribution of Instructions Retired (PDIR) */
+static const struct x86_cpu_id vmx_pebs_pdir_cpu[] = {
 	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, NULL),
 	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, NULL),
+	/* Instruction-Accurate PDIR (PDIR++) */
+	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
+	{}
+};
+
+/* Precise Distribution (PDist) */
+static const struct x86_cpu_id vmx_pebs_pdist_cpu[] = {
+	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
 	{}
 };
 
@@ -156,6 +165,28 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
 	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 }
 
+static u64 pmc_get_pebs_precise_level(struct kvm_pmc *pmc)
+{
+	/*
+	 * For some model specific pebs counters with special capabilities
+	 * (PDIR, PDIR++, PDIST), KVM needs to raise the event precise
+	 * level to the maximum value (currently 3, backwards compatible)
+	 * so that the perf subsystem would assign specific hardware counter
+	 * with that capability for vPMC.
+	 */
+	if ((pmc->idx == 0 && x86_match_cpu(vmx_pebs_pdist_cpu)) ||
+	    (pmc->idx == 32 && x86_match_cpu(vmx_pebs_pdir_cpu)))
+		return 3;
+
+	/*
+	 * The non-zero precision level of guest event makes the ordinary
+	 * guest event becomes a guest PEBS event and triggers the host
+	 * PEBS PMI handler to determine whether the PEBS overflow PMI
+	 * comes from the host counters or the guest.
+	 */
+	return 1;
+}
+
 static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 				 bool exclude_user, bool exclude_kernel,
 				 bool intr)
@@ -187,22 +218,12 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 	}
 	if (pebs) {
 		/*
-		 * The non-zero precision level of guest event makes the ordinary
-		 * guest event becomes a guest PEBS event and triggers the host
-		 * PEBS PMI handler to determine whether the PEBS overflow PMI
-		 * comes from the host counters or the guest.
-		 *
 		 * For most PEBS hardware events, the difference in the software
 		 * precision levels of guest and host PEBS events will not affect
 		 * the accuracy of the PEBS profiling result, because the "event IP"
 		 * in the PEBS record is calibrated on the guest side.
-		 *
-		 * On Icelake everything is fine. Other hardware (GLC+, TNT+) that
-		 * could possibly care here is unsupported and needs changes.
 		 */
-		attr.precise_ip = 1;
-		if (x86_match_cpu(vmx_icl_pebs_cpu) && pmc->idx == 32)
-			attr.precise_ip = 3;
+		attr.precise_ip = pmc_get_pebs_precise_level(pmc);
 	}
 
 	event = perf_event_create_kernel_counter(&attr, -1, current,
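
The selection logic in pmc_get_pebs_precise_level() can be modelled outside the kernel to make the resulting table of (CPU model, counter index) pairs easy to check. The following is a minimal user-space sketch, not the kernel code: the enum cpu_model values and the helper names are hypothetical stand-ins for x86_match_cpu() against the two tables in this patch. Index 0 is general-purpose counter 0, and index 32 is fixed counter 0 (INTEL_PMC_IDX_FIXED in the kernel), which matches the "Fixed 0" wording in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPU models matched by the patch. */
enum cpu_model {
	MODEL_OTHER,
	MODEL_ICELAKE_D,
	MODEL_ICELAKE_X,
	MODEL_SAPPHIRERAPIDS_X,
};

#define PMC_IDX_GP0	0	/* general-purpose counter 0 */
#define PMC_IDX_FIXED0	32	/* fixed counter 0 */

/* Models listed in vmx_pebs_pdir_cpu: PDIR on Ice Lake, PDIR++ on SPR. */
static bool model_has_pdir(enum cpu_model m)
{
	return m == MODEL_ICELAKE_D || m == MODEL_ICELAKE_X ||
	       m == MODEL_SAPPHIRERAPIDS_X;
}

/* Models listed in vmx_pebs_pdist_cpu: PDist on SPR only. */
static bool model_has_pdist(enum cpu_model m)
{
	return m == MODEL_SAPPHIRERAPIDS_X;
}

/* Mirrors the decision in the patch: 3 for special counters, else 1. */
static uint64_t pebs_precise_level(enum cpu_model m, unsigned int idx)
{
	if ((idx == PMC_IDX_GP0 && model_has_pdist(m)) ||
	    (idx == PMC_IDX_FIXED0 && model_has_pdir(m)))
		return 3;

	return 1;
}
```

Under this model, SPR gets level 3 on both counter 0 and fixed counter 0, Ice Lake gets level 3 only on fixed counter 0, and every other combination falls back to level 1, which is still non-zero and therefore still marks the event as a guest PEBS event.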
