Commit 3261432

lirui34 authored and wenlingz committed
doc: Add document of RT performance tuning.
Add document of RT performance tuning. Signed-off-by: lirui34 <ruix.li@intel.com>
1 parent ca27f8e commit 3261432

3 files changed, with 199 additions and 0 deletions

doc/develop.rst

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ Configuration Tutorials
    tutorials/using_sdc2_mode_on_nuc
    tutorials/using_hybrid_mode_on_nuc
    tutorials/building_acrn_in_docker
+   tutorials/realtime_performance_tuning

 User VM Tutorials
 *****************

doc/tutorials/images/vm_exits_log.png

77.1 KB

doc/tutorials/realtime_performance_tuning.rst

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
.. _rt_performance_tuning:

Trace and Data Collection for ACRN Real-Time (RT) Performance Tuning
####################################################################

This document describes methods to collect trace and performance data for
analyzing the real-time performance of an ACRN RT VM (RTVM). It has two parts:

- Using trace data to analyze VM exits
- Collecting performance monitoring counts with the Performance Monitoring
  Unit (PMU) for tuning

VM exits analysis for ACRN RT performance
*****************************************

VM exits in response to certain instructions and events are a key source of
performance degradation in virtual machines. Even while a hard RTVM is running
on ACRN, some instructions and events still cause VM exits and impact the
determinism of RT latency:

- CPUID
- TSC_Adjust read/write
- TSC write
- APICID/LDR read
- ICR write

Generally, we don't want any VM exits to occur during the critical section
of the RT task.

The methodology for VM exits analysis is simple. First, clearly identify the
critical section of the RT task, that is, the span of time during which no VM
exit should occur. Different RT tasks have different critical sections, so this
article uses cyclictest as an example to show how to do the analysis.

The critical sections
=====================

Here is example pseudocode of the cyclictest implementation.

.. code-block:: none

   while (!shutdown) {

       clock_nanosleep(&next)
       clock_gettime(&now)
       latency = calcdiff(now, next)

       next += interval
   }

Time point ``now`` is the actual point at which cyclictest is woken up and
scheduled. Time point ``next`` is the expected point at which we want
cyclictest to be woken up and scheduled. The latency is ``now - next``. We
don't want any VM exits to occur between ``next`` and ``now``, so we define
``next`` as the start point of the critical section and ``now`` as its end
point.

Log and trace data collection
=============================

#. Add timestamps (in TSC) at ``next`` and ``now``.
#. Capture the log with the above timestamps in the RTVM.
#. Capture the acrntrace log in the Service VM at the same time.

Offline analysis
================

#. Convert the raw trace data to a human-readable format.
#. Merge the RTVM log and the ACRN hypervisor trace based on timestamps (in TSC).
#. Check whether any VM exit falls within the critical sections. The pattern
   is as follows:

   .. figure:: images/vm_exits_log.png
      :align: center
      :name: vm_exits_log
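
The merge-and-check step of the offline analysis can be sketched in a few lines
of Python. This is only a minimal illustration, not the tooling used by the
authors: the function names are hypothetical, the input is assumed to be
already parsed into plain integers, and the RTVM and hypervisor timestamps are
assumed to share the same TSC timebase.

```python
from bisect import bisect_left, bisect_right


def exits_in_critical_sections(sections, exit_tscs):
    """Return the VM exits that fall inside each critical section.

    sections:  iterable of (next_tsc, now_tsc) pairs from the RTVM log.
    exit_tscs: iterable of VM-exit timestamps from the converted trace.
    Both inputs are hypothetical, already-parsed TSC values.
    """
    exits = sorted(exit_tscs)
    hits = {}
    for start, end in sections:
        # Binary search for the first and last exit within [start, end].
        lo = bisect_left(exits, start)
        hi = bisect_right(exits, end)
        if lo < hi:
            hits[(start, end)] = exits[lo:hi]
    return hits


def tsc_to_us(ticks, tsc_hz):
    """Convert a TSC tick count to microseconds for a given TSC frequency."""
    return ticks * 1_000_000 / tsc_hz


if __name__ == "__main__":
    # Made-up values for illustration only.
    sections = [(100, 200), (300, 400)]
    vm_exits = [50, 150, 250, 399, 400]
    for (start, end), found in exits_in_critical_sections(sections, vm_exits).items():
        print(f"critical section [{start}, {end}] contains VM exits at {found}")
```

Any critical section that appears in the result deserves a closer look at the
exit reasons recorded in the trace for those timestamps.
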

Collecting performance monitoring counts
****************************************

Enable Performance Monitoring Unit (PMU) support in VM
======================================================

By default, the ACRN hypervisor doesn't expose the PMU-related CPUID leaf and
MSRs to guest VMs. To use Performance Monitoring Counters (PMCs) in a guest VM,
modify the ACRN hypervisor code to expose the capability to the RTVM.

.. note:: Precise Event Based Sampling (PEBS) is not enabled in VM yet.

#. Expose CPUID leaf 0xA as below:

   .. code-block:: none

      --- a/hypervisor/arch/x86/guest/vcpuid.c
      +++ b/hypervisor/arch/x86/guest/vcpuid.c
      @@ -345,7 +345,7 @@ int32_t set_vcpuid_entries(struct acrn_vm *vm)
                            break;
                            /* These features are disabled */
                            /* PMU is not supported */
      -                     case 0x0aU:
      +                     //case 0x0aU:
                            /* Intel RDT */
                            case 0x0fU:
                            case 0x10U:

#. Expose PMU-related MSRs to the VM as below:

   .. code-block:: none

      --- a/hypervisor/arch/x86/guest/vmsr.c
      +++ b/hypervisor/arch/x86/guest/vmsr.c
      @@ -337,6 +337,41 @@ void init_msr_emulation(struct acrn_vcpu *vcpu)
              /* don't need to intercept rdmsr for these MSRs */
              enable_msr_interception(msr_bitmap, MSR_IA32_TIME_STAMP_COUNTER, INTERCEPT_WRITE);

      +
      +       /* Passthru PMU related MSRs to guest */
      +       enable_msr_interception(msr_bitmap, MSR_IA32_FIXED_CTR_CTL, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PERF_GLOBAL_CTRL, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PERF_GLOBAL_STATUS, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PERF_GLOBAL_OVF_CTRL, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PERF_GLOBAL_STATUS_SET, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PERF_GLOBAL_INUSE, INTERCEPT_DISABLE);
      +
      +       enable_msr_interception(msr_bitmap, MSR_IA32_FIXED_CTR0, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_FIXED_CTR1, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_FIXED_CTR2, INTERCEPT_DISABLE);
      +
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PMC0, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PMC1, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PMC2, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PMC3, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PMC4, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PMC5, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PMC6, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PMC7, INTERCEPT_DISABLE);
      +
      +       enable_msr_interception(msr_bitmap, MSR_IA32_A_PMC0, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_A_PMC1, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_A_PMC2, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_A_PMC3, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_A_PMC4, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_A_PMC5, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_A_PMC6, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_A_PMC7, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PERFEVTSEL0, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PERFEVTSEL1, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PERFEVTSEL2, INTERCEPT_DISABLE);
      +       enable_msr_interception(msr_bitmap, MSR_IA32_PERFEVTSEL3, INTERCEPT_DISABLE);
      +
              /* Setup MSR bitmap - Intel SDM Vol3 24.6.9 */
              value64 = hva2hpa(vcpu->arch.msr_bitmap);
              exec_vmwrite64(VMX_MSR_BITMAP_FULL, value64);

Use Perf/PMU tool in performance analysis
=========================================

After the PMU-related CPUID leaf and MSRs are exposed to the VM, performance
analysis tools such as Perf and PMU tools can be used inside the VM to locate
the bottleneck of an application.

**Perf** is a profiler tool for Linux 2.6+ based systems that abstracts away CPU
hardware differences in Linux performance measurements and presents a simple
command line interface. Perf is based on the ``perf_events`` interface exported
by recent versions of the Linux kernel.

**PMU** tools is a collection of tools for profile collection and performance
analysis on Intel CPUs on top of Linux Perf.

Refer to the following links for the usage of Perf:

- https://perf.wiki.kernel.org/index.php/Main_Page
- https://perf.wiki.kernel.org/index.php/Tutorial

Refer to https://github.com/andikleen/pmu-tools for the usage of PMU tools.

Top-down Micro-architecture Analysis Method (TMAM)
==================================================

The Top-down Micro-architecture Analysis Method, based on the Top-Down
Characterization methodology, aims to provide insight into whether you have
made wise choices with your algorithms and data structures. See the
Intel |reg| 64 and IA-32 `Architectures Optimization Reference Manual
<http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>`_,
Appendix B.1, for more details on TMAM. You can also refer to this `technical
paper
<https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf>`_,
which adopts TMAM for systematic performance benchmarking and analysis of
compute-native Network Function data planes executed on
Commercial-Off-The-Shelf (COTS) servers using available open-source
measurement tools.
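
To make the level 1 breakdown concrete, here is a minimal Python sketch of how
the four top-level categories are commonly derived from raw counter values on
4-wide Intel cores. The function name and the sample numbers are hypothetical,
and the formulas and event names are assumptions that vary by
microarchitecture, so check them against the manual referenced above.

```python
def tmam_level1(cycles, uops_not_delivered, uops_issued,
                uops_retired_slots, recovery_cycles, width=4):
    """Compute the TMAM level 1 breakdown as fractions of pipeline slots.

    Assumed event mapping (verify it for your CPU):
      cycles              -> CPU_CLK_UNHALTED.THREAD
      uops_not_delivered  -> IDQ_UOPS_NOT_DELIVERED.CORE
      uops_issued         -> UOPS_ISSUED.ANY
      uops_retired_slots  -> UOPS_RETIRED.RETIRE_SLOTS
      recovery_cycles     -> INT_MISC.RECOVERY_CYCLES
    width is the assumed number of issue slots per cycle.
    """
    slots = width * cycles
    frontend = uops_not_delivered / slots
    bad_spec = (uops_issued - uops_retired_slots + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Whatever is not accounted for by the other three is backend bound.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "frontend bound": frontend,
        "bad speculation": bad_spec,
        "retiring": retiring,
        "backend bound": backend,
    }


if __name__ == "__main__":
    # Made-up counter values for illustration only.
    for name, share in tmam_level1(1000, 400, 1300, 1200, 25).items():
        print(f"{name:16s} {share:6.1%}")
```

The four fractions sum to 1 by construction, so a large backend bound share
means that most slots were neither retired, wasted on bad speculation, nor
starved by the frontend.
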

Example: Using Perf to analyze TMAM level 1 on CPU core 1.

.. code-block:: console

   perf stat --topdown -C 1 taskset -c 1 dd if=/dev/zero of=/dev/null count=10
   10+0 records in
   10+0 records out
   5120 bytes (5.1 kB, 5.0 KiB) copied, 0.00336348 s, 1.5 MB/s

   Performance counter stats for 'CPU(s) 1':

             retiring   bad speculation   frontend bound   backend bound
   S0-C1 1      10.6%              1.5%             3.9%           84.0%

   0.006737123 seconds time elapsed
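
When many such runs are collected, the result row can be post-processed with a
short script. The following Python sketch is only an illustration: the helper
names are hypothetical, and the column order is assumed to match the sample
output above, which may differ between perf versions.

```python
# Level 1 categories, in the column order of the sample output above.
TOPDOWN_L1 = ("retiring", "bad speculation", "frontend bound", "backend bound")


def parse_topdown_row(row):
    """Parse one result row such as 'S0-C1 1 10.6% 1.5% 3.9% 84.0%'."""
    values = [float(tok[:-1]) for tok in row.split() if tok.endswith("%")]
    if len(values) != len(TOPDOWN_L1):
        raise ValueError(f"expected {len(TOPDOWN_L1)} percentages, got {len(values)}")
    return dict(zip(TOPDOWN_L1, values))


def dominant_category(breakdown):
    """Return the category with the largest share of pipeline slots."""
    return max(breakdown, key=breakdown.get)


if __name__ == "__main__":
    breakdown = parse_topdown_row("S0-C1 1 10.6% 1.5% 3.9% 84.0%")
    print(dominant_category(breakdown))
```

For the sample row, the dominant category is ``backend bound``, which suggests
looking next at memory or execution-resource stalls rather than at the
frontend.
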