
Commit bf7b1cf

doc: update HLD Device passthrough

transcode, edit, and upload HLD 0.7 section 3.9 (Device passthrough)

Tracked-on: #1645
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>

1 parent 7c192db commit bf7b1cf

11 files changed: +276 -0 lines changed

doc/developer-guides/hld/hld-hypervisor.rst
Lines changed: 1 addition & 0 deletions

@@ -15,3 +15,4 @@ Hypervisor high-level design
    Timer <hv-timer>
    Virtual Interrupt <hv-virt-interrupt>
    VT-d <hv-vt-d>
+   Device Passthrough <hv-dev-passthrough>
Lines changed: 275 additions & 0 deletions

@@ -0,0 +1,275 @@
.. _hv-device-passthrough:

Device Passthrough
##################

A critical part of virtualization is virtualizing devices: exposing all
aspects of a device, including its I/O, interrupts, DMA, and configuration.
There are three typical device virtualization methods: emulation,
para-virtualization, and passthrough. The ACRN project uses both emulation
and passthrough. Device emulation is discussed in :ref:`hld-io-emulation`;
device passthrough is discussed here.

In the ACRN project, device emulation means emulating all existing hardware
resources through a device model software component running in the
Service OS (SOS). Device emulation must maintain the same software
interface as a native device, providing transparency to the VM software
stack. Passthrough, implemented in the hypervisor, assigns a physical
device to a VM so the VM can access the hardware device directly with
minimal (if any) VMM involvement.

The difference between device emulation and passthrough is shown in
:numref:`emu-passthru-diff`. Device emulation has a longer access path,
which results in worse performance compared with passthrough. Passthrough
can deliver near-native performance, but cannot support device sharing.

.. figure:: images/passthru-image30.png
   :align: center
   :name: emu-passthru-diff

   Difference between emulation and passthrough

Passthrough in the hypervisor provides the following functionality to
allow a VM to access PCI devices directly:

- DMA remapping by VT-d for PCI devices: the hypervisor sets up DMA
  remapping during the VM initialization phase.
- MMIO remapping between virtual and physical BARs
- Device configuration emulation
- Interrupt remapping for PCI devices
- ACPI configuration virtualization
- GSI sharing violation check

The following diagram details the passthrough initialization control
flow in ACRN:

.. figure:: images/passthru-image22.png
   :align: center

   Passthrough device initialization control flow

Passthrough Device Status
*************************

Most common devices on supported platforms are enabled for passthrough,
as detailed here:

.. figure:: images/passthru-image77.png
   :align: center

   Passthrough device status

DMA Remapping
*************

To enable passthrough, DMA accesses must be remapped: a VM can only
provide guest physical addresses (GPA), while physical DMA requires host
physical addresses (HPA). One workaround is building an identity mapping
so that GPA equals HPA, but this is not recommended because some VMs
don't support relocation well. To address this issue, Intel introduced
VT-d, which adds a remapping engine in the chipset to translate GPA to
HPA for DMA operations.

Each VT-d engine (DMAR unit) maintains a remapping structure similar to
a page table, with the device BDF (Bus/Device/Function) as input and the
final page table for GPA-to-HPA translation as output. The GPA-to-HPA
translation page table is similar to a normal multi-level page table.

VM DMA depends on Intel VT-d to do the translation from GPA to HPA, so
the VT-d IOMMU engine must be enabled in ACRN before any device can be
passed through. The SOS in ACRN is a VM running in non-root mode that
also depends on VT-d to access devices. In the SOS DMA remapping engine
settings, GPA is equal to HPA.

The ACRN hypervisor checks the DMA Remapping Hardware unit Definition
(DRHD) structures in the host DMAR ACPI table to get basic information,
then sets up each DMAR unit. For simplicity, ACRN reuses the EPT table
as the translation table in the DMAR unit for each passthrough device.
The control flow is shown in the following figures:

.. figure:: images/passthru-image72.png
   :align: center

   DMA remapping control flow during HV init

.. figure:: images/passthru-image86.png
   :align: center

   ptdev assignment control flow

.. figure:: images/passthru-image42.png
   :align: center

   ptdev de-assignment control flow

MMIO Remapping
**************

For PCI MMIO BARs, the hypervisor builds an EPT mapping between the
virtual BAR and the physical BAR, so the VM can access MMIO directly.

Device Configuration Emulation
******************************

PCI configuration is based on accesses to ports 0xCF8/0xCFC. ACRN
implements PCI configuration emulation to handle 0xCF8/0xCFC accesses
and control PCI devices through one of two paths: in the hypervisor or
in the SOS device model.

- When configuration emulation is done in the hypervisor, the
  interception of the 0xCF8/0xCFC ports and the emulation of PCI
  configuration space accesses are tricky and unclean. The final
  solution is therefore to reuse the PCI emulation infrastructure of the
  SOS device model. The hypervisor routes UOS 0xCF8/0xCFC accesses to
  the device model and remains blind to the physical PCI devices. Upon
  receiving a UOS PCI configuration space access request, the device
  model emulates the critical parts of the space, for instance the BARs,
  the MSI capability, and INTLINE/INTPIN.

- For other accesses, the device model reads/writes the physical
  configuration space on behalf of the UOS. To do this, the device model
  is linked with libpciaccess to access the physical PCI device.

Interrupt Remapping
*******************

When a physical interrupt of a passthrough device occurs, the hypervisor
has to distribute it to the relevant VM according to the interrupt
remapping relationships. The structure ``ptdev_remapping_info`` defines
the subordinate relation between the physical interrupt and the VM, the
virtual destination, and other details. See the following figure:

.. figure:: images/passthru-image91.png
   :align: center

   Remapping of physical interrupts

There are two different types of interrupt sources: IOAPIC and MSI. The
hypervisor records different information for interrupt distribution:
physical and virtual IOAPIC pin for IOAPIC sources, and physical and
virtual BDF plus other information for MSI sources.

SOS passthrough is also in the scope of interrupt remapping, but it is
done on demand rather than at hypervisor initialization.

.. figure:: images/passthru-image102.png
   :align: center
   :name: init-remapping

   Initialization of remapping of virtual IOAPIC interrupts for SOS

:numref:`init-remapping` above illustrates how remapping of (virtual)
IOAPIC interrupts is set up for the SOS. A VM exit occurs whenever the
SOS tries to unmask an interrupt in the (virtual) IOAPIC by writing to
the Redirection Table Entry (RTE). The hypervisor then invokes the
IOAPIC emulation handler (refer to :ref:`hld-io-emulation` for details
on I/O emulation), which calls APIs to set up a remapping for the
to-be-unmasked interrupt.

Remapping of (virtual) PIC interrupts is set up in a similar sequence:

.. figure:: images/passthru-image98.png
   :align: center

   Initialization of remapping of virtual MSI for SOS

This figure illustrates how mappings of MSI or MSI-X interrupts are set
up for the SOS. The SOS is responsible for issuing a hypercall to notify
the hypervisor before it configures the PCI configuration space to
enable an MSI. The hypervisor takes this opportunity to set up a
remapping for the given MSI or MSI-X entry before it is actually enabled
by the SOS.

When the UOS needs to access a physical device by passthrough, the
following steps occur:

- The UOS gets a virtual interrupt.
- A VM exit happens, and the trapped vCPU is the target where the
  interrupt will be injected.
- The hypervisor handles the interrupt and translates the vector
  according to ``ptdev_remapping_info``.
- The hypervisor delivers the interrupt to the UOS.

When the SOS needs to use a physical device, passthrough is also active
because the SOS is the first VM. The detailed steps are:

- The SOS gets all physical interrupts. It assigns interrupts to
  different VMs during initialization and reassigns them when a VM is
  created or deleted.
- When a physical interrupt is trapped, an exception happens after the
  VMCS has been set.
- The hypervisor handles the VM exit according to
  ``ptdev_remapping_info`` and translates the vector.
- The interrupt is injected the same way as a virtual interrupt.

ACPI Virtualization
*******************

ACPI virtualization is designed in ACRN with these assumptions:

- The HV has no knowledge of ACPI,
- The SOS owns all physical ACPI resources,
- The UOS sees virtual ACPI resources emulated by the device model.

Some passthrough devices require a physical ACPI table entry for
initialization. The device model creates such a device entry based on
the physical one, according to vendor ID and device ID. This
virtualization is implemented in the SOS device model and is not in the
scope of the hypervisor.

GSI Sharing Violation Check
***************************

All PCI devices that share the same GSI should be assigned to the same
VM, to avoid physical GSI sharing between multiple VMs. For devices that
don't support MSI, the ACRN DM puts the devices sharing the same GSI pin
into a GSI sharing group. The devices in the same group should either
all be assigned to the current VM or none of them assigned to it. A
device that violates this rule is rejected for passthrough. The checking
logic is implemented in the device model and is not in the scope of the
hypervisor.

Data Structures and Interfaces
******************************

.. note:: replace with reference to API docs

The following APIs are provided to initialize interrupt remapping for
the SOS:

- ``int ptdev_intx_pin_remap(struct vm *vm, uint8_t virt_pin, enum ptdev_vpin_source vpin_src);``

  Set up the remapping of the given virtual pin for the given VM.

- ``int ptdev_msix_remap(struct vm *vm, uint16_t virt_bdf, uint16_t entry_nr, struct ptdev_msi_info *info);``

The following APIs are provided to manipulate the interrupt remapping
for the UOS:

- ``int ptdev_add_intx_remapping(struct vm *vm, uint16_t virt_bdf, uint16_t phys_bdf, uint8_t virt_pin, uint8_t phys_pin, bool pic_pin);``

  Add a mapping between the given virtual and physical pin for the
  given VM.

- ``void ptdev_remove_intx_remapping(struct vm *vm, uint8_t virt_pin, bool pic_pin);``

  Remove the mapping of the given virtual pin for the given VM.

- ``int ptdev_add_msix_remapping(struct vm *vm, uint16_t virt_bdf, uint16_t phys_bdf, uint32_t vector_count);``

  Add a mapping of the given number of vectors between the given
  physical and virtual BDF for the given VM.

- ``void ptdev_remove_msix_remapping(struct vm *vm, uint16_t virt_bdf, uint32_t vector_count);``

  Remove the mapping of the given number of vectors of the given
  virtual BDF for the given VM.

The following APIs are provided to acknowledge a virtual interrupt.