Closed
133 changes: 71 additions & 62 deletions modules/virt-creating-and-exposing-mediated-devices.adoc
@@ -1,19 +1,24 @@
// Module included in the following assemblies:
//
// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc
// * virt/managing_vms/advanced_vm_management/virt-configuring-virtual-gpus.adoc

:_mod-docs-content-type: PROCEDURE
[id="virt-creating-exposing-mediated-devices_{context}"]
= Creating and exposing mediated devices

As an administrator, you can create mediated devices and expose them to the cluster by editing the `HyperConverged` custom resource (CR).
As an administrator, you can create mediated devices and expose them to the cluster by editing the `HyperConverged` custom resource (CR). The mediated device values that you supply can vary depending on the particular Graphics Processing Units (GPUs) you are using.
Member

This additional sentence doesn't seem to add much for the reader. For conciseness, consider removing it, or you could shorten it to something like: "The values you add to the CR depend on your GPU model and vendor."


.Prerequisites

* You have installed the {oc-first}.
* You enabled the Input-Output Memory Management Unit (IOMMU) driver.
* If your hardware vendor provides drivers, you installed them on the nodes where you want to create mediated devices.
** If you use NVIDIA cards, you link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/openshift-virtualization.html[installed the NVIDIA GRID driver].
* You have enabled the Input-Output Memory Management Unit (IOMMU) driver.
* If your hardware vendor provides drivers, you have installed them on the nodes where you want to create mediated devices.
** If you use NVIDIA cards, you have link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/openshift-virtualization.html[installed the NVIDIA GRID driver].
Comment on lines +14 to +16
Member

It looks like you changed the simple past tense verbs (like "You enabled") to use "have" (like "You have enabled"), which is not consistent with our style guidelines. The ISG has a short section on tense, indicating that we use simple present tense for most things, and simple past or future tense when present tense doesn't make sense.


[IMPORTANT]
====
Before {VirtProductName} 4.14, the `mediatedDeviceTypes` field was named `mediatedDevicesTypes`. Ensure that you use the correct field name when configuring mediated devices.
====
Comment on lines +18 to +21
Member

I wonder if we still need this :)


.Procedure

@@ -24,10 +29,10 @@ As an administrator, you can create mediated devices and expose them to the clus
$ oc edit hyperconverged kubevirt-hyperconverged -n {CNVNamespace}
----
+
.Example configuration file with mediated devices configured
*Example configuration*
Member

Suggested change
*Example configuration*
Example configuration file:
+

[%collapsible]
Contributor

Remove this line and the next ==== because the code snippet is now smaller and the collapsible content does not render properly on docs.redhat

Member

The [%collapsible] line still needs to be removed

====
[source,yaml,subs="attributes+"]

[source,yaml]
Comment on lines -30 to +35
Member

This needs to be reverted to [source,yaml,subs="attributes+"] because the code block has an attribute in it ({CNVNamespace})

----
apiVersion: hco.kubevirt.io/v1
kind: HyperConverged
@@ -43,81 +48,85 @@ spec:
- nvidia-233
nodeSelector:
kubernetes.io/hostname: node-11.redhat.com
permittedHostDevices:
mediatedDevices:
- mdevNameSelector: GRID T4-2Q
resourceName: nvidia.com/GRID_T4-2Q
- mdevNameSelector: GRID T4-8Q
resourceName: nvidia.com/GRID_T4-8Q
# ...
# ...
Member

really minor nit but I think it makes more sense without the leading spaces

----
====

. Create mediated devices by adding them to the `spec.mediatedDevicesConfiguration` stanza:
Identify the name selector and resource name values for the devices that you want to expose to the cluster, as shown in the following example. You can use the same value for both, replacing any spaces in the name with an underscore.
+
.Example YAML snippet
[source,yaml]
[source,terminal]
----
# ...
spec:
mediatedDevicesConfiguration:
mediatedDeviceTypes: <1>
- <device_type>
nodeMediatedDeviceTypes: <2>
- mediatedDeviceTypes: <3>
- <device_type>
nodeSelector: <4>
<node_selector_key>: <node_selector_value>
# ...
$ oc debug node/node-11.redhat.com
sh-5.1# chroot /host
sh-5.1# cd sys/class/mdev_bus
sh-5.1# ls
sh-5.1# cd 0000:4b:00.4/mdev_supported_types
sh-5.1# ls
sh-5.1# cd nvidia-745
sh-5.1# ls
sh-5.1# cat name
Comment on lines +58 to +66
Member

This is technically 9 commands, and we are supposed to include only one command per code block. Additionally, the ls commands are not meaningful without example output, because the subsequent commands are based on the ls output.

----
<1> Required: Configures global settings for the cluster.
<2> Optional: Overrides the global configuration for a specific node or group of nodes. Must be used with the global `mediatedDeviceTypes` configuration.
<3> Required if you use `nodeMediatedDeviceTypes`. Overrides the global `mediatedDeviceTypes` configuration for the specified nodes.
<4> Required if you use `nodeMediatedDeviceTypes`. Must include a `key:value` pair.
+
[IMPORTANT]
====
Before {VirtProductName} 4.14, the `mediatedDeviceTypes` field was named `mediatedDevicesTypes`. Ensure that you use the correct field name when configuring mediated devices.
====

. Identify the name selector and resource name values for the devices that you want to expose to the cluster. You will add these values to the `HyperConverged` CR in the next step.
.. Find the `resourceName` value by running the following command:
+
.Example output
Member

Suggested change
.Example output
Example output:
+

[source,terminal]
----
$ oc get $NODE -o json \
| jq '.status.allocatable \
| with_entries(select(.key | startswith("nvidia.com/"))) \
| with_entries(select(.value != "0"))'
0000:4b:00.4
nvidia-742 nvidia-744 nvidia-746 nvidia-748 nvidia-750 nvidia-752
nvidia-743 nvidia-745 nvidia-747 nvidia-749 nvidia-751 nvidia-753
available_instances create description device_api devices name
NVIDIA A2-2Q
----
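
The manual `cd` and `ls` walk through `/sys/class/mdev_bus` shown in this diff could also be collapsed into a single loop. The following is only a sketch, not part of the proposed module: the function name `list_mdev_types` and its optional base-directory argument are hypothetical, and it assumes the sysfs layout used in the commands above (`<pci_address>/mdev_supported_types/<type>/name`).

```shell
# Hypothetical helper: print "<pci_address> <type_id> <name>" for every
# supported mediated device type found under the base directory.
# The base directory is a parameter so that the function can be tried
# against a test directory; it defaults to /sys/class/mdev_bus.
list_mdev_types() {
  base="${1:-/sys/class/mdev_bus}"
  for name_file in "$base"/*/mdev_supported_types/*/name; do
    # Skip the unexpanded pattern when the glob matches nothing.
    [ -r "$name_file" ] || continue
    type_dir=$(dirname "$name_file")
    pci_dir=$(dirname "$(dirname "$type_dir")")
    printf '%s %s %s\n' \
      "$(basename "$pci_dir")" \
      "$(basename "$type_dir")" \
      "$(cat "$name_file")"
  done
}
```

Run inside the `chroot /host` shell of an `oc debug node/<node_name>` session, calling `list_mdev_types` with no argument would print one line per type, pairing each type ID with the contents of its `name` file.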

.. Find the `mdevNameSelector` value by viewing the contents of `/sys/bus/pci/devices/<domain>:<bus>:<slot>.<function>/mdev_supported_types/<type>/name`, substituting the correct values for your system.
+
For example, the name file for the `nvidia-231` type contains the selector string `GRID T4-2Q`. Using `GRID T4-2Q` as the `mdevNameSelector` value allows nodes to use the `nvidia-231` type.
. Create and expose mediated devices by:
.. Adding them to the `spec.mediatedDevicesConfiguration` stanza.
.. Adding the `mdevNameSelector` and `resourceName` values to the `spec.permittedHostDevices.mediatedDevices` stanza of the `HyperConverged` CR.

. Expose the mediated devices to the cluster by adding the `mdevNameSelector` and `resourceName` values to the
`spec.permittedHostDevices.mediatedDevices` stanza of the `HyperConverged` CR:
. Identify the `mdevNameSelector` value by viewing the contents of:
`/sys/bus/pci/devices/<domain>:<bus>:<slot>.<function>/mdev_supported_types/<type>/name`.
Comment on lines +79 to +84
Member

The numbering restarts here due to formatting issues (see screenshot). In general, I am confused about the order in which this info is being presented... the "Identify the mdevNameSelector value..." step is immediately followed by the example snippet that shows the HCO spec, making it seem like the example should be specific to that step.

Image

+
.Example YAML snippet
.Example snippet
Member

I disagree with removing "YAML" from this - it needs some sort of modifier of "snippet," like "YAML" or "manifest". Also, please remove the block title syntax for DITA conversion prep reasons, and do the same for any other similar examples. Here is the preferred way to do it (doesn't need to be bold):

Suggested change
.Example snippet
Example YAML snippet:
+

[source,yaml]
----
# ...
permittedHostDevices:
mediatedDevices:
- mdevNameSelector: GRID T4-2Q <1>
resourceName: nvidia.com/GRID_T4-2Q <2>
# ...
spec:
mediatedDevicesConfiguration:
mediatedDeviceTypes:
- nvidia-745
nodeMediatedDeviceTypes:
- mediatedDeviceTypes:
- nvidia-746
nodeSelector:
kubernetes.io/hostname: node-11.redhat.com
permittedHostDevices:
mediatedDevices:
- mdevNameSelector: GRID A2-2Q
resourceName: nvidia.com/GRID_A2-2Q
- mdevNameSelector: GRID A2-4Q
resourceName: nvidia.com/GRID_A2-4Q
Comment on lines +89 to +103
Member

Because this is a snippet, you need to put # ... before and after the code itself.

Suggested change
spec:
mediatedDevicesConfiguration:
mediatedDeviceTypes:
- nvidia-745
nodeMediatedDeviceTypes:
- mediatedDeviceTypes:
- nvidia-746
nodeSelector:
kubernetes.io/hostname: node-11.redhat.com
permittedHostDevices:
mediatedDevices:
- mdevNameSelector: GRID A2-2Q
resourceName: nvidia.com/GRID_A2-2Q
- mdevNameSelector: GRID A2-4Q
resourceName: nvidia.com/GRID_A2-4Q
# ...
spec:
mediatedDevicesConfiguration:
mediatedDeviceTypes:
- nvidia-745
nodeMediatedDeviceTypes:
- mediatedDeviceTypes:
- nvidia-746
nodeSelector:
kubernetes.io/hostname: node-11.redhat.com
permittedHostDevices:
mediatedDevices:
- mdevNameSelector: GRID A2-2Q
resourceName: nvidia.com/GRID_A2-2Q
- mdevNameSelector: GRID A2-4Q
resourceName: nvidia.com/GRID_A2-4Q
# ...

----
<1> Exposes the mediated devices that map to this value on the host.
<2> Matches the resource name that is allocated on the node.
+
where:

<mediatedDeviceTypes>:: Specifies global settings for the cluster and is required.

<nodeMediatedDeviceTypes>:: Specifies global configuration overrides for a specific node or group of nodes and is optional. Must be used with the global `mediatedDeviceTypes` configuration.

<mediatedDeviceTypes>:: Specifies an override to the global `mediatedDeviceTypes` configuration for the specified nodes. Required if you use `nodeMediatedDeviceTypes`.

<nodeSelector>:: Specifies the node selector and must include a `key:value` pair. Required if you use `nodeMediatedDeviceTypes`.

<mdevNameSelector>:: Specifies the mediated devices that map to this value on the host.

<resourceName>:: Specifies the matching resource name that is allocated on the node.
Member

In this case, you don't need to use the < > characters because the example doesn't contain them. You just need to put the parameters in backticks.

Suggested change
<resourceName>:: Specifies the matching resource name that is allocated on the node.
`mediatedDeviceTypes`:: Specifies global settings for the cluster and is required.
`nodeMediatedDeviceTypes`:: Specifies global configuration overrides for a specific node or group of nodes and is optional. Must be used with the global `mediatedDeviceTypes` configuration.
`mediatedDeviceTypes`:: Specifies an override to the global `mediatedDeviceTypes` configuration for the specified nodes. Required if you use `nodeMediatedDeviceTypes`.
`nodeSelector`:: Specifies the node selector and must include a `key:value` pair. Required if you use `nodeMediatedDeviceTypes`.
`mdevNameSelector`:: Specifies the mediated devices that map to this value on the host.
`resourceName`:: Specifies the matching resource name that is allocated on the node.
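
The relationship between the two values, as the proposed text describes it (same value, with spaces replaced by underscores), can be illustrated with a small shell helper. This is a sketch under assumptions: the function name `resource_name_for` is hypothetical, and the `nvidia.com/` prefix is taken from the examples in this diff, so other vendors would need a different prefix.

```shell
# Hypothetical helper: derive a resourceName from an mdevNameSelector
# value by replacing spaces with underscores and prepending the vendor
# prefix used in the examples in this module (nvidia.com/).
resource_name_for() {
  selector="$1"
  printf 'nvidia.com/%s\n' "$(printf '%s' "$selector" | tr ' ' '_')"
}

resource_name_for "GRID T4-2Q"   # prints nvidia.com/GRID_T4-2Q
```

The output matches the `mdevNameSelector: GRID T4-2Q` and `resourceName: nvidia.com/GRID_T4-2Q` pair shown in the original example.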


. Save your changes and exit the editor.

.Verification

* Optional: Confirm that a device was added to a specific node by running the following command:
* Confirm that the virtual GPU is attached to the node by running the following command:
+
[source,terminal]
----
$ oc describe node <node_name>
$ oc get node <node_name> -o json \
| jq '.status.allocatable
| with_entries(select(.key | startswith("nvidia.com/")))
| with_entries(select(.value != "0"))'
----