diff --git a/machine_management/creating_machinesets/creating-machineset-aws.adoc b/machine_management/creating_machinesets/creating-machineset-aws.adoc
index 45ff002afd26..6aae019be3df 100644
--- a/machine_management/creating_machinesets/creating-machineset-aws.adoc
+++ b/machine_management/creating_machinesets/creating-machineset-aws.adoc
@@ -37,3 +37,19 @@ include::modules/machineset-non-guaranteed-instance.adoc[leveloffset=+1]
 
 //Creating Spot Instances by using machine sets
 include::modules/machineset-creating-non-guaranteed-instances.adoc[leveloffset=+2]
+
+//Machine sets that enable AWS Elastic Fabric Adapter (EFA)
+include::modules/machineset-efa-options.adoc[leveloffset=+1]
+
+//Creating machines that use an AWS EFA
+include::modules/machineset-creating-efa-options.adoc[leveloffset=+2]
+
+//Enabling MPI workloads that use an AWS EFA
+include::modules/machineset-enabling-efa-options.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* link:https://github.com/aws/libfabric[libfabric]
+* xref:../../post_installation_configuration/node-tasks.adoc#configuring-huge-pages_post-install-node-tasks[Configuring huge pages]
+* link:https://cloud.redhat.com/blog/how-to-use-kubeflow-and-the-mpi-operator-on-openshift[How to use Kubeflow and the MPI Operator on OpenShift]
+* xref:../../hardware_enablement/psap-node-feature-discovery-operator.adoc#installing-the-node-feature-discovery-operator_node-feature-discovery-operator[Installing the Node Feature Discovery Operator]
diff --git a/modules/machineset-creating-efa-options.adoc b/modules/machineset-creating-efa-options.adoc
new file mode 100644
index 000000000000..460a74d274d6
--- /dev/null
+++ b/modules/machineset-creating-efa-options.adoc
@@ -0,0 +1,44 @@
+// Module included in the following assemblies:
+//
+// * machine_management/creating_machinesets/creating-machineset-aws.adoc
+
+:_content-type: PROCEDURE
+[id="machineset-creating-efa-options_{context}"]
+= Creating machines that use an AWS Elastic Fabric Adapter
+
+You can deploy compute machines that use an AWS Elastic Fabric Adapter (EFA) by adding the `networkInterfaceType` field to the machine set YAML file for your compute machines.
+
+[NOTE]
+====
+Machines that use an EFA must belong to security groups that allow all inbound and outbound traffic between the hosts in the security group. Consider creating a dedicated security group for machines that use an EFA.
+
+You can configure these security group rules manually by using the AWS Management Console or the AWS CLI. For more information, see the Amazon EC2 documentation about https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/working-with-security-groups.html#updating-security-group-rules[updating security group rules].
+====
+
+.Prerequisites
+
+* You have configured security groups to support the use of an EFA.
+
+.Procedure
+
+. In a text editor, open the YAML file for an existing AWS machine set or create a new one.
+
+. Add the `networkInterfaceType` field under the `providerSpec` field:
++
+[source,yaml]
+----
+providerSpec:
+  value:
+    networkInterfaceType: EFA <1>
+----
+<1> Specify the type of network interface to use. To use an EFA, set this value to `EFA`. To use a standard Elastic Network Adapter (ENA), set this value to `ENA`. If you do not specify a value, machines that the machine set deploys use a standard ENA.
+
+.Verification
+
+. In the AWS Management Console, locate an EC2 instance that the machine set deployed.
+
+. On the *Networking* tab, verify that the *Interface type* value under *Network interfaces* is `Elastic Fabric Adapter`.
+
+.Next steps
+
+* To run MPI workloads on nodes that use an EFA, you must install additional software. For more information, see "Enabling MPI workloads that use an AWS Elastic Fabric Adapter".
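+
+The `providerSpec` field that this procedure edits is nested under `spec.template.spec` in a machine set. The following excerpt is a minimal sketch, not a complete machine set definition, that shows where the `networkInterfaceType` field sits. The machine set name and the `c5n.18xlarge` instance type are illustrative assumptions. Use an instance type that supports EFA, and keep the remaining fields from your existing machine set.
+
+[source,yaml]
+----
+apiVersion: machine.openshift.io/v1beta1
+kind: MachineSet
+metadata:
+  name: <infrastructure_id>-efa-<zone> # hypothetical name
+  namespace: openshift-machine-api
+spec:
+  # replicas, selector, and other required fields are omitted from this sketch
+  template:
+    spec:
+      providerSpec:
+        value:
+          # assumed EFA-capable instance type; confirm it in the AWS list of supported instance types
+          instanceType: c5n.18xlarge
+          networkInterfaceType: EFA
+----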
\ No newline at end of file
diff --git a/modules/machineset-efa-options.adoc b/modules/machineset-efa-options.adoc
new file mode 100644
index 000000000000..e1e5b97f7b76
--- /dev/null
+++ b/modules/machineset-efa-options.adoc
@@ -0,0 +1,16 @@
+// Module included in the following assemblies:
+//
+// * machine_management/creating_machinesets/creating-machineset-aws.adoc
+
+:_content-type: CONCEPT
+[id="machineset-efa-options_{context}"]
+= Machine sets that support using an Elastic Fabric Adapter
+
+You can use machine sets to create compute machines that use an link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html[Elastic Fabric Adapter] (EFA) as their primary network interface.
+
+[NOTE]
+====
+Control plane machines cannot use an EFA as their primary network interface.
+====
+
+For more information about instance types that support using an EFA, see the Amazon EC2 documentation about https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types[supported instance types].
\ No newline at end of file
diff --git a/modules/machineset-enabling-efa-options.adoc b/modules/machineset-enabling-efa-options.adoc
new file mode 100644
index 000000000000..173c844059d1
--- /dev/null
+++ b/modules/machineset-enabling-efa-options.adoc
@@ -0,0 +1,109 @@
+// Module included in the following assemblies:
+//
+// * machine_management/creating_machinesets/creating-machineset-aws.adoc
+
+:_content-type: PROCEDURE
+[id="machineset-enabling-efa-options_{context}"]
+= Enabling MPI workloads that use an AWS Elastic Fabric Adapter
+
+After you configure a machine set to support an AWS Elastic Fabric Adapter (EFA), you must install additional software to run Message Passing Interface (MPI) workloads.
+
+For more information and an example of using Kubeflow and the MPI Operator in {product-title}, see link:https://cloud.redhat.com/blog/how-to-use-kubeflow-and-the-mpi-operator-on-openshift[How to use Kubeflow and the MPI Operator on OpenShift].
+
+.Prerequisites
+
+* You have configured a machine set to support the use of an EFA.
+
+.Procedure
+
+. Create a machine configuration that sets the `memlock` ulimit for containers to unlimited.
+
+.. Generate a base64-encoded string from a CRI-O configuration file that removes the `memlock` limit.
++
+.Example file contents
+[source,toml]
+----
+[crio.runtime]
+default_ulimits = [
+        "memlock=-1:-1"
+]
+----
++
+.Example base64-encoded data
+[source,text]
+----
+W2NyaW8ucnVudGltZV0KZGVmYXVsdF91bGltaXRzID0gWwogICAgICAgICJtZW1sb2NrPS0xOi0xIgpdCg==
+----
+
+.. Create a file named `unlimited-memlock.yaml` with the following YAML definition:
++
+[source,yaml]
+----
+apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: worker
+  name: 02-worker-container-runtime <1>
+spec:
+  config:
+    ignition:
+      version: 3.2.0
+    storage:
+      files:
+      - contents:
+          source: data:text/plain;charset=utf-8;base64,<base64_encoded_data> <2>
+        mode: 420
+        overwrite: true
+        path: /etc/crio/crio.conf.d/10-memlock <3>
+----
+<1> Specify a name for the machine configuration.
+<2> Specify the base64-encoded string that you generated in the previous step.
+<3> Specify the path of the CRI-O drop-in configuration file that contains the `memlock` setting.
+
+.. To create the `MachineConfig` object, enter the following command:
++
+[source,terminal]
+----
+$ oc create -f unlimited-memlock.yaml
+----
+
+. Install link:https://github.com/aws/libfabric[libfabric] on your cluster.
++
+.Verification
++
+Verify that the daemon set that exposes EFA capabilities is running by entering the following command:
++
+[source,terminal]
+----
+$ oc get pods -n kube-system
+----
++
+.Example output
++
+[source,terminal]
+----
+NAME                                        READY   STATUS    RESTARTS   AGE
+aws-efa-k8s-device-plugin-daemonset-zz5p9   1/1     Running   0          8h
+----
+
+. Configure huge pages with a minimum size of 2MB to support the MPI Operator.
++
+For more information, see "Configuring huge pages" in the "Node tasks" section of the "Post-installation configuration" documentation.
++
+[NOTE]
+====
+A 2MB huge page size is the minimum that the MPI Operator requires, and it might not be sufficient for the instance types in your cluster. Ensure that your huge page configuration meets your requirements.
+====
+
+. Install the Kubeflow MPI Operator on your cluster.
++
+For more information, see link:https://cloud.redhat.com/blog/how-to-use-kubeflow-and-the-mpi-operator-on-openshift[How to use Kubeflow and the MPI Operator on OpenShift].
+
+. Install the Node Feature Discovery Operator from OperatorHub on your cluster.
++
+For more information, see "Installing the Node Feature Discovery Operator" in the "Node Feature Discovery Operator" section of the "Specialized hardware and driver enablement" documentation.
+
+.Verification
+
+* Verify that the Node Feature Discovery Operator has configured the node status to show the EFA interface as an allocatable resource.
\ No newline at end of file