[OSDOCS-3134]: AWS machineset support for EFA #46408
Closed: jeana-redhat wants to merge 1 commit into openshift:main from jeana-redhat:OSDOCS-3134-AWS-EFA-support
@@ -0,0 +1,44 @@
// Module included in the following assemblies:
//
// * machine_management/creating_machinesets/creating-machineset-aws.adoc

:_content-type: PROCEDURE
[id="machineset-creating-efa-options_{context}"]
= Creating machines that use an AWS Elastic Fabric Adapter

You can deploy compute machines that use an AWS Elastic Fabric Adapter (EFA) by adding the `networkInterfaceType` field to the machine set YAML file for your compute machines.

[NOTE]
====
Machines that use an EFA must belong to security groups that allow all traffic between all hosts in the security group. You might find it helpful to create a dedicated security group for machines that use an EFA.

You can manually configure your security groups to support the use of an EFA by using the AWS Management Console or the AWS CLI. For more information, see the Amazon EC2 documentation about https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/working-with-security-groups.html#updating-security-group-rules[updating security group rules].
====
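
As a minimal sketch of what "allow all traffic between all hosts in the security group" means in API terms, the following Python builds a self-referencing rule: all protocols (an `IpProtocol` of `-1`) with the security group itself as the source or destination. AWS describes this rule for both inbound and outbound traffic on EFA security groups. The dictionary shape is an assumption based on the EC2 `IpPermissions` structure that boto3's `authorize_security_group_ingress` and `authorize_security_group_egress` accept, and the group ID is hypothetical. The script only builds and prints the rules; it does not call AWS.

```python
"""Sketch: build self-referencing "all traffic" rules for an EFA security group.

Assumption: the dictionaries follow the shape of the EC2 API IpPermissions
parameter. The group ID used below is hypothetical.
"""


def efa_all_traffic_rule(group_id: str) -> dict:
    """Return one IpPermissions entry that allows all protocols and ports
    to and from members of the same security group."""
    if not group_id.startswith("sg-"):
        raise ValueError(f"unexpected security group ID: {group_id!r}")
    return {
        "IpProtocol": "-1",  # "-1" means all protocols, and therefore all ports
        "UserIdGroupPairs": [{"GroupId": group_id}],  # peer is the group itself
    }


def efa_security_group_rules(group_id: str) -> dict:
    """EFA traffic needs the self-referencing rule in both directions."""
    rule = efa_all_traffic_rule(group_id)
    return {"ingress": [rule], "egress": [rule]}


if __name__ == "__main__":
    rules = efa_security_group_rules("sg-0123456789abcdef0")  # hypothetical ID
    print(rules)
    # With boto3 installed and credentials configured, the rules could be applied with:
    #   ec2 = boto3.client("ec2")
    #   ec2.authorize_security_group_ingress(GroupId=gid, IpPermissions=rules["ingress"])
    #   ec2.authorize_security_group_egress(GroupId=gid, IpPermissions=rules["egress"])
```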

.Prerequisites

* You have configured security groups to support the use of an EFA.

.Procedure

. In a text editor, open the YAML file for an existing AWS machine set or create a new one.

. Add the `networkInterfaceType` field under the `providerSpec.value` stanza:
+
[source,yaml]
----
providerSpec:
  value:
    networkInterfaceType: EFA <1>
----
<1> Specify the type of network interface to use. To use an EFA, set this value to `EFA`. To use a standard Elastic Network Adapter (ENA), set this value to `ENA`. If you do not specify a value, machines that the machine set deploys use a standard ENA.
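
If you edit machine sets programmatically rather than in a text editor, the change in the procedure amounts to setting one key. The following Python sketch assumes the manifest has already been loaded into nested dictionaries (for example from `oc get machineset <name> -n openshift-machine-api -o json`) and that the field lives at `spec.template.spec.providerSpec.value`; the trimmed manifest and its name are hypothetical.

```python
"""Sketch: set providerSpec.value.networkInterfaceType in a MachineSet manifest.

Assumption: the manifest is already parsed into nested dicts. Only the values
documented for this field (EFA, ENA) are accepted.
"""
import copy
import json

ALLOWED_INTERFACE_TYPES = {"EFA", "ENA"}


def set_network_interface_type(machineset: dict, interface_type: str = "EFA") -> dict:
    """Return a copy of `machineset` with the requested network interface type."""
    if interface_type not in ALLOWED_INTERFACE_TYPES:
        raise ValueError(
            f"networkInterfaceType must be one of {sorted(ALLOWED_INTERFACE_TYPES)}"
        )
    patched = copy.deepcopy(machineset)  # leave the caller's manifest untouched
    value = patched["spec"]["template"]["spec"]["providerSpec"]["value"]
    value["networkInterfaceType"] = interface_type
    return patched


if __name__ == "__main__":
    # Hypothetical, heavily trimmed MachineSet used only for illustration.
    ms = {
        "apiVersion": "machine.openshift.io/v1beta1",
        "kind": "MachineSet",
        "metadata": {"name": "example-efa-worker"},
        "spec": {"template": {"spec": {"providerSpec": {"value": {"instanceType": "c5n.18xlarge"}}}}},
    }
    print(json.dumps(set_network_interface_type(ms), indent=2))
```

Returning a deep copy keeps the original manifest available for comparison. Changes to a machine set apply only to machines that it creates after the change.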

.Verification

. In the AWS Management Console, locate an EC2 instance that the machine set deployed.

. On the *Networking* tab, verify that the *Interface type* value under *Network interfaces* is `Elastic Fabric Adapter`.
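
The console check can also be scripted against the AWS CLI. The sketch below assumes JSON from `aws ec2 describe-instances`, where each entry in an instance's `NetworkInterfaces` list has an `InterfaceType` field that is set to `efa` for an Elastic Fabric Adapter; the instance IDs in the sample are hypothetical.

```python
"""Sketch: check EFA attachment from `aws ec2 describe-instances` JSON output.

Assumption: each network interface entry carries an "InterfaceType" field whose
value is "efa" for an Elastic Fabric Adapter. The sample data is hypothetical.
"""
import json


def efa_instances(describe_output: dict) -> dict:
    """Map each instance ID to True if any of its network interfaces is an EFA."""
    result = {}
    for reservation in describe_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            interfaces = instance.get("NetworkInterfaces", [])
            result[instance["InstanceId"]] = any(
                nic.get("InterfaceType") == "efa" for nic in interfaces
            )
    return result


if __name__ == "__main__":
    sample = json.loads("""
    {"Reservations": [{"Instances": [
        {"InstanceId": "i-0aaaaaaaaaaaaaaaa",
         "NetworkInterfaces": [{"InterfaceType": "efa"}]},
        {"InstanceId": "i-0bbbbbbbbbbbbbbbb",
         "NetworkInterfaces": [{"InterfaceType": "interface"}]}
    ]}]}
    """)
    print(efa_instances(sample))
```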

.Next steps

* If you plan to run Message Passing Interface (MPI) workloads on nodes that use an EFA, you must install additional software. For more information, see "Enabling MPI workloads that use an AWS Elastic Fabric Adapter".

@@ -0,0 +1,16 @@
// Module included in the following assemblies:
//
// * machine_management/creating_machinesets/creating-machineset-aws.adoc

:_content-type: CONCEPT
[id="machineset-efa-options_{context}"]
= Machine sets that support using an Elastic Fabric Adapter

You can use machine sets to create compute machines that use an link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html[Elastic Fabric Adapter] (EFA) as their primary network interface.

[NOTE]
====
Control plane machines cannot use an EFA as their primary network interface.
====

For more information about instance types that support using an EFA, see the Amazon EC2 documentation about https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types[supported instance types].

@@ -0,0 +1,109 @@
// Module included in the following assemblies:
//
// * machine_management/creating_machinesets/creating-machineset-aws.adoc

:_content-type: PROCEDURE
[id="machineset-enabling-efa-options_{context}"]
= Enabling MPI workloads that use an AWS Elastic Fabric Adapter

After you configure a machine set to support the use of an AWS Elastic Fabric Adapter (EFA), you must install additional software to run Message Passing Interface (MPI) workloads.

For more information and an example of using Kubeflow and the MPI Operator in {product-title}, see link:https://cloud.redhat.com/blog/how-to-use-kubeflow-and-the-mpi-operator-on-openshift[How to use Kubeflow and the MPI Operator on OpenShift].

.Prerequisites

* You have configured a machine set to support the use of an EFA.

.Procedure

. Create a machine config that removes the `memlock` limit for containers.

.. Generate a base64-encoded string from a CRI-O configuration file that removes the `memlock` limits.
+
.Example raw data
[source,toml]
----
[crio.runtime]
default_ulimits = [
        "memlock=-1:-1"
]
----
+
.Example base64-encoded data
[source,terminal]
----
W2NyaW8ucnVudGltZV0KZGVmYXVsdF91bGltaXRzID0gWwogICAgICAgICJtZW1sb2NrPS0xOi0xIgpdCg==
----

.. Create a file named `unlimited-memlock.yaml` with the following YAML definition:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 02-worker-container-runtime <1>
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,<base64-encoded-memlock-data> <2>
        mode: 420
        overwrite: true
        path: /etc/crio/crio.conf.d/10-memlock <3>
----
<1> Specify a name for the machine config.
<2> Specify the base64-encoded string that you generated for the `memlock` file data.
<3> Specify the path for the CRI-O drop-in configuration file that contains the `memlock` setting.

.. To create the `MachineConfig` object, enter the following command:
+
[source,terminal]
----
$ oc create -f unlimited-memlock.yaml
----

. Install link:https://github.com/aws/libfabric[libfabric] on your cluster.
+
To verify that the libfabric daemon set that exposes EFA capabilities is running, enter the following command and observe the output:
+
[source,terminal]
----
$ oc get po -n kube-system
----
+
.Example output
[source,terminal]
----
NAME                                        READY   STATUS    RESTARTS   AGE
aws-efa-k8s-device-plugin-daemonset-zz5p9   1/1     Running   0          8h
----

. Configure huge pages with a minimum size of 2 MB to support the MPI Operator.
+
For more information, see "Configuring huge pages" in the "Node tasks" section of the "Post-installation configuration" documentation.
+
[NOTE]
====
The 2 MB minimum huge page size that the MPI Operator requires might not be sufficient for the instance types in your cluster. Ensure that your huge pages configuration meets your requirements.
====

. Install the Kubeflow MPI Operator on your cluster.
+
For more information, see link:https://cloud.redhat.com/blog/how-to-use-kubeflow-and-the-mpi-operator-on-openshift[How to use Kubeflow and the MPI Operator on OpenShift].

. Install the Node Feature Discovery Operator from OperatorHub on your cluster.
+
For more information, see "Installing the Node Feature Discovery Operator" in the "Node Feature Discovery Operator" section of the "Specialized hardware and driver enablement" documentation.
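
The `memlock` data and `MachineConfig` object from the first step of this procedure can also be generated with a short script, which avoids hand-copying the base64 string. This Python sketch reuses the example values shown in that step (the object name, the `/etc/crio/crio.conf.d/10-memlock` path, and Ignition version 3.2.0); adjust them as needed. Encoding the example raw data, with eight spaces of indentation before `"memlock=-1:-1"`, reproduces the example base64-encoded string. The file mode is written as `0o644` in Python, which serializes to the decimal `420` used in the YAML example. Because `oc create -f` accepts JSON as well as YAML, the printed output could be saved to a `.json` file and passed to that command.

```python
"""Sketch: generate the base64 memlock data and the MachineConfig object.

Assumption: the values mirror the example CRI-O drop-in and MachineConfig in
this procedure; adjust the name and path for your cluster.
"""
import base64
import json

# CRI-O drop-in that removes the memlock limit (soft:hard = -1:-1, unlimited).
CRIO_MEMLOCK_CONF = '[crio.runtime]\ndefault_ulimits = [\n        "memlock=-1:-1"\n]\n'


def encode_memlock_conf(conf: str = CRIO_MEMLOCK_CONF) -> str:
    """Base64-encode the drop-in for use in an Ignition data URL."""
    return base64.b64encode(conf.encode("utf-8")).decode("ascii")


def memlock_machine_config(name: str = "02-worker-container-runtime") -> dict:
    """Build the MachineConfig as a dict that serializes to JSON."""
    encoded = encode_memlock_conf()
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "labels": {"machineconfiguration.openshift.io/role": "worker"},
            "name": name,
        },
        "spec": {"config": {
            "ignition": {"version": "3.2.0"},
            "storage": {"files": [{
                "contents": {"source": "data:text/plain;charset=utf-8;base64," + encoded},
                "mode": 0o644,  # serialized as 420, the decimal form in the YAML example
                "overwrite": True,
                "path": "/etc/crio/crio.conf.d/10-memlock",
            }]},
        }},
    }


if __name__ == "__main__":
    print(json.dumps(memlock_machine_config(), indent=2))
```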

.Verification

* Verify that the Node Feature Discovery Operator has configured the node status to show the EFA interface as an allocatable resource.
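
To check allocatable EFA resources across nodes without reading each node's status by hand, the following Python sketch filters the JSON from `oc get nodes -o json`. It assumes that the EFA device plugin advertises the extended resource `vpc.amazonaws.com/efa`, the name used by the AWS EFA Kubernetes device plugin; confirm the exact name on your cluster. The sample node list is hypothetical.

```python
"""Sketch: list nodes that advertise EFA devices as an allocatable resource.

Assumptions: the input is the JSON from `oc get nodes -o json`, and the EFA
resource name is "vpc.amazonaws.com/efa".
"""
import json

EFA_RESOURCE = "vpc.amazonaws.com/efa"  # assumed extended resource name


def nodes_with_efa(node_list: dict, resource: str = EFA_RESOURCE) -> dict:
    """Return {node name: allocatable EFA count} for nodes with a nonzero count."""
    found = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        count = int(allocatable.get(resource, "0"))  # quantities are strings in the API
        if count > 0:
            found[node["metadata"]["name"]] = count
    return found


if __name__ == "__main__":
    sample = {"items": [  # hypothetical node list
        {"metadata": {"name": "worker-efa-a"},
         "status": {"allocatable": {"cpu": "71500m", EFA_RESOURCE: "1"}}},
        {"metadata": {"name": "worker-b"},
         "status": {"allocatable": {"cpu": "3500m"}}},
    ]}
    print(json.dumps(nodes_with_efa(sample)))
```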
Why are we using a blog as the source for how to install and set up EFA capabilities?
Why are we cloning this (from https://github.com/kubeflow/mpi-operator) instead of installing it from OperatorHub?