[Doc] Add vSphere cluster configuration reference with examples (ray-project#39379)

Similar to other providers, we add example-minimal.yaml and example-full.yaml to the vSphere autoscaler. We also add and refine the vSphere-related references in the Getting Started guide and the cluster configuration reference page, based on the newly added examples.

Why are these changes needed?
In PR ray-project#37815 we added vSphere platform support to the Ray Autoscaler, but the accompanying documentation is not sufficient. This follow-up change adds examples similar to those of other platforms. It also updates the getting-started guide and the cluster configuration reference to include vSphere-specific descriptions.

We will do another follow-up PR to add a "Launching Ray Clusters on vSphere" user guide at https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/index.html


Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
wfangchi authored and harborn committed Sep 8, 2023
1 parent 23eaa0b commit 032f570
Showing 6 changed files with 365 additions and 32 deletions.
26 changes: 3 additions & 23 deletions doc/source/cluster/vms/getting-started.rst
@@ -109,7 +109,7 @@ Next, if you're not set up to use your cloud provider from the command line, you

.. code-block:: shell
-$ export VSPHERE_SERVER=192.168.0.1 # Enter your vSphere IP
+$ export VSPHERE_SERVER=192.168.0.1 # Enter your vSphere vCenter Address
$ export VSPHERE_USER=user # Enter your username
$ export VSPHERE_PASSWORD=password # Enter your password
@@ -262,28 +262,8 @@ A minimal sample cluster configuration file looks as follows:

.. tab:: vSphere

-.. code-block:: yaml
-
-    # A unique identifier for the head node and workers of this cluster.
-    cluster_name: minimal
-    # Cloud-provider specific configuration.
-    provider:
-      type: vsphere
-    auth:
-      ssh_user: ray # The VMs are initialised with an user called ray.
-    available_node_types:
-      ray.head.default:
-        node_config:
-          resource_pool: ray # Resource pool where the Ray cluster will get created
-          library_item: ray-head-debian # OVF file name from which the head will be created
-      worker:
-        node_config:
-          clone: True # If True, all the workers will be instant-cloned from a frozen VM
-          library_item: ray-frozen-debian # The OVF file from which a frozen VM will be created
+.. literalinclude:: ../../../../python/ray/autoscaler/vsphere/example-minimal.yaml
+    :language: yaml


Save this configuration file as ``config.yaml``. You can specify a lot more details in the configuration file: instance types to use, minimum and maximum number of workers to start, autoscaling strategy, files to sync, and more. For a full reference on the available configuration properties, please refer to the :ref:`cluster YAML configuration options reference <cluster-config>`.
206 changes: 206 additions & 0 deletions doc/source/cluster/vms/references/ray-cluster-configuration.rst
@@ -96,6 +96,12 @@ Auth
:ref:`ssh_user <cluster-configuration-ssh-user>`: str
:ref:`ssh_private_key <cluster-configuration-ssh-private-key>`: str
.. tab-item:: vSphere

.. parsed-literal::
:ref:`ssh_user <cluster-configuration-ssh-user>`: str
.. _cluster-configuration-provider-type:

Provider
@@ -137,6 +143,14 @@ Provider
:ref:`cache_stopped_nodes <cluster-configuration-cache-stopped-nodes>`: bool
:ref:`use_internal_ips <cluster-configuration-use-internal-ips>`: bool
.. tab-item:: vSphere

.. parsed-literal::
:ref:`type <cluster-configuration-type>`: str
:ref:`vsphere_config <cluster-configuration-vsphere-config>`:
:ref:`vSphere Config <cluster-configuration-vsphere-config-type>`
.. _cluster-configuration-security-group-type:

Security Group
@@ -152,6 +166,35 @@ Security Group
:ref:`IpPermissions <cluster-configuration-ip-permissions>`:
- `IpPermission <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_IpPermission.html>`_
.. _cluster-configuration-vsphere-config-type:

vSphere Config
~~~~~~~~~~~~~~

.. tab-set::

.. tab-item:: vSphere

.. parsed-literal::
:ref:`credentials <cluster-configuration-vsphere-credentials>`:
:ref:`vSphere Credentials <cluster-configuration-vsphere-credentials-type>`
.. _cluster-configuration-vsphere-credentials-type:

vSphere Credentials
~~~~~~~~~~~~~~~~~~~

.. tab-set::

.. tab-item:: vSphere

.. parsed-literal::
:ref:`user <cluster-configuration-vsphere-user>`: str
:ref:`password <cluster-configuration-vsphere-password>`: str
:ref:`server <cluster-configuration-vsphere-server>`: str
.. _cluster-configuration-node-types-type:

Node types
@@ -204,6 +247,20 @@ nodes with the newly applied ``node_config`` will then be created according to c

A YAML object as defined in `the GCP docs <https://cloud.google.com/compute/docs/reference/rest/v1/instances>`_.

.. tab-item:: vSphere

.. parsed-literal::
# The resource pool where the head node should live. If unset, the
# frozen VM's resource pool is used.
resource_pool: str
# Mandatory: The name of the frozen VM from which the head node will be instant-cloned.
frozen_vm_name: str
# The datastore that stores the VMDK of the head node VM. If unset, the
# frozen VM's datastore is used.
datastore: str
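
As a rough sketch, a vSphere node type that uses these fields could look like the following. The node type name and all values are placeholders chosen for illustration. Only ``frozen_vm_name`` is mandatory; the other two fields fall back to the frozen VM's settings when omitted.

.. code-block:: yaml

    available_node_types:
      ray.head.default:                  # hypothetical node type name
        node_config:
          resource_pool: ray-pool        # placeholder; optional
          frozen_vm_name: frozen-vm      # placeholder; mandatory
          datastore: datastore1          # placeholder; optional
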
.. _cluster-configuration-node-docker-type:

Node Docker
@@ -738,6 +795,10 @@ The user that Ray will authenticate with when launching new nodes.
* **Importance:** Low
* **Type:** String

.. tab-item:: vSphere

Not available. The vSphere provider expects the key to be located at a fixed path ``~/ray-bootstrap-key.pem`` and will automatically generate one if not found.
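
A minimal ``auth`` section for vSphere therefore only needs the SSH user. The sketch below assumes the VMs provide a user named ``ray``; substitute whichever user exists on your VMs.

.. code-block:: yaml

    auth:
      ssh_user: ray   # assumed user name; no ssh_private_key entry is set for vSphere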

.. _cluster-configuration-ssh-public-key:

``auth.ssh_public_key``
@@ -761,6 +822,10 @@ The user that Ray will authenticate with when launching new nodes.

Not available.

.. tab-item:: vSphere

Not available.

.. _cluster-configuration-type:

``provider.type``
@@ -792,6 +857,14 @@ The user that Ray will authenticate with when launching new nodes.
* **Importance:** High
* **Type:** String

.. tab-item:: vSphere

The cloud service provider. For vSphere and VCF (VMware Cloud Foundation), this must be set to ``vsphere``.

* **Required:** Yes
* **Importance:** High
* **Type:** String

.. _cluster-configuration-region:

``provider.region``
@@ -821,6 +894,10 @@ The user that Ray will authenticate with when launching new nodes.
* **Type:** String
* **Default:** us-west1

.. tab-item:: vSphere

Not available.

.. _cluster-configuration-availability-zone:

``provider.availability_zone``
@@ -852,6 +929,10 @@ The user that Ray will authenticate with when launching new nodes.
* **Type:** String
* **Default:** us-west1-a

.. tab-item:: vSphere

Not available.

.. _cluster-configuration-location:

``provider.location``
@@ -876,6 +957,10 @@ The user that Ray will authenticate with when launching new nodes.

Not available.

.. tab-item:: vSphere

Not available.

.. _cluster-configuration-resource-group:

``provider.resource_group``
@@ -900,6 +985,10 @@ The user that Ray will authenticate with when launching new nodes.

Not available.

.. tab-item:: vSphere

Not available.

.. _cluster-configuration-subscription-id:

``provider.subscription_id``
@@ -924,6 +1013,10 @@ The user that Ray will authenticate with when launching new nodes.

Not available.

.. tab-item:: vSphere

Not available.

.. _cluster-configuration-project-id:

``provider.project_id``
@@ -948,6 +1041,10 @@ The user that Ray will authenticate with when launching new nodes.
* **Type:** String
* **Default:** ``null``

.. tab-item:: vSphere

Not available.

.. _cluster-configuration-cache-stopped-nodes:

``provider.cache_stopped_nodes``
@@ -1005,6 +1102,37 @@ controlled by your cloud provider's configuration.

Not available.

.. tab-item:: vSphere

Not available.

.. _cluster-configuration-vsphere-config:

``provider.vsphere_config``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. tab-set::

.. tab-item:: AWS

Not available.

.. tab-item:: Azure

Not available.

.. tab-item:: GCP

Not available.

.. tab-item:: vSphere

vSphere configurations used to connect to vCenter Server. If not configured,
the ``VSPHERE_*`` environment variables will be used.

* **Required:** No
* **Importance:** Low
* **Type:** :ref:`vSphere Config <cluster-configuration-vsphere-config-type>`
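
For illustration, a ``provider`` section that supplies the vCenter connection details inline might look like the sketch below. All values are placeholders. If the ``vsphere_config`` block is left out, the ``VSPHERE_*`` environment variables are used instead.

.. code-block:: yaml

    provider:
      type: vsphere
      vsphere_config:
        credentials:
          user: user             # placeholder username
          password: password     # placeholder password
          server: 192.168.0.1    # placeholder vCenter Server address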

.. _cluster-configuration-group-name:

@@ -1029,6 +1157,50 @@ The inbound rules associated with the security group.
* **Importance:** Medium
* **Type:** `IpPermission <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_IpPermission.html>`_

.. _cluster-configuration-vsphere-credentials:

``vsphere_config.credentials``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The credentials used to connect to the vSphere vCenter Server.

* **Required:** No
* **Importance:** Low
* **Type:** :ref:`vSphere Credentials <cluster-configuration-vsphere-credentials-type>`

.. _cluster-configuration-vsphere-user:

``vsphere_config.credentials.user``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Username to connect to vCenter Server.

* **Required:** No
* **Importance:** Low
* **Type:** String

.. _cluster-configuration-vsphere-password:

``vsphere_config.credentials.password``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Password of the user to connect to vCenter Server.

* **Required:** No
* **Importance:** Low
* **Type:** String

.. _cluster-configuration-vsphere-server:

``vsphere_config.credentials.server``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vSphere vCenter Server address.

* **Required:** No
* **Importance:** Low
* **Type:** String

.. _cluster-configuration-node-config:

``available_node_types.<node_type_name>.node_type.node_config``
@@ -1127,6 +1299,14 @@ A list of commands to run to set up worker nodes of this type. These commands wi
* **Importance:** High
* **Type:** Integer

.. tab-item:: vSphere

The number of CPUs made available by this node. If not configured, the node will use the same CPU settings as the frozen VM.

* **Required:** No
* **Importance:** High
* **Type:** Integer


.. _cluster-configuration-gpu:

@@ -1193,6 +1373,14 @@ A list of commands to run to set up worker nodes of this type. These commands wi
* **Importance:** High
* **Type:** Integer

.. tab-item:: vSphere

The memory in bytes allocated for Python worker heap memory on the node.
If not configured, the node will use the same memory settings as the frozen VM.

* **Required:** No
* **Importance:** High
* **Type:** Integer

.. _cluster-configuration-object-store-memory:

@@ -1225,6 +1413,14 @@ A list of commands to run to set up worker nodes of this type. These commands wi
* **Importance:** High
* **Type:** Integer

.. tab-item:: vSphere

The memory in bytes allocated for the object store on the node.

* **Required:** No
* **Importance:** High
* **Type:** Integer
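
As an illustrative sketch, the vSphere resource settings described above (CPU, memory, and object store memory) could be set together on one node type. The node type name ``worker`` and all values are placeholders, and the key names ``CPU``, ``memory``, and ``object_store_memory`` are assumed from the general ``resources`` schema, which this excerpt doesn't show in full. Omitting ``CPU`` or ``memory`` makes the node use the frozen VM's settings.

.. code-block:: yaml

    available_node_types:
      worker:                                # hypothetical node type name
        resources:
          CPU: 4                             # placeholder CPU count
          memory: 4294967296                 # 4 GiB, in bytes
          object_store_memory: 1073741824    # 1 GiB, in bytes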

.. _cluster-configuration-node-docker:

``available_node_types.<node_type_name>.docker``
@@ -1260,6 +1456,11 @@ Minimal configuration
.. literalinclude:: ../../../../../python/ray/autoscaler/gcp/example-minimal.yaml
:language: yaml

.. tab-item:: vSphere

.. literalinclude:: ../../../../../python/ray/autoscaler/vsphere/example-minimal.yaml
:language: yaml

Full configuration
~~~~~~~~~~~~~~~~~~

@@ -1280,6 +1481,11 @@ Full configuration
.. literalinclude:: ../../../../../python/ray/autoscaler/gcp/example-full.yaml
:language: yaml

.. tab-item:: vSphere

.. literalinclude:: ../../../../../python/ray/autoscaler/vsphere/example-full.yaml
:language: yaml

TPU Configuration
~~~~~~~~~~~~~~~~~

