[k8s][docs] Add feature cards for users and admins #3582

Merged 8 commits on May 27, 2024
Changes from 4 commits
2 changes: 2 additions & 0 deletions docs/source/_gallery_original/index.rst
@@ -1,3 +1,5 @@
.. _ai-gallery:

AI Gallery
====================

81 changes: 67 additions & 14 deletions docs/source/reference/kubernetes/index.rst
@@ -3,27 +3,70 @@
Running on Kubernetes
=============================

.. note::
Kubernetes support is under active development. `Please share your feedback <https://forms.gle/KmAtyNhEysiw2ZCR7>`_
or `directly reach out to the development team <http://slack.skypilot.co>`_
for feature requests and more.

SkyPilot tasks can be run on your private on-prem or cloud Kubernetes clusters.
The Kubernetes cluster gets added to the list of "clouds" in SkyPilot and SkyPilot
tasks can be submitted to your Kubernetes cluster just like any other cloud provider.

**Benefits of using SkyPilot to run jobs on your Kubernetes cluster:**
Why use SkyPilot on Kubernetes?
-------------------------------

* Get SkyPilot features (setup management, job execution, queuing, logging, SSH access) on your Kubernetes resources
* Replace complex Kubernetes manifests with simple SkyPilot tasks
* Seamlessly "burst" jobs to the cloud if your Kubernetes cluster is congested
* Retain observability and control over your cluster with your existing Kubernetes tools
.. tab-set::

**Supported Kubernetes deployments:**
.. tab-item:: For AI Developers
:sync: why-ai-devs-tab

* Hosted Kubernetes services (EKS, GKE)
* On-prem clusters (Kubeadm, Rancher)
* Local development clusters (KinD, minikube)
.. grid:: 2
:gutter: 3

.. grid-item-card:: ✅ Ease of Use
:text-align: center

No complex Kubernetes manifests - write a simple SkyPilot YAML and run ``sky launch``.

Collaborator

Link for an example comparing the Kubernetes manifests vs the SkyPilot YAML?

Collaborator Author

I think that requires a little more work, since we first need to compress our existing YAMLs (in terms of lines of code, see #3594) to get a fair comparison against Kubernetes manifests. For example, the k8s vLLM Gemma manifest is 65 lines, while the SkyPilot Gemma YAML is 38 lines. We should do it as part of the K8s blog post; I have marked it as a TODO for now.

.. grid-item-card:: 📋 Interactive development on Kubernetes
:text-align: center

:ref:`SSH access to pods <dev-ssh>`, :ref:`VSCode integration <dev-vscode>`, :ref:`job management <managed-jobs>`, :ref:`model serving <sky-serve>`, :ref:`autodown idle pods <auto-stop>` and more.
romilbhardwaj marked this conversation as resolved.

.. grid-item-card:: ☁️ Burst to the cloud
:text-align: center

Kubernetes cluster is full? SkyPilot seamlessly gets resources on the cloud to get your job running sooner.
romilbhardwaj marked this conversation as resolved.

.. grid-item-card:: 🖼 Run popular models on Kubernetes
:text-align: center

Train and serve `Llama-3 <https://skypilot.readthedocs.io/en/latest/gallery/llms/llama-3.html>`_, `Mixtral <https://skypilot.readthedocs.io/en/latest/gallery/llms/mixtral.html>`_, and more on your Kubernetes cluster with ready-to-use recipes from the :ref:`AI gallery <ai-gallery>`.


.. tab-item:: For Infrastructure Admins
:sync: why-admins-tab

.. grid:: 2
:gutter: 3

.. grid-item-card:: ☁️ Burst to the cloud
:text-align: center

Scale out to capacity across clouds and regions without requiring manual intervention.
romilbhardwaj marked this conversation as resolved.

.. grid-item-card:: 🚯️ Minimize resource wastage
:text-align: center

SkyPilot can automatically terminate idle pods to free up resources for other users.
Collaborator

Should we mention that it respects the user's existing Kubernetes scheduling config and additionally does autodown?

Collaborator Author

Rephrased as: "SkyPilot can run with your custom pod scheduler and automatically terminate idle pods to free up resources for other users." wdyt?


.. grid-item-card:: 👀 Observability
:text-align: center

Use your existing tools, such as the :ref:`Kubernetes Dashboard <kubernetes-observability>`, to monitor SkyPilot pods.
romilbhardwaj marked this conversation as resolved.

.. grid-item-card:: 🍽️ Self-serve infra for your teams
:text-align: center

..
This point should maybe talk about quotas + sharing through kueue/native k8s quotas.

Reduce operational overhead by letting your teams provision their own resources on Kubernetes, while you retain control over the cluster.


Kubernetes Cluster Requirements
@@ -34,6 +77,12 @@ To connect and use a Kubernetes cluster, SkyPilot needs:
* An existing Kubernetes cluster running Kubernetes v1.20 or later.
* A `Kubeconfig <https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/>`_ file containing the access credentials and the namespace to use.

**Supported Kubernetes deployments:**

* Hosted Kubernetes services (EKS, GKE)
* On-prem clusters (Kubeadm, Rancher, K3s)
* Local development clusters (KinD, minikube)
romilbhardwaj marked this conversation as resolved.

In a typical workflow:

1. A cluster administrator sets up a Kubernetes cluster. Detailed admin guides for
@@ -218,6 +267,10 @@ FAQs

For isolation, you can create separate Kubernetes namespaces and set them in the kubeconfig distributed to users. SkyPilot will use the namespace set in the kubeconfig for running all tasks.
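As a rough illustration of the namespace lookup described above, the sketch below resolves the namespace of the current context from a kubeconfig that has already been parsed into a dict. It follows the standard kubeconfig layout (``current-context`` plus a ``contexts`` list); the function name, the context name ``team-a``, and the namespace ``team-a-ns`` are hypothetical, and this is not SkyPilot's own code.

```python
def namespace_from_kubeconfig(kubeconfig: dict) -> str:
    """Return the namespace of the current context, or "default" if unset."""
    current = kubeconfig.get("current-context")
    if current is None:
        return "default"
    for entry in kubeconfig.get("contexts") or []:
        if entry.get("name") == current:
            # Kubernetes falls back to the "default" namespace when none is set.
            return (entry.get("context") or {}).get("namespace") or "default"
    return "default"


# Hypothetical kubeconfig distributed to one team (parsed YAML shown as a dict).
team_kubeconfig = {
    "current-context": "team-a",
    "contexts": [
        {
            "name": "team-a",
            "context": {
                "cluster": "onprem",
                "user": "alice",
                "namespace": "team-a-ns",
            },
        },
    ],
}

print(namespace_from_kubeconfig(team_kubeconfig))
```

Handing each team a kubeconfig whose current context names a different namespace is what keeps their tasks apart in this setup.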

* **How do I view the pods created by SkyPilot on my Kubernetes cluster?**

You can use your existing observability tools to filter resources with the label :code:`parent=skypilot`. As an example, follow the instructions :ref:`here <kubernetes-observability>` to deploy the Kubernetes Dashboard on your cluster.
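For a quick check without a dashboard, the label filter can be sketched in a few lines. The example below assumes you have the output of ``kubectl get pods -o json`` parsed into a dict and keeps only pods labeled ``parent=skypilot``; the helper name and the sample pod names are hypothetical. On the command line, ``kubectl get pods -l parent=skypilot`` applies the same selector.

```python
from typing import List


def pods_with_label(pod_list: dict, key: str, value: str) -> List[str]:
    """Return the names of pods whose labels contain key=value."""
    names = []
    for pod in pod_list.get("items", []):
        metadata = pod.get("metadata", {})
        labels = metadata.get("labels") or {}
        if labels.get(key) == value:
            names.append(metadata.get("name", ""))
    return names


# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "sky-demo-head", "labels": {"parent": "skypilot"}}},
        {"metadata": {"name": "nginx-7c5d", "labels": {"app": "nginx"}}},
        {"metadata": {"name": "unlabeled-pod"}},
    ]
}

print(pods_with_label(sample, "parent", "skypilot"))
```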
romilbhardwaj marked this conversation as resolved.

* **How can I specify custom configuration for the pods created by SkyPilot?**

You can override the pod configuration used by SkyPilot by setting the :code:`pod_config` key in :code:`~/.sky/config.yaml`.
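To give a feel for the shape of such an override, here is a minimal sketch that builds the configuration as a nested dict. It assumes ``pod_config`` sits under a top-level ``kubernetes`` key and takes fields from the Kubernetes pod spec; that nesting, the ``team`` label, and the ``nvidia`` runtime class are illustrative assumptions, so check the SkyPilot configuration reference before copying it. The result is printed as JSON, which YAML parsers generally accept.

```python
import json

# Hypothetical override: add a label and a runtime class to SkyPilot pods.
# Assumption: `pod_config` is nested under a top-level `kubernetes` key.
config = {
    "kubernetes": {
        "pod_config": {
            "metadata": {"labels": {"team": "ml-research"}},
            "spec": {"runtimeClassName": "nvidia"},
        }
    }
}

# Print the structure so it can be reviewed before adding it to ~/.sky/config.yaml.
print(json.dumps(config, indent=2))
```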