Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* ray integration
* init securityContext
* reduce cpu request
* UPGRADE.md
* replace Jupyter notebook with JupyterLab
* update
* add KFP to the figure
* Disable istio sidecars for ray head in raycluster_example.yaml
* Update README.md
* Never ever use the default namespace
* Update README.md
* Disable the ray worker sidecar
* Update kustomization.yaml
* Create namespace.yaml
* Update test.sh to use the right namespace
* Update README.md
* Update README.md for Kubeflow 1.7

Co-authored-by: kaihsun <kaihsun@anyscale.com>
1 parent 182e81d, commit afc6a0e
Showing 12 changed files with 36,055 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
name: Build & Apply Ray manifest in KinD | ||
on: | ||
pull_request: | ||
paths: | ||
- contrib/ray/** | ||
|
||
jobs: | ||
build: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Checkout | ||
uses: actions/checkout@v3 | ||
|
||
- name: Install KinD | ||
run: ./tests/gh-actions/install_kind.sh | ||
|
||
- name: Create KinD Cluster | ||
run: kind create cluster --image=kindest/node:v1.23.0 | ||
|
||
- name: Install kustomize | ||
run: ./tests/gh-actions/install_kustomize.sh | ||
|
||
- name: Build & Apply manifests | ||
run: | | ||
cd contrib/ray/ | ||
make test |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
KUBERAY_RELEASE_VERSION ?= 0.4.0 | ||
KUBERAY_HELM_CHART_REPO ?= https://ray-project.github.io/kuberay-helm/ | ||
|
||
.PHONY: kuberay-operator/base | ||
kuberay-operator/base: | ||
mkdir -p kuberay-operator/base | ||
cd kuberay-operator/base && helm template --include-crds kuberay-operator kuberay-operator --version ${KUBERAY_RELEASE_VERSION} --repo ${KUBERAY_HELM_CHART_REPO} > resources.yaml | ||
|
||
.PHONY: test | ||
test: | ||
./test.sh | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
approvers: | ||
- juliusvonkohout | ||
reviewers: | ||
- juliusvonkohout | ||
- kimwnasptd |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,135 @@ | ||
TODO | ||
- The Ray dashboard, worker, and head must be reachable only from inside your Kubeflow user namespace. | ||
- Re-enable the Istio sidecar for the Ray head and worker in the user namespace and provide the corresponding Istio AuthorizationPolicies. The Istio sidecar can stay enabled for the kuberay-operator Deployment in the kubeflow namespace, because the operator does not use a webhook to reconcile RayClusters. For the same reason, no NetworkPolicy is needed for the Ray operator. | ||
|
||
|
||
> Credit: This manifest draws heavily on the Google Cloud engineering blog post ["Building a Machine Learning Platform with Kubeflow and Ray on Google Kubernetes Engine"](https://cloud.google.com/blog/products/ai-machine-learning/build-a-ml-platform-with-kubeflow-and-ray-on-gke). | ||
# Ray | ||
[Ray](https://github.com/ray-project/ray) is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for simplifying ML compute. | ||
|
||
<figure> | ||
<img | ||
src="assets/map-of-ray.png" | ||
alt="Ray"> | ||
<figcaption>Stack of Ray libraries - unified toolkit for ML workloads. (ref: https://docs.ray.io/en/latest/ray-overview/index.html)</figcaption> | ||
</figure> | ||
|
||
# KubeRay | ||
[KubeRay](https://github.com/ray-project/kuberay) is an open-source Kubernetes operator for Ray. It provides several CRDs to simplify managing Ray clusters on Kubernetes. We will integrate Kubeflow and KubeRay in this document. | ||
|
||
# Requirements | ||
* Dependencies | ||
* `kustomize`: v3.2.0 (the Kubeflow manifests are sensitive to the `kustomize` version) | ||
* `Kubernetes`: v1.23 | ||
|
||
* Computing resources: | ||
* 16GB RAM | ||
* 8 CPUs | ||
|
||
# Example | ||
<figure> | ||
<img | ||
src="assets/architecture.svg" | ||
alt="ray/kubeflow integration"> | ||
<figcaption>Note: (1) Kubeflow Central Dashboard will be renamed to workbench in the future. (2) Kubeflow Pipeline (KFP) is an important component of Kubeflow, but it is not included in this example.</figcaption> | ||
</figure> | ||
|
||
## Step 1: Install Kubeflow v1.7-branch | ||
* This example installs Kubeflow with the [v1.7-branch](https://github.com/kubeflow/manifests/tree/v1.7-branch). | ||
|
||
* Install all Kubeflow official components and all common services using [one command](https://github.com/kubeflow/manifests/tree/v1.7-branch#install-with-a-single-command). | ||
* If you do not want to install all components, you can comment out **KNative**, **Katib**, **Tensorboards Controller**, **Tensorboard Web App**, **Training Operator**, and **KServe** from [example/kustomization.yaml](https://github.com/kubeflow/manifests/blob/v1.7-branch/example/kustomization.yaml). | ||
|
||
## Step 2: Install KubeRay operator | ||
|
||
Do not install the operator into the "default" namespace. This example installs the KubeRay operator into a dedicated namespace, "kubeflow". | ||
|
||
```sh | ||
# Install a KubeRay operator and custom resource definitions. | ||
kustomize build kuberay-operator/base | kubectl apply --server-side -f - | ||
|
||
# Check the KubeRay operator (installed in the "kubeflow" namespace by kustomization.yaml) | ||
kubectl get pod -l app.kubernetes.io/component=kuberay-operator -n kubeflow | ||
# NAME READY STATUS RESTARTS AGE | ||
# kuberay-operator-5b8cd69758-rkpvh 1/1 Running 0 6m23s | ||
``` | ||
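
The manual check above can also be scripted. The following is a minimal sketch, not part of this commit: it assumes the Pod list is available as a dictionary in the shape produced by `kubectl get pod -o json`, and the helper name `pods_not_ready` and the sample data (including the Pod `example-pending-pod`) are hypothetical.

```python
def pods_not_ready(pod_list: dict) -> list:
    """Return the names of Pods that are not Running with every container ready."""
    not_ready = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        # A Pod with no reported container statuses is treated as not ready.
        all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if status.get("phase") != "Running" or not all_ready:
            not_ready.append(name)
    return not_ready


# Hand-written sample (assumption) that mimics `kubectl get pod -o json` output.
sample = {
    "items": [
        {
            "metadata": {"name": "kuberay-operator-5b8cd69758-rkpvh"},
            "status": {"phase": "Running", "containerStatuses": [{"ready": True}]},
        },
        {
            "metadata": {"name": "example-pending-pod"},
            "status": {"phase": "Pending"},
        },
    ]
}

print(pods_not_ready(sample))  # ['example-pending-pod']
```

In practice the input would come from parsing the JSON output of the `kubectl get pod` command with `json.loads`; an empty result means every selected Pod is up.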
|
||
## Step 3: Install RayCluster | ||
```sh | ||
# Create a RayCluster CR, and the KubeRay operator will reconcile a Ray cluster | ||
# with 1 head Pod and 1 worker Pod. | ||
# $MY_KUBEFLOW_USER_NAMESPACE must be a Kubeflow user namespace with Istio sidecar injection enabled, never "default". | ||
export MY_KUBEFLOW_USER_NAMESPACE=development | ||
kubectl apply -f raycluster_example.yaml -n $MY_KUBEFLOW_USER_NAMESPACE | ||
|
||
# Check RayCluster | ||
kubectl get pod -l ray.io/cluster=kubeflow-raycluster -n $MY_KUBEFLOW_USER_NAMESPACE | ||
# NAME READY STATUS RESTARTS AGE | ||
# kubeflow-raycluster-head-p6dpk 1/1 Running 0 70s | ||
# kubeflow-raycluster-worker-small-group-l7j6c 1/1 Running 0 70s | ||
``` | ||
* `raycluster_example.yaml` uses `rayproject/ray:2.2.0-py38-cpu` as its OCI image. The Ray client (JupyterLab) and the server (RayCluster) must run matching Ray and Python versions. This image uses: | ||
* Python 3.8.13 | ||
* Ray 2.2.0 | ||
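
The version constraint can be expressed as a small check. This is a sketch under stated assumptions: the function names `parse_version` and `check_compatibility` are hypothetical, Python versions are compared on MAJOR and MINOR only, and Ray versions are assumed to need an exact match.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '3.8.13' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def check_compatibility(client_python: str, server_python: str,
                        client_ray: str, server_ray: str) -> list:
    """Return a list of human-readable problems; an empty list means compatible."""
    problems = []
    # Python: only MAJOR and MINOR have to agree (for example 3.8.x on both sides).
    if parse_version(client_python)[:2] != parse_version(server_python)[:2]:
        problems.append(f"Python mismatch: client {client_python}, server {server_python}")
    # Ray: assumed here to require the exact same release on both sides.
    if client_ray != server_ray:
        problems.append(f"Ray mismatch: client {client_ray}, server {server_ray}")
    return problems


print(check_compatibility("3.8.10", "3.8.13", "2.2.0", "2.2.0"))  # []
print(check_compatibility("3.9.7", "3.8.13", "2.2.0", "2.2.0"))
# ['Python mismatch: client 3.9.7, server 3.8.13']
```

Inside a notebook, the client-side values can be read with `platform.python_version()` and `ray.__version__`.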
|
||
## Step 4: Forward the port of Istio's Ingress-Gateway | ||
* Follow the [instructions](https://github.com/kubeflow/manifests/tree/v1.7-branch#port-forward) to forward the port of Istio's Ingress-Gateway and log in to Kubeflow Central Dashboard. | ||
|
||
## Step 5: Create a JupyterLab via Kubeflow Central Dashboard | ||
* Click the "Notebooks" icon in the left panel. | ||
* Click "New Notebook". | ||
* Select `kubeflownotebookswg/jupyter-scipy:v1.7.0` as the OCI image. | ||
* Click "Launch". | ||
* Click "CONNECT" to connect to the JupyterLab instance. | ||
|
||
## Step 6: Use Ray client in the JupyterLab to connect to the RayCluster | ||
* As mentioned in Step 3, the Python and Ray versions on the client (JupyterLab) side must match those on the server (RayCluster) side. | ||
```sh | ||
# Check the Python version. Its MAJOR and MINOR components must match the RayCluster (i.e. Python 3.8). | ||
python --version | ||
# Python 3.8.10 | ||
|
||
# Install Ray 2.2.0 (quoted so that shells such as zsh do not expand the brackets) | ||
pip install -U "ray[default]==2.2.0" | ||
``` | ||
* Connect to RayCluster via Ray client. | ||
```python | ||
# Open a new .ipynb page. | ||
|
||
import ray | ||
# To reach a RayCluster in another namespace, use ray://${RAYCLUSTER_HEAD_SVC}.${NAMESPACE}.svc.cluster.local:${RAY_CLIENT_PORT} | ||
# Here the RayCluster runs in the notebook's own user namespace (one cluster per namespace for multi-tenancy), | ||
# so the short Service name is enough. Never use the "default" namespace. | ||
ray.init(address="ray://kubeflow-raycluster-head-svc:10001") | ||
print(ray.cluster_resources()) | ||
# {'node:10.244.0.41': 1.0, 'memory': 3000000000.0, 'node:10.244.0.40': 1.0, 'object_store_memory': 805386239.0, 'CPU': 2.0} | ||
|
||
# Try Ray task | ||
@ray.remote | ||
def f(x): | ||
return x * x | ||
|
||
futures = [f.remote(i) for i in range(4)] | ||
print(ray.get(futures)) # [0, 1, 4, 9] | ||
|
||
# Try Ray actor | ||
@ray.remote | ||
class Counter(object): | ||
def __init__(self): | ||
self.n = 0 | ||
|
||
def increment(self): | ||
self.n += 1 | ||
|
||
def read(self): | ||
return self.n | ||
|
||
counters = [Counter.remote() for i in range(4)] | ||
[c.increment.remote() for c in counters] | ||
futures = [c.read.remote() for c in counters] | ||
print(ray.get(futures)) # [1, 1, 1, 1] | ||
``` | ||
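
The two address forms mentioned in the comments of the block above (short Service name versus fully qualified cluster DNS name) can be built with a small helper. This is an illustrative sketch: the function `ray_client_address` is hypothetical and not part of Ray, and the default port 10001 is taken from the `ray.init` call in this README.

```python
def ray_client_address(head_service: str, namespace: str = "", port: int = 10001) -> str:
    """Build a Ray client URL for a RayCluster head Service.

    With no namespace, the short Service name is used, which resolves only
    from Pods in the same namespace as the RayCluster.
    """
    host = head_service
    if namespace:
        # Fully qualified in-cluster DNS name of the Service.
        host = f"{head_service}.{namespace}.svc.cluster.local"
    return f"ray://{host}:{port}"


print(ray_client_address("kubeflow-raycluster-head-svc"))
# ray://kubeflow-raycluster-head-svc:10001
print(ray_client_address("kubeflow-raycluster-head-svc", "development"))
# ray://kubeflow-raycluster-head-svc.development.svc.cluster.local:10001
```

The returned string can be passed as the `address` argument of `ray.init`.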
|
||
# Upgrading | ||
See [UPGRADE.md](UPGRADE.md) for more details. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Upgrading | ||
```sh | ||
# Step 1: Update KUBERAY_RELEASE_VERSION in Makefile | ||
# Step 2: Create new KubeRay operator manifest | ||
make kuberay-operator/base | ||
``` |
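
Step 1 can be done by hand or scripted. The sketch below is one possible way to rewrite the `KUBERAY_RELEASE_VERSION ?= ...` line; the function name `set_kuberay_version` and the target version `0.5.0` are hypothetical and chosen only for illustration.

```python
import re


def set_kuberay_version(makefile_text: str, new_version: str) -> str:
    """Return the Makefile text with KUBERAY_RELEASE_VERSION set to new_version."""
    pattern = re.compile(r"^(KUBERAY_RELEASE_VERSION\s*\?=\s*)\S+", flags=re.MULTILINE)
    # A function replacement avoids any ambiguity between group references and digits.
    updated, count = pattern.subn(lambda m: m.group(1) + new_version, makefile_text)
    if count != 1:
        raise ValueError(f"expected exactly one KUBERAY_RELEASE_VERSION line, found {count}")
    return updated


makefile = (
    "KUBERAY_RELEASE_VERSION ?= 0.4.0\n"
    "KUBERAY_HELM_CHART_REPO ?= https://ray-project.github.io/kuberay-helm/\n"
)
print(set_kuberay_version(makefile, "0.5.0"))  # 0.5.0 is a made-up example version
```

After writing the result back to the Makefile, `make kuberay-operator/base` regenerates `resources.yaml` for the new version.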
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
patches: | ||
# Add securityContext to KubeRay operator Pod. | ||
- target: | ||
kind: Deployment | ||
labelSelector: "app.kubernetes.io/name=kuberay-operator" | ||
patch: |- | ||
- op: add | ||
path: /spec/template/spec/containers/0/securityContext | ||
value: | ||
runAsUser: 1000 | ||
allowPrivilegeEscalation: false | ||
capabilities: | ||
drop: ["ALL"] | ||
runAsNonRoot: true | ||
seccompProfile: | ||
type: RuntimeDefault | ||
namespace: kubeflow | ||
resources: | ||
- namespace.yaml | ||
- resources.yaml |
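
To make the effect of the `op: add` patch above concrete, the following sketch applies the same operation to a stripped-down Deployment dictionary. It is not how kustomize is implemented: `apply_add` is a hypothetical helper that handles only an `add` whose target is an object member and ignores JSON Pointer escaping, and the container name in the sample is an assumption.

```python
import copy


def apply_add(document: dict, path: str, value):
    """Apply one JSON Patch style `add` whose target is an object member."""
    result = copy.deepcopy(document)  # leave the input untouched
    *parents, last = [part for part in path.split("/") if part]
    node = result
    for key in parents:
        # Numeric path segments index into lists, e.g. containers/0.
        node = node[int(key)] if isinstance(node, list) else node[key]
    node[last] = value
    return result


# Minimal stand-in for the operator Deployment (container name is an assumption).
deployment = {"spec": {"template": {"spec": {"containers": [{"name": "kuberay-operator"}]}}}}

security_context = {
    "runAsUser": 1000,
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
    "runAsNonRoot": True,
    "seccompProfile": {"type": "RuntimeDefault"},
}

patched = apply_add(
    deployment, "/spec/template/spec/containers/0/securityContext", security_context
)
print(patched["spec"]["template"]["spec"]["containers"][0]["securityContext"]["runAsNonRoot"])  # True
```

The patch targets the first container of every Deployment matching the label selector, so the path uses index `0`.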
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
apiVersion: v1 | ||
kind: Namespace | ||
metadata: | ||
name: kubeflow |