Connect Ray client with TLS using Nginx Ingress on Kind cluster (#729) #1051

Merged

Conversation

tedhtchang
Contributor

@tedhtchang tedhtchang commented Apr 25, 2023

Why are these changes needed?

Instructions to connect Ray client in a Kind cluster via Nginx Ingress controller

Related issue number

Closes #729

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Tested with:
Ubuntu 22.04.1 LTS
kind v0.18.0 go1.20.2 linux/amd64

Setup:

# Create the Kind cluster first, with an extra arg for enabling ingress
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
EOF
# Deploy Ingress and add extra arg to enable ssl passthrough
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
kubectl patch deploy --type json --patch '[{"op":"add","path": "/spec/template/spec/containers/0/args/-","value":"--enable-ssl-passthrough"}]' ingress-nginx-controller -n ingress-nginx
# Verify that the log contains "Starting TLS proxy for SSL Passthrough"
kubectl logs deploy/ingress-nginx-controller -n ingress-nginx
# Deploy KubeRay
export KUBERAY_VERSION=v0.5.0
kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=${KUBERAY_VERSION}&timeout=90s"
# Create tls cluster
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-cluster.tls.yaml
# Create ingress:
cat << EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rayclient-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  rules:
    - host: "localhost"
      http:
        paths:
        - path: "/"
          pathType: Prefix
          backend:
            service:
              name: raycluster-tls-head-svc
              port:
                number: 10001
EOF
# Download ca key pair and create cert signing request (CSR)
kubectl get secret ca-tls -o template='{{index .data "ca.key"}}'|base64 -d > ./ca.key
kubectl get secret ca-tls -o template='{{index .data "ca.crt"}}'|base64 -d > ./ca.crt
openssl req -nodes -newkey rsa:2048 -keyout ./tls.key -out ./tls.csr -subj '/CN=local'
cat <<EOF >./cert.conf
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
subjectAltName = @alt_names
[alt_names]
DNS.1 = localhost
IP.1 = 127.0.0.1
EOF
# Sign and create cert
openssl x509 -req -CA ./ca.crt -CAkey ./ca.key -in ./tls.csr -out ./tls.crt -days 365 -CAcreateserial -extfile ./cert.conf

# Connect the Ray client to the cluster using the TLS key pair and the CA cert
python -c '
import os
import ray
os.environ["RAY_USE_TLS"] = "1"
os.environ["RAY_TLS_SERVER_CERT"] = os.path.join("./", "tls.crt")
os.environ["RAY_TLS_SERVER_KEY"] = os.path.join("./", "tls.key")
os.environ["RAY_TLS_CA_CERT"] = os.path.join("./", "ca.crt")
ray.init(address="ray://localhost", logging_level="DEBUG")'

Verify output similar to:

2023-04-25 16:33:32,452	INFO client_builder.py:253 -- Passing the following kwargs to ray.init() on the server: logging_level
2023-04-25 16:33:32,460	DEBUG worker.py:378 -- client gRPC channel state change: ChannelConnectivity.IDLE
2023-04-25 16:33:32,664	DEBUG worker.py:378 -- client gRPC channel state change: ChannelConnectivity.CONNECTING
2023-04-25 16:33:32,671	DEBUG worker.py:378 -- client gRPC channel state change: ChannelConnectivity.READY
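
The connection step above can also be written as a small script that fails early when one of the generated files is missing. This is only a minimal sketch: the helper names `ray_tls_env` and `connect` are hypothetical (they are not part of Ray), and it assumes `tls.crt`, `tls.key`, and `ca.crt` sit in one directory, as in the commands above.

```python
import os
from pathlib import Path


def ray_tls_env(cert_dir="."):
    """Build the TLS environment variables used above (hypothetical helper).

    Raises FileNotFoundError if any of the three files is missing, so a
    bad path shows up before the gRPC channel is opened.
    """
    cert_dir = Path(cert_dir)
    files = {
        "RAY_TLS_SERVER_CERT": cert_dir / "tls.crt",
        "RAY_TLS_SERVER_KEY": cert_dir / "tls.key",
        "RAY_TLS_CA_CERT": cert_dir / "ca.crt",
    }
    missing = [str(path) for path in files.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError("missing TLS files: " + ", ".join(missing))
    env = {"RAY_USE_TLS": "1"}
    env.update({name: str(path) for name, path in files.items()})
    return env


def connect(cert_dir=".", address="ray://localhost"):
    """Set the TLS variables, then connect with the same call as above."""
    os.environ.update(ray_tls_env(cert_dir))
    import ray  # imported only after the environment variables are set

    return ray.init(address=address, logging_level="DEBUG")
```

Calling `connect()` from the directory that holds the three files should behave like the `python -c` one-liner above.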

@tedhtchang
Contributor Author

Hi @kevin85421, let me know if the instructions work for you.

@kevin85421
Member

Hi @kevin85421, let me know if the instructions work for you.

I did not manually try these instructions, but the structure looks good to me. Some suggestions:

  1. Add a link to the operator installation document (example) instead of providing instructions in the document. In addition, I recommend using Helm for the installation method in the document since different versions of Kustomize require different installation commands for the KubeRay operator. Using Helm can help avoid issues related to Kustomize versions for users.
  2. Describe the details of Kubernetes ingress (e.g. annotations)
  3. Add "Warning: Ray client has some known limitations and is not actively maintained." so that users are aware of the limitations (example).
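
As a rough illustration of suggestion 2, the Ingress from the PR description can be built as a Python dictionary with each annotation commented. This is a sketch, not the manifest shipped with the PR: the function name `rayclient_ingress` is hypothetical, the field values are copied from the YAML in the description, and the comments reflect my reading of what each annotation is meant to do.

```python
import json


def rayclient_ingress(host="localhost",
                      service="raycluster-tls-head-svc",
                      port=10001):
    """Return the Ingress manifest from the PR description as a dict."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": "rayclient-ingress",
            "annotations": {
                # Hand this Ingress to the NGINX ingress controller.
                "kubernetes.io/ingress.class": "nginx",
                # Pass TLS traffic through to the backend instead of
                # terminating it; the controller must be started with
                # --enable-ssl-passthrough for this to take effect.
                "nginx.ingress.kubernetes.io/ssl-passthrough": "true",
                # Declare that the backend service speaks gRPC.
                "nginx.ingress.kubernetes.io/backend-protocol": "GRPC",
            },
        },
        "spec": {
            "rules": [{
                "host": host,
                "http": {
                    "paths": [{
                        "path": "/",
                        "pathType": "Prefix",
                        "backend": {
                            "service": {
                                "name": service,
                                "port": {"number": port},
                            }
                        },
                    }]
                },
            }]
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML.
    print(json.dumps(rayclient_ingress(), indent=2))
```

If saved as, say, `ingress.py` (a hypothetical file name), the output can be piped into `kubectl apply -f -` in the same way as the heredoc in the description.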

@tedhtchang tedhtchang force-pushed the Ray-client-via-ingress-on-kind branch 4 times, most recently from 2712459 to 514a1ce Compare May 3, 2023 00:13
@tedhtchang
Contributor Author

@kevin85421 Please take a look.

@kevin85421
Member

cc @jasoonn can you take a look at this PR?

@jasoonn
Contributor

jasoonn commented May 6, 2023

LGTM. I have tried the instructions on Debian GNU/Linux 11 with kind version 0.17.0. A small suggestion is to remove the specific versions of the environments in the doc's requirements.

@kevin85421 kevin85421 self-requested a review May 8, 2023 06:07
@kevin85421 kevin85421 self-assigned this May 8, 2023
@tedhtchang tedhtchang force-pushed the Ray-client-via-ingress-on-kind branch 2 times, most recently from 75bc803 to 2e0a3e0 Compare May 8, 2023 06:52
@tedhtchang
Contributor Author

@jasoonn I removed the versions from the requirements. Let me know if this is what you are looking for.

docs/guidance/rayclient-nginx-ingress.md

The output should be similar to:
```
2023-04-25 16:33:32,452 INFO client_builder.py:253 -- Passing the following kwargs to ray.init() on the server: logging_level
```
Member

I cannot reproduce this.

# test.py
import os
import ray
os.environ["RAY_USE_TLS"] = "1"
os.environ["RAY_TLS_SERVER_CERT"] = os.path.join("./", "tls.crt")
os.environ["RAY_TLS_SERVER_KEY"] = os.path.join("./", "tls.key")
os.environ["RAY_TLS_CA_CERT"] = os.path.join("./", "ca.crt")

ray.init(address="ray://localhost", logging_level="DEBUG")

print(ray.cluster_resources())

Screen Shot 2023-05-08 at 2 07 04 PM

Member

I used rayproject/ray:2.4.0-py310 in the TLS RayCluster.

In addition, my local environment is Python 3.10.10 and Ray 2.4.0.
Screen Shot 2023-05-08 at 2 10 17 PM

Contributor Author

I tried to match your environment except for the Python version, and it seems to work fine. Is there a useful stack trace after the client has been idle for some time?
image

@kevin85421
Member

By the way, could you check the format of this doc on GitHub pages? You can follow this instruction to deploy the doc webpage locally: https://github.com/ray-project/kuberay/blob/master/docs/development/development.md#deploying-documentation-locally

@tedhtchang tedhtchang force-pushed the Ray-client-via-ingress-on-kind branch from 2e0a3e0 to eaf060a Compare May 9, 2023 13:15
@tedhtchang
Contributor Author

By the way, could you check the format of this doc on GitHub pages? You can follow this instruction to deploy the doc webpage locally: https://github.com/ray-project/kuberay/blob/master/docs/development/development.md#deploying-documentation-locally

Some URLs return 404 - Not found on my local GitHub page. For example:

[document](../../helm-chart/kuberay-operator/README.md)
[rayclient-ingress](../../ray-operator/config/samples/ingress-rayclient-tls.yaml)
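
One plausible explanation, offered here only as an assumption, is that both links resolve to paths outside the `docs/` directory, which a locally built docs site may not serve. The sketch below checks that for links written in `docs/guidance/rayclient-nginx-ingress.md`; the helper names `resolve_link` and `escapes_root` are hypothetical.

```python
import posixpath


def resolve_link(doc_path, link):
    """Resolve a relative Markdown link against the file that contains it."""
    base = posixpath.dirname(doc_path)
    return posixpath.normpath(posixpath.join(base, link))


def escapes_root(doc_path, link, root="docs"):
    """Return True if the link target falls outside the docs root."""
    target = resolve_link(doc_path, link)
    return not (target == root or target.startswith(root + "/"))


if __name__ == "__main__":
    doc = "docs/guidance/rayclient-nginx-ingress.md"
    for link in (
        "../../helm-chart/kuberay-operator/README.md",
        "../../ray-operator/config/samples/ingress-rayclient-tls.yaml",
    ):
        print(link, "->", resolve_link(doc, link), escapes_root(doc, link))
```

Both links above resolve to repository paths outside `docs/`, so they work when browsing the repository on GitHub but may not work on a site built only from `docs/`.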

@kevin85421
Member

cc @Yicheng-Lu-llll could you review this PR? If you can reproduce this PR, I will merge it.

@Yicheng-Lu-llll
Contributor

LGTM! I was able to follow the documentation and reproduce everything as expected.

One small detail, though not directly related to the document: I observed that the Python version in rayproject/ray:2.4.0 is 3.7.15. Consequently, I initially encountered an error caused by the version mismatch:


root@ip-172-31-8-217:/home/ubuntu/workspace# python3 -c '
import os
import ray
os.environ["RAY_USE_TLS"] = "1"
os.environ["RAY_TLS_SERVER_CERT"] = os.path.join("./", "tls.crt")
os.environ["RAY_TLS_SERVER_KEY"] = os.path.join("./", "tls.key")
os.environ["RAY_TLS_CA_CERT"] = os.path.join("./", "ca.crt")
ray.init(address="ray://localhost", logging_level="DEBUG")'
2023-05-23 15:57:09,451 INFO client_builder.py:252 -- Passing the following kwargs to ray.init() on the server: logging_level
2023-05-23 15:57:09,537 DEBUG worker.py:378 -- client gRPC channel state change: ChannelConnectivity.IDLE
2023-05-23 15:57:09,742 DEBUG worker.py:378 -- client gRPC channel state change: ChannelConnectivity.CONNECTING
2023-05-23 15:57:09,874 DEBUG worker.py:378 -- client gRPC channel state change: ChannelConnectivity.READY
2023-05-23 15:57:09,874 DEBUG worker.py:813 -- Pinging server.
2023-05-23 15:57:11,984 DEBUG dataclient.py:287 -- Got unawaited response connection_cleanup {
}

2023-05-23 15:57:13,052 DEBUG dataclient.py:278 -- Shutting down data channel.
Traceback (most recent call last):
  File "<string>", line 8, in <module>
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 1339, in init
    ctx = builder.connect()
  File "/usr/local/lib/python3.10/dist-packages/ray/client_builder.py", line 182, in connect
    client_info_dict = ray.util.client_connect.connect(
  File "/usr/local/lib/python3.10/dist-packages/ray/util/client_connect.py", line 57, in connect
    conn = ray.connect(
  File "/usr/local/lib/python3.10/dist-packages/ray/util/client/__init__.py", line 252, in connect
    conn = self.get_context().connect(*args, **kw_args)
  File "/usr/local/lib/python3.10/dist-packages/ray/util/client/__init__.py", line 104, in connect
    self._check_versions(conn_info, ignore_version)
  File "/usr/local/lib/python3.10/dist-packages/ray/util/client/__init__.py", line 135, in _check_versions
    raise RuntimeError(msg)
RuntimeError: Python minor versions differ between client and server: client is 3.10.6, server is 3.7.15

After aligning the Python versions, everything functioned perfectly.
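
A mismatch like this can be caught before connecting by comparing the major and minor parts of the two version strings. The snippet is a minimal sketch, not Ray's own check: `minor_version` and `minor_versions_match` are hypothetical helpers, and the server version has to be obtained separately (for example, by running `python --version` inside the head pod).

```python
import sys


def minor_version(version):
    """Return (major, minor) from a version string such as '3.10.6'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def minor_versions_match(client, server):
    """Return True if both versions share the same major.minor pair."""
    return minor_version(client) == minor_version(server)


if __name__ == "__main__":
    client = "{}.{}.{}".format(*sys.version_info[:3])
    server = "3.7.15"  # value reported in the traceback above
    if not minor_versions_match(client, server):
        print(f"client is {client}, server is {server}: align them first")
```

With the versions from the traceback, `minor_versions_match("3.10.6", "3.7.15")` is false, which corresponds to the `RuntimeError` shown above.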

Member

@kevin85421 kevin85421 left a comment

LGTM. I have not been able to successfully reproduce this PR, but two other reviewers have been able to reproduce it successfully. It is possible that the issue is related to my local environment.

@kevin85421 kevin85421 merged commit f52b8bc into ray-project:master May 23, 2023
19 checks passed
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…project#1051)

Connect Ray client with TLS using Nginx Ingress on Kind cluster
Successfully merging this pull request may close these issues.

[Bug?][Documentation] nginx Ingress for client times out
4 participants