Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Test sample RayService YAML to catch invalid or out of date one #731

Merged
merged 14 commits into from
Nov 19, 2022

Conversation

jasoonn
Copy link
Contributor

@jasoonn jasoonn commented Nov 16, 2022

Why are these changes needed?

Use #605 to test sample RayService YAML.
The test includes the following two checking rules.

  1. It is able to submit a simple job to the created raycluster.
  2. It is able to use curl in the pod to access the service.

Why we choose Subprocess instead of os.system

According to doc of os.system, the subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using subprocess is preferable to using this function.

  • shell=True in subprocess.run(): The specified command will be executed through the shell.
  • check=True in some of subprocess.run(): If the process exits with a non-zero exit code, a CalledProcessError exception will be raised.

Why removing the kubectl wait when creating kind cluster

def create_kind_cluster():
"""Create a KinD cluster"""
shell_subprocess_run("kind create cluster")
shell_subprocess_run(
"kubectl wait --for=condition=ready pod -n kube-system --all --timeout=900s")

Sometimes when calling "kubectl wait --for=condition=ready pod -n kube-system --all --timeout=900s", it will report an error error: no matching resources found(example log in line 1157), which I think because the namespace has not yet been created. When this situation happened, the test in the past still passed because we only use os.system() to execute the command and do not check the return code. In this pr, we updated os.system to subprocess.run(..., check=True) which will automatically raise an error when the return code of the exection is not 0. To solve this, we utilize --wait flag in kind to substitute the kubectl wait functionality.

Why we set tail=-1 in getting log?

shell_subprocess_run(f'kubectl logs -n={cr_namespace} -l ray.io/node-type=head --tail=-1')

The argument tail default is -1 with no selector, showing all log lines. However, we utilize a selector in this case. Hence, we need to specify --tail=-1 to display all of the log.

Related issue number

Closes #719

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(
cd tests/framework
python3 test_sample_rayservice_yamls.py 2>&1 | tee log
# I have run more than ten times on my cluster, and the result is always pass.

image

@kevin85421 kevin85421 self-requested a review November 16, 2022 04:59
Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @jasoonn for your contribution!

.github/workflows/actions/configuration/action.yaml Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
@jasoonn
Copy link
Contributor Author

jasoonn commented Nov 16, 2022

@kevin85421 Thanks for the suggestions. I have updated the pr.

@jasoonn jasoonn changed the title Ray service test [Feature] Test sample RayServiceYAML to catch invalid or out of date one Nov 16, 2022
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
@kevin85421
Copy link
Member

@shrekris-anyscale Can you help review the wait function of RayServiceAddCREvent? The function waits until RayService is ready to serve traffic. Thank you!

@jasoonn
Copy link
Contributor Author

jasoonn commented Nov 17, 2022

@kevin85421 Thanks for the suggestions. I have updated the pr.

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many shell = ..., check = .... Can we simplify it?

tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
@kevin85421 kevin85421 changed the title [Feature] Test sample RayServiceYAML to catch invalid or out of date one [Feature] Test sample RayService YAML to catch invalid or out of date one Nov 17, 2022
@jasoonn
Copy link
Contributor Author

jasoonn commented Nov 18, 2022

@kevin85421 Thanks for the suggestion. I have updated the pr.

@jasoonn
Copy link
Contributor Author

jasoonn commented Nov 18, 2022

@kevin85421 Thanks for the suggestion. I have updated the pr.

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kevin85421 kevin85421 merged commit 65a7703 into ray-project:master Nov 19, 2022
@sihanwang41
Copy link
Contributor

Hi @kevin85421 @jasoonn , is this test enabled in ci? (I don't see it is enabled.)

@kevin85421
Copy link
Member

Hi @kevin85421 @jasoonn , is this test enabled in ci? (I don't see it is enabled.)

@sihanwang41 No, KubeRay CI (2 CPUs) does not have enough CPUs to run this test. I will deploy these tests in Ray CI (#695).

lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
… one (ray-project#731)

Use ray-project#605 to test sample RayService YAML.
The test includes the following two checking rules.

1. It is able to [submit a simple job](https://github.com/jasoonn/kuberay/blob/69efe590970f9d8a4fe0fff66300049f008c6ae9/tests/framework/prototype.py#L188-L197) to the created raycluster.

2. It is able to use [curl in the pod](https://github.com/jasoonn/kuberay/blob/69efe590970f9d8a4fe0fff66300049f008c6ae9/tests/framework/prototype.py#L199-L216) to access the service.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Test sample RayService YAMLs to catch invalid or out of date ones
3 participants