Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Improve the observability of integration tests #775

Merged
merged 8 commits into from
Dec 3, 2022

Conversation

jasoonn
Copy link
Contributor

@jasoonn jasoonn commented Dec 1, 2022

Why are these changes needed?

Add app.kubernetes.io/component: kuberay-operator label in helm-chart/kuberay-operator/templates/deployment.yaml to make operator pod can be selected in show_cluster_info().

Why not call show_cluster_info() in tearDown()?

The only way I found to check if the test pass in teardown() is to access the private field self._outcome in unittest.TestCase object(reference). The fields in self._outcome changes in Python 3.11. It seems like accessing self._outcome in unittest.TestCase is not a robust approach.

Why added hard code label app.kubernetes.io/component=kuberay-operator?

Because all of the labels app.kubernetes.io/component: for the operator is hard code with value kuberay-operato, I added app.kubernetes.io/component: kuberay-operator to the helm chart to make show_cluster_info able to get logs from the operator pod.

Related issue number

Closes #766

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution!

tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
tests/framework/prototype.py Outdated Show resolved Hide resolved
@jasoonn
Copy link
Contributor Author

jasoonn commented Dec 2, 2022

@kevin85421 @DmitriGekhtman Thanks for the suggestion. I have updated my pr.

@DmitriGekhtman
Copy link
Collaborator

Was there a Git issue? I think we lost most of the changes (I see Files changed 1)

@jasoonn
Copy link
Contributor Author

jasoonn commented Dec 2, 2022

@DmitriGekhtman Because the label utilized in the original show_cluster_info() is app.kubernetes.io/component: kuberay-operator, we only need to add that label in the helm chart. Besides, I changed the shell_subprocess_run() back.

@jasoonn
Copy link
Contributor Author

jasoonn commented Dec 2, 2022

To improve the observability of integration tests, I called show_cluster_info("default") in several places in tests/compatibility-test.py. The places I choose are where the most CI tests failed in: nonzero return in the execution of script files. e.g. test_ray_serve and test_detach_actor.

@DmitriGekhtman
Copy link
Collaborator

LGTM, I'll merge this after the CI finishes running.
This will be useful to have in the release branch.

@@ -198,6 +193,7 @@ def test_ray_serve(self):
exit_code, _ = utils.exec_run_container(container, f'python3 /usr/local/test_ray_serve_1.py {ray_namespace}', timeout_sec = 180)

if exit_code != 0:
show_cluster_info("default")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that we hard-coded a lot of "default" in both test_ray_serve and test_detached_actor. How about creating a local variable in each test to replace this hard-coded method?

For example,

utils.get_pod(namespace='default', label_selector='ray.io/node-type=head')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated in d5c8fd6 and c0d8ef6. To be noted, these two functions will utilize the same raycluster which is created in setUpClass(). If we want to separate the namespace in the future, we will need to move the creation of raycluster into corresponding functions.

'kubectl logs $(kubectl get pods | grep -e "-head" | awk "{print \$1}")')
shell_subprocess_run(
'kubectl logs -n $(kubectl get pods -A | grep -e "-operator" | awk \'{print $1 " " $2}\')')
show_cluster_info("default")
assert rtn == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the assert and raise an exception in the if statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 7d640b0.

@jasoonn
Copy link
Contributor Author

jasoonn commented Dec 2, 2022

@kevin85421 Thanks for the suggestion. I have updated the pr.


def test_ray_serve(self):
cluster_name_space = "default"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster_namespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update in bc948f2.

@jasoonn
Copy link
Contributor Author

jasoonn commented Dec 3, 2022

@kevin85421 Thanks for the suggestion. I have updated the pr.

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Wait for CI to pass.

@DmitriGekhtman DmitriGekhtman merged commit c6df15e into ray-project:master Dec 3, 2022
DmitriGekhtman pushed a commit to DmitriGekhtman/kuberay that referenced this pull request Dec 6, 2022
…#775)

Show debug information in failed test cases.
Make operator label consistent across Kustomize and Helm deployment methods.
DmitriGekhtman added a commit that referenced this pull request Dec 6, 2022
* [Feature] Improve the observability of integration tests (#775)

Show debug information in failed test cases.
Make operator label consistent across Kustomize and Helm deployment methods.
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…#775)

Show debug information in failed test cases.
Make operator label consistent across Kustomize and Helm deployment methods.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Improve the observability of integration tests
3 participants