E2e test in k8s cluster and Namespace option #1774

Merged
merged 10 commits into from
Mar 15, 2023

nagar-ajay
Contributor

@nagar-ajay nagar-ajay commented Mar 14, 2023

What this PR does / why we need it:

  • Currently, JOB_NAMESPACE is hardcoded in the e2e test files, which restricts users to running these tests in the default namespace. A user might want to run them in a different namespace. One use case is running these tests in Kubeflow mode as training-operator conformance tests (or for other testing). The conformance test design doc suggests that these tests should run in the kf-conformance namespace.

  • Currently, the training client initialization code looks for a kubeconfig file, which may not be present when the tests run inside a k8s cluster. This PR fixes the initialization code so that it works both inside and outside a k8s cluster.

  • Created conftest.py for pytest fixtures and made the job namespace a fixture. After this PR, users can pass the --namespace option on the command line to run the e2e tests in a specific namespace.

  • hack/python-sdk/gen-sdk.sh currently deletes every Python file under the test directory. Only the autogenerated test files should be deleted, so gen-sdk.sh now deletes only Python files whose names start with test_.

  • Testing: Ran the modified tests against the NKE cluster from a local machine.

    • without any namespace option
(kserve-env) ajaynagar@LJND24D43D python % pytest test/e2e -v
================================================================ test session starts =================================================================
platform darwin -- Python 3.9.6, pytest-7.2.1, pluggy-1.0.0 -- /Users/ajaynagar/venv/kserve-env/bin/python3
cachedir: .pytest_cache
rootdir: /Users/ajaynagar/work/training-operator/sdk/python
collected 12 items

test/e2e/test_e2e_mpijob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                          [  8%]
test/e2e/test_e2e_mpijob.py::test_sdk_e2e PASSED                                                                                               [ 16%]
test/e2e/test_e2e_mxjob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                           [ 25%]
test/e2e/test_e2e_mxjob.py::test_sdk_e2e PASSED                                                                                                [ 33%]
test/e2e/test_e2e_paddlejob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                       [ 41%]
test/e2e/test_e2e_paddlejob.py::test_sdk_e2e PASSED                                                                                            [ 50%]
test/e2e/test_e2e_pytorchjob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                      [ 58%]
test/e2e/test_e2e_pytorchjob.py::test_sdk_e2e PASSED                                                                                           [ 66%]
test/e2e/test_e2e_tfjob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                           [ 75%]
test/e2e/test_e2e_tfjob.py::test_sdk_e2e PASSED                                                                                                [ 83%]
test/e2e/test_e2e_xgboostjob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                      [ 91%]
test/e2e/test_e2e_xgboostjob.py::test_sdk_e2e PASSED                                                                                           [100%]

========================================================== 12 passed in 1094.55s (0:18:14) ===========================================================
    • with the given namespace option
(kserve-env) ajaynagar@LJND24D43D python % pytest test/e2e -v --namespace=kf-conformance-test
================================================================ test session starts =================================================================
platform darwin -- Python 3.9.6, pytest-7.2.1, pluggy-1.0.0 -- /Users/ajaynagar/venv/kserve-env/bin/python3
cachedir: .pytest_cache
rootdir: /Users/ajaynagar/work/training-operator/sdk/python
collected 12 items

test/e2e/test_e2e_mpijob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                          [  8%]
test/e2e/test_e2e_mpijob.py::test_sdk_e2e PASSED                                                                                               [ 16%]
test/e2e/test_e2e_mxjob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                           [ 25%]
test/e2e/test_e2e_mxjob.py::test_sdk_e2e PASSED                                                                                                [ 33%]
test/e2e/test_e2e_paddlejob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                       [ 41%]
test/e2e/test_e2e_paddlejob.py::test_sdk_e2e PASSED                                                                                            [ 50%]
test/e2e/test_e2e_pytorchjob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                      [ 58%]
test/e2e/test_e2e_pytorchjob.py::test_sdk_e2e PASSED                                                                                           [ 66%]
test/e2e/test_e2e_tfjob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                           [ 75%]
test/e2e/test_e2e_tfjob.py::test_sdk_e2e PASSED                                                                                                [ 83%]
test/e2e/test_e2e_xgboostjob.py::test_sdk_e2e_with_gang_scheduling PASSED                                                                      [ 91%]
test/e2e/test_e2e_xgboostjob.py::test_sdk_e2e PASSED                                                                                           [100%]

========================================================== 12 passed in 1052.87s (0:17:32) ===========================================================
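
As an illustration of the conftest.py change described above, a minimal sketch of a pytest command-line option exposed as a fixture might look like the following. The fixture name job_namespace and the helper namespace_from_config are assumptions made for this example, and the merged conftest.py may differ; pytest_addoption, parser.addoption, and config.getoption are standard pytest hooks and calls.

```python
# conftest.py (illustrative sketch, not the merged file)
import pytest


def pytest_addoption(parser):
    # Register a --namespace option; "default" preserves the previous behavior.
    parser.addoption(
        "--namespace",
        action="store",
        default="default",
        help="Kubernetes namespace in which the e2e jobs are created",
    )


def namespace_from_config(config):
    # Hypothetical helper: read the option value from the pytest config object.
    return config.getoption("--namespace")


@pytest.fixture
def job_namespace(request):
    # Hypothetical fixture name: hands the chosen namespace to each test.
    return namespace_from_config(request.config)
```

A test would then accept the fixture as an argument, for example def test_sdk_e2e(job_namespace), and pass its value wherever the hardcoded JOB_NAMESPACE was used before.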

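The in-cluster versus out-of-cluster client initialization can be sketched as follows, assuming the official kubernetes Python client. The helper names running_in_cluster and load_kubernetes_config are hypothetical, and this is not necessarily how the SDK implements the fix; load_incluster_config and load_kube_config are the client's own loaders, and the environment variables and token path are the ones that client checks for in-cluster use.

```python
import os

# Environment variables and token path used for in-cluster configuration.
SERVICE_HOST_ENV = "KUBERNETES_SERVICE_HOST"
SERVICE_PORT_ENV = "KUBERNETES_SERVICE_PORT"
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def running_in_cluster(environ=None, token_path=TOKEN_PATH):
    # Hypothetical helper: True when the pod environment and the
    # service account token are both present.
    environ = os.environ if environ is None else environ
    has_service_env = bool(environ.get(SERVICE_HOST_ENV)) and bool(
        environ.get(SERVICE_PORT_ENV)
    )
    return has_service_env and os.path.isfile(token_path)


def load_kubernetes_config(config_file=None):
    # Imported lazily so the detection helper works without the package.
    from kubernetes import config

    if config_file is None and running_in_cluster():
        # Inside a pod: use the mounted service account credentials.
        config.load_incluster_config()
    else:
        # Outside the cluster: fall back to a kubeconfig file.
        config.load_kube_config(config_file=config_file)
```

With this shape, the same test code can run from a developer machine with a kubeconfig file or from a pod that only has a service account.
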
Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

@google-cla

google-cla bot commented Mar 14, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@nagar-ajay changed the title from "[WIP] - E2e test in k8s cluster" to "E2e test in k8s cluster and Namespace option" on Mar 14, 2023
@coveralls

coveralls commented Mar 14, 2023

Pull Request Test Coverage Report for Build 4424758153

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 7 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.09%) to 39.57%

Files with Coverage Reduction | New Missed Lines | %
pkg/controller.v1/mpi/mpijob_controller.go | 7 | 76.97%

Totals
Change from base Build 4414964912: 0.09%
Covered Lines: 2741
Relevant Lines: 6927

💛 - Coveralls

@johnugeorge
Member

@nagar-ajay Can you do a rebase as #1775 is merged?

Member

@tenzen-y tenzen-y left a comment

@nagar-ajay Thank you for creating this PR! I left a few comments.

sdk/python/test/e2e/conftest.py (outdated, resolved)
@nagar-ajay changed the title from "E2e test in k8s cluster and Namespace option" to "[WIP] - E2e test in k8s cluster and Namespace option" on Mar 15, 2023
@nagar-ajay changed the title from "[WIP] - E2e test in k8s cluster and Namespace option" to "E2e test in k8s cluster and Namespace option" on Mar 15, 2023
@nagar-ajay nagar-ajay requested review from tenzen-y and removed request for kuizhiqing March 15, 2023 09:05
Member

@tenzen-y tenzen-y left a comment

Thanks!
/lgtm

/assign @johnugeorge

@johnugeorge
Member

Thanks @nagar-ajay

/lgtm
/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge, nagar-ajay

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit d52e2b0 into kubeflow:master Mar 15, 2023
4 participants