
Run C++ tests on CI with run_test.py #99956

Closed · wants to merge 40 commits

Conversation

huydhn
Contributor

@huydhn huydhn commented Apr 25, 2023

After #99559, we can now run C++ tests with run_test.py. Although advanced features such as --import-slow-tests and --import-disabled-tests won't work for now, there will still be a gain in reliability and performance because C++ tests can now be retried and run in parallel.

This covers all C++ tests in CI, including aten, libtorch, and Vulkan C++ tests, across all platforms: Linux, Windows, and macOS.

Notes:

  • To support C++ test discovery, the environment variable CPP_TESTS_DIR can be set to the directory where the C++ test binaries are located
  • Support the pytest -k argument via run_test.py, as pytest-cpp uses it in place of --gtest-filter
  • The XML output is in pytest format, but that's OK for now because we don't have slow-test or flaky-test support for C++ tests yet
  • I need to figure out why conftest.py doesn't work when I invoke pytest directly for C++ tests, so --sc is not available for C++ tests at the moment; proper pytest plugins like stepwise work fine, though. I'll investigate and fix it in a separate PR. Update: found the cause. conftest.py is applied per directory, so it needs to be present in every directory that holds C++ tests
  • Two tests, test_api and test_tensorexpr, timed out on ASAN. I suspect this is because ASAN is now used on top of the Python executable, which is slower than running native C++ code. IMO, it's OK to keep running these tests as before on ASAN for now
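As a sketch, the new setup described above could be exercised along these lines. The binary directory and test names here are illustrative, not taken from the PR:

```shell
# Tell run_test.py where the C++ test binaries live; the path below is
# illustrative and depends on the platform and build configuration.
export CPP_TESTS_DIR=build/bin

# Run one C++ test suite through run_test.py; the "cpp/" prefix marks it
# as a C++ test, which gets retry and parallel-run support.
python test/run_test.py --cpp -i cpp/basic

# Filter individual test cases with pytest's -k expression, which
# pytest-cpp maps onto gtest's --gtest-filter for gtest binaries.
python test/run_test.py --cpp -i cpp/basic -k 'TestCaseName'
```

The commands assume a PyTorch checkout with the C++ tests already built.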

@huydhn huydhn added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 25, 2023
@pytorch-bot

pytorch-bot bot commented Apr 25, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99956

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 4d545dd:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@huydhn huydhn added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Apr 29, 2023
@huydhn huydhn requested a review from clee2000 April 29, 2023 01:22
@huydhn huydhn marked this pull request as ready for review April 29, 2023 01:32
@huydhn huydhn added the ciflow/mps Run MPS tests (subset of trunk) label Apr 30, 2023
@huydhn huydhn requested a review from malfet May 4, 2023 07:57
@huydhn huydhn requested a review from ZainRizvi May 8, 2023 21:24
@huydhn huydhn added the ciflow/unstable Run all experimental or flaky jobs on PyTorch unstable workflow label May 9, 2023
fi

python test/cpp/jit/tests_setup.py shutdown
# Cleaning up test artifacts in the test folder
pushd test
Contributor


I'm curious why the pushd/popd was needed now when it wasn't needed before

Contributor Author


The script test/cpp/jit/tests_setup.py creates and cleans up a dummy JIT output for testing in the current directory. Previously, when the C++ tests were run directly as executables, there was no need to change the directory. run_test.py, on the other hand, changes the working directory to test, so I need to do the same here to create that dummy output in the test folder instead.
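The working-directory dance described here can be sketched as follows. The artifact file name is hypothetical; the point is that cleanup must happen inside test/, since that is the working directory run_test.py uses:

```shell
# mkdir -p keeps this sketch runnable standalone; in the real repo the
# test/ directory already exists.
mkdir -p test

# run_test.py runs tests with test/ as the working directory, so the
# dummy JIT output lands there; enter test/ before cleaning it up.
pushd test
rm -f dummy_jit_output.pt   # hypothetical artifact name
popd                        # restore the previous working directory
```

pushd/popd (rather than a plain cd) restores the caller's directory for whatever the CI script does next.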

fi
if [[ -x ./cuda_complex_math_test ]]; then
./cuda_complex_math_test
python test/run_test.py --cpp --verbose -i cpp/cuda_complex_math_test
Contributor


Why make these verbose? Does this turn them into "as verbose as normal python tests" mode, or do cpp tests need to be extra verbose for some reason?

Contributor Author


Just a force of habit, I guess, since the other Python tests all run with --verbose. Now that you point it out, I see no difference in this case between python test/run_test.py --cpp -i cpp/basic --verbose and python test/run_test.py --cpp -i cpp/basic, for example.

@huydhn
Contributor Author

huydhn commented May 9, 2023

@pytorchbot merge -f 'ROCm distributed tests are flaky and periodic tests are broken. None are C++ tests'

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request May 11, 2023
The flaky issue has been fixed by #100909. In addition, retry support for C++ tests is now available after #99956, so it's safe to move this back from unstable now.

Per the discussion with @clee2000, it makes sense to have this as a periodic job given that it rarely fails. To close any coverage gap, I have created pytorch/test-infra#4144 to add the necessary testing for Vulkan changes.

Pull Request resolved: #101026
Approved by: https://github.com/kit1980