Run C++ tests on CI with run_test.py (#99956)
After #99559, we can now run C++ tests with `run_test.py`. Although advanced features such as `--import-slow-tests` and `--import-disabled-tests` won't work for now, there is still a gain in reliability and performance because C++ tests can now be retried and run in parallel.

This covers all C++ tests in CI, including ATen, libtorch, and Vulkan C++ tests, across all platforms: Linux, Windows, and macOS.

Notes:
* To support C++ test discovery, the env variable `CPP_TESTS_DIR` can be set to the directory where the C++ test binaries are located (a minimal usage sketch follows this list)
* Support the pytest `-k` argument via `run_test.py`, since pytest-cpp uses it in place of `--gtest_filter`
* The XML output is in pytest format, but that's OK for now because we don't have slow-test or flaky-test support for C++ tests yet
* ~~I need to figure out why conftest.py doesn't work when I invoke pytest directly for C++ tests, so `--sc` is not available for C++ tests at the moment.  A proper pytest plugin like stepwise works fine, though.  I'll investigate and fix it in a separate PR~~ Found the cause: `conftest.py` is applied per directory and needs to be present in whichever directory holds the C++ tests
* Two tests, `test_api` and `test_tensorexpr`, timed out on ASAN. I suspect this is because ASAN now runs on top of the Python executable, which is slower than running the native C++ code directly. IMO, it's OK to keep running these two tests the old way on ASAN for now
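
For reference, a minimal sketch of the new invocation pattern used throughout the scripts below (assuming `TORCH_BIN_DIR` points at the directory holding the gtest binaries):

```bash
# Tell run_test.py where the C++ (gtest) test binaries live so they can be discovered
export CPP_TESTS_DIR="${TORCH_BIN_DIR}"

# Run a C++ test binary through run_test.py; the cpp/ prefix marks it as a C++ test
python test/run_test.py --cpp --verbose -i cpp/test_lazy

# The pytest -k expression takes the place of --gtest_filter, e.g. to skip CUDA tests
python test/run_test.py --cpp --verbose -i cpp/test_jit -k "not CUDA"
```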
Pull Request resolved: #99956
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
huydhn authored and pytorchmergebot committed May 9, 2023
1 parent a8c2cd1 commit 35834a4
Showing 5 changed files with 171 additions and 120 deletions.
5 changes: 3 additions & 2 deletions .ci/pytorch/macos-test.sh
@@ -71,13 +71,14 @@ test_libtorch() {
VERBOSE=1 DEBUG=1 python "$BUILD_LIBTORCH_PY"
popd

python tools/download_mnist.py --quiet -d test/cpp/api/mnist
MNIST_DIR="${PWD}/test/cpp/api/mnist"
python tools/download_mnist.py --quiet -d "${MNIST_DIR}"

# Unfortunately it seems like the test can't load from miniconda3
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/bin/test_api
TORCH_CPP_TEST_MNIST_PATH="${MNIST_DIR}" CPP_TESTS_DIR="${CPP_BUILD}/caffe2/bin" python test/run_test.py --cpp --verbose -i cpp/test_api

assert_git_not_dirty
fi
105 changes: 59 additions & 46 deletions .ci/pytorch/test.sh
@@ -271,7 +271,7 @@ test_inductor() {
# docker build uses bdist_wheel which does not work with test_aot_inductor
# TODO: need a faster way to build
BUILD_AOT_INDUCTOR_TEST=1 python setup.py develop
LD_LIBRARY_PATH="$TORCH_LIB_DIR $TORCH_BIN_DIR"/test_aot_inductor
CPP_TESTS_DIR="${BUILD_BIN_DIR}" LD_LIBRARY_PATH="${TORCH_LIB_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_aot_inductor
}

# "Global" flags for inductor benchmarking controlled by TEST_CONFIG
@@ -547,43 +547,60 @@ test_libtorch() {
ln -sf "$TORCH_LIB_DIR"/libtbb* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libnvfuser* "$TORCH_BIN_DIR"

export CPP_TESTS_DIR="${TORCH_BIN_DIR}"

# Start background download
python tools/download_mnist.py --quiet -d test/cpp/api/mnist &
MNIST_DIR="${PWD}/test/cpp/api/mnist"
python tools/download_mnist.py --quiet -d "${MNIST_DIR}" &

# Make test_reports directory
# NB: the ending test_libtorch must match the current function name for the current
# test reporting process to function as expected.
TEST_REPORTS_DIR=test/test-reports/cpp-unittest/test_libtorch
mkdir -p $TEST_REPORTS_DIR
# Prepare the model used by test_jit, the model needs to be in the test directory
# to get picked up by run_test
pushd test
python cpp/jit/tests_setup.py setup
popd

# Run JIT cpp tests
python test/cpp/jit/tests_setup.py setup

if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$TORCH_BIN_DIR"/test_jit --gtest_output=xml:$TEST_REPORTS_DIR/test_jit.xml
"$TORCH_BIN_DIR"/nvfuser_tests --gtest_output=xml:$TEST_REPORTS_DIR/nvfuser_tests.xml
python test/run_test.py --cpp --verbose -i cpp/test_jit cpp/nvfuser_tests
else
"$TORCH_BIN_DIR"/test_jit --gtest_filter='-*CUDA' --gtest_output=xml:$TEST_REPORTS_DIR/test_jit.xml
# CUDA tests have already been skipped when CUDA is not available
python test/run_test.py --cpp --verbose -i cpp/test_jit -k "not CUDA"
fi

# Run Lazy Tensor cpp tests
if [[ "$BUILD_ENVIRONMENT" == *cuda* && "$TEST_CONFIG" != *nogpu* ]]; then
LTC_TS_CUDA=1 "$TORCH_BIN_DIR"/test_lazy --gtest_output=xml:$TEST_REPORTS_DIR/test_lazy.xml
LTC_TS_CUDA=1 python test/run_test.py --cpp --verbose -i cpp/test_lazy
else
"$TORCH_BIN_DIR"/test_lazy --gtest_output=xml:$TEST_REPORTS_DIR/test_lazy.xml
python test/run_test.py --cpp --verbose -i cpp/test_lazy
fi

python test/cpp/jit/tests_setup.py shutdown
# Cleaning up test artifacts in the test folder
pushd test
python cpp/jit/tests_setup.py shutdown
popd

# Wait for background download to finish
wait
# Exclude IMethodTest that relies on torch::deploy, which will instead be ran in test_deploy.
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$TORCH_BIN_DIR"/test_api --gtest_filter='-IMethodTest.*' --gtest_output=xml:$TEST_REPORTS_DIR/test_api.xml
"$TORCH_BIN_DIR"/test_tensorexpr --gtest_output=xml:$TEST_REPORTS_DIR/test_tensorexpr.xml

if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
TEST_REPORTS_DIR=test/test-reports/cpp-unittest/test_libtorch
mkdir -p $TEST_REPORTS_DIR

# TODO: Not quite sure why these tests time out only on ASAN, probably
# this is due to the fact that a python executable is used and ASAN
# treats that differently
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="${MNIST_DIR}" "$TORCH_BIN_DIR"/test_api --gtest_filter='-IMethodTest.*' --gtest_output=xml:$TEST_REPORTS_DIR/test_api.xml
"$TORCH_BIN_DIR"/test_tensorexpr --gtest_output=xml:$TEST_REPORTS_DIR/test_tensorexpr.xml
else
# Exclude IMethodTest that relies on torch::deploy, which will instead be ran in test_deploy
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="${MNIST_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_api -k "not IMethodTest"
python test/run_test.py --cpp --verbose -i cpp/test_tensorexpr
fi

if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* && "${BUILD_ENVIRONMENT}" != *asan* ]]; then
# TODO: Consider to run static_runtime_test from $TORCH_BIN_DIR (may need modify build script)
"$BUILD_BIN_DIR"/static_runtime_test --gtest_output=xml:$TEST_REPORTS_DIR/static_runtime_test.xml
# NB: This test is not under TORCH_BIN_DIR but under BUILD_BIN_DIR
export CPP_TESTS_DIR="${BUILD_BIN_DIR}"
python test/run_test.py --cpp --verbose -i cpp/static_runtime_test
fi

assert_git_not_dirty
@@ -595,26 +612,21 @@ test_aot_compilation() {
ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR"

# Make test_reports directory
# NB: the ending test_libtorch must match the current function name for the current
# test reporting process to function as expected.
TEST_REPORTS_DIR=test/test-reports/cpp-unittest/test_aot_compilation
mkdir -p $TEST_REPORTS_DIR
if [ -f "$TORCH_BIN_DIR"/test_mobile_nnc ]; then "$TORCH_BIN_DIR"/test_mobile_nnc --gtest_output=xml:$TEST_REPORTS_DIR/test_mobile_nnc.xml; fi
# shellcheck source=test/mobile/nnc/test_aot_compile.sh
if [ -f "$TORCH_BIN_DIR"/aot_model_compiler_test ]; then source test/mobile/nnc/test_aot_compile.sh; fi
if [ -f "$TORCH_BIN_DIR"/test_mobile_nnc ]; then
CPP_TESTS_DIR="${TORCH_BIN_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_mobile_nnc
fi

if [ -f "$TORCH_BIN_DIR"/aot_model_compiler_test ]; then
source test/mobile/nnc/test_aot_compile.sh
fi
}

test_vulkan() {
if [[ "$BUILD_ENVIRONMENT" == *vulkan* ]]; then
ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_TEST_DIR"
ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_TEST_DIR"
export VK_ICD_FILENAMES=/var/lib/jenkins/swiftshader/swiftshader/build/Linux/vk_swiftshader_icd.json
# NB: the ending test_vulkan must match the current function name for the current
# test reporting process to function as expected.
TEST_REPORTS_DIR=test/test-reports/cpp-vulkan/test_vulkan
mkdir -p $TEST_REPORTS_DIR
LD_LIBRARY_PATH=/var/lib/jenkins/swiftshader/swiftshader/build/Linux/ "$TORCH_TEST_DIR"/vulkan_api_test --gtest_output=xml:$TEST_REPORTS_DIR/vulkan_test.xml
CPP_TESTS_DIR="${TORCH_TEST_DIR}" LD_LIBRARY_PATH=/var/lib/jenkins/swiftshader/swiftshader/build/Linux/ python test/run_test.py --cpp --verbose -i cpp/vulkan_api_test
fi
}

@@ -631,22 +643,24 @@ test_distributed() {
echo "Testing distributed C++ tests"
ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
# NB: the ending test_distributed must match the current function name for the current
# test reporting process to function as expected.
TEST_REPORTS_DIR=test/test-reports/cpp-distributed/test_distributed
mkdir -p $TEST_REPORTS_DIR
"$TORCH_BIN_DIR"/FileStoreTest --gtest_output=xml:$TEST_REPORTS_DIR/FileStoreTest.xml
"$TORCH_BIN_DIR"/HashStoreTest --gtest_output=xml:$TEST_REPORTS_DIR/HashStoreTest.xml
"$TORCH_BIN_DIR"/TCPStoreTest --gtest_output=xml:$TEST_REPORTS_DIR/TCPStoreTest.xml

export CPP_TESTS_DIR="${TORCH_BIN_DIR}"
# These are distributed tests, so let's continue running them sequentially here to avoid
# any surprise
python test/run_test.py --cpp --verbose -i cpp/FileStoreTest
python test/run_test.py --cpp --verbose -i cpp/HashStoreTest
python test/run_test.py --cpp --verbose -i cpp/TCPStoreTest

MPIEXEC=$(command -v mpiexec)
if [[ -n "$MPIEXEC" ]]; then
# NB: mpiexec only works directly with the C++ test binary here
MPICMD="${MPIEXEC} -np 2 $TORCH_BIN_DIR/ProcessGroupMPITest"
eval "$MPICMD"
fi
"$TORCH_BIN_DIR"/ProcessGroupGlooTest --gtest_output=xml:$TEST_REPORTS_DIR/ProcessGroupGlooTest.xml
"$TORCH_BIN_DIR"/ProcessGroupNCCLTest --gtest_output=xml:$TEST_REPORTS_DIR/ProcessGroupNCCLTest.xml
"$TORCH_BIN_DIR"/ProcessGroupNCCLErrorsTest --gtest_output=xml:$TEST_REPORTS_DIR/ProcessGroupNCCLErrorsTest.xml

python test/run_test.py --cpp --verbose -i cpp/ProcessGroupGlooTest
python test/run_test.py --cpp --verbose -i cpp/ProcessGroupNCCLTest
python test/run_test.py --cpp --verbose -i cpp/ProcessGroupNCCLErrorsTest
fi
}

@@ -658,9 +672,8 @@ test_rpc() {
ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libtbb* "$TORCH_BIN_DIR"
TEST_REPORTS_DIR=test/test-reports/cpp-rpc/test_rpc
mkdir -p $TEST_REPORTS_DIR
"$TORCH_BIN_DIR"/test_cpp_rpc --gtest_output=xml:$TEST_REPORTS_DIR/test_cpp_rpc.xml

CPP_TESTS_DIR="${TORCH_BIN_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_cpp_rpc
fi
}

24 changes: 15 additions & 9 deletions .ci/pytorch/win-test-helpers/test_libtorch.bat
@@ -5,14 +5,16 @@ if "%USE_CUDA%" == "0" IF NOT "%CUDA_VERSION%" == "cpu" exit /b 0
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
if errorlevel 1 exit /b 1

cd %TMP_DIR_WIN%\build\torch\bin
set TEST_OUT_DIR=%~dp0\..\..\..\test\test-reports\cpp-unittest
md %TEST_OUT_DIR%
:: Save the current working directory so that we can go back there
set CWD=%cd%

set CPP_TESTS_DIR=%TMP_DIR_WIN%\build\torch\bin
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%

set TEST_API_OUT_DIR=%TEST_OUT_DIR%\test_api
md %TEST_API_OUT_DIR%
test_api.exe --gtest_filter="-IntegrationTest.MNIST*" --gtest_output=xml:%TEST_API_OUT_DIR%\test_api.xml
set TORCH_CPP_TEST_MNIST_PATH=%CWD%\test\cpp\api\mnist
python tools\download_mnist.py --quiet -d %TORCH_CPP_TEST_MNIST_PATH%

python test\run_test.py --cpp --verbose -i cpp/test_api
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1

@@ -25,6 +27,10 @@ for /r "." %%a in (*.exe) do (
goto :eof

:libtorch_check

cd %CWD%
set CPP_TESTS_DIR=%TMP_DIR_WIN%\build\torch\test

:: Skip verify_api_visibility as it a compile level test
if "%~1" == "verify_api_visibility" goto :eof

@@ -42,12 +48,12 @@ if "%~1" == "utility_ops_gpu_test" goto :eof

echo Running "%~2"
if "%~1" == "c10_intrusive_ptr_benchmark" (
:: NB: This is not a gtest executable file, thus couldn't be handled by pytest-cpp
call "%~2"
goto :eof
)
:: Differentiating the test report directories is crucial for test time reporting.
md %TEST_OUT_DIR%\%~n2
call "%~2" --gtest_output=xml:%TEST_OUT_DIR%\%~n2\%~1.xml

python test\run_test.py --cpp --verbose -i "cpp/%~1"
if errorlevel 1 (
echo %1 failed with exit code %errorlevel%
exit /b 1
79 changes: 41 additions & 38 deletions aten/tools/run_tests.sh
@@ -3,75 +3,78 @@ set -x
set -e

VALGRIND_SUP="${PWD}/`dirname $0`/valgrind.sup"
pushd $1
export CPP_TESTS_DIR=$1

VALGRIND=${VALGRIND:=ON}
./basic
./atest
./scalar_test
./broadcast_test
./wrapdim_test
./apply_utils_test
./dlconvertor_test
./native_test
./scalar_tensor_test
python test/run_test.py --cpp --verbose -i \
cpp/basic \
cpp/atest \
cpp/scalar_test \
cpp/broadcast_test \
cpp/wrapdim_test \
cpp/apply_utils_test \
cpp/dlconvertor_test \
cpp/native_test \
cpp/scalar_tensor_test \
cpp/undefined_tensor_test \
cpp/extension_backend_test \
cpp/lazy_tensor_test \
cpp/tensor_iterator_test \
cpp/Dimname_test \
cpp/Dict_test \
cpp/NamedTensor_test \
cpp/cpu_generator_test \
cpp/legacy_vmap_test \
cpp/operators_test

if [[ -x ./tensor_interop_test ]]; then
./tensor_interop_test
python test/run_test.py --cpp --verbose -i cpp/tensor_interop_test
fi
./undefined_tensor_test
./extension_backend_test
./lazy_tensor_test
./tensor_iterator_test
./Dimname_test
./Dict_test
./NamedTensor_test
./cpu_generator_test
./legacy_vmap_test
./operators_test
if [[ -x ./cudnn_test ]]; then
./cudnn_test
python test/run_test.py --cpp --verbose -i cpp/cudnn_test
fi
if [[ -x ./cuda_generator_test ]]; then
./cuda_generator_test
python test/run_test.py --cpp --verbose -i cpp/cuda_generator_test
fi
if [[ -x ./apply_test ]]; then
./apply_test
python test/run_test.py --cpp --verbose -i cpp/apply_test
fi
if [[ -x ./stream_test ]]; then
./stream_test
python test/run_test.py --cpp --verbose -i cpp/stream_test
fi
if [[ -x ./cuda_half_test ]]; then
./cuda_half_test
python test/run_test.py --cpp --verbose -i cpp/cuda_half_test
fi
if [[ -x ./cuda_vectorized_test ]]; then
./cuda_vectorized_test
python test/run_test.py --cpp --verbose -i cpp/cuda_vectorized_test
fi
if [[ -x ./cuda_distributions_test ]]; then
./cuda_distributions_test
python test/run_test.py --cpp --verbose -i cpp/cuda_distributions_test
fi
if [[ -x ./cuda_optional_test ]]; then
./cuda_optional_test
python test/run_test.py --cpp --verbose -i cpp/cuda_optional_test
fi
if [[ -x ./cuda_tensor_interop_test ]]; then
./cuda_tensor_interop_test
python test/run_test.py --cpp --verbose -i cpp/cuda_tensor_interop_test
fi
if [[ -x ./cuda_complex_test ]]; then
./cuda_complex_test
python test/run_test.py --cpp --verbose -i cpp/cuda_complex_test
fi
if [[ -x ./cuda_complex_math_test ]]; then
./cuda_complex_math_test
python test/run_test.py --cpp --verbose -i cpp/cuda_complex_math_test
fi
if [[ -x ./cuda_cub_test ]]; then
./cuda_cub_test
python test/run_test.py --cpp --verbose -i cpp/cuda_cub_test
fi
if [[ -x ./cuda_atomic_ops_test ]]; then
./cuda_atomic_ops_test
python test/run_test.py --cpp --verbose -i cpp/cuda_atomic_ops_test
fi

if [ "$VALGRIND" == "ON" ]; then
valgrind --suppressions="$VALGRIND_SUP" --error-exitcode=1 ./basic --gtest_filter='-*CUDA'
# NB: As these tests are invoked by valgrind, let's leave them for now as it's
# unclear if valgrind -> python -> gtest would work
valgrind --suppressions="$VALGRIND_SUP" --error-exitcode=1 "${CPP_TESTS_DIR}/basic" --gtest_filter='-*CUDA'
if [[ -x ./tensor_interop_test ]]; then
valgrind --suppressions="$VALGRIND_SUP" --error-exitcode=1 ./tensor_interop_test
valgrind --suppressions="$VALGRIND_SUP" --error-exitcode=1 "${CPP_TESTS_DIR}/tensor_interop_test"
fi
fi

popd
