[ROCm] re-enable tensorexpr and test_openmp #81367
🔗 Helpful links
✅ No Failures (0 Pending) as of commit a801d90 (more details on the Dr. CI page).
💚 Looks good so far! There are no failures yet. 💚
This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group. |
|
@pruthvistony needs merge |
|
@arindamroy-eng , |
|
find /opt/ -name libtorch_cpu.so
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
test_tensorexpr runs fine. It is actually test_mobile_nnc, which is not a Python file, that has a problem finding libtorch_cpu.so, because the library is not in its default LD path.
echo $LD_LIBRARY_PATH
/opt/ompi/lib:/opt/rocm/lib:/usr/local/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/lib/python3.7/site-packages/torch/lib/
/opt/conda/lib/python3.7/site-packages/torch/bin/test_mobile_nnc --gtest_output=xml:/test_mobile_nnc.xml
[==========] Running 6 tests from 2 test suites.
[----------] Global test environment set-up. |
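The workaround above appends the torch lib directory to LD_LIBRARY_PATH by hand. A minimal sketch of the same step, written so it does not add the directory twice when run repeatedly, could look like this. The helper name `add_to_ld_path` is hypothetical and is not part of test.sh; the paths are the ones quoted in the comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from test.sh): append a directory to
# LD_LIBRARY_PATH only if it is not already listed.
add_to_ld_path() {
  local dir="${1%/}"   # drop a trailing slash so comparisons are consistent
  case ":${LD_LIBRARY_PATH:-}:" in
    *":${dir}:"*)
      ;;               # already present, nothing to do
    *)
      export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${dir}"
      ;;
  esac
}

# Starting value as reported in the comment above.
LD_LIBRARY_PATH=/opt/ompi/lib:/opt/rocm/lib:/usr/local/lib

# Calling the helper twice still leaves a single entry.
add_to_ld_path /opt/conda/lib/python3.7/site-packages/torch/lib/
add_to_ld_path /opt/conda/lib/python3.7/site-packages/torch/lib/

echo "$LD_LIBRARY_PATH"
```

This only changes the environment of the current shell and its children, which matches the "temporary" intent of the workaround.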
|
@priyaramani Can you help with what should be added to the aot test in .jenkins/pytorch/test.sh? In the linuxfocal build, the aot test passes, but it writes its result to /, so the next step fails:
|
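One possible reading of the symptom described above, offered as an assumption rather than a diagnosis: if the variable that holds the report directory is empty when the `--gtest_output` argument is built, the path collapses to the filesystem root. The sketch below shows only the shell expansion involved; the fallback to a temporary directory and the name `REPORTS_DIR` are hypothetical and not taken from test.sh.

```shell
#!/usr/bin/env bash
# Assumption for illustration: the report directory variable is unset.
unset TEST_REPORTS_DIR

# With an empty variable the argument points at the filesystem root.
unguarded="xml:${TEST_REPORTS_DIR:-}/test_mobile_nnc.xml"
echo "unguarded: $unguarded"

# Hypothetical guard: fall back to a fresh temporary directory
# when the variable is unset or empty.
REPORTS_DIR="${TEST_REPORTS_DIR:-$(mktemp -d)}"
guarded="xml:${REPORTS_DIR}/test_mobile_nnc.xml"
echo "guarded:   $guarded"
```

A stricter alternative is `${TEST_REPORTS_DIR:?}`, which aborts the script with an error instead of silently writing somewhere unexpected.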
|
The failing test is in torch dynamo. This is unrelated to the PR. |
|
@pruthvistony @jeffdaily @jithunnair-amd This can be merged now. |
|
@malfet Who would be the right person to review this PR? |
Please revert the following commits:
Then please apply this patch:
diff --git a/.jenkins/pytorch/test.sh b/.jenkins/pytorch/test.sh
index b476d25250..432656c6af 100755
--- a/.jenkins/pytorch/test.sh
+++ b/.jenkins/pytorch/test.sh
@@ -322,6 +322,9 @@ test_libtorch() {
test_aot_compilation() {
echo "Testing Ahead of Time compilation"
+ ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
+ ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR"
+
if [ -f "$TORCH_BIN_DIR"/test_mobile_nnc ]; then "$TORCH_BIN_DIR"/test_mobile_nnc --gtest_output=xml:$TEST_REPORTS_DIR/test_mobile_nnc.xml; fi
# shellcheck source=test/mobile/nnc/test_aot_compile.sh
if [ -f "$TORCH_BIN_DIR"/aot_model_compiler_test ]; then source test/mobile/nnc/test_aot_compile.sh; fi
This follows the precedent elsewhere in test.sh to symlink the libs prior to running cpp tests.
It failed for ROCm because of the order of tests in test.sh. Targets other than ROCm symlink the libs as part of their own test steps, and those steps run before test_aot_compilation(). ROCm skips those steps and therefore skipped the symlinking. Since test_aot_compilation() did not do its own symlinking, it failed for ROCm.
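The symlink step can be tried in isolation. The following is a minimal sketch that uses throwaway directories and empty placeholder files in place of a real PyTorch install; only the two `ln -sf` lines come from the patch, and everything else (the temporary directory, the placeholder file names) is an assumption for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Stand-in directories; in test.sh these variables point at the real
# torch lib and bin directories.
WORK_DIR="$(mktemp -d)"
TORCH_LIB_DIR="$WORK_DIR/lib"
TORCH_BIN_DIR="$WORK_DIR/bin"
mkdir -p "$TORCH_LIB_DIR" "$TORCH_BIN_DIR"

# Empty placeholder files standing in for the shared libraries.
touch "$TORCH_LIB_DIR/libc10.so" \
      "$TORCH_LIB_DIR/libtorch.so" \
      "$TORCH_LIB_DIR/libtorch_cpu.so"

# The two lines from the patch: link every matching library into the
# directory that holds the test binaries.
ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR"

# -f replaces existing links, so repeating the step is harmless.
ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR"

ls -l "$TORCH_BIN_DIR"
```

Because `-f` makes the step repeatable, each test function can do its own symlinking without depending on which other test functions ran first, which is the ordering problem described above.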
|
@pytorchmergebot please merge |
|
❌ 🤖 pytorchbot command failed: Try |
|
@pytorchbot rebase |
|
@pytorchbot successfully started a rebase job. Check the current status here |
By setting this path, TestTensorExprPyBind tests in test_tensorexpr_pybind.py will be enabled for ROCM Signed-off-by: Arindam Roy <rarindam@gmail.com>
Signed-off-by: Arindam Roy <rarindam@gmail.com>
When LLVM is enabled for ROCM, it seems that test_mobile_nnc does not have the directory containing libtorch_cpu.so in its LD_LIBRARY_PATH. Adding this path temporarily to run the binary. Signed-off-by: Arindam Roy <rarindam@gmail.com>
This test is for mobile devices and throws unrelated errors once LLVM is enabled. Hence disabling it for ROCM. Signed-off-by: Arindam Roy <rarindam@gmail.com>
This reverts commit 60e170a.
This reverts commit a4947e5.
This follows the precedent elsewhere in test.sh to symlink the libs prior to running cpp tests. The reason it had failed for rocm was due to the order of tests in the test.sh file. Targets other than rocm were symlinking as part of their steps, and those other tests were executed prior to test_aot_compilation() but rocm skipped them and therefore skipped the symlinking. And since test_aot_compilation() didn't do its own symlinking, it would fail for rocm. Signed-off-by: Arindam Roy <rarindam@gmail.com>
|
Successfully rebased |
a801d90 to fcf09f1
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81367
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures, 3 Pending as of commit fcf09f1.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@pytorchbot merge -g |
|
@pytorchbot successfully started a merge job. Check the current status here. |
|
Hey @arindamroy-eng. |
The following tests are being re-enabled for ROCm:
- test_openmp.py
- TestTensorExprPyBind tests in test_tensorexpr_pybind.py
Pull Request resolved: #81367
Approved by: https://github.com/jeffdaily, https://github.com/malfet