@arindamroy-eng
Contributor

@arindamroy-eng arindamroy-eng commented Jul 12, 2022

The following tests are being re-enabled for ROCm:

  • test_openmp.py
  • TestTensorExprPyBind tests in test_tensorexpr_pybind.py

@arindamroy-eng arindamroy-eng requested a review from a team as a code owner July 12, 2022 22:57
@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Jul 12, 2022
@facebook-github-bot
Contributor

facebook-github-bot commented Jul 12, 2022

✅ No Failures (0 Pending)

As of commit a801d90 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).


@arindamroy-eng arindamroy-eng changed the title RROCM re enable tensorexpr and test_openmp ROCM: Re enable tensorexpr and test_openmp Jul 13, 2022
@arindamroy-eng
Contributor Author

@pruthvistony needs merge

@pruthvistony pruthvistony added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 13, 2022
@pruthvistony
Collaborator

@arindamroy-eng ,
Check if the change is causing a problem for the libtorch_cpu.so build, due to llvm path changes.
What is the result from your local testing?

@arindamroy-eng
Contributor Author

find /opt/ -name libtorch_cpu.so

/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so

test_tensorexpr runs fine. The problem is actually with test_mobile_nnc, which is not a Python file: it cannot find libtorch_cpu.so because the library's directory is not in its default LD_LIBRARY_PATH.

echo $LD_LIBRARY_PATH

/opt/ompi/lib:/opt/rocm/lib:/usr/local/lib

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/lib/python3.7/site-packages/torch/lib/

/opt/conda/lib/python3.7/site-packages/torch/bin/test_mobile_nnc --gtest_output=xml:/test_mobile_nnc.xml
Note: Google Test filter = -_CUDA:*_MultiCUDA

[==========] Running 6 tests from 2 test suites.

[----------] Global test environment set-up.
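
A minimal sketch of the same workaround scoped to a single command, so the extra directory does not leak into the rest of the shell session. The helper name `run_with_extra_libs` is hypothetical and not part of test.sh; the library directory is the one found above.

```shell
# Hypothetical helper: run one command with an extra directory appended to
# LD_LIBRARY_PATH, leaving the caller's environment untouched.
run_with_extra_libs() {
  local lib_dir="$1"
  shift
  # ${VAR:+...} avoids a leading ':' (an empty entry, which the loader treats
  # as the current directory) when LD_LIBRARY_PATH is unset or empty.
  LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$lib_dir" "$@"
}
```

It could be called as `run_with_extra_libs /opt/conda/lib/python3.7/site-packages/torch/lib /opt/conda/lib/python3.7/site-packages/torch/bin/test_mobile_nnc`.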

@arindamroy-eng
Contributor Author

arindamroy-eng commented Jul 19, 2022

@priyaramani Can you help with what should be added to the aot_test in .jenkins/pytorch/test.sh?

In the linuxfocal build, the aot test passes, but it writes its result to /, so the next step fails:

  • /opt/conda/lib/python3.7/site-packages/torch/bin/test_mobile_nnc --gtest_output=xml:/test_mobile_nnc.xml
    ........
    [ FATAL ] /var/lib/jenkins/workspace/third_party/googletest/googletest/src/gtest.cc:190:: Unable to open file "/test_mobile_nnc.xml"
    .jenkins/pytorch/test.sh: line 329: 42736 Aborted (core dumped) LD_LIBRARY_PATH="$LD_LIBRARY_PATH":/opt/conda/lib/python3.7/site-packages/torch/lib/ "$TORCH_BIN_DIR"/test_mobile_nnc --gtest_output=xml:$TEST_REPORTS_DIR/test_mobile_nnc.xml
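
The path `/test_mobile_nnc.xml` in the log is what `$TEST_REPORTS_DIR/test_mobile_nnc.xml` expands to when `TEST_REPORTS_DIR` is empty or unset. That this is the cause here is an inference from the log, not something confirmed in the thread. Under that assumption, a guard along these lines would fail early with a clear message instead of trying to write to the filesystem root. The helper name `gtest_xml_arg` is hypothetical.

```shell
# Hypothetical helper: build the --gtest_output argument for a test binary,
# refusing to fall back to "/" when the reports directory is not configured.
gtest_xml_arg() {
  local name="$1"
  local dir="${TEST_REPORTS_DIR:-}"
  if [ -z "$dir" ]; then
    echo "TEST_REPORTS_DIR is not set" >&2
    return 1
  fi
  # Make sure the directory exists before googletest tries to write into it
  mkdir -p "$dir"
  printf '%s\n' "--gtest_output=xml:$dir/$name.xml"
}
```

A call site might then look like `"$TORCH_BIN_DIR"/test_mobile_nnc "$(gtest_xml_arg test_mobile_nnc)"`.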

@bdhirsh bdhirsh added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 20, 2022
@arindamroy-eng
Contributor Author

The failing test is in torch dynamo.
test_bce_with_logits_has_correct_forward_grad (__main__.TestNN)
  File "/opt/conda/lib/python3.7/site-packages/torch/fx/graph_module.py", line 267, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given

This is unrelated to the PR.
@bdhirsh

@arindamroy-eng
Contributor Author

@pruthvistony @jeffdaily @jithunnair-amd This can be merged now.

@jithunnair-amd
Collaborator

@malfet Who would be the right person to review this PR?

Collaborator

@jeffdaily jeffdaily left a comment

Please revert the following commits:​

Then please apply this patch:

diff --git a/.jenkins/pytorch/test.sh b/.jenkins/pytorch/test.sh
index b476d25250..432656c6af 100755
--- a/.jenkins/pytorch/test.sh
+++ b/.jenkins/pytorch/test.sh
@@ -322,6 +322,9 @@ test_libtorch() {

 test_aot_compilation() {
   echo "Testing Ahead of Time compilation"
+  ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
+  ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR"
+
   if [ -f "$TORCH_BIN_DIR"/test_mobile_nnc ]; then "$TORCH_BIN_DIR"/test_mobile_nnc --gtest_output=xml:$TEST_REPORTS_DIR/test_mobile_nnc.xml; fi
   # shellcheck source=test/mobile/nnc/test_aot_compile.sh
   if [ -f "$TORCH_BIN_DIR"/aot_model_compiler_test ]; then source test/mobile/nnc/test_aot_compile.sh; fi

This follows the precedent elsewhere in test.sh to symlink the libs prior to running cpp tests.

The reason it had failed for rocm was due to the order of tests in the test.sh file. Targets other than rocm were symlinking as part of their steps, and those other tests were executed prior to test_aot_compilation() but rocm skipped them and therefore skipped the symlinking. And since test_aot_compilation() didn't do its own symlinking, it would fail for rocm.
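
The two `ln -sf` lines in the patch can also be sketched as a small function that is safe to call from several test steps, regardless of which steps a given target skips. This is an illustration only: the name `symlink_torch_libs` is hypothetical and does not appear in test.sh.

```shell
# Hypothetical helper mirroring the two `ln -sf` lines in the patch: link every
# libc10* and libtorch* file from a lib directory into a bin directory.
symlink_torch_libs() {
  local lib_dir="$1"
  local bin_dir="$2"
  local lib
  for lib in "$lib_dir"/libc10* "$lib_dir"/libtorch*; do
    # An unmatched glob stays literal, so skip anything that does not exist
    [ -e "$lib" ] || continue
    # -f replaces an existing link, so repeated calls are harmless
    ln -sf "$lib" "$bin_dir"/
  done
}
```

Because the function is idempotent, each cpp test step could call `symlink_torch_libs "$TORCH_LIB_DIR" "$TORCH_BIN_DIR"` itself rather than relying on an earlier step having done so.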

@jeffdaily jeffdaily changed the title ROCM: Re enable tensorexpr and test_openmp [ROCm] re-enable tensorexpr and test_openmp Aug 29, 2022
@arindamroy-eng
Contributor Author

@pytorchmergebot please merge

@pytorch-bot

pytorch-bot bot commented Aug 31, 2022

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'please' (choose from 'merge', 'revert', 'rebase', 'label')

usage: @pytorchbot [-h] {merge,revert,rebase,label} ...

Try @pytorchbot --help for more info.

@malfet
Contributor

malfet commented Sep 13, 2022

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

By setting this path, TestTensorExprPyBind
tests in test_tensorexpr_pybind.py will be
enabled for ROCM

Signed-off-by: Arindam Roy <rarindam@gmail.com>
Signed-off-by: Arindam Roy <rarindam@gmail.com>
When LLVM is being enabled for ROCM, it
seems that the test_mobile_nnc does not
have the libtorch_cpu.so in its LD_LIBRARY_PATH.
Adding this path temporarily to run the binary.

Signed-off-by: Arindam Roy <rarindam@gmail.com>
This test is for mobile devices and
throws unrelated errors once LLVM is
enabled.
Hence disabling it for ROCm.

Signed-off-by: Arindam Roy <rarindam@gmail.com>
arindamroy-eng and others added 2 commits September 13, 2022 21:21
This follows the precedent elsewhere in test.sh
to symlink the libs prior to running cpp tests.

The reason it had failed for rocm was due to the
order of tests in the test.sh file. Targets other
than rocm were symlinking as part of their steps,
and those other tests were executed prior to
test_aot_compilation() but rocm skipped them
and therefore skipped the symlinking. And since
test_aot_compilation() didn't do its own symlinking,
it would fail for rocm.

Signed-off-by: Arindam Roy <rarindam@gmail.com>
@pytorchmergebot
Collaborator

Successfully rebased ROCM_ReEnable_Tensorexpr onto refs/remotes/origin/master, please pull locally before adding more changes (for example, via git checkout ROCM_ReEnable_Tensorexpr && git pull --rebase)

@pytorch-bot

pytorch-bot bot commented Sep 13, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81367

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 3 Pending

As of commit fcf09f1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@malfet
Contributor

malfet commented Sep 13, 2022

@pytorchbot merge -g

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions
Contributor

Hey @arindamroy-eng.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
The following tests are being re-enabled for ROCm:
- test_openmp.py
- TestTensorExprPyBind tests in test_tensorexpr_pybind.py
Pull Request resolved: #81367
Approved by: https://github.com/jeffdaily, https://github.com/malfet

Labels

ciflow/trunk Trigger trunk jobs on your pull request cla signed Merged module: rocm AMD GPU support for Pytorch open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module


9 participants