Skip to content

Conversation

pruthvistony
Copy link
Collaborator

  • This change is required to handle the case when hipcc is
    updated to the latest using update-alternatives.
  • Update-alternatives support for few ROCm binaries is available
    from ROCm 4.1 onwards.
  • This change doesnt not affect any previous versions of ROCm.

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Apr 13, 2021

💊 CI failures summary and remediations

As of commit 652f662 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Apr 14 18:22:53 AssertionError: False is not true : Scalars failed to compare as equal! Comparing 1 and 6 gives a difference of 5, but the allowed difference with rtol=0 and atol=0 is only 0!
Apr 14 18:22:53   test_simple_backward_same_input (__main__.TestMultithreadAutograd) ... ok (0.010s)
Apr 14 18:22:53 
Apr 14 18:22:53 ======================================================================
Apr 14 18:22:53 FAIL [0.003s]: test_record_function (__main__.TestAutograd)
Apr 14 18:22:53 ----------------------------------------------------------------------
Apr 14 18:22:53 Traceback (most recent call last):
Apr 14 18:22:53   File "test_autograd.py", line 3300, in test_record_function
Apr 14 18:22:53     self.assertEqual(idx, len(important_events))
Apr 14 18:22:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1371, in assertEqual
Apr 14 18:22:53     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Apr 14 18:22:53 AssertionError: False is not true : Scalars failed to compare as equal! Comparing 1 and 6 gives a difference of 5, but the allowed difference with rtol=0 and atol=0 is only 0!
Apr 14 18:22:53 
Apr 14 18:22:53 ----------------------------------------------------------------------
Apr 14 18:22:53 Ran 1105 tests in 660.612s
Apr 14 18:22:53 
Apr 14 18:22:53 FAILED (failures=1, skipped=5, expected failures=1)
Apr 14 18:22:53 
Apr 14 18:22:53 Generating XML reports...
Apr 14 18:22:53 Generated XML report: test-reports/python-unittest/test_autograd/TEST-TestAutograd-20210414181152.xml
Apr 14 18:22:53 Generated XML report: test-reports/python-unittest/test_autograd/TEST-TestAutogradComplex-20210414181152.xml
Apr 14 18:22:53 Generated XML report: test-reports/python-unittest/test_autograd/TEST-TestAutogradDeviceTypeCUDA-20210414181152.xml

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@github-actions github-actions bot added the module: rocm AMD GPU support for Pytorch label Apr 13, 2021
@pruthvistony
Copy link
Collaborator Author

@jeffdaily @jithunnair-amd @sunway513
Please review this change.

@pruthvistony pruthvistony force-pushed the rocm_41_update_alt_hipcc branch from 319f4d3 to 8f64472 Compare April 13, 2021 22:54
@pruthvistony pruthvistony force-pushed the rocm_41_update_alt_hipcc branch from 8f64472 to 22d09b2 Compare April 13, 2021 23:11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: need a space after comma

@jbschlosser jbschlosser added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Apr 14, 2021
@ezyang
Copy link
Contributor

ezyang commented Apr 14, 2021

I'm not the ROCm expert but I don't like this patch. You can't assume just because hipcc is in usr that you're on a linux distribution with update-alternatives.

Conventionally, if there is a directory associated with a compiler, there is some command line flag you can pass to the compiler to get the paths. For example, you can do gcc -print-search-dirs to get the install dir. Does hipcc have something similar?

@codecov
Copy link

codecov bot commented Apr 14, 2021

Codecov Report

Merging #55968 (22d09b2) into master (bbdb37b) will decrease coverage by 0.35%.
The diff coverage is 0.00%.

❗ Current head 22d09b2 differs from pull request most recent head 652f662. Consider uploading reports for the commit 652f662 to get more accurate results

@@            Coverage Diff             @@
##           master   #55968      +/-   ##
==========================================
- Coverage   77.14%   76.78%   -0.36%     
==========================================
  Files        1909     1909              
  Lines      189104   189110       +6     
==========================================
- Hits       145878   145215     -663     
- Misses      43226    43895     +669     

@jeffdaily
Copy link
Collaborator

I'm not the ROCm expert but I don't like this patch. You can't assume just because hipcc is in usr that you're on a linux distribution with update-alternatives.

We discussed this and decided to use readlink -f to resolve ROCM_HOME. Updates from @pruthvistony to follow.

- This change is required to handle the case when hipcc is
  updated to the latest using update-alternatives.
- This change doesnt not affect any previous versions of ROCm.
@pruthvistony pruthvistony force-pushed the rocm_41_update_alt_hipcc branch from 22d09b2 to 652f662 Compare April 14, 2021 16:56
@pruthvistony
Copy link
Collaborator Author

I'm not the ROCm expert but I don't like this patch. You can't assume just because hipcc is in usr that you're on a linux distribution with update-alternatives.

We discussed this and decided to use readlink -f to resolve ROCM_HOME. Updates from @pruthvistony to follow.

As discussed I have updated the patch.
Please review.

@pruthvistony
Copy link
Collaborator Author

I'm not the ROCm expert but I don't like this patch. You can't assume just because hipcc is in usr that you're on a linux distribution with update-alternatives.

Conventionally, if there is a directory associated with a compiler, there is some command line flag you can pass to the compiler to get the paths. For example, you can do gcc -print-search-dirs to get the install dir. Does hipcc have something similar?

Currently in hipcc we dont have any options to get the installed dirs. I have updated the commit to NOT check the 'usr' directory and use readlink to get the final binary path, this works also when update-alternatives is used.

@pruthvistony pruthvistony requested a review from ezyang April 14, 2021 22:50
Copy link
Collaborator

@jeffdaily jeffdaily left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI passes.

@pruthvistony
Copy link
Collaborator Author

The CI failure, doesnt seems to be related to this change

test_record_function - TestAutograd
test_autograd.py


Traceback (most recent call last):
  File "test_autograd.py", line 3300, in test_record_function
    self.assertEqual(idx, len(important_events))
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1371, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Scalars failed to compare as equal! Comparing 1 and 6 gives a difference of 5, but the allowed difference with rtol=0 and atol=0 is only 0!

@facebook-github-bot
Copy link
Contributor

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Collaborator

@jithunnair-amd jithunnair-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in b383b63.

krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
- This change is required to handle the case when hipcc is
  updated to the latest using update-alternatives.
- Update-alternatives support for few ROCm binaries is available
  from ROCm 4.1 onwards.
- This change doesnt not affect any previous versions of ROCm.

Pull Request resolved: pytorch#55968

Reviewed By: mruberry

Differential Revision: D27785123

Pulled By: ezyang

fbshipit-source-id: 8467e468d8d51277fab9b0c8cbd57e80bbcfc7f7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged module: rocm AMD GPU support for Pytorch open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants