@jeffdaily
Collaborator

@jeffdaily jeffdaily commented May 6, 2022

The env var HSA_FORCE_FINE_GRAIN_PCIE=1 enables P2P communication in RCCL without intermediate buffers. This is necessary on hosts with only PCIe and no P2P high-speed interconnect.
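
As a rough illustration of the description above: the variable has to be present in the environment of the process that initializes RCCL. The sketch below is a minimal, stdlib-only example and not the change made in this PR; the helper name `rocm_p2p_env` is hypothetical, and it assumes the workload is launched as a child process.

```python
import os


def rocm_p2p_env(base=None):
    """Return a copy of an environment with HSA_FORCE_FINE_GRAIN_PCIE set.

    `base` defaults to the current process environment. The helper name is
    hypothetical (not part of PyTorch or RCCL). An existing value is kept,
    so an explicit user override is not clobbered.
    """
    env = dict(os.environ if base is None else base)
    # Only the variable name and the value "1" come from the PR description.
    env.setdefault("HSA_FORCE_FINE_GRAIN_PCIE", "1")
    return env


if __name__ == "__main__":
    env = rocm_p2p_env()
    print("HSA_FORCE_FINE_GRAIN_PCIE =", env["HSA_FORCE_FINE_GRAIN_PCIE"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when starting a distributed job. Using `setdefault` is a design choice made here so that a value already exported by the user wins.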

@jeffdaily jeffdaily added the ciflow/trunk Trigger trunk jobs on your pull request label May 6, 2022
@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label May 6, 2022
@facebook-github-bot
Contributor

facebook-github-bot commented May 6, 2022


❌ 12 New Failures, 1 Base Failures

As of commit 0eae330 (more details on the Dr. CI page):

  • 12/13 failures introduced in this PR
  • 1/13 broken upstream at merge base 2c5bf12 on May 09 from 12:12pm to 5:12pm

🕵️ 12 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) (1/12)

Step: "Test"

2022-05-09T21:20:33.9070577Z 
2022-05-09T21:20:33.9070691Z FAILED (errors=2, skipped=219, expected failures=17)
2022-05-09T21:20:33.9070838Z 
2022-05-09T21:20:33.9070921Z Generating XML reports...
2022-05-09T21:20:34.3308265Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCPU-20220509211917.xml
2022-05-09T21:20:34.5871111Z Traceback (most recent call last):
2022-05-09T21:20:34.5871417Z   File "test/run_test.py", line 1072, in <module>
2022-05-09T21:20:34.5872835Z     main()
2022-05-09T21:20:34.5873108Z   File "test/run_test.py", line 1050, in main
2022-05-09T21:20:34.5875025Z     raise RuntimeError(err_message)
2022-05-09T21:20:34.5875298Z RuntimeError: test_meta failed!
2022-05-09T21:20:34.7909265Z 
2022-05-09T21:20:34.7909526Z real	34m3.816s
2022-05-09T21:20:34.7909817Z user	88m47.995s
2022-05-09T21:20:34.7910064Z sys	8m53.548s
2022-05-09T21:20:34.7910315Z + cleanup
2022-05-09T21:20:34.7910543Z + retcode=1
2022-05-09T21:20:34.7910799Z + set +x
2022-05-09T21:20:34.7943065Z ##[error]Process completed with exit code 1.
2022-05-09T21:20:34.8069501Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T21:20:34.8069756Z with:

See GitHub Actions build pull / linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (2/12)

Step: "Test"

2022-05-09T21:25:21.2952909Z 
2022-05-09T21:25:21.2953025Z FAILED (errors=2, skipped=219, expected failures=17)
2022-05-09T21:25:21.2953171Z 
2022-05-09T21:25:21.2953254Z Generating XML reports...
2022-05-09T21:25:21.7519956Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCPU-20220509212404.xml
2022-05-09T21:25:22.0739249Z Traceback (most recent call last):
2022-05-09T21:25:22.0739731Z   File "test/run_test.py", line 1072, in <module>
2022-05-09T21:25:22.0742331Z     main()
2022-05-09T21:25:22.0745099Z   File "test/run_test.py", line 1050, in main
2022-05-09T21:25:22.0745489Z     raise RuntimeError(err_message)
2022-05-09T21:25:22.0745858Z RuntimeError: test_meta failed!
2022-05-09T21:25:22.3303390Z + cleanup
2022-05-09T21:25:22.3303756Z + retcode=1
2022-05-09T21:25:22.3303931Z + set +x
2022-05-09T21:25:22.3337003Z ##[error]Process completed with exit code 1.
2022-05-09T21:25:22.3378885Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T21:25:22.3379120Z with:
2022-05-09T21:25:22.3379545Z   github-token: ***
2022-05-09T21:25:22.3379716Z env:
2022-05-09T21:25:22.3379857Z   IN_CI: 1
2022-05-09T21:25:22.3380015Z   IS_GHA: 1

See GitHub Actions build trunk / win-vs2019-cuda11.3-py3 / test (default, 3, 5, windows.8xlarge.nvidia.gpu) (3/12)

Step: "Test"

2022-05-09T23:49:40.9105176Z FAILED (errors=4, skipped=430, expected failures=34)
2022-05-09T23:49:40.9105374Z 
2022-05-09T23:49:40.9105485Z Generating XML reports...
2022-05-09T23:49:40.9105897Z Generated XML report: test-reports\python-unittest\test_meta\TEST-TestMetaCPU-20220509232631.xml
2022-05-09T23:49:40.9106467Z Generated XML report: test-reports\python-unittest\test_meta\TEST-TestMetaCUDA-20220509232631.xml
2022-05-09T23:49:42.9431896Z Traceback (most recent call last):
2022-05-09T23:49:42.9432387Z   File "run_test.py", line 1072, in <module>
2022-05-09T23:49:42.9433014Z     main()
2022-05-09T23:49:42.9433280Z   File "run_test.py", line 1050, in main
2022-05-09T23:49:42.9433601Z     raise RuntimeError(err_message)
2022-05-09T23:49:42.9434151Z RuntimeError: test_meta failed!
2022-05-09T23:49:43.4122956Z 
2022-05-09T23:49:43.4123963Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-05-09T23:49:43.4127286Z 
2022-05-09T23:49:43.4127828Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-05-09T23:49:43.4166823Z + cleanup
2022-05-09T23:49:43.4167263Z + retcode=1
2022-05-09T23:49:43.4167612Z + set +x
2022-05-09T23:49:43.4205106Z ##[error]Process completed with exit code 1.
2022-05-09T23:49:43.4798864Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T23:49:43.4799265Z with:

See GitHub Actions build pull / linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge) (4/12)

Step: "Test"

2022-05-09T21:20:17.9996346Z 
2022-05-09T21:20:17.9996458Z FAILED (errors=2, skipped=219, expected failures=17)
2022-05-09T21:20:17.9996603Z 
2022-05-09T21:20:17.9996686Z Generating XML reports...
2022-05-09T21:20:18.4531024Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCPU-20220509211900.xml
2022-05-09T21:20:18.7547311Z Traceback (most recent call last):
2022-05-09T21:20:18.7547564Z   File "test/run_test.py", line 1072, in <module>
2022-05-09T21:20:18.7548907Z     main()
2022-05-09T21:20:18.7549155Z   File "test/run_test.py", line 1050, in main
2022-05-09T21:20:18.7551002Z     raise RuntimeError(err_message)
2022-05-09T21:20:18.7551210Z RuntimeError: test_meta failed!
2022-05-09T21:20:18.9789129Z + cleanup
2022-05-09T21:20:18.9789479Z + retcode=1
2022-05-09T21:20:18.9789753Z + set +x
2022-05-09T21:20:18.9821543Z ##[error]Process completed with exit code 1.
2022-05-09T21:20:18.9878283Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T21:20:18.9878534Z with:
2022-05-09T21:20:18.9878967Z   github-token: ***
2022-05-09T21:20:18.9879124Z env:
2022-05-09T21:20:18.9879276Z   IN_CI: 1
2022-05-09T21:20:18.9879437Z   IS_GHA: 1

See GitHub Actions build trunk / linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (5/12)

Step: "Test"

2022-05-09T22:46:02.7338952Z 
2022-05-09T22:46:02.7339173Z FAILED (errors=2, skipped=210, expected failures=17)
2022-05-09T22:46:02.7339390Z 
2022-05-09T22:46:02.7339522Z Generating XML reports...
2022-05-09T22:46:03.2891187Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCUDA-20220509224355.xml
2022-05-09T22:46:04.0203061Z Traceback (most recent call last):
2022-05-09T22:46:04.0203607Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 1072, in <module>
2022-05-09T22:46:04.0204252Z     main()
2022-05-09T22:46:04.0204578Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 1050, in main
2022-05-09T22:46:04.0207273Z     raise RuntimeError(err_message)
2022-05-09T22:46:04.0207615Z RuntimeError: test_meta failed!
2022-05-09T22:46:04.6603016Z 
2022-05-09T22:46:04.6603796Z real	90m53.673s
2022-05-09T22:46:04.6604511Z user	87m44.551s
2022-05-09T22:46:04.6604987Z sys	4m49.703s
2022-05-09T22:46:04.6605239Z + cleanup
2022-05-09T22:46:04.6605484Z + retcode=1
2022-05-09T22:46:04.6605722Z + set +x
2022-05-09T22:46:04.6657415Z ##[error]Process completed with exit code 1.
2022-05-09T22:46:04.6711030Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T22:46:04.6711404Z with:

See GitHub Actions build trunk / win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (6/12)

Step: "Test"

2022-05-09T21:49:04.4559881Z 
2022-05-09T21:49:04.4560013Z FAILED (errors=2, skipped=223, expected failures=17)
2022-05-09T21:49:04.4560174Z 
2022-05-09T21:49:04.4560264Z Generating XML reports...
2022-05-09T21:49:04.4560659Z Generated XML report: test-reports\python-unittest\test_meta\TEST-TestMetaCPU-20220509214738.xml
2022-05-09T21:49:04.7997026Z Traceback (most recent call last):
2022-05-09T21:49:04.7997373Z   File "run_test.py", line 1072, in <module>
2022-05-09T21:49:04.7997567Z     main()
2022-05-09T21:49:04.7997785Z   File "run_test.py", line 1050, in main
2022-05-09T21:49:04.7998043Z     raise RuntimeError(err_message)
2022-05-09T21:49:04.7998258Z RuntimeError: test_meta failed!
2022-05-09T21:49:05.0568352Z 
2022-05-09T21:49:05.0569008Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-05-09T21:49:05.0570887Z 
2022-05-09T21:49:05.0571158Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-05-09T21:49:05.0598710Z + cleanup
2022-05-09T21:49:05.0598948Z + retcode=1
2022-05-09T21:49:05.0599285Z + set +x
2022-05-09T21:49:05.0629321Z ##[error]Process completed with exit code 1.
2022-05-09T21:49:05.0990058Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T21:49:05.0990367Z with:

See GitHub Actions build pull / linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu) (7/12)

Step: "Test"

2022-05-09T23:53:38.8624056Z 
2022-05-09T23:53:38.8624217Z FAILED (errors=2, skipped=260, expected failures=9)
2022-05-09T23:53:38.8624442Z 
2022-05-09T23:53:38.8624576Z Generating XML reports...
2022-05-09T23:53:39.4042780Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCUDA-20220509235113.xml
2022-05-09T23:53:42.9680752Z Traceback (most recent call last):
2022-05-09T23:53:42.9681579Z   File "test/run_test.py", line 1072, in <module>
2022-05-09T23:53:42.9687335Z     main()
2022-05-09T23:53:42.9688122Z   File "test/run_test.py", line 1050, in main
2022-05-09T23:53:42.9694695Z     raise RuntimeError(err_message)
2022-05-09T23:53:42.9695548Z RuntimeError: test_meta failed!
2022-05-09T23:53:45.0124619Z 
2022-05-09T23:53:45.0125802Z real	2m37.291s
2022-05-09T23:53:45.0126487Z user	2m32.031s
2022-05-09T23:53:45.0127046Z sys	0m18.499s
2022-05-09T23:53:45.0127606Z + cleanup
2022-05-09T23:53:45.0128140Z + retcode=1
2022-05-09T23:53:45.0128660Z + set +x
2022-05-09T23:53:45.0234602Z ##[error]Process completed with exit code 1.
2022-05-09T23:53:45.0301521Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct
2022-05-09T23:53:45.0302466Z # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct

See GitHub Actions build trunk / macos-11-py3-x86-64 / test (default, 1, 2, macos-11) (8/12)

Step: "Test"

2022-05-09T23:36:57.4096020Z 
2022-05-09T23:36:57.4097320Z FAILED (errors=2, skipped=219, expected failures=17)
2022-05-09T23:36:57.4097500Z 
2022-05-09T23:36:57.4097600Z Generating XML reports...
2022-05-09T23:36:57.4098490Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCPU-20220509233439.xml
2022-05-09T23:36:57.8136040Z Traceback (most recent call last):
2022-05-09T23:36:57.8136400Z   File "test/run_test.py", line 1072, in <module>
2022-05-09T23:36:57.8136630Z     main()
2022-05-09T23:36:57.8136840Z   File "test/run_test.py", line 1050, in main
2022-05-09T23:36:57.8137100Z     raise RuntimeError(err_message)
2022-05-09T23:36:57.8137340Z RuntimeError: test_meta failed!
2022-05-09T23:36:58.0521130Z + cleanup
2022-05-09T23:36:58.0534320Z + retcode=1
2022-05-09T23:36:58.0534810Z + set +x
2022-05-09T23:36:58.0555040Z ##[error]Process completed with exit code 1.
2022-05-09T23:36:58.0662160Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T23:36:58.0662480Z with:
2022-05-09T23:36:58.0663600Z   github-token: ***
2022-05-09T23:36:58.0663830Z env:
2022-05-09T23:36:58.0664000Z   IN_CI: 1
2022-05-09T23:36:58.0664180Z   IS_GHA: 1

See GitHub Actions build trunk / parallelnative-linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge) (9/12)

Step: "Test"

2022-05-09T21:21:00.0960224Z 
2022-05-09T21:21:00.0960337Z FAILED (errors=2, skipped=219, expected failures=17)
2022-05-09T21:21:00.0960486Z 
2022-05-09T21:21:00.0960578Z Generating XML reports...
2022-05-09T21:21:00.5658645Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCPU-20220509211940.xml
2022-05-09T21:21:00.9028573Z Traceback (most recent call last):
2022-05-09T21:21:00.9028854Z   File "test/run_test.py", line 1072, in <module>
2022-05-09T21:21:00.9030549Z     main()
2022-05-09T21:21:00.9030739Z   File "test/run_test.py", line 1050, in main
2022-05-09T21:21:00.9032537Z     raise RuntimeError(err_message)
2022-05-09T21:21:00.9032755Z RuntimeError: test_meta failed!
2022-05-09T21:21:01.1897762Z + cleanup
2022-05-09T21:21:01.1897993Z + retcode=1
2022-05-09T21:21:01.1898150Z + set +x
2022-05-09T21:21:01.1933773Z ##[error]Process completed with exit code 1.
2022-05-09T21:21:01.2102618Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T21:21:01.2102885Z with:
2022-05-09T21:21:01.2103322Z   github-token: ***
2022-05-09T21:21:01.2103493Z env:
2022-05-09T21:21:01.2103635Z   IN_CI: 1
2022-05-09T21:21:01.2103792Z   IS_GHA: 1

See GitHub Actions build trunk / ios-12-5-1-x86-64 / build (10/12)

Step: "Run Simulator Tests"

2022-05-09T20:56:21.6089130Z [20:56:21]: Exit status: 65
2022-05-09T20:56:23.7233030Z +--------------------+----+
2022-05-09T20:56:23.7233340Z |      Test Results       |
2022-05-09T20:56:23.7233660Z +--------------------+----+
2022-05-09T20:56:23.7233890Z | Number of tests    | 35 |
2022-05-09T20:56:23.7234250Z | Number of failures | 1  |
2022-05-09T20:56:23.7234550Z +--------------------+----+
2022-05-09T20:56:23.7234690Z 
2022-05-09T20:56:23.7234830Z 
2022-05-09T20:56:23.7235090Z [!] Tests have failed
2022-05-09T20:56:23.7477240Z ##[error]Process completed with exit code 1.
2022-05-09T20:56:23.7555170Z Post job cleanup.
2022-05-09T20:56:23.7604040Z Post job cleanup.
2022-05-09T20:56:23.8904550Z [command]/usr/local/bin/git version
2022-05-09T20:56:23.9311220Z git version 2.35.1
2022-05-09T20:56:23.9340870Z [command]/usr/local/bin/git config --local --name-only --get-regexp core\.sshCommand
2022-05-09T20:56:23.9438750Z [command]/usr/local/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2022-05-09T20:56:24.0391600Z Entering 'android/libs/fbjni'
2022-05-09T20:56:24.0578890Z Entering 'third_party/FP16'
2022-05-09T20:56:24.0774670Z Entering 'third_party/FXdiv'
2022-05-09T20:56:24.0944460Z Entering 'third_party/NNPACK'

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (11/12)

Step: "Test"

2022-05-09T21:50:18.7256217Z 
2022-05-09T21:50:18.7256336Z FAILED (errors=2, skipped=223, expected failures=17)
2022-05-09T21:50:18.7256486Z 
2022-05-09T21:50:18.7257961Z Generating XML reports...
2022-05-09T21:50:18.7258485Z Generated XML report: test-reports\python-unittest\test_meta\TEST-TestMetaCPU-20220509214851.xml
2022-05-09T21:50:19.1597043Z Traceback (most recent call last):
2022-05-09T21:50:19.1597377Z   File "run_test.py", line 1072, in <module>
2022-05-09T21:50:19.1597581Z     main()
2022-05-09T21:50:19.1597791Z   File "run_test.py", line 1050, in main
2022-05-09T21:50:19.1598032Z     raise RuntimeError(err_message)
2022-05-09T21:50:19.1598235Z RuntimeError: test_meta failed!
2022-05-09T21:50:19.4635256Z 
2022-05-09T21:50:19.4636048Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-05-09T21:50:19.4637727Z 
2022-05-09T21:50:19.4638110Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-05-09T21:50:19.4664446Z + cleanup
2022-05-09T21:50:19.4664652Z + retcode=1
2022-05-09T21:50:19.4664796Z + set +x
2022-05-09T21:50:19.4696655Z ##[error]Process completed with exit code 1.
2022-05-09T21:50:19.5106163Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T21:50:19.5106480Z with:

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (12/12)

Step: "Test"

2022-05-09T22:41:51.5878347Z 
2022-05-09T22:41:51.5878494Z FAILED (errors=2, skipped=207, expected failures=17)
2022-05-09T22:41:51.5878698Z 
2022-05-09T22:41:51.5878822Z Generating XML reports...
2022-05-09T22:41:52.1331138Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCUDA-20220509223936.xml
2022-05-09T22:41:52.7612494Z Traceback (most recent call last):
2022-05-09T22:41:52.7613221Z   File "test/run_test.py", line 1072, in <module>
2022-05-09T22:41:52.7616625Z     main()
2022-05-09T22:41:52.7617348Z   File "test/run_test.py", line 1050, in main
2022-05-09T22:41:52.7620108Z     raise RuntimeError(err_message)
2022-05-09T22:41:52.7620693Z RuntimeError: test_meta failed!
2022-05-09T22:41:53.3337969Z + cleanup
2022-05-09T22:41:53.3338413Z + retcode=1
2022-05-09T22:41:53.3338858Z + set +x
2022-05-09T22:41:53.3383756Z ##[error]Process completed with exit code 1.
2022-05-09T22:41:53.3429717Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-09T22:41:53.3430061Z with:
2022-05-09T22:41:53.3430602Z   github-token: ***
2022-05-09T22:41:53.3430846Z env:
2022-05-09T22:41:53.3431043Z   IN_CI: 1
2022-05-09T22:41:53.3431264Z   IS_GHA: 1

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@jeffdaily jeffdaily marked this pull request as ready for review May 9, 2022 15:16
@ezyang
Contributor

ezyang commented May 9, 2022

@pytorchbot rebase this

@ezyang
Contributor

ezyang commented May 9, 2022

@pytorchbot merge on green

@jeffdaily
Collaborator Author

@pytorchbot rebase this please

@jeffdaily
Collaborator Author

@pytorchbot merge this please

@github-actions
Contributor

Hey @jeffdaily.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request May 13, 2022
Summary:
The env var HSA_FORCE_FINE_GRAIN_PCIE=1 enables P2P communication in RCCL without intermediate buffers.  This is necessary on hosts with only PCIe and no P2P high-speed interconnect.

Pull Request resolved: #76985
Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/614e0459215c677bf686680454520dfaa2867359

Reviewed By: malfet

Differential Revision: D36299700

Pulled By: malfet

fbshipit-source-id: 6ba81808ba5f787370805c3f125e66fb0458a261

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), cla signed, Merged, module: rocm (AMD GPU support for Pytorch), open source

5 participants