Replace CUDA 11.1 Linux CI with CUDA 11.2 #51905

janeyx99 · 2021-02-08T22:04:41Z

Adding 11.2 to CI with BUILD_SPLIT_CUDA enabled.

Disabled the following tests in test_optim.py because they were failing with CUDA 11.2:
test_adadelta
test_adam
test_adamw
test_multi_tensor_optimizers
test_rmsprop

(The issue tracking these failures is #51992.)

facebook-github-bot · 2021-02-09T01:20:05Z

💊 CI failures summary and remediations

As of commit ca64aa4 (more details on the Dr. CI page):

6/6 failures possibly* introduced in this PR
- 1/6 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

binary_windows_wheel_3_9_cu102_nightly_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

echo CUDA 10.2 installed failed.


C:\w\b>set "PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;C:\Program Files (x86)\Windows Application Driver;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Current\Bin;C:\Program Files (x86)\Microsoft Visual Studio\Installer\;C:\tools\ruby26;C:\tools\ruby26\bin;C:\ProgramData\nvm;C:\tools\miniconda3;C:\tools\miniconda3\Library\mingw-w64\bin;C:\tools\miniconda3\Library\usr\bin;C:\tools\miniconda3\Library\bin;C:\tools\miniconda3\Scripts;C:\miniconda3\miniconda3\condabin;C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\ProgramData\GooGet;C:\Program Files\Google\Compute Engine\metadata_scripts;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin;C:\Program Files\PowerShell\7\;C:\Program Files\Google\Compute Engine\sysprep;C:\Program Files\Docker;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin;C:\Program Files\Git LFS;C:\Program Files\Amazon\AWSCLI\bin\;C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code;C:\Program Files\Microsoft SDKs\Service Fabric\Tools\ServiceFabricLocalClusterManager;C:\Program Files (x86)\vim\vim80;C:\Go\bin;C:\Program Files\OpenJDK\jdk-12.0.2\bin;C:\ProgramData\nvm;C:\Program Files\nodejs;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;C:\Program Files (x86)\IncrediBuild;C:\Users\circleci\AppData\Local\Microsoft\WindowsApps" 

C:\w\b>set "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2" 

C:\w\b>set "CUDA_PATH_V10_2=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2" 

C:\w\b>set "NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt" 

C:\w\b>if not exist "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvcc.exe" (
echo CUDA 10.2 installed failed.  
 exit /b 1 
) 

C:\w\b>echo Installing cuDNN... 
Installing cuDNN...

C:\w\b>7z x C:\w\b\windows\internal\\..\temp_build\cudnn-10.2-windows10-x64-v7.6.5.32.zip -o"C:\w\b\windows\internal\\..\temp_build\cudnn" 

7-Zip 19.00 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2019-02-21

4 failures not recognized by patterns:

Job	Step	Action
binary_windows_wheel_3_9_cu112_nightly_build	Build	🔁 rerun
pytorch_linux_xenial_cuda11_2_cudnn8_py3_gcc7_test	Report results	🔁 rerun
binary_windows_wheel_3_9_cu101_nightly_build	Build	🔁 rerun
binary_windows_wheel_3_9_cpu_nightly_build	Build	🔁 rerun

This comment was automatically generated by Dr. CI.

walterddr

LGTM. The Windows CI failures seem unrelated?

walterddr · 2021-02-10T00:49:17Z

test/test_optim.py

                self.assertEqual(p1, p2)

-
+    @unittest.skipIf(True, "test does not pass for CUDA 11.2")


nit: Link the GitHub issue here, with comments to #51598?

janeyx99 · 2021-02-10

I added the link to the issue in the description of this PR. Interestingly, these same tests do now fail for Windows.
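One possible refinement of the unconditional `skipIf(True, ...)` in this thread is to gate the skip on the CUDA version the build reports. This is only a minimal sketch, not what the PR does: `cuda_version_tuple`, `SKIP_ON_CUDA_11_2`, and `TestOptimSketch` are hypothetical names, the test body is a placeholder, and the sketch assumes `torch.version.cuda` is a string such as "11.2" (or `None` on builds without CUDA).

```python
import unittest


def cuda_version_tuple(version):
    """Parse a CUDA version string such as "11.2" into (11, 2).

    Returns None when no version is available (for example a CPU-only build).
    """
    if version is None:
        return None
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


# Assumption: torch.version.cuda is a version string or None.
# If torch is not installed, fall back to "no CUDA" so the sketch still runs.
try:
    import torch
    CUDA_VERSION = cuda_version_tuple(torch.version.cuda)
except ImportError:
    CUDA_VERSION = None

# Hypothetical flag: True only when the build reports CUDA 11.2.
SKIP_ON_CUDA_11_2 = CUDA_VERSION == (11, 2)


class TestOptimSketch(unittest.TestCase):
    # Skip only on CUDA 11.2 and point at the tracking issue in the reason.
    @unittest.skipIf(SKIP_ON_CUDA_11_2, "test does not pass for CUDA 11.2, see #51992")
    def test_adam(self):
        # Placeholder body; the real test lives in test/test_optim.py.
        self.assertEqual(1 + 1, 2)
```

With a gate like this the affected tests would keep running on other CUDA versions and on CPU-only builds, and the skip reason would carry the issue number.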

facebook-github-bot

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

samestep · 2021-02-10T16:10:37Z

Blocked by #52054.

Summary: This fixes an issue (currently blocking #51905) where the test time regression reporting step will fail if none of the most recent `master` ancestors have any reports in S3 (e.g. if a new job is added).

Pull Request resolved: #52054

Test Plan:
```
python test/test_testing.py
```

Reviewed By: walterddr

Differential Revision: D26369507

Pulled By: samestep

fbshipit-source-id: 4c4e1e290cb943ce8fcdadacbf51d66b31c3262a
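The failure mode described in that summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #52054: the function names, the `reports_by_commit` mapping, and the message strings are all hypothetical, and the S3 lookup is replaced by a plain dictionary.

```python
from typing import Dict, List, Optional


def first_ancestor_with_reports(
    ancestors: List[str],
    reports_by_commit: Dict[str, List[dict]],
) -> Optional[str]:
    """Return the most recent ancestor that has at least one report, or None."""
    for sha in ancestors:  # assumed ordered from most recent to oldest
        if reports_by_commit.get(sha):
            return sha
    return None


def regression_summary(
    ancestors: List[str],
    reports_by_commit: Dict[str, List[dict]],
) -> str:
    """Describe the comparison baseline without failing when none exists."""
    base = first_ancestor_with_reports(ancestors, reports_by_commit)
    if base is None:
        # A newly added job has no history yet, so report that and move on.
        return "no ancestor reports found; skipping comparison"
    count = len(reports_by_commit[base])
    return f"comparing against {base} ({count} report(s))"
```

For a newly added job, `regression_summary(["abc", "def"], {})` returns the "skipping comparison" message rather than raising.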

facebook-github-bot · 2021-02-12T02:00:50Z

@janeyx99 merged this pull request in a1b8f3d.

Summary: Adding 11.2 to CI with BUILD_SPLIT_CUDA enabled.

Disabled the following tests as they were failing in test_optim.py:
test_adadelta
test_adam
test_adamw
test_multi_tensor_optimizers
test_rmsprop

(Issue tracking that is here: pytorch#51992)

Pull Request resolved: pytorch#51905

Reviewed By: VitalyFedyunin

Differential Revision: D26368575

Pulled By: janeyx99

fbshipit-source-id: 31612c7d04d51afb3f18956e43dc7f7db8a91749

janeyx99 mentioned this pull request Feb 8, 2021

Replace CUDA 11.1 Linux CI with CUDA 11.2 #51888

Closed

janeyx99 added the ci/all label Feb 8, 2021

janeyx99 force-pushed the ci-all/replace-linux-ci-11.1-with-11.2 branch from ac30156 to e633c90 Compare February 8, 2021 22:20

facebook-github-bot added the cla signed label Feb 8, 2021

janeyx99 added 2 commits February 9, 2021 10:06

Replace 11.1 Linux CI with CUDA 11.2

2d8eacc

skip failing tests in test_optim.py for 11.2

ca64aa4

janeyx99 force-pushed the ci-all/replace-linux-ci-11.1-with-11.2 branch from 3071d7a to ca64aa4 Compare February 9, 2021 18:07

janeyx99 requested a review from a team February 9, 2021 20:33

walterddr approved these changes Feb 10, 2021


facebook-github-bot reviewed Feb 10, 2021


samestep mentioned this pull request Feb 10, 2021

Fix test time history report if no ancestor report #52054

Closed

facebook-github-bot closed this in a1b8f3d Feb 10, 2021

facebook-github-bot added the Merged label Feb 12, 2021

janeyx99 mentioned this pull request Apr 5, 2021

Support CUDA 11.2 #50232

Closed

17 tasks

Flamefire mentioned this pull request Apr 28, 2021

Add patches for PyTorch 1.7.1 avoiding failures on POWER and A100 easybuilders/easybuild-easyconfigs#12753

Merged

github-actions bot deleted the ci-all/replace-linux-ci-11.1-with-11.2 branch February 10, 2024 01:56
