Skip to content

Conversation

janeyx99
Copy link
Contributor

@janeyx99 janeyx99 commented Feb 8, 2021

Adding 11.2 to CI with BUILD_SPLIT_CUDA enabled.

Disabled the following tests as they were failing in test_optim.py:
test_adadelta
test_adam
test_adamw
test_multi_tensor_optimizers
test_rmsprop

(Issue tracking that is here: #51992)

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Feb 9, 2021

💊 CI failures summary and remediations

As of commit ca64aa4 (more details on the Dr. CI page):


  • 6/6 failures possibly* introduced in this PR
    • 1/6 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build binary_windows_wheel_3_9_cu102_nightly_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

echo CUDA 10.2 installed failed.

C:\w\b>set "PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;C:\Program Files (x86)\Windows Application Driver;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Current\Bin;C:\Program Files (x86)\Microsoft Visual Studio\Installer\;C:\tools\ruby26;C:\tools\ruby26\bin;C:\ProgramData\nvm;C:\tools\miniconda3;C:\tools\miniconda3\Library\mingw-w64\bin;C:\tools\miniconda3\Library\usr\bin;C:\tools\miniconda3\Library\bin;C:\tools\miniconda3\Scripts;C:\miniconda3\miniconda3\condabin;C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\ProgramData\GooGet;C:\Program Files\Google\Compute Engine\metadata_scripts;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin;C:\Program Files\PowerShell\7\;C:\Program Files\Google\Compute Engine\sysprep;C:\Program Files\Docker;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin;C:\Program Files\Git LFS;C:\Program Files\Amazon\AWSCLI\bin\;C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code;C:\Program Files\Microsoft SDKs\Service Fabric\Tools\ServiceFabricLocalClusterManager;C:\Program Files (x86)\vim\vim80;C:\Go\bin;C:\Program Files\OpenJDK\jdk-12.0.2\bin;C:\ProgramData\nvm;C:\Program Files\nodejs;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;C:\Program Files (x86)\IncrediBuild;C:\Users\circleci\AppData\Local\Microsoft\WindowsApps" 

C:\w\b>set "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2" 

C:\w\b>set "CUDA_PATH_V10_2=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2" 

C:\w\b>set "NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt" 

C:\w\b>if not exist "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvcc.exe" (
echo CUDA 10.2 installed failed.  
 exit /b 1 
) 

C:\w\b>echo Installing cuDNN... 
Installing cuDNN...

C:\w\b>7z x C:\w\b\windows\internal\\..\temp_build\cudnn-10.2-windows10-x64-v7.6.5.32.zip -o"C:\w\b\windows\internal\\..\temp_build\cudnn" 

7-Zip 19.00 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2019-02-21


4 failures not recognized by patterns:

Job Step Action
CircleCI binary_windows_wheel_3_9_cu112_nightly_build Build 🔁 rerun
CircleCI pytorch_linux_xenial_cuda11_2_cudnn8_py3_gcc7_test Report results 🔁 rerun
CircleCI binary_windows_wheel_3_9_cu101_nightly_build Build 🔁 rerun
CircleCI binary_windows_wheel_3_9_cpu_nightly_build Build 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@janeyx99 janeyx99 force-pushed the ci-all/replace-linux-ci-11.1-with-11.2 branch from 3071d7a to ca64aa4 Compare February 9, 2021 18:07
@janeyx99 janeyx99 requested a review from a team February 9, 2021 20:33
Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. windows CI failures seems irrelevant?

self.assertEqual(p1, p2)


@unittest.skipIf(True, "test does not pass for CUDA 11.2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Link github issue here with comments to #51598?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the link to the issue in the description of this PR. Interestingly, these same tests do now fail for windows

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@samestep
Copy link
Contributor

Blocked by #52054.

facebook-github-bot pushed a commit that referenced this pull request Feb 10, 2021
Summary:
This fixes an issue (currently blocking #51905) where the test time regression reporting step will fail if none of the most recent `master` ancestors have any reports in S3 (e.g. if a new job is added).

Pull Request resolved: #52054

Test Plan:
```
python test/test_testing.py
```

Reviewed By: walterddr

Differential Revision: D26369507

Pulled By: samestep

fbshipit-source-id: 4c4e1e290cb943ce8fcdadacbf51d66b31c3262a
@facebook-github-bot
Copy link
Contributor

@janeyx99 merged this pull request in a1b8f3d.

xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary:
This fixes an issue (currently blocking pytorch#51905) where the test time regression reporting step will fail if none of the most recent `master` ancestors have any reports in S3 (e.g. if a new job is added).

Pull Request resolved: pytorch#52054

Test Plan:
```
python test/test_testing.py
```

Reviewed By: walterddr

Differential Revision: D26369507

Pulled By: samestep

fbshipit-source-id: 4c4e1e290cb943ce8fcdadacbf51d66b31c3262a
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary:
Adding 11.2 to CI with BUILD_SPLIT_CUDA enabled.

Disabled the following tests as they were failing in test_optim.py:
test_adadelta
test_adam
test_adamw
test_multi_tensor_optimizers
test_rmsprop

(Issue tracking that is here: pytorch#51992)

Pull Request resolved: pytorch#51905

Reviewed By: VitalyFedyunin

Differential Revision: D26368575

Pulled By: janeyx99

fbshipit-source-id: 31612c7d04d51afb3f18956e43dc7f7db8a91749
@janeyx99 janeyx99 mentioned this pull request Apr 5, 2021
17 tasks
@github-actions github-actions bot deleted the ci-all/replace-linux-ci-11.1-with-11.2 branch February 10, 2024 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants