slow_conv3d grad_weight: call gemm directly #65759

peterbell10 · 2021-09-28T18:26:32Z

Stack from ghstack:

Differential Revision: D31257873

[ghstack-poisoned]

pytorch-probot · 2021-09-28T18:26:35Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/5785ca055af445164733c823de147f057093d822/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-bionic-py3.6-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/xla`	✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3.6-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`	✅ triggered
linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3.6-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/win`	✅ triggered
Skipped Workflows
libtorch-linux-xenial-cuda10.2-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`	🚫 skipped
linux-xenial-cuda10.2-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`	🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`	🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
puretorch-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`	🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot · 2021-09-28T18:26:38Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/65759
📄 Preview docs built from this PR
📄 Preview C++ docs built from this PR
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 5785ca0 (more details on the Dr. CI page):

1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

win-vs2019-cuda11.3-py3 / build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

2021-10-05T03:33:31.1048787Z FAILED: bin/ProcessGroupGlooAsyncTest.exe

2021-10-05T03:33:30.5815843Z 
2021-10-05T03:33:30.6243324Z [6162/6338] C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\bin\sccache-cl.exe   /TP -DIDEEP_USE_MKL -DMAGMA_V2 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -IC:\actions-runner\_work\pytorch\pytorch\build\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build -IC:\actions-runner\_work\pytorch\pytorch -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\benchmark\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\cudnn_frontend\include -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\contrib\aten -IC:\actions-runner\_work\pytorch\pytorch\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\..\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\..\aten\src\ATen -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api\include -IC:\actions-runner\_work\pytorch\pytorch\c10\.. -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\ideep\mkl-dnn\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\src\..\include -IC:\actions-runner\_work\pytorch\pytorch\c10\cuda\..\.. 
-IC:\actions-runner\_work\pytorch\pytorch\build\third_party\gloo -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\gloo -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googlemock\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googletest\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\protobuf\src -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\mkl\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\XNNPACK\include -IC:\actions-runner\_work\pytorch\pytorch\third_party -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\pybind11\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\include" -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\magma\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\include -I"C:\Program Files\NVIDIA Corporation\NvToolsExt\include" -IC:\actions-runner\_work\pytorch\pytorch\third_party\googletest\googletest\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\googletest\googletest /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/build/win_tmp/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD /EHsc /DNOMINMAX /wd4267 /wd4251 /wd4522 /wd4838 /wd4305 /wd4244 /wd4190 /wd4101 /wd4996 /wd4275 /bigobj -std:c++14 /showIncludes /Fotest_api\CMakeFiles\test_api.dir\tensor_options_cuda.cpp.obj 
/Fdtest_api\CMakeFiles\test_api.dir\ /FS -c C:\actions-runner\_work\pytorch\pytorch\test\cpp\api\tensor_options_cuda.cpp
2021-10-05T03:33:30.6256403Z Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29337 for x64
2021-10-05T03:33:30.6256975Z Copyright (C) Microsoft Corporation.  All rights reserved.
2021-10-05T03:33:30.6257319Z 
2021-10-05T03:33:30.7764539Z [6163/6338] C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\bin\sccache-cl.exe   /TP -DIDEEP_USE_MKL -DMAGMA_V2 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -IC:\actions-runner\_work\pytorch\pytorch\build\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build -IC:\actions-runner\_work\pytorch\pytorch -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\benchmark\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\cudnn_frontend\include -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\contrib\aten -IC:\actions-runner\_work\pytorch\pytorch\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\..\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\..\aten\src\ATen -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api\include -IC:\actions-runner\_work\pytorch\pytorch\c10\.. -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\ideep\mkl-dnn\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\src\..\include -IC:\actions-runner\_work\pytorch\pytorch\c10\cuda\..\.. 
-IC:\actions-runner\_work\pytorch\pytorch\build\third_party\gloo -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\gloo -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googlemock\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googletest\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\protobuf\src -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\mkl\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\XNNPACK\include -IC:\actions-runner\_work\pytorch\pytorch\third_party -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\pybind11\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\include" -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\magma\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\include -I"C:\Program Files\NVIDIA Corporation\NvToolsExt\include" -IC:\actions-runner\_work\pytorch\pytorch\third_party\googletest\googletest\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\googletest\googletest /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/build/win_tmp/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD /EHsc /DNOMINMAX /wd4267 /wd4251 /wd4522 /wd4838 /wd4305 /wd4244 /wd4190 /wd4101 /wd4996 /wd4275 /bigobj -std:c++14 /showIncludes /Fotest_api\CMakeFiles\test_api.dir\tensor_options.cpp.obj 
/Fdtest_api\CMakeFiles\test_api.dir\ /FS -c C:\actions-runner\_work\pytorch\pytorch\test\cpp\api\tensor_options.cpp
2021-10-05T03:33:30.7775465Z Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29337 for x64
2021-10-05T03:33:30.7776038Z Copyright (C) Microsoft Corporation.  All rights reserved.
2021-10-05T03:33:30.7776382Z 
2021-10-05T03:33:31.1043323Z [6164/6338] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=test_cpp_c10d\CMakeFiles\ProcessGroupGlooAsyncTest.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  test_cpp_c10d\CMakeFiles\ProcessGroupGlooAsyncTest.dir\ProcessGroupGlooAsyncTest.cpp.obj  /out:bin\ProcessGroupGlooAsyncTest.exe /implib:lib\ProcessGroupGlooAsyncTest.lib /pdb:bin\ProcessGroupGlooAsyncTest.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10d_cuda_test.lib  lib\gtest_main.lib  lib\torch_cuda.lib  lib\torch_cuda_cu.lib  lib\torch_cuda_cpp.lib  -INCLUDE:?warp_size@cuda@at@@YAHXZ  lib\c10_cuda.lib  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cudart_static.lib"  "C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64\nvToolsExt64_1.lib"  lib\torch_cpu.lib  lib\libprotobuf.lib  lib\c10.lib  win_tmp\mkl\lib\mkl_intel_lp64.lib  win_tmp\mkl\lib\mkl_intel_thread.lib  win_tmp\mkl\lib\mkl_core.lib  win_tmp\mkl\lib\libiomp5md.lib  lib\dnnl.lib  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cufft.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\curand.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cublas.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cudnn.lib"  -INCLUDE:?searchsorted_cuda@native@at@@YA?AVTensor@2@AEBV32@0_N1@Z  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2021-10-05T03:33:31.1048787Z FAILED: bin/ProcessGroupGlooAsyncTest.exe 
2021-10-05T03:33:31.1054315Z cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=test_cpp_c10d\CMakeFiles\ProcessGroupGlooAsyncTest.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  test_cpp_c10d\CMakeFiles\ProcessGroupGlooAsyncTest.dir\ProcessGroupGlooAsyncTest.cpp.obj  /out:bin\ProcessGroupGlooAsyncTest.exe /implib:lib\ProcessGroupGlooAsyncTest.lib /pdb:bin\ProcessGroupGlooAsyncTest.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10d_cuda_test.lib  lib\gtest_main.lib  lib\torch_cuda.lib  lib\torch_cuda_cu.lib  lib\torch_cuda_cpp.lib  -INCLUDE:?warp_size@cuda@at@@YAHXZ  lib\c10_cuda.lib  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cudart_static.lib"  "C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64\nvToolsExt64_1.lib"  lib\torch_cpu.lib  lib\libprotobuf.lib  lib\c10.lib  win_tmp\mkl\lib\mkl_intel_lp64.lib  win_tmp\mkl\lib\mkl_intel_thread.lib  win_tmp\mkl\lib\mkl_core.lib  win_tmp\mkl\lib\libiomp5md.lib  lib\dnnl.lib  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cufft.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\curand.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cublas.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cudnn.lib"  -INCLUDE:?searchsorted_cuda@native@at@@YA?AVTensor@2@AEBV32@0_N1@Z  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2021-10-05T03:33:31.1060536Z MT: command "C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe /nologo /manifest bin\ProcessGroupGlooAsyncTest.exe.manifest /outputresource:bin\ProcessGroupGlooAsyncTest.exe;#1" failed (exit code 0x1f) with the following output:
2021-10-05T03:33:31.1061611Z 
2021-10-05T03:33:31.1062348Z mt.exe : general error c101008d: Failed to write the updated manifest to the resource of file "bin\ProcessGroupGlooAsyncTest.exe". Access is denied.
2021-10-05T03:33:31.1063005Z 
2021-10-05T03:33:31.4112647Z [6165/6338] C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\bin\sccache-cl.exe   /TP -DIDEEP_USE_MKL -DMAGMA_V2 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -IC:\actions-runner\_work\pytorch\pytorch\build\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build -IC:\actions-runner\_work\pytorch\pytorch -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\benchmark\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\cudnn_frontend\include -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\contrib\aten -IC:\actions-runner\_work\pytorch\pytorch\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\..\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\..\aten\src\ATen -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api\include -IC:\actions-runner\_work\pytorch\pytorch\c10\.. -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\ideep\mkl-dnn\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\src\..\include -IC:\actions-runner\_work\pytorch\pytorch\c10\cuda\..\.. 
-IC:\actions-runner\_work\pytorch\pytorch\build\third_party\gloo -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\gloo -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googlemock\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googletest\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\protobuf\src -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\mkl\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\XNNPACK\include -IC:\actions-runner\_work\pytorch\pytorch\third_party -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\pybind11\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\include" -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\magma\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\include -I"C:\Program Files\NVIDIA Corporation\NvToolsExt\include" -IC:\actions-runner\_work\pytorch\pytorch\third_party\googletest\googletest\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\googletest\googletest /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/build/win_tmp/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD /EHsc /DNOMINMAX /wd4267 /wd4251 /wd4522 /wd4838 /wd4305 /wd4244 /wd4190 /wd4101 /wd4996 /wd4275 /bigobj -std:c++14 /showIncludes /Fotest_api\CMakeFiles\test_api.dir\tensor_indexing.cpp.obj 
/Fdtest_api\CMakeFiles\test_api.dir\ /FS -c C:\actions-runner\_work\pytorch\pytorch\test\cpp\api\tensor_indexing.cpp
2021-10-05T03:33:31.4123645Z Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29337 for x64
2021-10-05T03:33:31.4124220Z Copyright (C) Microsoft Corporation.  All rights reserved.
2021-10-05T03:33:31.4124565Z 
2021-10-05T03:33:31.7499039Z [6166/6338] C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\bin\sccache-cl.exe   /TP -DIDEEP_USE_MKL -DMAGMA_V2 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -IC:\actions-runner\_work\pytorch\pytorch\build\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build -IC:\actions-runner\_work\pytorch\pytorch -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\benchmark\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\cudnn_frontend\include -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\contrib\aten -IC:\actions-runner\_work\pytorch\pytorch\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\..\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\..\aten\src\ATen -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api\include -IC:\actions-runner\_work\pytorch\pytorch\c10\.. -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\ideep\mkl-dnn\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\src\..\include -IC:\actions-runner\_work\pytorch\pytorch\c10\cuda\..\.. 
-IC:\actions-runner\_work\pytorch\pytorch\build\third_party\gloo -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\gloo -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googlemock\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googletest\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\protobuf\src -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\mkl\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\XNNPACK\include -IC:\actions-runner\_work\pytorch\pytorch\third_party -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\pybind11\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\include" -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\magma\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\ideep\include -I"C:\Program Files\NVIDIA Corporation\NvToolsExt\include" -IC:\actions-runner\_work\pytorch\pytorch\third_party\googletest\googletest\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\googletest\googletest /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/build/win_tmp/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD /EHsc /DNOMINMAX /wd4267 /wd4251 /wd4522 /wd4838 /wd4305 /wd4244 /wd4190 /wd4101 /wd4996 /wd4275 /bigobj -std:c++14 /showIncludes /Fotest_api\CMakeFiles\test_api.dir\tensor.cpp.obj 
/Fdtest_api\CMakeFiles\test_api.dir\ /FS -c C:\actions-runner\_work\pytorch\pytorch\test\cpp\api\tensor.cpp

This comment was automatically generated by Dr. CI (expand for details).


ghstack-source-id: 008191d Pull Request resolved: #65759

[ghstack-poisoned]

ghstack-source-id: 9a96825 Pull Request resolved: #65759

ngimel · 2021-09-29T01:38:10Z

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Differential Revision: [D31257873](https://our.internmc.facebook.com/intern/diff/D31257873) [ghstack-poisoned]

ghstack-source-id: 4f0fa6e Pull Request resolved: #65759

ngimel · 2021-09-30T17:10:05Z

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Differential Revision: [D31257873](https://our.internmc.facebook.com/intern/diff/D31257873) [ghstack-poisoned]

ghstack-source-id: f5e9255 Pull Request resolved: #65759

Differential Revision: [D31257873](https://our.internmc.facebook.com/intern/diff/D31257873) [ghstack-poisoned]

ghstack-source-id: d29c103 Pull Request resolved: #65759

ngimel · 2021-10-04T22:23:28Z

aten/src/ATen/native/ConvolutionMM3d.cpp

+  const int64_t ldc = m;
+
+  for (int64_t group = 0; group < groups; ++group) {
+    at::native::cpublas::gemm(


Same here: we probably need a bmm path.
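To make the review comment concrete, here is a minimal standalone sketch of the pattern in the quoted hunk: one column-major GEMM call per group with `ldc = m`, shown next to a batched entry point that takes one pointer per group. This is not the PR's code. `naive_gemm`, `grouped_gemm_loop`, and `gemm_batched_sketch` are hypothetical names chosen for illustration, the float-only, no-transpose layout is an assumption, and the signature of `at::native::cpublas::gemm` is deliberately not reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-in for a BLAS-style GEMM (NOT the ATen API):
// C = alpha * A * B + beta * C, column-major, no transposes.
// A is m x k (leading dim lda), B is k x n (ldb), C is m x n (ldc).
void naive_gemm(int64_t m, int64_t n, int64_t k, float alpha,
                const float* a, int64_t lda,
                const float* b, int64_t ldb,
                float beta, float* c, int64_t ldc) {
  for (int64_t j = 0; j < n; ++j) {
    for (int64_t i = 0; i < m; ++i) {
      float acc = 0.0f;
      for (int64_t p = 0; p < k; ++p) {
        acc += a[i + p * lda] * b[p + j * ldb];
      }
      // With beta == 0 the old contents of C are ignored entirely.
      const float prev = (beta == 0.0f) ? 0.0f : beta * c[i + j * ldc];
      c[i + j * ldc] = alpha * acc + prev;
    }
  }
}

// Pattern from the quoted hunk: one GEMM call per group, with the
// groups stored back to back in densely packed buffers (assumption).
void grouped_gemm_loop(int64_t groups, int64_t m, int64_t n, int64_t k,
                       const float* a, const float* b, float* c) {
  const int64_t lda = m;
  const int64_t ldb = k;
  const int64_t ldc = m;
  for (int64_t group = 0; group < groups; ++group) {
    naive_gemm(m, n, k, 1.0f,
               a + group * m * k, lda,
               b + group * k * n, ldb,
               0.0f, c + group * m * n, ldc);
  }
}

// Hypothetical batched entry point: the caller passes one pointer per
// matrix, so a tuned backend could schedule the whole batch at once.
// This sketch simply loops, which keeps the results identical.
void gemm_batched_sketch(int64_t batch, int64_t m, int64_t n, int64_t k,
                         const float* const* a, const float* const* b,
                         float* const* c) {
  for (int64_t i = 0; i < batch; ++i) {
    naive_gemm(m, n, k, 1.0f, a[i], m, b[i], k, 0.0f, c[i], m);
  }
}
```

Both functions compute the same products. The difference is only in the interface: the batched form hands the full set of per-group matrices to a single call, which is the kind of path the comment asks for.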

Differential Revision: [D31257873](https://our.internmc.facebook.com/intern/diff/D31257873) [ghstack-poisoned]

ghstack-source-id: d3a2887 Pull Request resolved: #65759

ngimel · 2021-10-05T03:12:33Z

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

slow_conv3d grad_weight: call gemm directly

ebab266

[ghstack-poisoned]

pytorch-probot bot added the ciflow/default label Sep 28, 2021

peterbell10 mentioned this pull request Sep 28, 2021

slow_conv2d_forward: avoid calling dispatcher in parallel region #65724

Closed

This was referenced Sep 28, 2021

slow_conv2d grad_input: avoid dispatch in parallel region #65725

Closed

slow_conv2d grad_weight: call gemm directly #65726

Closed

This was referenced Sep 28, 2021

slow_conv3d: Avoid dispatch in parallel region #65737

Closed

slow_conv3d grad_input: Avoid dispatch in parallel region #65757

Closed

slow_conv3d: Use at::sum for grad_bias accumulation #65758

Closed

peterbell10 added a commit that referenced this pull request Sep 28, 2021

slow_conv3d grad_weight: call gemm directly

a55e345

ghstack-source-id: 008191d Pull Request resolved: #65759

pytorchbot added the open source label Sep 28, 2021

facebook-github-bot added the cla signed label Sep 28, 2021

Update on "slow_conv3d grad_weight: call gemm directly"

75eda3f

[ghstack-poisoned]

peterbell10 added a commit that referenced this pull request Sep 28, 2021

slow_conv3d grad_weight: call gemm directly

2040c85

ghstack-source-id: 9a96825 Pull Request resolved: #65759

Update on "slow_conv3d grad_weight: call gemm directly"

463dc7b

Differential Revision: [D31257873](https://our.internmc.facebook.com/intern/diff/D31257873) [ghstack-poisoned]

peterbell10 added a commit that referenced this pull request Sep 29, 2021

slow_conv3d grad_weight: call gemm directly

87ed3d0

ghstack-source-id: 4f0fa6e Pull Request resolved: #65759

Update on "slow_conv3d grad_weight: call gemm directly"

7114c81

Differential Revision: [D31257873](https://our.internmc.facebook.com/intern/diff/D31257873) [ghstack-poisoned]

peterbell10 added a commit that referenced this pull request Oct 2, 2021

slow_conv3d grad_weight: call gemm directly

3ba2c85

ghstack-source-id: f5e9255 Pull Request resolved: #65759

Update on "slow_conv3d grad_weight: call gemm directly"

3fe85eb

Differential Revision: [D31257873](https://our.internmc.facebook.com/intern/diff/D31257873) [ghstack-poisoned]

peterbell10 added a commit that referenced this pull request Oct 2, 2021

slow_conv3d grad_weight: call gemm directly

ffc3d09

ghstack-source-id: d29c103 Pull Request resolved: #65759

ngimel reviewed Oct 4, 2021


Update on "slow_conv3d grad_weight: call gemm directly"

42f26da

Differential Revision: [D31257873](https://our.internmc.facebook.com/intern/diff/D31257873) [ghstack-poisoned]

Update on "slow_conv3d grad_weight: call gemm directly"

5785ca0

Differential Revision: [D31257873](https://our.internmc.facebook.com/intern/diff/D31257873) [ghstack-poisoned]

peterbell10 added a commit that referenced this pull request Oct 5, 2021

slow_conv3d grad_weight: call gemm directly

5173ea1

ghstack-source-id: d3a2887 Pull Request resolved: #65759

peterbell10 mentioned this pull request Oct 5, 2021

Replace _baddbmm_mkl_ with cpublas::gemm_batched #66165

Closed

ngimel approved these changes Oct 7, 2021


facebook-github-bot closed this in 0020a15 Oct 8, 2021

facebook-github-bot deleted the gh/peterbell10/156/head branch October 12, 2021 14:18
