@dzdang dzdang commented Apr 17, 2022

Stack from ghstack (oldest at bottom):

Summary:
The previous implementation of broadcasted_bias in the quantized cudnn
linear op has 2 issues.

  1. broadcasted_bias is a view of the input bias tensor. This is not
    desired, as any modification to broadcasted_bias is also applied to the
    input bias. To remedy this, we clone the input bias tensor.

  2. Calling broadcast_to doesn't affect the storage, which is problematic
    for the cudnn operations. We need to create a fully broadcasted tensor,
    rather than a view (which is what's returned by broadcast_to). To remedy
    this, we call contiguous().
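
A minimal Python sketch of the two issues, using the public torch API rather than the C++ code this PR changes. The tensor values, the `(2, 3)` output shape, and all variable names are assumptions chosen for illustration:

```python
import torch

# Hypothetical stand-in for the op's bias argument; values and shapes are illustrative.
bias = torch.tensor([1.0, 2.0, 3.0])
output_shape = (2, 3)

# Issue 2: broadcast_to returns a view. No new storage is allocated, and the
# broadcast dimension gets stride 0, so the tensor is not dense in memory.
view = torch.broadcast_to(bias, output_shape)
view_shares_storage = view.data_ptr() == bias.data_ptr()
view_strides = view.stride()

# The remedy described above: clone() for independent storage, then
# contiguous() to materialize the broadcast into a dense (2, 3) buffer.
materialized = torch.broadcast_to(bias.clone(), output_shape).contiguous()
materialized_strides = materialized.stride()

# Issue 1: the view and the input alias the same memory, so a write to one is
# visible through the other. The materialized copy is unaffected.
bias[0] = 10.0
view_sees_write = view[1, 0].item() == 10.0
copy_sees_write = materialized[1, 0].item() == 10.0
```

In this small sketch `contiguous()` alone would already copy, because the broadcast view is not contiguous; the explicit `clone()` mirrors the description above and keeps the copy independent even when no broadcasting is needed.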

Test plan:
python test/test_quantization.py -k test_linear_cudnn

Differential Revision: D35717355

dzdang added a commit that referenced this pull request Apr 17, 2022
…ed_bias tensor in quantized cudnn linear op

ghstack-source-id: 22cabed
Pull Request resolved: #75944
facebook-github-bot commented Apr 17, 2022

💊 CI failures summary and remediations

As of commit 18948a3 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / win-vs2019-cuda11.3-py3 / build (1/1)

Step: "Build"

2022-04-18T13:42:57.6344824Z [4450/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_C++17_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_C++17_test.dir\util\C++17_test.cpp.obj  /out:bin\c10_C++17_test.exe /implib:lib\c10_C++17_test.lib /pdb:bin\c10_C++17_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:57.6498306Z [4451/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_Bitset_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_Bitset_test.dir\util\Bitset_test.cpp.obj  /out:bin\c10_Bitset_test.exe /implib:lib\c10_Bitset_test.lib /pdb:bin\c10_Bitset_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.1443942Z [4452/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_ConstexprCrc_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_ConstexprCrc_test.dir\util\ConstexprCrc_test.cpp.obj  /out:bin\c10_ConstexprCrc_test.exe /implib:lib\c10_ConstexprCrc_test.lib /pdb:bin\c10_ConstexprCrc_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.2170550Z [4453/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_LeftRight_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_LeftRight_test.dir\util\LeftRight_test.cpp.obj  /out:bin\c10_LeftRight_test.exe /implib:lib\c10_LeftRight_test.lib /pdb:bin\c10_LeftRight_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.2400366Z [4454/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_Half_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_Half_test.dir\util\Half_test.cpp.obj  /out:bin\c10_Half_test.exe /implib:lib\c10_Half_test.lib /pdb:bin\c10_Half_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.4170720Z [4455/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_Synchronized_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_Synchronized_test.dir\util\Synchronized_test.cpp.obj  /out:bin\c10_Synchronized_test.exe /implib:lib\c10_Synchronized_test.lib /pdb:bin\c10_Synchronized_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.4171855Z FAILED: bin/c10_Synchronized_test.exe 
2022-04-18T13:42:58.4173071Z cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_Synchronized_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_Synchronized_test.dir\util\Synchronized_test.cpp.obj  /out:bin\c10_Synchronized_test.exe /implib:lib\c10_Synchronized_test.lib /pdb:bin\c10_Synchronized_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.4174386Z MT: command "C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe /nologo /manifest bin\c10_Synchronized_test.exe.manifest /outputresource:bin\c10_Synchronized_test.exe;#1" failed (exit code 0x1f) with the following output:
2022-04-18T13:42:58.4174727Z 
2022-04-18T13:42:58.4175014Z mt.exe : general error c101008d: Failed to write the updated manifest to the resource of file "bin\c10_Synchronized_test.exe". Access is denied.
2022-04-18T13:42:58.4175334Z 
2022-04-18T13:42:58.4182019Z [4456/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_SmallVectorTest.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_SmallVectorTest.dir\util\SmallVectorTest.cpp.obj  /out:bin\c10_SmallVectorTest.exe /implib:lib\c10_SmallVectorTest.lib /pdb:bin\c10_SmallVectorTest.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.5618593Z [4457/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_Metaprogramming_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_Metaprogramming_test.dir\util\Metaprogramming_test.cpp.obj  /out:bin\c10_Metaprogramming_test.exe /implib:lib\c10_Metaprogramming_test.lib /pdb:bin\c10_Metaprogramming_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.5899190Z [4458/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_ThreadLocal_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_ThreadLocal_test.dir\util\ThreadLocal_test.cpp.obj  /out:bin\c10_ThreadLocal_test.exe /implib:lib\c10_ThreadLocal_test.lib /pdb:bin\c10_ThreadLocal_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.5912139Z [4459/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_TypeIndex_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_TypeIndex_test.dir\util\TypeIndex_test.cpp.obj  /out:bin\c10_TypeIndex_test.exe /implib:lib\c10_TypeIndex_test.lib /pdb:bin\c10_TypeIndex_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.7172929Z [4460/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_TypeList_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_TypeList_test.dir\util\TypeList_test.cpp.obj  /out:bin\c10_TypeList_test.exe /implib:lib\c10_TypeList_test.lib /pdb:bin\c10_TypeList_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.7305257Z [4461/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_TypeTraits_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_TypeTraits_test.dir\util\TypeTraits_test.cpp.obj  /out:bin\c10_TypeTraits_test.exe /implib:lib\c10_TypeTraits_test.lib /pdb:bin\c10_TypeTraits_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.7918556Z [4462/6284] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=c10\test\CMakeFiles\c10_accumulate_test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  c10\test\CMakeFiles\c10_accumulate_test.dir\util\accumulate_test.cpp.obj  /out:bin\c10_accumulate_test.exe /implib:lib\c10_accumulate_test.lib /pdb:bin\c10_accumulate_test.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /subsystem:console  lib\c10.lib  lib\gmock.lib  lib\gtest.lib  lib\gtest_main.lib  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2022-04-18T13:42:58.7919813Z ninja: build stopped: subcommand failed.
2022-04-18T13:42:58.8094348Z -- Building version 1.12.0a0+git18948a3

This comment was automatically generated by Dr. CI.

@dzdang dzdang added release notes: quantization release notes category topic: improvements topic category labels Apr 18, 2022
@dzdang dzdang requested a review from jerryzh168 April 18, 2022 13:27
dzdang commented Apr 18, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jerryzh168 jerryzh168 left a comment

broadcast_to seems to be the same as expand (https://pytorch.org/docs/stable/generated/torch.broadcast_to.html?highlight=broadcast_to#torch.broadcast_to). I feel expand might be slightly more popular than broadcast_to, so maybe we can use that.
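
A quick way to check the equivalence from Python, as a sketch only: the `(1, 3)` bias and `(4, 3)` target shape are illustrative assumptions, not shapes taken from the op.

```python
import torch

# Illustrative bias with a leading singleton dimension to broadcast over.
bias = torch.arange(3.0).reshape(1, 3)

a = torch.broadcast_to(bias, (4, 3))
b = bias.expand(4, 3)

# Both calls return views over the same storage with the same strides.
same_values = torch.equal(a, b)
same_layout = a.stride() == b.stride() and a.data_ptr() == b.data_ptr()
```

Either spelling returns a view, so the `contiguous()` call described in the summary is still needed afterwards.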

dzdang commented Apr 18, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
@pytorchbot merge this

(Initiating merge automatically since Phabricator Diff has merged)

facebook-github-bot pushed a commit that referenced this pull request Apr 18, 2022
…ed_bias tensor in quantized cudnn linear op (#75944)

Summary:
Pull Request resolved: #75944

(Note: this ignores all push blocking failures!)

Test Plan: python test/test_quantization.py -k test_linear_cudnn

Reviewed By: jerryzh168

Differential Revision: D35717355

Pulled By: dzdang

fbshipit-source-id: bc5e47666e4d0a8e1a544a094008520a290e5d25
malfet pushed a commit that referenced this pull request Apr 20, 2022
…ed_bias tensor in quantized cudnn linear op

Pull Request resolved: #75944

Approved by: https://github.com/jerryzh168

(cherry picked from commit 381e725)
@facebook-github-bot facebook-github-bot deleted the gh/dzdang/88/head branch April 22, 2022 14:17