Don't support legacy Python #2
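The usual fix for an issue like this is an import-time version guard that fails fast on legacy interpreters. A minimal sketch, assuming a hypothetical minimum of Python 3.8 (not necessarily the check PyTorch adopted):

```python
import sys

def require_python(minimum=(3, 8)):
    """Raise immediately on a legacy interpreter instead of failing
    later with obscure syntax or unicode errors."""
    if sys.version_info < minimum:
        raise RuntimeError(
            "Python >= %s required; found %s"
            % (".".join(map(str, minimum)), sys.version.split()[0])
        )

require_python()  # no-op on any supported interpreter
```

A guard like this normally lives at the top of `setup.py` or the package's `__init__.py` so the failure message is clear.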
Comments
colesbury referenced this issue in colesbury/pytorch on Apr 28, 2017
apaszke pushed a commit that referenced this issue on Apr 28, 2017
apaszke pushed a commit that referenced this issue on May 1, 2017
Jiaming-Liu pushed a commit to Jiaming-Liu/pytorch that referenced this issue on May 18, 2017
tfriedel pushed a commit to tfriedel/pytorch that referenced this issue on Aug 9, 2017
soumith pushed a commit that referenced this issue on Oct 5, 2017
Closed
williamwen42 added a commit that referenced this issue on Feb 5, 2024
…ard function is invalidated [attempt 2]"

Attempt #2 for #117875 to fix #112090. Summary of changes:

- ~~Changed CacheEntry linked list into a doubly-linked list structure to support deletion.~~ (done by C++ refactor)
- Added CacheEntry and ExtraState borrowed references to GuardFn so that GuardFn can tell ExtraState to delete CacheEntry when the GuardFn is invalidated.
- ~~Added ExtraState raw reference to CacheEntry so that we can get ExtraState to correctly point to the first CacheEntry if it gets deleted.~~ (done by C++ refactor)
- CacheEntry destructor needs to reset GuardFn refs to ExtraState/CacheEntry in order to prevent use-after-free.
- code_context values that are nn.GraphModules need to be weakrefs in order to prevent circular references.
- Added tests that check for memory leaks and cache deletion operations.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

[ghstack-poisoned]
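The weakref change above (storing `nn.GraphModule` values in `code_context` as weak references) can be illustrated outside PyTorch. A minimal sketch with hypothetical stand-in classes, not Dynamo's actual types, showing that a weak back-reference lets both objects be collected:

```python
import gc
import weakref

class Code:
    """Stand-in for per-code state whose code_context refers to a module."""
    def __init__(self, module):
        # A strong reference here would close a cycle (module -> code -> module);
        # a weakref leaves the module collectible.
        self.module_ref = weakref.ref(module)

class Module:
    """Stand-in for an nn.GraphModule that points back at its code state."""
    def __init__(self):
        self.code = Code(self)

m = Module()
finalized = []
weakref.finalize(m, finalized.append, True)
del m
gc.collect()
assert finalized == [True]  # both objects were freed despite the back-pointer
```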
williamwen42 added a commit that referenced this issue on Feb 5, 2024
williamwen42 added a commit that referenced this issue on Feb 6, 2024
williamwen42 added a commit that referenced this issue on Feb 6, 2024
jorgep31415 added a commit that referenced this issue on Feb 6, 2024
Pull Request resolved: #118835

We borrow MatMul's work to do the re-packing: https://www.internalfb.com/code/fbsource/[7e8ef1b8adeda224a736f8cc4bf870e0a659df95]/xplat/caffe2/aten/src/ATen/native/vulkan/ops/Mm.cpp?lines=20%2C50

# GLSL Change #1 - Reduce calls to `texelFetch(uKernel, ...)` by a factor of 4

In V2, this was the only change. We created an inner for-loop (which executes up to 4 times) and moved this call out of it:

```glsl
for (int k = k_start; k < k_end;) {
  const ivec3 w_pos = ivec3(k / 4, in_c % in_group_size, out_c);
  const vec4 weight = texelFetch(uKernel, w_pos, 0);
  for (int k_off = k % 4; k_off < 4 && k < k_end; ++k, ++k_off) {
    int in_pos_x = in_l + k * dilation;
    const ivec3 in_pos = ivec3(in_pos_x, in_c, n / 4);
    const vec4 input_value = texelFetch(uInput, in_pos, 0);
    v += weight[k_off] * input_value;
  }
}
```

On its own, however, this actually results in worse performance because of the complex for-loop conditions, especially `int k_off = k % 4`, which the compiler cannot unroll.

# GLSL Change #2 - Unroll loops around `texelFetch(uInput, ...)`

The `k_start` and `k_end` bounds "smartly" avoid computations that would contribute a sum of zero. However, these theoretical gains lead to physical branching that cannot be optimized away.

## W/o diff (690ms)

```
Kernel Name           Workgroup Size  Duration (ns)
===========           ==============  =============
vulkan.nchw_to_image  {30, 20, 2}     35984
vulkan.nchw_to_image  {32, 4, 3}      11128
vulkan.nchw_to_image  {10, 1, 1}      6292
vulkan.conv1d         {1, 10, 1}      669084
vulkan.image_to_nchw  {2, 10, 2}      7748
vulkan.nchw_to_image  {30, 20, 2}     31044
vulkan.nchw_to_image  {32, 4, 3}      10868
vulkan.nchw_to_image  {10, 1, 1}      6136
vulkan.conv1d         {1, 10, 1}      671216
vulkan.image_to_nchw  {2, 10, 2}      8164
vulkan.nchw_to_image  {30, 20, 2}     31148
vulkan.nchw_to_image  {32, 4, 3}      10920
vulkan.nchw_to_image  {10, 1, 1}      6084
vulkan.conv1d         {1, 10, 1}      674232
vulkan.image_to_nchw  {2, 10, 2}      8008
vulkan.nchw_to_image  {30, 20, 2}     31096
vulkan.nchw_to_image  {32, 4, 3}      11024
vulkan.nchw_to_image  {10, 1, 1}      6500
vulkan.conv1d         {1, 10, 1}      671736
vulkan.image_to_nchw  {2, 10, 2}      8164
vulkan.nchw_to_image  {30, 20, 2}     31824
vulkan.nchw_to_image  {32, 4, 3}      11284
vulkan.nchw_to_image  {10, 1, 1}      6604
vulkan.conv1d         {1, 10, 1}      691340
vulkan.image_to_nchw  {2, 10, 2}      7644

-------------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations
-------------------------------------------------------------------------------------------------
conv1d_op_benchmark/iterations:5/manual_time/threads:1     0.676 ms         35.0 ms            5
```

## W/ diff (330ms)

```
Kernel Name                              Workgroup Size  Duration (ns)
===========                              ==============  =============
vulkan.nchw_to_image                     {30, 20, 2}     35828
vulkan.nchw_to_image                     {32, 4, 3}      11024
vulkan.nchw_to_image                     {10, 1, 1}      6344
vulkan.convert_channels_to_width_packed  {8, 4, 10}      13208
vulkan.conv1d                            {1, 10, 1}      326664
vulkan.image_to_nchw                     {2, 10, 2}      8164
vulkan.nchw_to_image                     {30, 20, 2}     30940
vulkan.nchw_to_image                     {32, 4, 3}      10972
vulkan.nchw_to_image                     {10, 1, 1}      6188
vulkan.convert_channels_to_width_packed  {8, 4, 10}      12844
vulkan.conv1d                            {1, 10, 1}      326872
vulkan.image_to_nchw                     {2, 10, 2}      8112
vulkan.nchw_to_image                     {30, 20, 2}     31304
vulkan.nchw_to_image                     {32, 4, 3}      10972
vulkan.nchw_to_image                     {10, 1, 1}      6240
vulkan.convert_channels_to_width_packed  {8, 4, 10}      12584
vulkan.conv1d                            {1, 10, 1}      323492
vulkan.image_to_nchw                     {2, 10, 2}      7488
vulkan.nchw_to_image                     {30, 20, 2}     31772
vulkan.nchw_to_image                     {32, 4, 3}      10868
vulkan.nchw_to_image                     {10, 1, 1}      6396
vulkan.convert_channels_to_width_packed  {8, 4, 10}      13312
vulkan.conv1d                            {1, 10, 1}      332956
vulkan.image_to_nchw                     {2, 10, 2}      8216
vulkan.nchw_to_image                     {30, 20, 2}     31772
vulkan.nchw_to_image                     {32, 4, 3}      11024
vulkan.nchw_to_image                     {10, 1, 1}      6292
vulkan.convert_channels_to_width_packed  {8, 4, 10}      13104
vulkan.conv1d                            {1, 10, 1}      330408
vulkan.image_to_nchw                     {2, 10, 2}      7592

-------------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations
-------------------------------------------------------------------------------------------------
conv1d_op_benchmark/iterations:5/manual_time/threads:1     0.341 ms         41.0 ms            5
```

ghstack-source-id: 214201402
@exported-using-ghexport

Differential Revision: [D53204674](https://our.internmc.facebook.com/intern/diff/D53204674/)
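The factor-of-4 reduction in `texelFetch(uKernel, ...)` calls works because weights are packed four per texel, so the texel index is `k / 4` and the component is `k % 4`. A small Python sketch of that addressing, illustrative only and not the shader itself:

```python
# Weights "packed" four per texel: texel i holds flat indices 4*i .. 4*i+3.
flat_weights = list(range(23))  # arbitrary length, deliberately not a multiple of 4
texels = [flat_weights[i:i + 4] for i in range(0, len(flat_weights), 4)]

recovered = []
k = 0
k_end = len(flat_weights)
while k < k_end:
    texel = texels[k // 4]            # one fetch covers up to four weights
    k_off = k % 4
    while k_off < 4 and k < k_end:    # inner loop reuses the fetched texel
        recovered.append(texel[k_off])
        k += 1
        k_off += 1

assert recovered == flat_weights  # every weight read once, one fetch per group of 4
```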
williamwen42 added a commit that referenced this issue on Feb 6, 2024
williamwen42 added a commit that referenced this issue on Feb 6, 2024
williamwen42 added a commit that referenced this issue on Feb 6, 2024
williamwen42 added a commit that referenced this issue on Feb 6, 2024
williamwen42 added a commit that referenced this issue on Feb 7, 2024
williamwen42 added a commit that referenced this issue on Feb 7, 2024
pytorchmergebot pushed a commit that referenced this issue on Feb 7, 2024
… [attempt 2] (#119107)

Attempt #2 for #117875 to fix #112090. Summary of changes:

- ~~Changed CacheEntry linked list into a doubly-linked list structure to support deletion.~~ (done by C++ refactor)
- Added CacheEntry and ExtraState borrowed references to GuardFn so that GuardFn can tell ExtraState to delete CacheEntry when the GuardFn is invalidated.
- ~~Added ExtraState raw reference to CacheEntry so that we can get ExtraState to correctly point to the first CacheEntry if it gets deleted.~~ (done by C++ refactor)
- CacheEntry destructor needs to reset GuardFn refs to ExtraState/CacheEntry in order to prevent use-after-free.
- code_context values that are nn.GraphModules need to be weakrefs in order to prevent circular references.
- Added tests that check for memory leaks and cache deletion operations.

Pull Request resolved: #119107
Approved by: https://github.com/jansel
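The memory-leak tests mentioned above follow a standard pattern: keep only a weak reference to the object that should die, drop the strong references, collect, and assert the weakref is cleared. A generic sketch (hypothetical names, not the actual test code from this PR):

```python
import gc
import weakref

class CacheEntryLike:
    """Hypothetical stand-in for an object that must not leak."""

def assert_collected(make_obj):
    """Fail if the object produced by make_obj survives garbage collection."""
    obj = make_obj()
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    assert ref() is None, "object leaked: something still holds a strong reference"

assert_collected(CacheEntryLike)  # passes: nothing retains the instance
```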
jorgep31415 added a commit that referenced this issue on Feb 7, 2024
jorgep31415 added a commit that referenced this issue on Feb 7, 2024
pytorch-bot bot pushed a commit that referenced this issue on Feb 8, 2024
In a big code base, a user may not know which line of code called collectives. When debugging, we can print a Python/C++ stacktrace in case the user calls ``ProcessGroup.reduce`` instead of ``torch.distributed.reduce``:

```
LOG(INFO) << "ProcessGroupNCCL::_allgather_base stacktrace: " << get_python_cpp_trace();
```

Output (using `_allgather_base` as an example); one example Python-side frame is ``all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838``:

```
ProcessGroupNCCL::_allgather_base stacktrace:
#0 torch::unwind::unwind() from ??:0
#1 torch::CapturedTraceback::gather(bool, bool, bool) from ??:0
#2 c10d::get_python_cpp_trace[abi:cxx11]() from :0
#3 c10d::ProcessGroupNCCL::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from ??:0
#4 c10d::ops::(anonymous namespace)::_allgather_base_CUDA(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long) from Ops.cpp:0
#5 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > > (*)(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long), std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > >, c10::guts::typelist::typelist<at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from :0
#6 torch::autograd::basicAutogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
#7 c10d::ProcessGroup::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from :0
#8 pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}&&, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (*)(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from :0
#9 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
#10 cfunction_call from /usr/local/src/conda/python-3.10.12/Objects/methodobject.c:543
#11 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#13 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#14 all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838
#15 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#16 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#17 wrapper from /data/users/weif/pytorch/torch/distributed/c10d_logger.py:75
#18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#20 _all_gather_flat_param from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1399
#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#23 unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1308
#24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#26 _unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:332
#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#29 _pre_forward_unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:448
#30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#32 _pre_forward from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:413
#33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#35 forward from /data/users/weif/pytorch/torch/distributed/fsdp/fully_sharded_data_parallel.py:839
#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#37 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#38 _call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1520
#39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#40 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#41 _wrapped_call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1511
#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#43 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.12/Objects/call.c:431
#44 slot_tp_call from /usr/local/src/conda/python-3.10.12/Objects/typeobject.c:7494
#45 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#46 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#47 inner from /data/users/weif/pytorch/run_fsdp.py:72
#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#50 run from /data/users/weif/pytorch/run_fsdp.py:76
#51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#53 main from /data/users/weif/pytorch/run_fsdp.py:133
#54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#56 <module> from /data/users/weif/pytorch/run_fsdp.py:137
#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#58 PyEval_EvalCode from /usr/local/src/conda/python-3.10.12/Python/ceval.c:1134
#59 run_eval_code_obj from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1291
#60 run_mod from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1312
#61 pyrun_file from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1208
#62 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:456
#63 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:90
#64 pymain_run_file_obj from /usr/local/src/conda/python-3.10.12/Modules/main.c:357
#65 Py_BytesMain from /usr/local/src/conda/python-3.10.12/Modules/main.c:1090
#66 __libc_start_call_main from ??:0
#67 <unwind unsupported> from ??:0
```

Pull Request resolved: #118924
Approved by: https://github.com/kwen2501
pytorch-bot bot
pushed a commit
that referenced
this issue
Feb 8, 2024
… [attempt 2] (#119107)

Attempt #2 for #117875 to fix #112090.

Summary of changes:
- ~Changed CacheEntry linked list into a doubly-linked list structure to support deletion.~ (done by C++ refactor)
- Added CacheEntry and ExtraState borrowed references to GuardFn so that GuardFn can tell ExtraState to delete CacheEntry when the GuardFn is invalidated.
- ~Added ExtraState raw reference to CacheEntry so that we can get ExtraState to correctly point to the first CacheEntry if it gets deleted.~ (done by C++ refactor)
- CacheEntry destructor needs to reset GuardFn refs to ExtraState/CacheEntry in order to prevent use-after-free.
- code_context values that are nn.GraphModules need to be weakrefs in order to prevent circular references.
- Added tests that check for memory leaks and cache deletion operations.

Pull Request resolved: #119107
Approved by: https://github.com/jansel
vfdev-5
pushed a commit
to vfdev-5/pytorch
that referenced
this issue
Feb 9, 2024
… [attempt 2] (pytorch#119107)

Attempt pytorch#2 for pytorch#117875 to fix pytorch#112090.

Summary of changes:
- ~Changed CacheEntry linked list into a doubly-linked list structure to support deletion.~ (done by C++ refactor)
- Added CacheEntry and ExtraState borrowed references to GuardFn so that GuardFn can tell ExtraState to delete CacheEntry when the GuardFn is invalidated.
- ~Added ExtraState raw reference to CacheEntry so that we can get ExtraState to correctly point to the first CacheEntry if it gets deleted.~ (done by C++ refactor)
- CacheEntry destructor needs to reset GuardFn refs to ExtraState/CacheEntry in order to prevent use-after-free.
- code_context values that are nn.GraphModules need to be weakrefs in order to prevent circular references.
- Added tests that check for memory leaks and cache deletion operations.

Pull Request resolved: pytorch#119107
Approved by: https://github.com/jansel
clee2000
pushed a commit
that referenced
this issue
Feb 14, 2024
… [attempt 2] (#119107)

Attempt #2 for #117875 to fix #112090.

Summary of changes:
- ~Changed CacheEntry linked list into a doubly-linked list structure to support deletion.~ (done by C++ refactor)
- Added CacheEntry and ExtraState borrowed references to GuardFn so that GuardFn can tell ExtraState to delete CacheEntry when the GuardFn is invalidated.
- ~Added ExtraState raw reference to CacheEntry so that we can get ExtraState to correctly point to the first CacheEntry if it gets deleted.~ (done by C++ refactor)
- CacheEntry destructor needs to reset GuardFn refs to ExtraState/CacheEntry in order to prevent use-after-free.
- code_context values that are nn.GraphModules need to be weakrefs in order to prevent circular references.
- Added tests that check for memory leaks and cache deletion operations.

Pull Request resolved: #119107
Approved by: https://github.com/jansel
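The "tests that check for memory leaks" mentioned in the last bullet typically reduce to a standard weakref pattern; here is a minimal, generic sketch of that pattern (not the actual PyTorch test suite):

```python
import gc
import weakref


def assert_collected(make_obj):
    """Build an object, drop the last strong reference, and verify
    the garbage collector can actually reclaim it."""
    obj = make_obj()
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    assert ref() is None, "object leaked: something still holds a reference"


class Node:
    def __init__(self):
        self.self_ref = self  # deliberate cycle; gc can still break it


assert_collected(Node)
print("no leak detected")
```

A test written this way fails loudly when a new strong reference (such as the GuardFn back-references described above) accidentally keeps the object alive.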
atalman
added a commit
to atalman/pytorch
that referenced
this issue
Mar 13, 2024
atalman
added a commit
that referenced
this issue
Mar 13, 2024
* [Release only changes] Release only changes #2
* common+lint
guangy10
added a commit
that referenced
this issue
Mar 26, 2024
* [Release only changes] Release only changes #2
* common+lint

[ghstack-poisoned]
chsivic
pushed a commit
to chsivic/pytorch
that referenced
this issue
Apr 16, 2024
Summary: The caffe2/utils threadpool impl used to set the thread name, since D8266344 https://www.internalfb.com/code/fbsource/[3ba3d30d6841]/xplat/caffe2/caffe2/utils/threadpool/WorkersPool.h?lines=271-273. But now we no longer use caffe2's own impl (since D21232894?) and use the third-party threadpool instead, which doesn't set the thread name.

This diff achieves the same effect as D8266344, so that we can tell which threads are PyTorch threads from a perfetto trace. The idea comes from https://stackoverflow.com/questions/32375034/how-to-obtain-thread-name-in-android-ndk and folly ThreadName https://www.internalfb.com/code/fbsource/[3ba3d30d6841]/xplat/folly/system/ThreadName.cpp?lines=30-41. I'm not sure if this is the right place to put this change. BTW, the PyTorch thread pool caller thread is worker #0 https://www.internalfb.com/code/fbsource/[3ba3d30d6841281c140db1c8bd2f85ede310a01b]/xplat/third-party/pthreadpool/pthreadpool/src/pthreads.c?lines=289-292

Test Plan:

## Before

```
--num_cpu_threads 2 --num_pytorch_threads -1  # defaults to size equal to 4 cpu cores
mos:/ $ ps -T -p `pidof transcribe_bin`
USER  PID  TID  PPID VSZ    RSS   WCHAN      ADDR S CMD
shell 8985 8985 8983 118576 47688 hrtimer_n+ 0    S transcribe_bin   <-- main thread
shell 8985 8986 8983 118576 47688 0          0    R transcribe_bin   <-- pytorch thread #1
shell 8985 8987 8983 118576 47688 0          0    R transcribe_bin   <-- pytorch thread #2
shell 8985 8988 8983 118576 47688 0          0    R transcribe_bin   <-- pytorch thread #3
shell 8985 8989 8983 118576 47688 0          0    R CPUThreadPool0
shell 8985 8990 8983 118576 47688 futex_wai+ 0    S CPUThreadPool1
shell 8985 8991 8983 118576 47688 ep_poll    0    S IOThreadPool0
shell 8985 8992 8983 118576 47688 futex_wai+ 0    S FutureTimekeepr
shell 8985 8993 8983 118576 47688 pipe_wait  0    S snapshot_thread
shell 8985 8994 8983 118576 47688 hrtimer_n+ 0    S snapshot_thread
shell 8985 8997 8983 118576 47688 futex_wai+ 0    S AsyncDataQueue
```

## After

```
--num_cpu_threads 2 --num_pytorch_threads -1
mos:/ $ ps -T -p `pidof transcribe_bin`
USER  PID   TID   PPID  VSZ    RSS   WCHAN      ADDR S CMD
shell 11901 11901 11899 118128 40748 futex_wai+ 0    S transcribe_bin   <-- main thread serves as pytorch thread #0
shell 11901 11902 11899 118132 40748 futex_wai+ 0    S c10pthreadpool   <-- pytorch thread #1
shell 11901 11903 11899 118132 40748 futex_wai+ 0    S c10pthreadpool   <-- pytorch thread #2
shell 11901 11904 11899 118132 40748 futex_wai+ 0    S c10pthreadpool   <-- pytorch thread #3
shell 11901 11905 11899 118152 40752 futex_wai+ 0    S CPUThreadPool0
shell 11901 11906 11899 118148 40752 0          0    R CPUThreadPool1
shell 11901 11907 11899 118148 40756 ep_poll    0    S IOThreadPool0
shell 11901 11908 11899 118152 40756 futex_wai+ 0    S FutureTimekeepr
shell 11901 11909 11899 118164 40756 pipe_wait  0    S snapshot_thread
shell 11901 11910 11899 118168 40756 hrtimer_n+ 0    S snapshot_thread
shell 11901 11913 11899 118160 40760 futex_wai+ 0    S AsyncDataQueue
```

Example Perfetto trace: {F1483727859}

Looks like the pytorch thread pool was originally created with 4 threads during ASR loading (`loadTunaFactory`), and later recreated with 3 threads during inference.

Differential Revision: D55990584

Pulled By: chsivic
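The mechanism behind this diff, giving a worker thread a name that `ps -T` and perfetto can display, can be sketched from Python on Linux with a raw `prctl(PR_SET_NAME, ...)` call. This is only an illustration of the OS facility the commit relies on, not the actual C/C++ threadpool change; `PR_SET_NAME` is 15 per `<sys/prctl.h>`, and the kernel truncates names to 15 characters.

```python
import ctypes
import threading

PR_SET_NAME = 15  # from <sys/prctl.h>; Linux-only
libc = ctypes.CDLL("libc.so.6", use_errno=True)


def set_thread_name(name: str) -> None:
    # prctl(PR_SET_NAME) renames the *calling* thread; the kernel keeps
    # at most 15 bytes plus a NUL terminator.
    libc.prctl(PR_SET_NAME, name.encode()[:15], 0, 0, 0)


def read_thread_name() -> str:
    # /proc/self/task/<tid>/comm reflects what `ps -T` prints in CMD.
    tid = threading.get_native_id()
    with open(f"/proc/self/task/{tid}/comm") as f:
        return f.read().strip()


def worker():
    set_thread_name("c10pthreadpool")
    print(read_thread_name())


t = threading.Thread(target=worker)
t.start()
t.join()
```

Because `prctl` only affects the calling thread, a pool has each worker name itself at startup, which is exactly why the "After" listing above shows one `c10pthreadpool` entry per pool thread.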
There is really no reason to support Python 2. Python 3 has been out for 8 years now. There are plenty of good articles written about this. Maintaining a dual codebase is going to be a major pain, and it prevents you from using a whole bunch of new Python 3 features (`six` only gets you so far).
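To make the parenthetical concrete: `six` can shim renamed modules and metaclasses, but it cannot backport Python 3 *syntax*. Two examples of features a Python 2 compatible codebase has to give up (the function below is invented for illustration):

```python
# Keyword-only arguments: callers must spell out `strict=...`,
# preventing positional mix-ups. Python 2 has no syntax for this.
def load(path, *, strict=True):
    return (path, strict)

# Extended iterable unpacking (PEP 3132): also Python 3 only.
first, *rest = [1, 2, 3, 4]

print(load("model.pt", strict=False))  # ('model.pt', False)
print(first, rest)                     # 1 [2, 3, 4]
```

Both snippets are outright `SyntaxError`s under Python 2, so no compatibility library can paper over them.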