PEP8 #3
Comments
If that's REALLY the convention, I'm okay with it. Though it looks uglier ;-)
I think it's best to stick with PEP8. However, this will also require changing
@apaszke, if you use Atom you get Facebook's linter which seems to be PEP8 compatible. One thing I noticed is that this apparently isn't accepted style in Python:
Instead the preferred style is:
or
We should probably set up pylint and make it run on each pull request. This way we can slowly make ourselves more PEP8 compliant, and eventually we'll just go over the whole codebase and fix all errors. I'm using vim; I'll get myself some linter plugin tomorrow.
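To make the idea of a per-pull-request check concrete, here is a minimal sketch of the kind of rule a linter enforces. It is not pylint and not this project's configuration: `find_long_lines` and `MAX_LINE_LENGTH` are hypothetical names, and the only rule checked is PEP 8's 79-character line limit.

```python
MAX_LINE_LENGTH = 79  # PEP 8's default limit for code lines


def find_long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# A two-line sample: the first line is fine, the second is far too long.
sample = "x = 1\n" + "y = " + "1 + " * 30 + "1\n"
violations = find_long_lines(sample)

for number, length in violations:
    print(f"line {number}: {length} characters (limit {MAX_LINE_LENGTH})")
```

A CI job could run such a check over the changed files and fail the build when the list of violations is non-empty, which is roughly what wiring a real linter into pull requests would do.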
Can we get rid of "import *" statements as well? Not PEP8 but lots of people seem annoyed by it. http://stackoverflow.com/questions/2386714/why-is-import-bad
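One common complaint about wildcard imports is that two of them can silently rebind the same name. The sketch below illustrates this with the standard-library `math` and `cmath` modules rather than with this project's code; `wildcard_names` is a hypothetical helper written for this example.

```python
import cmath
import math


def wildcard_names(module):
    """Return the names that `from module import *` would bind."""
    public = getattr(module, "__all__", None)
    if public is None:
        # Without __all__, a wildcard import takes every name that does
        # not start with an underscore.
        public = [name for name in vars(module) if not name.startswith("_")]
    return set(public)


# Names that a second wildcard import would silently rebind.
clashes = sorted(wildcard_names(math) & wildcard_names(cmath))
print(clashes)

# Qualified access keeps both versions available and unambiguous.
print(math.sqrt(4), cmath.sqrt(-4))
```

With `from math import *` followed by `from cmath import *`, every name in `clashes` would end up referring to the `cmath` version, and nothing at the call site would show it.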
modify cuda and cudnn dll names for win32
…ython library"

This is the cheap and cheerful implementation, which is only enabled on TORCH_SHOW_CPP_STACKTRACES, because it *eagerly* symbolizes immediately at exception throw time, even if the exception will end up getting caught. It would be better to do this lazily and only symbolize when we try to print the exception, but that requires a more involved refactor of c10::Error that I don't feel like doing.

Compare the output before:

```
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x95 (0x7fa21b99d975 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #1: c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const + 0x8d (0x7fa21b951269 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #2: c10::TensorImpl::sizes_custom() const + 0x9f (0x7fa21b9770df in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #3: at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) + 0x31e (0x7fa20a202a8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x29f34de (0x7fa20b5f34de in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x2a1fd8e (0x7fa20b61fd8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x6b907b (0x7fa2142b907b in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x6b6175 (0x7fa2142b6175 in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so)
```

and after:

```
#1 torch::CapturedTraceback::gather(bool, bool, bool) from ??:0
#2 THPModule_initExtension(_object*, _object*)::{lambda()#1}::operator()() const [clone .constprop.0] from Module.cpp:0
#3 std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), THPModule_initExtension(_object*, _object*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Module.cpp:0
#4 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
#5 c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const from ??:0
#6 c10::TensorImpl::sizes_custom() const [clone .localalias] from TensorImpl.cpp:0
#7 at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) from ??:0
#8 at::(anonymous namespace)::wrapper_Meta_mm_out_out(at::Tensor const&, at::Tensor const&, at::Tensor&) from RegisterMeta.cpp:0
#9 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (at::Tensor const&, at::Tensor const&, at::Tensor&), &at::(anonymous namespace)::wrapper_Meta_mm_out_out>, at::Tensor&, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor&> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterMeta.cpp:0
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
…ry (#113207)

This is the cheap and cheerful implementation, which is only enabled on TORCH_SHOW_CPP_STACKTRACES, because it *eagerly* symbolizes immediately at exception throw time, even if the exception will end up getting caught. It would be better to do this lazily and only symbolize when we try to print the exception, but that requires a more involved refactor of c10::Error that I don't feel like doing.

Compare the output before:

```
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x95 (0x7fa21b99d975 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #1: c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const + 0x8d (0x7fa21b951269 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #2: c10::TensorImpl::sizes_custom() const + 0x9f (0x7fa21b9770df in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #3: at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) + 0x31e (0x7fa20a202a8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x29f34de (0x7fa20b5f34de in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x2a1fd8e (0x7fa20b61fd8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x6b907b (0x7fa2142b907b in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x6b6175 (0x7fa2142b6175 in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so)
```

and after:

```
#4 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
#5 c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const from ??:0
#6 c10::TensorImpl::sizes_custom() const [clone .localalias] from TensorImpl.cpp:0
#7 at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) from ??:0
#8 at::(anonymous namespace)::wrapper_Meta_mm_out_out(at::Tensor const&, at::Tensor const&, at::Tensor&) from RegisterMeta.cpp:0
#9 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (at::Tensor const&, at::Tensor const&, at::Tensor&), &at::(anonymous namespace)::wrapper_Meta_mm_out_out>, at::Tensor&, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor&> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterMeta.cpp:0
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #113207
Approved by: https://github.com/Skylion007
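The eager versus lazy trade-off described in this commit message can be sketched in a few lines of Python. This is only an illustration of the idea, not how c10::Error or torch::CapturedTraceback work: `EagerTrace`, `LazyTrace`, and `fake_symbolize` are hypothetical names, and the fake symbolizer just records how often it is called.

```python
class EagerTrace:
    """Symbolizes every frame as soon as the error object is built."""

    def __init__(self, addresses, symbolize):
        self._text = "\n".join(symbolize(a) for a in addresses)

    def __str__(self):
        return self._text


class LazyTrace:
    """Stores raw addresses and symbolizes only when first printed."""

    def __init__(self, addresses, symbolize):
        self._addresses = list(addresses)
        self._symbolize = symbolize
        self._text = None

    def __str__(self):
        if self._text is None:  # pay the cost once, on first use
            self._text = "\n".join(
                self._symbolize(a) for a in self._addresses
            )
        return self._text


calls = []


def fake_symbolize(address):
    """Stand-in for an expensive symbol lookup; records each call."""
    calls.append(address)
    return f"frame at {address:#x}"


addresses = [0x7FA21B99D975, 0x7FA21B951269]

eager = EagerTrace(addresses, fake_symbolize)
eager_calls = len(calls)  # work already done at construction

calls.clear()
lazy = LazyTrace(addresses, fake_symbolize)
lazy_calls_before_print = len(calls)  # nothing symbolized yet
text = str(lazy)
lazy_calls_after_print = len(calls)  # symbolized on first print
```

If the exception is caught and never printed, the lazy variant never calls the symbolizer at all, which is the saving the commit message alludes to.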
For some reason, inlining an initializer list into a std::vector takes a lot of time with clang-15. Since there are only a dozen or so distinct tags, creating them once and passing them as a def argument should not affect runtime speed at all, while significantly improving compilation time. On a Mac M1 it reduces the time needed to compile RegisterSchema.cpp from 50 to 3 seconds.

Before

```
% time /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/Users/nshulga/git/pytorch/pytorch/build/aten/src -I/Users/nshulga/git/pytorch/pytorch/aten/src -I/Users/nshulga/git/pytorch/pytorch/build -I/Users/nshulga/git/pytorch/pytorch -I/Users/nshulga/git/pytorch/pytorch/cmake/../third_party/benchmark/include -I/Users/nshulga/git/pytorch/pytorch/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/build/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/build/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include -I/Users/nshulga/git/pytorch/pytorch/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/../aten/src -I/Users/nshulga/git/pytorch/pytorch/torch/csrc -I/Users/nshulga/git/pytorch/pytorch/third_party/miniz-2.1.0 -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/include -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/src
-I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/FXdiv/include -I/Users/nshulga/git/pytorch/pytorch/c10/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/pthreadpool/include -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/deps/clog/include -I/Users/nshulga/git/pytorch/pytorch/third_party/NNPACK/include -I/Users/nshulga/git/pytorch/pytorch/third_party/FP16/include -I/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include -I/Users/nshulga/git/pytorch/pytorch/third_party/flatbuffers/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/nshulga/git/pytorch/pytorch/third_party/protobuf/src -isystem /Users/nshulga/git/pytorch/pytorch/third_party/XNNPACK/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/eigen -isystem /Users/nshulga/git/pytorch/pytorch/build/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast 
-Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -faligned-new -Werror -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces -O3 -DNDEBUG -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -fvisibility=hidden -O2 -Wmissing-prototypes -Werror=missing-prototypes -Xpreprocessor -fopenmp -I/Users/nshulga/miniforge3/include -std=gnu++17 -Wno-missing-prototypes -Wno-error=missing-prototypes -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o -c /Users/nshulga/git/pytorch/pytorch/build/aten/src/ATen/RegisterSchema.cpp ===-------------------------------------------------------------------------=== ... Pass execution timing report ... 
===-------------------------------------------------------------------------=== Total Execution Time: 131.8054 seconds (132.5540 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name --- 43.6364 ( 33.2%) 0.0919 ( 30.1%) 43.7282 ( 33.2%) 43.9658 ( 33.2%) 536345245380 ModuleInlinerWrapperPass 43.6291 ( 33.2%) 0.0891 ( 29.2%) 43.7182 ( 33.2%) 43.9549 ( 33.2%) 536264096394 DevirtSCCRepeatedPass 42.3766 ( 32.2%) 0.0185 ( 6.1%) 42.3951 ( 32.2%) 42.6198 ( 32.2%) 523040901767 GVNPass 0.4085 ( 0.3%) 0.0040 ( 1.3%) 0.4125 ( 0.3%) 0.4195 ( 0.3%) 4106085945 SimplifyCFGPass 0.3611 ( 0.3%) 0.0115 ( 3.8%) 0.3726 ( 0.3%) 0.3779 ( 0.3%) 4864696407 InstCombinePass 0.1607 ( 0.1%) 0.0088 ( 2.9%) 0.1695 ( 0.1%) 0.1720 ( 0.1%) 1780986175 InlinerPass 0.0865 ( 0.1%) 0.0024 ( 0.8%) 0.0889 ( 0.1%) 0.0914 ( 0.1%) 1489982961 SROAPass 0.0750 ( 0.1%) 0.0013 ( 0.4%) 0.0763 ( 0.1%) 0.0764 ( 0.1%) 620016338 SCCPPass 0.0661 ( 0.1%) 0.0040 ( 1.3%) 0.0701 ( 0.1%) 0.0735 ( 0.1%) 592027163 EarlyCSEPass 0.0554 ( 0.0%) 0.0026 ( 0.8%) 0.0580 ( 0.0%) 0.0604 ( 0.0%) 586567838 SLPVectorizerPass 0.0468 ( 0.0%) 0.0081 ( 2.7%) 0.0549 ( 0.0%) 0.0571 ( 0.0%) 486049135 BlockFrequencyAnalysis 0.0364 ( 0.0%) 0.0059 ( 1.9%) 0.0424 ( 0.0%) 0.0437 ( 0.0%) 366002196 BranchProbabilityAnalysis 0.0399 ( 0.0%) 0.0003 ( 0.1%) 0.0401 ( 0.0%) 0.0404 ( 0.0%) 324932876 OpenMPOptCGSCCPass 0.0361 ( 0.0%) 0.0022 ( 0.7%) 0.0383 ( 0.0%) 0.0385 ( 0.0%) 289493455 MemorySSAAnalysis 0.0341 ( 0.0%) 0.0017 ( 0.5%) 0.0358 ( 0.0%) 0.0360 ( 0.0%) 202039544 ADCEPass 0.0323 ( 0.0%) 0.0023 ( 0.7%) 0.0346 ( 0.0%) 0.0351 ( 0.0%) 279814836 CorrelatedValuePropagationPass 0.0318 ( 0.0%) 0.0005 ( 0.2%) 0.0324 ( 0.0%) 0.0334 ( 0.0%) 302116539 DSEPass 0.0251 ( 0.0%) 0.0032 ( 1.0%) 0.0283 ( 0.0%) 0.0290 ( 0.0%) 268768995 DominatorTreeAnalysis 0.0275 ( 0.0%) 0.0012 ( 0.4%) 0.0286 ( 0.0%) 0.0289 ( 0.0%) 335916941 HotColdSplittingPass 0.0251 ( 0.0%) 0.0031 ( 1.0%) 0.0282 ( 0.0%) 0.0286 ( 0.0%) 222934147 
CGProfilePass 0.0221 ( 0.0%) 0.0009 ( 0.3%) 0.0230 ( 0.0%) 0.0255 ( 0.0%) 79855412 GlobalOptPass 0.0184 ( 0.0%) 0.0019 ( 0.6%) 0.0203 ( 0.0%) 0.0209 ( 0.0%) 205236334 JumpThreadingPass 0.0185 ( 0.0%) 0.0021 ( 0.7%) 0.0206 ( 0.0%) 0.0208 ( 0.0%) 175318325 LoopAnalysis 0.0164 ( 0.0%) 0.0030 ( 1.0%) 0.0194 ( 0.0%) 0.0199 ( 0.0%) 163560340 PostOrderFunctionAttrsPass 0.0188 ( 0.0%) 0.0004 ( 0.1%) 0.0193 ( 0.0%) 0.0194 ( 0.0%) 103197563 TailCallElimPass 0.0176 ( 0.0%) 0.0015 ( 0.5%) 0.0190 ( 0.0%) 0.0192 ( 0.0%) 130956806 MemCpyOptPass 0.0116 ( 0.0%) 0.0074 ( 2.4%) 0.0190 ( 0.0%) 0.0191 ( 0.0%) 221717778 AAManager 0.0163 ( 0.0%) 0.0013 ( 0.4%) 0.0176 ( 0.0%) 0.0178 ( 0.0%) 167126689 PostDominatorTreeAnalysis 0.0155 ( 0.0%) 0.0003 ( 0.1%) 0.0158 ( 0.0%) 0.0160 ( 0.0%) 162157524 CalledValuePropagationPass 0.0132 ( 0.0%) 0.0014 ( 0.5%) 0.0146 ( 0.0%) 0.0159 ( 0.0%) 87781235 IPSCCPPass 0.0127 ( 0.0%) 0.0008 ( 0.3%) 0.0135 ( 0.0%) 0.0140 ( 0.0%) 91128714 ReassociatePass 0.0101 ( 0.0%) 0.0009 ( 0.3%) 0.0110 ( 0.0%) 0.0111 ( 0.0%) 73124251 BDCEPass 0.0072 ( 0.0%) 0.0004 ( 0.1%) 0.0077 ( 0.0%) 0.0089 ( 0.0%) 60948332 LoopIdiomRecognizePass 0.0064 ( 0.0%) 0.0014 ( 0.5%) 0.0079 ( 0.0%) 0.0088 ( 0.0%) 80334128 LoopVectorizePass 0.0065 ( 0.0%) 0.0022 ( 0.7%) 0.0087 ( 0.0%) 0.0088 ( 0.0%) 105525946 BasicAA 0.0068 ( 0.0%) 0.0014 ( 0.5%) 0.0082 ( 0.0%) 0.0083 ( 0.0%) 86368700 LoopSimplifyPass 0.0071 ( 0.0%) 0.0005 ( 0.2%) 0.0075 ( 0.0%) 0.0077 ( 0.0%) 87195315 LICMPass 0.0052 ( 0.0%) 0.0024 ( 0.8%) 0.0076 ( 0.0%) 0.0075 ( 0.0%) 68859408 LowerMatrixIntrinsicsPass 0.0064 ( 0.0%) 0.0003 ( 0.1%) 0.0067 ( 0.0%) 0.0067 ( 0.0%) 72021939 LoopDeletionPass 0.0012 ( 0.0%) 0.0011 ( 0.4%) 0.0023 ( 0.0%) 0.0065 ( 0.0%) 28855092 TargetIRAnalysis 0.0052 ( 0.0%) 0.0006 ( 0.2%) 0.0058 ( 0.0%) 0.0058 ( 0.0%) 38197861 Float2IntPass 0.0047 ( 0.0%) 0.0009 ( 0.3%) 0.0056 ( 0.0%) 0.0056 ( 0.0%) 63722846 LoopSinkPass 0.0055 ( 0.0%) 0.0001 ( 0.0%) 0.0056 ( 0.0%) 0.0056 ( 0.0%) 61106373 LoopUnrollPass 0.0051 ( 
0.0%) 0.0002 ( 0.1%) 0.0053 ( 0.0%) 0.0055 ( 0.0%) 60361028 VectorCombinePass 0.0044 ( 0.0%) 0.0002 ( 0.1%) 0.0046 ( 0.0%) 0.0049 ( 0.0%) 22674564 CallGraphAnalysis 0.0046 ( 0.0%) 0.0001 ( 0.0%) 0.0047 ( 0.0%) 0.0049 ( 0.0%) 12102487 GlobalDCEPass 0.0043 ( 0.0%) 0.0000 ( 0.0%) 0.0043 ( 0.0%) 0.0043 ( 0.0%) 48372244 InstSimplifyPass 0.0027 ( 0.0%) 0.0008 ( 0.3%) 0.0035 ( 0.0%) 0.0037 ( 0.0%) 45045562 ScalarEvolutionAnalysis 0.0030 ( 0.0%) 0.0003 ( 0.1%) 0.0033 ( 0.0%) 0.0036 ( 0.0%) 29145265 IndVarSimplifyPass 0.0025 ( 0.0%) 0.0002 ( 0.1%) 0.0027 ( 0.0%) 0.0032 ( 0.0%) 16671955 RequireAnalysisPass<llvm::GlobalsAA, llvm::Module, llvm::AnalysisManager<Module>> 0.0025 ( 0.0%) 0.0002 ( 0.1%) 0.0027 ( 0.0%) 0.0032 ( 0.0%) 16651504 GlobalsAA 0.0006 ( 0.0%) 0.0005 ( 0.2%) 0.0011 ( 0.0%) 0.0029 ( 0.0%) 8186724 OpenMPOptPass 0.0027 ( 0.0%) 0.0001 ( 0.0%) 0.0028 ( 0.0%) 0.0028 ( 0.0%) 12998003 ReversePostOrderFunctionAttrsPass 0.0019 ( 0.0%) 0.0006 ( 0.2%) 0.0025 ( 0.0%) 0.0028 ( 0.0%) 11967259 LowerExpectIntrinsicPass 0.0024 ( 0.0%) 0.0003 ( 0.1%) 0.0028 ( 0.0%) 0.0028 ( 0.0%) 19995960 LowerConstantIntrinsicsPass 0.0022 ( 0.0%) 0.0001 ( 0.0%) 0.0023 ( 0.0%) 0.0023 ( 0.0%) 19367864 LibCallsShrinkWrapPass 0.0019 ( 0.0%) 0.0001 ( 0.0%) 0.0020 ( 0.0%) 0.0021 ( 0.0%) 24061124 LoopLoadEliminationPass 0.0011 ( 0.0%) 0.0004 ( 0.1%) 0.0016 ( 0.0%) 0.0018 ( 0.0%) 35505583 LCSSAPass 0.0009 ( 0.0%) 0.0008 ( 0.3%) 0.0016 ( 0.0%) 0.0016 ( 0.0%) 22693970 MemoryDependenceAnalysis 0.0013 ( 0.0%) 0.0001 ( 0.0%) 0.0014 ( 0.0%) 0.0016 ( 0.0%) 9251166 InjectTLIMappings 0.0010 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) 0.0011 ( 0.0%) 2782049 AlwaysInlinerPass 0.0010 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) 5709095 DivRemPairsPass 0.0009 ( 0.0%) 0.0001 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) 12138843 MergedLoadStoreMotionPass 0.0007 ( 0.0%) 0.0001 ( 0.0%) 0.0009 ( 0.0%) 0.0010 ( 0.0%) 12095182 LoopFullUnrollPass 0.0004 ( 0.0%) 0.0002 ( 0.1%) 0.0007 ( 0.0%) 0.0009 ( 0.0%) 15168801 
LoopRotatePass 0.0005 ( 0.0%) 0.0002 ( 0.1%) 0.0007 ( 0.0%) 0.0008 ( 0.0%) 18714381 TargetLibraryAnalysis 0.0006 ( 0.0%) 0.0001 ( 0.0%) 0.0007 ( 0.0%) 0.0007 ( 0.0%) 9991748 LoopInstSimplifyPass 0.0004 ( 0.0%) 0.0004 ( 0.1%) 0.0007 ( 0.0%) 0.0007 ( 0.0%) 10149528 LoopDistributePass 0.0003 ( 0.0%) 0.0002 ( 0.1%) 0.0004 ( 0.0%) 0.0007 ( 0.0%) 1096854 DeadArgumentEliminationPass 0.0006 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) 5367319 RecomputeGlobalsAAPass 0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0006 ( 0.0%) 8937323 PromotePass 0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0006 ( 0.0%) 9579538 SimpleLoopUnswitchPass 0.0004 ( 0.0%) 0.0002 ( 0.1%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) 16129558 DemandedBitsAnalysis 0.0005 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) 11233413 FunctionAnalysisManagerCGSCCProxy 0.0002 ( 0.0%) 0.0002 ( 0.1%) 0.0004 ( 0.0%) 0.0006 ( 0.0%) 11872487 RequireAnalysisPass<llvm::OptimizationRemarkEmitterAnalysis, llvm::Function, llvm::AnalysisManager<Function>> 0.0003 ( 0.0%) 0.0002 ( 0.1%) 0.0005 ( 0.0%) 0.0006 ( 0.0%) 16910811 LazyValueAnalysis 0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 9314494 LoopSimplifyCFGPass 0.0003 ( 0.0%) 0.0002 ( 0.1%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 13019354 AssumptionAnalysis 0.0002 ( 0.0%) 0.0002 ( 0.1%) 0.0004 ( 0.0%) 0.0005 ( 0.0%) 12099715 OptimizationRemarkEmitterAnalysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0004 ( 0.0%) 8403351 InvalidateAnalysisPass<llvm::AAManager> 0.0002 ( 0.0%) 0.0002 ( 0.1%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) 12032802 TypeBasedAA 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) 12031548 ScopedNoAliasAA 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0004 ( 0.0%) 8582619 CoroSplitPass 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0003 ( 0.0%) 1358379 InferFunctionAttrsPass 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8383272 CoroElidePass 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8467353 PhiValuesAnalysis 
0.0003 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 4092920 ConstantMergePass 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8279547 SpeculativeExecutionPass 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8368351 ShouldNotRunFunctionPassesAnalysis 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 1312838 LazyCallGraphAnalysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 4855087 WarnMissedTransformationsPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 130368 CoroEarlyPass 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 3625888 AlignmentFromAssumptionsPass 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 3704343 LoopAccessAnalysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 111237 Annotation2MetadataPass 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 3574289 AnnotationRemarksPass 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 3611080 InvalidateAnalysisPass<llvm::ShouldNotRunFunctionPassesAnalysis> 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 47163 EliminateAvailableExternallyPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 17908 CoroCleanupPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 14976 RelLookupTableConverterPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 13763 ProfileSummaryAnalysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 12483 RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module, llvm::AnalysisManager<Module>> 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 12411 ForceFunctionAttrsPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 12678 InlineAdvisorAnalysis 131.5002 (100.0%) 0.3052 (100.0%) 131.8054 (100.0%) 132.5540 (100.0%) 1615901391352 Total ===-------------------------------------------------------------------------=== Miscellaneous Ungrouped Timers 
===-------------------------------------------------------------------------=== ---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name --- 46.0915 ( 99.8%) 0.7497 ( 98.5%) 46.8412 ( 99.8%) 47.1692 ( 99.7%) 567401093834 Code Generation Time 0.0923 ( 0.2%) 0.0116 ( 1.5%) 0.1039 ( 0.2%) 0.1258 ( 0.3%) 1088790744 LLVM IR Generation Time 46.1838 (100.0%) 0.7613 (100.0%) 46.9451 (100.0%) 47.2950 (100.0%) 568489884578 Total ===-------------------------------------------------------------------------=== Register Allocation ===-------------------------------------------------------------------------=== Total Execution Time: 0.0021 seconds (0.0021 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name --- 0.0020 (100.0%) 0.0001 (100.0%) 0.0021 (100.0%) 0.0021 (100.0%) 12292396 Seed Live Regs 0.0020 (100.0%) 0.0001 (100.0%) 0.0021 (100.0%) 0.0021 (100.0%) 12292396 Total ===-------------------------------------------------------------------------=== Instruction Selection and Scheduling ===-------------------------------------------------------------------------=== Total Execution Time: 0.4432 seconds (0.4524 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name --- 0.1275 ( 32.3%) 0.0056 ( 11.6%) 0.1331 ( 30.0%) 0.1363 ( 30.1%) 1438634389 DAG Combining 1 0.0702 ( 17.8%) 0.0047 ( 9.7%) 0.0749 ( 16.9%) 0.0751 ( 16.6%) 1027837820 DAG Combining 2 0.0548 ( 13.9%) 0.0054 ( 11.1%) 0.0601 ( 13.6%) 0.0636 ( 14.1%) 791659261 Instruction Selection 0.0438 ( 11.1%) 0.0060 ( 12.5%) 0.0499 ( 11.3%) 0.0509 ( 11.2%) 712994861 Instruction Scheduling 0.0345 ( 8.7%) 0.0073 ( 15.1%) 0.0418 ( 9.4%) 0.0420 ( 9.3%) 654102488 Instruction Creation 0.0228 ( 5.8%) 0.0047 ( 9.8%) 0.0276 ( 6.2%) 0.0278 ( 6.2%) 481250135 DAG Legalization 0.0175 ( 4.4%) 0.0048 ( 9.9%) 0.0223 ( 5.0%) 0.0231 ( 5.1%) 455645073 Type Legalization 0.0092 ( 2.3%) 0.0047 ( 9.7%) 0.0139 ( 3.1%) 0.0137 ( 3.0%) 388554644 
Instruction Scheduling Cleanup 0.0057 ( 1.4%) 0.0047 ( 9.8%) 0.0104 ( 2.4%) 0.0107 ( 2.4%) 326297296 Vector Legalization 0.0089 ( 2.2%) 0.0004 ( 0.8%) 0.0092 ( 2.1%) 0.0093 ( 2.0%) 98001723 DAG Combining after legalize types 0.3949 (100.0%) 0.0483 (100.0%) 0.4432 (100.0%) 0.4524 (100.0%) 6374977690 Total ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 2.4318 seconds (2.4717 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name --- 0.6326 ( 32.9%) 0.2596 ( 50.9%) 0.8922 ( 36.7%) 0.9075 ( 36.7%) 9093031759 AArch64 Instruction Selection 0.1319 ( 6.9%) 0.2043 ( 40.0%) 0.3361 ( 13.8%) 0.3398 ( 13.7%) 3764363631 AArch64 Assembly Printer 0.2016 ( 10.5%) 0.0005 ( 0.1%) 0.2021 ( 8.3%) 0.2036 ( 8.2%) 2487079531 Branch Probability Basic Block Placement 0.1485 ( 7.7%) 0.0004 ( 0.1%) 0.1489 ( 6.1%) 0.1497 ( 6.1%) 1184297842 Control Flow Optimizer 0.0899 ( 4.7%) 0.0060 ( 1.2%) 0.0960 ( 3.9%) 0.0971 ( 3.9%) 1123119540 Merge disjoint stack slots 0.0566 ( 2.9%) 0.0017 ( 0.3%) 0.0582 ( 2.4%) 0.0592 ( 2.4%) 581010640 Greedy Register Allocator 0.0446 ( 2.3%) 0.0018 ( 0.3%) 0.0464 ( 1.9%) 0.0477 ( 1.9%) 398700449 CodeGen Prepare 0.0440 ( 2.3%) 0.0004 ( 0.1%) 0.0444 ( 1.8%) 0.0454 ( 1.8%) 320770210 Simple Register Coalescing 0.0375 ( 2.0%) 0.0008 ( 0.2%) 0.0384 ( 1.6%) 0.0384 ( 1.6%) 514716387 Live Variable Analysis 0.0324 ( 1.7%) 0.0012 ( 0.2%) 0.0336 ( 1.4%) 0.0337 ( 1.4%) 193160032 Live Interval Analysis 0.0311 ( 1.6%) 0.0004 ( 0.1%) 0.0316 ( 1.3%) 0.0330 ( 1.3%) 371458250 Machine Instruction Scheduler 0.0267 ( 1.4%) 0.0001 ( 0.0%) 0.0269 ( 1.1%) 0.0270 ( 1.1%) 331502370 AArch64 load / store optimization pass 0.0202 ( 1.0%) 0.0002 ( 0.0%) 0.0204 ( 0.8%) 0.0208 ( 0.8%) 130127378 Prologue/Epilogue Insertion & Frame Finalization 0.0159 ( 0.8%) 0.0003 ( 0.0%) 0.0162 ( 0.7%) 
0.0169 ( 0.7%) 108527868 Machine code sinking 0.0150 ( 0.8%) 0.0011 ( 0.2%) 0.0162 ( 0.7%) 0.0162 ( 0.7%) 125256424 Memory SSA 0.0146 ( 0.8%) 0.0002 ( 0.0%) 0.0148 ( 0.6%) 0.0149 ( 0.6%) 157745290 Remove dead machine instructions 0.0120 ( 0.6%) 0.0003 ( 0.1%) 0.0123 ( 0.5%) 0.0126 ( 0.5%) 69240869 Virtual Register Rewriter 0.0119 ( 0.6%) 0.0003 ( 0.1%) 0.0122 ( 0.5%) 0.0124 ( 0.5%) 105492803 Machine Common Subexpression Elimination 0.0097 ( 0.5%) 0.0001 ( 0.0%) 0.0097 ( 0.4%) 0.0098 ( 0.4%) 62131793 Branch Probability Analysis #2 0.0092 ( 0.5%) 0.0002 ( 0.0%) 0.0094 ( 0.4%) 0.0097 ( 0.4%) 96000604 Two-Address instruction pass 0.0092 ( 0.5%) 0.0002 ( 0.0%) 0.0094 ( 0.4%) 0.0095 ( 0.4%) 113744830 Peephole Optimizations 0.0078 ( 0.4%) 0.0004 ( 0.1%) 0.0082 ( 0.3%) 0.0089 ( 0.4%) 103346285 Loop Strength Reduction 0.0083 ( 0.4%) 0.0002 ( 0.0%) 0.0085 ( 0.4%) 0.0085 ( 0.3%) 61189281 Branch Probability Analysis 0.0081 ( 0.4%) 0.0001 ( 0.0%) 0.0082 ( 0.3%) 0.0084 ( 0.3%) 100283314 Machine Copy Propagation Pass 0.0071 ( 0.4%) 0.0009 ( 0.2%) 0.0080 ( 0.3%) 0.0083 ( 0.3%) 56202830 Eliminate PHI nodes for register allocation 0.0070 ( 0.4%) 0.0005 ( 0.1%) 0.0075 ( 0.3%) 0.0080 ( 0.3%) 54314737 MachinePostDominator Tree Construction 0.0068 ( 0.4%) 0.0010 ( 0.2%) 0.0077 ( 0.3%) 0.0078 ( 0.3%) 44633924 Slot index numbering 0.0072 ( 0.4%) 0.0002 ( 0.0%) 0.0074 ( 0.3%) 0.0076 ( 0.3%) 87766406 Early Tail Duplication 0.0074 ( 0.4%) 0.0001 ( 0.0%) 0.0076 ( 0.3%) 0.0076 ( 0.3%) 80626051 Remove dead machine instructions #2 0.0069 ( 0.4%) 0.0005 ( 0.1%) 0.0074 ( 0.3%) 0.0074 ( 0.3%) 41014285 Slot index numbering #2 0.0060 ( 0.3%) 0.0007 ( 0.1%) 0.0067 ( 0.3%) 0.0072 ( 0.3%) 41140942 MachineDominator Tree Construction 0.0070 ( 0.4%) 0.0002 ( 0.0%) 0.0072 ( 0.3%) 0.0072 ( 0.3%) 73907009 Simplify the CFG 0.0068 ( 0.4%) 0.0001 ( 0.0%) 0.0069 ( 0.3%) 0.0069 ( 0.3%) 84586206 Machine Copy Propagation Pass #2 0.0061 ( 0.3%) 0.0004 ( 0.1%) 0.0065 ( 0.3%) 0.0065 ( 0.3%) 54340145 
MachinePostDominator Tree Construction #2 0.0057 ( 0.3%) 0.0006 ( 0.1%) 0.0063 ( 0.3%) 0.0064 ( 0.3%) 54059079 Post-Dominator Tree Construction #2 0.0058 ( 0.3%) 0.0001 ( 0.0%) 0.0059 ( 0.2%) 0.0059 ( 0.2%) 46145979 AArch64 Collect Linker Optimization Hint (LOH) 0.0051 ( 0.3%) 0.0006 ( 0.1%) 0.0057 ( 0.2%) 0.0057 ( 0.2%) 54005700 Post-Dominator Tree Construction 0.0050 ( 0.3%) 0.0006 ( 0.1%) 0.0056 ( 0.2%) 0.0056 ( 0.2%) 44647405 MachinePostDominator Tree Construction #3 0.0048 ( 0.2%) 0.0003 ( 0.1%) 0.0050 ( 0.2%) 0.0056 ( 0.2%) 47534346 Machine InstCombiner 0.0044 ( 0.2%) 0.0004 ( 0.1%) 0.0049 ( 0.2%) 0.0049 ( 0.2%) 40128980 MachineDominator Tree Construction #4 0.0045 ( 0.2%) 0.0002 ( 0.0%) 0.0047 ( 0.2%) 0.0049 ( 0.2%) 42290173 AArch64 pseudo instruction expansion pass 0.0045 ( 0.2%) 0.0003 ( 0.1%) 0.0048 ( 0.2%) 0.0048 ( 0.2%) 48064278 Block Frequency Analysis 0.0044 ( 0.2%) 0.0004 ( 0.1%) 0.0048 ( 0.2%) 0.0048 ( 0.2%) 40080835 MachineDominator Tree Construction #2 0.0042 ( 0.2%) 0.0005 ( 0.1%) 0.0047 ( 0.2%) 0.0047 ( 0.2%) 41236504 MachineDominator Tree Construction #5 0.0038 ( 0.2%) 0.0002 ( 0.0%) 0.0040 ( 0.2%) 0.0047 ( 0.2%) 37338288 Constant Hoisting 0.0043 ( 0.2%) 0.0003 ( 0.1%) 0.0046 ( 0.2%) 0.0046 ( 0.2%) 39083275 Dominator Tree Construction #8 0.0044 ( 0.2%) 0.0001 ( 0.0%) 0.0046 ( 0.2%) 0.0045 ( 0.2%) 15237924 ObjC ARC contraction 0.0041 ( 0.2%) 0.0004 ( 0.1%) 0.0044 ( 0.2%) 0.0045 ( 0.2%) 39207224 Dominator Tree Construction #4 0.0037 ( 0.2%) 0.0003 ( 0.1%) 0.0040 ( 0.2%) 0.0044 ( 0.2%) 50164445 Induction Variable Users 0.0039 ( 0.2%) 0.0005 ( 0.1%) 0.0044 ( 0.2%) 0.0043 ( 0.2%) 38877096 Dominator Tree Construction 0.0038 ( 0.2%) 0.0003 ( 0.1%) 0.0042 ( 0.2%) 0.0041 ( 0.2%) 40417867 MachineDominator Tree Construction #3 0.0037 ( 0.2%) 0.0004 ( 0.1%) 0.0041 ( 0.2%) 0.0041 ( 0.2%) 39442007 Dominator Tree Construction #5 0.0039 ( 0.2%) 0.0001 ( 0.0%) 0.0040 ( 0.2%) 0.0041 ( 0.2%) 15783281 AArch64 Compress Jump Tables 0.0035 ( 0.2%) 0.0005 ( 0.1%) 
0.0040 ( 0.2%) 0.0040 ( 0.2%) 34129315 MachineDominator Tree Construction #6 0.0026 ( 0.1%) 0.0014 ( 0.3%) 0.0039 ( 0.2%) 0.0040 ( 0.2%) 32983814 Free MachineFunction 0.0034 ( 0.2%) 0.0005 ( 0.1%) 0.0039 ( 0.2%) 0.0039 ( 0.2%) 38705492 Dominator Tree Construction #2 0.0035 ( 0.2%) 0.0002 ( 0.0%) 0.0036 ( 0.1%) 0.0039 ( 0.2%) 39711609 Local Stack Slot Allocation 0.0037 ( 0.2%) 0.0002 ( 0.0%) 0.0038 ( 0.2%) 0.0038 ( 0.2%) 26998014 Machine Block Frequency Analysis #5 0.0037 ( 0.2%) 0.0001 ( 0.0%) 0.0038 ( 0.2%) 0.0038 ( 0.2%) 14187857 Finalize ISel and expand pseudo-instructions 0.0034 ( 0.2%) 0.0005 ( 0.1%) 0.0038 ( 0.2%) 0.0038 ( 0.2%) 39547991 Dominator Tree Construction #3 0.0035 ( 0.2%) 0.0003 ( 0.1%) 0.0038 ( 0.2%) 0.0038 ( 0.2%) 39124746 Dominator Tree Construction #6 0.0035 ( 0.2%) 0.0001 ( 0.0%) 0.0037 ( 0.2%) 0.0038 ( 0.2%) 18626552 AArch64 Condition Optimizer 0.0037 ( 0.2%) 0.0001 ( 0.0%) 0.0037 ( 0.2%) 0.0038 ( 0.2%) 28787069 AArch64 Dead register definitions 0.0034 ( 0.2%) 0.0002 ( 0.0%) 0.0036 ( 0.1%) 0.0038 ( 0.2%) 15302878 Branch relaxation pass 0.0033 ( 0.2%) 0.0003 ( 0.1%) 0.0036 ( 0.1%) 0.0037 ( 0.1%) 39363543 Dominator Tree Construction #7 0.0032 ( 0.2%) 0.0001 ( 0.0%) 0.0034 ( 0.1%) 0.0036 ( 0.1%) 21702873 Post-RA pseudo instruction expansion pass 0.0033 ( 0.2%) 0.0001 ( 0.0%) 0.0034 ( 0.1%) 0.0034 ( 0.1%) 31528840 Machine Block Frequency Analysis #3 0.0030 ( 0.2%) 0.0002 ( 0.0%) 0.0031 ( 0.1%) 0.0033 ( 0.1%) 31375217 Machine Block Frequency Analysis 0.0030 ( 0.2%) 0.0001 ( 0.0%) 0.0031 ( 0.1%) 0.0031 ( 0.1%) 13939713 Interleaved Load Combine Pass 0.0029 ( 0.2%) 0.0001 ( 0.0%) 0.0030 ( 0.1%) 0.0031 ( 0.1%) 31374222 Machine Block Frequency Analysis #2 0.0026 ( 0.1%) 0.0002 ( 0.0%) 0.0028 ( 0.1%) 0.0030 ( 0.1%) 22842835 Shrink Wrapping analysis 0.0029 ( 0.2%) 0.0001 ( 0.0%) 0.0030 ( 0.1%) 0.0030 ( 0.1%) 8921850 AArch64 Conditional Branch Tuning 0.0028 ( 0.1%) 0.0001 ( 0.0%) 0.0029 ( 0.1%) 0.0029 ( 0.1%) 7404709 Unpack machine instruction bundles 
0.0027 ( 0.1%) 0.0001 ( 0.0%) 0.0028 ( 0.1%) 0.0028 ( 0.1%) 31289526 Machine Block Frequency Analysis #4 0.0024 ( 0.1%) 0.0001 ( 0.0%) 0.0026 ( 0.1%) 0.0027 ( 0.1%) 16579584 PostRA Machine Sink 0.0026 ( 0.1%) 0.0001 ( 0.0%) 0.0027 ( 0.1%) 0.0027 ( 0.1%) 20830194 Natural Loop Information #6 0.0022 ( 0.1%) 0.0004 ( 0.1%) 0.0027 ( 0.1%) 0.0027 ( 0.1%) 39019060 Natural Loop Information 0.0017 ( 0.1%) 0.0002 ( 0.0%) 0.0019 ( 0.1%) 0.0026 ( 0.1%) 16821219 Tail Duplication 0.0024 ( 0.1%) 0.0002 ( 0.0%) 0.0026 ( 0.1%) 0.0026 ( 0.1%) 32596316 Canonicalize Freeze Instructions in Loops 0.0024 ( 0.1%) 0.0001 ( 0.0%) 0.0026 ( 0.1%) 0.0026 ( 0.1%) 17441685 Lower constant intrinsics 0.0022 ( 0.1%) 0.0002 ( 0.0%) 0.0024 ( 0.1%) 0.0025 ( 0.1%) 18700525 Machine Natural Loop Construction 0.0022 ( 0.1%) 0.0001 ( 0.0%) 0.0023 ( 0.1%) 0.0023 ( 0.1%) 14093543 Remove unreachable machine basic blocks 0.0021 ( 0.1%) 0.0001 ( 0.0%) 0.0022 ( 0.1%) 0.0022 ( 0.1%) 11657502 AArch64 MI Peephole Optimization pass 0.0021 ( 0.1%) 0.0001 ( 0.0%) 0.0022 ( 0.1%) 0.0022 ( 0.1%) 10808188 Insert stack protectors 0.0021 ( 0.1%) 0.0001 ( 0.0%) 0.0022 ( 0.1%) 0.0022 ( 0.1%) 18979256 Expand memcmp() to load/stores 0.0021 ( 0.1%) 0.0001 ( 0.0%) 0.0022 ( 0.1%) 0.0022 ( 0.1%) 20817342 Natural Loop Information #5 0.0020 ( 0.1%) 0.0001 ( 0.0%) 0.0022 ( 0.1%) 0.0022 ( 0.1%) 20738170 Natural Loop Information #3 0.0020 ( 0.1%) 0.0001 ( 0.0%) 0.0021 ( 0.1%) 0.0021 ( 0.1%) 19900880 Natural Loop Information #4 0.0019 ( 0.1%) 0.0000 ( 0.0%) 0.0019 ( 0.1%) 0.0021 ( 0.1%) 7976838 AArch64 Promote Constant 0.0019 ( 0.1%) 0.0001 ( 0.0%) 0.0020 ( 0.1%) 0.0020 ( 0.1%) 9966904 AArch64 Store Pair Suppression 0.0019 ( 0.1%) 0.0001 ( 0.0%) 0.0020 ( 0.1%) 0.0020 ( 0.1%) 15096748 Type Promotion 0.0019 ( 0.1%) 0.0001 ( 0.0%) 0.0020 ( 0.1%) 0.0020 ( 0.1%) 9099038 AArch64 Stack Tagging PreRA 0.0017 ( 0.1%) 0.0001 ( 0.0%) 0.0018 ( 0.1%) 0.0020 ( 0.1%) 10014588 Expand large div/rem 0.0018 ( 0.1%) 0.0001 ( 0.0%) 0.0019 ( 0.1%) 0.0020 ( 
0.1%) 18664096 Machine Natural Loop Construction #3 0.0018 ( 0.1%) 0.0001 ( 0.0%) 0.0019 ( 0.1%) 0.0019 ( 0.1%) 18156000 Machine Cycle Info Analysis 0.0019 ( 0.1%) 0.0001 ( 0.0%) 0.0020 ( 0.1%) 0.0019 ( 0.1%) 19852274 Natural Loop Information #2 0.0018 ( 0.1%) 0.0001 ( 0.0%) 0.0019 ( 0.1%) 0.0019 ( 0.1%) 13589190 Remove unreachable blocks from the CFG 0.0017 ( 0.1%) 0.0001 ( 0.0%) 0.0018 ( 0.1%) 0.0019 ( 0.1%) 18533280 Machine Natural Loop Construction #2 0.0018 ( 0.1%) 0.0001 ( 0.0%) 0.0019 ( 0.1%) 0.0019 ( 0.1%) 9133019 Process Implicit Definitions 0.0018 ( 0.1%) 0.0001 ( 0.0%) 0.0018 ( 0.1%) 0.0019 ( 0.1%) 16950641 Machine Natural Loop Construction #4 0.0018 ( 0.1%) 0.0001 ( 0.0%) 0.0019 ( 0.1%) 0.0019 ( 0.1%) 11227404 Interleaved Access Pass 0.0017 ( 0.1%) 0.0001 ( 0.0%) 0.0018 ( 0.1%) 0.0018 ( 0.1%) 9472616 Debug Variable Analysis 0.0017 ( 0.1%) 0.0001 ( 0.0%) 0.0018 ( 0.1%) 0.0018 ( 0.1%) 18265577 Partially inline calls to library functions 0.0012 ( 0.1%) 0.0002 ( 0.0%) 0.0014 ( 0.1%) 0.0016 ( 0.1%) 18316073 Early Machine Loop Invariant Code Motion 0.0013 ( 0.1%) 0.0001 ( 0.0%) 0.0014 ( 0.1%) 0.0014 ( 0.1%) 8077346 AArch64 Expand Hardened Pseudos 0.0013 ( 0.1%) 0.0001 ( 0.0%) 0.0014 ( 0.1%) 0.0014 ( 0.1%) 12465953 Early If-Conversion 0.0011 ( 0.1%) 0.0001 ( 0.0%) 0.0012 ( 0.1%) 0.0014 ( 0.1%) 8907759 AArch64 Redundant Copy Elimination 0.0010 ( 0.0%) 0.0001 ( 0.0%) 0.0011 ( 0.0%) 0.0014 ( 0.1%) 11883955 AArch64 Conditional Compares 0.0012 ( 0.1%) 0.0001 ( 0.0%) 0.0013 ( 0.1%) 0.0013 ( 0.1%) 9928839 Replace intrinsics with calls to vector library 0.0012 ( 0.1%) 0.0001 ( 0.0%) 0.0013 ( 0.1%) 0.0013 ( 0.1%) 11218060 Expand Atomic instructions 0.0010 ( 0.1%) 0.0001 ( 0.0%) 0.0011 ( 0.0%) 0.0011 ( 0.0%) 11063391 Scalarize Masked Memory Intrinsics 0.0009 ( 0.0%) 0.0001 ( 0.0%) 0.0010 ( 0.0%) 0.0011 ( 0.0%) 10129231 Expand vector predication intrinsics 0.0008 ( 0.0%) 0.0001 ( 0.0%) 0.0009 ( 0.0%) 0.0010 ( 0.0%) 13439385 Scalar Evolution Analysis 0.0007 ( 0.0%) 0.0002 
( 0.0%) 0.0008 ( 0.0%) 0.0010 ( 0.0%) 7808228 Optimize machine instruction PHIs 0.0002 ( 0.0%) 0.0003 ( 0.1%) 0.0004 ( 0.0%) 0.0010 ( 0.0%) 7225458 AArch64 SIMD instructions optimization pass 0.0009 ( 0.0%) 0.0001 ( 0.0%) 0.0010 ( 0.0%) 0.0009 ( 0.0%) 10030927 Expand reduction intrinsics 0.0007 ( 0.0%) 0.0001 ( 0.0%) 0.0009 ( 0.0%) 0.0009 ( 0.0%) 10509325 Exception handling preparation 0.0006 ( 0.0%) 0.0001 ( 0.0%) 0.0007 ( 0.0%) 0.0009 ( 0.0%) 10756261 Loop Data Prefetch 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0008 ( 0.0%) 399932 Stack Safety Analysis 0.0006 ( 0.0%) 0.0001 ( 0.0%) 0.0007 ( 0.0%) 0.0007 ( 0.0%) 9552554 Bundle Machine CFG Edges 0.0005 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) 8151432 Spill Code Placement Analysis 0.0003 ( 0.0%) 0.0002 ( 0.0%) 0.0005 ( 0.0%) 0.0006 ( 0.0%) 8592314 Canonicalize natural loops 0.0004 ( 0.0%) 0.0002 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) 7705783 Machine Trace Metrics 0.0003 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0006 ( 0.0%) 8462909 Basic Alias Analysis (stateless AA impl) 0.0005 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) 8259233 Merge contiguous icmps into a memcmp 0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 6654996 AArch64 sls hardening pass 0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 8165062 Function Alias Analysis Results #5 0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 8290975 Machine Loop Invariant Code Motion 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 382929 Machine Outliner 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0003 ( 0.0%) 0.0004 ( 0.0%) 6516167 Remove Redundant DEBUG_VALUE analysis 0.0003 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0004 ( 0.0%) 7040456 Basic Alias Analysis (stateless AA impl) #5 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 7042205 Live Register Matrix 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 7919675 Function Alias Analysis Results #3 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 
0.0003 ( 0.0%) 7914823 Function Alias Analysis Results #2 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 7427618 Falkor HW Prefetch Fix 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 7919671 Function Alias Analysis Results #4 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 7001940 Basic Alias Analysis (stateless AA impl) #4 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 7897532 Function Alias Analysis Results 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 6738461 Machine Trace Metrics #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0003 ( 0.0%) 6432875 Insert CFI remember/restore state instructions 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6552466 Virtual Register Map 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0002 ( 0.0%) 6857488 Lazy Branch Probability Analysis #3 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0002 ( 0.0%) 6980404 Basic Alias Analysis (stateless AA impl) #3 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6992533 Basic Alias Analysis (stateless AA impl) #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6567037 Live DEBUG_VALUE analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6546165 Insert KCFI indirect call checks 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 7115379 Canonicalize natural loops #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6705679 SME ABI Pass 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0002 ( 0.0%) 6901143 Lazy Branch Probability Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6902561 Lazy Branch Probability Analysis #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6400876 Lazy Machine Block Frequency Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6435520 Insert fentry calls 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6403950 Lazy Machine Block Frequency Analysis #2 0.0001 ( 0.0%) 0.0001 ( 
0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6402121 Lazy Machine Block Frequency Analysis #6 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6405330 Falkor HW Prefetch Fix Late Phase 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6402810 AArch64 Branch Targets 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6502009 Insert XRay ops 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6408743 TLS Variable Hoist 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6465573 Implement the 'patchable-function' attribute 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6411075 SME Peephole Optimization pass 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6425271 PostRA Machine Instruction Scheduler 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6485534 Machine Optimization Remark Emitter #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6403937 Rename Disconnected Subregister Components 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6409365 Live Stack Slot Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6485019 Machine Optimization Remark Emitter #3 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6401670 Register Allocation Pass Scoring 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6485531 Machine Optimization Remark Emitter 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6403950 AArch64 speculation hardening pass 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6440444 Stack Slot Coloring 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6412128 Fixup Statepoint Caller Saved 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6376590 Lazy Block Frequency Analysis #3 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6403950 Lazy Machine Block Frequency Analysis #5 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6367954 Safe Stack instrumentation pass 0.0001 ( 
0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6403685 StackMap Liveness Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6402217 Analyze Machine Code For Garbage Collection 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6412517 A57 FP Anti-dependency breaker 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6376590 Lazy Block Frequency Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6365760 AArch64 Stack Tagging 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6418882 Contiguously Lay Out Funclets 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6362055 Lower Garbage Collection Instructions 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6420842 AArch64 Indirect Thunks 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6364901 Shadow Stack GC Lowering 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6401666 Workaround A53 erratum 835769 pass 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6408311 Lazy Machine Block Frequency Analysis #3 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6360056 Merge internal globals 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6453640 Optimization Remark Emitter 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6376621 Lazy Block Frequency Analysis #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6401955 Detect Dead Lanes 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 6403961 Lazy Machine Block Frequency Analysis #4 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 495128 Assumption Cache Tracker 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 34630 Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Create Garbage Collector Module Metadata 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 
0.0000 ( 0.0%) 11153 Machine Branch Probability Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Profile summary info
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Default Regalloc Priority Advisor
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Default Regalloc Eviction Advisor
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Type-Based Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 22473 Lower @llvm.global_dtors via `__cxa_atexit`
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Machine Module Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Scoped NoAlias Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 11153 Target Pass Configuration
1.9215 (100.0%) 0.5103 (100.0%) 2.4318 (100.0%) 2.4717 (100.0%) 24676503454 Total
===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.0681 seconds (0.0690 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name ---
0.0323 (100.0%) 0.0358 (100.0%) 0.0681 (100.0%) 0.0690 (100.0%) 2375980112 DWARF Exception Writer
0.0323 (100.0%) 0.0358 (100.0%) 0.0681 (100.0%) 0.0690 (100.0%) 2375980112 Total
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 48.2802 seconds (48.8638 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name ---
47.3865 (100.0%) 0.8937 (100.0%) 48.2802 (100.0%) 48.8638 (100.0%) 578082259552 Clang front-end timer
47.3865 (100.0%) 0.8937 (100.0%) 48.2802 (100.0%) 48.8638 (100.0%) 578082259552 Total
-ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB 47.40s user 0.93s system 98% cpu 49.062 total
```
After
```
% time /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/Users/nshulga/git/pytorch/pytorch/build/aten/src -I/Users/nshulga/git/pytorch/pytorch/aten/src -I/Users/nshulga/git/pytorch/pytorch/build -I/Users/nshulga/git/pytorch/pytorch -I/Users/nshulga/git/pytorch/pytorch/cmake/../third_party/benchmark/include -I/Users/nshulga/git/pytorch/pytorch/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/build/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/build/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include -I/Users/nshulga/git/pytorch/pytorch/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/../aten/src -I/Users/nshulga/git/pytorch/pytorch/torch/csrc -I/Users/nshulga/git/pytorch/pytorch/third_party/miniz-2.1.0 -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/include -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/src -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/FXdiv/include -I/Users/nshulga/git/pytorch/pytorch/c10/..
-I/Users/nshulga/git/pytorch/pytorch/third_party/pthreadpool/include -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/deps/clog/include -I/Users/nshulga/git/pytorch/pytorch/third_party/NNPACK/include -I/Users/nshulga/git/pytorch/pytorch/third_party/FP16/include -I/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include -I/Users/nshulga/git/pytorch/pytorch/third_party/flatbuffers/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/nshulga/git/pytorch/pytorch/third_party/protobuf/src -isystem /Users/nshulga/git/pytorch/pytorch/third_party/XNNPACK/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/eigen -isystem /Users/nshulga/git/pytorch/pytorch/build/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces 
-Qunused-arguments -fcolor-diagnostics -faligned-new -Werror -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces -O3 -DNDEBUG -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -fvisibility=hidden -O2 -Wmissing-prototypes -Werror=missing-prototypes -Xpreprocessor -fopenmp -I/Users/nshulga/miniforge3/include -std=gnu++17 -Wno-missing-prototypes -Wno-error=missing-prototypes -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o -c /Users/nshulga/git/pytorch/pytorch/build/aten/src/ATen/RegisterSchema.cpp ===-------------------------------------------------------------------------=== ... Pass execution timing report ... 
===-------------------------------------------------------------------------=== Total Execution Time: 1.2920 seconds (1.3187 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name --- 0.3070 ( 27.6%) 0.0547 ( 30.2%) 0.3617 ( 28.0%) 0.3654 ( 27.7%) 3719690895 ModuleInlinerWrapperPass 0.3024 ( 27.2%) 0.0525 ( 29.0%) 0.3549 ( 27.5%) 0.3585 ( 27.2%) 3653363330 DevirtSCCRepeatedPass 0.0619 ( 5.6%) 0.0073 ( 4.0%) 0.0692 ( 5.4%) 0.0711 ( 5.4%) 868136227 InstCombinePass 0.0601 ( 5.4%) 0.0065 ( 3.6%) 0.0666 ( 5.2%) 0.0679 ( 5.1%) 696430647 InlinerPass 0.0363 ( 3.3%) 0.0033 ( 1.8%) 0.0396 ( 3.1%) 0.0425 ( 3.2%) 535426974 SimplifyCFGPass 0.0280 ( 2.5%) 0.0069 ( 3.8%) 0.0348 ( 2.7%) 0.0358 ( 2.7%) 378716394 BlockFrequencyAnalysis 0.0208 ( 1.9%) 0.0049 ( 2.7%) 0.0257 ( 2.0%) 0.0262 ( 2.0%) 283689627 BranchProbabilityAnalysis 0.0239 ( 2.1%) 0.0002 ( 0.1%) 0.0241 ( 1.9%) 0.0241 ( 1.8%) 219122704 OpenMPOptCGSCCPass 0.0174 ( 1.6%) 0.0015 ( 0.8%) 0.0189 ( 1.5%) 0.0192 ( 1.5%) 215583965 GVNPass 0.0153 ( 1.4%) 0.0025 ( 1.4%) 0.0178 ( 1.4%) 0.0187 ( 1.4%) 184232295 EarlyCSEPass 0.0079 ( 0.7%) 0.0064 ( 3.5%) 0.0143 ( 1.1%) 0.0145 ( 1.1%) 192415300 AAManager 0.0116 ( 1.0%) 0.0019 ( 1.0%) 0.0134 ( 1.0%) 0.0135 ( 1.0%) 153354488 JumpThreadingPass 0.0099 ( 0.9%) 0.0023 ( 1.3%) 0.0122 ( 0.9%) 0.0131 ( 1.0%) 128911185 CGProfilePass 0.0081 ( 0.7%) 0.0022 ( 1.2%) 0.0103 ( 0.8%) 0.0128 ( 1.0%) 112266933 SLPVectorizerPass 0.0119 ( 1.1%) 0.0005 ( 0.3%) 0.0124 ( 1.0%) 0.0125 ( 0.9%) 131510939 MemorySSAAnalysis 0.0122 ( 1.1%) 0.0002 ( 0.1%) 0.0124 ( 1.0%) 0.0124 ( 0.9%) 129264559 DSEPass 0.0108 ( 1.0%) 0.0010 ( 0.6%) 0.0118 ( 0.9%) 0.0119 ( 0.9%) 158891693 DominatorTreeAnalysis 0.0116 ( 1.0%) 0.0002 ( 0.1%) 0.0119 ( 0.9%) 0.0119 ( 0.9%) 118946130 CorrelatedValuePropagationPass 0.0082 ( 0.7%) 0.0017 ( 0.9%) 0.0099 ( 0.8%) 0.0100 ( 0.8%) 120247256 LoopAnalysis 0.0090 ( 0.8%) 0.0008 ( 0.5%) 0.0099 ( 0.8%) 0.0099 ( 0.8%) 84784225 ADCEPass 0.0076 ( 0.7%) 0.0014 ( 
0.8%) 0.0090 ( 0.7%) 0.0098 ( 0.7%) 111411449 SROAPass 0.0080 ( 0.7%) 0.0005 ( 0.3%) 0.0085 ( 0.7%) 0.0085 ( 0.6%) 109824455 PostDominatorTreeAnalysis 0.0063 ( 0.6%) 0.0012 ( 0.7%) 0.0076 ( 0.6%) 0.0079 ( 0.6%) 80323239 LoopVectorizePass 0.0068 ( 0.6%) 0.0003 ( 0.2%) 0.0071 ( 0.6%) 0.0076 ( 0.6%) 60675565 LoopIdiomRecognizePass 0.0068 ( 0.6%) 0.0004 ( 0.2%) 0.0072 ( 0.6%) 0.0071 ( 0.5%) 87177852 LICMPass 0.0046 ( 0.4%) 0.0021 ( 1.1%) 0.0067 ( 0.5%) 0.0069 ( 0.5%) 74829034 PostOrderFunctionAttrsPass 0.0064 ( 0.6%) 0.0001 ( 0.1%) 0.0065 ( 0.5%) 0.0065 ( 0.5%) 48619557 SCCPPass 0.0063 ( 0.6%) 0.0001 ( 0.1%) 0.0064 ( 0.5%) 0.0064 ( 0.5%) 71987307 LoopDeletionPass 0.0058 ( 0.5%) 0.0000 ( 0.0%) 0.0059 ( 0.5%) 0.0059 ( 0.4%) 71423762 HotColdSplittingPass 0.0050 ( 0.5%) 0.0006 ( 0.3%) 0.0057 ( 0.4%) 0.0056 ( 0.4%) 57327860 MemCpyOptPass 0.0043 ( 0.4%) 0.0013 ( 0.7%) 0.0056 ( 0.4%) 0.0056 ( 0.4%) 73868907 LoopSimplifyPass 0.0054 ( 0.5%) 0.0000 ( 0.0%) 0.0055 ( 0.4%) 0.0055 ( 0.4%) 61231613 LoopUnrollPass 0.0045 ( 0.4%) 0.0009 ( 0.5%) 0.0054 ( 0.4%) 0.0054 ( 0.4%) 63427035 LoopSinkPass 0.0031 ( 0.3%) 0.0022 ( 1.2%) 0.0053 ( 0.4%) 0.0053 ( 0.4%) 60661182 LowerMatrixIntrinsicsPass 0.0039 ( 0.3%) 0.0003 ( 0.2%) 0.0042 ( 0.3%) 0.0053 ( 0.4%) 37913352 GlobalOptPass 0.0037 ( 0.3%) 0.0010 ( 0.6%) 0.0047 ( 0.4%) 0.0050 ( 0.4%) 40405305 IPSCCPPass 0.0031 ( 0.3%) 0.0014 ( 0.8%) 0.0045 ( 0.3%) 0.0046 ( 0.3%) 76160561 BasicAA 0.0036 ( 0.3%) 0.0007 ( 0.4%) 0.0043 ( 0.3%) 0.0043 ( 0.3%) 40024164 BDCEPass 0.0011 ( 0.1%) 0.0009 ( 0.5%) 0.0020 ( 0.2%) 0.0036 ( 0.3%) 27093400 TargetIRAnalysis 0.0033 ( 0.3%) 0.0002 ( 0.1%) 0.0035 ( 0.3%) 0.0035 ( 0.3%) 39935174 TailCallElimPass 0.0026 ( 0.2%) 0.0007 ( 0.4%) 0.0033 ( 0.3%) 0.0033 ( 0.3%) 44962489 ScalarEvolutionAnalysis 0.0028 ( 0.3%) 0.0002 ( 0.1%) 0.0030 ( 0.2%) 0.0032 ( 0.2%) 30018982 ReassociatePass 0.0028 ( 0.3%) 0.0002 ( 0.1%) 0.0030 ( 0.2%) 0.0032 ( 0.2%) 28955128 IndVarSimplifyPass 0.0030 ( 0.3%) 0.0001 ( 0.0%) 0.0031 ( 0.2%) 0.0031 ( 
0.2%) 31205149 CalledValuePropagationPass 0.0018 ( 0.2%) 0.0004 ( 0.2%) 0.0022 ( 0.2%) 0.0022 ( 0.2%) 22045025 Float2IntPass 0.0020 ( 0.2%) 0.0001 ( 0.0%) 0.0020 ( 0.2%) 0.0020 ( 0.2%) 23867545 LoopLoadEliminationPass 0.0006 ( 0.1%) 0.0005 ( 0.3%) 0.0011 ( 0.1%) 0.0020 ( 0.2%) 7821972 OpenMPOptPass 0.0011 ( 0.1%) 0.0004 ( 0.2%) 0.0015 ( 0.1%) 0.0017 ( 0.1%) 35512421 LCSSAPass 0.0015 ( 0.1%) 0.0002 ( 0.1%) 0.0017 ( 0.1%) 0.0017 ( 0.1%) 28268765 VectorCombinePass 0.0009 ( 0.1%) 0.0007 ( 0.4%) 0.0016 ( 0.1%) 0.0016 ( 0.1%) 23018362 MemoryDependenceAnalysis 0.0014 ( 0.1%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) 0.0015 ( 0.1%) 9265818 GlobalDCEPass 0.0013 ( 0.1%) 0.0000 ( 0.0%) 0.0013 ( 0.1%) 0.0013 ( 0.1%) 17548240 InstSimplifyPass 0.0009 ( 0.1%) 0.0004 ( 0.2%) 0.0013 ( 0.1%) 0.0013 ( 0.1%) 15122797 LowerConstantIntrinsicsPass 0.0011 ( 0.1%) 0.0000 ( 0.0%) 0.0011 ( 0.1%) 0.0011 ( 0.1%) 8506690 CallGraphAnalysis 0.0008 ( 0.1%) 0.0000 ( 0.0%) 0.0009 ( 0.1%) 0.0009 ( 0.1%) 7505976 RequireAnalysisPass<llvm::GlobalsAA, llvm::Module, llvm::AnalysisManager<Module>> 0.0008 ( 0.1%) 0.0000 ( 0.0%) 0.0009 ( 0.1%) 0.0009 ( 0.1%) 7485525 GlobalsAA 0.0005 ( 0.0%) 0.0002 ( 0.1%) 0.0007 ( 0.1%) 0.0009 ( 0.1%) 9580105 LowerExpectIntrinsicPass 0.0007 ( 0.1%) 0.0001 ( 0.1%) 0.0008 ( 0.1%) 0.0008 ( 0.1%) 12017197 LoopFullUnrollPass 0.0006 ( 0.1%) 0.0001 ( 0.1%) 0.0007 ( 0.1%) 0.0007 ( 0.1%) 11381083 MergedLoadStoreMotionPass 0.0004 ( 0.0%) 0.0003 ( 0.2%) 0.0007 ( 0.1%) 0.0007 ( 0.1%) 10150222 LoopDistributePass 0.0007 ( 0.1%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) 0.0007 ( 0.1%) 5649265 ReversePostOrderFunctionAttrsPass 0.0005 ( 0.0%) 0.0002 ( 0.1%) 0.0007 ( 0.1%) 0.0007 ( 0.1%) 18702545 TargetLibraryAnalysis 0.0006 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) 9964138 LoopInstSimplifyPass 0.0004 ( 0.0%) 0.0002 ( 0.1%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) 15049482 LoopRotatePass 0.0005 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) 10571955 LibCallsShrinkWrapPass 0.0004 ( 0.0%) 0.0002 ( 0.1%) 0.0006 ( 
0.0%) 0.0006 ( 0.0%) 16184249 DemandedBitsAnalysis
0.0004 ( 0.0%) 0.0001 ( 0.1%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 11227136 FunctionAnalysisManagerCGSCCProxy
0.0002 ( 0.0%) 0.0002 ( 0.1%) 0.0004 ( 0.0%) 0.0005 ( 0.0%) 11871494 RequireAnalysisPass<llvm::OptimizationRemarkEmitterAnalysis, llvm::Function, llvm::AnalysisManager<Function>>
0.0003 ( 0.0%) 0.0002 ( 0.1%) 0.0006 ( 0.0%) 0.0005 ( 0.0%) 16911686 LazyValueAnalysis
0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 9333915 LoopSimplifyCFGPass
0.0003 ( 0.0%) 0.0002 ( 0.1%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 13022664 AssumptionAnalysis
0.0003 ( 0.0%) 0.0001 ( 0.1%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) 9524395 SimpleLoopUnswitchPass
0.0002 ( 0.0%) 0.0001 ( 0.1%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) 12094779 OptimizationRemarkEmitterAnalysis
0.0002 ( 0.0%) 0.0002 ( 0.1%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) 12032778 ScopedNoAliasAA
0.0002 ( 0.0%) 0.0001 ( 0.1%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) 12032220 TypeBasedAA
0.0002 ( 0.0%) 0.0001 ( 0.1%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8581050 CoroSplitPass
0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 5126709 InjectTLIMappings
0.0002 ( 0.0%) 0.0001 ( 0.1%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8379445 CoroElidePass
0.0003 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 3890082 RecomputeGlobalsAAPass
0.0002 ( 0.0%) 0.0001 ( 0.1%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8281975 SpeculativeExecutionPass
0.0002 ( 0.0%) 0.0001 ( 0.1%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8468516 PhiValuesAnalysis
0.0003 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 4100685 ConstantMergePass
0.0002 ( 0.0%) 0.0001 ( 0.1%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8462530 PromotePass
0.0001 ( 0.0%) 0.0001 ( 0.1%) 0.0002 ( 0.0%) 0.0003 ( 0.0%) 8345373 InvalidateAnalysisPass<llvm::AAManager>
0.0002 ( 0.0%) 0.0001 ( 0.1%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) 8368732 ShouldNotRunFunctionPassesAnalysis
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 1308997 InferFunctionAttrsPass
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 4283689 DivRemPairsPass
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 4855512 WarnMissedTransformationsPass
0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) 1157640 LazyCallGraphAnalysis
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0001 ( 0.0%) 444866 DeadArgumentEliminationPass
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 3627306 AlignmentFromAssumptionsPass
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 3706342 LoopAccessAnalysis
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 3573986 AnnotationRemarksPass
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 632159 AlwaysInlinerPass
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 3611080 InvalidateAnalysisPass<llvm::ShouldNotRunFunctionPassesAnalysis>
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 47153 EliminateAvailableExternallyPass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 56285 Annotation2MetadataPass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 58150 CoroEarlyPass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 14016 CoroCleanupPass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 13044 RelLookupTableConverterPass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 13763 ProfileSummaryAnalysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 12678 InlineAdvisorAnalysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 12411 ForceFunctionAttrsPass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 12483 RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module, llvm::AnalysisManager<Module>>
1.1105 (100.0%) 0.1815 (100.0%) 1.2920 (100.0%) 1.3187 (100.0%) 14047165388 Total

===-------------------------------------------------------------------------===
Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===
---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name ---
1.1296 ( 94.4%) 0.4425 ( 98.2%) 1.5720 ( 95.4%) 1.6099 ( 94.9%) 16626483869 Code Generation Time
0.0670 ( 5.6%) 0.0081 ( 1.8%) 0.0751 ( 4.6%) 0.0858 ( 5.1%) 806754444 LLVM IR Generation Time
1.1965 (100.0%) 0.4506 (100.0%) 1.6471 (100.0%) 1.6957 (100.0%) 17433238313 Total

===-------------------------------------------------------------------------===
Register Allocation
===-------------------------------------------------------------------------===
Total Execution Time: 0.0007 seconds (0.0007 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name ---
0.0006 (100.0%) 0.0000 (100.0%) 0.0007 (100.0%) 0.0007 (100.0%) 7870431 Seed Live Regs
0.0006 (100.0%) 0.0000 (100.0%) 0.0007 (100.0%) 0.0007 (100.0%) 7870431 Total

===-------------------------------------------------------------------------===
Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
Total Execution Time: 0.1793 seconds (0.1846 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name ---
0.0382 ( 24.1%) 0.0025 ( 11.8%) 0.0406 ( 22.7%) 0.0427 ( 23.1%) 449731195 DAG Combining 1
0.0222 ( 14.0%) 0.0035 ( 16.6%) 0.0257 ( 14.3%) 0.0260 ( 14.1%) 323350124 Instruction Scheduling
0.0207 ( 13.1%) 0.0024 ( 11.5%) 0.0231 ( 12.9%) 0.0257 ( 13.9%) 305541313 Instruction Selection
0.0234 ( 14.8%) 0.0019 ( 8.9%) 0.0252 ( 14.1%) 0.0255 ( 13.8%) 386744618 DAG Combining 2
0.0171 ( 10.8%) 0.0026 ( 12.4%) 0.0197 ( 11.0%) 0.0199 ( 10.8%) 304585428 Instruction Creation
0.0108 ( 6.8%) 0.0019 ( 9.1%) 0.0127 ( 7.1%) 0.0128 ( 6.9%) 213503986 DAG Legalization
0.0107 ( 6.7%) 0.0019 ( 9.3%) 0.0126 ( 7.0%) 0.0124 ( 6.7%) 217202416 Type Legalization
0.0089 ( 5.6%) 0.0003 ( 1.7%) 0.0093 ( 5.2%) 0.0092 ( 5.0%) 98375640 DAG Combining after legalize types
0.0041 ( 2.6%) 0.0020 ( 9.3%) 0.0061 ( 3.4%) 0.0061 ( 3.3%) 175213222 Instruction Scheduling Cleanup
0.0023 ( 1.5%) 0.0020 ( 9.4%) 0.0043 ( 2.4%) 0.0043 ( 2.4%) 143306060 Vector Legalization
0.1584 (100.0%) 0.0209 (100.0%) 0.1793 (100.0%) 0.1846 (100.0%) 2617554002 Total

===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.8706 seconds (0.8844 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Instr--- --- Name ---
0.2523 ( 41.0%) 0.1142 ( 44.8%) 0.3665 ( 42.1%) 0.3729 ( 42.2%) 3751975511 AArch64 Instruction Selection
0.0769 ( 12.5%) 0.1178 ( 46.2%) 0.1947 ( 22.4%) 0.1954 ( 22.1%) 2284494832 AArch64 Assembly Printer
0.0199 ( 3.2%) 0.0006 ( 0.2%) 0.0205 ( 2.4%) 0.0205 ( 2.3%) 208860244 Greedy Register Allocator
0.0169 ( 2.8%) 0.0002 ( 0.1%) 0.0172 ( 2.0%) 0.0171 ( 1.9%) 247073374 Live Variable Analysis
0.0129 ( 2.1%) 0.0003 ( 0.1%) 0.0132 ( 1.5%) 0.0139 ( 1.6%) 165651494 CodeGen Prepare
0.0133 ( 2.2%) 0.0003 ( 0.1%) 0.0136 ( 1.6%) 0.0139 ( 1.6%) 153339584 Machine Instruction Scheduler
0.0105 ( 1.7%) 0.0001 ( 0.0%) 0.0106 ( 1.2%) 0.0106 ( 1.2%) 122934084 AArch64 load / store optimization pass
0.0084 ( 1.4%) 0.0003 ( 0.1%) 0.0087 ( 1.0%) 0.0091 ( 1.0%) 81985504 Simple Register Coalescing
0.0082 ( 1.3%) 0.0004 ( 0.2%) 0.0086 ( 1.0%) 0.0086 ( 1.0%) 76550569 Live Interval Analysis
0.0078 ( 1.3%) 0.0003 ( 0.1%) 0.0081 ( 0.9%) 0.0083 ( 0.9%) 103543246 Loop Strength Reduction
0.0077 ( 1.3%) 0.0002 ( 0.1%) 0.0079 ( 0.9%) 0.0079 ( 0.9%) 76599592 Prologue/Epilogue Insertion & Frame Finalization
0.0064 ( 1.0%) 0.0005 ( 0.2%) 0.0069 ( 0.8%) 0.0077 ( 0.9%) 65721168 Merge disjoint stack slots
0.0067 ( 1.1%) …
… to hang (#115124)

Let's see if it helps #114913.

The upstream LLVM issues are llvm/llvm-project#55530 and llvm/llvm-project#69369. In my CI test, I saw the following process hang:

```
/pytorch/pytorch/.lintbin/clang-tidy -p=/pytorch/pytorch/build --extra-arg -I/usr/lib/llvm-11/include/openmp --extra-arg -I/opt/conda/envs/py_3.9/include/python3.9 --extra-arg -I/pytorch/pytorch/third_party/pybind11/include --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward --extra-arg -I/usr/lib/llvm-14/lib/clang/14.0.0/include --extra-arg -I/usr/local/include --extra-arg -I/usr/include/x86_64-linux-gnu --extra-arg -I/usr/include /pytorch/pytorch/torch/csrc/autograd/python_nested_functions_manual.cpp
```

The core dump matches the description in llvm/llvm-project#69369: the process is stuck in `clang::tidy::bugprone::UncheckedOptionalAccessCheck::check`:

```
#0 0x00000000030c7420 in clang::dataflow::WatchedLiteralsSolverImpl::updateWatchedLiterals() ()
#1 0x00000000030c6c2a in clang::dataflow::WatchedLiteralsSolverImpl::solve() && ()
#2 0x00000000030c6572 in clang::dataflow::WatchedLiteralsSolver::solve(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#3 0x00000000030b3bd3 in clang::dataflow::DataflowAnalysisContext::querySolver(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#4 0x00000000030b3ca5 in clang::dataflow::DataflowAnalysisContext::flowConditionImplies(clang::dataflow::AtomicBoolValue&, clang::dataflow::BoolValue&) ()
#5 0x00000000030b1213 in clang::dataflow::(anonymous namespace)::diagnoseUnwrapCall(clang::Expr const*, clang::Expr const*, clang::dataflow::Environment const&) ()
#6 0x00000000030b1357 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::(anonymous namespace)::buildDiagnoseMatchSwitch(clang::dataflow::UncheckedOptionalAccessModelOptions const&)::$_7>::_M_invoke(std::_Any_data const&, clang::CallExpr const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#7 0x00000000030b1292 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::CaseOf<clang::CallExpr>(clang::ast_matchers::internal::Matcher<clang::Stmt>, std::function<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)>) &&::{lambda(clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#8 0x00000000030b1995 in clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}::operator()(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) const ()
#9 0x00000000030b170c in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) ()
#10 0x00000000030a7c27 in clang::dataflow::UncheckedOptionalAccessDiagnoser::diagnose(clang::ASTContext&, clang::Stmt const*, clang::dataflow::Environment const&) ()
#11 0x0000000002931286 in std::_Function_handler<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&), clang::tidy::bugprone::analyzeFunction(clang::FunctionDecl const&, clang::ASTContext&)::$_0>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&) ()
#12 0x0000000002930b41 in clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>)::{lambda(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)#1}::operator()(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&) const ()
#13 0x00000000030c18cc in std::_Function_handler<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&), clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>)::$_1>::_M_invoke(std::_Any_data const&, clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&) ()
#14 0x00000000030bf069 in clang::dataflow::transferBlock(clang::dataflow::ControlFlowContext const&, std::vector<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>, std::allocator<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState> > >&, clang::CFGBlock const&, clang::dataflow::Environment const&, clang::dataflow::TypeErasedDataflowAnalysis&, std::function<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#15 0x00000000030bfaa5 in clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#16 0x00000000029301b3 in llvm::Expected<std::vector<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> >, std::allocator<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> > > > > clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>) ()
#17 0x000000000292fbe8 in clang::tidy::bugprone::UncheckedOptionalAccessCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) ()
#18 0x00000000022e1572 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::MatchVisitor::visitMatch(clang::ast_matchers::BoundNodes const&) ()
#19 0x0000000002797a1c in clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) ()
#20 0x00000000022e0dc6 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::matchWithFilter(clang::DynTypedNode const&) ()
#21 0x00000000022e3b57 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#22 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#23 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#24 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#25 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#26 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#27 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#28 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#29 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#30 0x00000000022e8791 in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#31 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#32 0x00000000022c017a in clang::ast_matchers::MatchFinder::matchAST(clang::ASTContext&) ()
#33 0x000000000370ad3c in clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) ()
#34 0x00000000038ed4bb in clang::ParseAST(clang::Sema&, bool, bool) ()
#35 0x000000000369eda7 in clang::FrontendAction::Execute() ()
#36 0x000000000360d3f6 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#37 0x00000000027c475c in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#38 0x00000000022ad486 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef)::ActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#39 0x00000000027c44c6 in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) ()
#40 0x00000000027c360b in clang::tooling::ToolInvocation::run() ()
#41 0x00000000027c5bb1 in clang::tooling::ClangTool::run(clang::tooling::ToolAction*) ()
#42 0x00000000022a90c7 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef) ()
#43 0x0000000001ebc7f2 in clang::tidy::clangTidyMain(int, char const**) ()
#44 0x0000000004c54ba0 in __libc_start_main ()
#45 0x0000000001eb76ae in _start ()
```

Note also that clang-tidy is CPU-bound, so we could consider running the lintrunner job on 4xlarge if needed.

Pull Request resolved: #115124
Approved by: https://github.com/kit1980, https://github.com/Skylion007, https://github.com/malfet
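A hang like the one described above can stall a whole CI job unless the linter process runs under a time limit. The following is a minimal sketch of that idea, not something taken from the pull request: the helper name `run_with_timeout` and the timeout values are hypothetical, and it relies only on Python's standard `subprocess.run` with its `timeout` parameter.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd` and return (returncode, timed_out).

    Hypothetical helper for illustration only. If the child exceeds
    `timeout_s` seconds, subprocess.run kills it, waits for it to exit,
    and raises TimeoutExpired, which is reported here as timed_out=True.
    """
    try:
        completed = subprocess.run(
            cmd,
            capture_output=True,  # keep the child's output out of our stdout
            text=True,
            timeout=timeout_s,
        )
        return completed.returncode, False
    except subprocess.TimeoutExpired:
        # The child has already been killed and reaped at this point.
        return None, True


if __name__ == "__main__":
    # Stand-in for a stuck linter: a child process that sleeps far longer
    # than the allowed time. The 1-second limit is an arbitrary choice.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    code, timed_out = run_with_timeout(stuck, timeout_s=1)
    print("timed out" if timed_out else f"exit code {code}")
```

In a real lint job the command list would be the clang-tidy invocation for one file, and the caller would decide whether a timeout counts as a failure or as a skipped file.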
… to hang (pytorch#115124) Let's see if it helps pytorch#114913 The issues on llvm are at llvm/llvm-project#55530 and llvm/llvm-project#69369. In my CI test, I saw the following process hanged: ``` /pytorch/pytorch/.lintbin/clang-tidy -p=/pytorch/pytorch/build --extra-arg -I/usr/lib/llvm-11/include/openmp --extra-arg -I/opt/conda/envs/py_3.9/include/python3.9 --extra-arg -I/pytorch/pytorch/third_party/pybind11/include --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward --extra-arg -I/usr/lib/llvm-14/lib/clang/14.0.0/include --extra-arg -I/usr/local/include --extra-arg -I/usr/include/x86_64-linux-gnu --extra-arg -I/usr/include /pytorch/pytorch/torch/csrc/autograd/python_nested_functions_manual.cpp ``` and the core dump matches the description found in llvm/llvm-project#69369 showing the stuck in `clang::tidy::bugprone::UncheckedOptionalAccessCheck::check`: ``` #0 0x00000000030c7420 in clang::dataflow::WatchedLiteralsSolverImpl::updateWatchedLiterals() () pytorch#1 0x00000000030c6c2a in clang::dataflow::WatchedLiteralsSolverImpl::solve() && () pytorch#2 0x00000000030c6572 in clang::dataflow::WatchedLiteralsSolver::solve(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) () pytorch#3 0x00000000030b3bd3 in clang::dataflow::DataflowAnalysisContext::querySolver(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) () pytorch#4 0x00000000030b3ca5 in clang::dataflow::DataflowAnalysisContext::flowConditionImplies(clang::dataflow::AtomicBoolValue&, clang::dataflow::BoolValue&) () pytorch#5 0x00000000030b1213 in clang::dataflow::(anonymous namespace)::diagnoseUnwrapCall(clang::Expr const*, clang::Expr const*, clang::dataflow::Environment const&) () pytorch#6 
0x00000000030b1357 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::(anonymous namespace)::buildDiagnoseMatchSwitch(clang::dataflow::UncheckedOptionalAccessModelOptions const&)::$_7>::_M_invoke(std::_Any_data const&, clang::CallExpr const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#7 0x00000000030b1292 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::CaseOf<clang::CallExpr>(clang::ast_matchers::internal::Matcher<clang::Stmt>, std::function<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)>) &&::{lambda(clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#8 0x00000000030b1995 in clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}::operator()(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) const ()
#9 0x00000000030b170c in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) ()
#10 0x00000000030a7c27 in clang::dataflow::UncheckedOptionalAccessDiagnoser::diagnose(clang::ASTContext&, clang::Stmt const*, clang::dataflow::Environment const&) ()
#11 0x0000000002931286 in std::_Function_handler<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&), clang::tidy::bugprone::analyzeFunction(clang::FunctionDecl const&, clang::ASTContext&)::$_0>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&) ()
#12 0x0000000002930b41 in clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>)::{lambda(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)#1}::operator()(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&) const ()
#13 0x00000000030c18cc in std::_Function_handler<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&), clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>)::$_1>::_M_invoke(std::_Any_data const&, clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&) ()
#14 0x00000000030bf069 in clang::dataflow::transferBlock(clang::dataflow::ControlFlowContext const&, std::vector<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>, std::allocator<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState> > >&, clang::CFGBlock const&, clang::dataflow::Environment const&, clang::dataflow::TypeErasedDataflowAnalysis&, std::function<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#15 0x00000000030bfaa5 in clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#16 0x00000000029301b3 in llvm::Expected<std::vector<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> >, std::allocator<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> > > > > clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>) ()
#17 0x000000000292fbe8 in clang::tidy::bugprone::UncheckedOptionalAccessCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) ()
#18 0x00000000022e1572 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::MatchVisitor::visitMatch(clang::ast_matchers::BoundNodes const&) ()
#19 0x0000000002797a1c in clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) ()
#20 0x00000000022e0dc6 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::matchWithFilter(clang::DynTypedNode const&) ()
#21 0x00000000022e3b57 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#22 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#23 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#24 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#25 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#26 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#27 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#28 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#29 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#30 0x00000000022e8791 in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#31 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#32 0x00000000022c017a in clang::ast_matchers::MatchFinder::matchAST(clang::ASTContext&) ()
#33 0x000000000370ad3c in clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) ()
#34 0x00000000038ed4bb in clang::ParseAST(clang::Sema&, bool, bool) ()
#35 0x000000000369eda7 in clang::FrontendAction::Execute() ()
#36 0x000000000360d3f6 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#37 0x00000000027c475c in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#38 0x00000000022ad486 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef)::ActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#39 0x00000000027c44c6 in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) ()
#40 0x00000000027c360b in clang::tooling::ToolInvocation::run() ()
#41 0x00000000027c5bb1 in clang::tooling::ClangTool::run(clang::tooling::ToolAction*) ()
#42 0x00000000022a90c7 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef) ()
#43 0x0000000001ebc7f2 in clang::tidy::clangTidyMain(int, char const**) ()
#44 0x0000000004c54ba0 in __libc_start_main ()
#45 0x0000000001eb76ae in _start ()
```

Another note is that clang-tidy is CPU-bound, so we could consider running the lintrunner job on a 4xlarge machine if needed.

Pull Request resolved: pytorch#115124
Approved by: https://github.com/kit1980, https://github.com/Skylion007, https://github.com/malfet
[`newFunctionWithName:`](https://developer.apple.com/documentation/metal/mtllibrary/1515524-newfunctionwithname) does not accept an error argument, so do not attempt to print the error: it is guaranteed to be `nil` at that point, which results in a classic null pointer dereference when `TORCH_CHECK` attempts to construct a `std::string` from it. See the backtrace below for an example:

```
thread #1, queue = 'metal gpu stream', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000018a316dc4 libsystem_platform.dylib`_platform_strlen + 4
    frame #1: 0x00000001471011bc libtorch_cpu.dylib`std::__1::__constexpr_strlen[abi:v160006](__str=0x0000000000000000) at cstring:114:10
    frame #2: 0x0000000147100c24 libtorch_cpu.dylib`std::__1::char_traits<char>::length(__s=0x0000000000000000) at char_traits.h:220:12
  * frame #3: 0x0000000147100bf0 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::operator<<[abi:v160006]<std::__1::char_traits<char>>(__os=0x000000016fdfb3a0, __str=0x0000000000000000) at ostream:901:57
    frame #4: 0x0000000147100bb4 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb5d0) at StringUtil.h:55:6
    frame #5: 0x00000001471007ac libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #6: 0x0000000147101444 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #7: 0x0000000147101404 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #8: 0x000000014710137c libtorch_cpu.dylib`c10::detail::_str_wrapper<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, char const*, char const* const&>::call(args=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:75:5
    frame #9: 0x0000000147101310 libtorch_cpu.dylib`decltype(auto) c10::str<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>(args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at StringUtil.h:111:10
    frame #10: 0x0000000147100210 libtorch_cpu.dylib`decltype(auto) c10::detail::torchCheckMsgImpl<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>((null)="Expected indexFunction to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)", args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at Exception.h:453:10
    frame #11: 0x00000001470fffe8 libtorch_cpu.dylib`at::mps::MPSDevice::metalIndexingPSO(this=0x0000600000381670, kernel="index_select_32bit_idx32") at MPSDevice.mm:62:3
```

This was introduced by #99855, which replaced `newFunctionWithName:constantValues:error:` with `newFunctionWithName:`.

Pull Request resolved: #116938
Approved by: https://github.com/Skylion007
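The crash described above is not specific to PyTorch: streaming a null `const char*` into a `std::ostream` is undefined behavior, and in the backtrace it faults inside `strlen`. Below is a minimal sketch of one defensive pattern. The helper `or_placeholder`, the function `describe_missing_kernel`, and the message format are all hypothetical names chosen for illustration; they are not part of c10, and the fix in the commit itself is simply to stop printing the error.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of c10): returns a printable placeholder
// when the C string is null, so it is always safe to stream the result.
inline const char* or_placeholder(const char* s,
                                  const char* placeholder = "(null)") {
  assert(placeholder != nullptr);
  return s != nullptr ? s : placeholder;
}

// Illustrative message builder. A null error_text would otherwise reach
// operator<<(std::ostream&, const char*), which is undefined behavior.
std::string describe_missing_kernel(const std::string& kernel,
                                    const char* error_text) {
  std::ostringstream ss;
  ss << "kernel lookup failed: " << kernel
     << ", error: " << or_placeholder(error_text);
  return ss.str();
}
```

Calling `describe_missing_kernel("index_select_32bit_idx32", nullptr)` produces a message ending in `error: (null)` instead of dereferencing a null pointer. When an API has no error out-parameter at all, as with `newFunctionWithName:`, dropping the argument from the message is the simpler choice.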
repro find inf bug with unit test
In a big code base, users may not know which line of code called a collective. When debugging, we can print the Python/C++ stack trace in case the user calls ``ProcessGroup.reduce`` instead of ``torch.distributed.reduce``:

```
LOG(INFO) << "ProcessGroupNCCL::_allgather_base stacktrace: " << get_python_cpp_trace();
```

Output (using _allgather_base as an example). One example Python-side frame is ``all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838``:

```
ProcessGroupNCCL::_allgather_base stacktrace:
#0 torch::unwind::unwind() from ??:0
#1 torch::CapturedTraceback::gather(bool, bool, bool) from ??:0
#2 c10d::get_python_cpp_trace[abi:cxx11]() from :0
#3 c10d::ProcessGroupNCCL::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from ??:0
#4 c10d::ops::(anonymous namespace)::_allgather_base_CUDA(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long) from Ops.cpp:0
#5 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > > (*)(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long), std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > >, c10::guts::typelist::typelist<at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from :0
#6 torch::autograd::basicAutogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
#7 c10d::ProcessGroup::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from :0
#8 pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}&&, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (*)(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from :0
#9 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
#10 cfunction_call from /usr/local/src/conda/python-3.10.12/Objects/methodobject.c:543
#11 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#13 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#14 all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838
#15 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#16 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#17 wrapper from /data/users/weif/pytorch/torch/distributed/c10d_logger.py:75
#18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#20 _all_gather_flat_param from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1399
#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#23 unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1308
#24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#26 _unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:332
#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#29 _pre_forward_unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:448
#30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#32 _pre_forward from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:413
#33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#35 forward from /data/users/weif/pytorch/torch/distributed/fsdp/fully_sharded_data_parallel.py:839
#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#37 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#38 _call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1520
#39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#40 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#41 _wrapped_call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1511
#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#43 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.12/Objects/call.c:431
#44 slot_tp_call from /usr/local/src/conda/python-3.10.12/Objects/typeobject.c:7494
#45 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#46 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#47 inner from /data/users/weif/pytorch/run_fsdp.py:72
#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#50 run from /data/users/weif/pytorch/run_fsdp.py:76
#51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#53 main from /data/users/weif/pytorch/run_fsdp.py:133
#54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#56 <module> from /data/users/weif/pytorch/run_fsdp.py:137
#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#58 PyEval_EvalCode from /usr/local/src/conda/python-3.10.12/Python/ceval.c:1134
#59 run_eval_code_obj from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1291
#60 run_mod from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1312
#61 pyrun_file from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1208
#62 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:456
#63 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:90
#64 pymain_run_file_obj from /usr/local/src/conda/python-3.10.12/Modules/main.c:357
#65 Py_BytesMain from /usr/local/src/conda/python-3.10.12/Modules/main.c:1090
#66 __libc_start_call_main from ??:0
#67 <unwind unsupported> from ??:0
```

Pull Request resolved: #118924
Approved by: https://github.com/kwen2501
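The idea of logging a stack trace at the entry point of a collective can be tried outside PyTorch. The sketch below is an assumption-laden stand-in, not the implementation behind `get_python_cpp_trace()`: it uses `backtrace` and `backtrace_symbols` from `<execinfo.h>` (available on glibc and macOS, not ISO C++), it captures native frames only and no Python frames, and the names `capture_native_trace` and `log_collective_call` are hypothetical.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc / macOS extension)

#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: renders the current native call stack, one frame per
// line, in a "#N symbol" layout similar to the output shown above.
std::string capture_native_trace(int max_frames = 64) {
  assert(max_frames > 0);
  std::vector<void*> frames(static_cast<std::size_t>(max_frames));
  const int depth = backtrace(frames.data(), max_frames);
  // backtrace_symbols returns one malloc'ed array that the caller must free.
  char** symbols = backtrace_symbols(frames.data(), depth);
  std::ostringstream out;
  for (int i = 0; i < depth; ++i) {
    out << '#' << i << ' '
        << (symbols != nullptr ? symbols[i] : "<unknown>") << '\n';
  }
  std::free(symbols);
  return out.str();
}

// Hypothetical stand-in for the logging line at the top of a collective:
// builds the text that would be handed to the logger.
std::string log_collective_call(const std::string& op_name) {
  return op_name + " stacktrace:\n" + capture_native_trace();
}
```

A caller would write something like `std::cerr << log_collective_call("my_allgather");` at the start of the operation. Symbol names from `backtrace_symbols` are often mangled or missing unless the binary exports its symbols, which is one reason a dedicated symbolizer gives more readable output.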
Summary:
The caffe2/utils threadpool impl used to set thread name, since D8266344
https://www.internalfb.com/code/fbsource/[3ba3d30d6841]/xplat/caffe2/caffe2/utils/threadpool/WorkersPool.h?lines=271-273

But now we don't use this caffe2's own impl (since D21232894?), but use the third-party threadpool instead, which doesn't set thread name.

This diff is to achieve same effect as D8266344, such that we can tell which threads are pytorch threads from perfetto trace.

The idea comes from https://stackoverflow.com/questions/32375034/how-to-obtain-thread-name-in-android-ndk and folly ThreadName
https://www.internalfb.com/code/fbsource/[3ba3d30d6841]/xplat/folly/system/ThreadName.cpp?lines=30-41

I'm not sure if this is the right place to put this change.

BTW, Pytorch thread pool caller thread is worker #0
https://www.internalfb.com/code/fbsource/[3ba3d30d6841281c140db1c8bd2f85ede310a01b]/xplat/third-party/pthreadpool/pthreadpool/src/pthreads.c?lines=289-292

Test Plan:

## Before
```
--num_cpu_threads 2 --num_pytorch_threads -1 # default to size equal to 4 cpu cores

mos:/ $ ps -T -p `pidof transcribe_bin`
USER   PID  TID  PPID  VSZ    RSS   WCHAN      ADDR S CMD
shell  8985 8985 8983  118576 47688 hrtimer_n+ 0    S transcribe_bin   <-- main thread
shell  8985 8986 8983  118576 47688 0          0    R transcribe_bin   <-- pytorch thread #1
shell  8985 8987 8983  118576 47688 0          0    R transcribe_bin   <-- pytorch thread #2
shell  8985 8988 8983  118576 47688 0          0    R transcribe_bin   <-- pytorch thread #3
shell  8985 8989 8983  118576 47688 0          0    R CPUThreadPool0
shell  8985 8990 8983  118576 47688 futex_wai+ 0    S CPUThreadPool1
shell  8985 8991 8983  118576 47688 ep_poll    0    S IOThreadPool0
shell  8985 8992 8983  118576 47688 futex_wai+ 0    S FutureTimekeepr
shell  8985 8993 8983  118576 47688 pipe_wait  0    S snapshot_thread
shell  8985 8994 8983  118576 47688 hrtimer_n+ 0    S snapshot_thread
shell  8985 8997 8983  118576 47688 futex_wai+ 0    S AsyncDataQueue
```

## After
```
--num_cpu_threads 2 --num_pytorch_threads -1

mos:/ $ ps -T -p `pidof transcribe_bin`
USER   PID   TID   PPID  VSZ    RSS   WCHAN      ADDR S CMD
shell  11901 11901 11899 118128 40748 futex_wai+ 0    S transcribe_bin   <-- main thread serves as pytorch thread #0
shell  11901 11902 11899 118132 40748 futex_wai+ 0    S c10pthreadpool   <-- pytorch thread #1
shell  11901 11903 11899 118132 40748 futex_wai+ 0    S c10pthreadpool   <-- pytorch thread #2
shell  11901 11904 11899 118132 40748 futex_wai+ 0    S c10pthreadpool   <-- pytorch thread #3
shell  11901 11905 11899 118152 40752 futex_wai+ 0    S CPUThreadPool0
shell  11901 11906 11899 118148 40752 0          0    R CPUThreadPool1
shell  11901 11907 11899 118148 40756 ep_poll    0    S IOThreadPool0
shell  11901 11908 11899 118152 40756 futex_wai+ 0    S FutureTimekeepr
shell  11901 11909 11899 118164 40756 pipe_wait  0    S snapshot_thread
shell  11901 11910 11899 118168 40756 hrtimer_n+ 0    S snapshot_thread
shell  11901 11913 11899 118160 40760 futex_wai+ 0    S AsyncDataQueue
```

Example Perfetto trace: {F1483727859}

Looks like the pytorch thread pool was originally created with 4 threads during ASR loading (`loadTunaFactory`), and later recreated with 3 threads during inference.

Differential Revision: D55990584
Pulled By: chsivic
I have an unhealthy obsession with PEP8... Could `viewAs`, `expandAs` be renamed to `view_as`, `expand_as`, etc.? I might even volunteer to make everything pass flake8 if you guys are okay with accepting a PR that does that.
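For illustration, here is a rough sketch of what such a rename could look like. It assumes the old camelCase names stay available for a while as deprecated aliases. `camel_to_snake`, `deprecated_alias` and `FakeTensor` are hypothetical names made up for this example and are not part of the project's API.

```python
import re
import warnings


def camel_to_snake(name):
    """Convert a camelCase identifier such as 'viewAs' to 'view_as'."""
    # Put an underscore before each uppercase letter that follows a
    # lowercase letter or digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def deprecated_alias(old_name, new_name):
    """Build a method that warns and forwards to the snake_case method."""
    def method(self, *args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(self, new_name)(*args, **kwargs)
    method.__name__ = old_name
    return method


class FakeTensor:
    """Hypothetical stand-in for a tensor class; not the real one."""

    def view_as(self, other):
        return ("view_as", other)

    def expand_as(self, other):
        return ("expand_as", other)

    # Old spellings kept as thin, warning-emitting wrappers.
    viewAs = deprecated_alias("viewAs", "view_as")
    expandAs = deprecated_alias("expandAs", "expand_as")
```

With something like this in place, `FakeTensor().viewAs(x)` would still work but emit a `DeprecationWarning`, which gives callers time to migrate before the old names are removed.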