PEP8 #3

Closed
bartvm opened this issue Aug 16, 2016 · 5 comments · Fixed by #616
Labels
todo Not as important as medium or high priority tasks, but we will work on these.

Comments

@bartvm
Contributor

bartvm commented Aug 16, 2016

I have an unhealthy obsession with PEP8... Could viewAs, expandAs be renamed to view_as, expand_as, etc.?

I might even volunteer to make everything pass flake8 if you guys are okay with accepting a PR that does that.
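One way such a rename could be rolled out without breaking existing callers is to keep the old camelCase name as a thin deprecated alias. The sketch below is only an illustration: the `Tensor` class is a toy stand-in, and nothing in this thread says aliases were actually kept.

```python
import warnings


class Tensor:
    """Toy stand-in for a tensor class (hypothetical, not the real implementation)."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def view_as(self, other):
        """PEP8-style name: return a tensor shaped like `other`."""
        return Tensor(other.shape)

    def viewAs(self, other):
        """Old camelCase name, kept only as a deprecated alias."""
        warnings.warn(
            "viewAs is deprecated; use view_as",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.view_as(other)


a = Tensor((2, 3))
b = Tensor((3, 2))
print(a.view_as(b).shape)  # (3, 2)
```

Callers that still use `viewAs` keep working but see a `DeprecationWarning`, which gives them time to migrate.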

@soumith
Member

soumith commented Aug 16, 2016

If that's REALLY the convention, I'm okay with it. Though it looks uglier ;-)
@apaszke wdyt?

@apaszke
Contributor

apaszke commented Aug 16, 2016

I think it's best to stick with PEP8. However, this will also require changing index*, masked* and possibly some other methods.
Also, it would be nice to set up a linter.
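To get a feel for how mechanical the rename is, here is a minimal sketch of a camelCase to snake_case converter. The helper name `to_snake_case` is made up for this example, and `indexSelect` and `maskedFill` are only illustrative members of the `index*` and `masked*` families mentioned above.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name):
    """Convert a camelCase identifier to snake_case (hypothetical helper)."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


for old_name in ["viewAs", "expandAs", "indexSelect", "maskedFill"]:
    print(old_name, "->", to_snake_case(old_name))
```

Names that are already snake_case pass through unchanged, so the helper can be run over a whole list of method names.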

@colesbury
Member

@apaszke, if you use Atom you get Facebook's linter which seems to be PEP8 compatible.

One thing I noticed is that this apparently isn't accepted style in Python:

# Arguments on first line forbidden when not using vertical alignment.
foo = long_function_name(var_one, var_two,
    var_three, var_four)

Instead the preferred style is:

# Hanging indents should add a level.
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

or

foo = long_function_name(var_one, var_two,
                         var_three, var_four)
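For reference, a small runnable version of the two accepted layouts. The function body and the argument values are made up so the calls execute; only the indentation is the point.

```python
def long_function_name(var_one, var_two, var_three, var_four):
    """Toy function so the call sites below are runnable."""
    return var_one + var_two + var_three + var_four


var_one, var_two, var_three, var_four = 1, 2, 3, 4

# Hanging indent: nothing after the opening parenthesis,
# continuation lines indented one extra level.
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

# Vertical alignment: continuation lines line up with the first argument.
bar = long_function_name(var_one, var_two,
                         var_three, var_four)

print(foo == bar)  # True
```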

@apaszke
Contributor

apaszke commented Sep 16, 2016

We should probably set up pylint and make it run at each pull request. This way we can slowly make ourselves more pep8 compliant and eventually we'll just go over the whole codebase and fix all errors. I'm using vim, I'll get myself some linter plugin tomorrow.
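As a rough idea of what one such automated check does, the sketch below uses the standard-library `ast` module to flag function names that are not snake_case. It is a toy stand-in for a single naming rule, not a replacement for pylint or flake8, and the function name and sample source are made up for illustration.

```python
import ast
import re

# Lowercase letters, digits and underscores; leading underscores allowed.
SNAKE_CASE = re.compile(r"^_*[a-z][a-z0-9_]*$")


def find_non_snake_case_functions(source):
    """Return sorted (line, name) pairs for function names that are not snake_case."""
    tree = ast.parse(source)
    return sorted(
        (node.lineno, node.name)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not SNAKE_CASE.match(node.name)
    )


sample = '''
class Tensor:
    def viewAs(self, other):
        pass

    def expand_as(self, other):
        pass
'''

print(find_non_snake_case_functions(sample))  # [(3, 'viewAs')]
```

A check like this could run on every pull request and fail the build when the list is non-empty; a real linter covers far more than naming.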

@apaszke apaszke modified the milestones: Public, Public release Sep 16, 2016
@apaszke apaszke added the todo Not as important as medium or high priority tasks, but we will work on these. label Sep 16, 2016
@ebetica
Contributor

ebetica commented Sep 20, 2016

Can we get rid of "import *" statements as well? Not PEP8 but lots of people seem annoyed at it.

http://stackoverflow.com/questions/2386714/why-is-import-bad
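A small demonstration of one common complaint about star imports: a later `import *` can silently replace a name brought in by an earlier one. The helper below is hypothetical and uses `exec` only so the effect can be shown inside a plain dictionary; `math` and `cmath` are used because both export `sqrt`.

```python
def names_after_star_imports(*modules):
    """Return the namespace left behind by `from m import *` for each module, in order."""
    namespace = {}
    for module in modules:
        exec(f"from {module} import *", namespace)
    return namespace


ns = names_after_star_imports("math", "cmath")

# math.sqrt(-1) would raise ValueError, but cmath.sqrt has replaced it.
print(ns["sqrt"](-1))  # 1j
```

With explicit imports (`from math import sqrt`) the origin of each name is visible at the top of the file and a clash like this is easy to spot.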

colesbury referenced this issue in colesbury/pytorch-old Sep 30, 2016
This was referenced Jan 24, 2017
tfriedel pushed a commit to tfriedel/pytorch that referenced this issue Aug 9, 2017
modify cuda and cudnn dll names for win32
ezyang added a commit that referenced this issue Nov 8, 2023
…ython library"


This is the cheap and cheerful implementation, which is only enabled on TORCH_SHOW_CPP_STACKTRACES, because it *eagerly* symbolizes immediately at exception throw time, even if the exception will end up getting caught. It would be better to do this lazily and only symbolize when we try to print the exception, but that requires a more involved refactor of c10::Error that I don't feel like doing.

Compare the output before:

```
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x95 (0x7fa21b99d975 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)                                                                                                                                                                                                         
frame #1: c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const + 0x8d (0x7fa21b951269 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)                                                  
frame #2: c10::TensorImpl::sizes_custom() const + 0x9f (0x7fa21b9770df in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)                                                                                
frame #3: at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) + 0x31e (0x7fa20a202a8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)                                           
frame #4: <unknown function> + 0x29f34de (0x7fa20b5f34de in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)                                                                                        
frame #5: <unknown function> + 0x2a1fd8e (0x7fa20b61fd8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)                                                                                        
frame #6: <unknown function> + 0x6b907b (0x7fa2142b907b in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so)                                                                                      
frame #7: <unknown function> + 0x6b6175 (0x7fa2142b6175 in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so) 
```

and after:

```
#1 torch::CapturedTraceback::gather(bool, bool, bool) from ??:0                                                                                                                                            
#2 THPModule_initExtension(_object*, _object*)::{lambda()#1}::operator()() const [clone .constprop.0] from Module.cpp:0                                                                                    
#3 std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), THPModule_initExtension(_object*, _object*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Module.cpp:0                                                                                                                                                                                          
#4 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0                                                                       
#5 c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const from ??:0
#6 c10::TensorImpl::sizes_custom() const [clone .localalias] from TensorImpl.cpp:0
#7 at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) from ??:0
#8 at::(anonymous namespace)::wrapper_Meta_mm_out_out(at::Tensor const&, at::Tensor const&, at::Tensor&) from RegisterMeta.cpp:0
#9 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (at::Tensor const&, at::Tensor const&, at::Tensor&), &at::(anonymous namespace)::wrapper_Meta_mm_out_out>, at::Tensor&, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor&> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterMeta.cpp:0
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
ezyang added a commit that referenced this issue Nov 8, 2023
…ython library"


pytorchmergebot pushed a commit that referenced this issue Nov 9, 2023
…ry (#113207)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #113207
Approved by: https://github.com/Skylion007
Skylion007 pushed a commit to Skylion007/pytorch that referenced this issue Nov 14, 2023
…ry (pytorch#113207)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: pytorch#113207
Approved by: https://github.com/Skylion007
malfet added a commit that referenced this issue Nov 23, 2023
For some reason, inlining an initializer list into a std::vector takes a lot of time with clang-15.
Since there are only a dozen or so distinct tags, creating them once and passing them as the def argument should not affect runtime speed at all, while significantly improving compilation time.
On a Mac M1 it reduces the time needed to compile RegisterSchema.cpp from 50 to 3 seconds.

Before
```
% time /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/Users/nshulga/git/pytorch/pytorch/build/aten/src -I/Users/nshulga/git/pytorch/pytorch/aten/src -I/Users/nshulga/git/pytorch/pytorch/build -I/Users/nshulga/git/pytorch/pytorch -I/Users/nshulga/git/pytorch/pytorch/cmake/../third_party/benchmark/include -I/Users/nshulga/git/pytorch/pytorch/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/build/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/build/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include -I/Users/nshulga/git/pytorch/pytorch/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/../aten/src -I/Users/nshulga/git/pytorch/pytorch/torch/csrc -I/Users/nshulga/git/pytorch/pytorch/third_party/miniz-2.1.0 -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/include -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/src -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/FXdiv/include -I/Users/nshulga/git/pytorch/pytorch/c10/.. 
-I/Users/nshulga/git/pytorch/pytorch/third_party/pthreadpool/include -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/deps/clog/include -I/Users/nshulga/git/pytorch/pytorch/third_party/NNPACK/include -I/Users/nshulga/git/pytorch/pytorch/third_party/FP16/include -I/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include -I/Users/nshulga/git/pytorch/pytorch/third_party/flatbuffers/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/nshulga/git/pytorch/pytorch/third_party/protobuf/src -isystem /Users/nshulga/git/pytorch/pytorch/third_party/XNNPACK/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/eigen -isystem /Users/nshulga/git/pytorch/pytorch/build/include  -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces 
-Qunused-arguments -fcolor-diagnostics -faligned-new -Werror -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces -O3 -DNDEBUG -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -fvisibility=hidden -O2 -Wmissing-prototypes -Werror=missing-prototypes -Xpreprocessor -fopenmp -I/Users/nshulga/miniforge3/include -std=gnu++17 -Wno-missing-prototypes -Wno-error=missing-prototypes -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o -c /Users/nshulga/git/pytorch/pytorch/build/aten/src/ATen/RegisterSchema.cpp
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 131.8054 seconds (132.5540 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  43.6364 ( 33.2%)   0.0919 ( 30.1%)  43.7282 ( 33.2%)  43.9658 ( 33.2%)  536345245380  ModuleInlinerWrapperPass
  43.6291 ( 33.2%)   0.0891 ( 29.2%)  43.7182 ( 33.2%)  43.9549 ( 33.2%)  536264096394  DevirtSCCRepeatedPass
  42.3766 ( 32.2%)   0.0185 (  6.1%)  42.3951 ( 32.2%)  42.6198 ( 32.2%)  523040901767  GVNPass
   0.4085 (  0.3%)   0.0040 (  1.3%)   0.4125 (  0.3%)   0.4195 (  0.3%)  4106085945  SimplifyCFGPass
   0.3611 (  0.3%)   0.0115 (  3.8%)   0.3726 (  0.3%)   0.3779 (  0.3%)  4864696407  InstCombinePass
   0.1607 (  0.1%)   0.0088 (  2.9%)   0.1695 (  0.1%)   0.1720 (  0.1%)  1780986175  InlinerPass
   0.0865 (  0.1%)   0.0024 (  0.8%)   0.0889 (  0.1%)   0.0914 (  0.1%)  1489982961  SROAPass
   0.0750 (  0.1%)   0.0013 (  0.4%)   0.0763 (  0.1%)   0.0764 (  0.1%)  620016338  SCCPPass
   0.0661 (  0.1%)   0.0040 (  1.3%)   0.0701 (  0.1%)   0.0735 (  0.1%)  592027163  EarlyCSEPass
   0.0554 (  0.0%)   0.0026 (  0.8%)   0.0580 (  0.0%)   0.0604 (  0.0%)  586567838  SLPVectorizerPass
   0.0468 (  0.0%)   0.0081 (  2.7%)   0.0549 (  0.0%)   0.0571 (  0.0%)  486049135  BlockFrequencyAnalysis
   0.0364 (  0.0%)   0.0059 (  1.9%)   0.0424 (  0.0%)   0.0437 (  0.0%)  366002196  BranchProbabilityAnalysis
   0.0399 (  0.0%)   0.0003 (  0.1%)   0.0401 (  0.0%)   0.0404 (  0.0%)  324932876  OpenMPOptCGSCCPass
   0.0361 (  0.0%)   0.0022 (  0.7%)   0.0383 (  0.0%)   0.0385 (  0.0%)  289493455  MemorySSAAnalysis
   0.0341 (  0.0%)   0.0017 (  0.5%)   0.0358 (  0.0%)   0.0360 (  0.0%)  202039544  ADCEPass
   0.0323 (  0.0%)   0.0023 (  0.7%)   0.0346 (  0.0%)   0.0351 (  0.0%)  279814836  CorrelatedValuePropagationPass
   0.0318 (  0.0%)   0.0005 (  0.2%)   0.0324 (  0.0%)   0.0334 (  0.0%)  302116539  DSEPass
   0.0251 (  0.0%)   0.0032 (  1.0%)   0.0283 (  0.0%)   0.0290 (  0.0%)  268768995  DominatorTreeAnalysis
   0.0275 (  0.0%)   0.0012 (  0.4%)   0.0286 (  0.0%)   0.0289 (  0.0%)  335916941  HotColdSplittingPass
   0.0251 (  0.0%)   0.0031 (  1.0%)   0.0282 (  0.0%)   0.0286 (  0.0%)  222934147  CGProfilePass
   0.0221 (  0.0%)   0.0009 (  0.3%)   0.0230 (  0.0%)   0.0255 (  0.0%)   79855412  GlobalOptPass
   0.0184 (  0.0%)   0.0019 (  0.6%)   0.0203 (  0.0%)   0.0209 (  0.0%)  205236334  JumpThreadingPass
   0.0185 (  0.0%)   0.0021 (  0.7%)   0.0206 (  0.0%)   0.0208 (  0.0%)  175318325  LoopAnalysis
   0.0164 (  0.0%)   0.0030 (  1.0%)   0.0194 (  0.0%)   0.0199 (  0.0%)  163560340  PostOrderFunctionAttrsPass
   0.0188 (  0.0%)   0.0004 (  0.1%)   0.0193 (  0.0%)   0.0194 (  0.0%)  103197563  TailCallElimPass
   0.0176 (  0.0%)   0.0015 (  0.5%)   0.0190 (  0.0%)   0.0192 (  0.0%)  130956806  MemCpyOptPass
   0.0116 (  0.0%)   0.0074 (  2.4%)   0.0190 (  0.0%)   0.0191 (  0.0%)  221717778  AAManager
   0.0163 (  0.0%)   0.0013 (  0.4%)   0.0176 (  0.0%)   0.0178 (  0.0%)  167126689  PostDominatorTreeAnalysis
   0.0155 (  0.0%)   0.0003 (  0.1%)   0.0158 (  0.0%)   0.0160 (  0.0%)  162157524  CalledValuePropagationPass
   0.0132 (  0.0%)   0.0014 (  0.5%)   0.0146 (  0.0%)   0.0159 (  0.0%)   87781235  IPSCCPPass
   0.0127 (  0.0%)   0.0008 (  0.3%)   0.0135 (  0.0%)   0.0140 (  0.0%)   91128714  ReassociatePass
   0.0101 (  0.0%)   0.0009 (  0.3%)   0.0110 (  0.0%)   0.0111 (  0.0%)   73124251  BDCEPass
   0.0072 (  0.0%)   0.0004 (  0.1%)   0.0077 (  0.0%)   0.0089 (  0.0%)   60948332  LoopIdiomRecognizePass
   0.0064 (  0.0%)   0.0014 (  0.5%)   0.0079 (  0.0%)   0.0088 (  0.0%)   80334128  LoopVectorizePass
   0.0065 (  0.0%)   0.0022 (  0.7%)   0.0087 (  0.0%)   0.0088 (  0.0%)  105525946  BasicAA
   0.0068 (  0.0%)   0.0014 (  0.5%)   0.0082 (  0.0%)   0.0083 (  0.0%)   86368700  LoopSimplifyPass
   0.0071 (  0.0%)   0.0005 (  0.2%)   0.0075 (  0.0%)   0.0077 (  0.0%)   87195315  LICMPass
   0.0052 (  0.0%)   0.0024 (  0.8%)   0.0076 (  0.0%)   0.0075 (  0.0%)   68859408  LowerMatrixIntrinsicsPass
   0.0064 (  0.0%)   0.0003 (  0.1%)   0.0067 (  0.0%)   0.0067 (  0.0%)   72021939  LoopDeletionPass
   0.0012 (  0.0%)   0.0011 (  0.4%)   0.0023 (  0.0%)   0.0065 (  0.0%)   28855092  TargetIRAnalysis
   0.0052 (  0.0%)   0.0006 (  0.2%)   0.0058 (  0.0%)   0.0058 (  0.0%)   38197861  Float2IntPass
   0.0047 (  0.0%)   0.0009 (  0.3%)   0.0056 (  0.0%)   0.0056 (  0.0%)   63722846  LoopSinkPass
   0.0055 (  0.0%)   0.0001 (  0.0%)   0.0056 (  0.0%)   0.0056 (  0.0%)   61106373  LoopUnrollPass
   0.0051 (  0.0%)   0.0002 (  0.1%)   0.0053 (  0.0%)   0.0055 (  0.0%)   60361028  VectorCombinePass
   0.0044 (  0.0%)   0.0002 (  0.1%)   0.0046 (  0.0%)   0.0049 (  0.0%)   22674564  CallGraphAnalysis
   0.0046 (  0.0%)   0.0001 (  0.0%)   0.0047 (  0.0%)   0.0049 (  0.0%)   12102487  GlobalDCEPass
   0.0043 (  0.0%)   0.0000 (  0.0%)   0.0043 (  0.0%)   0.0043 (  0.0%)   48372244  InstSimplifyPass
   0.0027 (  0.0%)   0.0008 (  0.3%)   0.0035 (  0.0%)   0.0037 (  0.0%)   45045562  ScalarEvolutionAnalysis
   0.0030 (  0.0%)   0.0003 (  0.1%)   0.0033 (  0.0%)   0.0036 (  0.0%)   29145265  IndVarSimplifyPass
   0.0025 (  0.0%)   0.0002 (  0.1%)   0.0027 (  0.0%)   0.0032 (  0.0%)   16671955  RequireAnalysisPass<llvm::GlobalsAA, llvm::Module, llvm::AnalysisManager<Module>>
   0.0025 (  0.0%)   0.0002 (  0.1%)   0.0027 (  0.0%)   0.0032 (  0.0%)   16651504  GlobalsAA
   0.0006 (  0.0%)   0.0005 (  0.2%)   0.0011 (  0.0%)   0.0029 (  0.0%)    8186724  OpenMPOptPass
   0.0027 (  0.0%)   0.0001 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)   12998003  ReversePostOrderFunctionAttrsPass
   0.0019 (  0.0%)   0.0006 (  0.2%)   0.0025 (  0.0%)   0.0028 (  0.0%)   11967259  LowerExpectIntrinsicPass
   0.0024 (  0.0%)   0.0003 (  0.1%)   0.0028 (  0.0%)   0.0028 (  0.0%)   19995960  LowerConstantIntrinsicsPass
   0.0022 (  0.0%)   0.0001 (  0.0%)   0.0023 (  0.0%)   0.0023 (  0.0%)   19367864  LibCallsShrinkWrapPass
   0.0019 (  0.0%)   0.0001 (  0.0%)   0.0020 (  0.0%)   0.0021 (  0.0%)   24061124  LoopLoadEliminationPass
   0.0011 (  0.0%)   0.0004 (  0.1%)   0.0016 (  0.0%)   0.0018 (  0.0%)   35505583  LCSSAPass
   0.0009 (  0.0%)   0.0008 (  0.3%)   0.0016 (  0.0%)   0.0016 (  0.0%)   22693970  MemoryDependenceAnalysis
   0.0013 (  0.0%)   0.0001 (  0.0%)   0.0014 (  0.0%)   0.0016 (  0.0%)    9251166  InjectTLIMappings
   0.0010 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)    2782049  AlwaysInlinerPass
   0.0010 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)   0.0010 (  0.0%)    5709095  DivRemPairsPass
   0.0009 (  0.0%)   0.0001 (  0.0%)   0.0010 (  0.0%)   0.0010 (  0.0%)   12138843  MergedLoadStoreMotionPass
   0.0007 (  0.0%)   0.0001 (  0.0%)   0.0009 (  0.0%)   0.0010 (  0.0%)   12095182  LoopFullUnrollPass
   0.0004 (  0.0%)   0.0002 (  0.1%)   0.0007 (  0.0%)   0.0009 (  0.0%)   15168801  LoopRotatePass
   0.0005 (  0.0%)   0.0002 (  0.1%)   0.0007 (  0.0%)   0.0008 (  0.0%)   18714381  TargetLibraryAnalysis
   0.0006 (  0.0%)   0.0001 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)    9991748  LoopInstSimplifyPass
   0.0004 (  0.0%)   0.0004 (  0.1%)   0.0007 (  0.0%)   0.0007 (  0.0%)   10149528  LoopDistributePass
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0007 (  0.0%)    1096854  DeadArgumentEliminationPass
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    5367319  RecomputeGlobalsAAPass
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)    8937323  PromotePass
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)    9579538  SimpleLoopUnswitchPass
   0.0004 (  0.0%)   0.0002 (  0.1%)   0.0006 (  0.0%)   0.0006 (  0.0%)   16129558  DemandedBitsAnalysis
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)   11233413  FunctionAnalysisManagerCGSCCProxy
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0006 (  0.0%)   11872487  RequireAnalysisPass<llvm::OptimizationRemarkEmitterAnalysis, llvm::Function, llvm::AnalysisManager<Function>>
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0005 (  0.0%)   0.0006 (  0.0%)   16910811  LazyValueAnalysis
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    9314494  LoopSimplifyCFGPass
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0005 (  0.0%)   0.0005 (  0.0%)   13019354  AssumptionAnalysis
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0005 (  0.0%)   12099715  OptimizationRemarkEmitterAnalysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)    8403351  InvalidateAnalysisPass<llvm::AAManager>
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12032802  TypeBasedAA
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12031548  ScopedNoAliasAA
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)    8582619  CoroSplitPass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)    1358379  InferFunctionAttrsPass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8383272  CoroElidePass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8467353  PhiValuesAnalysis
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    4092920  ConstantMergePass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8279547  SpeculativeExecutionPass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8368351  ShouldNotRunFunctionPassesAnalysis
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    1312838  LazyCallGraphAnalysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    4855087  WarnMissedTransformationsPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)     130368  CoroEarlyPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3625888  AlignmentFromAssumptionsPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3704343  LoopAccessAnalysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)     111237  Annotation2MetadataPass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3574289  AnnotationRemarksPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3611080  InvalidateAnalysisPass<llvm::ShouldNotRunFunctionPassesAnalysis>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      47163  EliminateAvailableExternallyPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      17908  CoroCleanupPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      14976  RelLookupTableConverterPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      13763  ProfileSummaryAnalysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12483  RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module, llvm::AnalysisManager<Module>>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12411  ForceFunctionAttrsPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12678  InlineAdvisorAnalysis
  131.5002 (100.0%)   0.3052 (100.0%)  131.8054 (100.0%)  132.5540 (100.0%)  1615901391352  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  46.0915 ( 99.8%)   0.7497 ( 98.5%)  46.8412 ( 99.8%)  47.1692 ( 99.7%)  567401093834  Code Generation Time
   0.0923 (  0.2%)   0.0116 (  1.5%)   0.1039 (  0.2%)   0.1258 (  0.3%)  1088790744  LLVM IR Generation Time
  46.1838 (100.0%)   0.7613 (100.0%)  46.9451 (100.0%)  47.2950 (100.0%)  568489884578  Total

===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0021 seconds (0.0021 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.0020 (100.0%)   0.0001 (100.0%)   0.0021 (100.0%)   0.0021 (100.0%)   12292396  Seed Live Regs
   0.0020 (100.0%)   0.0001 (100.0%)   0.0021 (100.0%)   0.0021 (100.0%)   12292396  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.4432 seconds (0.4524 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.1275 ( 32.3%)   0.0056 ( 11.6%)   0.1331 ( 30.0%)   0.1363 ( 30.1%)  1438634389  DAG Combining 1
   0.0702 ( 17.8%)   0.0047 (  9.7%)   0.0749 ( 16.9%)   0.0751 ( 16.6%)  1027837820  DAG Combining 2
   0.0548 ( 13.9%)   0.0054 ( 11.1%)   0.0601 ( 13.6%)   0.0636 ( 14.1%)  791659261  Instruction Selection
   0.0438 ( 11.1%)   0.0060 ( 12.5%)   0.0499 ( 11.3%)   0.0509 ( 11.2%)  712994861  Instruction Scheduling
   0.0345 (  8.7%)   0.0073 ( 15.1%)   0.0418 (  9.4%)   0.0420 (  9.3%)  654102488  Instruction Creation
   0.0228 (  5.8%)   0.0047 (  9.8%)   0.0276 (  6.2%)   0.0278 (  6.2%)  481250135  DAG Legalization
   0.0175 (  4.4%)   0.0048 (  9.9%)   0.0223 (  5.0%)   0.0231 (  5.1%)  455645073  Type Legalization
   0.0092 (  2.3%)   0.0047 (  9.7%)   0.0139 (  3.1%)   0.0137 (  3.0%)  388554644  Instruction Scheduling Cleanup
   0.0057 (  1.4%)   0.0047 (  9.8%)   0.0104 (  2.4%)   0.0107 (  2.4%)  326297296  Vector Legalization
   0.0089 (  2.2%)   0.0004 (  0.8%)   0.0092 (  2.1%)   0.0093 (  2.0%)   98001723  DAG Combining after legalize types
   0.3949 (100.0%)   0.0483 (100.0%)   0.4432 (100.0%)   0.4524 (100.0%)  6374977690  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 2.4318 seconds (2.4717 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.6326 ( 32.9%)   0.2596 ( 50.9%)   0.8922 ( 36.7%)   0.9075 ( 36.7%)  9093031759  AArch64 Instruction Selection
   0.1319 (  6.9%)   0.2043 ( 40.0%)   0.3361 ( 13.8%)   0.3398 ( 13.7%)  3764363631  AArch64 Assembly Printer
   0.2016 ( 10.5%)   0.0005 (  0.1%)   0.2021 (  8.3%)   0.2036 (  8.2%)  2487079531  Branch Probability Basic Block Placement
   0.1485 (  7.7%)   0.0004 (  0.1%)   0.1489 (  6.1%)   0.1497 (  6.1%)  1184297842  Control Flow Optimizer
   0.0899 (  4.7%)   0.0060 (  1.2%)   0.0960 (  3.9%)   0.0971 (  3.9%)  1123119540  Merge disjoint stack slots
   0.0566 (  2.9%)   0.0017 (  0.3%)   0.0582 (  2.4%)   0.0592 (  2.4%)  581010640  Greedy Register Allocator
   0.0446 (  2.3%)   0.0018 (  0.3%)   0.0464 (  1.9%)   0.0477 (  1.9%)  398700449  CodeGen Prepare
   0.0440 (  2.3%)   0.0004 (  0.1%)   0.0444 (  1.8%)   0.0454 (  1.8%)  320770210  Simple Register Coalescing
   0.0375 (  2.0%)   0.0008 (  0.2%)   0.0384 (  1.6%)   0.0384 (  1.6%)  514716387  Live Variable Analysis
   0.0324 (  1.7%)   0.0012 (  0.2%)   0.0336 (  1.4%)   0.0337 (  1.4%)  193160032  Live Interval Analysis
   0.0311 (  1.6%)   0.0004 (  0.1%)   0.0316 (  1.3%)   0.0330 (  1.3%)  371458250  Machine Instruction Scheduler
   0.0267 (  1.4%)   0.0001 (  0.0%)   0.0269 (  1.1%)   0.0270 (  1.1%)  331502370  AArch64 load / store optimization pass
   0.0202 (  1.0%)   0.0002 (  0.0%)   0.0204 (  0.8%)   0.0208 (  0.8%)  130127378  Prologue/Epilogue Insertion & Frame Finalization
   0.0159 (  0.8%)   0.0003 (  0.0%)   0.0162 (  0.7%)   0.0169 (  0.7%)  108527868  Machine code sinking
   0.0150 (  0.8%)   0.0011 (  0.2%)   0.0162 (  0.7%)   0.0162 (  0.7%)  125256424  Memory SSA
   0.0146 (  0.8%)   0.0002 (  0.0%)   0.0148 (  0.6%)   0.0149 (  0.6%)  157745290  Remove dead machine instructions
   0.0120 (  0.6%)   0.0003 (  0.1%)   0.0123 (  0.5%)   0.0126 (  0.5%)   69240869  Virtual Register Rewriter
   0.0119 (  0.6%)   0.0003 (  0.1%)   0.0122 (  0.5%)   0.0124 (  0.5%)  105492803  Machine Common Subexpression Elimination
   0.0097 (  0.5%)   0.0001 (  0.0%)   0.0097 (  0.4%)   0.0098 (  0.4%)   62131793  Branch Probability Analysis #2
   0.0092 (  0.5%)   0.0002 (  0.0%)   0.0094 (  0.4%)   0.0097 (  0.4%)   96000604  Two-Address instruction pass
   0.0092 (  0.5%)   0.0002 (  0.0%)   0.0094 (  0.4%)   0.0095 (  0.4%)  113744830  Peephole Optimizations
   0.0078 (  0.4%)   0.0004 (  0.1%)   0.0082 (  0.3%)   0.0089 (  0.4%)  103346285  Loop Strength Reduction
   0.0083 (  0.4%)   0.0002 (  0.0%)   0.0085 (  0.4%)   0.0085 (  0.3%)   61189281  Branch Probability Analysis
   0.0081 (  0.4%)   0.0001 (  0.0%)   0.0082 (  0.3%)   0.0084 (  0.3%)  100283314  Machine Copy Propagation Pass
   0.0071 (  0.4%)   0.0009 (  0.2%)   0.0080 (  0.3%)   0.0083 (  0.3%)   56202830  Eliminate PHI nodes for register allocation
   0.0070 (  0.4%)   0.0005 (  0.1%)   0.0075 (  0.3%)   0.0080 (  0.3%)   54314737  MachinePostDominator Tree Construction
   0.0068 (  0.4%)   0.0010 (  0.2%)   0.0077 (  0.3%)   0.0078 (  0.3%)   44633924  Slot index numbering
   0.0072 (  0.4%)   0.0002 (  0.0%)   0.0074 (  0.3%)   0.0076 (  0.3%)   87766406  Early Tail Duplication
   0.0074 (  0.4%)   0.0001 (  0.0%)   0.0076 (  0.3%)   0.0076 (  0.3%)   80626051  Remove dead machine instructions #2
   0.0069 (  0.4%)   0.0005 (  0.1%)   0.0074 (  0.3%)   0.0074 (  0.3%)   41014285  Slot index numbering #2
   0.0060 (  0.3%)   0.0007 (  0.1%)   0.0067 (  0.3%)   0.0072 (  0.3%)   41140942  MachineDominator Tree Construction
   0.0070 (  0.4%)   0.0002 (  0.0%)   0.0072 (  0.3%)   0.0072 (  0.3%)   73907009  Simplify the CFG
   0.0068 (  0.4%)   0.0001 (  0.0%)   0.0069 (  0.3%)   0.0069 (  0.3%)   84586206  Machine Copy Propagation Pass #2
   0.0061 (  0.3%)   0.0004 (  0.1%)   0.0065 (  0.3%)   0.0065 (  0.3%)   54340145  MachinePostDominator Tree Construction #2
   0.0057 (  0.3%)   0.0006 (  0.1%)   0.0063 (  0.3%)   0.0064 (  0.3%)   54059079  Post-Dominator Tree Construction #2
   0.0058 (  0.3%)   0.0001 (  0.0%)   0.0059 (  0.2%)   0.0059 (  0.2%)   46145979  AArch64 Collect Linker Optimization Hint (LOH)
   0.0051 (  0.3%)   0.0006 (  0.1%)   0.0057 (  0.2%)   0.0057 (  0.2%)   54005700  Post-Dominator Tree Construction
   0.0050 (  0.3%)   0.0006 (  0.1%)   0.0056 (  0.2%)   0.0056 (  0.2%)   44647405  MachinePostDominator Tree Construction #3
   0.0048 (  0.2%)   0.0003 (  0.1%)   0.0050 (  0.2%)   0.0056 (  0.2%)   47534346  Machine InstCombiner
   0.0044 (  0.2%)   0.0004 (  0.1%)   0.0049 (  0.2%)   0.0049 (  0.2%)   40128980  MachineDominator Tree Construction #4
   0.0045 (  0.2%)   0.0002 (  0.0%)   0.0047 (  0.2%)   0.0049 (  0.2%)   42290173  AArch64 pseudo instruction expansion pass
   0.0045 (  0.2%)   0.0003 (  0.1%)   0.0048 (  0.2%)   0.0048 (  0.2%)   48064278  Block Frequency Analysis
   0.0044 (  0.2%)   0.0004 (  0.1%)   0.0048 (  0.2%)   0.0048 (  0.2%)   40080835  MachineDominator Tree Construction #2
   0.0042 (  0.2%)   0.0005 (  0.1%)   0.0047 (  0.2%)   0.0047 (  0.2%)   41236504  MachineDominator Tree Construction #5
   0.0038 (  0.2%)   0.0002 (  0.0%)   0.0040 (  0.2%)   0.0047 (  0.2%)   37338288  Constant Hoisting
   0.0043 (  0.2%)   0.0003 (  0.1%)   0.0046 (  0.2%)   0.0046 (  0.2%)   39083275  Dominator Tree Construction #8
   0.0044 (  0.2%)   0.0001 (  0.0%)   0.0046 (  0.2%)   0.0045 (  0.2%)   15237924  ObjC ARC contraction
   0.0041 (  0.2%)   0.0004 (  0.1%)   0.0044 (  0.2%)   0.0045 (  0.2%)   39207224  Dominator Tree Construction #4
   0.0037 (  0.2%)   0.0003 (  0.1%)   0.0040 (  0.2%)   0.0044 (  0.2%)   50164445  Induction Variable Users
   0.0039 (  0.2%)   0.0005 (  0.1%)   0.0044 (  0.2%)   0.0043 (  0.2%)   38877096  Dominator Tree Construction
   0.0038 (  0.2%)   0.0003 (  0.1%)   0.0042 (  0.2%)   0.0041 (  0.2%)   40417867  MachineDominator Tree Construction #3
   0.0037 (  0.2%)   0.0004 (  0.1%)   0.0041 (  0.2%)   0.0041 (  0.2%)   39442007  Dominator Tree Construction #5
   0.0039 (  0.2%)   0.0001 (  0.0%)   0.0040 (  0.2%)   0.0041 (  0.2%)   15783281  AArch64 Compress Jump Tables
   0.0035 (  0.2%)   0.0005 (  0.1%)   0.0040 (  0.2%)   0.0040 (  0.2%)   34129315  MachineDominator Tree Construction #6
   0.0026 (  0.1%)   0.0014 (  0.3%)   0.0039 (  0.2%)   0.0040 (  0.2%)   32983814  Free MachineFunction
   0.0034 (  0.2%)   0.0005 (  0.1%)   0.0039 (  0.2%)   0.0039 (  0.2%)   38705492  Dominator Tree Construction #2
   0.0035 (  0.2%)   0.0002 (  0.0%)   0.0036 (  0.1%)   0.0039 (  0.2%)   39711609  Local Stack Slot Allocation
   0.0037 (  0.2%)   0.0002 (  0.0%)   0.0038 (  0.2%)   0.0038 (  0.2%)   26998014  Machine Block Frequency Analysis #5
   0.0037 (  0.2%)   0.0001 (  0.0%)   0.0038 (  0.2%)   0.0038 (  0.2%)   14187857  Finalize ISel and expand pseudo-instructions
   0.0034 (  0.2%)   0.0005 (  0.1%)   0.0038 (  0.2%)   0.0038 (  0.2%)   39547991  Dominator Tree Construction #3
   0.0035 (  0.2%)   0.0003 (  0.1%)   0.0038 (  0.2%)   0.0038 (  0.2%)   39124746  Dominator Tree Construction #6
   0.0035 (  0.2%)   0.0001 (  0.0%)   0.0037 (  0.2%)   0.0038 (  0.2%)   18626552  AArch64 Condition Optimizer
   0.0037 (  0.2%)   0.0001 (  0.0%)   0.0037 (  0.2%)   0.0038 (  0.2%)   28787069  AArch64 Dead register definitions
   0.0034 (  0.2%)   0.0002 (  0.0%)   0.0036 (  0.1%)   0.0038 (  0.2%)   15302878  Branch relaxation pass
   0.0033 (  0.2%)   0.0003 (  0.1%)   0.0036 (  0.1%)   0.0037 (  0.1%)   39363543  Dominator Tree Construction #7
   0.0032 (  0.2%)   0.0001 (  0.0%)   0.0034 (  0.1%)   0.0036 (  0.1%)   21702873  Post-RA pseudo instruction expansion pass
   0.0033 (  0.2%)   0.0001 (  0.0%)   0.0034 (  0.1%)   0.0034 (  0.1%)   31528840  Machine Block Frequency Analysis #3
   0.0030 (  0.2%)   0.0002 (  0.0%)   0.0031 (  0.1%)   0.0033 (  0.1%)   31375217  Machine Block Frequency Analysis
   0.0030 (  0.2%)   0.0001 (  0.0%)   0.0031 (  0.1%)   0.0031 (  0.1%)   13939713  Interleaved Load Combine Pass
   0.0029 (  0.2%)   0.0001 (  0.0%)   0.0030 (  0.1%)   0.0031 (  0.1%)   31374222  Machine Block Frequency Analysis #2
   0.0026 (  0.1%)   0.0002 (  0.0%)   0.0028 (  0.1%)   0.0030 (  0.1%)   22842835  Shrink Wrapping analysis
   0.0029 (  0.2%)   0.0001 (  0.0%)   0.0030 (  0.1%)   0.0030 (  0.1%)    8921850  AArch64 Conditional Branch Tuning
   0.0028 (  0.1%)   0.0001 (  0.0%)   0.0029 (  0.1%)   0.0029 (  0.1%)    7404709  Unpack machine instruction bundles
   0.0027 (  0.1%)   0.0001 (  0.0%)   0.0028 (  0.1%)   0.0028 (  0.1%)   31289526  Machine Block Frequency Analysis #4
   0.0024 (  0.1%)   0.0001 (  0.0%)   0.0026 (  0.1%)   0.0027 (  0.1%)   16579584  PostRA Machine Sink
   0.0026 (  0.1%)   0.0001 (  0.0%)   0.0027 (  0.1%)   0.0027 (  0.1%)   20830194  Natural Loop Information #6
   0.0022 (  0.1%)   0.0004 (  0.1%)   0.0027 (  0.1%)   0.0027 (  0.1%)   39019060  Natural Loop Information
   0.0017 (  0.1%)   0.0002 (  0.0%)   0.0019 (  0.1%)   0.0026 (  0.1%)   16821219  Tail Duplication
   0.0024 (  0.1%)   0.0002 (  0.0%)   0.0026 (  0.1%)   0.0026 (  0.1%)   32596316  Canonicalize Freeze Instructions in Loops
   0.0024 (  0.1%)   0.0001 (  0.0%)   0.0026 (  0.1%)   0.0026 (  0.1%)   17441685  Lower constant intrinsics
   0.0022 (  0.1%)   0.0002 (  0.0%)   0.0024 (  0.1%)   0.0025 (  0.1%)   18700525  Machine Natural Loop Construction
   0.0022 (  0.1%)   0.0001 (  0.0%)   0.0023 (  0.1%)   0.0023 (  0.1%)   14093543  Remove unreachable machine basic blocks
   0.0021 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   11657502  AArch64 MI Peephole Optimization pass
   0.0021 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   10808188  Insert stack protectors
   0.0021 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   18979256  Expand memcmp() to load/stores
   0.0021 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   20817342  Natural Loop Information #5
   0.0020 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   20738170  Natural Loop Information #3
   0.0020 (  0.1%)   0.0001 (  0.0%)   0.0021 (  0.1%)   0.0021 (  0.1%)   19900880  Natural Loop Information #4
   0.0019 (  0.1%)   0.0000 (  0.0%)   0.0019 (  0.1%)   0.0021 (  0.1%)    7976838  AArch64 Promote Constant
   0.0019 (  0.1%)   0.0001 (  0.0%)   0.0020 (  0.1%)   0.0020 (  0.1%)    9966904  AArch64 Store Pair Suppression
   0.0019 (  0.1%)   0.0001 (  0.0%)   0.0020 (  0.1%)   0.0020 (  0.1%)   15096748  Type Promotion
   0.0019 (  0.1%)   0.0001 (  0.0%)   0.0020 (  0.1%)   0.0020 (  0.1%)    9099038  AArch64 Stack Tagging PreRA
   0.0017 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0020 (  0.1%)   10014588  Expand large div/rem
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0020 (  0.1%)   18664096  Machine Natural Loop Construction #3
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0019 (  0.1%)   18156000  Machine Cycle Info Analysis
   0.0019 (  0.1%)   0.0001 (  0.0%)   0.0020 (  0.1%)   0.0019 (  0.1%)   19852274  Natural Loop Information #2
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0019 (  0.1%)   13589190  Remove unreachable blocks from the CFG
   0.0017 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0019 (  0.1%)   18533280  Machine Natural Loop Construction #2
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0019 (  0.1%)    9133019  Process Implicit Definitions
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0019 (  0.1%)   16950641  Machine Natural Loop Construction #4
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0019 (  0.1%)   11227404  Interleaved Access Pass
   0.0017 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0018 (  0.1%)    9472616  Debug Variable Analysis
   0.0017 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0018 (  0.1%)   18265577  Partially inline calls to library functions
   0.0012 (  0.1%)   0.0002 (  0.0%)   0.0014 (  0.1%)   0.0016 (  0.1%)   18316073  Early Machine Loop Invariant Code Motion
   0.0013 (  0.1%)   0.0001 (  0.0%)   0.0014 (  0.1%)   0.0014 (  0.1%)    8077346  AArch64 Expand Hardened Pseudos
   0.0013 (  0.1%)   0.0001 (  0.0%)   0.0014 (  0.1%)   0.0014 (  0.1%)   12465953  Early If-Conversion
   0.0011 (  0.1%)   0.0001 (  0.0%)   0.0012 (  0.1%)   0.0014 (  0.1%)    8907759  AArch64 Redundant Copy Elimination
   0.0010 (  0.0%)   0.0001 (  0.0%)   0.0011 (  0.0%)   0.0014 (  0.1%)   11883955  AArch64 Conditional Compares
   0.0012 (  0.1%)   0.0001 (  0.0%)   0.0013 (  0.1%)   0.0013 (  0.1%)    9928839  Replace intrinsics with calls to vector library
   0.0012 (  0.1%)   0.0001 (  0.0%)   0.0013 (  0.1%)   0.0013 (  0.1%)   11218060  Expand Atomic instructions
   0.0010 (  0.1%)   0.0001 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)   11063391  Scalarize Masked Memory Intrinsics
   0.0009 (  0.0%)   0.0001 (  0.0%)   0.0010 (  0.0%)   0.0011 (  0.0%)   10129231  Expand vector predication intrinsics
   0.0008 (  0.0%)   0.0001 (  0.0%)   0.0009 (  0.0%)   0.0010 (  0.0%)   13439385  Scalar Evolution Analysis
   0.0007 (  0.0%)   0.0002 (  0.0%)   0.0008 (  0.0%)   0.0010 (  0.0%)    7808228  Optimize machine instruction PHIs
   0.0002 (  0.0%)   0.0003 (  0.1%)   0.0004 (  0.0%)   0.0010 (  0.0%)    7225458  AArch64 SIMD instructions optimization pass
   0.0009 (  0.0%)   0.0001 (  0.0%)   0.0010 (  0.0%)   0.0009 (  0.0%)   10030927  Expand reduction intrinsics
   0.0007 (  0.0%)   0.0001 (  0.0%)   0.0009 (  0.0%)   0.0009 (  0.0%)   10509325  Exception handling preparation
   0.0006 (  0.0%)   0.0001 (  0.0%)   0.0007 (  0.0%)   0.0009 (  0.0%)   10756261  Loop Data Prefetch
   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0008 (  0.0%)     399932  Stack Safety Analysis
   0.0006 (  0.0%)   0.0001 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)    9552554  Bundle Machine CFG Edges
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    8151432  Spill Code Placement Analysis
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)    8592314  Canonicalize natural loops
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    7705783  Machine Trace Metrics
   0.0003 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)    8462909  Basic Alias Analysis (stateless AA impl)
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    8259233  Merge contiguous icmps into a memcmp
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    6654996  AArch64 sls hardening pass
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    8165062  Function Alias Analysis Results #5
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    8290975  Machine Loop Invariant Code Motion
   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)     382929  Machine Outliner
   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)    6516167  Remove Redundant DEBUG_VALUE analysis
   0.0003 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)    7040456  Basic Alias Analysis (stateless AA impl) #5
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7042205  Live Register Matrix
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7919675  Function Alias Analysis Results #3
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7914823  Function Alias Analysis Results #2
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7427618  Falkor HW Prefetch Fix
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7919671  Function Alias Analysis Results #4
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7001940  Basic Alias Analysis (stateless AA impl) #4
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7897532  Function Alias Analysis Results
   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    6738461  Machine Trace Metrics #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)    6432875  Insert CFI remember/restore state instructions
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6552466  Virtual Register Map
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0002 (  0.0%)    6857488  Lazy Branch Probability Analysis #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0002 (  0.0%)    6980404  Basic Alias Analysis (stateless AA impl) #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6992533  Basic Alias Analysis (stateless AA impl) #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6567037  Live DEBUG_VALUE analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6546165  Insert KCFI indirect call checks
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    7115379  Canonicalize natural loops #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6705679  SME ABI Pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0002 (  0.0%)    6901143  Lazy Branch Probability Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6902561  Lazy Branch Probability Analysis #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6400876  Lazy Machine Block Frequency Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6435520  Insert fentry calls
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403950  Lazy Machine Block Frequency Analysis #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6402121  Lazy Machine Block Frequency Analysis #6
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6405330  Falkor HW Prefetch Fix Late Phase
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6402810  AArch64 Branch Targets
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6502009  Insert XRay ops
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6408743  TLS Variable Hoist
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6465573  Implement the 'patchable-function' attribute
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6411075  SME Peephole Optimization pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6425271  PostRA Machine Instruction Scheduler
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6485534  Machine Optimization Remark Emitter #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403937  Rename Disconnected Subregister Components
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6409365  Live Stack Slot Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6485019  Machine Optimization Remark Emitter #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6401670  Register Allocation Pass Scoring
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6485531  Machine Optimization Remark Emitter
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403950  AArch64 speculation hardening pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6440444  Stack Slot Coloring
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6412128  Fixup Statepoint Caller Saved
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6376590  Lazy Block Frequency Analysis #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403950  Lazy Machine Block Frequency Analysis #5
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6367954  Safe Stack instrumentation pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403685  StackMap Liveness Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6402217  Analyze Machine Code For Garbage Collection
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6412517  A57 FP Anti-dependency breaker
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6376590  Lazy Block Frequency Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6365760  AArch64 Stack Tagging
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6418882  Contiguously Lay Out Funclets
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6362055  Lower Garbage Collection Instructions
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6420842  AArch64 Indirect Thunks
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6364901  Shadow Stack GC Lowering
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6401666  Workaround A53 erratum 835769 pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6408311  Lazy Machine Block Frequency Analysis #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6360056  Merge internal globals
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6453640  Optimization Remark Emitter
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6376621  Lazy Block Frequency Analysis #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6401955  Detect Dead Lanes
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403961  Lazy Machine Block Frequency Analysis #4
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)     495128  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      34630  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Default Regalloc Priority Advisor
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Default Regalloc Eviction Advisor
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      22473  Lower @llvm.global_dtors via `__cxa_atexit`
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Target Pass Configuration
   1.9215 (100.0%)   0.5103 (100.0%)   2.4318 (100.0%)   2.4717 (100.0%)  24676503454  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0681 seconds (0.0690 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.0323 (100.0%)   0.0358 (100.0%)   0.0681 (100.0%)   0.0690 (100.0%)  2375980112  DWARF Exception Writer
   0.0323 (100.0%)   0.0358 (100.0%)   0.0681 (100.0%)   0.0690 (100.0%)  2375980112  Total

===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 48.2802 seconds (48.8638 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  47.3865 (100.0%)   0.8937 (100.0%)  48.2802 (100.0%)  48.8638 (100.0%)  578082259552  Clang front-end timer
  47.3865 (100.0%)   0.8937 (100.0%)  48.2802 (100.0%)  48.8638 (100.0%)  578082259552  Total

 -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB 47.40s user 0.93s system 98% cpu 49.062 total
```
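Comparing two reports this long by eye is error-prone, so here is a minimal sketch of a parser for the table rows. It assumes the column layout shown in the log above (four `seconds (percent%)` columns, an `Instr` count, then the pass name). The names `ROW`, `parse_rows`, `slowest` and `SAMPLE` are hypothetical helpers, not part of clang or PyTorch, and the sample rows are copied from the "Instruction Selection and Scheduling" table.

```python
import re

# One data row of a -ftime-report table, in the layout shown in the log above.
# Assumption: the Instr column is present, as it is in this particular log.
ROW = re.compile(
    r"^\s*"
    r"(?P<user>\d+\.\d+)\s*\(\s*[\d.]+%\)\s+"
    r"(?P<system>\d+\.\d+)\s*\(\s*[\d.]+%\)\s+"
    r"(?P<user_system>\d+\.\d+)\s*\(\s*[\d.]+%\)\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*[\d.]+%\)\s+"
    r"(?P<instr>\d+)\s+"
    r"(?P<name>\S.*?)\s*$"
)

SAMPLE = """\
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.1275 ( 32.3%)   0.0056 ( 11.6%)   0.1331 ( 30.0%)   0.1363 ( 30.1%)  1438634389  DAG Combining 1
   0.0702 ( 17.8%)   0.0047 (  9.7%)   0.0749 ( 16.9%)   0.0751 ( 16.6%)  1027837820  DAG Combining 2
   0.3949 (100.0%)   0.0483 (100.0%)   0.4432 (100.0%)   0.4524 (100.0%)  6374977690  Total
"""


def parse_rows(report):
    """Return one dict per data row; headers and banner lines are skipped."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # section banner, column header, or blank line
        rows.append({
            "name": m["name"],
            "user": float(m["user"]),
            "system": float(m["system"]),
            "user_system": float(m["user_system"]),
            "wall": float(m["wall"]),
            "instr": int(m["instr"]),
        })
    return rows


def slowest(rows, n=5):
    """Return the n rows with the largest wall time, ignoring 'Total' rows."""
    passes = [r for r in rows if r["name"] != "Total"]
    return sorted(passes, key=lambda r: r["wall"], reverse=True)[:n]


if __name__ == "__main__":
    for row in slowest(parse_rows(SAMPLE), n=2):
        print(f'{row["wall"]:8.4f}s  {row["name"]}')
```

Running the same functions over the saved "before" and "after" output would make it easier to see which passes account for the difference. Note that one log contains several tables, so pass names can repeat across sections.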

After
```
% time /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/Users/nshulga/git/pytorch/pytorch/build/aten/src -I/Users/nshulga/git/pytorch/pytorch/aten/src -I/Users/nshulga/git/pytorch/pytorch/build -I/Users/nshulga/git/pytorch/pytorch -I/Users/nshulga/git/pytorch/pytorch/cmake/../third_party/benchmark/include -I/Users/nshulga/git/pytorch/pytorch/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/build/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/build/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include -I/Users/nshulga/git/pytorch/pytorch/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/../aten/src -I/Users/nshulga/git/pytorch/pytorch/torch/csrc -I/Users/nshulga/git/pytorch/pytorch/third_party/miniz-2.1.0 -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/include -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/src -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/FXdiv/include -I/Users/nshulga/git/pytorch/pytorch/c10/.. 
-I/Users/nshulga/git/pytorch/pytorch/third_party/pthreadpool/include -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/deps/clog/include -I/Users/nshulga/git/pytorch/pytorch/third_party/NNPACK/include -I/Users/nshulga/git/pytorch/pytorch/third_party/FP16/include -I/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include -I/Users/nshulga/git/pytorch/pytorch/third_party/flatbuffers/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/nshulga/git/pytorch/pytorch/third_party/protobuf/src -isystem /Users/nshulga/git/pytorch/pytorch/third_party/XNNPACK/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/eigen -isystem /Users/nshulga/git/pytorch/pytorch/build/include  -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces 
-Qunused-arguments -fcolor-diagnostics -faligned-new -Werror -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces -O3 -DNDEBUG -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -fvisibility=hidden -O2 -Wmissing-prototypes -Werror=missing-prototypes -Xpreprocessor -fopenmp -I/Users/nshulga/miniforge3/include -std=gnu++17 -Wno-missing-prototypes -Wno-error=missing-prototypes -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o -c /Users/nshulga/git/pytorch/pytorch/build/aten/src/ATen/RegisterSchema.cpp
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 1.2920 seconds (1.3187 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.3070 ( 27.6%)   0.0547 ( 30.2%)   0.3617 ( 28.0%)   0.3654 ( 27.7%)  3719690895  ModuleInlinerWrapperPass
   0.3024 ( 27.2%)   0.0525 ( 29.0%)   0.3549 ( 27.5%)   0.3585 ( 27.2%)  3653363330  DevirtSCCRepeatedPass
   0.0619 (  5.6%)   0.0073 (  4.0%)   0.0692 (  5.4%)   0.0711 (  5.4%)  868136227  InstCombinePass
   0.0601 (  5.4%)   0.0065 (  3.6%)   0.0666 (  5.2%)   0.0679 (  5.1%)  696430647  InlinerPass
   0.0363 (  3.3%)   0.0033 (  1.8%)   0.0396 (  3.1%)   0.0425 (  3.2%)  535426974  SimplifyCFGPass
   0.0280 (  2.5%)   0.0069 (  3.8%)   0.0348 (  2.7%)   0.0358 (  2.7%)  378716394  BlockFrequencyAnalysis
   0.0208 (  1.9%)   0.0049 (  2.7%)   0.0257 (  2.0%)   0.0262 (  2.0%)  283689627  BranchProbabilityAnalysis
   0.0239 (  2.1%)   0.0002 (  0.1%)   0.0241 (  1.9%)   0.0241 (  1.8%)  219122704  OpenMPOptCGSCCPass
   0.0174 (  1.6%)   0.0015 (  0.8%)   0.0189 (  1.5%)   0.0192 (  1.5%)  215583965  GVNPass
   0.0153 (  1.4%)   0.0025 (  1.4%)   0.0178 (  1.4%)   0.0187 (  1.4%)  184232295  EarlyCSEPass
   0.0079 (  0.7%)   0.0064 (  3.5%)   0.0143 (  1.1%)   0.0145 (  1.1%)  192415300  AAManager
   0.0116 (  1.0%)   0.0019 (  1.0%)   0.0134 (  1.0%)   0.0135 (  1.0%)  153354488  JumpThreadingPass
   0.0099 (  0.9%)   0.0023 (  1.3%)   0.0122 (  0.9%)   0.0131 (  1.0%)  128911185  CGProfilePass
   0.0081 (  0.7%)   0.0022 (  1.2%)   0.0103 (  0.8%)   0.0128 (  1.0%)  112266933  SLPVectorizerPass
   0.0119 (  1.1%)   0.0005 (  0.3%)   0.0124 (  1.0%)   0.0125 (  0.9%)  131510939  MemorySSAAnalysis
   0.0122 (  1.1%)   0.0002 (  0.1%)   0.0124 (  1.0%)   0.0124 (  0.9%)  129264559  DSEPass
   0.0108 (  1.0%)   0.0010 (  0.6%)   0.0118 (  0.9%)   0.0119 (  0.9%)  158891693  DominatorTreeAnalysis
   0.0116 (  1.0%)   0.0002 (  0.1%)   0.0119 (  0.9%)   0.0119 (  0.9%)  118946130  CorrelatedValuePropagationPass
   0.0082 (  0.7%)   0.0017 (  0.9%)   0.0099 (  0.8%)   0.0100 (  0.8%)  120247256  LoopAnalysis
   0.0090 (  0.8%)   0.0008 (  0.5%)   0.0099 (  0.8%)   0.0099 (  0.8%)   84784225  ADCEPass
   0.0076 (  0.7%)   0.0014 (  0.8%)   0.0090 (  0.7%)   0.0098 (  0.7%)  111411449  SROAPass
   0.0080 (  0.7%)   0.0005 (  0.3%)   0.0085 (  0.7%)   0.0085 (  0.6%)  109824455  PostDominatorTreeAnalysis
   0.0063 (  0.6%)   0.0012 (  0.7%)   0.0076 (  0.6%)   0.0079 (  0.6%)   80323239  LoopVectorizePass
   0.0068 (  0.6%)   0.0003 (  0.2%)   0.0071 (  0.6%)   0.0076 (  0.6%)   60675565  LoopIdiomRecognizePass
   0.0068 (  0.6%)   0.0004 (  0.2%)   0.0072 (  0.6%)   0.0071 (  0.5%)   87177852  LICMPass
   0.0046 (  0.4%)   0.0021 (  1.1%)   0.0067 (  0.5%)   0.0069 (  0.5%)   74829034  PostOrderFunctionAttrsPass
   0.0064 (  0.6%)   0.0001 (  0.1%)   0.0065 (  0.5%)   0.0065 (  0.5%)   48619557  SCCPPass
   0.0063 (  0.6%)   0.0001 (  0.1%)   0.0064 (  0.5%)   0.0064 (  0.5%)   71987307  LoopDeletionPass
   0.0058 (  0.5%)   0.0000 (  0.0%)   0.0059 (  0.5%)   0.0059 (  0.4%)   71423762  HotColdSplittingPass
   0.0050 (  0.5%)   0.0006 (  0.3%)   0.0057 (  0.4%)   0.0056 (  0.4%)   57327860  MemCpyOptPass
   0.0043 (  0.4%)   0.0013 (  0.7%)   0.0056 (  0.4%)   0.0056 (  0.4%)   73868907  LoopSimplifyPass
   0.0054 (  0.5%)   0.0000 (  0.0%)   0.0055 (  0.4%)   0.0055 (  0.4%)   61231613  LoopUnrollPass
   0.0045 (  0.4%)   0.0009 (  0.5%)   0.0054 (  0.4%)   0.0054 (  0.4%)   63427035  LoopSinkPass
   0.0031 (  0.3%)   0.0022 (  1.2%)   0.0053 (  0.4%)   0.0053 (  0.4%)   60661182  LowerMatrixIntrinsicsPass
   0.0039 (  0.3%)   0.0003 (  0.2%)   0.0042 (  0.3%)   0.0053 (  0.4%)   37913352  GlobalOptPass
   0.0037 (  0.3%)   0.0010 (  0.6%)   0.0047 (  0.4%)   0.0050 (  0.4%)   40405305  IPSCCPPass
   0.0031 (  0.3%)   0.0014 (  0.8%)   0.0045 (  0.3%)   0.0046 (  0.3%)   76160561  BasicAA
   0.0036 (  0.3%)   0.0007 (  0.4%)   0.0043 (  0.3%)   0.0043 (  0.3%)   40024164  BDCEPass
   0.0011 (  0.1%)   0.0009 (  0.5%)   0.0020 (  0.2%)   0.0036 (  0.3%)   27093400  TargetIRAnalysis
   0.0033 (  0.3%)   0.0002 (  0.1%)   0.0035 (  0.3%)   0.0035 (  0.3%)   39935174  TailCallElimPass
   0.0026 (  0.2%)   0.0007 (  0.4%)   0.0033 (  0.3%)   0.0033 (  0.3%)   44962489  ScalarEvolutionAnalysis
   0.0028 (  0.3%)   0.0002 (  0.1%)   0.0030 (  0.2%)   0.0032 (  0.2%)   30018982  ReassociatePass
   0.0028 (  0.3%)   0.0002 (  0.1%)   0.0030 (  0.2%)   0.0032 (  0.2%)   28955128  IndVarSimplifyPass
   0.0030 (  0.3%)   0.0001 (  0.0%)   0.0031 (  0.2%)   0.0031 (  0.2%)   31205149  CalledValuePropagationPass
   0.0018 (  0.2%)   0.0004 (  0.2%)   0.0022 (  0.2%)   0.0022 (  0.2%)   22045025  Float2IntPass
   0.0020 (  0.2%)   0.0001 (  0.0%)   0.0020 (  0.2%)   0.0020 (  0.2%)   23867545  LoopLoadEliminationPass
   0.0006 (  0.1%)   0.0005 (  0.3%)   0.0011 (  0.1%)   0.0020 (  0.2%)    7821972  OpenMPOptPass
   0.0011 (  0.1%)   0.0004 (  0.2%)   0.0015 (  0.1%)   0.0017 (  0.1%)   35512421  LCSSAPass
   0.0015 (  0.1%)   0.0002 (  0.1%)   0.0017 (  0.1%)   0.0017 (  0.1%)   28268765  VectorCombinePass
   0.0009 (  0.1%)   0.0007 (  0.4%)   0.0016 (  0.1%)   0.0016 (  0.1%)   23018362  MemoryDependenceAnalysis
   0.0014 (  0.1%)   0.0000 (  0.0%)   0.0015 (  0.1%)   0.0015 (  0.1%)    9265818  GlobalDCEPass
   0.0013 (  0.1%)   0.0000 (  0.0%)   0.0013 (  0.1%)   0.0013 (  0.1%)   17548240  InstSimplifyPass
   0.0009 (  0.1%)   0.0004 (  0.2%)   0.0013 (  0.1%)   0.0013 (  0.1%)   15122797  LowerConstantIntrinsicsPass
   0.0011 (  0.1%)   0.0000 (  0.0%)   0.0011 (  0.1%)   0.0011 (  0.1%)    8506690  CallGraphAnalysis
   0.0008 (  0.1%)   0.0000 (  0.0%)   0.0009 (  0.1%)   0.0009 (  0.1%)    7505976  RequireAnalysisPass<llvm::GlobalsAA, llvm::Module, llvm::AnalysisManager<Module>>
   0.0008 (  0.1%)   0.0000 (  0.0%)   0.0009 (  0.1%)   0.0009 (  0.1%)    7485525  GlobalsAA
   0.0005 (  0.0%)   0.0002 (  0.1%)   0.0007 (  0.1%)   0.0009 (  0.1%)    9580105  LowerExpectIntrinsicPass
   0.0007 (  0.1%)   0.0001 (  0.1%)   0.0008 (  0.1%)   0.0008 (  0.1%)   12017197  LoopFullUnrollPass
   0.0006 (  0.1%)   0.0001 (  0.1%)   0.0007 (  0.1%)   0.0007 (  0.1%)   11381083  MergedLoadStoreMotionPass
   0.0004 (  0.0%)   0.0003 (  0.2%)   0.0007 (  0.1%)   0.0007 (  0.1%)   10150222  LoopDistributePass
   0.0007 (  0.1%)   0.0000 (  0.0%)   0.0007 (  0.1%)   0.0007 (  0.1%)    5649265  ReversePostOrderFunctionAttrsPass
   0.0005 (  0.0%)   0.0002 (  0.1%)   0.0007 (  0.1%)   0.0007 (  0.1%)   18702545  TargetLibraryAnalysis
   0.0006 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    9964138  LoopInstSimplifyPass
   0.0004 (  0.0%)   0.0002 (  0.1%)   0.0006 (  0.0%)   0.0006 (  0.0%)   15049482  LoopRotatePass
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)   10571955  LibCallsShrinkWrapPass
   0.0004 (  0.0%)   0.0002 (  0.1%)   0.0006 (  0.0%)   0.0006 (  0.0%)   16184249  DemandedBitsAnalysis
   0.0004 (  0.0%)   0.0001 (  0.1%)   0.0005 (  0.0%)   0.0005 (  0.0%)   11227136  FunctionAnalysisManagerCGSCCProxy
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0005 (  0.0%)   11871494  RequireAnalysisPass<llvm::OptimizationRemarkEmitterAnalysis, llvm::Function, llvm::AnalysisManager<Function>>
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0006 (  0.0%)   0.0005 (  0.0%)   16911686  LazyValueAnalysis
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    9333915  LoopSimplifyCFGPass
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0005 (  0.0%)   0.0005 (  0.0%)   13022664  AssumptionAnalysis
   0.0003 (  0.0%)   0.0001 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)    9524395  SimpleLoopUnswitchPass
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12094779  OptimizationRemarkEmitterAnalysis
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12032778  ScopedNoAliasAA
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12032220  TypeBasedAA
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8581050  CoroSplitPass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    5126709  InjectTLIMappings
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8379445  CoroElidePass
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    3890082  RecomputeGlobalsAAPass
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8281975  SpeculativeExecutionPass
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8468516  PhiValuesAnalysis
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    4100685  ConstantMergePass
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8462530  PromotePass
   0.0001 (  0.0%)   0.0001 (  0.1%)   0.0002 (  0.0%)   0.0003 (  0.0%)    8345373  InvalidateAnalysisPass<llvm::AAManager>
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8368732  ShouldNotRunFunctionPassesAnalysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    1308997  InferFunctionAttrsPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    4283689  DivRemPairsPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    4855512  WarnMissedTransformationsPass
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    1157640  LazyCallGraphAnalysis
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0001 (  0.0%)     444866  DeadArgumentEliminationPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3627306  AlignmentFromAssumptionsPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3706342  LoopAccessAnalysis
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3573986  AnnotationRemarksPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)     632159  AlwaysInlinerPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3611080  InvalidateAnalysisPass<llvm::ShouldNotRunFunctionPassesAnalysis>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      47153  EliminateAvailableExternallyPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      56285  Annotation2MetadataPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      58150  CoroEarlyPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      14016  CoroCleanupPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      13044  RelLookupTableConverterPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      13763  ProfileSummaryAnalysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12678  InlineAdvisorAnalysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12411  ForceFunctionAttrsPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12483  RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module, llvm::AnalysisManager<Module>>
   1.1105 (100.0%)   0.1815 (100.0%)   1.2920 (100.0%)   1.3187 (100.0%)  14047165388  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   1.1296 ( 94.4%)   0.4425 ( 98.2%)   1.5720 ( 95.4%)   1.6099 ( 94.9%)  16626483869  Code Generation Time
   0.0670 (  5.6%)   0.0081 (  1.8%)   0.0751 (  4.6%)   0.0858 (  5.1%)  806754444  LLVM IR Generation Time
   1.1965 (100.0%)   0.4506 (100.0%)   1.6471 (100.0%)   1.6957 (100.0%)  17433238313  Total

===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0007 seconds (0.0007 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.0006 (100.0%)   0.0000 (100.0%)   0.0007 (100.0%)   0.0007 (100.0%)    7870431  Seed Live Regs
   0.0006 (100.0%)   0.0000 (100.0%)   0.0007 (100.0%)   0.0007 (100.0%)    7870431  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1793 seconds (0.1846 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.0382 ( 24.1%)   0.0025 ( 11.8%)   0.0406 ( 22.7%)   0.0427 ( 23.1%)  449731195  DAG Combining 1
   0.0222 ( 14.0%)   0.0035 ( 16.6%)   0.0257 ( 14.3%)   0.0260 ( 14.1%)  323350124  Instruction Scheduling
   0.0207 ( 13.1%)   0.0024 ( 11.5%)   0.0231 ( 12.9%)   0.0257 ( 13.9%)  305541313  Instruction Selection
   0.0234 ( 14.8%)   0.0019 (  8.9%)   0.0252 ( 14.1%)   0.0255 ( 13.8%)  386744618  DAG Combining 2
   0.0171 ( 10.8%)   0.0026 ( 12.4%)   0.0197 ( 11.0%)   0.0199 ( 10.8%)  304585428  Instruction Creation
   0.0108 (  6.8%)   0.0019 (  9.1%)   0.0127 (  7.1%)   0.0128 (  6.9%)  213503986  DAG Legalization
   0.0107 (  6.7%)   0.0019 (  9.3%)   0.0126 (  7.0%)   0.0124 (  6.7%)  217202416  Type Legalization
   0.0089 (  5.6%)   0.0003 (  1.7%)   0.0093 (  5.2%)   0.0092 (  5.0%)   98375640  DAG Combining after legalize types
   0.0041 (  2.6%)   0.0020 (  9.3%)   0.0061 (  3.4%)   0.0061 (  3.3%)  175213222  Instruction Scheduling Cleanup
   0.0023 (  1.5%)   0.0020 (  9.4%)   0.0043 (  2.4%)   0.0043 (  2.4%)  143306060  Vector Legalization
   0.1584 (100.0%)   0.0209 (100.0%)   0.1793 (100.0%)   0.1846 (100.0%)  2617554002  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.8706 seconds (0.8844 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.2523 ( 41.0%)   0.1142 ( 44.8%)   0.3665 ( 42.1%)   0.3729 ( 42.2%)  3751975511  AArch64 Instruction Selection
   0.0769 ( 12.5%)   0.1178 ( 46.2%)   0.1947 ( 22.4%)   0.1954 ( 22.1%)  2284494832  AArch64 Assembly Printer
   0.0199 (  3.2%)   0.0006 (  0.2%)   0.0205 (  2.4%)   0.0205 (  2.3%)  208860244  Greedy Register Allocator
   0.0169 (  2.8%)   0.0002 (  0.1%)   0.0172 (  2.0%)   0.0171 (  1.9%)  247073374  Live Variable Analysis
   0.0129 (  2.1%)   0.0003 (  0.1%)   0.0132 (  1.5%)   0.0139 (  1.6%)  165651494  CodeGen Prepare
   0.0133 (  2.2%)   0.0003 (  0.1%)   0.0136 (  1.6%)   0.0139 (  1.6%)  153339584  Machine Instruction Scheduler
   0.0105 (  1.7%)   0.0001 (  0.0%)   0.0106 (  1.2%)   0.0106 (  1.2%)  122934084  AArch64 load / store optimization pass
   0.0084 (  1.4%)   0.0003 (  0.1%)   0.0087 (  1.0%)   0.0091 (  1.0%)   81985504  Simple Register Coalescing
   0.0082 (  1.3%)   0.0004 (  0.2%)   0.0086 (  1.0%)   0.0086 (  1.0%)   76550569  Live Interval Analysis
   0.0078 (  1.3%)   0.0003 (  0.1%)   0.0081 (  0.9%)   0.0083 (  0.9%)  103543246  Loop Strength Reduction
   0.0077 (  1.3%)   0.0002 (  0.1%)   0.0079 (  0.9%)   0.0079 (  0.9%)   76599592  Prologue/Epilogue Insertion & Frame Finalization
   0.0064 (  1.0%)   0.0005 (  0.2%)   0.0069 (  0.8%)   0.0077 (  0.9%)   65721168  Merge disjoint stack slots
   0.0067 (  1.1%)  …
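
The pass rows in the report above share one fixed layout (four `seconds (percent)` pairs, an instruction count, then the pass name), so they can be ranked mechanically. A minimal sketch, assuming the column layout seen in this particular output; `parse_pass_timings` and `PassTiming` are hypothetical names, not part of any LLVM tooling:

```python
import re
from typing import List, NamedTuple

# Assumed row layout, taken from the report above:
#   user (pct)  system (pct)  user+system (pct)  wall (pct)  instr  name
ROW = re.compile(
    r"^\s*"
    r"(?P<user>\d+\.\d+)\s+\(\s*[\d.]+%\)\s+"
    r"(?P<system>\d+\.\d+)\s+\(\s*[\d.]+%\)\s+"
    r"(?P<total>\d+\.\d+)\s+\(\s*[\d.]+%\)\s+"
    r"(?P<wall>\d+\.\d+)\s+\(\s*[\d.]+%\)\s+"
    r"(?P<instr>\d+)\s+"
    r"(?P<name>.+?)\s*$"
)


class PassTiming(NamedTuple):
    name: str
    wall_s: float
    instructions: int


def parse_pass_timings(report: str) -> List[PassTiming]:
    """Return the timing rows of a report, slowest wall time first."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line)
        # Header and separator lines do not match; "Total" is a summary row.
        if match and match.group("name") != "Total":
            rows.append(
                PassTiming(
                    name=match.group("name"),
                    wall_s=float(match.group("wall")),
                    instructions=int(match.group("instr")),
                )
            )
    return sorted(rows, key=lambda row: row.wall_s, reverse=True)


# Three rows copied from the report above, plus its header line.
SAMPLE = """\
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.0619 (  5.6%)   0.0073 (  4.0%)   0.0692 (  5.4%)   0.0711 (  5.4%)  868136227  InstCombinePass
   0.3070 ( 27.6%)   0.0547 ( 30.2%)   0.3617 ( 28.0%)   0.3654 ( 27.7%)  3719690895  ModuleInlinerWrapperPass
   1.1105 (100.0%)   0.1815 (100.0%)   1.2920 (100.0%)   1.3187 (100.0%)  14047165388  Total
"""

if __name__ == "__main__":
    for row in parse_pass_timings(SAMPLE):
        print(f"{row.wall_s:.4f}s  {row.name}")
```

Builds whose report lacks the `Instr` column would need the pattern adjusted.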
pytorchmergebot pushed a commit that referenced this issue Dec 5, 2023
… to hang (#115124)

Let's see if it helps #114913

The upstream LLVM issues are llvm/llvm-project#55530 and llvm/llvm-project#69369. In my CI test, I saw the following process hang:

```
/pytorch/pytorch/.lintbin/clang-tidy -p=/pytorch/pytorch/build --extra-arg -I/usr/lib/llvm-11/include/openmp --extra-arg -I/opt/conda/envs/py_3.9/include/python3.9 --extra-arg -I/pytorch/pytorch/third_party/pybind11/include --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward --extra-arg -I/usr/lib/llvm-14/lib/clang/14.0.0/include --extra-arg -I/usr/local/include --extra-arg -I/usr/include/x86_64-linux-gnu --extra-arg -I/usr/include /pytorch/pytorch/torch/csrc/autograd/python_nested_functions_manual.cpp
```

and the core dump matches the description in llvm/llvm-project#69369, showing the process stuck in `clang::tidy::bugprone::UncheckedOptionalAccessCheck::check`:

```
#0  0x00000000030c7420 in clang::dataflow::WatchedLiteralsSolverImpl::updateWatchedLiterals() ()
#1  0x00000000030c6c2a in clang::dataflow::WatchedLiteralsSolverImpl::solve() && ()
#2  0x00000000030c6572 in clang::dataflow::WatchedLiteralsSolver::solve(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#3  0x00000000030b3bd3 in clang::dataflow::DataflowAnalysisContext::querySolver(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#4  0x00000000030b3ca5 in clang::dataflow::DataflowAnalysisContext::flowConditionImplies(clang::dataflow::AtomicBoolValue&, clang::dataflow::BoolValue&) ()
#5  0x00000000030b1213 in clang::dataflow::(anonymous namespace)::diagnoseUnwrapCall(clang::Expr const*, clang::Expr const*, clang::dataflow::Environment const&) ()
#6  0x00000000030b1357 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::(anonymous namespace)::buildDiagnoseMatchSwitch(clang::dataflow::UncheckedOptionalAccessModelOptions const&)::$_7>::_M_invoke(std::_Any_data const&, clang::CallExpr const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#7  0x00000000030b1292 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::CaseOf<clang::CallExpr>(clang::ast_matchers::internal::Matcher<clang::Stmt>, std::function<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)>) &&::{lambda(clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#8  0x00000000030b1995 in clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}::operator()(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) const ()
#9  0x00000000030b170c in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) ()
#10 0x00000000030a7c27 in clang::dataflow::UncheckedOptionalAccessDiagnoser::diagnose(clang::ASTContext&, clang::Stmt const*, clang::dataflow::Environment const&) ()
#11 0x0000000002931286 in std::_Function_handler<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&), clang::tidy::bugprone::analyzeFunction(clang::FunctionDecl const&, clang::ASTContext&)::$_0>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&) ()
#12 0x0000000002930b41 in clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>)::{lambda(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)#1}::operator()(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&) const ()
#13 0x00000000030c18cc in std::_Function_handler<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&), clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>)::$_1>::_M_invoke(std::_Any_data const&, clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&) ()
#14 0x00000000030bf069 in clang::dataflow::transferBlock(clang::dataflow::ControlFlowContext const&, std::vector<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>, std::allocator<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState> > >&, clang::CFGBlock const&, clang::dataflow::Environment const&, clang::dataflow::TypeErasedDataflowAnalysis&, std::function<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#15 0x00000000030bfaa5 in clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#16 0x00000000029301b3 in llvm::Expected<std::vector<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> >, std::allocator<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> > > > > clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>) ()
#17 0x000000000292fbe8 in clang::tidy::bugprone::UncheckedOptionalAccessCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) ()
#18 0x00000000022e1572 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::MatchVisitor::visitMatch(clang::ast_matchers::BoundNodes const&) ()
#19 0x0000000002797a1c in clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) ()
#20 0x00000000022e0dc6 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::matchWithFilter(clang::DynTypedNode const&) ()
#21 0x00000000022e3b57 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#22 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#23 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#24 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#25 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#26 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#27 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#28 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#29 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#30 0x00000000022e8791 in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#31 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#32 0x00000000022c017a in clang::ast_matchers::MatchFinder::matchAST(clang::ASTContext&) ()
#33 0x000000000370ad3c in clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) ()
#34 0x00000000038ed4bb in clang::ParseAST(clang::Sema&, bool, bool) ()
#35 0x000000000369eda7 in clang::FrontendAction::Execute() ()
#36 0x000000000360d3f6 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#37 0x00000000027c475c in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#38 0x00000000022ad486 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef)::ActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#39 0x00000000027c44c6 in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) ()
#40 0x00000000027c360b in clang::tooling::ToolInvocation::run() ()
#41 0x00000000027c5bb1 in clang::tooling::ClangTool::run(clang::tooling::ToolAction*) ()
#42 0x00000000022a90c7 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef) ()
#43 0x0000000001ebc7f2 in clang::tidy::clangTidyMain(int, char const**) ()
#44 0x0000000004c54ba0 in __libc_start_main ()
#45 0x0000000001eb76ae in _start ()
```

Another note: clang-tidy is CPU-bound, so we could consider running the lintrunner job on 4xlarge if needed.
Pull Request resolved: #115124
Approved by: https://github.com/kit1980, https://github.com/Skylion007, https://github.com/malfet
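
Independently of the change in the commit above, a wall-clock limit is one generic way to stop a stuck linter process from stalling a whole job. A minimal sketch, assuming a plain `subprocess` wrapper; `run_with_timeout` is a hypothetical helper and not how lintrunner invokes clang-tidy:

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it ran longer than timeout_s."""
    try:
        # On timeout, subprocess.run kills the child, waits for it,
        # and then re-raises TimeoutExpired.
        completed = subprocess.run(cmd, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a hung linter: a child that sleeps far past the limit.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    if run_with_timeout(hung, timeout_s=0.5) is None:
        print("timed out")
```

A caller could treat `None` as a lint failure and report which file was being analyzed, which makes a hang like the one in the backtrace above easier to attribute.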
hyperfraise pushed a commit to hyperfraise/pytorch that referenced this issue Dec 21, 2023
… to hang (pytorch#115124)

Let's see if it helps pytorch#114913

The upstream LLVM issues are llvm/llvm-project#55530 and llvm/llvm-project#69369. In my CI test, I saw the following process hang:

```
/pytorch/pytorch/.lintbin/clang-tidy -p=/pytorch/pytorch/build --extra-arg -I/usr/lib/llvm-11/include/openmp --extra-arg -I/opt/conda/envs/py_3.9/include/python3.9 --extra-arg -I/pytorch/pytorch/third_party/pybind11/include --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward --extra-arg -I/usr/lib/llvm-14/lib/clang/14.0.0/include --extra-arg -I/usr/local/include --extra-arg -I/usr/include/x86_64-linux-gnu --extra-arg -I/usr/include /pytorch/pytorch/torch/csrc/autograd/python_nested_functions_manual.cpp
```

and the core dump matches the description in llvm/llvm-project#69369, showing the process stuck in `clang::tidy::bugprone::UncheckedOptionalAccessCheck::check`:

```
#0  0x00000000030c7420 in clang::dataflow::WatchedLiteralsSolverImpl::updateWatchedLiterals() ()
#1  0x00000000030c6c2a in clang::dataflow::WatchedLiteralsSolverImpl::solve() && ()
#2  0x00000000030c6572 in clang::dataflow::WatchedLiteralsSolver::solve(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#3  0x00000000030b3bd3 in clang::dataflow::DataflowAnalysisContext::querySolver(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#4  0x00000000030b3ca5 in clang::dataflow::DataflowAnalysisContext::flowConditionImplies(clang::dataflow::AtomicBoolValue&, clang::dataflow::BoolValue&) ()
#5  0x00000000030b1213 in clang::dataflow::(anonymous namespace)::diagnoseUnwrapCall(clang::Expr const*, clang::Expr const*, clang::dataflow::Environment const&) ()
#6  0x00000000030b1357 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::(anonymous namespace)::buildDiagnoseMatchSwitch(clang::dataflow::UncheckedOptionalAccessModelOptions const&)::$_7>::_M_invoke(std::_Any_data const&, clang::CallExpr const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#7  0x00000000030b1292 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::CaseOf<clang::CallExpr>(clang::ast_matchers::internal::Matcher<clang::Stmt>, std::function<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)>) &&::{lambda(clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#8  0x00000000030b1995 in clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}::operator()(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) const ()
#9  0x00000000030b170c in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) ()
#10 0x00000000030a7c27 in clang::dataflow::UncheckedOptionalAccessDiagnoser::diagnose(clang::ASTContext&, clang::Stmt const*, clang::dataflow::Environment const&) ()
#11 0x0000000002931286 in std::_Function_handler<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&), clang::tidy::bugprone::analyzeFunction(clang::FunctionDecl const&, clang::ASTContext&)::$_0>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&) ()
#12 0x0000000002930b41 in clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>)::{lambda(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)#1}::operator()(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&) const ()
pytorch#13 0x00000000030c18cc in std::_Function_handler<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&), clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>)::$_1>::_M_invoke(std::_Any_data const&, clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&) ()
pytorch#14 0x00000000030bf069 in clang::dataflow::transferBlock(clang::dataflow::ControlFlowContext const&, std::vector<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>, std::allocator<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState> > >&, clang::CFGBlock const&, clang::dataflow::Environment const&, clang::dataflow::TypeErasedDataflowAnalysis&, std::function<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
pytorch#15 0x00000000030bfaa5 in clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
pytorch#16 0x00000000029301b3 in llvm::Expected<std::vector<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> >, std::allocator<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> > > > > clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>) ()
pytorch#17 0x000000000292fbe8 in clang::tidy::bugprone::UncheckedOptionalAccessCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) ()
pytorch#18 0x00000000022e1572 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::MatchVisitor::visitMatch(clang::ast_matchers::BoundNodes const&) ()
pytorch#19 0x0000000002797a1c in clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) ()
pytorch#20 0x00000000022e0dc6 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::matchWithFilter(clang::DynTypedNode const&) ()
pytorch#21 0x00000000022e3b57 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
pytorch#22 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
pytorch#23 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
pytorch#24 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
pytorch#25 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
pytorch#26 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
pytorch#27 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
pytorch#28 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
pytorch#29 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
pytorch#30 0x00000000022e8791 in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
pytorch#31 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
pytorch#32 0x00000000022c017a in clang::ast_matchers::MatchFinder::matchAST(clang::ASTContext&) ()
pytorch#33 0x000000000370ad3c in clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) ()
pytorch#34 0x00000000038ed4bb in clang::ParseAST(clang::Sema&, bool, bool) ()
pytorch#35 0x000000000369eda7 in clang::FrontendAction::Execute() ()
pytorch#36 0x000000000360d3f6 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
pytorch#37 0x00000000027c475c in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
pytorch#38 0x00000000022ad486 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef)::ActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
pytorch#39 0x00000000027c44c6 in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) ()
pytorch#40 0x00000000027c360b in clang::tooling::ToolInvocation::run() ()
pytorch#41 0x00000000027c5bb1 in clang::tooling::ClangTool::run(clang::tooling::ToolAction*) ()
pytorch#42 0x00000000022a90c7 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef) ()
pytorch#43 0x0000000001ebc7f2 in clang::tidy::clangTidyMain(int, char const**) ()
pytorch#44 0x0000000004c54ba0 in __libc_start_main ()
pytorch#45 0x0000000001eb76ae in _start ()
```

Another note: clang-tidy is CPU-bound, so we could consider running the lintrunner job on a 4xlarge runner if needed.
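Because a hung clang-tidy process can stall the whole lint job, one defensive option is to give each invocation a wall-clock timeout. The following is only a minimal sketch of that idea, not what lintrunner or this PR does; `run_with_timeout` and `LintResult` are hypothetical names introduced for illustration.

```python
import subprocess
import sys
from typing import List, NamedTuple, Optional


class LintResult(NamedTuple):
    timed_out: bool            # True if the command exceeded the timeout
    returncode: Optional[int]  # None when the command was killed on timeout
    output: str                # combined stdout and stderr


def run_with_timeout(cmd: List[str], timeout_s: float) -> LintResult:
    """Run `cmd`, killing it if it runs longer than `timeout_s` seconds."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return LintResult(True, None, "")
    return LintResult(False, proc.returncode, proc.stdout + proc.stderr)


if __name__ == "__main__":
    # A sleeping Python process stands in for a stuck linter (assumption).
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(stuck, 0.5).timed_out)
```

A timeout like this only bounds the damage; it does not fix the underlying solver loop tracked in the LLVM issues.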
Pull Request resolved: pytorch#115124
Approved by: https://github.com/kit1980, https://github.com/Skylion007, https://github.com/malfet
hyperfraise pushed a commit to hyperfraise/pytorch that referenced this issue Dec 21, 2023
malfet added a commit that referenced this issue Jan 7, 2024
As [`newFunctionWithName:`](https://developer.apple.com/documentation/metal/mtllibrary/1515524-newfunctionwithname) does not accept an error argument, do not attempt to print the error: it is guaranteed to be `nil` at that point, which results in a classic null pointer dereference when `TORCH_CHECK` attempts to construct a `std::string` from it.
See the backtrace below for an example:
```
 thread #1, queue = 'metal gpu stream', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000018a316dc4 libsystem_platform.dylib`_platform_strlen + 4
    frame #1: 0x00000001471011bc libtorch_cpu.dylib`std::__1::__constexpr_strlen[abi:v160006](__str=0x0000000000000000) at cstring:114:10
    frame #2: 0x0000000147100c24 libtorch_cpu.dylib`std::__1::char_traits<char>::length(__s=0x0000000000000000) at char_traits.h:220:12
  * frame #3: 0x0000000147100bf0 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::operator<<[abi:v160006]<std::__1::char_traits<char>>(__os=0x000000016fdfb3a0, __str=0x0000000000000000) at ostream:901:57
    frame #4: 0x0000000147100bb4 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb5d0) at StringUtil.h:55:6
    frame #5: 0x00000001471007ac libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #6: 0x0000000147101444 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #7: 0x0000000147101404 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #8: 0x000000014710137c libtorch_cpu.dylib`c10::detail::_str_wrapper<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, char const*, char const* const&>::call(args=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:75:5
    frame #9: 0x0000000147101310 libtorch_cpu.dylib`decltype(auto) c10::str<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>(args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at StringUtil.h:111:10
    frame #10: 0x0000000147100210 libtorch_cpu.dylib`decltype(auto) c10::detail::torchCheckMsgImpl<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>((null)="Expected indexFunction to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)", args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at Exception.h:453:10
    frame #11: 0x00000001470fffe8 libtorch_cpu.dylib`at::mps::MPSDevice::metalIndexingPSO(this=0x0000600000381670, kernel="index_select_32bit_idx32") at MPSDevice.mm:62:3
```
pytorchmergebot pushed a commit that referenced this issue Jan 7, 2024
As [`newFunctionWithName:`](https://developer.apple.com/documentation/metal/mtllibrary/1515524-newfunctionwithname) does not accept an error argument, do not attempt to print the error: it is guaranteed to be `nil` at that point, which results in a classic null pointer dereference when `TORCH_CHECK` attempts to construct a `std::string` from it. See the backtrace below for an example:
```
 thread #1, queue = 'metal gpu stream', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000018a316dc4 libsystem_platform.dylib`_platform_strlen + 4
    frame #1: 0x00000001471011bc libtorch_cpu.dylib`std::__1::__constexpr_strlen[abi:v160006](__str=0x0000000000000000) at cstring:114:10
    frame #2: 0x0000000147100c24 libtorch_cpu.dylib`std::__1::char_traits<char>::length(__s=0x0000000000000000) at char_traits.h:220:12
  * frame #3: 0x0000000147100bf0 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::operator<<[abi:v160006]<std::__1::char_traits<char>>(__os=0x000000016fdfb3a0, __str=0x0000000000000000) at ostream:901:57
    frame #4: 0x0000000147100bb4 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb5d0) at StringUtil.h:55:6
    frame #5: 0x00000001471007ac libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #6: 0x0000000147101444 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #7: 0x0000000147101404 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #8: 0x000000014710137c libtorch_cpu.dylib`c10::detail::_str_wrapper<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, char const*, char const* const&>::call(args=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:75:5
    frame #9: 0x0000000147101310 libtorch_cpu.dylib`decltype(auto) c10::str<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>(args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at StringUtil.h:111:10
    frame #10: 0x0000000147100210 libtorch_cpu.dylib`decltype(auto) c10::detail::torchCheckMsgImpl<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>((null)="Expected indexFunction to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)", args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at Exception.h:453:10
    frame #11: 0x00000001470fffe8 libtorch_cpu.dylib`at::mps::MPSDevice::metalIndexingPSO(this=0x0000600000381670, kernel="index_select_32bit_idx32") at MPSDevice.mm:62:3
```

This was introduced by #99855, which replaced `newFunctionWithName:constantValues:error:` with `newFunctionWithName:`.
Pull Request resolved: #116938
Approved by: https://github.com/Skylion007
pytorch-bot bot pushed a commit that referenced this issue Jan 9, 2024
pytorch-bot bot pushed a commit that referenced this issue Feb 1, 2024
pytorchmergebot pushed a commit that referenced this issue Feb 2, 2024
Users may not know which line of code called a collective in a big code base. When debugging, we can print the combined Python/C++ stack trace in case the user calls ``ProcessGroup.reduce`` instead of ``torch.distributed.reduce``.

```
LOG(INFO) << "ProcessGroupNCCL::_allgather_base stacktrace: "
                       << get_python_cpp_trace();
```
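For comparison, the Python half of this idea can be sketched with the standard library alone. This is not the `get_python_cpp_trace` helper from the commit (which also captures C++ frames); `caller_frames`, `fake_collective`, and `training_step` are hypothetical names used only to show how a low-level call can report which Python line reached it.

```python
import traceback
from typing import List


def caller_frames() -> List[str]:
    """Return 'function from file:line' strings, innermost caller first."""
    # extract_stack() lists frames oldest first; the last entry is this
    # helper itself, so drop it before reversing.
    stack = traceback.extract_stack()[:-1]
    return [f"{f.name} from {f.filename}:{f.lineno}" for f in reversed(stack)]


def fake_collective() -> List[str]:
    # Hypothetical stand-in for a low-level call such as ProcessGroup.reduce.
    return caller_frames()


def training_step() -> List[str]:
    # User code whose call site we want to find while debugging.
    return fake_collective()


if __name__ == "__main__":
    for i, line in enumerate(training_step()):
        print(f"#{i} {line}")
```

Each printed line has the same `name from file:line` shape as the Python frames in the trace shown for ``_allgather_base``.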

Output, using ``_allgather_base`` as an example. One example of the Python part of the trace is ``all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838``:
```
ProcessGroupNCCL::_allgather_base stacktrace: #0 torch::unwind::unwind() from ??:0
#1 torch::CapturedTraceback::gather(bool, bool, bool) from ??:0
#2 c10d::get_python_cpp_trace[abi:cxx11]() from :0
#3 c10d::ProcessGroupNCCL::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from ??:0
#4 c10d::ops::(anonymous namespace)::_allgather_base_CUDA(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long) from Ops.cpp:0
#5 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > > (*)(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long), std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > >, c10::guts::typelist::typelist<at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from :0
#6 torch::autograd::basicAutogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
#7 c10d::ProcessGroup::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from :0
#8 pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, 
c10d::AllgatherOptions const&)#1}&&, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (*)(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from :0
#9 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
#10 cfunction_call from /usr/local/src/conda/python-3.10.12/Objects/methodobject.c:543
#11 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#13 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#14 all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838
#15 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#16 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#17 wrapper from /data/users/weif/pytorch/torch/distributed/c10d_logger.py:75
#18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#20 _all_gather_flat_param from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1399
#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#23 unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1308
#24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#26 _unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:332
#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#29 _pre_forward_unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:448
#30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#32 _pre_forward from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:413
#33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#35 forward from /data/users/weif/pytorch/torch/distributed/fsdp/fully_sharded_data_parallel.py:839
#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#37 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#38 _call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1520
#39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#40 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#41 _wrapped_call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1511
#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#43 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.12/Objects/call.c:431
#44 slot_tp_call from /usr/local/src/conda/python-3.10.12/Objects/typeobject.c:7494
#45 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#46 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#47 inner from /data/users/weif/pytorch/run_fsdp.py:72
#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#50 run from /data/users/weif/pytorch/run_fsdp.py:76
#51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#53 main from /data/users/weif/pytorch/run_fsdp.py:133
#54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#56 <module> from /data/users/weif/pytorch/run_fsdp.py:137
#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#58 PyEval_EvalCode from /usr/local/src/conda/python-3.10.12/Python/ceval.c:1134
#59 run_eval_code_obj from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1291
#60 run_mod from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1312
#61 pyrun_file from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1208
#62 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:456
#63 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:90
#64 pymain_run_file_obj from /usr/local/src/conda/python-3.10.12/Modules/main.c:357
#65 Py_BytesMain from /usr/local/src/conda/python-3.10.12/Modules/main.c:1090
#66 __libc_start_call_main from ??:0
#67 <unwind unsupported> from ??:0
```

Pull Request resolved: #118924
Approved by: https://github.com/kwen2501
pytorch-bot bot pushed a commit that referenced this issue Feb 8, 2024
In a big code base, users may not know which line of code called a collective. When debugging, we can print the combined Python/C++ stack trace, which also covers the case where the user calls ``ProcessGroup.reduce`` directly instead of ``torch.distributed.reduce``.

```
LOG(INFO) << "ProcessGroupNCCL::_allgather_base stacktrace: "
                       << get_python_cpp_trace();
```

Output (using ``_allgather_base`` as an example). One frame from the Python part of the trace is ``all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838``:
```
ProcessGroupNCCL::_allgather_base stacktrace: #0 torch::unwind::unwind() from ??:0
#1 torch::CapturedTraceback::gather(bool, bool, bool) from ??:0
#2 c10d::get_python_cpp_trace[abi:cxx11]() from :0
#3 c10d::ProcessGroupNCCL::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from ??:0
#4 c10d::ops::(anonymous namespace)::_allgather_base_CUDA(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long) from Ops.cpp:0
#5 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > > (*)(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long), std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > >, c10::guts::typelist::typelist<at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from :0
#6 torch::autograd::basicAutogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
#7 c10d::ProcessGroup::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from :0
#8 pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, 
c10d::AllgatherOptions const&)#1}&&, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (*)(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from :0
#9 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
#10 cfunction_call from /usr/local/src/conda/python-3.10.12/Objects/methodobject.c:543
#11 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#13 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#14 all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838
#15 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#16 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#17 wrapper from /data/users/weif/pytorch/torch/distributed/c10d_logger.py:75
#18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#20 _all_gather_flat_param from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1399
#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#23 unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1308
#24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#26 _unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:332
#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#29 _pre_forward_unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:448
#30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#32 _pre_forward from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:413
#33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#35 forward from /data/users/weif/pytorch/torch/distributed/fsdp/fully_sharded_data_parallel.py:839
#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#37 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#38 _call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1520
#39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#40 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#41 _wrapped_call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1511
#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#43 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.12/Objects/call.c:431
#44 slot_tp_call from /usr/local/src/conda/python-3.10.12/Objects/typeobject.c:7494
#45 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#46 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#47 inner from /data/users/weif/pytorch/run_fsdp.py:72
#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#50 run from /data/users/weif/pytorch/run_fsdp.py:76
#51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#53 main from /data/users/weif/pytorch/run_fsdp.py:133
#54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#56 <module> from /data/users/weif/pytorch/run_fsdp.py:137
#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#58 PyEval_EvalCode from /usr/local/src/conda/python-3.10.12/Python/ceval.c:1134
#59 run_eval_code_obj from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1291
#60 run_mod from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1312
#61 pyrun_file from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1208
#62 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:456
#63 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:90
#64 pymain_run_file_obj from /usr/local/src/conda/python-3.10.12/Modules/main.c:357
#65 Py_BytesMain from /usr/local/src/conda/python-3.10.12/Modules/main.c:1090
#66 __libc_start_call_main from ??:0
#67 <unwind unsupported> from ??:0
```
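
In a trace like the one above, the Python-level frames are interleaved with CPython interpreter frames, so it can help to filter them out when looking for the calling line. A minimal sketch, assuming each frame is on its own line and ends with ` from <path>:<line>` as in the output above; `python_frames` is a hypothetical helper, not part of PyTorch:

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a PyTorch API): keep only the frames whose
// source location is a Python file, i.e. lines ending in "<path>.py:<line>".
std::vector<std::string> python_frames(const std::string& trace) {
  // Matches " from <non-space path>.py:<digits>" at the end of a line.
  static const std::regex py_location(R"( from \S+\.py:\d+\s*$)");
  std::vector<std::string> frames;
  std::istringstream lines(trace);
  std::string line;
  while (std::getline(lines, line)) {
    if (std::regex_search(line, py_location)) {
      frames.push_back(line);
    }
  }
  return frames;
}
```

Applied to the trace above, this would keep frames such as `all_gather_into_tensor`, `wrapper` and `_all_gather_flat_param`, and drop the `_PyEval_EvalFrame` / `_PyObject_VectorcallTstate` interpreter frames and the C++ frames.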

Pull Request resolved: #118924
Approved by: https://github.com/kwen2501
chsivic pushed a commit to chsivic/pytorch that referenced this issue Apr 16, 2024
Summary:
Since D8266344, the caffe2/utils threadpool implementation used to set the thread name:
https://www.internalfb.com/code/fbsource/[3ba3d30d6841]/xplat/caffe2/caffe2/utils/threadpool/WorkersPool.h?lines=271-273

But we no longer use caffe2's own implementation (since D21232894?). We use the third-party threadpool instead, which doesn't set the thread name.

This diff achieves the same effect as D8266344, so that we can tell which threads are PyTorch threads in a Perfetto trace.

The idea comes from https://stackoverflow.com/questions/32375034/how-to-obtain-thread-name-in-android-ndk and folly ThreadName
https://www.internalfb.com/code/fbsource/[3ba3d30d6841]/xplat/folly/system/ThreadName.cpp?lines=30-41
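
For readers without access to the internal links, the general technique looks roughly like the sketch below. It is not the code from this diff: the helper names are hypothetical, and it assumes Linux or Android, where `pthread_setname_np(pthread_t, const char*)` is available and thread names are limited to 16 bytes including the terminating NUL.

```cpp
#include <pthread.h>

#include <string>
#include <thread>

// Hypothetical helper (not the code from this diff): name the calling thread.
// Names longer than 15 characters are truncated, because Linux rejects
// anything that does not fit in 16 bytes including the terminating NUL.
bool set_current_thread_name(const std::string& name) {
  const std::string truncated = name.substr(0, 15);
  return pthread_setname_np(pthread_self(), truncated.c_str()) == 0;
}

// Hypothetical helper: read the calling thread's name back.
std::string get_current_thread_name() {
  char buf[16] = {0};
  if (pthread_getname_np(pthread_self(), buf, sizeof(buf)) != 0) {
    return "";
  }
  return buf;
}

// Start a worker, name it from inside the thread, and return the name
// the worker observes for itself.
std::string run_named_worker(const std::string& name) {
  std::string observed;
  std::thread worker([&] {
    set_current_thread_name(name);
    observed = get_current_thread_name();
  });
  worker.join();
  return observed;
}
```

Calling `run_named_worker("c10pthreadpool")` names only the worker thread, which is the name a tool such as `ps -T` would then show in its CMD column for that thread.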

I'm not sure if this is the right place to put this change.


BTW, the PyTorch thread pool's caller thread is worker #0:

https://www.internalfb.com/code/fbsource/[3ba3d30d6841281c140db1c8bd2f85ede310a01b]/xplat/third-party/pthreadpool/pthreadpool/src/pthreads.c?lines=289-292


Test Plan:
## Before

```
--num_cpu_threads 2 --num_pytorch_threads -1     # default to size equal to 4 cpu cores
mos:/ $ ps -T -p `pidof transcribe_bin`
USER            PID   TID   PPID     VSZ    RSS WCHAN            ADDR S CMD
shell          8985  8985   8983  118576  47688 hrtimer_n+          0 S transcribe_bin        <-- main thread
shell          8985  8986   8983  118576  47688 0                   0 R transcribe_bin         <-- pytorch thread #1
shell          8985  8987   8983  118576  47688 0                   0 R transcribe_bin         <-- pytorch thread #2
shell          8985  8988   8983  118576  47688 0                   0 R transcribe_bin         <-- pytorch thread #3
shell          8985  8989   8983  118576  47688 0                   0 R CPUThreadPool0
shell          8985  8990   8983  118576  47688 futex_wai+          0 S CPUThreadPool1
shell          8985  8991   8983  118576  47688 ep_poll             0 S IOThreadPool0
shell          8985  8992   8983  118576  47688 futex_wai+          0 S FutureTimekeepr
shell          8985  8993   8983  118576  47688 pipe_wait           0 S snapshot_thread
shell          8985  8994   8983  118576  47688 hrtimer_n+          0 S snapshot_thread
shell          8985  8997   8983  118576  47688 futex_wai+          0 S AsyncDataQueue
```

## After
```
--num_cpu_threads 2 --num_pytorch_threads -1
mos:/ $ ps -T -p `pidof transcribe_bin`
USER            PID   TID   PPID     VSZ    RSS WCHAN            ADDR S CMD
shell         11901 11901  11899  118128  40748 futex_wai+          0 S transcribe_bin         <-- main thread serves as pytorch thread #0
shell         11901 11902  11899  118132  40748 futex_wai+          0 S c10pthreadpool         <-- pytorch thread #1
shell         11901 11903  11899  118132  40748 futex_wai+          0 S c10pthreadpool         <-- pytorch thread #2
shell         11901 11904  11899  118132  40748 futex_wai+          0 S c10pthreadpool         <-- pytorch thread #3
shell         11901 11905  11899  118152  40752 futex_wai+          0 S CPUThreadPool0
shell         11901 11906  11899  118148  40752 0                   0 R CPUThreadPool1
shell         11901 11907  11899  118148  40756 ep_poll             0 S IOThreadPool0
shell         11901 11908  11899  118152  40756 futex_wai+          0 S FutureTimekeepr
shell         11901 11909  11899  118164  40756 pipe_wait           0 S snapshot_thread
shell         11901 11910  11899  118168  40756 hrtimer_n+          0 S snapshot_thread
shell         11901 11913  11899  118160  40760 futex_wai+          0 S AsyncDataQueue
```

Example Perfetto trace:

 {F1483727859} 
It looks like the PyTorch thread pool was originally created with 4 threads during ASR loading (`loadTunaFactory`), and later recreated with 3 threads during inference.

Differential Revision: D55990584

Pulled By: chsivic