[BE] Speedup register schema compilation
For some reason, inlining an initializer list into a std::vector takes a lot of time with clang-15.
But considering that there are only a dozen or so distinct tags, creating them once and passing them as arguments to def should not affect runtime speed at all, while significantly improving compilation time.
On a Mac M1 it reduces the time needed to compile RegisterSchema.cpp from 50 to 3 seconds.
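
A minimal sketch of the pattern described above, not the actual generated code: `Tag`, `Registry`, and the schema strings are hypothetical stand-ins for the real tag enum and registration API. The first function builds a temporary vector from an initializer list at every call site; the second builds each distinct tag set once and passes it by reference.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real tag enum and registration API.
enum class Tag { core, pointwise, inplace_view };

struct Registry {
  std::vector<std::pair<std::string, std::vector<Tag>>> entries;

  // Tags are taken by const reference, so one shared vector can be
  // reused across many calls.
  void def(const std::string& schema, const std::vector<Tag>& tags = {}) {
    entries.emplace_back(schema, tags);
  }
};

// Pattern before the change: every call site constructs a temporary
// std::vector from an initializer list, which the optimizer then has to
// inline and clean up once per call.
Registry register_inline() {
  Registry r;
  r.def("add(Tensor a, Tensor b) -> Tensor", {Tag::core, Tag::pointwise});
  r.def("mul(Tensor a, Tensor b) -> Tensor", {Tag::core, Tag::pointwise});
  r.def("t_(Tensor a) -> Tensor", {Tag::inplace_view});
  return r;
}

// Pattern after the change: each distinct tag set is built once and
// passed to every def call that needs it.
Registry register_shared() {
  const std::vector<Tag> tags_0 = {Tag::core, Tag::pointwise};
  const std::vector<Tag> tags_1 = {Tag::inplace_view};
  Registry r;
  r.def("add(Tensor a, Tensor b) -> Tensor", tags_0);
  r.def("mul(Tensor a, Tensor b) -> Tensor", tags_0);
  r.def("t_(Tensor a) -> Tensor", tags_1);
  return r;
}

// Both patterns register exactly the same schemas and tags.
void check_equivalent() {
  assert(register_inline().entries == register_shared().entries);
}
```

With only three calls the difference is invisible; the assumption is that the cost shows up when a generated file contains thousands of such call sites in a single function.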

Before
```
% time /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/Users/nshulga/git/pytorch/pytorch/build/aten/src -I/Users/nshulga/git/pytorch/pytorch/aten/src -I/Users/nshulga/git/pytorch/pytorch/build -I/Users/nshulga/git/pytorch/pytorch -I/Users/nshulga/git/pytorch/pytorch/cmake/../third_party/benchmark/include -I/Users/nshulga/git/pytorch/pytorch/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/build/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/build/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include -I/Users/nshulga/git/pytorch/pytorch/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/../aten/src -I/Users/nshulga/git/pytorch/pytorch/torch/csrc -I/Users/nshulga/git/pytorch/pytorch/third_party/miniz-2.1.0 -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/include -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/src -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/FXdiv/include -I/Users/nshulga/git/pytorch/pytorch/c10/.. 
-I/Users/nshulga/git/pytorch/pytorch/third_party/pthreadpool/include -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/deps/clog/include -I/Users/nshulga/git/pytorch/pytorch/third_party/NNPACK/include -I/Users/nshulga/git/pytorch/pytorch/third_party/FP16/include -I/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include -I/Users/nshulga/git/pytorch/pytorch/third_party/flatbuffers/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/nshulga/git/pytorch/pytorch/third_party/protobuf/src -isystem /Users/nshulga/git/pytorch/pytorch/third_party/XNNPACK/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/eigen -isystem /Users/nshulga/git/pytorch/pytorch/build/include  -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces 
-Qunused-arguments -fcolor-diagnostics -faligned-new -Werror -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces -O3 -DNDEBUG -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -fvisibility=hidden -O2 -Wmissing-prototypes -Werror=missing-prototypes -Xpreprocessor -fopenmp -I/Users/nshulga/miniforge3/include -std=gnu++17 -Wno-missing-prototypes -Wno-error=missing-prototypes -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o -c /Users/nshulga/git/pytorch/pytorch/build/aten/src/ATen/RegisterSchema.cpp
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 131.8054 seconds (132.5540 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  43.6364 ( 33.2%)   0.0919 ( 30.1%)  43.7282 ( 33.2%)  43.9658 ( 33.2%)  536345245380  ModuleInlinerWrapperPass
  43.6291 ( 33.2%)   0.0891 ( 29.2%)  43.7182 ( 33.2%)  43.9549 ( 33.2%)  536264096394  DevirtSCCRepeatedPass
  42.3766 ( 32.2%)   0.0185 (  6.1%)  42.3951 ( 32.2%)  42.6198 ( 32.2%)  523040901767  GVNPass
   0.4085 (  0.3%)   0.0040 (  1.3%)   0.4125 (  0.3%)   0.4195 (  0.3%)  4106085945  SimplifyCFGPass
   0.3611 (  0.3%)   0.0115 (  3.8%)   0.3726 (  0.3%)   0.3779 (  0.3%)  4864696407  InstCombinePass
   0.1607 (  0.1%)   0.0088 (  2.9%)   0.1695 (  0.1%)   0.1720 (  0.1%)  1780986175  InlinerPass
   0.0865 (  0.1%)   0.0024 (  0.8%)   0.0889 (  0.1%)   0.0914 (  0.1%)  1489982961  SROAPass
   0.0750 (  0.1%)   0.0013 (  0.4%)   0.0763 (  0.1%)   0.0764 (  0.1%)  620016338  SCCPPass
   0.0661 (  0.1%)   0.0040 (  1.3%)   0.0701 (  0.1%)   0.0735 (  0.1%)  592027163  EarlyCSEPass
   0.0554 (  0.0%)   0.0026 (  0.8%)   0.0580 (  0.0%)   0.0604 (  0.0%)  586567838  SLPVectorizerPass
   0.0468 (  0.0%)   0.0081 (  2.7%)   0.0549 (  0.0%)   0.0571 (  0.0%)  486049135  BlockFrequencyAnalysis
   0.0364 (  0.0%)   0.0059 (  1.9%)   0.0424 (  0.0%)   0.0437 (  0.0%)  366002196  BranchProbabilityAnalysis
   0.0399 (  0.0%)   0.0003 (  0.1%)   0.0401 (  0.0%)   0.0404 (  0.0%)  324932876  OpenMPOptCGSCCPass
   0.0361 (  0.0%)   0.0022 (  0.7%)   0.0383 (  0.0%)   0.0385 (  0.0%)  289493455  MemorySSAAnalysis
   0.0341 (  0.0%)   0.0017 (  0.5%)   0.0358 (  0.0%)   0.0360 (  0.0%)  202039544  ADCEPass
   0.0323 (  0.0%)   0.0023 (  0.7%)   0.0346 (  0.0%)   0.0351 (  0.0%)  279814836  CorrelatedValuePropagationPass
   0.0318 (  0.0%)   0.0005 (  0.2%)   0.0324 (  0.0%)   0.0334 (  0.0%)  302116539  DSEPass
   0.0251 (  0.0%)   0.0032 (  1.0%)   0.0283 (  0.0%)   0.0290 (  0.0%)  268768995  DominatorTreeAnalysis
   0.0275 (  0.0%)   0.0012 (  0.4%)   0.0286 (  0.0%)   0.0289 (  0.0%)  335916941  HotColdSplittingPass
   0.0251 (  0.0%)   0.0031 (  1.0%)   0.0282 (  0.0%)   0.0286 (  0.0%)  222934147  CGProfilePass
   0.0221 (  0.0%)   0.0009 (  0.3%)   0.0230 (  0.0%)   0.0255 (  0.0%)   79855412  GlobalOptPass
   0.0184 (  0.0%)   0.0019 (  0.6%)   0.0203 (  0.0%)   0.0209 (  0.0%)  205236334  JumpThreadingPass
   0.0185 (  0.0%)   0.0021 (  0.7%)   0.0206 (  0.0%)   0.0208 (  0.0%)  175318325  LoopAnalysis
   0.0164 (  0.0%)   0.0030 (  1.0%)   0.0194 (  0.0%)   0.0199 (  0.0%)  163560340  PostOrderFunctionAttrsPass
   0.0188 (  0.0%)   0.0004 (  0.1%)   0.0193 (  0.0%)   0.0194 (  0.0%)  103197563  TailCallElimPass
   0.0176 (  0.0%)   0.0015 (  0.5%)   0.0190 (  0.0%)   0.0192 (  0.0%)  130956806  MemCpyOptPass
   0.0116 (  0.0%)   0.0074 (  2.4%)   0.0190 (  0.0%)   0.0191 (  0.0%)  221717778  AAManager
   0.0163 (  0.0%)   0.0013 (  0.4%)   0.0176 (  0.0%)   0.0178 (  0.0%)  167126689  PostDominatorTreeAnalysis
   0.0155 (  0.0%)   0.0003 (  0.1%)   0.0158 (  0.0%)   0.0160 (  0.0%)  162157524  CalledValuePropagationPass
   0.0132 (  0.0%)   0.0014 (  0.5%)   0.0146 (  0.0%)   0.0159 (  0.0%)   87781235  IPSCCPPass
   0.0127 (  0.0%)   0.0008 (  0.3%)   0.0135 (  0.0%)   0.0140 (  0.0%)   91128714  ReassociatePass
   0.0101 (  0.0%)   0.0009 (  0.3%)   0.0110 (  0.0%)   0.0111 (  0.0%)   73124251  BDCEPass
   0.0072 (  0.0%)   0.0004 (  0.1%)   0.0077 (  0.0%)   0.0089 (  0.0%)   60948332  LoopIdiomRecognizePass
   0.0064 (  0.0%)   0.0014 (  0.5%)   0.0079 (  0.0%)   0.0088 (  0.0%)   80334128  LoopVectorizePass
   0.0065 (  0.0%)   0.0022 (  0.7%)   0.0087 (  0.0%)   0.0088 (  0.0%)  105525946  BasicAA
   0.0068 (  0.0%)   0.0014 (  0.5%)   0.0082 (  0.0%)   0.0083 (  0.0%)   86368700  LoopSimplifyPass
   0.0071 (  0.0%)   0.0005 (  0.2%)   0.0075 (  0.0%)   0.0077 (  0.0%)   87195315  LICMPass
   0.0052 (  0.0%)   0.0024 (  0.8%)   0.0076 (  0.0%)   0.0075 (  0.0%)   68859408  LowerMatrixIntrinsicsPass
   0.0064 (  0.0%)   0.0003 (  0.1%)   0.0067 (  0.0%)   0.0067 (  0.0%)   72021939  LoopDeletionPass
   0.0012 (  0.0%)   0.0011 (  0.4%)   0.0023 (  0.0%)   0.0065 (  0.0%)   28855092  TargetIRAnalysis
   0.0052 (  0.0%)   0.0006 (  0.2%)   0.0058 (  0.0%)   0.0058 (  0.0%)   38197861  Float2IntPass
   0.0047 (  0.0%)   0.0009 (  0.3%)   0.0056 (  0.0%)   0.0056 (  0.0%)   63722846  LoopSinkPass
   0.0055 (  0.0%)   0.0001 (  0.0%)   0.0056 (  0.0%)   0.0056 (  0.0%)   61106373  LoopUnrollPass
   0.0051 (  0.0%)   0.0002 (  0.1%)   0.0053 (  0.0%)   0.0055 (  0.0%)   60361028  VectorCombinePass
   0.0044 (  0.0%)   0.0002 (  0.1%)   0.0046 (  0.0%)   0.0049 (  0.0%)   22674564  CallGraphAnalysis
   0.0046 (  0.0%)   0.0001 (  0.0%)   0.0047 (  0.0%)   0.0049 (  0.0%)   12102487  GlobalDCEPass
   0.0043 (  0.0%)   0.0000 (  0.0%)   0.0043 (  0.0%)   0.0043 (  0.0%)   48372244  InstSimplifyPass
   0.0027 (  0.0%)   0.0008 (  0.3%)   0.0035 (  0.0%)   0.0037 (  0.0%)   45045562  ScalarEvolutionAnalysis
   0.0030 (  0.0%)   0.0003 (  0.1%)   0.0033 (  0.0%)   0.0036 (  0.0%)   29145265  IndVarSimplifyPass
   0.0025 (  0.0%)   0.0002 (  0.1%)   0.0027 (  0.0%)   0.0032 (  0.0%)   16671955  RequireAnalysisPass<llvm::GlobalsAA, llvm::Module, llvm::AnalysisManager<Module>>
   0.0025 (  0.0%)   0.0002 (  0.1%)   0.0027 (  0.0%)   0.0032 (  0.0%)   16651504  GlobalsAA
   0.0006 (  0.0%)   0.0005 (  0.2%)   0.0011 (  0.0%)   0.0029 (  0.0%)    8186724  OpenMPOptPass
   0.0027 (  0.0%)   0.0001 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)   12998003  ReversePostOrderFunctionAttrsPass
   0.0019 (  0.0%)   0.0006 (  0.2%)   0.0025 (  0.0%)   0.0028 (  0.0%)   11967259  LowerExpectIntrinsicPass
   0.0024 (  0.0%)   0.0003 (  0.1%)   0.0028 (  0.0%)   0.0028 (  0.0%)   19995960  LowerConstantIntrinsicsPass
   0.0022 (  0.0%)   0.0001 (  0.0%)   0.0023 (  0.0%)   0.0023 (  0.0%)   19367864  LibCallsShrinkWrapPass
   0.0019 (  0.0%)   0.0001 (  0.0%)   0.0020 (  0.0%)   0.0021 (  0.0%)   24061124  LoopLoadEliminationPass
   0.0011 (  0.0%)   0.0004 (  0.1%)   0.0016 (  0.0%)   0.0018 (  0.0%)   35505583  LCSSAPass
   0.0009 (  0.0%)   0.0008 (  0.3%)   0.0016 (  0.0%)   0.0016 (  0.0%)   22693970  MemoryDependenceAnalysis
   0.0013 (  0.0%)   0.0001 (  0.0%)   0.0014 (  0.0%)   0.0016 (  0.0%)    9251166  InjectTLIMappings
   0.0010 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)    2782049  AlwaysInlinerPass
   0.0010 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)   0.0010 (  0.0%)    5709095  DivRemPairsPass
   0.0009 (  0.0%)   0.0001 (  0.0%)   0.0010 (  0.0%)   0.0010 (  0.0%)   12138843  MergedLoadStoreMotionPass
   0.0007 (  0.0%)   0.0001 (  0.0%)   0.0009 (  0.0%)   0.0010 (  0.0%)   12095182  LoopFullUnrollPass
   0.0004 (  0.0%)   0.0002 (  0.1%)   0.0007 (  0.0%)   0.0009 (  0.0%)   15168801  LoopRotatePass
   0.0005 (  0.0%)   0.0002 (  0.1%)   0.0007 (  0.0%)   0.0008 (  0.0%)   18714381  TargetLibraryAnalysis
   0.0006 (  0.0%)   0.0001 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)    9991748  LoopInstSimplifyPass
   0.0004 (  0.0%)   0.0004 (  0.1%)   0.0007 (  0.0%)   0.0007 (  0.0%)   10149528  LoopDistributePass
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0007 (  0.0%)    1096854  DeadArgumentEliminationPass
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    5367319  RecomputeGlobalsAAPass
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)    8937323  PromotePass
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)    9579538  SimpleLoopUnswitchPass
   0.0004 (  0.0%)   0.0002 (  0.1%)   0.0006 (  0.0%)   0.0006 (  0.0%)   16129558  DemandedBitsAnalysis
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)   11233413  FunctionAnalysisManagerCGSCCProxy
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0006 (  0.0%)   11872487  RequireAnalysisPass<llvm::OptimizationRemarkEmitterAnalysis, llvm::Function, llvm::AnalysisManager<Function>>
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0005 (  0.0%)   0.0006 (  0.0%)   16910811  LazyValueAnalysis
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    9314494  LoopSimplifyCFGPass
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0005 (  0.0%)   0.0005 (  0.0%)   13019354  AssumptionAnalysis
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0005 (  0.0%)   12099715  OptimizationRemarkEmitterAnalysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)    8403351  InvalidateAnalysisPass<llvm::AAManager>
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12032802  TypeBasedAA
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12031548  ScopedNoAliasAA
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)    8582619  CoroSplitPass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)    1358379  InferFunctionAttrsPass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8383272  CoroElidePass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8467353  PhiValuesAnalysis
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    4092920  ConstantMergePass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8279547  SpeculativeExecutionPass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8368351  ShouldNotRunFunctionPassesAnalysis
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    1312838  LazyCallGraphAnalysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    4855087  WarnMissedTransformationsPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)     130368  CoroEarlyPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3625888  AlignmentFromAssumptionsPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3704343  LoopAccessAnalysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)     111237  Annotation2MetadataPass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3574289  AnnotationRemarksPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3611080  InvalidateAnalysisPass<llvm::ShouldNotRunFunctionPassesAnalysis>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      47163  EliminateAvailableExternallyPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      17908  CoroCleanupPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      14976  RelLookupTableConverterPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      13763  ProfileSummaryAnalysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12483  RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module, llvm::AnalysisManager<Module>>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12411  ForceFunctionAttrsPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12678  InlineAdvisorAnalysis
  131.5002 (100.0%)   0.3052 (100.0%)  131.8054 (100.0%)  132.5540 (100.0%)  1615901391352  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  46.0915 ( 99.8%)   0.7497 ( 98.5%)  46.8412 ( 99.8%)  47.1692 ( 99.7%)  567401093834  Code Generation Time
   0.0923 (  0.2%)   0.0116 (  1.5%)   0.1039 (  0.2%)   0.1258 (  0.3%)  1088790744  LLVM IR Generation Time
  46.1838 (100.0%)   0.7613 (100.0%)  46.9451 (100.0%)  47.2950 (100.0%)  568489884578  Total

===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0021 seconds (0.0021 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.0020 (100.0%)   0.0001 (100.0%)   0.0021 (100.0%)   0.0021 (100.0%)   12292396  Seed Live Regs
   0.0020 (100.0%)   0.0001 (100.0%)   0.0021 (100.0%)   0.0021 (100.0%)   12292396  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.4432 seconds (0.4524 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.1275 ( 32.3%)   0.0056 ( 11.6%)   0.1331 ( 30.0%)   0.1363 ( 30.1%)  1438634389  DAG Combining 1
   0.0702 ( 17.8%)   0.0047 (  9.7%)   0.0749 ( 16.9%)   0.0751 ( 16.6%)  1027837820  DAG Combining 2
   0.0548 ( 13.9%)   0.0054 ( 11.1%)   0.0601 ( 13.6%)   0.0636 ( 14.1%)  791659261  Instruction Selection
   0.0438 ( 11.1%)   0.0060 ( 12.5%)   0.0499 ( 11.3%)   0.0509 ( 11.2%)  712994861  Instruction Scheduling
   0.0345 (  8.7%)   0.0073 ( 15.1%)   0.0418 (  9.4%)   0.0420 (  9.3%)  654102488  Instruction Creation
   0.0228 (  5.8%)   0.0047 (  9.8%)   0.0276 (  6.2%)   0.0278 (  6.2%)  481250135  DAG Legalization
   0.0175 (  4.4%)   0.0048 (  9.9%)   0.0223 (  5.0%)   0.0231 (  5.1%)  455645073  Type Legalization
   0.0092 (  2.3%)   0.0047 (  9.7%)   0.0139 (  3.1%)   0.0137 (  3.0%)  388554644  Instruction Scheduling Cleanup
   0.0057 (  1.4%)   0.0047 (  9.8%)   0.0104 (  2.4%)   0.0107 (  2.4%)  326297296  Vector Legalization
   0.0089 (  2.2%)   0.0004 (  0.8%)   0.0092 (  2.1%)   0.0093 (  2.0%)   98001723  DAG Combining after legalize types
   0.3949 (100.0%)   0.0483 (100.0%)   0.4432 (100.0%)   0.4524 (100.0%)  6374977690  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 2.4318 seconds (2.4717 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.6326 ( 32.9%)   0.2596 ( 50.9%)   0.8922 ( 36.7%)   0.9075 ( 36.7%)  9093031759  AArch64 Instruction Selection
   0.1319 (  6.9%)   0.2043 ( 40.0%)   0.3361 ( 13.8%)   0.3398 ( 13.7%)  3764363631  AArch64 Assembly Printer
   0.2016 ( 10.5%)   0.0005 (  0.1%)   0.2021 (  8.3%)   0.2036 (  8.2%)  2487079531  Branch Probability Basic Block Placement
   0.1485 (  7.7%)   0.0004 (  0.1%)   0.1489 (  6.1%)   0.1497 (  6.1%)  1184297842  Control Flow Optimizer
   0.0899 (  4.7%)   0.0060 (  1.2%)   0.0960 (  3.9%)   0.0971 (  3.9%)  1123119540  Merge disjoint stack slots
   0.0566 (  2.9%)   0.0017 (  0.3%)   0.0582 (  2.4%)   0.0592 (  2.4%)  581010640  Greedy Register Allocator
   0.0446 (  2.3%)   0.0018 (  0.3%)   0.0464 (  1.9%)   0.0477 (  1.9%)  398700449  CodeGen Prepare
   0.0440 (  2.3%)   0.0004 (  0.1%)   0.0444 (  1.8%)   0.0454 (  1.8%)  320770210  Simple Register Coalescing
   0.0375 (  2.0%)   0.0008 (  0.2%)   0.0384 (  1.6%)   0.0384 (  1.6%)  514716387  Live Variable Analysis
   0.0324 (  1.7%)   0.0012 (  0.2%)   0.0336 (  1.4%)   0.0337 (  1.4%)  193160032  Live Interval Analysis
   0.0311 (  1.6%)   0.0004 (  0.1%)   0.0316 (  1.3%)   0.0330 (  1.3%)  371458250  Machine Instruction Scheduler
   0.0267 (  1.4%)   0.0001 (  0.0%)   0.0269 (  1.1%)   0.0270 (  1.1%)  331502370  AArch64 load / store optimization pass
   0.0202 (  1.0%)   0.0002 (  0.0%)   0.0204 (  0.8%)   0.0208 (  0.8%)  130127378  Prologue/Epilogue Insertion & Frame Finalization
   0.0159 (  0.8%)   0.0003 (  0.0%)   0.0162 (  0.7%)   0.0169 (  0.7%)  108527868  Machine code sinking
   0.0150 (  0.8%)   0.0011 (  0.2%)   0.0162 (  0.7%)   0.0162 (  0.7%)  125256424  Memory SSA
   0.0146 (  0.8%)   0.0002 (  0.0%)   0.0148 (  0.6%)   0.0149 (  0.6%)  157745290  Remove dead machine instructions
   0.0120 (  0.6%)   0.0003 (  0.1%)   0.0123 (  0.5%)   0.0126 (  0.5%)   69240869  Virtual Register Rewriter
   0.0119 (  0.6%)   0.0003 (  0.1%)   0.0122 (  0.5%)   0.0124 (  0.5%)  105492803  Machine Common Subexpression Elimination
   0.0097 (  0.5%)   0.0001 (  0.0%)   0.0097 (  0.4%)   0.0098 (  0.4%)   62131793  Branch Probability Analysis #2
   0.0092 (  0.5%)   0.0002 (  0.0%)   0.0094 (  0.4%)   0.0097 (  0.4%)   96000604  Two-Address instruction pass
   0.0092 (  0.5%)   0.0002 (  0.0%)   0.0094 (  0.4%)   0.0095 (  0.4%)  113744830  Peephole Optimizations
   0.0078 (  0.4%)   0.0004 (  0.1%)   0.0082 (  0.3%)   0.0089 (  0.4%)  103346285  Loop Strength Reduction
   0.0083 (  0.4%)   0.0002 (  0.0%)   0.0085 (  0.4%)   0.0085 (  0.3%)   61189281  Branch Probability Analysis
   0.0081 (  0.4%)   0.0001 (  0.0%)   0.0082 (  0.3%)   0.0084 (  0.3%)  100283314  Machine Copy Propagation Pass
   0.0071 (  0.4%)   0.0009 (  0.2%)   0.0080 (  0.3%)   0.0083 (  0.3%)   56202830  Eliminate PHI nodes for register allocation
   0.0070 (  0.4%)   0.0005 (  0.1%)   0.0075 (  0.3%)   0.0080 (  0.3%)   54314737  MachinePostDominator Tree Construction
   0.0068 (  0.4%)   0.0010 (  0.2%)   0.0077 (  0.3%)   0.0078 (  0.3%)   44633924  Slot index numbering
   0.0072 (  0.4%)   0.0002 (  0.0%)   0.0074 (  0.3%)   0.0076 (  0.3%)   87766406  Early Tail Duplication
   0.0074 (  0.4%)   0.0001 (  0.0%)   0.0076 (  0.3%)   0.0076 (  0.3%)   80626051  Remove dead machine instructions #2
   0.0069 (  0.4%)   0.0005 (  0.1%)   0.0074 (  0.3%)   0.0074 (  0.3%)   41014285  Slot index numbering #2
   0.0060 (  0.3%)   0.0007 (  0.1%)   0.0067 (  0.3%)   0.0072 (  0.3%)   41140942  MachineDominator Tree Construction
   0.0070 (  0.4%)   0.0002 (  0.0%)   0.0072 (  0.3%)   0.0072 (  0.3%)   73907009  Simplify the CFG
   0.0068 (  0.4%)   0.0001 (  0.0%)   0.0069 (  0.3%)   0.0069 (  0.3%)   84586206  Machine Copy Propagation Pass #2
   0.0061 (  0.3%)   0.0004 (  0.1%)   0.0065 (  0.3%)   0.0065 (  0.3%)   54340145  MachinePostDominator Tree Construction #2
   0.0057 (  0.3%)   0.0006 (  0.1%)   0.0063 (  0.3%)   0.0064 (  0.3%)   54059079  Post-Dominator Tree Construction #2
   0.0058 (  0.3%)   0.0001 (  0.0%)   0.0059 (  0.2%)   0.0059 (  0.2%)   46145979  AArch64 Collect Linker Optimization Hint (LOH)
   0.0051 (  0.3%)   0.0006 (  0.1%)   0.0057 (  0.2%)   0.0057 (  0.2%)   54005700  Post-Dominator Tree Construction
   0.0050 (  0.3%)   0.0006 (  0.1%)   0.0056 (  0.2%)   0.0056 (  0.2%)   44647405  MachinePostDominator Tree Construction #3
   0.0048 (  0.2%)   0.0003 (  0.1%)   0.0050 (  0.2%)   0.0056 (  0.2%)   47534346  Machine InstCombiner
   0.0044 (  0.2%)   0.0004 (  0.1%)   0.0049 (  0.2%)   0.0049 (  0.2%)   40128980  MachineDominator Tree Construction #4
   0.0045 (  0.2%)   0.0002 (  0.0%)   0.0047 (  0.2%)   0.0049 (  0.2%)   42290173  AArch64 pseudo instruction expansion pass
   0.0045 (  0.2%)   0.0003 (  0.1%)   0.0048 (  0.2%)   0.0048 (  0.2%)   48064278  Block Frequency Analysis
   0.0044 (  0.2%)   0.0004 (  0.1%)   0.0048 (  0.2%)   0.0048 (  0.2%)   40080835  MachineDominator Tree Construction #2
   0.0042 (  0.2%)   0.0005 (  0.1%)   0.0047 (  0.2%)   0.0047 (  0.2%)   41236504  MachineDominator Tree Construction #5
   0.0038 (  0.2%)   0.0002 (  0.0%)   0.0040 (  0.2%)   0.0047 (  0.2%)   37338288  Constant Hoisting
   0.0043 (  0.2%)   0.0003 (  0.1%)   0.0046 (  0.2%)   0.0046 (  0.2%)   39083275  Dominator Tree Construction #8
   0.0044 (  0.2%)   0.0001 (  0.0%)   0.0046 (  0.2%)   0.0045 (  0.2%)   15237924  ObjC ARC contraction
   0.0041 (  0.2%)   0.0004 (  0.1%)   0.0044 (  0.2%)   0.0045 (  0.2%)   39207224  Dominator Tree Construction #4
   0.0037 (  0.2%)   0.0003 (  0.1%)   0.0040 (  0.2%)   0.0044 (  0.2%)   50164445  Induction Variable Users
   0.0039 (  0.2%)   0.0005 (  0.1%)   0.0044 (  0.2%)   0.0043 (  0.2%)   38877096  Dominator Tree Construction
   0.0038 (  0.2%)   0.0003 (  0.1%)   0.0042 (  0.2%)   0.0041 (  0.2%)   40417867  MachineDominator Tree Construction #3
   0.0037 (  0.2%)   0.0004 (  0.1%)   0.0041 (  0.2%)   0.0041 (  0.2%)   39442007  Dominator Tree Construction #5
   0.0039 (  0.2%)   0.0001 (  0.0%)   0.0040 (  0.2%)   0.0041 (  0.2%)   15783281  AArch64 Compress Jump Tables
   0.0035 (  0.2%)   0.0005 (  0.1%)   0.0040 (  0.2%)   0.0040 (  0.2%)   34129315  MachineDominator Tree Construction #6
   0.0026 (  0.1%)   0.0014 (  0.3%)   0.0039 (  0.2%)   0.0040 (  0.2%)   32983814  Free MachineFunction
   0.0034 (  0.2%)   0.0005 (  0.1%)   0.0039 (  0.2%)   0.0039 (  0.2%)   38705492  Dominator Tree Construction #2
   0.0035 (  0.2%)   0.0002 (  0.0%)   0.0036 (  0.1%)   0.0039 (  0.2%)   39711609  Local Stack Slot Allocation
   0.0037 (  0.2%)   0.0002 (  0.0%)   0.0038 (  0.2%)   0.0038 (  0.2%)   26998014  Machine Block Frequency Analysis #5
   0.0037 (  0.2%)   0.0001 (  0.0%)   0.0038 (  0.2%)   0.0038 (  0.2%)   14187857  Finalize ISel and expand pseudo-instructions
   0.0034 (  0.2%)   0.0005 (  0.1%)   0.0038 (  0.2%)   0.0038 (  0.2%)   39547991  Dominator Tree Construction #3
   0.0035 (  0.2%)   0.0003 (  0.1%)   0.0038 (  0.2%)   0.0038 (  0.2%)   39124746  Dominator Tree Construction #6
   0.0035 (  0.2%)   0.0001 (  0.0%)   0.0037 (  0.2%)   0.0038 (  0.2%)   18626552  AArch64 Condition Optimizer
   0.0037 (  0.2%)   0.0001 (  0.0%)   0.0037 (  0.2%)   0.0038 (  0.2%)   28787069  AArch64 Dead register definitions
   0.0034 (  0.2%)   0.0002 (  0.0%)   0.0036 (  0.1%)   0.0038 (  0.2%)   15302878  Branch relaxation pass
   0.0033 (  0.2%)   0.0003 (  0.1%)   0.0036 (  0.1%)   0.0037 (  0.1%)   39363543  Dominator Tree Construction #7
   0.0032 (  0.2%)   0.0001 (  0.0%)   0.0034 (  0.1%)   0.0036 (  0.1%)   21702873  Post-RA pseudo instruction expansion pass
   0.0033 (  0.2%)   0.0001 (  0.0%)   0.0034 (  0.1%)   0.0034 (  0.1%)   31528840  Machine Block Frequency Analysis #3
   0.0030 (  0.2%)   0.0002 (  0.0%)   0.0031 (  0.1%)   0.0033 (  0.1%)   31375217  Machine Block Frequency Analysis
   0.0030 (  0.2%)   0.0001 (  0.0%)   0.0031 (  0.1%)   0.0031 (  0.1%)   13939713  Interleaved Load Combine Pass
   0.0029 (  0.2%)   0.0001 (  0.0%)   0.0030 (  0.1%)   0.0031 (  0.1%)   31374222  Machine Block Frequency Analysis #2
   0.0026 (  0.1%)   0.0002 (  0.0%)   0.0028 (  0.1%)   0.0030 (  0.1%)   22842835  Shrink Wrapping analysis
   0.0029 (  0.2%)   0.0001 (  0.0%)   0.0030 (  0.1%)   0.0030 (  0.1%)    8921850  AArch64 Conditional Branch Tuning
   0.0028 (  0.1%)   0.0001 (  0.0%)   0.0029 (  0.1%)   0.0029 (  0.1%)    7404709  Unpack machine instruction bundles
   0.0027 (  0.1%)   0.0001 (  0.0%)   0.0028 (  0.1%)   0.0028 (  0.1%)   31289526  Machine Block Frequency Analysis #4
   0.0024 (  0.1%)   0.0001 (  0.0%)   0.0026 (  0.1%)   0.0027 (  0.1%)   16579584  PostRA Machine Sink
   0.0026 (  0.1%)   0.0001 (  0.0%)   0.0027 (  0.1%)   0.0027 (  0.1%)   20830194  Natural Loop Information #6
   0.0022 (  0.1%)   0.0004 (  0.1%)   0.0027 (  0.1%)   0.0027 (  0.1%)   39019060  Natural Loop Information
   0.0017 (  0.1%)   0.0002 (  0.0%)   0.0019 (  0.1%)   0.0026 (  0.1%)   16821219  Tail Duplication
   0.0024 (  0.1%)   0.0002 (  0.0%)   0.0026 (  0.1%)   0.0026 (  0.1%)   32596316  Canonicalize Freeze Instructions in Loops
   0.0024 (  0.1%)   0.0001 (  0.0%)   0.0026 (  0.1%)   0.0026 (  0.1%)   17441685  Lower constant intrinsics
   0.0022 (  0.1%)   0.0002 (  0.0%)   0.0024 (  0.1%)   0.0025 (  0.1%)   18700525  Machine Natural Loop Construction
   0.0022 (  0.1%)   0.0001 (  0.0%)   0.0023 (  0.1%)   0.0023 (  0.1%)   14093543  Remove unreachable machine basic blocks
   0.0021 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   11657502  AArch64 MI Peephole Optimization pass
   0.0021 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   10808188  Insert stack protectors
   0.0021 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   18979256  Expand memcmp() to load/stores
   0.0021 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   20817342  Natural Loop Information #5
   0.0020 (  0.1%)   0.0001 (  0.0%)   0.0022 (  0.1%)   0.0022 (  0.1%)   20738170  Natural Loop Information #3
   0.0020 (  0.1%)   0.0001 (  0.0%)   0.0021 (  0.1%)   0.0021 (  0.1%)   19900880  Natural Loop Information #4
   0.0019 (  0.1%)   0.0000 (  0.0%)   0.0019 (  0.1%)   0.0021 (  0.1%)    7976838  AArch64 Promote Constant
   0.0019 (  0.1%)   0.0001 (  0.0%)   0.0020 (  0.1%)   0.0020 (  0.1%)    9966904  AArch64 Store Pair Suppression
   0.0019 (  0.1%)   0.0001 (  0.0%)   0.0020 (  0.1%)   0.0020 (  0.1%)   15096748  Type Promotion
   0.0019 (  0.1%)   0.0001 (  0.0%)   0.0020 (  0.1%)   0.0020 (  0.1%)    9099038  AArch64 Stack Tagging PreRA
   0.0017 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0020 (  0.1%)   10014588  Expand large div/rem
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0020 (  0.1%)   18664096  Machine Natural Loop Construction #3
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0019 (  0.1%)   18156000  Machine Cycle Info Analysis
   0.0019 (  0.1%)   0.0001 (  0.0%)   0.0020 (  0.1%)   0.0019 (  0.1%)   19852274  Natural Loop Information #2
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0019 (  0.1%)   13589190  Remove unreachable blocks from the CFG
   0.0017 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0019 (  0.1%)   18533280  Machine Natural Loop Construction #2
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0019 (  0.1%)    9133019  Process Implicit Definitions
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0019 (  0.1%)   16950641  Machine Natural Loop Construction #4
   0.0018 (  0.1%)   0.0001 (  0.0%)   0.0019 (  0.1%)   0.0019 (  0.1%)   11227404  Interleaved Access Pass
   0.0017 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0018 (  0.1%)    9472616  Debug Variable Analysis
   0.0017 (  0.1%)   0.0001 (  0.0%)   0.0018 (  0.1%)   0.0018 (  0.1%)   18265577  Partially inline calls to library functions
   0.0012 (  0.1%)   0.0002 (  0.0%)   0.0014 (  0.1%)   0.0016 (  0.1%)   18316073  Early Machine Loop Invariant Code Motion
   0.0013 (  0.1%)   0.0001 (  0.0%)   0.0014 (  0.1%)   0.0014 (  0.1%)    8077346  AArch64 Expand Hardened Pseudos
   0.0013 (  0.1%)   0.0001 (  0.0%)   0.0014 (  0.1%)   0.0014 (  0.1%)   12465953  Early If-Conversion
   0.0011 (  0.1%)   0.0001 (  0.0%)   0.0012 (  0.1%)   0.0014 (  0.1%)    8907759  AArch64 Redundant Copy Elimination
   0.0010 (  0.0%)   0.0001 (  0.0%)   0.0011 (  0.0%)   0.0014 (  0.1%)   11883955  AArch64 Conditional Compares
   0.0012 (  0.1%)   0.0001 (  0.0%)   0.0013 (  0.1%)   0.0013 (  0.1%)    9928839  Replace intrinsics with calls to vector library
   0.0012 (  0.1%)   0.0001 (  0.0%)   0.0013 (  0.1%)   0.0013 (  0.1%)   11218060  Expand Atomic instructions
   0.0010 (  0.1%)   0.0001 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)   11063391  Scalarize Masked Memory Intrinsics
   0.0009 (  0.0%)   0.0001 (  0.0%)   0.0010 (  0.0%)   0.0011 (  0.0%)   10129231  Expand vector predication intrinsics
   0.0008 (  0.0%)   0.0001 (  0.0%)   0.0009 (  0.0%)   0.0010 (  0.0%)   13439385  Scalar Evolution Analysis
   0.0007 (  0.0%)   0.0002 (  0.0%)   0.0008 (  0.0%)   0.0010 (  0.0%)    7808228  Optimize machine instruction PHIs
   0.0002 (  0.0%)   0.0003 (  0.1%)   0.0004 (  0.0%)   0.0010 (  0.0%)    7225458  AArch64 SIMD instructions optimization pass
   0.0009 (  0.0%)   0.0001 (  0.0%)   0.0010 (  0.0%)   0.0009 (  0.0%)   10030927  Expand reduction intrinsics
   0.0007 (  0.0%)   0.0001 (  0.0%)   0.0009 (  0.0%)   0.0009 (  0.0%)   10509325  Exception handling preparation
   0.0006 (  0.0%)   0.0001 (  0.0%)   0.0007 (  0.0%)   0.0009 (  0.0%)   10756261  Loop Data Prefetch
   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0008 (  0.0%)     399932  Stack Safety Analysis
   0.0006 (  0.0%)   0.0001 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)    9552554  Bundle Machine CFG Edges
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    8151432  Spill Code Placement Analysis
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)    8592314  Canonicalize natural loops
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    7705783  Machine Trace Metrics
   0.0003 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)    8462909  Basic Alias Analysis (stateless AA impl)
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    8259233  Merge contiguous icmps into a memcmp
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    6654996  AArch64 sls hardening pass
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    8165062  Function Alias Analysis Results #5
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    8290975  Machine Loop Invariant Code Motion
   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)     382929  Machine Outliner
   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)    6516167  Remove Redundant DEBUG_VALUE analysis
   0.0003 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)    7040456  Basic Alias Analysis (stateless AA impl) #5
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7042205  Live Register Matrix
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7919675  Function Alias Analysis Results #3
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7914823  Function Alias Analysis Results #2
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7427618  Falkor HW Prefetch Fix
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7919671  Function Alias Analysis Results #4
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7001940  Basic Alias Analysis (stateless AA impl) #4
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    7897532  Function Alias Analysis Results
   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    6738461  Machine Trace Metrics #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)    6432875  Insert CFI remember/restore state instructions
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6552466  Virtual Register Map
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0002 (  0.0%)    6857488  Lazy Branch Probability Analysis #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0002 (  0.0%)    6980404  Basic Alias Analysis (stateless AA impl) #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6992533  Basic Alias Analysis (stateless AA impl) #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6567037  Live DEBUG_VALUE analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6546165  Insert KCFI indirect call checks
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    7115379  Canonicalize natural loops #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6705679  SME ABI Pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0002 (  0.0%)    6901143  Lazy Branch Probability Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6902561  Lazy Branch Probability Analysis #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6400876  Lazy Machine Block Frequency Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6435520  Insert fentry calls
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403950  Lazy Machine Block Frequency Analysis #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6402121  Lazy Machine Block Frequency Analysis #6
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6405330  Falkor HW Prefetch Fix Late Phase
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6402810  AArch64 Branch Targets
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6502009  Insert XRay ops
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6408743  TLS Variable Hoist
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6465573  Implement the 'patchable-function' attribute
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6411075  SME Peephole Optimization pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6425271  PostRA Machine Instruction Scheduler
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6485534  Machine Optimization Remark Emitter #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403937  Rename Disconnected Subregister Components
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6409365  Live Stack Slot Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6485019  Machine Optimization Remark Emitter #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6401670  Register Allocation Pass Scoring
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6485531  Machine Optimization Remark Emitter
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403950  AArch64 speculation hardening pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6440444  Stack Slot Coloring
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6412128  Fixup Statepoint Caller Saved
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6376590  Lazy Block Frequency Analysis #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403950  Lazy Machine Block Frequency Analysis #5
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6367954  Safe Stack instrumentation pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403685  StackMap Liveness Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6402217  Analyze Machine Code For Garbage Collection
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6412517  A57 FP Anti-dependency breaker
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6376590  Lazy Block Frequency Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6365760  AArch64 Stack Tagging
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6418882  Contiguously Lay Out Funclets
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6362055  Lower Garbage Collection Instructions
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6420842  AArch64 Indirect Thunks
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6364901  Shadow Stack GC Lowering
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6401666  Workaround A53 erratum 835769 pass
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6408311  Lazy Machine Block Frequency Analysis #3
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6360056  Merge internal globals
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6453640  Optimization Remark Emitter
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6376621  Lazy Block Frequency Analysis #2
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6401955  Detect Dead Lanes
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    6403961  Lazy Machine Block Frequency Analysis #4
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)     495128  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      34630  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Default Regalloc Priority Advisor
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Default Regalloc Eviction Advisor
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      22473  Lower @llvm.global_dtors via `__cxa_atexit`
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      11153  Target Pass Configuration
   1.9215 (100.0%)   0.5103 (100.0%)   2.4318 (100.0%)   2.4717 (100.0%)  24676503454  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0681 seconds (0.0690 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.0323 (100.0%)   0.0358 (100.0%)   0.0681 (100.0%)   0.0690 (100.0%)  2375980112  DWARF Exception Writer
   0.0323 (100.0%)   0.0358 (100.0%)   0.0681 (100.0%)   0.0690 (100.0%)  2375980112  Total

===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 48.2802 seconds (48.8638 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  47.3865 (100.0%)   0.8937 (100.0%)  48.2802 (100.0%)  48.8638 (100.0%)  578082259552  Clang front-end timer
  47.3865 (100.0%)   0.8937 (100.0%)  48.2802 (100.0%)  48.8638 (100.0%)  578082259552  Total

 -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB 47.40s user 0.93s system 98% cpu 49.062 total
```

After
```
% time /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/Users/nshulga/git/pytorch/pytorch/build/aten/src -I/Users/nshulga/git/pytorch/pytorch/aten/src -I/Users/nshulga/git/pytorch/pytorch/build -I/Users/nshulga/git/pytorch/pytorch -I/Users/nshulga/git/pytorch/pytorch/cmake/../third_party/benchmark/include -I/Users/nshulga/git/pytorch/pytorch/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/build/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/build/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include -I/Users/nshulga/git/pytorch/pytorch/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/../aten/src -I/Users/nshulga/git/pytorch/pytorch/torch/csrc -I/Users/nshulga/git/pytorch/pytorch/third_party/miniz-2.1.0 -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/include -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/src -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/FXdiv/include -I/Users/nshulga/git/pytorch/pytorch/c10/.. 
-I/Users/nshulga/git/pytorch/pytorch/third_party/pthreadpool/include -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/deps/clog/include -I/Users/nshulga/git/pytorch/pytorch/third_party/NNPACK/include -I/Users/nshulga/git/pytorch/pytorch/third_party/FP16/include -I/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include -I/Users/nshulga/git/pytorch/pytorch/third_party/flatbuffers/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/nshulga/git/pytorch/pytorch/third_party/protobuf/src -isystem /Users/nshulga/git/pytorch/pytorch/third_party/XNNPACK/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/eigen -isystem /Users/nshulga/git/pytorch/pytorch/build/include  -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces 
-Qunused-arguments -fcolor-diagnostics -faligned-new -Werror -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces -O3 -DNDEBUG -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -fvisibility=hidden -O2 -Wmissing-prototypes -Werror=missing-prototypes -Xpreprocessor -fopenmp -I/Users/nshulga/miniforge3/include -std=gnu++17 -Wno-missing-prototypes -Wno-error=missing-prototypes -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o -c /Users/nshulga/git/pytorch/pytorch/build/aten/src/ATen/RegisterSchema.cpp
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 1.2920 seconds (1.3187 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.3070 ( 27.6%)   0.0547 ( 30.2%)   0.3617 ( 28.0%)   0.3654 ( 27.7%)  3719690895  ModuleInlinerWrapperPass
   0.3024 ( 27.2%)   0.0525 ( 29.0%)   0.3549 ( 27.5%)   0.3585 ( 27.2%)  3653363330  DevirtSCCRepeatedPass
   0.0619 (  5.6%)   0.0073 (  4.0%)   0.0692 (  5.4%)   0.0711 (  5.4%)  868136227  InstCombinePass
   0.0601 (  5.4%)   0.0065 (  3.6%)   0.0666 (  5.2%)   0.0679 (  5.1%)  696430647  InlinerPass
   0.0363 (  3.3%)   0.0033 (  1.8%)   0.0396 (  3.1%)   0.0425 (  3.2%)  535426974  SimplifyCFGPass
   0.0280 (  2.5%)   0.0069 (  3.8%)   0.0348 (  2.7%)   0.0358 (  2.7%)  378716394  BlockFrequencyAnalysis
   0.0208 (  1.9%)   0.0049 (  2.7%)   0.0257 (  2.0%)   0.0262 (  2.0%)  283689627  BranchProbabilityAnalysis
   0.0239 (  2.1%)   0.0002 (  0.1%)   0.0241 (  1.9%)   0.0241 (  1.8%)  219122704  OpenMPOptCGSCCPass
   0.0174 (  1.6%)   0.0015 (  0.8%)   0.0189 (  1.5%)   0.0192 (  1.5%)  215583965  GVNPass
   0.0153 (  1.4%)   0.0025 (  1.4%)   0.0178 (  1.4%)   0.0187 (  1.4%)  184232295  EarlyCSEPass
   0.0079 (  0.7%)   0.0064 (  3.5%)   0.0143 (  1.1%)   0.0145 (  1.1%)  192415300  AAManager
   0.0116 (  1.0%)   0.0019 (  1.0%)   0.0134 (  1.0%)   0.0135 (  1.0%)  153354488  JumpThreadingPass
   0.0099 (  0.9%)   0.0023 (  1.3%)   0.0122 (  0.9%)   0.0131 (  1.0%)  128911185  CGProfilePass
   0.0081 (  0.7%)   0.0022 (  1.2%)   0.0103 (  0.8%)   0.0128 (  1.0%)  112266933  SLPVectorizerPass
   0.0119 (  1.1%)   0.0005 (  0.3%)   0.0124 (  1.0%)   0.0125 (  0.9%)  131510939  MemorySSAAnalysis
   0.0122 (  1.1%)   0.0002 (  0.1%)   0.0124 (  1.0%)   0.0124 (  0.9%)  129264559  DSEPass
   0.0108 (  1.0%)   0.0010 (  0.6%)   0.0118 (  0.9%)   0.0119 (  0.9%)  158891693  DominatorTreeAnalysis
   0.0116 (  1.0%)   0.0002 (  0.1%)   0.0119 (  0.9%)   0.0119 (  0.9%)  118946130  CorrelatedValuePropagationPass
   0.0082 (  0.7%)   0.0017 (  0.9%)   0.0099 (  0.8%)   0.0100 (  0.8%)  120247256  LoopAnalysis
   0.0090 (  0.8%)   0.0008 (  0.5%)   0.0099 (  0.8%)   0.0099 (  0.8%)   84784225  ADCEPass
   0.0076 (  0.7%)   0.0014 (  0.8%)   0.0090 (  0.7%)   0.0098 (  0.7%)  111411449  SROAPass
   0.0080 (  0.7%)   0.0005 (  0.3%)   0.0085 (  0.7%)   0.0085 (  0.6%)  109824455  PostDominatorTreeAnalysis
   0.0063 (  0.6%)   0.0012 (  0.7%)   0.0076 (  0.6%)   0.0079 (  0.6%)   80323239  LoopVectorizePass
   0.0068 (  0.6%)   0.0003 (  0.2%)   0.0071 (  0.6%)   0.0076 (  0.6%)   60675565  LoopIdiomRecognizePass
   0.0068 (  0.6%)   0.0004 (  0.2%)   0.0072 (  0.6%)   0.0071 (  0.5%)   87177852  LICMPass
   0.0046 (  0.4%)   0.0021 (  1.1%)   0.0067 (  0.5%)   0.0069 (  0.5%)   74829034  PostOrderFunctionAttrsPass
   0.0064 (  0.6%)   0.0001 (  0.1%)   0.0065 (  0.5%)   0.0065 (  0.5%)   48619557  SCCPPass
   0.0063 (  0.6%)   0.0001 (  0.1%)   0.0064 (  0.5%)   0.0064 (  0.5%)   71987307  LoopDeletionPass
   0.0058 (  0.5%)   0.0000 (  0.0%)   0.0059 (  0.5%)   0.0059 (  0.4%)   71423762  HotColdSplittingPass
   0.0050 (  0.5%)   0.0006 (  0.3%)   0.0057 (  0.4%)   0.0056 (  0.4%)   57327860  MemCpyOptPass
   0.0043 (  0.4%)   0.0013 (  0.7%)   0.0056 (  0.4%)   0.0056 (  0.4%)   73868907  LoopSimplifyPass
   0.0054 (  0.5%)   0.0000 (  0.0%)   0.0055 (  0.4%)   0.0055 (  0.4%)   61231613  LoopUnrollPass
   0.0045 (  0.4%)   0.0009 (  0.5%)   0.0054 (  0.4%)   0.0054 (  0.4%)   63427035  LoopSinkPass
   0.0031 (  0.3%)   0.0022 (  1.2%)   0.0053 (  0.4%)   0.0053 (  0.4%)   60661182  LowerMatrixIntrinsicsPass
   0.0039 (  0.3%)   0.0003 (  0.2%)   0.0042 (  0.3%)   0.0053 (  0.4%)   37913352  GlobalOptPass
   0.0037 (  0.3%)   0.0010 (  0.6%)   0.0047 (  0.4%)   0.0050 (  0.4%)   40405305  IPSCCPPass
   0.0031 (  0.3%)   0.0014 (  0.8%)   0.0045 (  0.3%)   0.0046 (  0.3%)   76160561  BasicAA
   0.0036 (  0.3%)   0.0007 (  0.4%)   0.0043 (  0.3%)   0.0043 (  0.3%)   40024164  BDCEPass
   0.0011 (  0.1%)   0.0009 (  0.5%)   0.0020 (  0.2%)   0.0036 (  0.3%)   27093400  TargetIRAnalysis
   0.0033 (  0.3%)   0.0002 (  0.1%)   0.0035 (  0.3%)   0.0035 (  0.3%)   39935174  TailCallElimPass
   0.0026 (  0.2%)   0.0007 (  0.4%)   0.0033 (  0.3%)   0.0033 (  0.3%)   44962489  ScalarEvolutionAnalysis
   0.0028 (  0.3%)   0.0002 (  0.1%)   0.0030 (  0.2%)   0.0032 (  0.2%)   30018982  ReassociatePass
   0.0028 (  0.3%)   0.0002 (  0.1%)   0.0030 (  0.2%)   0.0032 (  0.2%)   28955128  IndVarSimplifyPass
   0.0030 (  0.3%)   0.0001 (  0.0%)   0.0031 (  0.2%)   0.0031 (  0.2%)   31205149  CalledValuePropagationPass
   0.0018 (  0.2%)   0.0004 (  0.2%)   0.0022 (  0.2%)   0.0022 (  0.2%)   22045025  Float2IntPass
   0.0020 (  0.2%)   0.0001 (  0.0%)   0.0020 (  0.2%)   0.0020 (  0.2%)   23867545  LoopLoadEliminationPass
   0.0006 (  0.1%)   0.0005 (  0.3%)   0.0011 (  0.1%)   0.0020 (  0.2%)    7821972  OpenMPOptPass
   0.0011 (  0.1%)   0.0004 (  0.2%)   0.0015 (  0.1%)   0.0017 (  0.1%)   35512421  LCSSAPass
   0.0015 (  0.1%)   0.0002 (  0.1%)   0.0017 (  0.1%)   0.0017 (  0.1%)   28268765  VectorCombinePass
   0.0009 (  0.1%)   0.0007 (  0.4%)   0.0016 (  0.1%)   0.0016 (  0.1%)   23018362  MemoryDependenceAnalysis
   0.0014 (  0.1%)   0.0000 (  0.0%)   0.0015 (  0.1%)   0.0015 (  0.1%)    9265818  GlobalDCEPass
   0.0013 (  0.1%)   0.0000 (  0.0%)   0.0013 (  0.1%)   0.0013 (  0.1%)   17548240  InstSimplifyPass
   0.0009 (  0.1%)   0.0004 (  0.2%)   0.0013 (  0.1%)   0.0013 (  0.1%)   15122797  LowerConstantIntrinsicsPass
   0.0011 (  0.1%)   0.0000 (  0.0%)   0.0011 (  0.1%)   0.0011 (  0.1%)    8506690  CallGraphAnalysis
   0.0008 (  0.1%)   0.0000 (  0.0%)   0.0009 (  0.1%)   0.0009 (  0.1%)    7505976  RequireAnalysisPass<llvm::GlobalsAA, llvm::Module, llvm::AnalysisManager<Module>>
   0.0008 (  0.1%)   0.0000 (  0.0%)   0.0009 (  0.1%)   0.0009 (  0.1%)    7485525  GlobalsAA
   0.0005 (  0.0%)   0.0002 (  0.1%)   0.0007 (  0.1%)   0.0009 (  0.1%)    9580105  LowerExpectIntrinsicPass
   0.0007 (  0.1%)   0.0001 (  0.1%)   0.0008 (  0.1%)   0.0008 (  0.1%)   12017197  LoopFullUnrollPass
   0.0006 (  0.1%)   0.0001 (  0.1%)   0.0007 (  0.1%)   0.0007 (  0.1%)   11381083  MergedLoadStoreMotionPass
   0.0004 (  0.0%)   0.0003 (  0.2%)   0.0007 (  0.1%)   0.0007 (  0.1%)   10150222  LoopDistributePass
   0.0007 (  0.1%)   0.0000 (  0.0%)   0.0007 (  0.1%)   0.0007 (  0.1%)    5649265  ReversePostOrderFunctionAttrsPass
   0.0005 (  0.0%)   0.0002 (  0.1%)   0.0007 (  0.1%)   0.0007 (  0.1%)   18702545  TargetLibraryAnalysis
   0.0006 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)    9964138  LoopInstSimplifyPass
   0.0004 (  0.0%)   0.0002 (  0.1%)   0.0006 (  0.0%)   0.0006 (  0.0%)   15049482  LoopRotatePass
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)   10571955  LibCallsShrinkWrapPass
   0.0004 (  0.0%)   0.0002 (  0.1%)   0.0006 (  0.0%)   0.0006 (  0.0%)   16184249  DemandedBitsAnalysis
   0.0004 (  0.0%)   0.0001 (  0.1%)   0.0005 (  0.0%)   0.0005 (  0.0%)   11227136  FunctionAnalysisManagerCGSCCProxy
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0005 (  0.0%)   11871494  RequireAnalysisPass<llvm::OptimizationRemarkEmitterAnalysis, llvm::Function, llvm::AnalysisManager<Function>>
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0006 (  0.0%)   0.0005 (  0.0%)   16911686  LazyValueAnalysis
   0.0004 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)    9333915  LoopSimplifyCFGPass
   0.0003 (  0.0%)   0.0002 (  0.1%)   0.0005 (  0.0%)   0.0005 (  0.0%)   13022664  AssumptionAnalysis
   0.0003 (  0.0%)   0.0001 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)    9524395  SimpleLoopUnswitchPass
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12094779  OptimizationRemarkEmitterAnalysis
   0.0002 (  0.0%)   0.0002 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12032778  ScopedNoAliasAA
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0004 (  0.0%)   0.0004 (  0.0%)   12032220  TypeBasedAA
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8581050  CoroSplitPass
   0.0002 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    5126709  InjectTLIMappings
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8379445  CoroElidePass
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    3890082  RecomputeGlobalsAAPass
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8281975  SpeculativeExecutionPass
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8468516  PhiValuesAnalysis
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)    4100685  ConstantMergePass
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8462530  PromotePass
   0.0001 (  0.0%)   0.0001 (  0.1%)   0.0002 (  0.0%)   0.0003 (  0.0%)    8345373  InvalidateAnalysisPass<llvm::AAManager>
   0.0002 (  0.0%)   0.0001 (  0.1%)   0.0003 (  0.0%)   0.0003 (  0.0%)    8368732  ShouldNotRunFunctionPassesAnalysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    1308997  InferFunctionAttrsPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    4283689  DivRemPairsPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    4855512  WarnMissedTransformationsPass
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)    1157640  LazyCallGraphAnalysis
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0001 (  0.0%)     444866  DeadArgumentEliminationPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3627306  AlignmentFromAssumptionsPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3706342  LoopAccessAnalysis
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3573986  AnnotationRemarksPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)     632159  AlwaysInlinerPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)    3611080  InvalidateAnalysisPass<llvm::ShouldNotRunFunctionPassesAnalysis>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      47153  EliminateAvailableExternallyPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      56285  Annotation2MetadataPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      58150  CoroEarlyPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      14016  CoroCleanupPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      13044  RelLookupTableConverterPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      13763  ProfileSummaryAnalysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12678  InlineAdvisorAnalysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12411  ForceFunctionAttrsPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)      12483  RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module, llvm::AnalysisManager<Module>>
   1.1105 (100.0%)   0.1815 (100.0%)   1.2920 (100.0%)   1.3187 (100.0%)  14047165388  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   1.1296 ( 94.4%)   0.4425 ( 98.2%)   1.5720 ( 95.4%)   1.6099 ( 94.9%)  16626483869  Code Generation Time
   0.0670 (  5.6%)   0.0081 (  1.8%)   0.0751 (  4.6%)   0.0858 (  5.1%)  806754444  LLVM IR Generation Time
   1.1965 (100.0%)   0.4506 (100.0%)   1.6471 (100.0%)   1.6957 (100.0%)  17433238313  Total

===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0007 seconds (0.0007 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.0006 (100.0%)   0.0000 (100.0%)   0.0007 (100.0%)   0.0007 (100.0%)    7870431  Seed Live Regs
   0.0006 (100.0%)   0.0000 (100.0%)   0.0007 (100.0%)   0.0007 (100.0%)    7870431  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1793 seconds (0.1846 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.0382 ( 24.1%)   0.0025 ( 11.8%)   0.0406 ( 22.7%)   0.0427 ( 23.1%)  449731195  DAG Combining 1
   0.0222 ( 14.0%)   0.0035 ( 16.6%)   0.0257 ( 14.3%)   0.0260 ( 14.1%)  323350124  Instruction Scheduling
   0.0207 ( 13.1%)   0.0024 ( 11.5%)   0.0231 ( 12.9%)   0.0257 ( 13.9%)  305541313  Instruction Selection
   0.0234 ( 14.8%)   0.0019 (  8.9%)   0.0252 ( 14.1%)   0.0255 ( 13.8%)  386744618  DAG Combining 2
   0.0171 ( 10.8%)   0.0026 ( 12.4%)   0.0197 ( 11.0%)   0.0199 ( 10.8%)  304585428  Instruction Creation
   0.0108 (  6.8%)   0.0019 (  9.1%)   0.0127 (  7.1%)   0.0128 (  6.9%)  213503986  DAG Legalization
   0.0107 (  6.7%)   0.0019 (  9.3%)   0.0126 (  7.0%)   0.0124 (  6.7%)  217202416  Type Legalization
   0.0089 (  5.6%)   0.0003 (  1.7%)   0.0093 (  5.2%)   0.0092 (  5.0%)   98375640  DAG Combining after legalize types
   0.0041 (  2.6%)   0.0020 (  9.3%)   0.0061 (  3.4%)   0.0061 (  3.3%)  175213222  Instruction Scheduling Cleanup
   0.0023 (  1.5%)   0.0020 (  9.4%)   0.0043 (  2.4%)   0.0043 (  2.4%)  143306060  Vector Legalization
   0.1584 (100.0%)   0.0209 (100.0%)   0.1793 (100.0%)   0.1846 (100.0%)  2617554002  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.8706 seconds (0.8844 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.2523 ( 41.0%)   0.1142 ( 44.8%)   0.3665 ( 42.1%)   0.3729 ( 42.2%)  3751975511  AArch64 Instruction Selection
   0.0769 ( 12.5%)   0.1178 ( 46.2%)   0.1947 ( 22.4%)   0.1954 ( 22.1%)  2284494832  AArch64 Assembly Printer
   0.0199 (  3.2%)   0.0006 (  0.2%)   0.0205 (  2.4%)   0.0205 (  2.3%)  208860244  Greedy Register Allocator
   0.0169 (  2.8%)   0.0002 (  0.1%)   0.0172 (  2.0%)   0.0171 (  1.9%)  247073374  Live Variable Analysis
   0.0129 (  2.1%)   0.0003 (  0.1%)   0.0132 (  1.5%)   0.0139 (  1.6%)  165651494  CodeGen Prepare
   0.0133 (  2.2%)   0.0003 (  0.1%)   0.0136 (  1.6%)   0.0139 (  1.6%)  153339584  Machine Instruction Scheduler
   0.0105 (  1.7%)   0.0001 (  0.0%)   0.0106 (  1.2%)   0.0106 (  1.2%)  122934084  AArch64 load / store optimization pass
   0.0084 (  1.4%)   0.0003 (  0.1%)   0.0087 (  1.0%)   0.0091 (  1.0%)   81985504  Simple Register Coalescing
   0.0082 (  1.3%)   0.0004 (  0.2%)   0.0086 (  1.0%)   0.0086 (  1.0%)   76550569  Live Interval Analysis
   0.0078 (  1.3%)   0.0003 (  0.1%)   0.0081 (  0.9%)   0.0083 (  0.9%)  103543246  Loop Strength Reduction
   0.0077 (  1.3%)   0.0002 (  0.1%)   0.0079 (  0.9%)   0.0079 (  0.9%)   76599592  Prologue/Epilogue Insertion & Frame Finalization
   0.0064 (  1.0%)   0.0005 (  0.2%)   0.0069 (  0.8%)   0.0077 (  0.9%)   65721168  Merge disjoint stack slots
   0.0067 (  1.1%)  …
malfet committed Nov 23, 2023
1 parent f961bda commit 926858f
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions torchgen/gen.py
@@ -4,7 +4,7 @@
 import os
 import pathlib
 from collections import defaultdict, namedtuple, OrderedDict
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from typing import (
     Any,
     Callable,
@@ -542,13 +542,19 @@ def static_dispatch(
 @dataclass(frozen=True)
 class RegisterSchema:
     selector: SelectiveBuilder
+    known_tags: Dict[str, int] = field(default_factory=dict)

     @method_with_native_function
     def __call__(self, f: NativeFunction) -> Optional[str]:
         if not self.selector.is_native_function_selected(f):
             return None
         tags = "{" + ", ".join(f"at::Tag::{tag}" for tag in sorted(f.tags)) + "}"
-        return f"m.def({cpp_string(str(f.func))}, {tags});\n"
+        maybe_tags=""
+        if tags not in self.known_tags:
+            idx = len(self.known_tags)
+            self.known_tags[tags] = idx
+            maybe_tags = f"const std::vector<at::Tag> tags_{idx} = {tags};\n"
+        return f"{maybe_tags}m.def({cpp_string(str(f.func))}, tags_{self.known_tags[tags]});\n"


 # Generates Operators.h and Operators.cpp.
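
The effect of the change on the generated C++ can be illustrated with a small standalone sketch. It does not import torchgen: `TagDeduper`, `emit`, `generate`, and the operator schemas below are hypothetical stand-ins, and the plain double-quoting of the schema string is a simplification of what `cpp_string` does. Only the deduplication logic mirrors the diff above: each distinct tag initializer list is emitted once as a named `std::vector<at::Tag>`, and later `m.def` calls reuse it by name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TagDeduper:
    """Hypothetical stand-in for RegisterSchema; not part of torchgen."""

    # frozen=True forbids rebinding the attribute, but the dict itself
    # can still be mutated in place, which is what the diff relies on.
    known_tags: Dict[str, int] = field(default_factory=dict)

    def emit(self, schema: str, tags: List[str]) -> str:
        # Build the C++ initializer list; sorting makes tag order irrelevant.
        init = "{" + ", ".join(f"at::Tag::{t}" for t in sorted(tags)) + "}"
        decl = ""
        if init not in self.known_tags:
            # First time this tag set is seen: declare a named vector for it.
            idx = len(self.known_tags)
            self.known_tags[init] = idx
            decl = f"const std::vector<at::Tag> tags_{idx} = {init};\n"
        # Simplified quoting; the real generator escapes via cpp_string.
        return f'{decl}m.def("{schema}", tags_{self.known_tags[init]});\n'


def generate(ops: List[Tuple[str, List[str]]]) -> str:
    """Concatenate the registration lines for all operators."""
    gen = TagDeduper()
    return "".join(gen.emit(schema, tags) for schema, tags in ops)


# Illustrative schemas and tags, chosen only to exercise the logic.
ops = [
    ("add(Tensor a, Tensor b) -> Tensor", ["pointwise", "core"]),
    ("mul(Tensor a, Tensor b) -> Tensor", ["core", "pointwise"]),
    ("foo(Tensor a) -> Tensor", []),
]
output = generate(ops)
print(output)
```

With these three operators the sketch declares two vectors (`tags_0` for the shared tag set, `tags_1` for the empty one) and three `m.def` calls, so the number of initializer lists the compiler has to process grows with the number of distinct tag sets rather than with the number of operators.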
