Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clang-tidy performance issues in template and consexpr heavy code #86553

Open
ilovepi opened this issue Mar 25, 2024 · 8 comments · Fixed by #86596
Open

clang-tidy performance issues in template and consexpr heavy code #86553

ilovepi opened this issue Mar 25, 2024 · 8 comments · Fixed by #86596

Comments

@ilovepi
Copy link
Contributor

ilovepi commented Mar 25, 2024

We're getting reports from some of our developers that clang-tidy hangs on a certain test file in the Fuchsia source tree (see https://fxbug.dev/330907651).

After some investigation, I found that it will take clang-tidy about 1 hr 40 min to analyze the file. Using --enable-profile I got the following timing information.

===-------------------------------------------------------------------------===
                          clang-tidy checks profiling
===-------------------------------------------------------------------------===
  Total Execution Time: 6183.0321 seconds (6183.2801 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  5859.3510 ( 97.9%)  18.2100 (  9.2%)  5877.5610 ( 95.1%)  5877.7869 ( 95.1%)  google-runtime-int
   5.8892 (  0.1%)   8.2924 (  4.2%)  14.1816 (  0.2%)  14.0830 (  0.2%)  bugprone-use-after-move
   4.6973 (  0.1%)   7.0162 (  3.5%)  11.7135 (  0.2%)  11.6851 (  0.2%)  bugprone-standalone-empty
   4.4222 (  0.1%)   6.5870 (  3.3%)  11.0092 (  0.2%)  11.0159 (  0.2%)  google-upgrade-googletest-case
   4.3013 (  0.1%)   6.3715 (  3.2%)  10.6727 (  0.2%)  10.6772 (  0.2%)  bugprone-stringview-nullptr
   4.1630 (  0.1%)   6.1288 (  3.1%)  10.2918 (  0.2%)  10.2994 (  0.2%)  modernize-type-traits
   1.1905 (  0.0%)   8.9635 (  4.5%)  10.1540 (  0.2%)  10.1544 (  0.2%)  readability-identifier-naming
   3.6672 (  0.1%)   5.3202 (  2.7%)   8.9874 (  0.1%)   9.0209 (  0.1%)  bugprone-suspicious-string-compare
   7.7269 (  0.1%)   0.8458 (  0.4%)   8.5727 (  0.1%)   8.5746 (  0.1%)  modernize-macro-to-enum
   3.3512 (  0.1%)   4.9307 (  2.5%)   8.2819 (  0.1%)   8.2825 (  0.1%)  performance-move-const-arg
   3.3110 (  0.1%)   4.8247 (  2.4%)   8.1357 (  0.1%)   8.1421 (  0.1%)  modernize-use-nullptr
   3.2855 (  0.1%)   4.7510 (  2.4%)   8.0365 (  0.1%)   8.0638 (  0.1%)  bugprone-assert-side-effect
   3.1923 (  0.1%)   4.6726 (  2.4%)   7.8649 (  0.1%)   7.9634 (  0.1%)  readability-non-const-parameter
   3.1948 (  0.1%)   4.6543 (  2.4%)   7.8491 (  0.1%)   7.8627 (  0.1%)  bugprone-multiple-statement-macro
   3.1812 (  0.1%)   4.6334 (  2.3%)   7.8146 (  0.1%)   7.8233 (  0.1%)  bugprone-suspicious-semicolon
   3.1394 (  0.1%)   4.5716 (  2.3%)   7.7109 (  0.1%)   7.7115 (  0.1%)  bugprone-infinite-loop
   2.9936 (  0.1%)   4.3745 (  2.2%)   7.3681 (  0.1%)   7.3946 (  0.1%)  readability-redundant-control-flow
   3.0067 (  0.1%)   4.3366 (  2.2%)   7.3433 (  0.1%)   7.3469 (  0.1%)  readability-container-size-empty
   2.9848 (  0.0%)   4.3437 (  2.2%)   7.3286 (  0.1%)   7.3451 (  0.1%)  bugprone-unused-return-value
   2.9454 (  0.0%)   4.3885 (  2.2%)   7.3339 (  0.1%)   7.3424 (  0.1%)  misc-unused-using-decls
   2.8416 (  0.0%)   4.1509 (  2.1%)   6.9926 (  0.1%)   6.9975 (  0.1%)  bugprone-unused-raii
   2.8112 (  0.0%)   4.1556 (  2.1%)   6.9668 (  0.1%)   6.9684 (  0.1%)  bugprone-sizeof-expression
   2.5915 (  0.0%)   3.8519 (  1.9%)   6.4434 (  0.1%)   6.4736 (  0.1%)  bugprone-chained-comparison
   2.4902 (  0.0%)   3.7024 (  1.9%)   6.1927 (  0.1%)   6.1979 (  0.1%)  bugprone-inc-dec-in-conditions
   1.9273 (  0.0%)   2.7521 (  1.4%)   4.6794 (  0.1%)   4.6468 (  0.1%)  bugprone-implicit-widening-of-multiplication-result
   1.8187 (  0.0%)   2.5638 (  1.3%)   4.3825 (  0.1%)   4.3604 (  0.1%)  misc-redundant-expression
   2.1165 (  0.0%)   2.2258 (  1.1%)   4.3422 (  0.1%)   4.3366 (  0.1%)  performance-unnecessary-copy-initialization
  <redacted>
  5984.9860 (100.0%)  198.0461 (100.0%)  6183.0321 (100.0%)  6183.2801 (100.0%)  Total

60078 warnings generated.

I've tried to reproduce this behavior with the output from -gen-reproducer=always, but it seems be behaving differently (e.g., finishing w/in a 2 minutes) using the -cc1 options. Even with the original command (minus the include dirs) and with any required paths, I'm not seeing the same behavior. I'm not sure why using the pre-processed file would be functionally different, but it seems to make a large difference here.

For reference, the reproducer compiles in about 46 seconds, and w/ -ftime-trace I see that 42/46 seconds is in the frontend, nearly all of that instantiating functions, an significant portion evaluating R values.

I'm happy to share the reproducer, but since it isn't exhibiting this behavior I'm not sure it has much value. And aside from checking out the Fuchsia source and running a command against it, I'm not sure what I can provide to aid the investigation.

If possible, I'd like to find out:

  1. is this a known problem?
  2. is there a way to more faithfully reproduce the clang-tidy issue?
@ilovepi ilovepi added bug Indicates an unexpected problem or unintended behavior clang-tidy labels Mar 25, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 25, 2024

@llvm/issue-subscribers-bug

Author: Paul Kirth (ilovepi)

We're getting reports from some of our developers that clang-tidy hangs on a certain test file in the Fuchsia source tree (see https://fxbug.dev/330907651).

After some investigation, I found that it will take clang-tidy about 1 hr 40 min to analyze the file. Using --enable-profile I got the following timing information.

===-------------------------------------------------------------------------===
                          clang-tidy checks profiling
===-------------------------------------------------------------------------===
  Total Execution Time: 6183.0321 seconds (6183.2801 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  5859.3510 ( 97.9%)  18.2100 (  9.2%)  5877.5610 ( 95.1%)  5877.7869 ( 95.1%)  google-runtime-int
   5.8892 (  0.1%)   8.2924 (  4.2%)  14.1816 (  0.2%)  14.0830 (  0.2%)  bugprone-use-after-move
   4.6973 (  0.1%)   7.0162 (  3.5%)  11.7135 (  0.2%)  11.6851 (  0.2%)  bugprone-standalone-empty
   4.4222 (  0.1%)   6.5870 (  3.3%)  11.0092 (  0.2%)  11.0159 (  0.2%)  google-upgrade-googletest-case
   4.3013 (  0.1%)   6.3715 (  3.2%)  10.6727 (  0.2%)  10.6772 (  0.2%)  bugprone-stringview-nullptr
   4.1630 (  0.1%)   6.1288 (  3.1%)  10.2918 (  0.2%)  10.2994 (  0.2%)  modernize-type-traits
   1.1905 (  0.0%)   8.9635 (  4.5%)  10.1540 (  0.2%)  10.1544 (  0.2%)  readability-identifier-naming
   3.6672 (  0.1%)   5.3202 (  2.7%)   8.9874 (  0.1%)   9.0209 (  0.1%)  bugprone-suspicious-string-compare
   7.7269 (  0.1%)   0.8458 (  0.4%)   8.5727 (  0.1%)   8.5746 (  0.1%)  modernize-macro-to-enum
   3.3512 (  0.1%)   4.9307 (  2.5%)   8.2819 (  0.1%)   8.2825 (  0.1%)  performance-move-const-arg
   3.3110 (  0.1%)   4.8247 (  2.4%)   8.1357 (  0.1%)   8.1421 (  0.1%)  modernize-use-nullptr
   3.2855 (  0.1%)   4.7510 (  2.4%)   8.0365 (  0.1%)   8.0638 (  0.1%)  bugprone-assert-side-effect
   3.1923 (  0.1%)   4.6726 (  2.4%)   7.8649 (  0.1%)   7.9634 (  0.1%)  readability-non-const-parameter
   3.1948 (  0.1%)   4.6543 (  2.4%)   7.8491 (  0.1%)   7.8627 (  0.1%)  bugprone-multiple-statement-macro
   3.1812 (  0.1%)   4.6334 (  2.3%)   7.8146 (  0.1%)   7.8233 (  0.1%)  bugprone-suspicious-semicolon
   3.1394 (  0.1%)   4.5716 (  2.3%)   7.7109 (  0.1%)   7.7115 (  0.1%)  bugprone-infinite-loop
   2.9936 (  0.1%)   4.3745 (  2.2%)   7.3681 (  0.1%)   7.3946 (  0.1%)  readability-redundant-control-flow
   3.0067 (  0.1%)   4.3366 (  2.2%)   7.3433 (  0.1%)   7.3469 (  0.1%)  readability-container-size-empty
   2.9848 (  0.0%)   4.3437 (  2.2%)   7.3286 (  0.1%)   7.3451 (  0.1%)  bugprone-unused-return-value
   2.9454 (  0.0%)   4.3885 (  2.2%)   7.3339 (  0.1%)   7.3424 (  0.1%)  misc-unused-using-decls
   2.8416 (  0.0%)   4.1509 (  2.1%)   6.9926 (  0.1%)   6.9975 (  0.1%)  bugprone-unused-raii
   2.8112 (  0.0%)   4.1556 (  2.1%)   6.9668 (  0.1%)   6.9684 (  0.1%)  bugprone-sizeof-expression
   2.5915 (  0.0%)   3.8519 (  1.9%)   6.4434 (  0.1%)   6.4736 (  0.1%)  bugprone-chained-comparison
   2.4902 (  0.0%)   3.7024 (  1.9%)   6.1927 (  0.1%)   6.1979 (  0.1%)  bugprone-inc-dec-in-conditions
   1.9273 (  0.0%)   2.7521 (  1.4%)   4.6794 (  0.1%)   4.6468 (  0.1%)  bugprone-implicit-widening-of-multiplication-result
   1.8187 (  0.0%)   2.5638 (  1.3%)   4.3825 (  0.1%)   4.3604 (  0.1%)  misc-redundant-expression
   2.1165 (  0.0%)   2.2258 (  1.1%)   4.3422 (  0.1%)   4.3366 (  0.1%)  performance-unnecessary-copy-initialization
  &lt;redacted&gt;
  5984.9860 (100.0%)  198.0461 (100.0%)  6183.0321 (100.0%)  6183.2801 (100.0%)  Total

60078 warnings generated.

I've tried to reproduce this behavior with the output from -gen-reproducer=always, but it seems be behaving differently (e.g., finishing w/in a 2 minutes) using the -cc1 options. Even with the original command (minus the include dirs) and with any required paths, I'm not seeing the same behavior. I'm not sure why using the pre-processed file would be functionally different, but it seems to make a large difference here.

For reference, the reproducer compiles in about 46 seconds, and w/ -ftime-trace I see that 42/46 seconds is in the frontend, nearly all of that instantiating functions, an significant portion evaluating R values.

I'm happy to share the reproducer, but since it isn't exhibiting this behavior I'm not sure it has much value. And aside from checking out the Fuchsia source and running a command against it, I'm not sure what I can provide to aid the investigation.

If possible, I'd like to find out:

  1. is this a known problem?
  2. is there a way to more faithfully reproduce the clang-tidy issue?

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 25, 2024

@llvm/issue-subscribers-clang-tidy

Author: Paul Kirth (ilovepi)

We're getting reports from some of our developers that clang-tidy hangs on a certain test file in the Fuchsia source tree (see https://fxbug.dev/330907651).

After some investigation, I found that it will take clang-tidy about 1 hr 40 min to analyze the file. Using --enable-profile I got the following timing information.

===-------------------------------------------------------------------------===
                          clang-tidy checks profiling
===-------------------------------------------------------------------------===
  Total Execution Time: 6183.0321 seconds (6183.2801 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  5859.3510 ( 97.9%)  18.2100 (  9.2%)  5877.5610 ( 95.1%)  5877.7869 ( 95.1%)  google-runtime-int
   5.8892 (  0.1%)   8.2924 (  4.2%)  14.1816 (  0.2%)  14.0830 (  0.2%)  bugprone-use-after-move
   4.6973 (  0.1%)   7.0162 (  3.5%)  11.7135 (  0.2%)  11.6851 (  0.2%)  bugprone-standalone-empty
   4.4222 (  0.1%)   6.5870 (  3.3%)  11.0092 (  0.2%)  11.0159 (  0.2%)  google-upgrade-googletest-case
   4.3013 (  0.1%)   6.3715 (  3.2%)  10.6727 (  0.2%)  10.6772 (  0.2%)  bugprone-stringview-nullptr
   4.1630 (  0.1%)   6.1288 (  3.1%)  10.2918 (  0.2%)  10.2994 (  0.2%)  modernize-type-traits
   1.1905 (  0.0%)   8.9635 (  4.5%)  10.1540 (  0.2%)  10.1544 (  0.2%)  readability-identifier-naming
   3.6672 (  0.1%)   5.3202 (  2.7%)   8.9874 (  0.1%)   9.0209 (  0.1%)  bugprone-suspicious-string-compare
   7.7269 (  0.1%)   0.8458 (  0.4%)   8.5727 (  0.1%)   8.5746 (  0.1%)  modernize-macro-to-enum
   3.3512 (  0.1%)   4.9307 (  2.5%)   8.2819 (  0.1%)   8.2825 (  0.1%)  performance-move-const-arg
   3.3110 (  0.1%)   4.8247 (  2.4%)   8.1357 (  0.1%)   8.1421 (  0.1%)  modernize-use-nullptr
   3.2855 (  0.1%)   4.7510 (  2.4%)   8.0365 (  0.1%)   8.0638 (  0.1%)  bugprone-assert-side-effect
   3.1923 (  0.1%)   4.6726 (  2.4%)   7.8649 (  0.1%)   7.9634 (  0.1%)  readability-non-const-parameter
   3.1948 (  0.1%)   4.6543 (  2.4%)   7.8491 (  0.1%)   7.8627 (  0.1%)  bugprone-multiple-statement-macro
   3.1812 (  0.1%)   4.6334 (  2.3%)   7.8146 (  0.1%)   7.8233 (  0.1%)  bugprone-suspicious-semicolon
   3.1394 (  0.1%)   4.5716 (  2.3%)   7.7109 (  0.1%)   7.7115 (  0.1%)  bugprone-infinite-loop
   2.9936 (  0.1%)   4.3745 (  2.2%)   7.3681 (  0.1%)   7.3946 (  0.1%)  readability-redundant-control-flow
   3.0067 (  0.1%)   4.3366 (  2.2%)   7.3433 (  0.1%)   7.3469 (  0.1%)  readability-container-size-empty
   2.9848 (  0.0%)   4.3437 (  2.2%)   7.3286 (  0.1%)   7.3451 (  0.1%)  bugprone-unused-return-value
   2.9454 (  0.0%)   4.3885 (  2.2%)   7.3339 (  0.1%)   7.3424 (  0.1%)  misc-unused-using-decls
   2.8416 (  0.0%)   4.1509 (  2.1%)   6.9926 (  0.1%)   6.9975 (  0.1%)  bugprone-unused-raii
   2.8112 (  0.0%)   4.1556 (  2.1%)   6.9668 (  0.1%)   6.9684 (  0.1%)  bugprone-sizeof-expression
   2.5915 (  0.0%)   3.8519 (  1.9%)   6.4434 (  0.1%)   6.4736 (  0.1%)  bugprone-chained-comparison
   2.4902 (  0.0%)   3.7024 (  1.9%)   6.1927 (  0.1%)   6.1979 (  0.1%)  bugprone-inc-dec-in-conditions
   1.9273 (  0.0%)   2.7521 (  1.4%)   4.6794 (  0.1%)   4.6468 (  0.1%)  bugprone-implicit-widening-of-multiplication-result
   1.8187 (  0.0%)   2.5638 (  1.3%)   4.3825 (  0.1%)   4.3604 (  0.1%)  misc-redundant-expression
   2.1165 (  0.0%)   2.2258 (  1.1%)   4.3422 (  0.1%)   4.3366 (  0.1%)  performance-unnecessary-copy-initialization
  &lt;redacted&gt;
  5984.9860 (100.0%)  198.0461 (100.0%)  6183.0321 (100.0%)  6183.2801 (100.0%)  Total

60078 warnings generated.

I've tried to reproduce this behavior with the output from -gen-reproducer=always, but it seems be behaving differently (e.g., finishing w/in a 2 minutes) using the -cc1 options. Even with the original command (minus the include dirs) and with any required paths, I'm not seeing the same behavior. I'm not sure why using the pre-processed file would be functionally different, but it seems to make a large difference here.

For reference, the reproducer compiles in about 46 seconds, and w/ -ftime-trace I see that 42/46 seconds is in the frontend, nearly all of that instantiating functions, an significant portion evaluating R values.

I'm happy to share the reproducer, but since it isn't exhibiting this behavior I'm not sure it has much value. And aside from checking out the Fuchsia source and running a command against it, I'm not sure what I can provide to aid the investigation.

If possible, I'd like to find out:

  1. is this a known problem?
  2. is there a way to more faithfully reproduce the clang-tidy issue?

@PiotrZSL
Copy link
Member

PiotrZSL commented Mar 25, 2024

@ilovepi Are you able to attach preprocessed ffl_tests.cc file ?

@ilovepi
Copy link
Contributor Author

ilovepi commented Mar 25, 2024

I'll just include everything. Here's the reproducer and a script. I think the only thing that needs to be changed is the path to clang or clang-tidy, but its possible I missed something else that's only available locally for when using the original driver args.
reproducer.zip

@ilovepi
Copy link
Contributor Author

ilovepi commented Mar 25, 2024

Forgot to mention I included the .clang-tidy file, which may not be obvious, since its hidden by default.

@PiotrZSL PiotrZSL self-assigned this Mar 25, 2024
@PiotrZSL
Copy link
Member

I have a few ideas about what went wrong and how to fix it. It'll likely take me a couple of days, and the fix should be included in Clang-tidy 19 unless you decide to cherry-pick it.

@ilovepi
Copy link
Contributor Author

ilovepi commented Mar 25, 2024

Thanks for looking into this so quickly. We run at ToT, and try to release a new compiler for Fuchsia on a weekly cadence so it isn't a problem for us if it won't be in an official LLVM release for ~6 months. AFAIK this isn't blocking anything but this one file from being linted in our CI. We'd like to get that fixed, but having a patch out for review w/in a week is great in my opinion.

I am curious though as to the underlying mechanism that causes this issue, so if you can CC me on the fix I'd appreciate it.

PiotrZSL added a commit to PiotrZSL/llvm-project that referenced this issue Mar 25, 2024
Main problem with performance of this check is caused by hasAncestor matcher,
and to be more precise by an llvm::DenseSet and std::deque in matchesAnyAncestorOf.

To reduce impact of this matcher, multiple conditions that were
checked in check method were copied into ast matcher that is now checked
before hasAncestor.

Using custom getCheckTraversalKind to exclude template instances that
shouldn't be checked anyway is an additional improvment, but gain
from that one is low.

Tested on ffl_tests.cc, visible reduction from ~442 seconds to
~15 seconds (~96% reduction).

Closes llvm#86553
PiotrZSL added a commit that referenced this issue Mar 26, 2024
Main problem with performance of this check is caused by hasAncestor
matcher, and to be more precise by an llvm::DenseSet and std::deque in
matchesAnyAncestorOf.

To reduce impact of this matcher, multiple conditions that were checked
in check method were copied into AST matcher that is now checked before
hasAncestor.

Using custom getCheckTraversalKind to exclude template instances that
shouldn't be checked anyway is an additional improvement, but gain from
that one is low.

Tested on ffl_tests.cc, visible reduction from ~442 seconds to ~15
seconds (~96% reduction).

Closes #86553
@PiotrZSL
Copy link
Member

I will keep this issue open for a while, as still some performance improvements are possible in this check.

@PiotrZSL PiotrZSL reopened this Mar 26, 2024
@EugeneZelenko EugeneZelenko added performance and removed bug Indicates an unexpected problem or unintended behavior labels Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment