Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve documented sampling profiler steps to best known methods #88438

Merged
merged 4 commits into from
Apr 29, 2024

Conversation

tcreech-intel
Copy link
Contributor

  1. Add -fdebug-info-for-profiling -funique-internal-linkage-names, which improve the usefulness of debug info for profiling.

  2. Recommend the use of br_inst_retired.near_taken:uppp, which provides the most precise results on supporting hardware. Mention branches:u as a more portable backup.

    Both should portray execution counts better than the default event (cycles) and have a better chance of working as an unprivileged user due to the :u modifier.

1. Add `-fdebug-info-for-profiling -funique-internal-linkage-names`,
   which improve the usefulness of debug info for profiling.

2. Recommend the use of `br_inst_retired.near_taken:uppp`, which
   provides the most precise results on supporting hardware.  Mention
   `branches:u` as a more portable backup.

   Both should portray execution counts better than the default event
   (`cycles`) and have a better chance of working as an unprivileged
   user due to the `:u` modifier.
@tcreech-intel tcreech-intel marked this pull request as ready for review April 11, 2024 20:56
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 11, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 11, 2024

@llvm/pr-subscribers-clang

Author: Tim Creech (tcreech-intel)

Changes
  1. Add -fdebug-info-for-profiling -funique-internal-linkage-names, which improve the usefulness of debug info for profiling.

  2. Recommend the use of br_inst_retired.near_taken:uppp, which provides the most precise results on supporting hardware. Mention branches:u as a more portable backup.

    Both should portray execution counts better than the default event (cycles) and have a better chance of working as an unprivileged user due to the :u modifier.


Full diff: https://github.com/llvm/llvm-project/pull/88438.diff

1 Files Affected:

  • (modified) clang/docs/UsersManual.rst (+10-6)
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index c464bc3a69adc5..818841285cfae5 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2443,13 +2443,15 @@ usual build cycle when using sample profilers for optimization:
    usual build flags that you always build your application with. The only
    requirement is that DWARF debug info including source line information is
    generated. This DWARF information is important for the profiler to be able
-   to map instructions back to source line locations.
+   to map instructions back to source line locations. The usefulness of this
+   DWARF information can be improved with the ``-fdebug-info-for-profiling``
+   and ``-funique-internal-linkage-names`` options.
 
-   On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
+   On Linux:
 
    .. code-block:: console
 
-     $ clang++ -O2 -gline-tables-only code.cc -o code
+     $ clang++ -O2 -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names code.cc -o code
 
    While MSVC-style targets default to CodeView debug information, DWARF debug
    information is required to generate source-level LLVM profiles. Use
@@ -2457,13 +2459,13 @@ usual build cycle when using sample profilers for optimization:
 
    .. code-block:: console
 
-     $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+     $ clang-cl -O2 -gdwarf -gline-tables-only /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names code.cc -o code -fuse-ld=lld -link -debug:dwarf
 
 2. Run the executable under a sampling profiler. The specific profiler
    you use does not really matter, as long as its output can be converted
    into the format that the LLVM optimizer understands.
 
-   Two such profilers are the the Linux Perf profiler
+   Two such profilers are the Linux Perf profiler
    (https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
    available as part of `Intel VTune
    <https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
@@ -2477,7 +2479,9 @@ usual build cycle when using sample profilers for optimization:
 
    .. code-block:: console
 
-     $ perf record -b ./code
+     $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
+
+   If the event above is unavailable, ``branches:u`` is probably next-best.
 
    Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
    Record (LBR) to record call chains. While this is not strictly required,

@tcreech-intel
Copy link
Contributor Author

@williamweixiao, @HaohaiWen, this updates the docs to describe best practices given #83972.

It seems -fdebug-info-for-profiling can be particularly important. Without it we were discarding nearly half of the samples in some cases.

to map instructions back to source line locations.
to map instructions back to source line locations. The usefulness of this
DWARF information can be improved with the ``-fdebug-info-for-profiling``
and ``-funique-internal-linkage-names`` options.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using -funique-internal-linkage-names will cause an identifier to be produced based on the source file name passed into the compiler, so if a build system is used that passes the name using absolute paths, compiling the file source.c would result in different unique identifiers depending on whether it is compiled as:
clang /home/dev _foo/source.c …
or
clang/home/dev_bar/source.c …

even when it is the same file, but just using different base directories. This could limit profile data collected when using one workspace being able to be applied to a different workspace because the function names will no longer match, unless there is some behavior added to relax the default function name matching in the SampleProfileLoaderPass. Should a note be added that when using this option the build for data collection needs to be using the same base directory as the feedback run, if absolute paths are used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we also need -fdebug-info-for-profiling and -funique-internal-linkage-names for step 4 ("-fprofile-sample-use=code.prof") ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @chrulski-intel -- good point. I'll add a brief note.

@williamweixiao, I think you're right that they should match. I'll update those steps.

@williamweixiao williamweixiao self-requested a review April 19, 2024 09:01
.. code-block:: console
.. code-block:: winbatch

> clang-cl -O2 -gdwarf -gline-tables-only ^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these commands are using 'clang-cl', would it be better to show and use the options in the native clang-cl format described at line 4557 for consistency instead of mixing slashes and hyphens? i.e. /O2 instead of -O2, and /Fe instead of -o, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I've updated the clang-cl examples to use cl-style forward-slash options when possible. There are still a few cases (-gdwarf -gline-tables-only) where only the hyphen version is understood, and also some cases (/clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names) where the hyphen version is understood only with /clang:.

@williamweixiao williamweixiao merged commit 66274eb into llvm:main Apr 29, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants