-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve documented sampling profiler steps to best known methods #88438
Conversation
1. Add `-fdebug-info-for-profiling -funique-internal-linkage-names`, which improve the usefulness of debug info for profiling. 2. Recommend the use of `br_inst_retired.near_taken:uppp`, which provides the most precise results on supporting hardware. Mention `branches:u` as a more portable backup. Both should portray execution counts better than the default event (`cycles`) and have a better chance of working as an unprivileged user due to the `:u` modifier.
@llvm/pr-subscribers-clang Author: Tim Creech (tcreech-intel) Changes
Full diff: https://github.com/llvm/llvm-project/pull/88438.diff 1 Files Affected:
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index c464bc3a69adc5..818841285cfae5 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2443,13 +2443,15 @@ usual build cycle when using sample profilers for optimization:
usual build flags that you always build your application with. The only
requirement is that DWARF debug info including source line information is
generated. This DWARF information is important for the profiler to be able
- to map instructions back to source line locations.
+ to map instructions back to source line locations. The usefulness of this
+ DWARF information can be improved with the ``-fdebug-info-for-profiling``
+ and ``-funique-internal-linkage-names`` options.
- On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
+ On Linux:
.. code-block:: console
- $ clang++ -O2 -gline-tables-only code.cc -o code
+ $ clang++ -O2 -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names code.cc -o code
While MSVC-style targets default to CodeView debug information, DWARF debug
information is required to generate source-level LLVM profiles. Use
@@ -2457,13 +2459,13 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: console
- $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+ $ clang-cl -O2 -gdwarf -gline-tables-only /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names code.cc -o code -fuse-ld=lld -link -debug:dwarf
2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
into the format that the LLVM optimizer understands.
- Two such profilers are the the Linux Perf profiler
+ Two such profilers are the Linux Perf profiler
(https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
available as part of `Intel VTune
<https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
@@ -2477,7 +2479,9 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: console
- $ perf record -b ./code
+ $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
+
+ If the event above is unavailable, ``branches:u`` is probably next-best.
Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
Record (LBR) to record call chains. While this is not strictly required,
|
@williamweixiao, @HaohaiWen, this updates the docs to describe best practices given #83972. It seems |
to map instructions back to source line locations. | ||
to map instructions back to source line locations. The usefulness of this | ||
DWARF information can be improved with the ``-fdebug-info-for-profiling`` | ||
and ``-funique-internal-linkage-names`` options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using -funique-internal-linkage-names will cause an identifier to be produced based on the source file name passed into the compiler, so if a build system is used that passes the name using absolute paths, compiling the file source.c would result in different unique identifiers depending on whether it is compiled as:
clang /home/dev _foo/source.c …
or
clang/home/dev_bar/source.c …
even when it is the same file, but just using different base directories. This could limit profile data collected when using one workspace being able to be applied to a different workspace because the function names will no longer match, unless there is some behavior added to relax the default function name matching in the SampleProfileLoaderPass. Should a note be added that when using this option the build for data collection needs to be using the same base directory as the feedback run, if absolute paths are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we also need -fdebug-info-for-profiling
and -funique-internal-linkage-names
for step 4 ("-fprofile-sample-use=code.prof") ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @chrulski-intel -- good point. I'll add a brief note.
@williamweixiao, I think you're right that they should match. I'll update those steps.
clang/docs/UsersManual.rst
Outdated
.. code-block:: console | ||
.. code-block:: winbatch | ||
|
||
> clang-cl -O2 -gdwarf -gline-tables-only ^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these commands are using 'clang-cl', would it be better to show and use the options in the native clang-cl format described at line 4557 for consistency instead of mixing slashes and hyphens? i.e. /O2 instead of -O2, and /Fe instead of -o, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I've updated the clang-cl examples to use cl-style forward-slash options when possible. There are still a few cases (-gdwarf -gline-tables-only
) where only the hyphen version is understood, and also some cases (/clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names
) where the hyphen version is understood only with /clang:
.
Add
-fdebug-info-for-profiling -funique-internal-linkage-names
, which improve the usefulness of debug info for profiling.Recommend the use of
br_inst_retired.near_taken:uppp
, which provides the most precise results on supporting hardware. Mentionbranches:u
as a more portable backup.Both should portray execution counts better than the default event (
cycles
) and have a better chance of working as an unprivileged user due to the:u
modifier.