Skip to content

Commit

Permalink
[misexpect] Re-implement MisExpect Diagnostics
Browse files Browse the repository at this point in the history
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.

New checks rely on 2 invariants:

1) For frontend instrumentation, MD_prof branch_weights will always be
   populated before llvm.expect intrinsics are lowered.

2) for IR and sample profiling, llvm.expect intrinsics will always be
   lowered before branch_weights are populated from the IR profiles.

These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.

Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D115907
  • Loading branch information
ilovepi committed Mar 28, 2022
1 parent 42d3d71 commit 2add3fb
Show file tree
Hide file tree
Showing 49 changed files with 2,043 additions and 2 deletions.
75 changes: 75 additions & 0 deletions clang/docs/MisExpect.rst
@@ -0,0 +1,75 @@
===================
Misexpect
===================
.. contents::

.. toctree::
:maxdepth: 1

When developers use ``llvm.expect`` intrinsics, i.e., through use of
``__builtin_expect(...)``, they are trying to communicate how their code is
expected to behave at runtime to the optimizer. These annotations, however, can
be incorrect for a variety of reasons: changes to the code base invalidate them
silently, the developer mis-annotated them (e.g., using ``LIKELY`` instead of
``UNLIKELY``), or perhaps they assumed something incorrectly when they wrote
the annotation. Regardless of why, it is useful to detect these situations so
that the optimizer can make more useful decisions about the code.

MisExpect diagnostics are intended to help developers identify and address
these situations, by comparing the branch weights added by the ``llvm.expect``
intrinsic to those collected through profiling. Whenever these values are
mismatched, a diagnostic is surfaced to the user. Details on how the checks
operate in the LLVM backed can be found in LLVM's documentation.

By default MisExpect checking is quite strict, because the use of the
``llvm.expect`` intrinsic is designed for specialized cases, where the outcome
of a condition is severely skewed. As a result, the optimizer can be extremely
aggressive, which can result in performance degradation if the outcome is less
predictable than the annotation suggests. Even when the annotation is correct
90% of the time, it may be beneficial to either remove the annotation or to use
a different intrinsic that can communicate the probability more directly.

Because this may be too strict, MisExpect diagnostics are not enabled by
default, and support an additional flag to tolerate some deviation from the
exact thresholds. The ``-fdiagnostic-misexpect-tolerance=N`` accepts
deviations when comparing branch weights within ``N%`` of the expected values.
So passing ``-fdiagnostic-misexpect-tolerance=5`` will not report diagnostic messages
if the branch weight from the profile is within 5% of the weight added by
the ``llvm.expect`` intrinsic.

MisExpect diagnostics are also available in the form of optimization remarks,
which can be serialized and processed through the ``opt-viewer.py``
scripts in LLVM.

.. option:: -Rpass=misexpect

Enables optimization remarks for misexpect when profiling data conflicts with
use of ``llvm.expect`` intrinsics.


.. option:: -Wmisexpect

Enables misexpect warnings when profiling data conflicts with use of
``llvm.expect`` intrinsics.

.. option:: -fdiagnostic-misexpect-tolerance=N

Relaxes misexpect checking to tolerate profiling values within N% of the
expected branch weight. e.g., a value of ``N=5`` allows misexpect to check against
``0.95 * Threshold``

LLVM supports 4 types of profile formats: Frontend, IR, CS-IR, and
Sampling. MisExpect Diagnostics are compatible with all Profiling formats.

+----------------+--------------------------------------------------------------------------------------+
| Profile Type | Description |
+================+======================================================================================+
| Frontend | Profiling instrumentation added during compilation by the frontend, i.e. ``clang`` |
+----------------+--------------------------------------------------------------------------------------+
| IR | Profiling instrumentation added during by the LLVM backend |
+----------------+--------------------------------------------------------------------------------------+
| CS-IR | Context Sensitive IR based profiles |
+----------------+--------------------------------------------------------------------------------------+
| Sampling | Profiles collected through sampling with external tools, such as ``perf`` on Linux |
+----------------+--------------------------------------------------------------------------------------+

3 changes: 3 additions & 0 deletions clang/docs/ReleaseNotes.rst
Expand Up @@ -105,6 +105,9 @@ Improvements to Clang's diagnostics
- ``-Wunused-but-set-variable`` now also warns if the variable is only used
by unary operators.

- ``-Wmisexpect`` warns when the branch weights collected during profiling
conflict with those added by ``llvm.expect``.

Non-comprehensive list of changes in this release
-------------------------------------------------
- The builtin function __builtin_dump_struct would crash clang when the target
Expand Down
1 change: 1 addition & 0 deletions clang/include/clang/Basic/CodeGenOptions.def
Expand Up @@ -175,6 +175,7 @@ CODEGENOPT(NoExecStack , 1, 0) ///< Set when -Wa,--noexecstack is enabled.
CODEGENOPT(FatalWarnings , 1, 0) ///< Set when -Wa,--fatal-warnings is
///< enabled.
CODEGENOPT(NoWarn , 1, 0) ///< Set when -Wa,--no-warn is enabled.
CODEGENOPT(MisExpect , 1, 0) ///< Set when -Wmisexpect is enabled
CODEGENOPT(EnableSegmentedStacks , 1, 0) ///< Set when -fsplit-stack is enabled.
CODEGENOPT(NoInlineLineTables, 1, 0) ///< Whether debug info should contain
///< inline line tables.
Expand Down
4 changes: 4 additions & 0 deletions clang/include/clang/Basic/CodeGenOptions.h
Expand Up @@ -420,6 +420,10 @@ class CodeGenOptions : public CodeGenOptionsBase {
/// If threshold option is not specified, it is disabled by default.
Optional<uint64_t> DiagnosticsHotnessThreshold = 0;

/// The maximum percentage profiling weights can deviate from the expected
/// values in order to be included in misexpect diagnostics.
Optional<uint64_t> DiagnosticsMisExpectTolerance = 0;

public:
// Define accessors/mutators for code generation options of enumeration type.
#define CODEGENOPT(Name, Bits, Default)
Expand Down
5 changes: 5 additions & 0 deletions clang/include/clang/Basic/DiagnosticDriverKinds.td
Expand Up @@ -149,6 +149,8 @@ def err_drv_invalid_darwin_version : Error<
"invalid Darwin version number: %0">;
def err_drv_invalid_diagnotics_hotness_threshold : Error<
"invalid argument in '%0', only integer or 'auto' is supported">;
def err_drv_invalid_diagnotics_misexpect_tolerance : Error<
"invalid argument in '%0', only integers are supported">;
def err_drv_missing_argument : Error<
"argument to '%0' is missing (expected %1 value%s1)">;
def err_drv_invalid_Xarch_argument_with_args : Error<
Expand Down Expand Up @@ -376,6 +378,9 @@ def warn_drv_empty_joined_argument : Warning<
def warn_drv_diagnostics_hotness_requires_pgo : Warning<
"argument '%0' requires profile-guided optimization information">,
InGroup<UnusedCommandLineArgument>;
def warn_drv_diagnostics_misexpect_requires_pgo : Warning<
"argument '%0' requires profile-guided optimization information">,
InGroup<UnusedCommandLineArgument>;
def warn_drv_clang_unsupported : Warning<
"the clang compiler does not support '%0'">;
def warn_drv_deprecated_arg : Warning<
Expand Down
5 changes: 5 additions & 0 deletions clang/include/clang/Basic/DiagnosticFrontendKinds.td
Expand Up @@ -311,6 +311,11 @@ def warn_profile_data_missing : Warning<
def warn_profile_data_unprofiled : Warning<
"no profile data available for file \"%0\"">,
InGroup<ProfileInstrUnprofiled>;
def warn_profile_data_misexpect : Warning<
"Potential performance regression from use of __builtin_expect(): "
"Annotation was correct on %0 of profiled executions.">,
BackendInfo,
InGroup<MisExpect>;
} // end of instrumentation issue category

}
1 change: 1 addition & 0 deletions clang/include/clang/Basic/DiagnosticGroups.td
Expand Up @@ -1254,6 +1254,7 @@ def BackendWarningAttributes : DiagGroup<"attribute-warning">;
def ProfileInstrMissing : DiagGroup<"profile-instr-missing">;
def ProfileInstrOutOfDate : DiagGroup<"profile-instr-out-of-date">;
def ProfileInstrUnprofiled : DiagGroup<"profile-instr-unprofiled">;
def MisExpect : DiagGroup<"misexpect">;

// AddressSanitizer frontend instrumentation remarks.
def SanitizeAddressRemarks : DiagGroup<"sanitize-address">;
Expand Down
3 changes: 3 additions & 0 deletions clang/include/clang/Driver/Options.td
Expand Up @@ -1426,6 +1426,9 @@ def fdiagnostics_hotness_threshold_EQ : Joined<["-"], "fdiagnostics-hotness-thre
Group<f_Group>, Flags<[CC1Option]>, MetaVarName<"<value>">,
HelpText<"Prevent optimization remarks from being output if they do not have at least this profile count. "
"Use 'auto' to apply the threshold from profile summary">;
def fdiagnostics_misexpect_tolerance_EQ : Joined<["-"], "fdiagnostics-misexpect-tolerance=">,
Group<f_Group>, Flags<[CC1Option]>, MetaVarName<"<value>">,
HelpText<"Prevent misexpect diagnostics from being output if the profile counts are within N% of the expected. ">;
defm diagnostics_show_option : BoolFOption<"diagnostics-show-option",
DiagnosticOpts<"ShowOptionNames">, DefaultTrue,
NegFlag<SetFalse, [CC1Option]>, PosFlag<SetTrue, [], "Print option name with mappable diagnostics">>;
Expand Down
1 change: 1 addition & 0 deletions clang/lib/CodeGen/BackendUtil.cpp
Expand Up @@ -650,6 +650,7 @@ static bool initTargetOptions(DiagnosticsEngine &Diags,
Entry.IgnoreSysRoot ? Entry.Path : HSOpts.Sysroot + Entry.Path);
Options.MCOptions.Argv0 = CodeGenOpts.Argv0;
Options.MCOptions.CommandLineArgs = CodeGenOpts.CommandLineArgs;
Options.MisExpect = CodeGenOpts.MisExpect;

return true;
}
Expand Down
34 changes: 34 additions & 0 deletions clang/lib/CodeGen/CodeGenAction.cpp
Expand Up @@ -340,6 +340,15 @@ namespace clang {
CodeGenOpts.getProfileUse() != CodeGenOptions::ProfileNone)
Ctx.setDiagnosticsHotnessRequested(true);

if (CodeGenOpts.MisExpect) {
Ctx.setMisExpectWarningRequested(true);
}

if (CodeGenOpts.DiagnosticsMisExpectTolerance) {
Ctx.setDiagnosticsMisExpectTolerance(
CodeGenOpts.DiagnosticsMisExpectTolerance);
}

// Link each LinkModule into our module.
if (LinkInModules())
return;
Expand Down Expand Up @@ -440,6 +449,9 @@ namespace clang {
void OptimizationFailureHandler(
const llvm::DiagnosticInfoOptimizationFailure &D);
void DontCallDiagHandler(const DiagnosticInfoDontCall &D);
/// Specialized handler for misexpect warnings.
/// Note that misexpect remarks are emitted through ORE
void MisExpectDiagHandler(const llvm::DiagnosticInfoMisExpect &D);
};

void BackendConsumer::anchor() {}
Expand Down Expand Up @@ -821,6 +833,25 @@ void BackendConsumer::DontCallDiagHandler(const DiagnosticInfoDontCall &D) {
<< llvm::demangle(D.getFunctionName().str()) << D.getNote();
}

void BackendConsumer::MisExpectDiagHandler(
const llvm::DiagnosticInfoMisExpect &D) {
StringRef Filename;
unsigned Line, Column;
bool BadDebugInfo = false;
FullSourceLoc Loc =
getBestLocationFromDebugLoc(D, BadDebugInfo, Filename, Line, Column);

Diags.Report(Loc, diag::warn_profile_data_misexpect) << D.getMsg().str();

if (BadDebugInfo)
// If we were not able to translate the file:line:col information
// back to a SourceLocation, at least emit a note stating that
// we could not translate this location. This can happen in the
// case of #line directives.
Diags.Report(Loc, diag::note_fe_backend_invalid_loc)
<< Filename << Line << Column;
}

/// This function is invoked when the backend needs
/// to report something to the user.
void BackendConsumer::DiagnosticHandlerImpl(const DiagnosticInfo &DI) {
Expand Down Expand Up @@ -895,6 +926,9 @@ void BackendConsumer::DiagnosticHandlerImpl(const DiagnosticInfo &DI) {
case llvm::DK_DontCall:
DontCallDiagHandler(cast<DiagnosticInfoDontCall>(DI));
return;
case llvm::DK_MisExpect:
MisExpectDiagHandler(cast<DiagnosticInfoMisExpect>(DI));
return;
default:
// Plugin IDs are not bound to any value as they are set dynamically.
ComputeDiagRemarkID(Severity, backend_plugin, DiagID);
Expand Down
41 changes: 41 additions & 0 deletions clang/lib/Frontend/CompilerInvocation.cpp
Expand Up @@ -106,6 +106,20 @@ using namespace driver;
using namespace options;
using namespace llvm::opt;

//===----------------------------------------------------------------------===//
// Helpers.
//===----------------------------------------------------------------------===//

// Parse misexpect tolerance argument value.
// Valid option values are integers in the range [0, 100)
inline Expected<Optional<uint64_t>> parseToleranceOption(StringRef Arg) {
int64_t Val;
if (Arg.getAsInteger(10, Val))
return llvm::createStringError(llvm::inconvertibleErrorCode(),
"Not an integer: %s", Arg.data());
return Val;
}

//===----------------------------------------------------------------------===//
// Initialization.
//===----------------------------------------------------------------------===//
Expand Down Expand Up @@ -1535,6 +1549,9 @@ void CompilerInvocation::GenerateCodeGenArgs(
: "auto",
SA);

GenerateArg(Args, OPT_fdiagnostics_misexpect_tolerance_EQ,
Twine(*Opts.DiagnosticsMisExpectTolerance), SA);

for (StringRef Sanitizer : serializeSanitizerKinds(Opts.SanitizeRecover))
GenerateArg(Args, OPT_fsanitize_recover_EQ, Sanitizer, SA);

Expand Down Expand Up @@ -1952,6 +1969,23 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
}
}

if (auto *arg =
Args.getLastArg(options::OPT_fdiagnostics_misexpect_tolerance_EQ)) {
auto ResultOrErr = parseToleranceOption(arg->getValue());

if (!ResultOrErr) {
Diags.Report(diag::err_drv_invalid_diagnotics_misexpect_tolerance)
<< "-fdiagnostics-misexpect-tolerance=";
} else {
Opts.DiagnosticsMisExpectTolerance = *ResultOrErr;
if ((!Opts.DiagnosticsMisExpectTolerance.hasValue() ||
Opts.DiagnosticsMisExpectTolerance.getValue() > 0) &&
!UsingProfile)
Diags.Report(diag::warn_drv_diagnostics_misexpect_requires_pgo)
<< "-fdiagnostics-misexpect-tolerance=";
}
}

// If the user requested to use a sample profile for PGO, then the
// backend will need to track source location information so the profile
// can be incorporated into the IR.
Expand Down Expand Up @@ -4577,6 +4611,13 @@ bool CompilerInvocation::CreateFromArgsImpl(
if (Res.getFrontendOpts().ProgramAction == frontend::RewriteObjC)
LangOpts.ObjCExceptions = 1;

for (auto Warning : Res.getDiagnosticOpts().Warnings) {
if (Warning == "misexpect" &&
!Diags.isIgnored(diag::warn_profile_data_misexpect, SourceLocation())) {
Res.getCodeGenOpts().MisExpect = true;
}
}

if (LangOpts.CUDA) {
// During CUDA device-side compilation, the aux triple is the
// triple used for host compilation.
Expand Down
@@ -0,0 +1,8 @@
bar
# Func Hash:
11262309464
# Num Counters:
2
# Counter Values:
200000
2
17 changes: 17 additions & 0 deletions clang/test/Profile/Inputs/misexpect-branch.proftext
@@ -0,0 +1,17 @@
bar
# Func Hash:
45795613684824
# Num Counters:
2
# Counter Values:
200000
180000

fizz
# Func Hash:
45795613684824
# Num Counters:
2
# Counter Values:
200000
18000
12 changes: 12 additions & 0 deletions clang/test/Profile/Inputs/misexpect-switch-default-only.proftext
@@ -0,0 +1,12 @@
main
# Func Hash:
79676873694057560
# Num Counters:
5
# Counter Values:
1
20
20000
20000
20000

16 changes: 16 additions & 0 deletions clang/test/Profile/Inputs/misexpect-switch-default.proftext
@@ -0,0 +1,16 @@
main
# Func Hash:
8734802134600123338
# Num Counters:
9
# Counter Values:
1
20000
20000
4066
11889
0
0
4045
0

16 changes: 16 additions & 0 deletions clang/test/Profile/Inputs/misexpect-switch-nonconst.proftext
@@ -0,0 +1,16 @@
main
# Func Hash:
3721743393642630379
# Num Counters:
10
# Counter Values:
1
20
20000
20000
1
0
0
0
19999
0
32 changes: 32 additions & 0 deletions clang/test/Profile/Inputs/misexpect-switch.proftext
@@ -0,0 +1,32 @@
main
# Func Hash:
872687477373597607
# Num Counters:
9
# Counter Values:
1
20
20000
20000
3
3
1
3
19990

random_sample
# Func Hash:
24
# Num Counters:
1
# Counter Values:
19990

sum
# Func Hash:
24
# Num Counters:
1
# Counter Values:
3

0 comments on commit 2add3fb

Please sign in to comment.