Description
Branch profiles, also known as Last Branch Record (LBR) profiles, provide a sample-based profile of taken branches. At each sample point, LBR hardware provides a snapshot of the last N taken branches. Consecutive records in a snapshot can be combined to derive basic block execution counts as well as branch counts. For example, one sample could be: the basic block from startpc to endpc executed and then jumped to dstpc.
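As a rough illustration of how consecutive records combine, here is a minimal sketch. The `Record` and `Sample` types and the `Combine` function are hypothetical names for this example only, not an existing API, and the sketch assumes each snapshot is ordered oldest branch first. Note that the range between two taken branches may cover several basic blocks if it contains not-taken conditional branches.

```go
package main

import "fmt"

// Record is one hypothetical LBR entry: a taken branch from From to To.
type Record struct {
	From, To uint64
}

// Sample mirrors the description above: code ran from StartPC to EndPC
// and then jumped to DstPC.
type Sample struct {
	StartPC, EndPC, DstPC uint64
}

// Combine pairs each record with the one before it. The previous branch's
// target starts the straight-line run; the current branch's source ends it.
// Assumes snapshot is ordered oldest first.
func Combine(snapshot []Record) []Sample {
	var out []Sample
	for i := 1; i < len(snapshot); i++ {
		prev, cur := snapshot[i-1], snapshot[i]
		if cur.From < prev.To {
			// Inconsistent pair (e.g., a lost record); skip it.
			continue
		}
		out = append(out, Sample{StartPC: prev.To, EndPC: cur.From, DstPC: cur.To})
	}
	return out
}

func main() {
	// Made-up addresses for illustration.
	snapshot := []Record{
		{From: 0x100, To: 0x200},
		{From: 0x220, To: 0x300},
		{From: 0x340, To: 0x200},
		{From: 0x220, To: 0x300},
	}
	counts := map[Sample]int{}
	for _, s := range Combine(snapshot) {
		counts[s]++
	}
	for s, n := range counts {
		fmt.Printf("%#x-%#x -> %#x: %d\n", s.StartPC, s.EndPC, s.DstPC, n)
	}
}
```

Aggregating the resulting samples across many snapshots gives execution counts per code range and, from the (EndPC, DstPC) pairs, counts per branch edge.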
Today we use CPU profiles for PGO because they are easy to collect (no special hardware support required), while LBR profiles require special hardware support (notably, not exposed by most major cloud VM providers). However, LBR profiles can often be better than CPU profiles for PGO. For instance:
- Iterative stability: a PGO optimization that reduces the cost of a call makes that call use less CPU, so it gets fewer samples in the next CPU profile. The next compile may then no longer identify that code as hot and stop performing the optimization. An LBR profile, by contrast, reports the same number of calls regardless of the optimization.
- While CPU cycles and the number of calls/executions are generally correlated, there can be significant skew, particularly for basic-block-level optimizations. Basic blocks tend to contain few instructions, so a more expensive instruction in a basic block (e.g., MUL vs. ADD) can skew the results, making the MUL block look hotter even if the ADD block executes more often.
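The skew between cycles and execution counts can be sketched with made-up numbers. Everything here is an assumption for illustration: the `Block` type, the two helper functions, and especially the per-execution cycle costs, which are not measurements of any real CPU.

```go
package main

import "fmt"

// Block is a hypothetical basic block with an execution count and an
// assumed cost per execution in cycles.
type Block struct {
	Name          string
	Executions    int
	CyclesPerExec int
}

// HottestByCycles approximates what a CPU profile sees: total cycles spent.
func HottestByCycles(blocks []Block) string {
	best, bestCycles := "", -1
	for _, b := range blocks {
		if c := b.Executions * b.CyclesPerExec; c > bestCycles {
			best, bestCycles = b.Name, c
		}
	}
	return best
}

// HottestByExecutions approximates what an LBR profile sees: how often
// the block actually ran.
func HottestByExecutions(blocks []Block) string {
	best, bestExecs := "", -1
	for _, b := range blocks {
		if b.Executions > bestExecs {
			best, bestExecs = b.Name, b.Executions
		}
	}
	return best
}

func main() {
	blocks := []Block{
		{Name: "add-block", Executions: 1000, CyclesPerExec: 1}, // 1000 cycles total
		{Name: "mul-block", Executions: 400, CyclesPerExec: 3},  // 1200 cycles total
	}
	fmt.Println("by cycles:    ", HottestByCycles(blocks))
	fmt.Println("by executions:", HottestByExecutions(blocks))
}
```

With these numbers the cycle-based ranking picks `mul-block` while the execution-based ranking picks `add-block`, which is the kind of disagreement described above.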
One blocker to native LBR support in the compiler is that the pprof format has no canonical way of encoding LBR samples. Either we pick a custom interpretation that we recognize, pprof adds an official form, or we use a different file format entirely (such as LLVM's PGO format).
cc @cherrymui @aclements @rajbarik @jinlin-bayarea @hoeppi-google