
Add a loop alignment directive to work around a performance regression.

An upstream LLVM change, rL310792, degraded the zippy benchmark by
~3%. Performance analysis showed that the regression was a side effect
of that change: it incidentally changed the loop alignment from 32 bytes
to 16 bytes, which increased branch mispredictions. The regression is
reproducible on several Intel microarchitectures, including Sandy
Bridge, Haswell, and Skylake. We do not yet understand the internals of
the Intel branch predictor well enough to explain why the misprediction
rate rises when the loop alignment changes, so we cannot make a real fix
here. As a workaround, this patch adds a directive that aligns the hot
loop to 32 bytes, which restores the performance. This unblocks the flip
of the default compiler to LLVM.
wmi authored and pwnall committed Aug 21, 2017
1 parent 55924d1 commit 824e6718b5b5a50d32a89124853da0a11828b25c
Showing with 7 additions and 0 deletions.
@@ -685,6 +685,13 @@ class SnappyDecompressor {
// Add loop alignment directive. Without this directive, we observed
// significant performance degradation on several intel architectures
// in snappy benchmark built with LLVM. The degradation was caused by
// increased branch miss prediction.
#if defined(__clang__) && defined(__x86_64__)
asm volatile (".p2align 5");
#endif
for ( ;; ) {
const unsigned char c = *(reinterpret_cast<const unsigned char*>(ip++));
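
The same workaround can be tried outside snappy with a small standalone sketch. This is not code from the commit: the macro `HOT_LOOP_ALIGN_32` and the function `SumBytes` are hypothetical names chosen for illustration, and the loop body is a stand-in for the decompressor's real work. The sketch assumes, as the patch does, that the compiler emits the loop directly after the inline assembly statement, which is not guaranteed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper macro (not part of snappy). It emits the same
// 32-byte alignment directive as the patch, under the same guard:
// clang targeting x86-64. Elsewhere it expands to a no-op.
#if defined(__clang__) && defined(__x86_64__)
#define HOT_LOOP_ALIGN_32() asm volatile(".p2align 5")
#else
#define HOT_LOOP_ALIGN_32() ((void)0)
#endif

// Hypothetical hot loop: sums the bytes of a buffer, reading each byte
// the same way the decompressor loop in the diff does.
inline std::uint32_t SumBytes(const char* ip, std::size_t n) {
  const char* const end = ip + n;
  std::uint32_t sum = 0;
  // ".p2align 5" pads the instruction stream to a 2^5 = 32-byte boundary
  // at this point. The intent is that the loop that follows starts there.
  HOT_LOOP_ALIGN_32();
  for (;;) {
    if (ip == end) break;
    const unsigned char c = *(reinterpret_cast<const unsigned char*>(ip++));
    sum += c;
  }
  return sum;
}
```

Whether the loop head actually lands on a 32-byte boundary depends on the code the compiler generates, so it is worth inspecting the disassembly (for example with `objdump -d`) and re-running the benchmark before relying on the directive.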
