Add a loop alignment directive to work around a performance regression.
We found that the upstream LLVM change rL310792 degraded the zippy benchmark by ~3%. Performance analysis showed that the regression was a side effect of that change: it incidentally changed the alignment of a hot loop from 32 bytes to 16 bytes, which increased branch mispredictions. The regression is reproducible on several Intel microarchitectures, including Sandy Bridge, Haswell, and Skylake. We do not yet understand the internals of Intel's branch predictor well enough to explain why mispredictions increase when the loop alignment changes, so we cannot make a real fix here. As a workaround, this patch adds a directive that aligns the hot loop to 32 bytes, which restores the performance. This unblocks flipping the default compiler to LLVM.
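For illustration only, a workaround of this kind might look roughly like the sketch below. It is not the code from this patch: the macro `ALIGN_NEXT_LOOP_32` and the function `SumBytes` are hypothetical names, and the sketch assumes a GCC- or Clang-compatible compiler targeting x86-64, where the assembler directive `.p2align 5` pads with NOPs to the next 2^5 = 32-byte boundary.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper macro (not part of the patch): emit an assembler
// directive that pads to a 32-byte boundary (.p2align 5 means 2^5 bytes).
// Assumption: GCC/Clang-style inline asm on x86-64; elsewhere it is a no-op.
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define ALIGN_NEXT_LOOP_32() __asm__ __volatile__(".p2align 5")
#else
#define ALIGN_NEXT_LOOP_32() ((void)0)
#endif

// Hypothetical stand-in for a hot loop. The directive is placed right before
// the loop so that the loop is likely to start on a 32-byte boundary. The
// compiler is still free to move or transform the loop, so the resulting
// alignment should be checked in the generated assembly.
std::uint64_t SumBytes(const unsigned char* data, std::size_t n) {
  std::uint64_t sum = 0;
  ALIGN_NEXT_LOOP_32();
  for (std::size_t i = 0; i < n; ++i) {
    sum += data[i];
  }
  return sum;
}
```

The directive only changes code layout, not behavior, so the function returns the same result with or without it. Whether it helps has to be confirmed by benchmarking on the affected microarchitectures.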