Description
| Field | Value |
| --- | --- |
| Bugzilla Link | 10872 |
| Resolution | FIXED |
| Resolved on | Sep 06, 2011 18:34 |
| Version | trunk |
| OS | All |
| Attachments | -O3 optimized, -O3 -unroll-scev, test bitcode |
| CC | @asl,@sunfishcode |
Extended Description
Test case: SingleSource/Benchmarks/Stanford test.simple.Puzzle on A9 is 14% slower with -unroll-scev.
Compiling with -O3 produces O3.ll (0.54s).
Adding -unroll-scev produces O3-unroll-scev.ll (0.64s).
These runs used r138990. -unroll-scev will soon be the default, but -disable-unroll-scev will be available.
llc -mcpu=cortex-a9 -relocation-model=pic -disable-fp-elim -disable-non-leaf-fp-elim O3.ll
-unroll-scev exposes more opportunities for memset_pattern, resulting in:
call void @memset_pattern16(i8* bitcast (i32* getelementptr inbounds ([13 x [512 x i32]]* @p, i32 0, i32 6, i32 0) to i8*), i8* bitcast ([4 x i32]* @.memset_pattern3 to i8*), i32 12) nounwind
This is fine, but then GVN fails to remove the subsequent loads:
%tmp2.i = load i32* getelementptr inbounds ([13 x i32]* @piecemax, i32 0, i32 0), align 4, !tbaa !3
for.end.i: ; preds = %for.inc.i8, %if.then
%tmp14.i = load i32* getelementptr inbounds ([13 x i32]* @class, i32 0, i32 0), align 4, !tbaa !3
Removing %tmp2.i exposes lots of constant folding, but it is the removal of %tmp14.i that speeds up the benchmark.
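For illustration only, here is a minimal C sketch of the shape of the problem: a 12-byte pattern fill into @p followed by a load from an unrelated global. `fill_pattern16` is a hypothetical portable stand-in for `memset_pattern16`, and the initial values of `piecemax` and the pattern are made up. The sketch assumes the fill only writes through its first argument; if the optimizer cannot see that, it has to treat the call as possibly clobbering `piecemax`, which would explain why the load is kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for memset_pattern16: fills len bytes at dst by
   repeating the 16-byte pattern. It writes only through dst. */
static void fill_pattern16(void *dst, const void *pattern16, size_t len) {
    unsigned char *out = dst;
    const unsigned char *pat = pattern16;
    for (size_t i = 0; i < len; ++i)
        out[i] = pat[i % 16];
}

/* Globals shaped like the ones in the IR; the values are illustrative. */
static int32_t p[13][512];
static int32_t piecemax[13] = {7};
static const int32_t pattern[4] = {1, 1, 1, 1};

/* Mirrors the IR: a 12-byte fill of p[6][0..2], then a load of piecemax[0]. */
static int32_t load_after_fill(void) {
    int32_t before = piecemax[0];          /* value known before the call */
    fill_pattern16(&p[6][0], pattern, 12); /* touches only p[6][0..2] */
    int32_t after = piecemax[0];           /* the load that could be forwarded */
    assert(before == after);               /* the fill cannot change piecemax */
    return after;
}
```

Because the fill never touches `piecemax`, the second read always equals the first, so replacing it with the already-known value is safe under the stated assumption.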
Also tracked as rdar://10065079