[loop-idiom] GVN fails to remove loads after loop-idiom recognition #11244

@atrick

| | |
| --- | --- |
| Bugzilla Link | 10872 |
| Resolution | FIXED |
| Resolved on | Sep 06, 2011 18:34 |
| Version | trunk |
| OS | All |
| Attachments | -O3 optimized, -O3 -unroll-scev, test bitcode |
| CC | @asl, @sunfishcode |

Extended Description

Test case: SingleSource/Benchmarks/Stanford test.simple.Puzzle runs 14% slower on Cortex-A9 with -unroll-scev.

-O3 alone produces O3.ll, which runs in 0.54s.

Adding -unroll-scev produces O3-unroll-scev.ll, which runs in 0.64s.

These runs used r138990. -unroll-scev will soon be the default, but -disable-unroll-scev will be available.

llc -mcpu=cortex-a9 -relocation-model=pic -disable-fp-elim -disable-non-leaf-fp-elim O3.ll

-unroll-scev exposes more opportunities for memset_pattern, resulting in:

call void @memset_pattern16(i8* bitcast (i32* getelementptr inbounds ([13 x [512 x i32]]* @p, i32 0, i32 6, i32 0) to i8*), i8* bitcast ([4 x i32]* @.memset_pattern3 to i8*), i32 12) nounwind

This is fine, but then GVN fails to remove the subsequent loads:

%tmp2.i = load i32* getelementptr inbounds ([13 x i32]* @piecemax, i32 0, i32 0), align 4, !tbaa !3

for.end.i: ; preds = %for.inc.i8, %if.then
%tmp14.i = load i32* getelementptr inbounds ([13 x i32]* @class, i32 0, i32 0), align 4, !tbaa !3

Removing %tmp2.i exposes lots of constant folding, but it is the removal of %tmp14.i that speeds up the benchmark.
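
For readers without the attachments, here is a minimal C sketch of the kind of source shape that could lead to this pattern. It is not taken from the Puzzle benchmark: `table`, `limits`, and `fill_then_read` are hypothetical names, and the assumption is that a short loop storing a constant `int` into consecutive elements is what loop-idiom recognition turns into the `memset_pattern16` call (3 elements of 4 bytes match the length of 12 in the call above).

```c
/* Hypothetical globals, loosely shaped like @p and @piecemax in the IR above. */
int table[13][512];
int limits[13];

/* Store to one global, fill part of another, then reload the first.
 * The fill loop writes 3 * sizeof(int) = 12 bytes of the value 1, which is
 * not a single repeated byte, so it is a candidate for a pattern-fill idiom
 * rather than a plain memset. */
int fill_then_read(void) {
    limits[0] = 7;                /* value an optimizer could forward */
    for (int i = 0; i < 3; ++i)   /* candidate for memset_pattern16 */
        table[6][i] = 1;
    return limits[0];             /* ideally folded to 7 with no reload */
}
```

The fill never touches `limits`, so the final load is redundant whether or not the loop becomes a library call. The report's complaint is that loads of this kind are no longer removed once the call is in place.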

Also rdar://10065079
