[SVE] clang fails to apply the tail-folding optimization, compared to gcc #63616

@vfdff

real_t s311(struct args_t * func_args)
{
    real_t sum = (real_t)0.;
    for (int i = 0; i < LEN_1D; i++) {
        sum += a[i];
    }

    return sum;
}
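  • gcc: uses whilelo to fold the tail into the vector loop, so the predicate p7 is recomputed on every iteration and the loop exits when no lane is active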
.L2:
        ld1w    z31.s, p7/z, [x2, x0, lsl 2]
        add     x0, x0, x3
        fadda   s0, p7, s0, z31.s
        whilelo p7.s, w0, w1
        b.any   .L2
  • clang: uses a plain compare-and-branch (cmp / b.ne) for the kernel loop body .LBB0_1, so the tail is not folded into the loop
.LBB0_1:                                // =>This Inner Loop Header: Depth=1
        ld1w    { z1.s }, p0/z, [x12, x10, lsl #2]
        add     x10, x10, x9
        cmp     x13, x10
        fadda   s0, p0, s0, z1.s
        b.ne    .LBB0_1
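
To make the difference between the two loop shapes concrete, here is a minimal scalar sketch in C. It is not what either compiler emits: VL, whilelo(), sum_with_epilogue() and sum_tail_folded() are hypothetical names invented for illustration, VL is fixed at 4 lanes (real SVE vector length is chosen by the hardware), and the scalar remainder loop in the first function is an assumption, since the clang snippet above only shows the kernel loop.

```c
#include <stddef.h>

/* Hypothetical lane count for this sketch; real SVE is vector-length agnostic. */
#define VL 4

/* Emulates the effect of `whilelo p.s, first, limit`:
 * lane k is active while (first + k) < limit.
 * Returns nonzero if any lane is active (the condition tested by b.any). */
static int whilelo(int pred[VL], size_t first, size_t limit)
{
    int any = 0;
    for (size_t lane = 0; lane < VL; lane++) {
        pred[lane] = (first + lane) < limit;
        any |= pred[lane];
    }
    return any;
}

/* Shape of the clang kernel loop: full vectors only, exit via cmp / b.ne.
 * The scalar remainder loop afterwards is assumed, not taken from the issue. */
float sum_with_epilogue(const float *a, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
    size_t n_vec = n - (n % VL);          /* trip count rounded down to a multiple of VL */

    for (; i != n_vec; i += VL) {
        /* fadda with an all-true predicate: add the lanes strictly in order */
        for (size_t lane = 0; lane < VL; lane++)
            sum += a[i + lane];
    }
    for (; i < n; i++)                     /* assumed scalar tail */
        sum += a[i];
    return sum;
}

/* Shape of the gcc loop: the tail is folded into the vector loop by predication. */
float sum_tail_folded(const float *a, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
    int pred[VL];

    int any = whilelo(pred, i, n);         /* initial predicate before the loop */
    while (any) {
        /* predicated load + fadda: inactive lanes contribute nothing */
        for (size_t lane = 0; lane < VL; lane++)
            if (pred[lane])
                sum += a[i + lane];
        i += VL;
        any = whilelo(pred, i, n);         /* whilelo + b.any */
    }
    return sum;
}
```

Because fadda is a strictly ordered reduction, both shapes add the elements in the same order and should produce the same result as the original scalar loop; the tail-folded form simply needs no separate remainder loop.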
