bad vector codegen on loops with wide accumulators #168775

@XeroOl

#include <stdint.h>

uint64_t example_1(uint64_t len, const unsigned char* input) {
    uint64_t total = 0;
    for (uint64_t i = 0; i < len; i++) {
        // example computation to be vectorized
        unsigned char output = input[i] ^ 0x07;
        // accumulator
        total += output;
    }
    return total;
}

The loop vectorizer handles the function above very poorly: it chooses a vectorization width of 2 when it should be able to use a much higher width, e.g. 16.
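To make "a width of 16" concrete, here is a rough scalar model of what a 16-lane vectorization of this loop computes: 16 independent partial sums, a horizontal reduction, and a scalar tail. The name `example_1_lanes16` and the constant `WIDTH` are made up for illustration; this is not what the vectorizer emits, only a sketch of the shape of the transformation.

```c
#include <stdint.h>

// Scalar model of a width-16 vectorization of example_1 (illustrative only).
uint64_t example_1_lanes16(uint64_t len, const unsigned char* input) {
    enum { WIDTH = 16 };            // hypothetical vector width
    uint64_t lanes[WIDTH] = {0};    // one partial sum per lane
    uint64_t i = 0;

    // "vector" body: WIDTH independent elements per iteration
    for (; len - i >= WIDTH; i += WIDTH) {
        for (int k = 0; k < WIDTH; k++) {
            unsigned char output = input[i + k] ^ 0x07;
            lanes[k] += output;
        }
    }

    // horizontal reduction of the per-lane partial sums
    uint64_t total = 0;
    for (int k = 0; k < WIDTH; k++) {
        total += lanes[k];
    }

    // scalar tail for the remaining len % WIDTH elements
    for (; i < len; i++) {
        unsigned char output = input[i] ^ 0x07;
        total += output;
    }
    return total;
}
```

With Clang, compiling with `-O3 -Rpass=loop-vectorize` should print a remark that includes the vectorization width actually chosen for each loop, which is one way to check the behavior described here.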

If you pick a narrower accumulator (e.g., change the type of total to uint8_t), the vectorizer chooses a high width as expected.
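For reference, the narrow-accumulator variant might look like the first function below. It is not a drop-in replacement, because the sum wraps modulo 256. The second function is a possible source-level workaround, assuming the 64-bit accumulator really is what limits the width: it sums fixed-size chunks into a 32-bit partial sum and only widens once per chunk. The names `example_narrow`, `example_chunked`, and `CHUNK` are hypothetical, and whether this actually gets a wider vectorization has not been verified here.

```c
#include <stdint.h>

// Same loop as example_1, but with an 8-bit accumulator.
// The result wraps modulo 256, so it differs from example_1 for large sums.
uint8_t example_narrow(uint64_t len, const unsigned char* input) {
    uint8_t total = 0;
    for (uint64_t i = 0; i < len; i++) {
        unsigned char output = input[i] ^ 0x07;
        total += output;
    }
    return total;
}

// Chunked variant: the inner loop uses a 32-bit accumulator, and the
// 64-bit total is updated once per chunk. CHUNK * 255 fits in uint32_t,
// so the partial sum cannot overflow.
uint64_t example_chunked(uint64_t len, const unsigned char* input) {
    enum { CHUNK = 1 << 16 };
    uint64_t total = 0;
    uint64_t i = 0;
    while (i < len) {
        uint64_t end = (len - i > CHUNK) ? i + CHUNK : len;
        uint32_t partial = 0;
        for (; i < end; i++) {
            unsigned char output = input[i] ^ 0x07;
            partial += output;
        }
        total += partial;  // widen once per chunk
    }
    return total;
}
```

Unlike `example_narrow`, `example_chunked` returns the same value as `example_1` for any input, so it can be compared directly against the original when inspecting the generated code.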
