Align the base when doing strided loads from constant addresses

When we codegen something like f[ramp(x + 1, 2, 16)], where f is an
internal allocation, we subtract the 1, do the dense load f[ramp(x, 1,
32)], and then take the odd lanes of the result. The reason for this is
that there's likely an f[ramp(x, 2, 16)] nearby, and aligning x + 1 down
to x means the two can share the dense load and just deinterleave it.
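
As a minimal illustration of that rewrite (not Halide's actual codegen; the buffer f, its element type, and the helper name are made up for this sketch), the strided load at the odd base x + 1 can be recovered from an aligned dense load by keeping the odd lanes:

```cpp
#include <array>

// Hypothetical scalar stand-in for the vector codegen: compute
// f[ramp(x + 1, 2, 16)] by aligning the base down to x, doing the dense
// load f[ramp(x, 1, 32)], and then taking the odd lanes of the result.
std::array<float, 16> load_odd_lanes(const float *f, int x) {
    std::array<float, 32> dense;
    for (int l = 0; l < 32; l++) {
        dense[l] = f[x + l];  // dense load f[ramp(x, 1, 32)]
    }
    std::array<float, 16> result;
    for (int l = 0; l < 16; l++) {
        result[l] = dense[2 * l + 1];  // odd lanes == f[ramp(x + 1, 2, 16)]
    }
    return result;
}
```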

This PR does the same when there's no x, just an odd constant base. This
means that cases like f[ramp(64, 2, 16)] + f[ramp(65, 2, 16)] now
generate much better assembly. In one case I have, it speeds up an entire
pipeline by 8%, because aligning the loads in this way causes them all
to be promoted off the stack into registers.
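
A hedged sketch of why the constant-base case above helps (plain C++ standing in for the generated vector code; the buffer and its size are assumptions): both strided loads are covered by one dense 32-wide load, and the sum is just the even lanes plus the odd lanes.

```cpp
#include <array>
#include <cstdio>

int main() {
    // Illustrative buffer; in the real pipeline f is an internal allocation.
    float f[128];
    for (int i = 0; i < 128; i++) f[i] = (float)i;

    // One dense load f[ramp(64, 1, 32)] covers both strided loads.
    std::array<float, 32> dense;
    for (int l = 0; l < 32; l++) dense[l] = f[64 + l];

    // Deinterleave: even lanes are f[ramp(64, 2, 16)], odd lanes are
    // f[ramp(65, 2, 16)], so the sum needs no further loads.
    std::array<float, 16> sum;
    for (int l = 0; l < 16; l++) sum[l] = dense[2 * l] + dense[2 * l + 1];

    // Check against computing the two strided loads separately.
    for (int l = 0; l < 16; l++) {
        float expected = f[64 + 2 * l] + f[65 + 2 * l];
        if (sum[l] != expected) {
            printf("mismatch at lane %d\n", l);
            return 1;
        }
    }
    printf("shared dense load + deinterleave matches the strided loads\n");
    return 0;
}
```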
abadams committed Nov 29, 2020
1 parent bfbfacd commit ed529e0
2 changes: 1 addition & 1 deletion src/CodeGen_LLVM.cpp
@@ -2093,7 +2093,7 @@ void CodeGen_LLVM::visit(const Load *op) {
             // and do a different shuffle. This helps expressions like
             // (f(2*x) + f(2*x+1) share loads
             const Add *add = ramp->base.as<Add>();
-            const IntImm *offset = add ? add->b.as<IntImm>() : nullptr;
+            const IntImm *offset = add ? add->b.as<IntImm>() : ramp->base.as<IntImm>();
             if (offset && offset->value & 1) {
                 base_a -= 1;
                 align_a = align_a - 1;
