Vector constants are not hoisted out of loops when there's a lot of non-vector code around them #160886

@hsivonen

Summary

The ldr that materializes const uint8x16_t ZERO_LT_AMP_CR = {0, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 4, 1, 1, 1}; is always loop-invariant. Furthermore, when the whole function has fewer vector values than there are vector registers, register pressure on the vector registers cannot be a reason to keep the load inside the loop, so such a constant should be hoisted to the top of the function.

When the function is small, LICM happens properly for such constants.

When the function is large, the ldr for the constant moves to immediately before use, and the load is from static memory.

In between these cases, each constant is loaded at the top of the function and immediately spilled to the stack. Then the constant is reloaded from the stack immediately before use.
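For illustration, the pattern described in this summary has roughly the following shape. This is a minimal sketch, not code from nsHtml5TokenizerSIMD-expanded.cpp: the function name count_special, the low-nibble indexing, and the scalar fallback are assumptions made only so the example is self-contained. Being small, it falls into the case where hoisting is reported to work, so it is not expected to reproduce the problem by itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Hypothetical helper (not from the attached file): counts bytes whose low
// nibble maps to a table entry other than 1. The indexing scheme is an
// assumption chosen for illustration only.
size_t count_special(const uint8_t* data, size_t len) {
  assert(data != nullptr || len == 0);
  static const uint8_t kTable[16] = {0, 2, 1, 1, 1, 1, 3, 1,
                                     1, 1, 1, 1, 4, 1, 1, 1};
  size_t count = 0;
  size_t i = 0;
#if HAVE_NEON
  // Loop-invariant vector constants: the loads that materialize them are the
  // ones that should end up above the loop.
  const uint8x16_t ZERO_LT_AMP_CR = {0, 2, 1, 1, 1, 1, 3, 1,
                                     1, 1, 1, 1, 4, 1, 1, 1};
  const uint8x16_t low_nibble = vdupq_n_u8(0x0F);
  const uint8x16_t one = vdupq_n_u8(1);
  for (; i + 16 <= len; i += 16) {
    uint8x16_t input = vld1q_u8(data + i);
    // Table lookup; on AArch64 this is the tbl instruction.
    uint8x16_t classes =
        vqtbl1q_u8(ZERO_LT_AMP_CR, vandq_u8(input, low_nibble));
    uint8x16_t is_one = vceqq_u8(classes, one);  // 0xFF where class == 1
    // Each matching lane contributes 1 after the shift; the sum is at most 16.
    count += 16 - vaddvq_u8(vshrq_n_u8(is_one, 7));
  }
#endif
  // Scalar tail (and the whole input on non-AArch64 targets).
  for (; i < len; ++i) {
    if (kTable[data[i] & 0x0F] != 1) ++count;
  }
  return count;
}
```

In the failing case, the same kind of constant sits inside a much larger function with a lot of scalar code around the vector loop.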

Steps to reproduce

  1. Download nsHtml5TokenizerSIMD-expanded.cpp
  2. On an Apple Silicon Mac, using clang trunk, compile it with clang++ -Wno-everything -isysroot "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.2.sdk" nsHtml5TokenizerSIMD-expanded.cpp -O3 -S
  3. Inspect nsHtml5TokenizerSIMD-expanded.s.
  4. Comment out the line following the comment // Commenting out the following line allows vector constant LICM!
  5. Recompile and inspect the assembly again.

Actual results

On first compilation, the only tbl.16b instruction is preceded by ldr q3, [sp, #32] ; 16-byte Folded Reload.

On the second compilation, the ldr has moved upwards out of the loops.
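The difference between the two compilations can also be checked mechanically. The following is a rough sketch of a hypothetical helper, find_reloaded_tbl, which is not part of LLVM or of the attached file. It assumes the assembly text has already been read into a string, that the instruction is spelled tbl.16b as in the output quoted above, and that the reload sits on the line directly before it.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based line numbers of tbl.16b
// instructions whose directly preceding line is an ldr annotated with
// "Folded Reload" (a reload of a spilled value from the stack).
std::vector<size_t> find_reloaded_tbl(const std::string& asm_text) {
  std::vector<size_t> hits;
  std::istringstream in(asm_text);
  std::string line;
  std::string prev;
  size_t line_no = 0;
  while (std::getline(in, line)) {
    ++line_no;
    const bool is_tbl = line.find("tbl.16b") != std::string::npos;
    const bool prev_is_reload =
        prev.find("ldr") != std::string::npos &&
        prev.find("Folded Reload") != std::string::npos;
    if (is_tbl && prev_is_reload) {
      hits.push_back(line_no);
    }
    prev = line;
  }
  return hits;
}
```

Under these assumptions, the helper would report one hit for the first compilation and none for the second, where the ldr has moved above the loops.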

Expected results

I expected vector constants always to be hoisted out of loops when the number of live vector values does not exceed the number of vector registers, regardless of how much ALU code exists in the overall function.

Additional info

I tried flipping the various boolean return values in MachineLICM.cpp to see if one of them was at fault, but I didn't find the cause.

Compiling on Mac differs from Compiler Explorer's armv8-a target. LLVM says it used the following on Mac:

Features:+aes,+altnzcv,+ccdp,+ccidx,+ccpp,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fptoint,+fullfp16,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+specrestrict,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a
CPU:apple-m1
TuneCPU:apple-m1
