This is a collection of functions optimized for ARMv7 and NEON.
The five holy laws
- Never return floating-point values by value. This would work fine if -mfloat-abi=hard were supported everywhere, but sadly it's not. With the more common -mfloat-abi=softfp, every time you write return my_float_value, the compiler emits a vstr followed by a load operation just to read the result back! Instead, use a non-const reference as the first parameter. This lets intermediate results inline smoothly, without unnecessary loads and stores, just as if hard floats were available (this works for vector types too)!
- Try to minimize loads and stores. However, GCC doesn't support the more advanced uses of vstmia and generates poor code for operations on float32x4x4_t, so hand-coding those operations makes sense in that case.
- Use vector types wherever it makes sense. Functions prefixed with vec4_ work directly on float32x4_t; those prefixed with mat44_ work directly on float32x4x4_t. Parameters are passed as references, so the compiler doesn't perform unnecessary transfers to ARM core registers.
- Don't hard-code registers. Instead of listing scratch registers in the clobber list, declare dummy operands and let the compiler allocate registers as needed.
- A good clobber list is an empty clobber list. If you let the compiler handle loads for you, "memory" shouldn't even show up in your clobber list. The only item that might is "cc".
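To illustrate the first and third laws, here is a minimal C++ sketch in the style described above. The names vec4_load, vec4_store, vec4_add, vec4_dot and mat44_mul_vec4 are hypothetical and not necessarily functions of this library, and the column-major layout assumed for float32x4x4_t is also an assumption. Every result, including the scalar result of vec4_dot, comes back through a non-const reference in the first parameter, and vector arguments are passed by reference. When NEON is unavailable, small stand-in types and scalar loops keep the sketch compilable.

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#define SKETCH_HAVE_NEON 1
#include <arm_neon.h>
#else
#define SKETCH_HAVE_NEON 0
// Off-target stand-ins (assumption): they mirror only the lane layout
// of the real NEON types, so the sketch compiles without NEON.
struct float32x4_t { float v[4]; };
struct float32x4x4_t { float32x4_t val[4]; };
#endif

// Hypothetical helpers: the compiler emits the loads and stores itself.
inline void vec4_load(float32x4_t& out, const float* p)
{
#if SKETCH_HAVE_NEON
    out = vld1q_f32(p);
#else
    for (int i = 0; i < 4; ++i) out.v[i] = p[i];
#endif
}

inline void vec4_store(float* p, const float32x4_t& v)
{
#if SKETCH_HAVE_NEON
    vst1q_f32(p, v);
#else
    for (int i = 0; i < 4; ++i) p[i] = v.v[i];
#endif
}

// vec4_ prefix: works directly on float32x4_t. out = a + b.
inline void vec4_add(float32x4_t& out, const float32x4_t& a, const float32x4_t& b)
{
#if SKETCH_HAVE_NEON
    out = vaddq_f32(a, b);
#else
    for (int i = 0; i < 4; ++i) out.v[i] = a.v[i] + b.v[i];
#endif
}

// Scalar result returned through a reference instead of by value.
inline void vec4_dot(float& out, const float32x4_t& a, const float32x4_t& b)
{
#if SKETCH_HAVE_NEON
    float32x4_t p = vmulq_f32(a, b);
    float32x2_t s = vadd_f32(vget_low_f32(p), vget_high_f32(p));
    s = vpadd_f32(s, s);              // horizontal add of the two lanes
    out = vget_lane_f32(s, 0);
#else
    out = a.v[0] * b.v[0] + a.v[1] * b.v[1] + a.v[2] * b.v[2] + a.v[3] * b.v[3];
#endif
}

// mat44_ prefix: works directly on float32x4x4_t. out = m * v,
// assuming m.val[0..3] hold the matrix columns.
inline void mat44_mul_vec4(float32x4_t& out, const float32x4x4_t& m, const float32x4_t& v)
{
#if SKETCH_HAVE_NEON
    float32x4_t r = vmulq_lane_f32(m.val[0], vget_low_f32(v), 0);
    r = vmlaq_lane_f32(r, m.val[1], vget_low_f32(v), 1);
    r = vmlaq_lane_f32(r, m.val[2], vget_high_f32(v), 0);
    r = vmlaq_lane_f32(r, m.val[3], vget_high_f32(v), 1);
    out = r;
#else
    float32x4_t r;  // temporary, so out may alias v
    for (int i = 0; i < 4; ++i)
        r.v[i] = m.val[0].v[i] * v.v[0] + m.val[1].v[i] * v.v[1]
               + m.val[2].v[i] * v.v[2] + m.val[3].v[i] * v.v[3];
    out = r;
#endif
}
```

Once these are inlined, a chain such as vec4_add followed by vec4_dot can keep every intermediate value in NEON registers, because nothing is returned by value.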
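The last two laws can be sketched with GCC extended inline assembly for ARMv7 NEON. This is an illustrative example under stated assumptions, not code from this library: vec4_madd_asm and madd4 are hypothetical names. The scratch register is a dummy output operand (tmp) that the compiler allocates, the %q modifier prints each operand as a quad register, and the clobber list is empty because all loads and stores happen outside the asm through intrinsics. On other targets, a scalar loop stands in.

```cpp
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
#define SKETCH_ARMV7_NEON 1
#include <arm_neon.h>

// Hypothetical: out = a * b + c, with no hard-coded registers.
inline void vec4_madd_asm(float32x4_t& out, const float32x4_t& a,
                          const float32x4_t& b, const float32x4_t& c)
{
    float32x4_t tmp;  // dummy operand: the compiler picks a free q register
    asm("vmul.f32 %q[tmp], %q[a], %q[b]\n\t"
        "vadd.f32 %q[out], %q[tmp], %q[c]"
        : [out] "=w" (out), [tmp] "=&w" (tmp)
        : [a] "w" (a), [b] "w" (b), [c] "w" (c));
    // No clobber list: no "memory", no named registers, not even "cc".
}
#else
#define SKETCH_ARMV7_NEON 0
#endif

// Hypothetical array wrapper so the sketch can be exercised on any target.
inline void madd4(float* out, const float* a, const float* b, const float* c)
{
#if SKETCH_ARMV7_NEON
    float32x4_t r;
    // The compiler emits these loads and the store, not the asm block.
    vec4_madd_asm(r, vld1q_f32(a), vld1q_f32(b), vld1q_f32(c));
    vst1q_f32(out, r);
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] * b[i] + c[i];
#endif
}
```

tmp is marked early-clobber (=&w) because it is written by the first instruction while c is still needed by the second; out needs no such marker because it is only written by the final instruction, after every input has been read.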
For best performance I usually use the following CFLAGS:
-mthumb -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad -O3 -ffast-math -fomit-frame-pointer -fstrict-aliasing -fgcse-las -funsafe-loop-optimizations -fsee -ftree-vectorize
plus -arch armv7 when using gcc for iOS, or -march=armv7-a when using arm-none-eabi-gcc.
Several preprocessor macros, when defined, change the behaviour of the code. See config-defaults.h for details…