Implementations of various math, image-processing, and other functions for ARMv7 and NEON

ARMv7 Functions

This is a collection of various functions optimized for ARMv7 and NEON.

The five holy laws

  1. Never return floating point values by value. It would work fine if -mfloat-abi=hard were supported everywhere, but sadly it's not. With the more common -mfloat-abi=softfp, every return my_float_value compiles to either an fmrs or a vstr, followed by a load operation to read the result back! Instead, pass a non-const reference as the first parameter and write the result through it. This allows super smooth inlining of your intermediate results without unnecessary loads and stores, just as if hard floats were available (it works for vector types too)!
  2. Try to minimize loads and stores. GCC doesn't support the advanced vldmia/vstmia forms, though, and generates poor code for operations on float32x4x4_t, so hand-coding them makes sense in that case.
  3. Use vector types everywhere it makes sense. Functions prefixed with vec3_ and vec4_ directly work on float32x4_t. Those prefixed with mat44_ directly work with float32x4x4_t. Parameters are passed as references, so the compiler doesn't perform unnecessary ARM register transfers.
  4. Don't hard-code registers. Instead of naming scratch registers in the clobber list, use dummy operands and let the compiler allocate registers as needed.
  5. A good clobber list is an empty clobber list. If you let the compiler handle loads for you, "memory" shouldn't even show up in your clobber list. The only item that might is "cc".
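
The following is a minimal sketch of how laws 1, 4 and 5 might combine in practice; it is not taken from this library. The names vec4, vec4_muladd_sketch, vec4_load_sketch and vec4_store_sketch are hypothetical. The NEON path assumes GCC-style inline assembly on 32-bit ARM, where the "w" constraint selects a VFP/NEON register and the %q modifier prints it as a quad register. The scalar fallback exists only so that the sketch also compiles off-target.

```cpp
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
#include <arm_neon.h>

typedef float32x4_t vec4;  // hypothetical alias, used only in this sketch

// dst = a * b + c. The result is written through a non-const reference
// passed as the first parameter instead of being returned by value (law 1).
static inline void vec4_muladd_sketch(vec4 &dst, const vec4 &a,
                                      const vec4 &b, const vec4 &c)
{
    vec4 tmp;  // dummy operand: the compiler picks the scratch register (law 4)
    __asm__ (
        "vmul.f32 %q[tmp], %q[a], %q[b]\n\t"
        "vadd.f32 %q[dst], %q[tmp], %q[c]"
        // "&" marks tmp as early-clobber: it is written before c is read.
        : [dst] "=w" (dst), [tmp] "=&w" (tmp)
        : [a] "w" (a), [b] "w" (b), [c] "w" (c)
        // No clobber list: no hard-coded registers, no "memory", and these
        // two instructions do not modify the condition flags (law 5).
    );
}

// Loads and stores are left to the compiler via intrinsics.
static inline void vec4_load_sketch(vec4 &dst, const float *p) { dst = vld1q_f32(p); }
static inline void vec4_store_sketch(float *p, const vec4 &v)  { vst1q_f32(p, v); }

#else
// Portable scalar fallback so the sketch builds on non-ARM hosts.
struct vec4 { float v[4]; };

static inline void vec4_muladd_sketch(vec4 &dst, const vec4 &a,
                                      const vec4 &b, const vec4 &c)
{
    for (int i = 0; i < 4; ++i) dst.v[i] = a.v[i] * b.v[i] + c.v[i];
}

static inline void vec4_load_sketch(vec4 &dst, const float *p)
{
    for (int i = 0; i < 4; ++i) dst.v[i] = p[i];
}

static inline void vec4_store_sketch(float *p, const vec4 &v)
{
    for (int i = 0; i < 4; ++i) p[i] = v.v[i];
}
#endif
```

A caller would load its operands with vec4_load_sketch, chain calls such as vec4_muladd_sketch(r, a, b, c), and store only the final result. Since everything is passed by reference and inlined, intermediate values can stay in NEON registers.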

Compilation flags

For best performance I usually use the following CFLAGS: -mthumb -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad -O3 -ffast-math -fomit-frame-pointer -fstrict-aliasing -fgcse-las -funsafe-loop-optimizations -fsee -ftree-vectorize, with -arch armv7 if it's gcc for iOS or -march=armv7-a if it's eabi-none-gcc.
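
To check that some of these flags actually reached the compiler, one option is to inspect the macros the compiler predefines. The sketch below is an illustration only: build_config_sketch is a hypothetical helper, not part of this library, and the macro names are the ones GCC is understood to predefine on ARM targets (an assumption worth verifying against your toolchain).

```cpp
#include <string>

// Returns a space-separated summary of build settings detected through
// predefined macros, or "none" when no known macro is set (for example
// when compiling on a non-ARM host without optimization).
static std::string build_config_sketch()
{
    std::string s;
#if defined(__thumb__)
    s += "thumb ";            // expected with -mthumb
#endif
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    s += "neon ";             // expected with -mfpu=neon
#endif
#if defined(__ARM_PCS_VFP)
    s += "hard-float-abi ";   // expected with -mfloat-abi=hard only
#endif
#if defined(__FAST_MATH__)
    s += "fast-math ";        // expected with -ffast-math
#endif
#if defined(__OPTIMIZE__)
    s += "optimized ";        // expected at -O1 and above
#endif
    if (s.empty()) s = "none";
    return s;
}
```

Printing the returned string once at startup is enough to spot a build that silently lost -mfpu=neon or -ffast-math.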

Preprocessor macros

Several preprocessor macros, when defined, change the behaviour of the code. See config.h and config-defaults.h for details…