Performance tips

Here are a few tips to keep in mind while writing code that uses Fastor

Turn on compiler optimisation

For fast code you first need to make sure compiler optimisation is turned on, using the -O2/-O3 flag (-O3 in particular) with GCC/Clang/Intel or /O2 under Visual Studio.
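
As an illustration, an optimised build from the command line could look like the following (my_code.cpp is just a placeholder for your translation unit):

g++ -O3 my_code.cpp -o my_code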

Use NDEBUG to compile your code

Fastor uses a lot of sanity checks under the hood to verify the validity and consistency of expressions when they are being assigned from one to another, specifically under debug mode. The use of the compiler flag -DNDEBUG (or /DNDEBUG with Visual Studio) is highly recommended if you want fast code. This macro is defined by default in release mode by most build systems and IDE release configurations.
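
On the command line this simply means adding the define, for instance (again with a placeholder file name):

g++ -O3 -DNDEBUG my_code.cpp -o my_code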

Activate SIMD vectorisation

Fastor can vectorise almost all operations using the CPU's vector instructions (also called SIMD intrinsics). This results in substantial performance improvements, so you need to make sure to activate the appropriate SIMD vectorisation that your CPU supports through a compiler flag, for instance -msse2/-mavx/-mavx2/-mfma with GNU-style compilers or /arch:[SSE2,AVX,AVX512] with Visual Studio. In most cases, while compiling with GCC/Clang/Intel, providing the -march=native flag is sufficient.
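
Putting the flags discussed so far together, a typical release build could look like one of the following (file names are placeholders and the chosen instruction set is just an example):

g++ -O3 -DNDEBUG -march=native my_code.cpp -o my_code

cl /O2 /DNDEBUG /arch:AVX my_code.cpp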

Control compiler's inlining heuristics

If your code uses a lot of complex Fastor expressions, it is beneficial to force the compiler to inline your functions aggressively by using the following additional compiler flags:

GCC

-finline-limit=<a high value>

Clang

-mllvm -inline-threshold=<a high value>

Intel

-inline-forceinline 

MSVC

/Ob2
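
For example, with GCC the combined command line from above could become (the threshold value here is purely illustrative):

g++ -O3 -DNDEBUG -march=native -finline-limit=1000 my_code.cpp -o my_code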

Avoid passing by value

Avoid passing Fastor tensors by value to functions as this will make unnecessary and expensive copies. Instead, always pass by reference (preferably const reference). So if your function looks like this

template<typename T, size_t M, size_t N>
void foo(Tensor<T,M,N> a) { .... }

you should change it to

template<typename T, size_t M, size_t N>
void foo(const Tensor<T,M,N> &a) { .... }
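
Put together, a minimal sketch could look like the following; the function name sum_of_corners and the computation inside it are purely illustrative:

#include <Fastor/Fastor.h>
#include <iostream>
using namespace Fastor;

// take the tensor by const reference - no copy is made
template<typename T, size_t M, size_t N>
T sum_of_corners(const Tensor<T,M,N> &a) {
    return a(0,0) + a(M-1,N-1);
}

int main() {
    Tensor<double,3,4> a;
    a.fill(2.0);
    // a is passed by reference, not copied
    std::cout << sum_of_corners(a) << "\n";
    return 0;
}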

Avoid assigning tensors of different order

This is not a performance bottleneck but rather something to keep in mind if you want utmost performance. While working with views, you should try to avoid assigning tensors of different order, for instance

Tensor<double,3,4,5> a; 
Tensor<double,3,2> b;

// Assigning part of a third order tensor to a second order tensor
b(all,1) = a(all,2,0);

While this syntax is understandably quite convenient (and you should use it), keep in mind that in such cases Fastor has to create two sets of offsets for the two tensors in order to extract the part from one tensor and assign it to another tensor of a different order. This does not impact performance much and in most cases you should not expect any degradation. However, in certain cases the compiler will simply give up optimising because it has too much work to do. A more verbose workaround is to use the TensorMap feature to map your tensors to the same order before assigning them. For instance, in the above example you can do

Tensor<double,3,4,5> a; 
Tensor<double,3,2> b;
// map/promote b to a third order tensor first - this does not copy b
TensorMap<double,3,2,1> bmap(b); 

// Now, assign a to bmap instead - this will also change b
bmap(all,1,0) = a(all,2,0);

In this case Fastor knows that both the left and the right hand sides are of the same order and skips the unnecessary offset computations.