Parallelization

kaskr edited this page Jun 25, 2016 · 5 revisions

BLAS

TMB uses the following BLAS kernels when calculating the function value and its derivatives:

Function   Gradient
dgemm      dgemm
dsyrk      dsymm
dtrsm      dtrsm
dpotrf     dpotri

If your model spends a significant amount of time in these BLAS operations, you may benefit from an optimized BLAS library, e.g. MKL or OpenBLAS for CPU, or NVBLAS for GPU. For good results it is critical that:

  1. All required BLAS kernels are part of the library (this may currently not be the case for NVBLAS).
  2. The library does not add significant overhead for small matrices (OpenBLAS has had problems with this; it is unclear whether that is still the case).
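
One way to check the second point is to time a kernel at small matrix sizes and compare the per-call cost against a plain loop. The following is a minimal sketch, not part of TMB: `naive_dgemm` and `seconds_per_call` are hypothetical helper names, and the plain triple loop only serves as a baseline. To test an actual library, the assumption is that it ships a CBLAS header, in which case the baseline can be replaced by a small wrapper around `cblas_dgemm`, as indicated in the comment.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Baseline kernel: C = A * B for n x n matrices stored column-major
// (the layout BLAS expects). Hypothetical helper, not a TMB or BLAS function.
void naive_dgemm(int n, const double* A, const double* B, double* C) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int k = 0; k < n; ++k)
                s += A[i + k * n] * B[k + j * n];
            C[i + j * n] = s;
        }
    }
}

// Average wall-clock seconds per call of `kernel` on n x n inputs.
// `kernel` is any callable with the same signature as naive_dgemm.
// To time an optimized library instead (assumption: it provides <cblas.h>
// and the program is linked against it), pass a wrapper such as:
//   [](int n, const double* A, const double* B, double* C) {
//       cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
//                   n, n, n, 1.0, A, n, B, n, 0.0, C, n);
//   }
template <class Kernel>
double seconds_per_call(Kernel kernel, int n, int repeats) {
    assert(n > 0 && repeats > 0);
    std::vector<double> A(static_cast<std::size_t>(n) * n, 1.0);
    std::vector<double> B(A);
    std::vector<double> C(A.size(), 0.0);
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        kernel(n, A.data(), B.data(), C.data());
    auto t1 = std::chrono::steady_clock::now();
    // Read one result through a volatile so the compiler keeps the loop.
    volatile double sink = C[0];
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count() / repeats;
}
```

Calling `seconds_per_call(naive_dgemm, n, repeats)` for a range of small `n` (for example 2 to 64) and repeating the run with the library wrapper gives a rough picture of where the optimized library starts to pay off. Timings at these sizes are noisy, so a large `repeats` value is advisable.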