Version 1.12

@hfp hfp released this 10 May 13:05
· 2794 commits to release since this release

This release aims to improve usability and resolves several non-critical bugs. In addition, an implementation of the BLAS(-like) batched GEMM has been added (?GEMM_BATCH). The interface is currently only available for C/C++. However, it can be called implicitly (Fortran 77 style) or used by intercepting existing calls (static and dynamic linkage).

LIBXSMM has offered an interface for batched GEMMs for several versions; it supports pointers as well as arrays of indexes plus byte-sized strides to extract data from arrays of structures (AoS). The new BLAS interface only supports plain arrays of pointers to the operand matrices, but it allows multiple groups of homogeneous batches. All batch interfaces are implemented in sequential (ST) and multi-threaded (MT) form, with synchronization in the MT case.
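
To make "multiple groups of homogeneous batches" over plain arrays of pointers more concrete, here is a minimal reference sketch in C. It is not LIBXSMM's implementation or API: the name ref_dgemm_batch and its argument order are assumptions, chosen to resemble the grouped-batch convention common to BLAS-like libraries. Each group g shares one set of parameters (transposition, m, n, k, alpha, beta, leading dimensions), while the pointer arrays hold the operands of all groups back to back. The sketch is sequential, assumes column-major storage, and always reads C (unlike BLAS, which ignores C when beta is zero).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (NOT LIBXSMM's API): grouped batched DGEMM
 * over plain arrays of pointers, column-major storage,
 * C := alpha * op(A) * op(B) + beta * C for every matrix triple. */
void ref_dgemm_batch(
  const char transa[], const char transb[],
  const int m[], const int n[], const int k[],
  const double alpha[], const double* const a[], const int lda[],
  const double* const b[], const int ldb[],
  const double beta[], double* const c[], const int ldc[],
  int group_count, const int group_size[])
{
  size_t base = 0; /* index of the first matrix triple of the current group */
  for (int g = 0; g < group_count; ++g) {
    /* per-group parameters: every multiplication in group g shares them */
    const int ta = ('N' != transa[g] && 'n' != transa[g]);
    const int tb = ('N' != transb[g] && 'n' != transb[g]);
    assert(0 <= group_size[g]);
    for (int s = 0; s < group_size[g]; ++s) {
      const double* const ai = a[base + s];
      const double* const bi = b[base + s];
      double* const ci = c[base + s];
      for (int j = 0; j < n[g]; ++j) {
        for (int i = 0; i < m[g]; ++i) {
          double acc = 0;
          for (int p = 0; p < k[g]; ++p) {
            /* op(A)(i,p) and op(B)(p,j), honoring the transpose flags */
            const double av = ta ? ai[p + (size_t)i * lda[g]]
                                 : ai[i + (size_t)p * lda[g]];
            const double bv = tb ? bi[j + (size_t)p * ldb[g]]
                                 : bi[p + (size_t)j * ldb[g]];
            acc += av * bv;
          }
          ci[i + (size_t)j * ldc[g]] =
            alpha[g] * acc + beta[g] * ci[i + (size_t)j * ldc[g]];
        }
      }
    }
    base += (size_t)group_size[g]; /* the next group continues right after */
  }
}
```

A caller would fill the pointer arrays with, for example, two 2x2 multiplications for group 0 followed by one 1x3 by 3x1 multiplication for group 1, and pass group_count = 2 with group_size = {2, 1}. A multi-threaded variant could distribute the flattened list of matrix triples across threads, but it would need synchronization whenever two entries of the C array point to the same matrix.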

INTRODUCED

  • Interface and implementation of batched GEMMs (GEMM_BATCH).
  • TensorFlow wrapper code for LSTM operation.
  • Interceptor for GEMM_BATCH and GEMV.

IMPROVEMENTS / CHANGES

  • LSTM: enabled additional tensor formats for Bfloat16.
  • Validated with GNU GCC 9.1 release.

FIXES