
@vpirogov released this Oct 22, 2019 · 9 commits to rls-v1.1 since this release

This is a patch release containing the following changes to v1.1:

  • Fixed zero padding for memory formats with rank 3 and below (f97e174)
  • Fixed 'deprecated std::copy' warning with Microsoft C++ Compiler (ee276af)
  • Fixed tail scaling for int8 inner product (f2b68c7)
  • Fixed correctness issue for int8 GEMM with N=1 (0dd5c13)
  • Fixed sum primitive overriding the destination memory descriptor data type when used with format `any` (5301981); see the sketch after this list
  • Addressed the following corner cases in the CPU convolution implementation:
    • Fixed tail processing in int8 depthwise convolution (7711b77)
    • Fixed bias padding in bfloat16 depthwise convolution (0696ba6)
    • Fixed correctness issue in s8s8 flavor of depthwise convolution (b614482)
    • Fixed correctness issue in dilated convolution weight gradient implementation (c6ec0f9)
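
For reference, a minimal sketch of the behavior the sum fix restores, assuming the DNNL v1.1 C++ API (`dnnl.hpp`); the dimensions and scales are illustrative only:

```cpp
#include "dnnl.hpp"
#include <vector>

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);

    memory::dims dims = {2, 16, 7, 7};
    memory::desc src_md(dims, memory::data_type::f32, memory::format_tag::nchw);

    // Destination uses format_tag::any so the primitive picks the layout;
    // with the fix, the f32 data type requested here is preserved instead
    // of being overridden by the sum primitive.
    memory::desc dst_md(dims, memory::data_type::f32, memory::format_tag::any);

    std::vector<float> scales = {1.f, 1.f};
    std::vector<memory::desc> srcs = {src_md, src_md};

    sum::primitive_desc sum_pd(dst_md, scales, srcs, eng);
    // sum_pd.dst_desc() now reports f32, matching the request above.
    return 0;
}
```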

@vpirogov released this Oct 22, 2019 · 2 commits to rls-v1.0 since this release

This is a patch release containing the following changes to v1.0.2:

  • Fixed zero padding for memory formats with rank 3 and below (4d78aaf)
  • Fixed tail scaling for int8 inner product (41b5a7e)
  • Fixed sum primitive overriding the destination memory descriptor data type when used with format `any` (e979eda)
  • Improved s8s8 GEMM and inner product performance (4b44aa5)
  • Reduced memory consumption of GEMM-based algorithm for convolution weight gradient (f46b044)
  • Fixed negative padding processing in pooling (48ba96a)
  • Addressed memory leak in GPU deconvolution (686fc41)
  • Addressed memory leak in GPU stream (1206b2f)
  • Fixed fp16 GEMM correctness on GPU (c2425d4)
  • Fixed GEMM correctness on GPU for the case of small M dimension (ac2683f)
  • Addressed the following corner cases in the CPU convolution implementation:
    • Fixed tail processing in int8 depthwise convolution (3a0943b)
    • Fixed bias padding in bfloat16 depthwise convolution (3d9af7c)
    • Fixed correctness issue in s8s8 flavor of depthwise convolution (e4d9049)
    • Fixed correctness issue in GEMM-based algorithm for 3D convolutions (161ac40)
    • Fixed corner case issues in the Intel AVX-512 implementation of convolution weight gradient (68f5124)
    • Disabled unsupported cases for depthwise convolution weight gradient (5e6e6c8)
    • Convolution with a 1x1 filter now returns unimplemented for cases with padding in spatial dimensions (9d7cc77)
    • Fixed negative padding support in general convolution kernel (b1c602a)
    • Fixed padding handling in depthwise convolution backpropagation (04712f6)
    • Added support for negative padding in h and d spatial dimensions (7ddce82)
    • Fixed segfault in strided convolution backpropagation (b04f3f5)
    • Fixed memory corruption in convolution backpropagation (8877bc9)

@vpirogov released this Oct 9, 2019

This is a patch release containing the following changes to v0.20.5:

  • Fixed performance regression in GEMM (cfc5c3d)

@tprimak released this Oct 8, 2019 · 17 commits to rls-v0.21 since this release

This is a patch release containing the following changes to v0.21.1:

  • Fixed performance regression in GEMM (9534621)
  • Fixed int8 dilated convolution for some shapes with input height <= dilation in the height dimension (e68f151)
  • Addressed a static initialization order issue in bf16 converters (ae8efde)
  • Fixed fast reference backward convolution dispatching for the 3D-spatial case (5994d63)

@anita-intel released this Oct 3, 2019 · 26 commits to rls-v1.1 since this release

Performance optimizations

  • Improved performance with TBB threading, achieving performance comparable to OpenMP threading.
  • Improved int8 and fp32 GEMM performance on systems with Intel AVX-512 and Intel VNNI support.
  • Improved softmax performance for NHWC and corresponding blocked layouts.
  • Improved RNN cell performance and reduced the dependency of RNN performance on compiler vectorization capabilities.
  • Improved reorder performance for some shapes.

New functionality

  • Introduced layer normalization and binary elementwise primitives (CPU engine).
  • Introduced swish (CPU and GPU engines) and gelu (GPU engine) activation support in the elementwise primitive; see the sketch after this list.
  • Introduced bfloat16 data type support in RNN cells (CPU engine).
  • Introduced initial int8 and bfloat16 data type support for GPU functionality.
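
A minimal sketch of the new swish activation through the elementwise primitive, assuming the DNNL v1.1 C++ API; the tensor shape and alpha value are illustrative only:

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    memory::desc data_md({2, 16, 7, 7}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory src_mem(data_md, eng), dst_mem(data_md, eng);

    // swish(x) = x * sigmoid(alpha * x); alpha is passed through the
    // eltwise descriptor.
    eltwise_forward::desc d(prop_kind::forward_inference,
            algorithm::eltwise_swish, data_md, 1.f);
    eltwise_forward::primitive_desc pd(d, eng);

    eltwise_forward(pd).execute(s,
            {{DNNL_ARG_SRC, src_mem}, {DNNL_ARG_DST, dst_mem}});
    s.wait();
    return 0;
}
```

The new layer normalization and binary primitives follow the same create-descriptor-then-execute pattern.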

Usability improvements

  • TBB threading support is promoted to production quality.
  • Introduced support for memory format `any` in backpropagation of memory-bound primitives. This mechanism allows matching the gradient memory format to the source and destination memory formats from the forward pass; see the sketch after this list.
  • Changed default compiler flags to target the Intel SSE4.1 instruction set to make builds portable.
  • (experimental) Introduced a caching mechanism that reduces creation time for repeatedly created primitives. The functionality is disabled by default and has to be enabled at compile time.
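
A minimal sketch of gradient format propagation, assuming the DNNL v1.1 C++ API; shapes are illustrative only. All descriptors are created with `format_tag::any`, and the backward primitive descriptor receives the forward one as a hint so the library can pick matching formats:

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    auto dt = memory::data_type::f32;
    auto any = memory::format_tag::any;

    memory::desc src_md({2, 16, 13, 13}, dt, any); // NCHW dims
    memory::desc wei_md({32, 16, 3, 3}, dt, any);  // OIHW dims
    memory::desc dst_md({2, 32, 11, 11}, dt, any);

    // Forward primitive descriptor, later used as a hint for backward.
    convolution_forward::desc fwd_d(prop_kind::forward_training,
            algorithm::convolution_direct, src_md, wei_md, dst_md,
            {1, 1}, {0, 0}, {0, 0});
    convolution_forward::primitive_desc fwd_pd(fwd_d, eng);

    // Gradient descriptors also use format_tag::any; the library chooses
    // diff formats consistent with the forward source/destination.
    convolution_backward_weights::desc bwd_d(algorithm::convolution_direct,
            src_md, wei_md, dst_md, {1, 1}, {0, 0}, {0, 0});
    convolution_backward_weights::primitive_desc bwd_pd(bwd_d, eng, fwd_pd);

    // bwd_pd.diff_weights_desc() and bwd_pd.diff_dst_desc() carry the
    // formats the implementation selected.
    return 0;
}
```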

Validation improvements

  • Extended benchdnn to cover all supported primitives.
  • Introduced a robust validation method for RNN cells in benchdnn. The approach replaces activations with a linear function to make error accumulation more predictable and decrease the number of false positives.
  • Extended convolution test coverage.

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers as well as Ilia Taraban, Jacek Czaja @jczaja, William Tambellini @WilliamTambellini, Tomasz Kalina, Mateusz Guziak, Daniel Haidachuk, Konstantin Basargin @basargin, Aaron Johnson @aaronjohnson, and Jeremy Wong @jrmwng. We would also like to thank everyone who asked questions and reported issues.


@tprimak released this Sep 28, 2019 · 23 commits to rls-v0.21 since this release

This is a patch release containing the following changes to Intel MKL-DNN v0.21:

  • Fixed output channel blocking logic in forward AVX2 convolution that could lead to incorrect results or a segfault (6accb47)
  • Fixed int8 grouped convolution for some shapes where the number of input or output channels is not a multiple of 8 on Intel AVX-512 systems (878ac2d)

@vpirogov released this Sep 17, 2019 · 28 commits to rls-v0.21 since this release

Performance optimizations

  • Improved int8 and fp32 GEMM and inner product performance.
  • Improved reorder performance for certain shapes.
  • Improved RNN, LSTM, GRU and LBR-GRU training performance.

New functionality

  • Added GELU activation support.

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers. We would also like to thank everyone who asked questions and reported issues.


@vpirogov released this Sep 16, 2019 · 41 commits to rls-v1.1 since this release

This is a release candidate for DNNL v1.1. Please provide feedback and report bugs via GitHub issues.


@vpirogov released this Sep 5, 2019 · 2 commits to rls-v0.20 since this release

This is a patch release containing the following changes to Intel MKL-DNN v0.20.4:

  • Fixed out-of-bounds memory access in GEMM-based grouped convolution weight update (3deeafa)
  • Fixed a segmentation fault in Intel AVX-512 convolution with effective negative padding (f231ada)
  • Fixed a correctness issue in strided depthwise convolution (d7484cb)