Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

This release contains a breaking change for code using the TensorBase type directly. See rten-tensor changes. Code using the type aliases (TensorView etc.) should not be affected.

rten

  • Fixed incorrect calculation of update slice size in ScatterND operator (#157)

  • Fixed incorrect conversion of axis attribute for ArgMin and ArgMax operators (#142)

  • Support 1D inputs and padding in ConvTranspose (#156)

  • Support GatherND operator (#155)

  • Fixed uninitialized read in Gemm operator when alpha != 1 and beta == 0 (#150)

  • Support Softplus operator (#146)

  • Support converting ONNX models containing unnamed operator nodes (#143)

  • Support RandomNormal, RandomNormalLike, RandomUniformLike operators (#144)

  • Parallelize AveragePool operator (#138)

rten-imageproc

  • The mask matrix argument to find_contours now uses bool instead of i32 for elements. This improves performance and reduces memory usage for large masks.

rten-tensor

Breaking changes

This release changes the signature of the TensorBase struct from TensorBase<T, S: AsRef<[T]>, L: MutLayout>, where T is the element type, S the storage and L the layout, to TensorBase<S: Storage, L: MutLayout>. The element type is now available via S::Elem. The type of S used by views has changed from slices to new custom storage types. The TensorBase::from_data method still accepts both Vec<T> and slices as the data argument and converts them to the appropriate storage struct.

Code using the type aliases (Tensor, TensorView, TensorViewMut etc.) does not need to change.
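
For illustration, a minimal sketch of alias-level code that should continue to compile unchanged after the storage refactor; the shapes and values here are arbitrary and not taken from the changelog:

```rust
use rten_tensor::prelude::*;
use rten_tensor::{Tensor, TensorView};

// Code written against the aliases is unaffected by the change from
// `TensorBase<T, S, L>` to `TensorBase<S: Storage, L: MutLayout>`.
fn alias_example() {
    // `from_data` still accepts a `Vec<T>` for owned tensors...
    let owned: Tensor<f32> = Tensor::from_data(&[2, 3], vec![0.; 6]);

    // ...and a slice, which is converted to the new custom view storage type.
    let buf = [0.0f32; 6];
    let view: TensorView<f32> = TensorView::from_data(&[2, 3], buf.as_slice());

    assert_eq!(owned.shape(), view.shape());
}
```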

Changes

  • Refactored tensor storage types to fix a violation of Rust's unique ownership rules for mutable slices. This enables tests for rten-tensor and code using this crate to be run under Miri (#148).

  • Added TensorBase::{as_cow, into_cow} (named after std::borrow::Cow) to convert tensor storage to a type which is Cow-like. This is useful for writing code which works with either borrowed or owned tensors (#153).
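
A hedged sketch of the kind of code this enables; the relu_len helper below is hypothetical and not part of the crate:

```rust
use rten_tensor::prelude::*;
use rten_tensor::Tensor;

// Hypothetical helper (not from the crate): sometimes the input can be used
// as-is, sometimes a new tensor must be computed. `as_cow` / `into_cow` give
// both branches the same Cow-like tensor type.
fn relu_len(input: &Tensor<f32>) -> usize {
    let already_non_negative = input.iter().all(|&x| x >= 0.);
    let result = if already_non_negative {
        // Borrow the existing data without copying.
        input.as_cow()
    } else {
        // Allocate a new owned tensor, then convert its storage to the same type.
        input.map(|&x| x.max(0.)).into_cow()
    };
    // Downstream code sees a single tensor type regardless of which branch ran.
    result.len()
}
```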

rten-vecmath

  • Revised SIMD traits to make working with masks more ergonomic and efficient (#152). Integer and floating point types with the same number of lanes will now use the same mask type.

[0.8.0] - 2024-04-29

rten-tensor

  • Added an Alloc trait, which provides a simple allocator interface, and *_in-suffixed variants of several TensorBase methods that allow specifying an allocator for the returned tensor's data buffer (#123).

rten-vecmath

  • Fixed crashes in several functions when running on pre-AVX2 x64 CPUs (see rten changes)

rten

New features

  • Support Elu operator (#132)

  • Support Reduce* operators that take axes as a dynamic input rather than a static attribute (#132)

Bug fixes

  • Fixed crash in several operators when running on x64 CPUs that do not support AVX-2 instructions (#131, #134)

Performance improvements

  • Added a buffer pool that enables reuse of operator output and temporary buffers, avoiding the overhead of allocating and freeing large buffers using the system allocator (#108).

    Statistics about buffer pool usage are printed as part of RTEN_TIMING output.

  • Fixed a MatMul performance regression introduced in v0.7.0 due to virtual calls to get the kernel tile size (#101)

  • Optimize convolutions by using SIMD operations for im2col transform (#104)

  • Parallelize depthwise convolution (#102)

  • Avoid redundant zeroing of buffers in Conv, OneHot, and various unary operations (#97, #99, #101, #106)

  • Optimize Unsqueeze by running in-place where possible (#96)

  • Optimize vector-matrix products where matrix is transposed (#94)

  • Reduced graph execution overhead by using faster hashing (#92)

  • Optimize ScatterND (#91)

  • Support AVX-512 acceleration for Exp, Sigmoid, Tanh, Softmax and Erf operators (#131). This requires nightly Rust and the avx512 feature enabled.

[0.7.0] - 2024-04-12

rten-tensor

  • Add Tensor::merge_axes method to simplify layouts (#78)

  • Add Tensor::{uninit, assume_init} methods for working with uninitialized buffers (#82)
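
A rough sketch of the uninit/assume_init pattern; the shape-argument form and iteration methods shown here are assumptions, not taken from the changelog:

```rust
use std::mem::MaybeUninit;

use rten_tensor::prelude::*;
use rten_tensor::Tensor;

// Allocate a buffer without zeroing it, write every element, then mark the
// tensor as initialized.
fn iota(len: usize) -> Tensor<f32> {
    let mut t: Tensor<MaybeUninit<f32>> = Tensor::uninit(&[len]);
    for (i, el) in t.iter_mut().enumerate() {
        el.write(i as f32);
    }
    // Safety: every element was written in the loop above.
    unsafe { t.assume_init() }
}
```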

rten

  • Reduced Graph::run overhead by reducing allocations (#89)

  • Added Model::partial_run API to speed up autoregressive / recurrent models by precomputing parts of the graph that depend only on inputs that are unchanging across loop iterations (#86)

  • Optimize MatMul and binary operators by avoiding unnecessary zeroing of output buffers (#82, #88)

  • Fixed incorrect output from Gemm operator when the bias is zero and the "C" input contains infinities / NaNs (#81)

  • Optimize matrix packing operations on Intel CPUs using AVX-2 instructions (#80)

  • Optimize Transpose operations where input dimensions are powers of 2 by using blocking and tiling (#78)

  • Exclude test files and tools from published crate (#77)

  • Optimize RNN operators for the case where the input sequence is short, by avoiding prepacking of weights in this case (#74)

[0.6.0] - 2024-03-31

rten

  • Updated AVX-512 support to work with latest Rust nightly releases (#58)

  • Improved performance of vector-matrix product operations (#61)

  • Slightly improved WASM matrix multiplication performance with a dedicated kernel (#64)

  • Fixed conversion of RNN operators (LSTM, GRU) that explicitly declare the direction as forward (#67)

  • Support tensors with 3 or 5+ dimensions in BatchNormalization operator (#68)

  • Support RandomUniform operator (#69)

  • Improve matrix prepacking performance by eliminating unnecessary zero-initialization of buffers (#70)

[0.5.0] - 2024-02-29

rten

  • Changed OperatorType enum in .rten schema from byte to ubyte, to allow for more operator types in future (#56)

  • Made Model instances Send, enabling use with PyO3 (#55)

  • The ONNX => rten model conversion tool is now an installable Python package called rten-convert (#53)

  • Implemented ReduceSumSquare operator (36bbf89f)

[0.4.0] - 2024-02-08

rten

  • Support count_include_pad attr in AveragePool operator (09ecb729)

  • Support license/version/provenance metadata in RTen models (#48)

  • Fix error when a negative index was used with Gather operator (573ded4c)

  • Improve performance of MatMul operator when row count of LHS is small and batch size is large (#51)

rten-imageproc

  • Optimized find_contours for large images (c471a6c, 7a14f43)

rten-tensor

  • Optimize TensorBase::map for contiguous tensors (5562fd23)
  • Add TensorBase::{from_fn, from_simple_fn} (5e654ea0)
  • Add TensorBase::try_from_data (18817907)
  • Support get_unchecked on owned/mutable tensors (06b02eaf)
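
A brief, illustrative sketch of two of these additions (try_from_data and map); shapes and values are arbitrary:

```rust
use rten_tensor::prelude::*;
use rten_tensor::Tensor;

fn demo() {
    // `try_from_data` returns an error instead of panicking when the data
    // length does not match the shape.
    assert!(Tensor::<f32>::try_from_data(&[2, 3], vec![0.; 5]).is_err());

    // `map` applies a function to every element; contiguous tensors take the
    // optimized path mentioned above.
    let t = Tensor::<f32>::try_from_data(&[2, 3], vec![1.; 6]).unwrap();
    let doubled = t.map(|&x| x * 2.);
    assert_eq!(doubled.len(), 6);
}
```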

[0.3.1] - 2024-01-23

  • Updated rten-vecmath dependency to latest version

[0.3.0] - 2024-01-23

Breaking changes

The static and dynamic tensor types (NdTensorBase, TensorBase) have been unified into a single implementation. Most code uses these via type aliases (NdTensor, Tensor etc.), which remain the same. However, there have been some API changes as a result (see the sketch after this list for the new calling conventions):

  • The View and NdView traits were combined into AsView. The recommended way to import this trait is via the prelude (use rten_tensor::prelude::*)

  • Some inherent methods of TensorBase moved to the AsView trait. You may need to add additional imports of this trait or the prelude.

  • NdTensor::from_data now has the same API signature as Tensor::from_data. This means the order of arguments is reversed compared to before. It is now from_data(shape, data). Creating tensors with custom strides is now done via from_data_with_strides or from_slice_with_strides.

  • Tensor methods for broadcasting and reshaping tensors now determine the rank of the result from the type of the shape argument. If passed an array, they return a static-rank view. If passed a slice, they return a dynamic-rank view.

  • Methods that insert, remove or swap axes now have an _axis suffix (e.g. move_axis). Previously some of these methods had a _dim suffix.

  • The slice method now always returns a static rank view. Usage is tensor.slice::<M, _>(range) where M is the rank of the result. To create a view with a dynamic dimension count, use tensor.slice_dyn(range) instead.
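
A minimal sketch of the new calling conventions described above; the concrete shapes and ranges are illustrative assumptions:

```rust
use rten_tensor::prelude::*; // `AsView` and related traits come from the prelude
use rten_tensor::NdTensor;

fn slicing_example() {
    // `NdTensor::from_data` now matches `Tensor::from_data`: shape first, then data.
    let data: Vec<f32> = (0..24).map(|i| i as f32).collect();
    let t = NdTensor::from_data([2, 3, 4], data);

    // `slice` always returns a static-rank view; `M` (here 3) is the rank of
    // the result. A range keeps the sliced dimension.
    let sub = t.slice::<3, _>(0..1);

    // For a view with a dynamic number of dimensions, use `slice_dyn`.
    let sub_dyn = t.slice_dyn(0..1);

    assert_eq!(sub.len(), sub_dyn.len());
}
```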

New features

  • Implemented LayerNormalization operator (#44)
  • Added "Depth Anything" monocular depth estimation example (#44)
  • Added support for align_corners value for coordinate_transformation_mode attr in Resize operator (#44).

Performance improvements

  • Optimized index iteration for tensors (d3fd3c9)
  • Optimized col2im transform used by ConvTranspose (fbc541b)
  • Optimized depthwise convolution (20e83e8)
  • Improved performance on Arm via a better optimized GEMM kernel (#32) and vectorized kernels for other functions (#31).

[0.2.0] - 2024-01-03

  • Improved inference performance on ARM (#30)

[0.1.1] - 2024-01-01

  • Fix softmax operator on non-x64 / wasm32 platforms (59f4815)

[0.1.0] - 2023-12-31

Initial release.