All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
This release contains a breaking change for code using the `TensorBase` type
directly. See rten-tensor changes. Code using the type aliases (`TensorView`
etc.) should not be affected.
- Fixed incorrect calculation of update slice size in `ScatterND` operator (#157)
- Fixed incorrect conversion of `axis` attribute for `ArgMin` and `ArgMax` operators (#142)
- Support 1D inputs and padding in `ConvTranspose` (#156)
- Support `GatherND` operator (#155)
- Fixed uninitialized read in `Gemm` operator when `alpha != 1` and `beta == 0` (#150)
- Support `Softplus` operator (#146)
- Support converting ONNX models containing unnamed operator nodes (#143)
- Support `RandomNormal`, `RandomNormalLike`, `RandomUniformLike` operators (#144)
- Parallelize `AveragePool` operator (#138)
- The mask matrix argument to `find_contours` now uses `bool` instead of `i32` for elements. This improves performance and reduces memory usage for large masks.
This release changes the signature of the `TensorBase` struct from
`TensorBase<T, S: AsRef<[T]>, L: MutLayout>`, where `T` is the element type,
`S` the storage and `L` the layout, to `TensorBase<S: Storage, L: MutLayout>`.
The element type is now available via `S::Elem`. The type of `S` used by views
has changed from slices to new custom types. The `TensorBase::from_data` method
still accepts both `Vec<T>` and slices as the `data` argument, and will convert
to the appropriate storage struct.

Code using the type aliases (`Tensor`, `TensorView`, `TensorViewMut` etc.)
does not need to change.
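The shape of this refactor can be illustrated with a self-contained toy model. This is a hypothetical sketch, not the real rten-tensor code: the `Storage` trait, `Elem` associated type, and the old/new `TensorBase` signatures come from the description above, while `OwnedStorage` and `first` are invented for illustration (the layout parameter `L` is elided for brevity).

```rust
// Toy model of the signature change: the element type hangs off the storage
// via an associated type, so `TensorBase` no longer needs a separate `T`.
trait Storage {
    type Elem;
    fn as_slice(&self) -> &[Self::Elem];
}

// Hypothetical owned storage backed by a Vec (stands in for the crate's
// custom storage types).
struct OwnedStorage<T>(Vec<T>);

impl<T> Storage for OwnedStorage<T> {
    type Elem = T;
    fn as_slice(&self) -> &[T] {
        &self.0
    }
}

// Was: TensorBase<T, S: AsRef<[T]>, L: MutLayout>
// Now: TensorBase<S: Storage, L: MutLayout> (layout omitted in this sketch).
struct TensorBase<S: Storage> {
    data: S,
}

impl<S: Storage> TensorBase<S> {
    // Methods name the element type via `S::Elem` instead of a `T` parameter.
    fn first(&self) -> Option<&S::Elem> {
        self.data.as_slice().first()
    }
}

fn main() {
    let t = TensorBase { data: OwnedStorage(vec![1, 2, 3]) };
    assert_eq!(t.first(), Some(&1));
}
```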
- Refactored tensor storage types to fix a violation of Rust's unique ownership rules for mutable slices. This enables tests for rten-tensor and code using this crate to be run under Miri (#148).
- Added `TensorBase::{as_cow, into_cow}` (named after `std::borrow::Cow`) to convert tensor storage to a type which is `Cow`-like. This is useful for writing code which works with either borrowed or owned tensors (#153).
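The `std::borrow::Cow` pattern these methods are named after can be sketched self-containedly. This toy uses the standard library's `Cow` over a slice rather than rten-tensor itself, and `normalize` is a hypothetical function invented for illustration:

```rust
use std::borrow::Cow;

// A function that accepts either borrowed or owned data: callers pass
// whichever they have, and the same body serves both cases.
fn normalize(data: Cow<'_, [f32]>) -> Vec<f32> {
    let max = data.iter().cloned().fold(f32::MIN, f32::max);
    data.iter().map(|x| x / max).collect()
}

fn main() {
    // Owned input...
    let owned = normalize(Cow::Owned(vec![1.0, 2.0, 4.0]));
    // ...and borrowed input go through the same code path.
    let borrowed = normalize(Cow::Borrowed(&[1.0, 2.0, 4.0]));
    assert_eq!(owned, borrowed);
    assert_eq!(owned, vec![0.25, 0.5, 1.0]);
}
```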
- Revised SIMD traits to make working with masks more ergonomic and efficient (#152). Integer and floating point types with the same number of lanes will now use the same mask type.
- Added an `Alloc` trait which provides a simple allocator interface, and `*_in`-suffixed variants of several `TensorBase` methods, which allow specifying an allocator for the returned tensor's data buffer (#123).
- Fixed crashes in several functions when running on pre-AVX2 x64 CPUs (see rten changes)
- Support `Elu` operator (#132)
- Support `Reduce*` operators that take `axes` as a dynamic input rather than a static attribute (#132)
- Fixed crash in several operators when running on x64 CPUs that do not support AVX-2 instructions (#131, #134)
- Added a buffer pool that enables reuse of operator output and temporary buffers, avoiding the overhead of allocating and freeing large buffers using the system allocator (#108). Statistics about buffer pool usage are printed as part of `RTEN_TIMING` output.
- Fixed a `MatMul` performance regression introduced in v0.7.0 due to virtual calls to get the kernel tile size (#101)
- Optimize convolutions by using SIMD operations for the im2col transform (#104)
- Parallelize depthwise convolution (#102)
- Avoid redundant zeroing of buffers in `Conv`, `OneHot`, and various unary operations (#97, #99, #101, #106)
- Optimize `Unsqueeze` by running in-place where possible (#96)
- Optimize vector-matrix products where the matrix is transposed (#94)
- Reduced graph execution overhead by using faster hashing (#92)
- Optimize `ScatterND` (#91)
- Support AVX-512 acceleration for `Exp`, `Sigmoid`, `Tanh`, `Softmax` and `Erf` operators (#131). This requires nightly Rust and the `avx512` feature enabled.
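A buffer pool of the kind described in the first entry above can be sketched minimally. This is a hypothetical toy (`BufferPool`, `alloc`, `release` are invented names), not rten's actual implementation, which also tracks usage statistics:

```rust
// Minimal sketch of a buffer pool that recycles Vec allocations instead of
// returning them to the system allocator.
struct BufferPool {
    free: Vec<Vec<f32>>,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: Vec::new() }
    }

    // Hand out a cleared buffer with at least `capacity` elements, reusing a
    // previously released allocation when one is available.
    fn alloc(&mut self, capacity: usize) -> Vec<f32> {
        match self.free.pop() {
            Some(mut buf) => {
                buf.clear();
                buf.reserve(capacity);
                buf
            }
            None => Vec::with_capacity(capacity),
        }
    }

    // Return a buffer to the pool so later `alloc` calls can reuse it.
    fn release(&mut self, buf: Vec<f32>) {
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new();
    let buf = pool.alloc(1024);
    let ptr = buf.as_ptr();
    pool.release(buf);
    // A subsequent smaller request reuses the same backing memory.
    let buf2 = pool.alloc(512);
    assert_eq!(buf2.as_ptr(), ptr);
}
```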
- Add `Tensor::merge_axes` method to simplify layouts (#78)
- Add `Tensor::{uninit, assume_init}` methods for working with uninitialized buffers (#82)
- Reduced `Graph::run` overhead by reducing allocations (#89)
- Added `Model::partial_run` API to speed up autoregressive / recurrent models by precomputing parts of the graph that depend only on inputs that are unchanging across loop iterations (#86)
- Optimize `MatMul` and binary operators by avoiding unnecessary zeroing of output buffers (#82, #88)
- Fixed incorrect output from `Gemm` operator when the bias is zero and the "C" input contained infinities / NaNs (#81)
- Optimize matrix packing operations on Intel CPUs using AVX-2 instructions (#80)
- Optimize `Transpose` operations where input dimensions are powers of 2 by using blocking and tiling (#78)
- Exclude test files and tools from published crate (#77)
- Optimize RNN operators for the case where the input sequence is short, by avoiding prepacking of weights in this case (#74)
- Updated AVX-512 support to work with latest Rust nightly releases (#58)
- Improved performance of vector-matrix product operations (#61)
- Slightly improved WASM matrix multiplication performance with a dedicated kernel (#64)
- Fixed conversion of RNN operators (LSTM, GRU) that explicitly declare the direction as forward (#67)
- Support tensors with 3 or 5+ dimensions in `BatchNormalization` operator (#68)
- Support `RandomUniform` operator (#69)
- Improve matrix prepacking performance by eliminating unnecessary zero-initialization of buffers (#70)
- Changed `OperatorType` enum in .rten schema from byte to ubyte, to allow for more operator types in future (#56)
- Made `Model` instances `Send`, enabling use with PyO3 (#55)
- The ONNX => rten model conversion tool is now an installable Python package called `rten-convert` (#53)
- Implemented `ReduceSumSquare` operator (36bbf89f)
- Support `count_include_pad` attr in `AveragePool` operator (09ecb729)
- Support license/version/provenance metadata in RTen models (#48)
- Fix error when a negative index was used with `Gather` operator (573ded4c)
- Improve performance of `MatMul` operator when row count of LHS is small and batch size is large (#51)
- Optimized `find_contours` for large images (c471a6c, 7a14f43)
- Optimize `TensorBase::map` for contiguous tensors (5562fd23)
- Add `TensorBase::{from_fn, from_simple_fn}` (5e654ea0)
- Add `TensorBase::try_from_data` (18817907)
- Support `get_unchecked` on owned/mutable tensors (06b02eaf)
- Updated rten-vecmath dependency to latest version
The static and dynamic tensor types (`NdTensorBase`, `TensorBase`) have been
unified into a single implementation. Most code uses these via type aliases
(`NdTensor`, `Tensor` etc.), which remain the same. However there have been some
API changes as a result:
- The `View` and `NdView` traits were combined into `AsView`. The recommended way to import this trait is via the prelude (`use rten_tensor::prelude::*`).
- Some inherent methods of `TensorBase` moved to the `AsView` trait. You may need to add additional imports of this trait or the prelude.
- `NdTensor::from_data` now has the same API signature as `Tensor::from_data`. This means the order of arguments is reversed compared to before. It is now `from_data(shape, data)`. Creating tensors with custom strides is now done via `from_data_with_strides` or `from_slice_with_strides`.
- Tensor methods for broadcasting and reshaping tensors now determine the rank of the result from the type of the shape argument. If passed an array, they return a static-rank view. If passed a slice, they return a dynamic-rank view.
- Methods that insert, remove or swap axes now have an `_axis` suffix (eg. `move_axis`). Previously some of these methods had a `_dim` suffix.
- The `slice` method now always returns a static rank view. Usage is `tensor.slice::<M, _>(range)` where `M` is the rank of the result. To create a view with a dynamic dimension count, use `tensor.slice_dyn(range)` instead.
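The "rank from the type of the shape argument" idea in the list above can be sketched with a self-contained toy. This is a hypothetical illustration, not rten-tensor's actual `IntoLayout` machinery: `IntoLayout`, `reshaped_rank`, and the impls below are invented to show how an array argument carries a compile-time rank while a slice argument does not.

```rust
// Toy illustration: an array shape gives a compile-time rank, a slice shape
// gives a runtime rank.
trait IntoLayout {
    // Index type a static-rank vs dynamic-rank view would use.
    type Index;
    fn ndim(&self) -> usize;
}

impl<const N: usize> IntoLayout for [usize; N] {
    type Index = [usize; N]; // static-rank index: rank is part of the type
    fn ndim(&self) -> usize {
        N
    }
}

impl<'a> IntoLayout for &'a [usize] {
    type Index = Vec<usize>; // dynamic-rank index: rank known only at runtime
    fn ndim(&self) -> usize {
        self.len()
    }
}

// Stand-in for a reshape/broadcast method: the result's rank is determined
// entirely by which `IntoLayout` impl the shape argument selects.
fn reshaped_rank<L: IntoLayout>(shape: L) -> usize {
    shape.ndim()
}

fn main() {
    // Array argument: rank (2) is known statically.
    assert_eq!(reshaped_rank([4usize, 5]), 2);
    // Slice argument: rank is only known at runtime.
    let dyn_shape: &[usize] = &[4, 5, 6];
    assert_eq!(reshaped_rank(dyn_shape), 3);
}
```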
- Implemented LayerNormalization operator (#44)
- Added "Depth Anything" monocular depth estimation example (#44)
- Added support for `align_corners` value for `coordinate_transformation_mode` attr in `Resize` operator (#44).
- Optimized index iteration for tensors (d3fd3c9)
- Optimized col2im transform used by ConvTranspose (fbc541b)
- Optimized depthwise convolution (20e83e8)
- Improved performance on Arm via a better optimized GEMM kernel (#32) and vectorized kernels for other functions (#31).
- Improved inference performance on Arm (#30)
- Fix softmax operator on non-x64 / wasm32 platforms (59f4815)
Initial release.