All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
This release contains a breaking change for code using the `TensorBase` type
directly. See rten-tensor changes. Code using the type aliases (`TensorView`
etc.) should not be affected.
- Fixed incorrect calculation of update slice size in `ScatterND` operator (#157)
- Fixed incorrect conversion of `axis` attribute for `ArgMin` and `ArgMax` operators (#142)
- Support 1D inputs and padding in `ConvTranspose` (#156)
- Support `GatherND` operator (#155)
- Fixed uninitialized read in `Gemm` operator when `alpha != 1` and `beta == 0` (#150)
- Support `Softplus` operator (#146)
- Support converting ONNX models containing unnamed operator nodes (#143)
- Support `RandomNormal`, `RandomNormalLike`, `RandomUniformLike` operators (#144)
- Parallelize `AveragePool` operator (#138)
- The mask matrix argument to `find_contours` now uses `bool` instead of `i32` for elements. This improves performance and reduces memory usage for large masks.
This release changes the signature of the `TensorBase` struct from
`TensorBase<T, S: AsRef<[T]>, L: MutLayout>`, where `T` is the element type,
`S` the storage and `L` the layout, to `TensorBase<S: Storage, L: MutLayout>`.
The element type is now available via `S::Elem`. The type of `S` used by views
has changed from slices to new custom types. The `TensorBase::from_data` method
still accepts both `Vec<T>` and slices as the `data` argument, and will convert
to the appropriate storage struct.

Code using the type aliases (`Tensor`, `TensorView`, `TensorViewMut` etc.)
does not need to change.
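The shape of this refactor can be illustrated with a self-contained toy model. This is a hypothetical sketch, not the real rten-tensor code: the `Storage` trait, `Elem` associated type, and the old/new `TensorBase` signatures come from the description above, while `OwnedStorage` and `first` are invented for illustration (the layout parameter `L` is elided for brevity).

```rust
// Toy model of the signature change: the element type hangs off the storage
// via an associated type, so `TensorBase` no longer needs a separate `T`.
trait Storage {
    type Elem;
    fn as_slice(&self) -> &[Self::Elem];
}

// Hypothetical owned storage backed by a Vec (stands in for the crate's
// custom storage types).
struct OwnedStorage<T>(Vec<T>);

impl<T> Storage for OwnedStorage<T> {
    type Elem = T;
    fn as_slice(&self) -> &[T] {
        &self.0
    }
}

// Was: TensorBase<T, S: AsRef<[T]>, L: MutLayout>
// Now: TensorBase<S: Storage, L: MutLayout> (layout omitted in this sketch).
struct TensorBase<S: Storage> {
    data: S,
}

impl<S: Storage> TensorBase<S> {
    // Methods name the element type via `S::Elem` instead of a `T` parameter.
    fn first(&self) -> Option<&S::Elem> {
        self.data.as_slice().first()
    }
}

fn main() {
    let t = TensorBase { data: OwnedStorage(vec![1, 2, 3]) };
    assert_eq!(t.first(), Some(&1));
}
```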
- Refactored tensor storage types to fix a violation of Rust's unique ownership rules for mutable slices. This enables tests for rten-tensor and code using this crate to be run under Miri (#148).
- Added `TensorBase::{as_cow, into_cow}` (named after `std::borrow::Cow`) to convert tensor storage to a type which is `Cow`-like. This is useful for writing code which works with either borrowed or owned tensors (#153).
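The `std::borrow::Cow` pattern these methods are named after can be sketched self-containedly. This toy uses the standard library's `Cow` over a slice rather than rten-tensor itself, and `normalize` is a hypothetical function invented for illustration:

```rust
use std::borrow::Cow;

// A function that accepts either borrowed or owned data: callers pass
// whichever they have, and the same body serves both cases.
fn normalize(data: Cow<'_, [f32]>) -> Vec<f32> {
    let max = data.iter().cloned().fold(f32::MIN, f32::max);
    data.iter().map(|x| x / max).collect()
}

fn main() {
    // Owned input...
    let owned = normalize(Cow::Owned(vec![1.0, 2.0, 4.0]));
    // ...and borrowed input go through the same code path.
    let borrowed = normalize(Cow::Borrowed(&[1.0, 2.0, 4.0]));
    assert_eq!(owned, borrowed);
    assert_eq!(owned, vec![0.25, 0.5, 1.0]);
}
```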
- Revised SIMD traits to make working with masks more ergonomic and efficient (#152). Integer and floating point types with the same number of lanes will now use the same mask type.
- Added an `Alloc` trait which provides a simple allocator interface, and `*_in`-suffixed variants of several `TensorBase` methods, which allow specifying an allocator for the returned tensor's data buffer (#123).
- Fixed crashes in several functions when running on pre-AVX2 x64 CPUs (see rten changes)
- Support `Elu` operator (#132)
- Support `Reduce*` operators that take `axes` as a dynamic input rather than a static attribute (#132)
- Fixed crash in several operators when running on x64 CPUs that do not support AVX-2 instructions (#131, #134)
- Added a buffer pool that enables reuse of operator output and temporary buffers, avoiding the overhead of allocating and freeing large buffers using the system allocator (#108). Statistics about buffer pool usage are printed as part of `RTEN_TIMING` output.
- Fixed a `MatMul` performance regression introduced in v0.7.0 due to virtual calls to get the kernel tile size (#101)
- Optimize convolutions by using SIMD operations for the im2col transform (#104)
- Parallelize depthwise convolution (#102)
- Avoid redundant zeroing of buffers in `Conv`, `OneHot`, and various unary operations (#97, #99, #101, #106)
- Optimize `Unsqueeze` by running in-place where possible (#96)
- Optimize vector-matrix products where the matrix is transposed (#94)
- Reduced graph execution overhead by using faster hashing (#92)
- Optimize `ScatterND` (#91)
- Support AVX-512 acceleration for `Exp`, `Sigmoid`, `Tanh`, `Softmax` and `Erf` operators (#131). This requires nightly Rust and the `avx512` feature enabled.
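A buffer pool of the kind described in the first entry above can be sketched minimally. This is a hypothetical toy (`BufferPool`, `alloc`, `release` are invented names), not rten's actual implementation, which also tracks usage statistics:

```rust
// Minimal sketch of a buffer pool that recycles Vec allocations instead of
// returning them to the system allocator.
struct BufferPool {
    free: Vec<Vec<f32>>,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: Vec::new() }
    }

    // Hand out a cleared buffer with at least `capacity` elements, reusing a
    // previously released allocation when one is available.
    fn alloc(&mut self, capacity: usize) -> Vec<f32> {
        match self.free.pop() {
            Some(mut buf) => {
                buf.clear();
                buf.reserve(capacity);
                buf
            }
            None => Vec::with_capacity(capacity),
        }
    }

    // Return a buffer to the pool so later `alloc` calls can reuse it.
    fn release(&mut self, buf: Vec<f32>) {
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new();
    let buf = pool.alloc(1024);
    let ptr = buf.as_ptr();
    pool.release(buf);
    // A subsequent smaller request reuses the same backing memory.
    let buf2 = pool.alloc(512);
    assert_eq!(buf2.as_ptr(), ptr);
}
```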
- Add `Tensor::merge_axes` method to simplify layouts (#78)
- Add `Tensor::{uninit, assume_init}` methods for working with uninitialized buffers (#82)
- Reduced `Graph::run` overhead by reducing allocations (#89)
- Added `Model::partial_run` API to speed up autoregressive / recurrent models by precomputing parts of the graph that depend only on inputs that are unchanging across loop iterations (#86)
- Optimize `MatMul` and binary operators by avoiding unnecessary zeroing of output buffers (#82, #88)
- Fixed incorrect output from `Gemm` operator when the bias is zero and the "C" input contained infinities / NaNs (#81)
- Optimize matrix packing operations on Intel CPUs using AVX-2 instructions (#80)
- Optimize `Transpose` operations where input dimensions are powers of 2 by using blocking and tiling (#78)
- Exclude test files and tools from published crate (#77)
- Optimize RNN operators for the case where the input sequence is short, by avoiding prepacking of weights in this case (#74)
- Updated AVX-512 support to work with latest Rust nightly releases (#58)
- Improved performance of vector-matrix product operations (#61)
- Slightly improved WASM matrix multiplication performance with a dedicated kernel (#64)
- Fixed conversion of RNN operators (LSTM, GRU) that explicitly declare the direction as forward (#67)
- Support tensors with 3 or 5+ dimensions in `BatchNormalization` operator (#68)
- Support `RandomUniform` operator (#69)
- Improve matrix prepacking performance by eliminating unnecessary zero-initialization of buffers (#70)
- Changed `OperatorType` enum in .rten schema from byte to ubyte, to allow for more operator types in future (#56)
- Made `Model` instances `Send`, enabling use with PyO3 (#55)
- The ONNX => rten model conversion tool is now an installable Python package called `rten-convert` (#53)
- Implemented `ReduceSumSquare` operator (36bbf89f)
- Support `count_include_pad` attr in `AveragePool` operator (09ecb729)
- Support license/version/provenance metadata in RTen models (#48)
- Fix error when a negative index was used with `Gather` operator (573ded4c)
- Improve performance of `MatMul` operator when row count of LHS is small and batch size is large (#51)
- Optimized `find_contours` for large images (c471a6c, 7a14f43)
- Optimize `TensorBase::map` for contiguous tensors (5562fd23)
- Add `TensorBase::{from_fn, from_simple_fn}` (5e654ea0)
- Add `TensorBase::try_from_data` (18817907)
- Support `get_unchecked` on owned/mutable tensors (06b02eaf)
- Updated rten-vecmath dependency to latest version
The static and dynamic tensor types (`NdTensorBase`, `TensorBase`) have been
unified into a single implementation. Most code uses these via type aliases
(`NdTensor`, `Tensor` etc.), which remain the same. However there have been some
API changes as a result:
- The `View` and `NdView` traits were combined into `AsView`. The recommended way to import this trait is via the prelude (`use rten_tensor::prelude::*`).
- Some inherent methods of `TensorBase` moved to the `AsView` trait. You may need to add additional imports of this trait or the prelude.
- `NdTensor::from_data` now has the same API signature as `Tensor::from_data`. This means the order of arguments is reversed compared to before. It is now `from_data(shape, data)`. Creating tensors with custom strides is now done via `from_data_with_strides` or `from_slice_with_strides`.
- Tensor methods for broadcasting and reshaping tensors now determine the rank of the result from the type of the shape argument. If passed an array, they return a static-rank view. If passed a slice, they return a dynamic-rank view.
- Methods that insert, remove or swap axes now have an `_axis` suffix (eg. `move_axis`). Previously some of these methods had a `_dim` suffix.
- The `slice` method now always returns a static rank view. Usage is `tensor.slice::<M, _>(range)` where `M` is the rank of the result. To create a view with a dynamic dimension count, use `tensor.slice_dyn(range)` instead.
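The "rank from the type of the shape argument" idea in the list above can be sketched with a self-contained toy. This is a hypothetical illustration, not rten-tensor's actual `IntoLayout` machinery: `IntoLayout`, `reshaped_rank`, and the impls below are invented to show how an array argument carries a compile-time rank while a slice argument does not.

```rust
// Toy illustration: an array shape gives a compile-time rank, a slice shape
// gives a runtime rank.
trait IntoLayout {
    // Index type a static-rank vs dynamic-rank view would use.
    type Index;
    fn ndim(&self) -> usize;
}

impl<const N: usize> IntoLayout for [usize; N] {
    type Index = [usize; N]; // static-rank index: rank is part of the type
    fn ndim(&self) -> usize {
        N
    }
}

impl<'a> IntoLayout for &'a [usize] {
    type Index = Vec<usize>; // dynamic-rank index: rank known only at runtime
    fn ndim(&self) -> usize {
        self.len()
    }
}

// Stand-in for a reshape/broadcast method: the result's rank is determined
// entirely by which `IntoLayout` impl the shape argument selects.
fn reshaped_rank<L: IntoLayout>(shape: L) -> usize {
    shape.ndim()
}

fn main() {
    // Array argument: rank (2) is known statically.
    assert_eq!(reshaped_rank([4usize, 5]), 2);
    // Slice argument: rank is only known at runtime.
    let dyn_shape: &[usize] = &[4, 5, 6];
    assert_eq!(reshaped_rank(dyn_shape), 3);
}
```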
- Implemented LayerNormalization operator (#44)
- Added "Depth Anything" monocular depth estimation example (#44)
- Added support for `align_corners` value for `coordinate_transformation_mode` attr in `Resize` operator (#44).
- Optimized index iteration for tensors (d3fd3c9)
- Optimized col2im transform used by ConvTranspose (fbc541b)
- Optimized depthwise convolution (20e83e8)
- Improved performance on Arm via a better optimized GEMM kernel (#32) and vectorized kernels for other functions (#31).
- Improved inference performance on Arm (#30)
- Fix softmax operator on non-x64 / wasm32 platforms (59f4815)
Initial release.