Releases
v0.26.0
Better support for model parallelism, more reduction operations for allreduce (min, max, product), grouped allgather and reducescatter, Petastorm reader-level parallel shuffling, and an NVTabular data loader.
Added
- Spark Estimator: Added support for custom data loaders in `KerasEstimator`. (#3603)
- Spark Estimator: Added NVTabular data loader for `KerasEstimator`. (#3603)
- Spark Estimator: Added gradient accumulation support to Spark torch estimator. (#3681)
- TensorFlow: Added `register_local_var` functionality to distributed optimizers and local gradient aggregators. (#3695)
- TensorFlow: Added support for local variables for `BroadcastGlobalVariablesCallback`. (#3703)
- Enabled use of native `ncclAvg` op for NCCL allreduces. (#3646)
- Added support for additional reduction operations for `allreduce` (min, max, product); see the sketch after this list. (#3660)
- Added 2D torus `allreduce` using NCCL. (#3608)
- Added support for Petastorm reader-level parallel shuffling. (#3665)
- Added random seed support for Lightning datamodule to generate reproducible data loading outputs. (#3665)
- Added support for `int8` and `uint8` `allreduce` and `grouped_allreduce` in TensorFlow. (#3649)
- Added support for batched memory copies in `GPUAllgather`. (#3590)
- Added support for batched memory copies in `GPUReducescatter`. (#3621)
- Added `hvd.grouped_allgather()` and `hvd.grouped_reducescatter()` operations (also shown in the sketch after this list). (#3594)
- Added warning messages if output tensor memory allocations fail. (#3594)
- Added `register_local_source` and `use_generic_names` functionality to `DistributedGradientTape`. (#3628)
- Added `PartialDistributedGradientTape()` API for model parallel use cases. (#3643)
- Spark/Lightning: Added `reader_worker_count` and `reader_pool_type`. (#3612)
- Spark/Lightning: Added `transformation_edit_fields` and `transformation_removed_fields` params for `EstimatorParams`. (#3651)
- TensorFlow: Added doc string for `hvd.grouped_allreduce()`. (#3594)
- ROCm: Enabled `alltoall`. (#3654)
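
To illustrate the new reduction operations and grouped collectives above, here is a minimal sketch using the PyTorch API; it assumes a working v0.26.0 install launched under `horovodrun`, and the tensor values are placeholders.

```python
# Minimal sketch of the new v0.26.0 collectives, e.g. run with:
#   horovodrun -np 4 python example.py
import torch
import horovod.torch as hvd

hvd.init()
x = torch.ones(4) * (hvd.rank() + 1)

# New reduction operations for allreduce (#3660): elementwise
# min, max, and product across all ranks.
x_min  = hvd.allreduce(x, op=hvd.Min)
x_max  = hvd.allreduce(x, op=hvd.Max)
x_prod = hvd.allreduce(x, op=hvd.Product)

# Grouped allgather/reducescatter (#3594): submit several tensors as a
# single fused collective; each call returns a list of output tensors.
gathered  = hvd.grouped_allgather([x_min, x_max])
scattered = hvd.grouped_reducescatter([x, x_prod])
```

Grouping avoids one negotiation round per tensor, the same motivation behind the existing `hvd.grouped_allreduce()`.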
Changed
- Default Petastorm reader pool changed from `process` to `thread` for lower memory usage. (#3665)
- Keras: Support only legacy optimizers in Keras 2.11+ (see the sketch after this list). (#3725)
- Gloo: When negotiating, use `gather` rather than `allgather`. (#3633)
- Use `packaging.version` instead of `distutils` version classes. (#3700)
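
Since the Keras 2.11+ restriction above is easy to trip over, here is a minimal sketch of the supported pattern; the model and loss are placeholders.

```python
# Minimal sketch: with Keras 2.11+, hvd.DistributedOptimizer must wrap an
# optimizer from tf.keras.optimizers.legacy rather than the new default
# optimizer classes introduced in Keras 2.11.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

opt = tf.keras.optimizers.legacy.SGD(learning_rate=0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
model.compile(loss='mse', optimizer=opt)
```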
Deprecated
- Deprecated field `shuffle_buffer_size` from `EstimatorParams`. Use `shuffle` to enable or disable shuffling instead (see the sketch below). (#3665)
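
A sketch of migrating off the deprecated field; the `KerasEstimator` arguments other than `shuffle` are illustrative placeholders, and `model` is assumed to be a compiled Keras model.

```python
# Sketch: replace the deprecated shuffle_buffer_size with the boolean
# shuffle flag. All other arguments here are illustrative.
from horovod.spark.keras import KerasEstimator

estimator = KerasEstimator(
    num_proc=4,
    model=model,
    feature_cols=['features'],
    label_cols=['label'],
    batch_size=32,
    epochs=5,
    # shuffle_buffer_size=100_000,  # deprecated in v0.26.0 (#3665)
    shuffle=True,                   # enable/disable shuffling instead
)
```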
Removed
- Build: Removed `std::regex` use for better cxxabi11 compatibility. (#3584)
Fixed
- TensorFlow: Fixed the optimizer iteration increments when `backward_passes_per_step > 1`. (#3631)
- Fixed `FuseResponses()` on `BATCHED_D2D_PADDING` edge cases for Reducescatter and/or ROCm. (#3621)
- PyTorch: Fixed Reducescatter functions to raise `HorovodInternalError` rather than `RuntimeError`. (#3594)
- PyTorch on GPUs without GPU operations: Fixed grouped allreduce to set CPU device in tensor table. (#3594)
- Fixed race condition in PyTorch allocation handling. (#3639)
- Build: Fixed finding `nvcc` (if not in `$PATH`) with older versions of CMake. (#3682)
- Fixed `reducescatter()` and `grouped_reducescatter()` to raise clean exceptions for scalar inputs. (#3699)
- Updated Eigen submodule to fix build on macOS with aarch64. (#3619)
- Build: Correctly select files in `torch/` directory to be hipified. (#3588)
- Build: Modify regex match for CUDA|ROCm in `FindPytorch.cmake`. (#3593)
- Build: Fixed ROCm-specific build failure. (#3630)