
Better support for model parallelism, more reduction operations for allreduce (min, max, product), grouped allgather and reducescatter, Petastorm reader-level parallel shuffling, and an NVTabular data loader

Released by @EnricoMi on 13 Oct 12:29 · 79 commits to master since this release · commit c638dce

Added

  • Spark Estimator: Added support for custom data loaders in KerasEstimator. (#3603)
  • Spark Estimator: Added NVTabular data loader for KerasEstimator. (#3603)
  • Spark Estimator: Added gradient accumulation support to Spark torch estimator. (#3681)
  • TensorFlow: Added register_local_var functionality to distributed optimizers and local gradient aggregators. (#3695)
  • TensorFlow: Added support for local variables for BroadcastGlobalVariablesCallback. (#3703)
  • Enabled use of native ncclAvg op for NCCL allreduces. (#3646)
  • Added support for additional reduction operations for allreduce (min, max, product); see the first sketch after this list. (#3660)
  • Added 2D torus allreduce using NCCL. (#3608)
  • Added support for Petastorm reader-level parallel shuffling. (#3665)
  • Added random seed support for Lightning datamodule to generate reproducible data loading outputs. (#3665)
  • Added support for int8 and uint8 allreduce and grouped_allreduce in TensorFlow. (#3649)
  • Added support for batched memory copies in GPUAllgather. (#3590)
  • Added support for batched memory copies in GPUReducescatter. (#3621)
  • Added hvd.grouped_allgather() and hvd.grouped_reducescatter() operations; see the second sketch after this list. (#3594)
  • Added warning messages if output tensor memory allocations fail. (#3594)
  • Added register_local_source and use_generic_names functionality to DistributedGradientTape. (#3628)
  • Added PartialDistributedGradientTape() API for model parallel use cases. (#3643)
  • Spark/Lightning: Added reader_worker_count and reader_pool_type. (#3612)
  • Spark/Lightning: Added transformation_edit_fields and transformation_removed_fields params for EstimatorParams. (#3651)
  • TensorFlow: Added docstring for hvd.grouped_allreduce(). (#3594)
  • ROCm: Enabled alltoall. (#3654)
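
A minimal sketch of the new allreduce reduction operations, assuming the PyTorch frontend and that the ops are exposed as hvd.Min, hvd.Max, and hvd.Product alongside the existing hvd.Sum and hvd.Average:

```python
import torch
import horovod.torch as hvd

hvd.init()
x = torch.tensor([float(hvd.rank() + 1)])

# Element-wise reductions across all ranks using the new ops.
minimum = hvd.allreduce(x, op=hvd.Min)      # smallest value contributed by any rank
maximum = hvd.allreduce(x, op=hvd.Max)      # largest value contributed by any rank
product = hvd.allreduce(x, op=hvd.Product)  # product of all ranks' values
```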
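
Similarly, a hedged sketch of the new grouped collectives, assuming (as with hvd.grouped_allreduce) that each takes a list of tensors and returns a list of results, fusing the whole group into a single operation:

```python
import torch
import horovod.torch as hvd

hvd.init()
# First dimensions are multiples of the number of ranks so that
# reducescatter can hand every rank an equally sized shard.
tensors = [torch.rand(4 * hvd.size(), 3), torch.rand(2 * hvd.size())]

gathered = hvd.grouped_allgather(tensors)       # each output concatenates all ranks' inputs
scattered = hvd.grouped_reducescatter(tensors)  # each output holds this rank's shard of the sum
```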

Changed

  • The default Petastorm reader pool was changed from process to thread for lower memory usage. (#3665)
  • Keras: Support only legacy optimizers in Keras 2.11+; see the sketch after this list. (#3725)
  • Gloo: When negotiating, use gather rather than allgather. (#3633)
  • Use packaging.version instead of distutils version classes. (#3700)
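
A short sketch of the Keras 2.11+ change: pick an optimizer from tf.keras.optimizers.legacy before wrapping it with Horovod's DistributedOptimizer (the learning-rate scaling by hvd.size() is shown only as a common convention, not a requirement):

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# With Keras 2.11+, only the legacy optimizer classes are supported.
opt = tf.keras.optimizers.legacy.SGD(learning_rate=0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
```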

Deprecated

  • Deprecated the shuffle_buffer_size field of EstimatorParams. Use the shuffle parameter to enable or disable shuffling instead, as in the sketch below. (#3665)
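
A hedged sketch of the replacement, using a KerasEstimator with a placeholder store path, model, and column names purely for illustration; the relevant line is passing shuffle instead of the deprecated shuffle_buffer_size:

```python
import tensorflow as tf
from horovod.spark.keras import KerasEstimator
from horovod.spark.common.store import Store

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])

estimator = KerasEstimator(
    num_proc=2,
    store=Store.create('/tmp/hvd_store'),         # placeholder store location
    model=model,
    optimizer=tf.keras.optimizers.legacy.Adam(),
    loss='mse',
    feature_cols=['features'],                    # placeholder column names
    label_cols=['label'],
    batch_size=32,
    epochs=1,
    shuffle=True,                                 # replaces the deprecated shuffle_buffer_size
)
```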

Removed

  • Build: Removed std::regex use for better cxxabi11 compatibility. (#3584)

Fixed

  • TensorFlow: Fixed the optimizer iteration increments when backward_passes_per_step > 1. (#3631)
  • Fixed FuseResponses() on BATCHED_D2D_PADDING edge cases for Reducescatter and/or ROCm. (#3621)
  • PyTorch: Fixed Reducescatter functions to raise HorovodInternalError rather than RuntimeError. (#3594)
  • PyTorch on GPUs without GPU operations: Fixed grouped allreduce to set CPU device in tensor table. (#3594)
  • Fixed race condition in PyTorch allocation handling. (#3639)
  • Build: Fixed finding nvcc (if not in $PATH) with older versions of CMake. (#3682)
  • Fixed reducescatter() and grouped_reducescatter() to raise clean exceptions for scalar inputs. (#3699)
  • Updated Eigen submodule to fix build on macOS with aarch64. (#3619)
  • Build: Correctly select files in torch/ directory to be hipified. (#3588)
  • Build: Modify regex match for CUDA|ROCm in FindPytorch.cmake. (#3593)
  • Build: Fixed ROCm-specific build failure. (#3630)