
Process sets, XLA support, improved GPU backend

Released by @tgaddair on 06 Oct 17:52 · 66ad6d5 · 275 commits to master since this release

Added

  • Added process sets to concurrently run collective operations on subsets of Horovod processes in TensorFlow, PyTorch, and MXNet. (#2839, #3042, #3043, #3054, #3083, #3090) A process-set sketch follows this list.

  • Added XLA support for Allreduce via tf.function(jit_compile=True). (#3053) An XLA sketch follows this list.

  • Added fused buffer scaling and unpack/pack kernels on GPU. (#2973)

  • Added support for NCCL on CUDA 11.4. (#3182)

  • Added fp16 compression for MXNet. (#2987)

  • Added terminate_on_nan flag to Spark Lightning estimator. (#3088) A sketch follows this list.

  • Added barrier() API to torch module to support simple synchronization among ranks and to achieve parity with PyTorch DDP and similar frameworks. (#3139) A barrier sketch follows this list.

  • Added params for customizing Tensorboard callback. (#3153)

  • Added hvd.cross_rank() for keras. (#3008) A cross_rank sketch follows this list.

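Process sets let a collective run on a subset of ranks instead of the global communicator. A minimal TensorFlow sketch, assuming a four-process job; the membership lists and tensor are illustrative only:

```python
import tensorflow as tf
import horovod.tensorflow as hvd

# Two illustrative process sets; every rank must pass the same list to hvd.init().
even_set = hvd.ProcessSet([0, 2])
odd_set = hvd.ProcessSet([1, 3])
hvd.init(process_sets=[even_set, odd_set])

tensor = tf.ones([4]) * hvd.rank()

# Reduce only within the set this rank belongs to; ranks outside the set do not participate.
my_set = even_set if hvd.rank() % 2 == 0 else odd_set
reduced = hvd.allreduce(tensor, process_set=my_set)
```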
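Allreduce can now run inside an XLA-compiled tf.function. A minimal sketch (the tensor and averaging choice are arbitrary; depending on the build, the Horovod XLA docs may also describe an environment setting needed to enable the XLA ops):

```python
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

@tf.function(jit_compile=True)  # compile the step, including the allreduce, with XLA
def allreduce_mean(tensor):
    return hvd.allreduce(tensor, op=hvd.Average)

result = allreduce_mean(tf.constant([1.0, 2.0]) * hvd.rank())
```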
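The new terminate_on_nan flag tells the Spark Lightning estimator to stop fitting when the loss becomes NaN. A rough sketch, assuming the LightningModule, store, DataFrame, and column names are already set up as for an ordinary horovod.spark.lightning run:

```python
from horovod.spark.lightning import TorchEstimator

estimator = TorchEstimator(
    model=lit_module,        # a pl.LightningModule defined elsewhere (assumed)
    store=store,             # a horovod.spark Store on shared storage (assumed)
    feature_cols=['features'],
    label_cols=['label'],
    batch_size=64,
    epochs=5,
    terminate_on_nan=True,   # new in this release (#3088)
)
torch_model = estimator.fit(train_df)  # train_df: a Spark DataFrame (assumed)
```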
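hvd.barrier() blocks until every rank reaches it, which is useful before timing a section or reading something that rank 0 just wrote. A minimal PyTorch sketch:

```python
import horovod.torch as hvd

hvd.init()

if hvd.rank() == 0:
    # e.g. rank 0 prepares a file or checkpoint the other ranks will read
    pass

hvd.barrier()  # all ranks wait here until everyone has arrived
```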
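hvd.cross_rank() reports this worker's rank across hosts, i.e. among the processes that share the same local rank. A small sketch using the horovod.tensorflow.keras binding; the log-directory scheme is illustrative:

```python
import horovod.tensorflow.keras as hvd

hvd.init()

# cross_rank() is 0 on the first node, 1 on the second, and so on,
# for processes with the same local_rank().
log_dir = f"logs/node-{hvd.cross_rank()}/worker-{hvd.local_rank()}"
```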

Changed

  • Implemented more asynchronous dependency handling on GPU. (#2963)

  • Ray: RayExecutor will now use the current placement group instead of always creating a new one. (#3134)

  • Lightning: turned off shuffling for validation dataset. (#2974)


  • Extended hvd.join() to return the last rank that joined. (#3097) A sketch follows this list.
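hvd.join() is used when ranks may finish unequal amounts of work (for example, uneven batch counts); it now also reports which rank joined last, which helps spot stragglers. A minimal PyTorch sketch:

```python
import horovod.torch as hvd

hvd.init()

# ... each rank trains on a possibly different number of batches ...

last_rank = hvd.join()  # blocks until all ranks have joined, then returns the last one
if hvd.rank() == 0:
    print(f"last rank to join: {last_rank}")
```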

Removed

  • Spark/Keras: removed bare Keras support. (#3191)

Fixed

  • Fixed Horovod develop/editable install mode and incremental builds. (#3074)

  • Estimator/Lightning: use lightning datamodule. (#3084)

  • Fixed the Horovod Spark StringType and numpy type mapping issue. (#3146)

  • Fixed bug in Lightning Profiler on Ray. (#3122)

  • Fixed torch op lazy release to prevent OOM in elastic training. (#3110)

  • Lightning: Fixed usage of the checkpoint callback. (#3186)

  • Fixed MPICH support to use Intel MPI's implementation. (#3148)

  • Fixed race condition in PyTorch async dataloader. (#3120)

  • Keras: Fixed error in the LearningRateScheduler callback. (#3135, #3142)