Added
Ray: Added elastic keyword parameters to the RayExecutor API, which now supports both static (non-elastic) and elastic Horovod jobs. (#3190)
TensorFlow: Added in-place broadcasting of variables. (#3128)
Elastic: Added support for resurrecting blacklisted hosts. (#3319)
MXNet: Added support for MXNet async dependency engine. (#3242, #2963)
Spark/Lightning: Added history to lightning estimator. (#3214)
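As a rough illustration of the host-resurrection entry (#3319), the toy sketch below shows one way a blacklisted host could become eligible again after a cooldown. It is not Horovod's implementation: the `HostBlacklist` class, its method names, and the single fixed cooldown are hypothetical and chosen only for this example.

```python
import time
from typing import Callable, Dict, List


class HostBlacklist:
    """Hypothetical blacklist where a failed host is re-admitted after a cooldown."""

    def __init__(self, cooldown_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown_s = cooldown_s
        self._clock = clock  # injectable so the behavior can be tested without sleeping
        self._blocked_until: Dict[str, float] = {}

    def blacklist(self, host: str) -> None:
        # Record the time at which the host may be used again.
        self._blocked_until[host] = self._clock() + self._cooldown_s

    def is_blacklisted(self, host: str) -> bool:
        deadline = self._blocked_until.get(host)
        if deadline is None:
            return False
        if self._clock() >= deadline:
            # Cooldown elapsed: "resurrect" the host by dropping its entry.
            del self._blocked_until[host]
            return False
        return True

    def available(self, hosts: List[str]) -> List[str]:
        # Keep the input order, filtering out hosts still in cooldown.
        return [h for h in hosts if not self.is_blacklisted(h)]
```

With a fake clock, a host blacklisted at time 0 with a 10-second cooldown is filtered out by `available()` until the clock reaches 10, after which it is returned again.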
Changed
Moved to CMake version 3.13 with first-class CUDA language support and re-enabled parallelized builds. Uses a temporary installation of CMake if CMake 3.13 is not found. (#3261, #3371)
Moved the released Docker images horovod and horovod-cpu to Ubuntu 20.04 and Python 3.8. (#3393)
Spark Estimator: Don't shuffle row groups if the training data must not be shuffled. (#3369)
Spark/Lightning: Reduced memory footprint of async dataloader. (#3239)
Elastic: Improved handling of NCCL errors in elastic scenarios. (#3112)
Spark/Lightning: Do not overwrite model with checkpoint by default. (#3201)
Made the checkpoint name optional so that users can save to h5 format. (#3411)
Deprecated
Deprecated ElasticRayExecutor APIs in favor of the new RayExecutor API. (#3190)
Removed
Spark: Removed the h5py<3 constraint, as it is no longer needed for TensorFlow >2.5.0. (#3301)
Fixed
Elastic Spark: Fixed indices in initial task-to-task registration. (#3410)
PyTorch: Fixed GIL-related deadlock with PyTorch 1.10.1. (#3352)
PyTorch: Fixed finalization of ProcessSetTable. (#3351)
Fixed remote trainers to point to the correct shared lib path. (#3258)
Fixed imports from tensorflow.python.keras with TensorFlow 2.6.0+. (#3403)