Branch: master
Commits on Aug 14, 2019
  1. Fixed Gloo control plane to work when world size is 1 (#1302)

    tgaddair committed Aug 14, 2019
  2. Fixed progress bars for TF Keras with Gloo controller, added and updated Keras examples (#1297)

    tgaddair committed Aug 14, 2019
  3. Added validation to horovodrun to ensure MPI or Gloo are available before launching training, added additional compilation options, and updated docs for Gloo (#1300)

    tgaddair committed Aug 14, 2019
Commits on Aug 13, 2019
  1. Changed Gloo mode to attempt changing directory into the current working directory that the command was launched from, if it exists, to be consistent with MPI (#1298)

    tgaddair committed Aug 13, 2019
  2. Refactored Gloo and MPI components into separate subdirectories, reduced INFO logs to DEBUG, and fixed op order for Hierarchical Allreduce (#1294)

    tgaddair committed Aug 13, 2019
Commits on Aug 10, 2019
  1. Changed CUDA operations to claim a copy of the fusion buffer shared pointer to prevent deallocation during finalization (#1288)

    tgaddair committed Aug 10, 2019
Commits on Jul 18, 2019
  1. Added guard to namedtuple class to prevent pyspark from overwriting it and breaking serialization (#1180)

    tgaddair committed Jul 18, 2019
Commits on Jul 3, 2019
  1. Added developer docs to help onboard new contributors (#1190)

    tgaddair committed Jul 3, 2019
Commits on Apr 6, 2019
  1. Fixed issue where setting all parameters to fixed values more than once would attempt to allocate a negative size vector (#995)

    tgaddair authored and alsrgv committed Apr 6, 2019
    
    * Fixed issue where setting all parameters to fixed values more than once would attempt to allocate a negative size vector
    
    Signed-off-by: Travis Addair <taddair@uber.com>
    
    * Do not remove variables twice
    
    Signed-off-by: Travis Addair <taddair@uber.com>
    
    * Typo fix
    
    Signed-off-by: Travis Addair <taddair@uber.com>
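The fix described in this commit ("Do not remove variables twice") can be sketched in plain Python. The class and method names below are illustrative stand-ins, not Horovod's actual autotuner internals: the point is that re-fixing an already-fixed parameter must not shrink the free-parameter list a second time, which is what previously produced a negative remaining size.

```python
class ParameterTuner:
    """Hypothetical sketch of the guard in this commit (not Horovod code)."""

    def __init__(self, names):
        self.free_params = list(names)  # parameters still being tuned
        self.fixed = {}                 # name -> constant value

    def set_fixed(self, name, value):
        self.fixed[name] = value
        # Guard: only remove the variable the FIRST time it is fixed.
        # Without this check, fixing the same name twice would remove it
        # twice and leave a negative count of remaining free parameters.
        if name in self.free_params:
            self.free_params.remove(name)
```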
Commits on Mar 9, 2019
  1. Refactor operations into separate components by framework (#826)

    tgaddair committed Mar 9, 2019
Commits on Mar 8, 2019
  1. Fixed parameter manager to use the previously set value for Bayesian parameters when changing a free parameter to a constant (#888)

    tgaddair committed Mar 8, 2019
Commits on Feb 20, 2019
  1. Fixed PyTorch with CUDA to use new DataType enum (#842)

    tgaddair committed Feb 20, 2019
    Signed-off-by: Travis Addair <taddair@uber.com>
Commits on Feb 6, 2019
  1. Improved autotuning scoring and parameter search process (#813)

    tgaddair authored and alsrgv committed Feb 6, 2019
Commits on Jan 10, 2019
  1. Fixed auto-tuning process to correctly bubble up parameters (#744)

    tgaddair committed Jan 10, 2019
Commits on Dec 14, 2018
  1. Added parallelization of allreduce during eager execution and synthetic benchmark for TensorFlow (#704)

    tgaddair committed Dec 14, 2018
    
    Added parallelization of allreduce during eager execution and synthetic benchmark for TensorFlow
    
    Benchmark script is an alternative to running tf_cnn_benchmarks.py.
    
    On 1 GPU, we observed 189 im/s with this benchmark vs 195 im/s with tf_cnn_benchmarks. Similarly, with 16 GPUs across 8 workers, we observed 135 im/s vs 140 im/s.
    
    Running with eager execution reduced images per second to 60% of graph mode. Before this change to parallelize Horovod ops, eager execution ran at only 6% of the im/s of graph mode.
Commits on Dec 11, 2018
  1. Automatic parameter tuning for hierarchical allreduce, fusion buffer, and cycle time (#615)

    tgaddair committed Dec 11, 2018
Commits on Nov 25, 2018
  1. Moved Keras implementation into private module (#650)

    tgaddair committed Nov 25, 2018
    * Moved Keras implementation into private module to avoid importing standalone keras when using tf.keras
    
    * Fixed unit tests
Commits on Oct 15, 2018
  1. Broadcast optimizer options in addition to parameter state (#562)

    tgaddair committed Oct 15, 2018
    * Broadcast optimizer options in addition to parameter state
    
    * Added comment
    
    * Added tests for all the optimizer subclasses
    
    * Added comment
Commits on Oct 8, 2018
  1. Fixed broadcast_optimizer_state for stateless optimizers (#548)

    tgaddair authored and alsrgv committed Oct 8, 2018
Commits on Sep 29, 2018
  1. Added fp16 to pytorch_imagenet_resnet50 example (#535)

    tgaddair committed Sep 29, 2018
Commits on Sep 28, 2018
  1. FP16 support for GPU tensors in all frameworks (#529)

    tgaddair committed Sep 28, 2018
    * Initial support for FP16
    
    Bump version to a dev release
    
    Cast vars to fp16 before allreduce to compress gradients
    
    Abstracted compression algorithm into a class hierarchy and added algorithm flag to optimizer and allreduce signatures
    
    Changed compressor to set the dtype on initialization
    
    Resolved conflicts
    
    Additional conflicts
    
    Formatting
    
    More formats
    
    Updated license
    
    Added fp16 compression for Keras
    
    Added arguments to keras examples
    
    Fixed imports
    
    * Added compression to tf.keras
    
    * Added PyTorch compression API
    
    Added unit tests
    
    Whitespace
    
    * Added C interfaces and types
    
    * Forward declare
    
    * Removed Half from older versions of PyTorch
    
    * Added error for old version of PyTorch
    
    * Removed reference to float16
    
    * Updated examples, added compression to the Keras model load
    
    * Cleaned imports
    
    * Removed dependency on enums
    
    * Updated unit tests
    
    * Test compatibility fix
    
    * Reverted version updates
    
    * Fixed message
    
    * Removed imports
    
    * Added cuda.HalfTensor to all PyTorch tests with CUDA
    
    * Only compare versions once
    
    * Renamed --fp16 in examples to --fp16-allreduce for clarity
    
    * Replaced assignment with set_
    
    * Modified compression algorithms to be stateless with optional context parameters
    
    * Removed optional ctx parameter
    
    * Replaced 0.4.2 with 1.0.0
    
    * Only run GPU tests with HalfTensors if fp16 is supported
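The "class hierarchy" of stateless compressors this commit describes can be sketched with the standard library alone. This is a hedged illustration of the design, not Horovod's implementation (the real API is `horovod.*.Compression.fp16` passed as the `compression` argument of `DistributedOptimizer`); `struct`'s `'e'` format code stands in for a framework's float16 cast.

```python
import struct

class Compressor:
    """Base class: no-op compression. compress returns the (possibly
    smaller) payload plus a context object that decompress needs."""
    @staticmethod
    def compress(values):
        return values, None

    @staticmethod
    def decompress(values, ctx):
        return values

class FP16Compressor(Compressor):
    """Cast float32 gradients to float16 before the allreduce, halving
    wire traffic, then cast back after reduction."""
    @staticmethod
    def compress(values):
        # 'e' is the IEEE 754 half-precision (float16) format code.
        packed = struct.pack(f"{len(values)}e", *values)
        return packed, len(values)

    @staticmethod
    def decompress(packed, ctx):
        return list(struct.unpack(f"{ctx}e", packed))
```

Keeping the compressors stateless (static methods plus an explicit context value, as the later "stateless with optional context parameters" commit notes) lets one compressor object be shared safely across all tensors.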
Commits on Sep 26, 2018
  1. Added tf.keras support (#513)

    tgaddair committed Sep 26, 2018
    * Added support for tf.keras
    
    * Added unit tests
    
    * Refactoring
    
    * Fixed tests
    
    * Hide implementation modules
    
    * Moved _DistributedOptimizer into the impl file and wrapped with function
    
    * Added cooperative multiple inheritance
    
    * Backwards compatibility with TensorFlow versions less than 1.4.0
    
    * Removed duplicate headers
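The "wrapped with function" and "cooperative multiple inheritance" items above describe a pattern that can be sketched without TensorFlow: a factory builds a new class inheriting from both a private mixin and the user's optimizer class, so `isinstance` checks against the original optimizer type still pass. All names here are illustrative, not Horovod's actual code, and the division stands in for an allreduce average.

```python
class _DistributedMixin:
    """Private mixin: intercepts apply_gradients, then defers to the
    wrapped optimizer class via cooperative super()."""
    def apply_gradients(self, grads):
        # Stand-in for allreduce averaging across workers.
        reduced = [g / self._world_size for g in grads]
        return super().apply_gradients(reduced)

class SGD:
    """Toy stand-in for a tf.keras optimizer."""
    def apply_gradients(self, grads):
        return grads

def DistributedOptimizer(optimizer, world_size=1):
    """Factory: build a subclass of the *user's* optimizer class with the
    mixin first in the MRO, then copy the instance state over."""
    cls = type(optimizer.__class__.__name__,
               (_DistributedMixin, optimizer.__class__), {})
    obj = cls.__new__(cls)
    obj.__dict__.update(optimizer.__dict__)
    obj._world_size = world_size
    return obj
```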
Commits on Sep 14, 2018
  1. Fixed issue with dynamically linking PyTorch module on Mac OSX (#494)

    tgaddair committed Sep 14, 2018
    * Fixed issue with dynamically linking PyTorch module on Mac OSX due to hidden symbol
    
    * Changed to kleene star matching for consistency
Commits on Aug 2, 2018
  1. Added command line args to Keras ImageNet examples (#419)

    tgaddair committed Aug 2, 2018
    * Added command line args to Keras ImageNet example in line with those in the PyTorch examples
    
    * Addressed comments
    
    * Fixed checkpoint formatting
Commits on Jul 13, 2018
  1. Added custom load_model function to wrap the model optimizer with a Horovod DistributedOptimizer (#359)

    tgaddair committed Jul 13, 2018
    
    * Added custom load_model function to wrap the model optimizer with a Horovod DistributedOptimizer
    
    * Added Keras tests
    
    * Removed imports
    
    * Fixed license
    
    * Updated imagenet example
    
    * Fixed unit tests and API for compatibility with Keras 2.0.0 and TensorFlow 1.1.0
    
    * Added guarded import of Keras to avoid race conditions
    
    * Reverted formatting changes to example and only load model on rank 0 node
    
    * Added additional unit tests for optional parameters, fixed issue with key when using custom_optimizers
    
    * Added broadcast tests
    
    * Updated example
    
    * Clear session between tests to reset variables
    
    * Updated comment
    
    * Execute all Keras unit tests in a custom TensorFlow session
    
    * Added save_model for parity with PyTorch API
    
    * Added assertions
    
    * Revert "Added save_model for parity with PyTorch API"
    
    This reverts commit e6381f0.
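The idea behind this commit can be sketched generically: Keras deserializes a saved model's optimizer by class name through a `custom_objects` mapping, so a custom load function can register a factory per optimizer class that wraps the restored optimizer on the way in. The helper and toy classes below are hypothetical (the real entry point is `horovod.keras.load_model`).

```python
def horovod_custom_objects(optimizer_classes, wrap):
    """Build a Keras-style custom_objects dict mapping each optimizer
    class name to a factory that wraps the deserialized optimizer.
    Sketch only; not Horovod's implementation."""
    def make_factory(cls):
        def factory(**config):
            return wrap(cls(**config))
        return factory
    return {cls.__name__: make_factory(cls) for cls in optimizer_classes}

class SGD:
    """Toy stand-in for a Keras optimizer."""
    def __init__(self, lr=0.01):
        self.lr = lr

class Wrapped:
    """Toy stand-in for DistributedOptimizer."""
    def __init__(self, inner):
        self.inner = inner
```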
  2. Added PyTorch support for restoring optimizer state on model load and broadcast (#371)

    tgaddair committed Jul 13, 2018
    
    * Added PyTorch support for restoring optimizer state on model load
    
    * Fixed examples, cleaned up API
    
    * Imports
    
    * Just in time variable declaration
    
    * Updated API, fixed test
    
    * Replaced iteritems() with items() for Python 3
    
    * Python 3 compatibility issue
    
    * Added test for custom_state
    
    * Added comments, cache state_dict
    
    * Test not saving and loading the optimizer
    
    * Added broadcast_object function and changed optimizer state broadcast to use this instead
    
    * Fixed PyTorch 0.3.0 compatibility and updated the docs
    
    * Removed save_model and load_model
    
    * Removed check
    
    * Init tensor from size, removed extra whitespace
Commits on Jul 3, 2018
  1. Made synchronous pure functional Horovod ops differentiable in PyTorch (#338)

    tgaddair committed Jul 3, 2018
    
    * Made synchronous pure functional Horovod ops differentiable in PyTorch and added unit tests to verify gradient correctness
    
    * Added comment
    
    * Fixed backward compatibility with PyTorch version 0.3.0
    
    * Fixed example code to use variable api
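Why a summation allreduce is differentiable can be shown in a few lines of pure Python (no Horovod or PyTorch; the real implementation registers a backward pass on the autograd graph). Since the forward op is an elementwise sum across workers and d(sum)/d(x_i) = 1 for every input, the backward pass of an allreduce is simply another allreduce of the upstream gradients.

```python
def allreduce(values_per_worker):
    """Forward: every worker receives the elementwise sum of all
    workers' values. Toy sketch over plain lists."""
    return [sum(vals) for vals in zip(*values_per_worker)]

def allreduce_backward(grads_per_worker):
    """Backward: each worker's gradient w.r.t. its input is the
    allreduced upstream gradient, because each input contributes
    with coefficient 1 to every worker's output."""
    return allreduce(grads_per_worker)
```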