Branch: master
Commits on Mar 19, 2019
  1. Bugfix Dockerfile (#933)

    alsrgv committed Mar 19, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
  2. Bump Horovod version to 0.16.1 (#928)

    alsrgv committed Mar 19, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
  3. horovodrun: use os.execve() for mpirun (#931)

    alsrgv committed Mar 19, 2019
    safe_shell_exec uses line buffering, which does not play well with
    Keras and tqdm, libraries that make use of progress bars.
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
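
The commit above replaces a buffered wrapper with a direct `os.execve()` call for `mpirun`. The sketch below shows the general technique only and is not Horovod's actual code: the helper names `resolve_exec_args` and `exec_replace` are hypothetical. Because the launcher process is replaced outright, the child inherits the terminal file descriptors directly and no intermediate layer can re-buffer its output.

```python
import os
import shutil
import sys


def resolve_exec_args(argv, env=None):
    """Return (path, argv, env) suitable for os.execve().

    Hypothetical helper: os.execve() needs the full path of the
    executable, so the program name is resolved against PATH first.
    """
    env = dict(os.environ if env is None else env)
    path = shutil.which(argv[0], path=env.get("PATH"))
    if path is None:
        raise FileNotFoundError(argv[0])
    return path, list(argv), env


def exec_replace(argv, env=None):
    """Replace the current process with argv; does not return on success."""
    path, argv, env = resolve_exec_args(argv, env)
    # Flush anything still buffered by Python; it would be lost after exec.
    sys.stdout.flush()
    sys.stderr.flush()
    os.execve(path, argv, env)


# Example (commented out because it would replace the running interpreter):
# exec_replace(["mpirun", "-np", "4", "python", "train.py"])
```

The trade-off of this approach is that no Python code runs after the call, so any cleanup or exit-code handling has to happen before it or be left to the executed program.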
  4. Bugfix module name in horovodrun (#929)

    alsrgv committed Mar 19, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
Commits on Mar 18, 2019
  1. Adopt horovodrun (#924)

    alsrgv committed Mar 18, 2019
    * Adopt horovodrun
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
    
    * Update docs
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
    
    * Add newline to split paragraphs
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
Commits on Mar 15, 2019
  1. Switch safe_shell_exec() from byte stdout/stderr to text stdout/stderr (#917)

    alsrgv committed Mar 15, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
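
As a rough illustration of the byte-versus-text distinction named in the commit title, the sketch below captures a child's output with Python's standard `subprocess` module. It is not the code of `safe_shell_exec()`; the function name `run_capture` and its `text` parameter are hypothetical.

```python
import subprocess
import sys


def run_capture(argv, text=True):
    """Run argv and return (exit code, stdout, stderr).

    With text=True the pipes are opened in text mode, so the results are
    str objects; with text=False they are raw bytes.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=text,  # text-mode pipes when True
    )
    out, err = proc.communicate()
    return proc.returncode, out, err
```

In text mode the caller can print or compare the output as ordinary strings without an explicit `decode()` step.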
  2. Use os.setsid() instead of os.setpgid() (#916)

    alsrgv committed Mar 15, 2019
    We've encountered an issue with launching ssh inside safe_shell_exec.
    OpenSSH makes use of tcsetattr() to set terminal properties, which gets
    propagated to the whole process group.  setsid() provides better
    isolation of the newly spawned process.
    
    Additional reading: https://en.wikipedia.org/wiki/Process_group
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
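
The isolation described above can be observed with a small POSIX-only sketch. It is an illustration under stated assumptions, not Horovod's implementation: it uses `subprocess.run(..., start_new_session=True)`, which calls `setsid()` in the child before exec, and the names `spawn_isolated` and `child_ids` are hypothetical.

```python
import os
import subprocess
import sys

# The child reports its own process id, session id and process group id.
CHILD_CODE = "import os; print(os.getpid(), os.getsid(0), os.getpgrp())"


def spawn_isolated(argv):
    """Run argv in a new session (setsid() is called in the child)."""
    return subprocess.run(
        argv,
        start_new_session=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )


def child_ids():
    """Return (pid, sid, pgid) as reported by an isolated child."""
    out = spawn_isolated([sys.executable, "-c", CHILD_CODE]).stdout
    pid, sid, pgid = (int(field) for field in out.split())
    return pid, sid, pgid
```

After `setsid()` the child leads both a new session and a new process group, so its session id and process group id equal its own pid and differ from the parent's. With `setpgid()` alone, the child would get a new process group but remain in the parent's session.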
Commits on Mar 13, 2019
  1. Add HOROVOD_CUDA_HOME documentation (#911)

    alsrgv committed Mar 13, 2019
    * Add HOROVOD_CUDA_HOME documentation
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
    
    * Add `HOROVOD_CUDA_INCLUDE` and `HOROVOD_CUDA_LIB`
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
    
    * Copyedits
    
    Signed-off-by: Alex Sergeev <alexander.sergeev@live.com>
Commits on Mar 2, 2019
  1. Add PyPI download statistics (#873)

    alsrgv committed Mar 2, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
Commits on Feb 27, 2019
  1. Add issue templates (#865)

    alsrgv committed Feb 27, 2019
Commits on Feb 22, 2019
  1. Bugfix PyTorch ImageNet example (#853)

    alsrgv committed Feb 22, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
Commits on Feb 21, 2019
  1. Make tensorflow_mnist_eager.py Python2-friendly (#846)

    alsrgv committed Feb 21, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
  2. Docker: add mxnet to build tag, switch to horovod/horovod (#845)

    alsrgv committed Feb 21, 2019
    * Docker: add mxnet to build tag, switch to horovod/horovod
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
    
    * Update docs as well
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
    
    * Add HOROVOD_WITH_MXNET=1
    
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
Commits on Feb 20, 2019
  1. Bump version to 0.16.0 (#838)

    alsrgv committed Feb 20, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
Commits on Feb 19, 2019
  1. Add MXNet 1.4.0 to Dockerfile (#840)

    alsrgv committed Feb 19, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
  2. Update Dockerfile NCCL to 2.4.2 (#837)

    alsrgv committed Feb 19, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
  3. Switch Travis CI URL (#835)

    alsrgv committed Feb 19, 2019
    Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
Commits on Feb 4, 2019
  1. Align broadcast_variables() API with broadcast() (#807)

    alsrgv committed Feb 4, 2019
Commits on Feb 2, 2019
  1. Add a link to LF DL (#805)

    alsrgv committed Feb 2, 2019
Commits on Jan 30, 2019
  1. Fix Python 3.6 by migrating to Ubuntu (#795)

    alsrgv committed Jan 30, 2019
    * Fix Python 3.6 by migrating to Ubuntu
    
    * Fix python3-distutils
    
    * Removing Ubuntu 14.04 as it does not support OpenJDK 8
    
    * Add pandas to Keras dependencies
    
    * Install typing for PyTorch
Commits on Jan 26, 2019
  1. Optimize Horovod Timeline and add Cycle Markers (#782)

    alsrgv committed Jan 26, 2019
Commits on Jan 17, 2019
Commits on Jan 16, 2019
  1. Demonstrate proper model parallelism in PyTorch (#768)

    alsrgv committed Jan 16, 2019
Commits on Jan 15, 2019
  1. Make duplicate name test more robust (#762)

    alsrgv committed Jan 15, 2019
    * Make duplicate name test more robust
    
    * Resolve ~/.keras issue as well
  2. Fix CI: use per-rank dataset file for TF eager (#756)

    alsrgv committed Jan 15, 2019
  3. Improve coverage: add tests for hvd.*_async failure when multiple tensors have the same name (#755)

    alsrgv committed Jan 15, 2019
Commits on Jan 14, 2019
  1. PyTorch MNIST example improvements (#754)

    alsrgv committed Jan 14, 2019
  2. Make PyTorch MNIST example more documented (#753)

    alsrgv committed Jan 14, 2019
  3. Filter warnings to avoid Travis CI failure (#751)

    alsrgv committed Jan 14, 2019
Commits on Jan 9, 2019
  1. Docker: Allow downgrades to make build deterministic (#741)

    alsrgv committed Jan 9, 2019
Commits on Jan 4, 2019
  1. Have TensorFlow handle memcpy for Hierarchical Allgather (#721)

    alsrgv committed Jan 4, 2019
Commits on Jan 3, 2019
  1. Run tests independently to avoid GC race condition (#730)

    alsrgv committed Jan 3, 2019
  2. Docs: note that Open MPI 3.1.3 has an issue (#729)

    alsrgv committed Jan 3, 2019
Commits on Dec 31, 2018
  1. Horovod in PySpark (#606)

    alsrgv committed Dec 31, 2018