This repository has been archived by the owner on Jan 6, 2023. It is now read-only.

implemented torchelastic.distributed.launch for oss #65

Closed
wants to merge 1 commit into from

Commits on Mar 20, 2020

  1. implemented torchelastic.distributed.launch for oss

    Summary:
    Implements an elastic launcher similar in usage to `torch.distributed.launch`, with these additions:
    1. automatic assignment of RANK, LOCAL_RANK, and WORLD_SIZE.
    2. retries of failed workers as a group.
    3. support for membership changes between `min` and `max` sizes.
    
    Fully compatible with existing scripts that already work with `torch.distributed.launch`.
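
For illustration, a worker script could pick up the assigned values as in the sketch below. This is a minimal example, not code from this commit. It assumes the launcher exposes RANK, LOCAL_RANK, and WORLD_SIZE as environment variables of the same names. The `WorkerEnv` type, the `read_worker_env` helper, and the single-process fallback defaults are hypothetical names and choices made for this example.

```python
import os
from typing import Mapping, NamedTuple, Optional


class WorkerEnv(NamedTuple):
    """Values a launcher is assumed to assign to each worker process."""

    rank: int        # global rank of this worker across all nodes
    local_rank: int  # rank of this worker on its own node
    world_size: int  # total number of workers in the job


def read_worker_env(env: Optional[Mapping[str, str]] = None) -> WorkerEnv:
    """Read RANK, LOCAL_RANK and WORLD_SIZE from an environment mapping.

    Assumption: the variables are plain integers in the environment.
    The defaults (0, 0, 1) are a hypothetical fallback so the same script
    still runs as a single process when no launcher is involved.
    """
    env = os.environ if env is None else env
    return WorkerEnv(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )
```

A script would call `read_worker_env()` once at startup and use `local_rank` to choose its device. Reading the values at startup rather than hard-coding them matters if a worker group is restarted with a different size.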
    
    Differential Revision: D20554522
    
    fbshipit-source-id: 3ced8b5cc5ffca03413aa8d93b84f1840cb172b0
    Kiuk Chung authored and facebook-github-bot committed Mar 20, 2020
    2ddb81b