
Horovod with PyTorch

Horovod supports PyTorch and TensorFlow in similar ways.

Example (also see a full training example):

import torch
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data.distributed
import horovod.torch as hvd

# Initialize Horovod
hvd.init()

# Pin GPU to be used to process local rank (one GPU per process)
torch.cuda.set_device(hvd.local_rank())

# Define dataset...
train_dataset = ...

# Partition dataset among workers using DistributedSampler
train_sampler = torch.utils.data.distributed.DistributedSampler(
    train_dataset, num_replicas=hvd.size(), rank=hvd.rank())

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=..., sampler=train_sampler)

# Build model...
model = ...

optimizer = optim.SGD(model.parameters())

# Add Horovod Distributed Optimizer
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Broadcast parameters from rank 0 to all other processes.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

for epoch in range(100):
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{}]\tLoss: {}'.format(
                epoch, batch_idx * len(data), len(train_sampler), loss.item()))
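To see what DistributedSampler contributes here, the following is a minimal pure-Python sketch (no torch required) of the strided partitioning it performs with shuffling disabled: the index list is padded so every worker receives the same number of samples, then each rank takes every `num_replicas`-th index starting at its own rank. The `partition` helper is hypothetical, written only to illustrate the scheme.

```python
import math

def partition(dataset_len, num_replicas, rank):
    # Sketch of DistributedSampler's split (shuffle disabled):
    # pad so every worker gets an equal share, then stride by rank.
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    indices += indices[: total_size - dataset_len]  # pad by reusing leading indices
    return indices[rank:total_size:num_replicas]

# A dataset of 10 samples split across 4 workers:
for rank in range(4):
    print(rank, partition(10, 4, rank))
```

Rank 0 receives indices [0, 4, 8], rank 1 receives [1, 5, 9], and so on; every sample is covered, and each worker iterates the same number of batches per epoch, which keeps the workers in lockstep.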


PyTorch support requires NCCL 2.2 or later. It also works with NCCL 2.1.15 if you are not using RoCE or InfiniBand.
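For intuition about what hvd.DistributedOptimizer adds: it averages each gradient tensor across all workers (via allreduce) before the wrapped optimizer applies its update. Below is a minimal pure-Python sketch of that averaging step; `allreduce_average` is a hypothetical stand-in for illustration, not the Horovod API.

```python
def allreduce_average(grads_per_worker):
    # Average one parameter's gradient across all workers,
    # element-wise, as an allreduce-with-average would.
    n = len(grads_per_worker)
    return [sum(col) / n for col in zip(*grads_per_worker)]

# Two workers computed different gradients for the same parameter:
avg = allreduce_average([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
print(avg)  # → [2.0, 3.0, 4.0]
```

Because every worker applies the same averaged gradient to identically initialized parameters (the hvd.broadcast_parameters call above establishes that), the models stay synchronized without a central parameter server.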
