# Ray SGD - A Library for Distributed Deep Learning

© 2019-2020, Anyscale. All Rights Reserved

![Anyscale Academy](../images/AnyscaleAcademy_Logo_clearbanner_141x100.png)

[Ray SGD](https://docs.ray.io/en/latest/raysgd/raysgd.html) is a lightweight library for distributed deep learning. It provides thin wrappers around [PyTorch](https://pytorch.org) and [TensorFlow](https://tensorflow.org) native modules for data parallel training.

## About Ray SGD

The main features of Ray SGD are:

* **Ease of use:** You can scale PyTorch’s native `DistributedDataParallel` and TensorFlow’s `tf.distribute.MirroredStrategy` without having to monitor individual nodes yourself.
* **Composability:** Ray SGD is built on top of the Ray Actor API, enabling seamless integration with existing Ray applications such as RLlib, Tune, and Serve.
* **Scale up and down:** You can start on a single CPU, then scale up to multi-node, multi-CPU, or multi-GPU clusters when needed. All it takes is changing two lines of code.
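
To make "data parallel training" concrete, here is a minimal pure-Python sketch of one gradient step. It is not Ray SGD or PyTorch code. The function names `mse_gradient` and `data_parallel_gradient` are hypothetical, and the averaging step only stands in for the all-reduce that a real distributed backend would perform. It assumes the batch divides evenly among the workers.

```python
# One data-parallel gradient computation for the model y_hat = w * x + b
# with mean squared error loss. Illustration only; all names are hypothetical.

def mse_gradient(w, b, batch):
    """Return (dL/dw, dL/db) of the mean squared error over `batch`."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return grad_w, grad_b

def data_parallel_gradient(w, b, batch, num_workers):
    """Split `batch` into equal shards, compute one gradient per shard, then average."""
    shard_size = len(batch) // num_workers
    assert shard_size * num_workers == len(batch), "batch must divide evenly"
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    # In a real system each shard's gradient is computed on a different worker.
    grads = [mse_gradient(w, b, shard) for shard in shards]
    # Averaging stands in for the all-reduce across workers.
    avg_w = sum(g[0] for g in grads) / num_workers
    avg_b = sum(g[1] for g in grads) / num_workers
    return avg_w, avg_b

# Toy data on the line y = 2x + 5.
batch = [(x / 8, 2 * (x / 8) + 5) for x in range(8)]
single = mse_gradient(0.0, 0.0, batch)
parallel = data_parallel_gradient(0.0, 0.0, batch, num_workers=4)
print(single, parallel)
```

With equal-sized shards, the averaged gradient matches the gradient a single worker would compute over the whole batch, which is why adding workers changes throughput rather than the update itself.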

This [Ray blog post](https://medium.com/distributed-computing-with-ray/faster-and-cheaper-pytorch-with-raysgd-a5a44d4fd220) explains the motivations for Ray SGD in more detail, including the many manual steps that distributed training requires without it and how Ray SGD removes them.

## Example - Distributed Training for PyTorch 

This example is adapted from the [Ray SGD documentation](https://docs.ray.io/en/latest/raysgd/raysgd.html).

First, we initialize Ray and do the necessary imports, as before.

In [None]:
!../tools/start-ray.sh --check --verbose

In [None]:
import ray
from ray.util.sgd import TorchTrainer
from ray.util.sgd.torch.examples.train_example import LinearDataset

import torch
from torch.utils.data import DataLoader

In [None]:
ray.init(address='auto', ignore_reinit_error=True)

Now define the creator functions that `TorchTrainer` calls on each worker to build the model, the optimizer, and the data loaders.

In [None]:
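# Create a torch neural network: a single linear layer with one input and one output.
def model_creator(config):
    """Returns the model to train."""
    return torch.nn.Linear(1, 1)

# Create an optimizer:
def optimizer_creator(model, config):
    """Returns a plain SGD optimizer over the model's parameters."""
    return torch.optim.SGD(model.parameters(), lr=1e-2)

# Create data:
def data_creator(config):
    """Returns the training and validation data loaders."""
    # The batch size comes from the `config` dictionary passed to the trainer.
    train_loader = DataLoader(LinearDataset(2, 5), batch_size=config["batch_size"])
    val_loader = DataLoader(LinearDataset(2, 5), batch_size=config["batch_size"])
    return train_loader, val_loader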
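
To get a feel for what the data loaders produce, here is a minimal pure-Python sketch of the batching. It rests on two assumptions: that `LinearDataset(a, b)` supplies points on the line `y = a*x + b`, and that it holds 1000 samples. The helpers `make_linear_data` and `iter_batches` are hypothetical stand-ins, not part of Ray or PyTorch.

```python
def make_linear_data(a, b, size):
    """Hypothetical stand-in for LinearDataset: `size` points on y = a*x + b."""
    xs = [10 * i / size for i in range(size)]
    return [(x, a * x + b) for x in xs]

def iter_batches(dataset, batch_size):
    """Yield consecutive batches; the last one may be smaller.

    This mirrors DataLoader's defaults (shuffle=False, drop_last=False).
    """
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

dataset = make_linear_data(2, 5, size=1000)  # size=1000 is an assumption
batches = list(iter_batches(dataset, batch_size=64))
batch_sizes = [len(batch) for batch in batches]
print(len(batches), batch_sizes[0], batch_sizes[-1])  # prints: 16 64 40
```

Under these assumptions, one epoch with `batch_size=64` consists of 16 batches, the last of which holds the remaining 40 samples.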

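Define a trainer. We pass in the creator functions, use `torch.nn.MSELoss` as the loss, disable GPUs, and set the batch size through `config`.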

In [None]:
trainer = TorchTrainer(
    model_creator=model_creator,
    data_creator=data_creator,
    optimizer_creator=optimizer_creator,
    loss_creator=torch.nn.MSELoss,
    use_gpu=False,
    config={"batch_size": 64})

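We won't run a full training process, but the following cell shows the core steps. Each call to `trainer.train()` runs one epoch and returns a dictionary of statistics, and `trainer.shutdown()` releases the workers when we're done.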

In [None]:
for i in range(10):
    stats = trainer.train()
    print(f'{i:2d}: {stats}')
trainer.shutdown()
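
For intuition about what one epoch involves, here is a minimal pure-Python sketch of the same kind of loop on a toy problem. It is not how `TorchTrainer` is implemented. The function `train_one_epoch`, the statistics keys, the toy dataset, and the learning rate of 0.1 are all assumptions chosen for illustration.

```python
def train_one_epoch(w, b, batches, lr):
    """Run one pass of plain SGD on MSE for y_hat = w * x + b.

    Returns the updated parameters and a stats dictionary (hypothetical keys).
    """
    total_loss, num_samples = 0.0, 0
    for batch in batches:
        n = len(batch)
        errors = [(w * x + b - y, x) for x, y in batch]
        total_loss += sum(e * e for e, _ in errors)
        num_samples += n
        # Gradients of the batch's mean squared error.
        grad_w = sum(2 * e * x for e, x in errors) / n
        grad_b = sum(2 * e for e, _ in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    stats = {"train_loss": total_loss / num_samples, "num_samples": num_samples}
    return w, b, stats

# Toy data on the line y = 2x + 5, split into four batches of 16.
data = [(i / 64, 2 * (i / 64) + 5) for i in range(64)]
batches = [data[i:i + 16] for i in range(0, 64, 16)]

w, b = 0.0, 0.0
history = []
for epoch in range(10):
    w, b, stats = train_one_epoch(w, b, batches, lr=0.1)
    history.append(stats["train_loss"])
    print(f"{epoch:2d}: {stats}")
```

The printed loss should shrink from the first epoch to the last as `w` and `b` move toward 2 and 5, which is the trend to look for in the statistics printed by the real training loop.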