# Distributed

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lukeconibear/intro_ml/blob/main/docs/04_distributed.ipynb)

```python
# if you're using colab, then install the required modules
import sys

IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    pass
```

This section gives examples of how to distribute deep learning on a High Performance Computer (HPC).

## Install Python environments

First, install the Python environment for the HPC you will be using.

## ARC4

### Miniconda installer
```bash
# download miniconda (x86_64 for ARC4)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# run the miniconda installer, read the terms, and set the path
bash Miniconda3-latest-Linux-x86_64.sh
```

### Conda environment

#### Clone pre-created environments

```bash
# clone - tensorflow 2.7.0 and ray
conda env create --file tf_ray_arc4.yml

# clone - pytorch 1.10 and ray
module load gnu/8.3.0
conda env create --file pytorch_ray_arc4.yml
```

#### Create your own

```bash
# create new - tensorflow 2.7.0 and ray
conda create -n tf_ray_arc4 -c conda-forge python=3.9 cudatoolkit=11.2 cudnn=8.1
conda activate tf_ray_arc4
pip install -U pip
pip install tensorflow==2.7.0
pip install -U ray
pip install -U "ray[tune]"

# create new - pytorch (1.10) and ray
module load gnu/8.3.0
conda create -n pytorch_ray_arc4 pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda activate pytorch_ray_arc4
pip install -U ray
pip install -U "ray[tune]"

# create new - jax
conda create -n jax python=3.8 cudatoolkit=11.2 cudatoolkit-dev=11.2 cudnn=8.2
conda activate jax
pip install -U jax
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
```

## Bede

### Miniconda installer
```bash
# Replace <project> with your project code
export DIR=/nobackup/projects/<project>/$USER

# download miniconda (ppc64le for Bede's hardware, not x86_64 as for ARC4)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-ppc64le.sh
 
# run miniconda
sh Miniconda3-latest-Linux-ppc64le.sh -b -p $DIR/miniconda
source $DIR/miniconda/bin/activate
 
# update conda and set channels
conda update conda -y
conda config --prepend channels conda-forge
conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
conda config --prepend channels https://opence.mit.edu
```

After these commands, `~/.condarc` should look like this:
```yaml
channel_priority: flexible
channels:
  - https://opence.mit.edu
  - https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
  - conda-forge
  - defaults
```

### Conda environment

#### Clone pre-created environments

```bash
# clone - tensorflow 2.7.0 and ray
conda env create --file tf_bede.yml

# clone - pytorch 1.10 and ray
conda env create --file pytorch_bede.yml

# clone - pytorch 1.9.0, cuda 10.2, and pytorch_geometric 2.0.3
module load gcc # required by some of the libraries
conda env create --file pytorch_geometric_bede.yml
```

#### Create your own

```bash
# create an environment for pytorch
conda create -n pytorch pytorch torchvision cudatoolkit=10.2
 
# create an environment for tensorflow
conda create -n tf tensorflow

# create an environment for pytorch geometric
module load gcc
conda create -n pytorch_geometric pytorch cudatoolkit=10.2
conda activate pytorch_geometric

pip install torch-scatter
pip install torch-sparse
pip install torch-geometric
pip install torch-cluster
```

## JADE-2

### Miniconda installer

```bash
...
```

### Conda environment

#### Clone pre-created environments

```bash
...
```

#### Create your own

```bash
...
```

## Jupyter Notebook to HPC

It's preferable to run the work as a static job on the HPC. To do this, test out different ideas locally in a Jupyter Notebook, then, when ready, convert the notebook to an executable script (`.py`) and copy it over to the HPC.
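The usual tool for the conversion is `jupyter nbconvert --to script`. To show what the conversion amounts to, here is a minimal standard-library sketch that assumes the notebook is in the standard `.ipynb` JSON layout. The function name `notebook_to_script` and the demo notebook are made up for illustration, and the sketch simply drops IPython magics (`%...`) and shell escapes (`!...`) because they are not valid Python in a script.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Join the code cells of a notebook (given as a JSON string) into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", [])
        # the cell source is stored as either one string or a list of lines
        text = source if isinstance(source, str) else "".join(source)
        # drop IPython magics and shell escapes
        lines = [
            line
            for line in text.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        if lines:
            chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


# a tiny in-memory notebook, for demonstration only
demo = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title"]},
            {"cell_type": "code", "source": ["%matplotlib inline\n", "x = 1\n"]},
            {"cell_type": "code", "source": "print(x + 1)"},
        ]
    }
)
script = notebook_to_script(demo)
print(script)
```

For a real notebook, read the `.ipynb` file into a string, pass it to the function, and write the result to a `.py` file.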

...

## Examples

These examples use [Ray Train](https://docs.ray.io/en/latest/train/train.html) in a static job on an HPC.
Ray handles most of the complexity of distributing the work and requires only minimal changes to your [TensorFlow](https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras) or [PyTorch](https://pytorch.org/tutorials/beginner/dist_overview.html) code.

- Python script examples:
  - TensorFlow
    - MNIST end-to-end: [`tensorflow_mnist_example.py`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/tensorflow_mnist_example.py).  
    - MNIST tuning: [`tensorflow_tune_mnist_example.py`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/tensorflow_tune_mnist_example.py).  
    - Train linear model with Ray Datasets: [`tensorflow_linear_dataset_example.py`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/tensorflow_linear_dataset_example.py).  
  - PyTorch
    - Linear: [`pytorch_train_linear_example.py`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/pytorch_train_linear_example.py).  
    - Fashion MNIST: [`pytorch_train_fashion_mnist_example.py`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/pytorch_train_fashion_mnist_example.py).  
    - HuggingFace Transformer: [`pytorch_transformers_example.py`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/pytorch_transformers_example.py).  
    - Tune linear model with Ray Datasets: [`pytorch_tune_linear_dataset_example.py`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/pytorch_tune_linear_dataset_example.py).  
- Then submit the job to the HPC (choose one submission script and update the Python script named within it):
  - [ARC4](https://arcdocs.leeds.ac.uk/systems/arc4.html) (SGE)  
    - CPU: [`ray_train_on_arc4_cpu.bash`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/ray_train_on_arc4_cpu.bash).  
    - GPU: [`ray_train_on_arc4_gpu.bash`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/ray_train_on_arc4_gpu.bash).  
  - [Bede](https://bede-documentation.readthedocs.io/en/latest/) (SLURM)
    - GPU: [`ray_train_on_bede.bash`](https://github.com/lukeconibear/intro_ml/blob/main/distributed/ray_train_on_bede.bash).  
  - [JADE-2](http://docs.jade.ac.uk/en/latest/index.html) (SLURM)
    - GPU: ...



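The [Keras guide to distributed training](https://keras.io/guides/distributed_training/) describes synchronous data-parallel training on all available GPUs:

```python
import tensorflow as tf

# build_model, optimizer, loss, dataset, epochs, and callbacks are placeholders for your own
distribution_strategy = tf.distribute.MirroredStrategy()
with distribution_strategy.scope():
    # Everything that creates variables should be under the strategy scope.
    # In general this is only model construction and compile().
    model = build_model()
    model.compile(optimizer=optimizer, loss=loss)

# fit() can be called outside the scope
model.fit(dataset, epochs=epochs, callbacks=callbacks)
```

Only the code that creates variables (model construction and `compile()`) needs to be inside the scope. The `model.fit` call can be made outside it, because the model keeps a reference to the strategy it was created under.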
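To make "synchronous data-parallel" concrete, here is a toy sketch in plain Python. It is not TensorFlow and not how `MirroredStrategy` is implemented. Each hypothetical worker holds a copy of the single weight `w`, computes a gradient on its own shard of the data, the gradients are averaged (the all-reduce step), and every worker applies the same update. The model `y = w * x`, the data, and the function names are all made up for illustration.

```python
def gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one shard of data."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def sync_data_parallel_step(w, shards, learning_rate):
    """One synchronous step: local gradients, then average, then a shared update."""
    # each worker computes a gradient on its own shard (in parallel on real hardware)
    local_gradients = [gradient(w, shard) for shard in shards]
    # all-reduce: every worker ends up with the same averaged gradient
    averaged = sum(local_gradients) / len(local_gradients)
    # every worker applies the same update, so the copies of w stay identical
    return w - learning_rate * averaged


# toy data generated from y = 3x, split into equal-sized shards for two workers
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0::2], data[1::2]]

w = 0.0
for _ in range(200):
    w = sync_data_parallel_step(w, shards, learning_rate=0.01)

print(f"learned w = {w:.6f}")
```

With equal-sized shards, the averaged gradient is the same as the gradient over the full batch, which is why synchronous data parallelism behaves like single-device training with a larger batch.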

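In the ARC4 (SGE) job script, use `#$ -cwd` rather than `#$ -cwd -V`. Without `-V`, the environment is not copied over from the terminal you submit from, so the job submission script has to specify the environment itself, which keeps the job reproducible.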
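Because the job no longer inherits the terminal's environment, it can help to log which Python environment the job actually ran in. The following is a minimal sketch that could go at the top of the training script; the function name `describe_environment` is hypothetical, and it assumes the environment was activated with `conda activate`, which sets the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys


def describe_environment():
    """Collect details that identify which Python environment is running."""
    return {
        "python_executable": sys.executable,
        "python_version": sys.version.split()[0],
        # set by `conda activate`; absent if no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "<none>"),
    }


# print the details so that they end up in the job's output log
for key, value in describe_environment().items():
    print(f"{key}: {value}")
```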

## Exercises

```{admonition} Exercise 1

...

```

## {ref}`Solutions <distributed>`

## Key Points

```{important}

- [x] _..._

```

## Further information

### Good practices

- ...

### Other options

- [Horovod](https://horovod.ai/)
- [DeepSpeed](https://www.deepspeed.ai/)
 
### Resources

- ...