
[**Open in Colab**](https://colab.research.google.com/github/stas00/porting/blob/master/transformers/deepspeed/DeepSpeed_on_colab_CLI.ipynb)

Last modified: Thu 10 Jun 2021 04:06:56 PM

# transformers + deepspeed CLI

This notebook demonstrates how to set up `transformers` + `deepspeed` on Colab to be run as an external process.

You can, of course, use it in any notebook environment.

It's also possible to run `transformers` + `deepspeed` inside the notebook itself.

**XXX**: make another notebook with a demo that isn't CLI



## Setting up the correct environment

In order to run `transformers` with `deepspeed`, you need:
1. enough general RAM. Different users seem to get instances with different amounts of general RAM. Run `!free -h` to check, and if your process gets killed, you probably ran out of memory. If you can't get enough memory, you can turn off CPU offloading of the optimizer in `ds_config.json` below (the `offload_optimizer` section, named `cpu_offload` in older DeepSpeed versions).
2. matching CUDA versions. Your PyTorch needs to be built with exactly the same CUDA version as your system-wide CUDA installation. This isn't normally required to run PyTorch alone, but it is needed for building CUDA extensions such as DeepSpeed. You will find full documentation [here](https://huggingface.co/transformers/main_classes/trainer.html#installation-notes).
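A minimal sketch for checking requirement 2 from Python, assuming `nvcc` is on the `PATH` and PyTorch is already installed. It compares the major.minor version reported by `nvcc --version` with `torch.version.cuda`; the helper names are hypothetical, and DeepSpeed's own build-time check may be more or less strict than this comparison.

```python
import re
import subprocess


def nvcc_cuda_version(nvcc_output: str) -> str:
    """Extract "major.minor" from `nvcc --version` output, e.g. "11.1"."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release found in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def cuda_versions_match(system_cuda: str, torch_cuda: str) -> bool:
    """True if both versions agree on major.minor (e.g. "11.1" vs "11.1")."""
    return system_cuda.split(".")[:2] == torch_cuda.split(".")[:2]


def check_cuda_match() -> bool:
    """Compare the system-wide toolkit (nvcc) with the CUDA PyTorch was built with."""
    import torch  # imported lazily; assumes PyTorch is installed

    out = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    system_cuda = nvcc_cuda_version(out)
    torch_cuda = torch.version.cuda  # None for CPU-only builds
    print(f"system CUDA: {system_cuda}, PyTorch built with CUDA: {torch_cuda}")
    return torch_cuda is not None and cuda_versions_match(system_cuda, torch_cuda)
```

In a notebook cell, `check_cuda_match()` returning `False` suggests the DeepSpeed build is likely to fail.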

Since we can't control which CUDA version Colab has, it can be tricky to find the matching PyTorch version. This notebook saves you that time by listing all the versions you need to install.

This notebook will inevitably become outdated over time, so check for the latest version at https://github.com/stas00/porting/blob/master/transformers/deepspeed/ and please let me know if it needs updating because DeepSpeed stops building.

As mentioned earlier, if DeepSpeed builds but the training gets killed, you got a Colab instance with too little RAM. There is no need to contact me in that case, as there is nothing I can do about it.

In [31]:
# Free Colab seems to give different amounts of general RAM to different users, or even to the same user at different times.

!free -h

              total        used        free      shared  buff/cache   available
Mem:            12G        561M        6.4G        1.1M        5.8G         11G
Swap:            0B          0B          0B
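`free` reads its numbers from `/proc/meminfo`, so you can run the same check from Python, for example to bail out early before a long run. A minimal, Linux-only sketch with hypothetical helper names:

```python
def meminfo_gib(text: str) -> dict:
    """Parse /proc/meminfo content into {field: size in GiB}.

    /proc/meminfo lists sizes in kB (really KiB); unitless fields are skipped.
    """
    sizes = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB":
            sizes[name.strip()] = int(parts[0]) / 1024**2
    return sizes


def available_ram_gib(path: str = "/proc/meminfo") -> float:
    """Memory available for new processes (the `available` column of `free`)."""
    with open(path) as f:
        return meminfo_gib(f.read())["MemAvailable"]
```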


In [32]:
# check which NVIDIA driver and CUDA version are in use

!nvidia-smi

Thu Jun 10 23:06:30 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
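Keep in mind that the `CUDA Version` in the `nvidia-smi` header is the newest CUDA version the installed driver supports, not necessarily the CUDA toolkit that DeepSpeed compiles against; `nvcc --version` reports the latter. If you want to log these header fields programmatically, here is a small sketch (helper names are hypothetical):

```python
import re
import subprocess


def parse_smi_header(smi_output: str) -> dict:
    """Extract the driver version and the driver-supported CUDA version."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", smi_output
    )
    if match is None:
        raise ValueError("unexpected nvidia-smi output")
    return {"driver": match.group(1), "max_cuda": match.group(2)}


def query_smi() -> dict:
    """Run nvidia-smi (must be on PATH) and parse its header."""
    out = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    ).stdout
    return parse_smi_header(out)
```

On the instance above, `parse_smi_header` returns `{'driver': '460.32.03', 'max_cuda': '11.2'}`.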

In [10]:
# need to match the system-wide installed cuda-11 for deepspeed to compile
# so install the matching pytorch

!pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html



Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.8.1+cu111
  Downloading https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp37-cp37m-linux_x86_64.whl (1982.2MB)
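The `+cu111` suffix in the pip spec is the local version tag the PyTorch wheel index uses to mark which CUDA version a wheel was built with (`cu111` means CUDA 11.1). The installed `torch.__version__` keeps that suffix, so after the install you can confirm you got the intended build. A minimal sketch with a hypothetical helper:

```python
def cuda_from_torch_version(version: str):
    """Map a PyTorch version string like "1.8.1+cu111" to its CUDA version "11.1".

    Returns None when there is no "+cuXXX" suffix (e.g. "+cpu" or plain wheels).
    """
    _, _, local = version.partition("+")
    digits = local[2:]
    if not local.startswith("cu") or not digits.isdigit() or len(digits) < 2:
        return None
    # the last digit is the minor version, everything before it the major version
    return f"{digits[:-1]}.{digits[-1]}"
```

For these wheels, `cuda_from_torch_version(torch.__version__)` should agree with `torch.version.cuda`. If you had already imported `torch` before reinstalling, restart the runtime first so the new version is loaded.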

In [26]:
# either install the release
#!pip install deepspeed
# or the master 
!pip install git+https://github.com/microsoft/deepspeed

# remove any previously cached deepspeed objects as they can be incompatible with this new build
#!rm -r /root/.cache/torch_extensions/

Collecting git+https://github.com/microsoft/deepspeed
  Cloning https://github.com/microsoft/deepspeed to /tmp/pip-req-build-6zgk1x0y
  Running command git clone -q https://github.com/microsoft/deepspeed /tmp/pip-req-build-6zgk1x0y
  Running command git submodule update --init --recursive -q
Building wheels for collected packages: deepspeed
  Building wheel for deepspeed (setup.py) ... done
  Created wheel for deepspeed: filename=deepspeed-0.4.1+71ecf7e-cp37-none-any.whl size=468578 sha256=f5294c4731cf8c43a6558b285b3b288e2879f047bed761f313a578db5af0d1d5
  Stored in directory: /tmp/pip-ephem-wheel-cache-e8njzjyu/wheels/33/7c/6d/1ac44092dd4e4b5ddd1dec9474fed46ec3fe5588be7b6ffe9e
Successfully built deepspeed
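The commented-out `rm` in the cell above removes previously cached DeepSpeed objects that may be incompatible with a new build. The same thing from Python, as a sketch that assumes PyTorch's default cache location used in this notebook (`~/.cache/torch_extensions`, i.e. `/root/.cache/torch_extensions` on Colab) unless the `TORCH_EXTENSIONS_DIR` environment variable overrides it; helper names are hypothetical:

```python
import os
import shutil
from pathlib import Path


def torch_extensions_dir() -> Path:
    """Extension build cache: $TORCH_EXTENSIONS_DIR if set, else the assumed
    default ~/.cache/torch_extensions used in this notebook."""
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    return Path(override) if override else Path.home() / ".cache" / "torch_extensions"


def clear_torch_extensions(cache_dir=None) -> bool:
    """Remove cached JIT-built ops so they are rebuilt; True if anything was deleted."""
    path = Path(cache_dir) if cache_dir is not None else torch_extensions_dir()
    if not path.exists():
        return False
    shutil.rmtree(path)
    return True
```

DeepSpeed's `ds_report` command is also handy after installing: it prints the torch and CUDA versions it sees and which ops are compatible with your setup.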


In [33]:
%%bash
git clone https://github.com/huggingface/transformers
cd transformers
# the examples change a lot, so pin a SHA this notebook is known to work with
# comment out/remove the next line if you want the master branch
git checkout  d2753dcbec712350
pip install -e .
pip install -r examples/pytorch/translation/requirements.txt

# if needed free up some space used by cached pip packages
# rm -rf /root/.cache/pip


Obtaining file:///content/transformers
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: transformers
  Found existing installation: transformers 4.7.0.dev0
    Can't uninstall 'transformers'. No files were found to uninstall.
  Running setup.py develop for transformers
Successfully installed transformers
Collecting py7zr
  Downloading https://files.pythonhosted.org/packages/db/1c/d3e3a80fa8901fc232ec11ec0f2886c7e06cf38f3f40876438ada5659211/py7zr-0.16.1-py3-none-any.whl (65kB)
Collecting pyzstd<0.15.0,>=0.14.4
  Downloading https://files.pythonhosted.org/packages/a3/e9/fe897f8bb96163645a5b2d3a60ff8bfa6fcdedff4691a3c6c861b0324ef4/pyzstd-0.14.4-cp37-cp37m-manylinux2014_x86_64.whl (2.2MB)
Collecting b

fatal: destination path 'transformers' already exists and is not an empty directory.
HEAD is now at d2753dcbe add relevant description to tqdm in examples (#11927)
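The `fatal: destination path 'transformers' already exists` message only means the cell was run a second time: the clone failed, but the `git checkout` and `pip install` still ran. If you'd rather have a setup step that re-runs cleanly, one option is to clone only when the directory is missing. A minimal Python sketch (the `clone_at` helper is hypothetical):

```python
import subprocess
from pathlib import Path


def clone_at(repo_url: str, dest: str, sha: str, run=subprocess.run) -> list:
    """Clone repo_url into dest only if dest is missing, then check out sha.

    Returns the commands it ran; `run` can be swapped out for testing.
    """
    commands = []
    if not Path(dest).exists():
        commands.append(["git", "clone", repo_url, dest])
    commands.append(["git", "-C", dest, "checkout", sha])
    for cmd in commands:
        run(cmd, check=True)
    return commands
```

Usage: `clone_at("https://github.com/huggingface/transformers", "transformers", "d2753dcbec712350")`.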


In [28]:
%%bash

cd transformers

cat <<'EOT' > ds_config.json
{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },

    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },

    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },

    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": true
    },

    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
EOT
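The `"auto"` values are placeholders that the `transformers` Trainer fills in from the matching command-line arguments (for example `--learning_rate` or `--per_device_train_batch_size`), which keeps the config and the CLI from disagreeing. A small sketch to list which entries are left to the Trainer, together with the batch-size relation DeepSpeed validates (`train_batch_size` equals `train_micro_batch_size_per_gpu` times `gradient_accumulation_steps` times the number of GPUs); the helper names are hypothetical:

```python
import json


def find_auto_keys(config: dict, prefix: str = "") -> list:
    """Dotted paths of every entry set to "auto" (to be filled in by the Trainer)."""
    found = []
    for key, value in config.items():
        path = prefix + key
        if isinstance(value, dict):
            found.extend(find_auto_keys(value, path + "."))
        elif value == "auto":
            found.append(path)
    return found


def train_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, num_gpus: int) -> int:
    """The relation DeepSpeed checks: global batch = micro batch * accumulation * GPUs."""
    return micro_batch_per_gpu * grad_accum_steps * num_gpus


def load_config(path: str = "transformers/ds_config.json") -> dict:
    with open(path) as f:
        return json.load(f)
```

For the run below, `BS=16` on one GPU with no gradient accumulation gives `train_batch_size(16, 1, 1) == 16`.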


In [29]:
#!ls -l transformers
#!cat transformers/ds_config.json
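If your instance doesn't have enough general RAM, one option is to run with a variant of this config that keeps the optimizer states on the GPU instead of offloading them to CPU memory; this trades lower CPU RAM use for higher GPU memory use. A minimal sketch (helper and output file names are hypothetical):

```python
import copy
import json


def without_cpu_offload(config: dict) -> dict:
    """Copy of a DeepSpeed config with ZeRO optimizer offloading to CPU removed."""
    new_config = copy.deepcopy(config)
    zero = new_config.get("zero_optimization", {})
    zero.pop("offload_optimizer", None)
    zero.pop("cpu_offload", None)  # key used by older DeepSpeed configs
    return new_config


def write_no_offload_config(src: str, dst: str) -> None:
    """Write a no-offload variant of the config at src to dst."""
    with open(src) as f:
        config = json.load(f)
    with open(dst, "w") as f:
        json.dump(without_cpu_offload(config), f, indent=4)
```

For example, `write_no_offload_config("transformers/ds_config.json", "transformers/ds_config_no_offload.json")`, then pass `--deepspeed ds_config_no_offload.json` instead.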

## Running Training + Evaluation CLI style

In [30]:
!cd transformers; export BS=16; rm -rf output_dir; \
PYTHONPATH=src USE_TF=0 CUDA_VISIBLE_DEVICES=0 deepspeed --num_gpus=1 examples/pytorch/translation/run_translation.py \
--model_name_or_path t5-small --output_dir output_dir --adam_eps 1e-06 --evaluation_strategy=steps \
--do_train --do_eval --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 \
--max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir  \
--per_device_train_batch_size $BS --per_device_eval_batch_size $BS --predict_with_generate --sortish_sampler \
--val_max_target_length 128 --warmup_steps 500 --max_train_samples 2000 --max_eval_samples 500 \
--dataset_name wmt16 --dataset_config ro-en --source_lang en --target_lang ro \
--source_prefix "translate English to Romanian: " --deepspeed ds_config.json --fp16

[2021-06-10 23:00:33,483] [INFO] [runner.py:360:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --output_dir output_dir --adam_eps 1e-06 --evaluation_strategy=steps --do_train --do_eval --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --predict_with_generate --sortish_sampler --val_max_target_length 128 --warmup_steps 500 --max_train_samples 2000 --max_eval_samples 500 --dataset_name wmt16 --dataset_config ro-en --source_lang en --target_lang ro --source_prefix translate English to Romanian:  --deepspeed ds_config.json --fp16
[2021-06-10 23:00:34,618] [INFO] [launch.py:73:main] 0 NCCL_VERSION 2.7.8
[2021-06-10 23:00:34,618] [I
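A quick sanity check on the schedule for this demo run. With `--max_train_samples 2000`, `BS=16`, one GPU, one epoch and no gradient accumulation, the run takes 125 optimizer steps, fewer than `--warmup_steps 500`. Assuming the `"auto"` `warmup_num_steps` in `ds_config.json` is filled from `--warmup_steps`, the learning rate would still be warming up when training ends, which is fine for a smoke test but worth adjusting for a real run. The arithmetic, as a small sketch with a hypothetical helper:

```python
import math


def optimizer_steps(num_samples: int, per_device_batch: int, num_gpus: int = 1, epochs: int = 1) -> int:
    """Optimizer steps without gradient accumulation, keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_samples / (per_device_batch * num_gpus))
    return steps_per_epoch * epochs


steps = optimizer_steps(num_samples=2000, per_device_batch=16)
warmup_steps = 500
# prints: 125 optimizer steps, warmup 500: 25% of warmup done at the end
print(f"{steps} optimizer steps, warmup {warmup_steps}: {steps / warmup_steps:.0%} of warmup done at the end")
```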