
◄ [**Open in Colab**](https://colab.research.google.com/github/stas00/porting/blob/master/transformers/deepspeed/DeepSpeed_on_colab_CLI.ipynb) ►

**Last modified: Mon 21 Mar 2022 03:24:37 PM PDT**

# transformers + deepspeed CLI

This notebook demonstrates how to set up `transformers` + `deepspeed` on Colab to be run as an external process.

You can of course use it under any notebook environment.

It's possible to run `transformers` + `deepspeed` inside the notebook as well:

**XXX**: make another notebook with a demo that isn't CLI



## Setting up the correct environment

In order to run `transformers` with `deepspeed`, you need:
1. Enough general RAM. Different users seem to get instances with different amounts of allocated general RAM. Try `!free -h`; if your process gets killed, you have probably run out of memory. If you can't get enough memory, you can turn `cpu_offload` off in `ds_config.json` below.
2. Matching CUDA versions. Your pytorch needs to be built with the same major CUDA version as your system-wide installed CUDA. This is normally not needed to run `pytorch` alone, but it is needed for building CUDA extensions, like DeepSpeed. You will find full documentation [here](https://huggingface.co/transformers/main_classes/trainer.html#installation-notes).

Since we can't control which CUDA version Colab has, it can be tricky to find the matching pytorch version. This notebook saves you time by showing the versions you need to install.
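
The "same major CUDA version" rule can be checked with a few lines of Python. This is a minimal sketch, not part of the original notebook: the helper names are made up for illustration, and the sample string only mimics the `release X.Y` text that `nvcc --version` prints.

```python
import re


def cuda_major(version: str) -> int:
    """Return the major component of a CUDA version string such as '11.3'."""
    return int(version.strip().split(".")[0])


def nvcc_release(nvcc_output: str) -> str:
    """Extract 'X.Y' from the 'release X.Y' text printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return match.group(1)


def same_cuda_major(torch_cuda: str, system_cuda: str) -> bool:
    """True when both versions share a major number (e.g. 11.3 and 11.1)."""
    return cuda_major(torch_cuda) == cuda_major(system_cuda)


# Illustrative sample in the shape of `nvcc --version` output (not from this notebook).
sample = "Cuda compilation tools, release 11.1, V11.1.105"
print(same_cuda_major("11.3", nvcc_release(sample)))
```

In a live session you would pass `torch.version.cuda` as the first argument and the captured output of `nvcc --version` to `nvcc_release`.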

This notebook will surely become outdated over time. So make sure you check for the latest version of it at https://github.com/stas00/porting/blob/master/transformers/deepspeed/ and please let me know if deepspeed stops building and the notebook needs to be updated.

As mentioned earlier, if DeepSpeed builds but the training gets killed, you got a Colab instance with too little RAM. There is no need to contact me then, as there is nothing I can do about it.
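
If general RAM is the bottleneck, CPU offload is the setting to switch off. The sketch below shows one way to do that programmatically. It is an illustration under assumptions: the real `ds_config.json` used by this notebook is not shown in this section, so `ASSUMED_CONFIG` is a made-up minimal config, and `without_cpu_offload` is a hypothetical helper. It handles both the `offload_optimizer` block and the older boolean `cpu_offload` key.

```python
import copy
import json

# Assumed minimal config for illustration; not the notebook's actual ds_config.json.
ASSUMED_CONFIG = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}


def without_cpu_offload(config: dict) -> dict:
    """Return a copy of `config` with optimizer offload to CPU removed."""
    result = copy.deepcopy(config)
    zero = result.get("zero_optimization", {})
    zero.pop("offload_optimizer", None)  # newer-style offload block
    zero.pop("cpu_offload", None)        # older boolean key
    return result


print(json.dumps(without_cpu_offload(ASSUMED_CONFIG), indent=2))
```

Keep in mind that disabling offload moves the optimizer state back onto the GPU, so it trades general RAM for GPU memory.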

In [1]:
# Free Colab seems to give different amounts of general RAM to different users, or even to the same user at different times.

!free -h

               total        used        free      shared  buff/cache   available
Mem:            12Gi       677Mi       9.6Gi       1.0Mi       2.4Gi        11Gi
Swap:             0B          0B          0B
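
If you want the same number inside Python (for example, to warn before launching training), one option is to parse `/proc/meminfo`. This is a sketch that assumes a Linux host; the helper names are hypothetical and the sample values are made up.

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_gib(text: str) -> float:
    """Return MemAvailable converted from kB to GiB."""
    return parse_meminfo(text)["MemAvailable"] / (1024 ** 2)


# Illustrative sample in /proc/meminfo format (values are made up).
sample = (
    "MemTotal:       13290480 kB\n"
    "MemFree:        10066329 kB\n"
    "MemAvailable:   12058624 kB\n"
)
print(f"{available_gib(sample):.1f} GiB available")
```

On the Colab instance itself you could call `available_gib(open("/proc/meminfo").read())`.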


In [2]:
# check which nvidia driver and cuda version are running

!nvidia-smi

Fri Nov 17 06:53:45 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [26]:
# need to match the system-wide installed cuda-11 for deepspeed to compile
# so install the matching pytorch

# pt-1.8.1 works too
# !pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

# pt-1.11
!pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html



Looking in links: https://download.pytorch.org/whl/cu113/torch_stable.html


In [4]:
# either install the release
#!pip install deepspeed
# or the master
!pip install git+https://github.com/microsoft/deepspeed

# remove any previously cached deepspeed objects as they can be incompatible with this new build
#!rm -r /root/.cache/torch_extensions/

Collecting git+https://github.com/microsoft/deepspeed
  Cloning https://github.com/microsoft/deepspeed to /tmp/pip-req-build-pr5hqgmr
  Running command git clone --filter=blob:none --quiet https://github.com/microsoft/deepspeed /tmp/pip-req-build-pr5hqgmr
  Resolved https://github.com/microsoft/deepspeed to commit a3926bbbf6d0025b5c6076a280e6b91ebd08aada
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Collecting hjson (from deepspeed==0.12.4+a3926bbb)
  Downloading hjson-3.1.0-py3-none-any.whl (54 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.0/54.0 kB 1.7 MB/s eta 0:00:00
Collecting ninja (from deepspeed==0.12.4+a3926bbb)
  Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 kB 11.9 MB/s eta 0:00:00
Collecting pynvml (from deepspeed
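
The commented-out `rm -r` in the cell above clears previously built extension objects. A Python equivalent that does not fail when the directory is missing might look like the sketch below. `clear_extension_cache` is a hypothetical helper, and the demo runs on a throwaway directory rather than the real cache.

```python
import os
import shutil
import tempfile


def clear_extension_cache(cache_dir: str) -> bool:
    """Delete `cache_dir` if it exists; return True when something was removed."""
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
        return True
    return False


# Demonstrate on a temporary directory instead of the real cache.
demo_dir = tempfile.mkdtemp(prefix="torch_extensions_demo_")
print(clear_extension_cache(demo_dir))  # the directory existed and was removed
print(clear_extension_cache(demo_dir))  # nothing left to remove
```

To clear the real cache on Colab you would pass the path from the cell above, `/root/.cache/torch_extensions`.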

In [20]:
%%bash

pip install datasets transformers trl bitsandbytes

Collecting bitsandbytes
  Downloading bitsandbytes-0.41.2.post2-py3-none-any.whl (92.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92.6/92.6 MB 2.1 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.41.2.post2


In [24]:
!pip install huggingface_hub

from huggingface_hub import notebook_login

notebook_login()




## Running Training + Evaluation CLI style

In [28]:
!export BS=16; rm -rf output_dir; \
PYTHONPATH=src USE_TF=0 CUDA_VISIBLE_DEVICES=0 deepspeed --num_gpus=1 finetune_llama.py --per_device_train_batch_size 4 --per_device_eval_batch_size 1 # --deepspeed ds_config.json --fp16

[2023-11-17 08:11:23,103] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Detected CUDA_VISIBLE_DEVICES=0 but ignoring it because one or several of --include/--exclude/--num_gpus/--num_nodes cl args were used. If you want to use CUDA_VISIBLE_DEVICES don't pass any of these arguments to deepspeed.
[2023-11-17 08:11:25,760] [INFO] [runner.py:570:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None finetune_llama.py --per_device_train_batch_size 4 --per_device_eval_batch_size 1
[2023-11-17 08:11:27,170] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-17 08:11:31,047] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.15.5-1+cuda11.8
[2023-11-17 08:11:31,047] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.15.5-1
[2023-11-17 08:11:31,047] [INFO
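
The `--world_info=eyJsb2NhbGhvc3QiOiBbMF19` argument in the launcher command above looks opaque, but it appears to be base64-encoded JSON that maps each host to its GPU indices. A small sketch to decode it (the helper name is hypothetical, and the encoding is inferred from the value itself rather than taken from DeepSpeed documentation):

```python
import base64
import json


def decode_world_info(encoded: str) -> dict:
    """Decode a base64 string holding JSON, as the --world_info value appears to be."""
    return json.loads(base64.urlsafe_b64decode(encoded).decode("utf-8"))


# Value copied from the launcher command line in the log above.
print(decode_world_info("eyJsb2NhbGhvc3QiOiBbMF19"))
```

Here the result is a single host, `localhost`, with GPU `0`, which matches `--num_gpus=1`.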

In [30]:
import torch
print(torch.__version__)


1.11.0+cu113
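
The `+cu113` suffix is the part that matters for DeepSpeed builds, since it names the CUDA version the wheel was built against. The sketch below extracts it from a version string. `cuda_from_wheel_tag` is a hypothetical helper, and it assumes the usual tag convention in which the last digit is the minor version.

```python
def cuda_from_wheel_tag(torch_version: str):
    """Return 'X.Y' for a '+cuXYZ' local version tag, or None if there is none."""
    _, _, local = torch_version.partition("+")
    if len(local) < 4 or not local.startswith("cu") or not local[2:].isdigit():
        return None
    digits = local[2:]
    # Assumed convention: the last digit is the minor version (cu113 -> 11.3).
    return f"{digits[:-1]}.{digits[-1]}"


print(cuda_from_wheel_tag("1.11.0+cu113"))
```

Inside the notebook, `torch.version.cuda` reports the same information directly; parsing the tag is mainly useful when all you have is a version string, for example from a requirements file.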
