From 28f3e602b679b51cc0b7d2bf13103199964bdadd Mon Sep 17 00:00:00 2001
From: NatalieC323 <127177614+NatalieC323@users.noreply.github.com>
Date: Fri, 17 Mar 2023 17:35:43 +0800
Subject: [PATCH 1/4] Update requirements.txt

---
 examples/images/diffusion/requirements.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/images/diffusion/requirements.txt b/examples/images/diffusion/requirements.txt
index d0af35353b66..59d027fcf60f 100644
--- a/examples/images/diffusion/requirements.txt
+++ b/examples/images/diffusion/requirements.txt
@@ -1,10 +1,10 @@
 albumentations==1.3.0
-opencv-python==4.6.0
+opencv-python==4.6.0.66
 pudb==2019.2
 prefetch_generator
 imageio==2.9.0
 imageio-ffmpeg==0.4.2
-torchmetrics==0.6
+torchmetrics==0.7
 omegaconf==2.1.1
 test-tube>=0.7.5
 streamlit>=0.73.1

From 8fac53a3f140ae432ed96c90ffc1aa0fe9e44468 Mon Sep 17 00:00:00 2001
From: NatalieC323 <127177614+NatalieC323@users.noreply.github.com>
Date: Fri, 17 Mar 2023 17:39:10 +0800
Subject: [PATCH 2/4] Update environment.yaml

---
 examples/images/diffusion/environment.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/images/diffusion/environment.yaml b/examples/images/diffusion/environment.yaml
index 5164be72e556..f4b1bebd7fc8 100644
--- a/examples/images/diffusion/environment.yaml
+++ b/examples/images/diffusion/environment.yaml
@@ -3,7 +3,7 @@ channels:
   - pytorch
   - defaults
 dependencies:
-  - python=3.9.12
+  - python=3.8.16
   - pip=20.3
   - cudatoolkit=11.3
   - pytorch=1.12.1

From 38ebe705af38b8d6f21c95bab81d099ff93d8748 Mon Sep 17 00:00:00 2001
From: NatalieC323 <127177614+NatalieC323@users.noreply.github.com>
Date: Fri, 17 Mar 2023 18:26:36 +0800
Subject: [PATCH 3/4] Update README.md

---
 examples/images/diffusion/README.md | 41 +++++++++++++++++------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/examples/images/diffusion/README.md b/examples/images/diffusion/README.md
index cc57f6d54a8e..34e9dc1a7498 100644
--- a/examples/images/diffusion/README.md
+++ b/examples/images/diffusion/README.md
@@ -40,15 +40,14 @@ This project is in rapid development.
 ### Option #1: install from source
 #### Step 1: Requirements
 
-A suitable [conda](https://conda.io/) environment named `ldm` can be created
-and activated with:
+Before you begin, make sure a suitable CUDA version (11.6 or 11.8) is installed on your system. The remaining packages are specified in `environment.yaml`; create and activate a [conda](https://conda.io/) environment named `ldm` with:
 
 ```
 conda env create -f environment.yaml
 conda activate ldm
 ```
 
-You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running
+You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running:
 
 ```
 conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
@@ -57,32 +56,38 @@ pip install transformers diffusers invisible-watermark
 
 #### Step 2: install lightning
 
-Install Lightning version later than 2022.01.04. We suggest you install lightning from source.
+Install a Lightning version later than 2022.01.04. We suggest installing Lightning from source. Make sure that `pip` belongs to the conda environment (check with `which pip`); otherwise, packages may be installed outside the `ldm` environment.
 
-##### From Source
+##### From Source:
 ```
 git clone https://github.com/Lightning-AI/lightning.git
 pip install -r requirements.txt
 python setup.py install
 ```
 
-##### From pip
+##### From pip:
 
 ```
 pip install pytorch-lightning
 ```
 
-#### Step 3:Install [Colossal-AI](https://colossalai.org/download/) From Our Official Website
+#### Step 3: Install [Colossal-AI](https://colossalai.org/download/) From Our Official Website:
 
-##### From pip
+You can install the latest version (0.2.7) from our official website or from source. Note that the version suggested for this training is colossalai 0.2.5, which is compatible with torch 1.12.1.
 
-For example, you can install v0.2.0 from our official website.
+##### Install the suggested version for this training:
+
+```
+pip install colossalai==0.2.5
+```
+
+##### Install the latest version from pip (for the latest torch version):
 
 ```
 pip install colossalai
 ```
 
-##### From source
+##### From source:
 
 ```
 git clone https://github.com/hpcaitech/ColossalAI.git
@@ -92,10 +97,12 @@ cd ColossalAI
 CUDA_EXT=1 pip install .
 ```
 
-#### Step 3:Accelerate with flash attention by xformers(Optional)
+#### Step 4: Accelerate with flash attention by xformers (Optional)
+
+Note that xformers accelerates training at the cost of extra disk space. The suitable version of xformers for this training is 0.0.12, which you can install directly via pip. For other releases, see its PyPI page: [XFormers](https://pypi.org/project/xformers/)
 
 ```
-pip install xformers
+pip install xformers==0.0.12
 ```
 
 ### Option #2: Use Docker
@@ -174,8 +181,7 @@ you should the change the `data.file_path` in the `config/train_colossalai.yaml`
 
 ## Training
 
-We provide the script `train_colossalai.sh` to run the training task with colossalai,
-and can also use `train_ddp.sh` to run the training task with ddp to compare.
+We provide the script `train_colossalai.sh` to run the training task with colossalai. Other training methods, such as PyTorch DDP, are also supported: use `train_ddp.sh` to run the training task with DDP and compare the performance.
 
 In `train_colossalai.sh` the main command is:
 
@@ -193,9 +199,10 @@ python main.py --logdir /tmp/ --train --base configs/train_colossalai.yaml --ckp
 
 You can change the trainging config in the yaml file
 
-- devices: device number used for training, default 8
-- max_epochs: max training epochs, default 2
-- precision: the precision type used in training, default 16 (fp16), you must use fp16 if you want to apply colossalai
+- devices: number of devices used for training, default = 8
+- max_epochs: maximum number of training epochs, default = 2
+- precision: the precision type used in training, default = 16 (fp16); you must use fp16 to apply colossalai
+- placement_policy: the memory placement policy of Colossal-AI, default = 'cuda', which loads all parameters into CUDA memory. 'cpu' enables CPU offloading, while 'auto' enables Gemini; both are Colossal-AI features.
 - more information about the configuration of ColossalAIStrategy can be found [here](https://pytorch-lightning.readthedocs.io/en/latest/advanced/model_parallel.html#colossal-ai)
 
 

From 7b185e5ebcbf84076ea83963f4df41f4c086fb37 Mon Sep 17 00:00:00 2001
From: NatalieC323 <127177614+NatalieC323@users.noreply.github.com>
Date: Mon, 20 Mar 2023 11:46:28 +0800
Subject: [PATCH 4/4] Update environment.yaml

---
 examples/images/diffusion/environment.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/images/diffusion/environment.yaml b/examples/images/diffusion/environment.yaml
index ec6bb8a532af..d1ec69c1a585 100644
--- a/examples/images/diffusion/environment.yaml
+++ b/examples/images/diffusion/environment.yaml
@@ -3,7 +3,7 @@ channels:
   - pytorch
   - defaults
 dependencies:
-  - python=3.8.16
+  - python=3.9.12
   - pip=20.3
   - cudatoolkit=11.3
   - pytorch=1.12.1
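The pins introduced across this series (torch 1.12.1, colossalai 0.2.5 and xformers 0.0.12 in the README; torchmetrics 0.7 and opencv-python 4.6.0.66 in requirements.txt) are easy to drift from when mixing conda and pip installs. The following is a minimal sketch, not part of the patches: a hypothetical bash helper (`normalize_version`, `version_matches`, `installed_version` and `check_pins` are illustrative names) that compares installed versions against those pins. It assumes `python` is the interpreter of the activated `ldm` environment (Python 3.8 or later, for `importlib.metadata`), and it treats two versions as equal after dropping a local suffix such as `+cu113` and padding to three components, which is a simplification of PEP 440 matching.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this patch series): compare installed package
# versions with the pins used by examples/images/diffusion.
# Assumes `python` is the interpreter of the active `ldm` environment (Python >= 3.8).

# Drop a PEP 440 local suffix such as "+cu113" and pad to at least three
# release components, so "0.7" and "0.7.0" compare equal.
normalize_version() {
  local v="${1%%+*}"
  local IFS=.
  # Intentional word splitting on '.' into positional parameters.
  set -- $v
  echo "${1:-0}.${2:-0}.${3:-0}${4:+.$4}"
}

# Succeeds if two version strings are equal after normalization.
version_matches() {
  [ "$(normalize_version "$1")" = "$(normalize_version "$2")" ]
}

# Print the installed version of a distribution; prints nothing if it is absent.
installed_version() {
  python -c 'import importlib.metadata as m, sys; print(m.version(sys.argv[1]))' "$1" 2>/dev/null
}

# Check every pin; return non-zero if any package is missing or mismatched.
check_pins() {
  local status=0 pkg expected got
  while read -r pkg expected; do
    # Redirect stdin so the lookup cannot consume the pin list below.
    got="$(installed_version "$pkg" </dev/null)"
    if [ -z "$got" ]; then
      echo "missing  $pkg (expected $expected)"
      status=1
    elif version_matches "$got" "$expected"; then
      echo "ok       $pkg $got"
    else
      echo "mismatch $pkg: expected $expected, found $got"
      status=1
    fi
  done <<'EOF'
torch 1.12.1
colossalai 0.2.5
xformers 0.0.12
torchmetrics 0.7
opencv-python 4.6.0.66
EOF
  return "$status"
}
```

Usage: save it as, for example, `check_pins.sh`, activate `ldm`, then run `source check_pins.sh && check_pins`. A non-zero return means at least one package is missing or differs from its pin; the string normalization is only a rough check and does not replace pip's own version-specifier matching.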