[examples] Solving the diffusion issue of incompatibility issue#3169 #3170

Merged: 5 commits, Mar 20, 2023
Changes from 3 commits
41 changes: 24 additions & 17 deletions examples/images/diffusion/README.md
@@ -40,15 +40,14 @@ This project is in rapid development.
### Option #1: install from source
#### Step 1: Requirements

A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:
First, make sure your system has a CUDA version suitable for this training, namely CUDA 11.6 or 11.8. For your convenience, the remaining packages are listed in `environment.yaml`. You can create and activate a suitable [conda](https://conda.io/) environment named `ldm` with:

```
conda env create -f environment.yaml
conda activate ldm
```

You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running
You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running:

```
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
@@ -57,32 +56,38 @@ pip install transformers diffusers invisible-watermark

#### Step 2: install lightning

Install Lightning version later than 2022.01.04. We suggest you install lightning from source.
Install a Lightning version later than 2022.01.04. We suggest installing Lightning from source. Note that `pip` should resolve to the one inside the conda environment; if it does not, check with `which pip` and point your path at the conda environment's `pip`.
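
As a complement to `which pip`, the check can also be done from Python. The sketch below compares the interpreter prefix with the `CONDA_PREFIX` variable that `conda activate` sets. The helper name `in_conda_env` is hypothetical and is not part of this project.

```python
import os
import sys
from pathlib import Path


def in_conda_env(prefix=None, conda_prefix=None):
    """Return True if the interpreter prefix is the active conda environment.

    Both arguments are optional; by default the running interpreter's
    sys.prefix and the CONDA_PREFIX environment variable are used.
    """
    if prefix is None:
        prefix = sys.prefix
    if conda_prefix is None:
        conda_prefix = os.environ.get("CONDA_PREFIX")
    if not conda_prefix:
        # No conda environment is active in this shell.
        return False
    return Path(prefix).resolve() == Path(conda_prefix).resolve()


if __name__ == "__main__":
    if in_conda_env():
        print("OK: `python -m pip install ...` will install into", sys.prefix)
    else:
        print("Warning:", sys.prefix, "is not the active conda environment")
```

If the check fails, running `python -m pip install ...` with the environment's own `python` is one way to make sure packages land in the right place.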

##### From Source
##### From Source:
```
git clone https://github.com/Lightning-AI/lightning.git
cd lightning
pip install -r requirements.txt
python setup.py install
```

##### From pip
##### From pip:

```
pip install pytorch-lightning
```

#### Step 3:Install [Colossal-AI](https://colossalai.org/download/) From Our Official Website
#### Step 3: Install [Colossal-AI](https://colossalai.org/download/) From Our Official Website:

##### From pip
You can install the latest version (0.2.7) from our official website or from source. Note that the suitable version for this training is colossalai 0.2.5, which corresponds to torch 1.12.1.

For example, you can install v0.2.0 from our official website.
##### Download the suggested version for this training:

```
pip install colossalai==0.2.5
```

##### Download the latest version from pip for the latest torch version:

```
pip install colossalai
```

##### From source
##### From source:

```
git clone https://github.com/hpcaitech/ColossalAI.git
@@ -92,10 +97,12 @@ cd ColossalAI
CUDA_EXT=1 pip install .
```

#### Step 3:Accelerate with flash attention by xformers(Optional)
#### Step 4: Accelerate with flash attention by xformers (Optional)

Note that xformers accelerates the training process at the cost of extra disk space. The suitable version of xformers for this training process is 0.0.12. You can install xformers directly via pip. For other release versions, check its PyPI page: [XFormers](https://pypi.org/project/xformers/)

```
pip install xformers
pip install xformers==0.0.12
```
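
After installing, it can help to confirm that the installed packages match the versions suggested in the steps above (torch 1.12.1, colossalai 0.2.5, xformers 0.0.12). The following is a minimal sketch using only the standard library; the `SUGGESTED` table and the `check_versions` helper are illustrative names, not part of this project.

```python
from importlib import metadata

# Versions suggested in the steps above; adjust if you install other releases.
SUGGESTED = {
    "torch": "1.12.1",
    "colossalai": "0.2.5",
    "xformers": "0.0.12",
}


def check_versions(suggested, get_version=metadata.version):
    """Map each package name to (installed_or_None, suggested, matches)."""
    report = {}
    for name, wanted in suggested.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu113" are ignored in the comparison.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, matches)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions(SUGGESTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} suggested={wanted} {status}")
```

A mismatch is not necessarily an error, for example if you deliberately installed the latest colossalai for a newer torch, but it is a useful first thing to check when the training script fails on import.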

### Option #2: Use Docker
@@ -174,8 +181,7 @@ you should change the `data.file_path` in the `config/train_colossalai.yaml`

## Training

We provide the script `train_colossalai.sh` to run the training task with colossalai,
and can also use `train_ddp.sh` to run the training task with ddp to compare.
We provide the script `train_colossalai.sh` to run the training task with Colossal-AI. Other training strategies, such as PyTorch DDP, are also supported: you can use `train_ddp.sh` to run the training task with DDP and compare the performance.

In `train_colossalai.sh` the main command is:

@@ -193,9 +199,10 @@ python main.py --logdir /tmp/ --train --base configs/train_colossalai.yaml --ckp

You can change the training config in the yaml file

- devices: device number used for training, default 8
- max_epochs: max training epochs, default 2
- precision: the precision type used in training, default 16 (fp16), you must use fp16 if you want to apply colossalai
- devices: device number used for training, default = 8
- max_epochs: max training epochs, default = 2
- precision: the precision type used in training, default = 16 (fp16), you must use fp16 if you want to apply colossalai
- placement_policy: the training strategy supported by Colossal-AI, default = 'cuda', which loads all parameters into CUDA memory. 'cpu' refers to the 'cpu offload' strategy, while 'auto' enables 'Gemini'; both are features of Colossal-AI.
- more information about the configuration of ColossalAIStrategy can be found [here](https://pytorch-lightning.readthedocs.io/en/latest/advanced/model_parallel.html#colossal-ai)
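
The options listed above can be summarized as a small validation sketch. It assumes a flat dictionary of options for illustration only; the layout of the real yaml file may differ, and `resolve_trainer_options` is a hypothetical helper, not a function of this project or of Lightning.

```python
VALID_PLACEMENT = {"cuda", "cpu", "auto"}

# Defaults as listed above.
DEFAULTS = {
    "devices": 8,
    "max_epochs": 2,
    "precision": 16,
    "placement_policy": "cuda",
}


def resolve_trainer_options(overrides=None):
    """Merge overrides into the defaults and check them against the rules above."""
    options = {**DEFAULTS, **(overrides or {})}
    if options["precision"] != 16:
        raise ValueError("Colossal-AI training requires precision 16 (fp16)")
    if options["placement_policy"] not in VALID_PLACEMENT:
        raise ValueError(
            f"placement_policy must be one of {sorted(VALID_PLACEMENT)}"
        )
    if options["devices"] < 1 or options["max_epochs"] < 1:
        raise ValueError("devices and max_epochs must be positive")
    return options


if __name__ == "__main__":
    # Example: single-GPU run with CPU offload.
    print(resolve_trainer_options({"devices": 1, "placement_policy": "cpu"}))
```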


2 changes: 1 addition & 1 deletion examples/images/diffusion/environment.yaml
@@ -3,7 +3,7 @@ channels:
- pytorch
- defaults
dependencies:
- python=3.9.12
- python=3.8.16
- pip=20.3
- cudatoolkit=11.3
- pytorch=1.12.1
4 changes: 2 additions & 2 deletions examples/images/diffusion/requirements.txt
@@ -1,10 +1,10 @@
albumentations==1.3.0
opencv-python==4.6.0
opencv-python==4.6.0.66
pudb==2019.2
prefetch_generator
imageio==2.9.0
imageio-ffmpeg==0.4.2
torchmetrics==0.6
torchmetrics==0.7
omegaconf==2.1.1
test-tube>=0.7.5
streamlit>=0.73.1