Multiple GPU setup help #812

Open
BotLifeGamer opened this issue Sep 9, 2023 · 8 comments

Comments

@BotLifeGamer

Hello, I haven't found a guide for a multiple-GPU setup for Kohya. Does anyone have a step-by-step guide? I keep getting errors trying to work this out on my own, and there is no clear guide for it. It would be greatly appreciated if someone could point me in the right direction.

@BootsofLagrangian
Contributor

accelerate launch --num_processes=[NUM_YOUR_GPUS_PER_MACHINE] --num_machines=[NUM_YOUR_INDEPENDENT_MACHINES] --multi_gpu --gpu_ids=[GPU_IDS] "train_network.py" args...

If you have 4 GPUs and one machine, give the args as
accelerate launch --num_processes=4 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3 "train_network.py" args...

@BotLifeGamer
Author

BotLifeGamer commented Sep 9, 2023

accelerate launch --num_processes=[NUM_YOUR_GPUS_PER_MACHINE] --num_machines=[NUM_YOUR_INDEPENDENT_MACHINES] --multi_gpu --gpu_ids=[GPU_IDS] "train_network.py" args...

If you have 4 GPUs and one machine, give the args as accelerate launch --num_processes=4 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3 "train_network.py" args...

Thanks for the reply. I'm slowly learning everything as I go; a friend and I spent hours trying to figure this out and read the previous posts before I asked. So where do the args go, and into what file, into train_network.py?

@NEXTAltair

When training on Paperspace Gradient with two A6000s, running "accelerate config" from the terminal was enough to get training working with "bmaltais/kohya_ss". When training with "sd-scripts" as well, I have been able to use multiple GPUs just by launching through "accelerate", without setting specific arguments.

In which compute environment are you running?
This machine
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]:NO
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:all
Do you wish to use FP16 or BF16 (mixed precision)?
fp16
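
Once accelerate config has been answered, launching through accelerate picks up whatever was saved there, so the command line itself does not need the multi-GPU flags. As a rough sketch (the arguments after the script name stand in for your own training arguments), it reduces to:

accelerate launch train_network.py [your usual train_network.py arguments]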

@BootsofLagrangian
Contributor

accelerate launch --num_processes=[NUM_YOUR_GPUS_PER_MACHINE] --num_machines=[NUM_YOUR_INDEPENDENT_MACHINES] --multi_gpu --gpu_ids=[GPU_IDS] "train_network.py" args...
If you have 4 GPUs and one machine, give the args as accelerate launch --num_processes=4 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3 "train_network.py" args...

Thanks for the reply. I'm slowly learning everything as I go; a friend and I spent hours trying to figure this out and read the previous posts before I asked. So where do the args go, and into what file, into train_network.py?

You can list the args of train_network.py with the following command in a terminal or prompt in the sd-scripts directory.

python train_network.py -h

And if you want to use multiple GPUs in sd-scripts, you need to know what the accelerate library is.
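
To make the placement concrete (the paths below are placeholders, not ones from this thread): everything before the script name belongs to accelerate, and train_network.py's own options simply follow the script name on the same line, for example:

accelerate launch --num_processes=4 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3 train_network.py --pretrained_model_name_or_path=./models/base.safetensors --train_data_dir=./training/img --output_dir=./training/model --resolution=1024,1024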

@BotLifeGamer
Author

accelerate launch --num_processes=4 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3 "train_network.py"

Does this look like I'm on the right path?

D:\Kohya_ss\kohya_ss>accelerate launch --num_processes=2 --multi_gpu --num_machines=1 --gpu_ids=0,1 "train_network.py" -- --resolution 1024
NOTE: Redirects are currently not supported in Windows or MacOs.
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [AIBOT]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [AIBOT]:29500 (system error: 10049 - The requested address is not valid in its context.).
prepare tokenizer
prepare tokenizer
Using DreamBooth method.
Using DreamBooth method.
prepare images.
0 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (1024, 1024)
enable_bucket: False

[Dataset 0]
loading image sizes.
0it [00:00, ?it/s]
prepare dataset
No data found. Please verify arguments (train_data_dir must be the parent of folders with images) / 画像がありません。引数指定を確認してください(train_data_dirには画像があるフォルダではなく、画像があるフォル ダの親フォルダを指定する必要があります)
prepare images.
0 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (1024, 1024)
enable_bucket: False

[Dataset 0]
loading image sizes.
0it [00:00, ?it/s]
prepare dataset
No data found. Please verify arguments (train_data_dir must be the parent of folders with images) / 画像がありません。引数指定を確認してください(train_data_dirには画像があるフォルダではなく、画像があるフォル ダの親フォルダを指定する必要があります)
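
The "No data found" message here is a dataset problem rather than a multi-GPU one: --train_data_dir has to point at the parent folder, with the images inside subfolders whose names start with a repeat count. A minimal layout (the folder and file names below are hypothetical) looks roughly like:

training/img/            <- pass this folder to --train_data_dir
    10_mysubject/        <- "<repeats>_<name>" subfolder; images (and .txt captions) go here
        001.png
        001.txt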

@BootsofLagrangian
Contributor

BootsofLagrangian commented Sep 16, 2023

@BotLifeGamer

Here is an example command line for training a LoRA:

accelerate launch --num_processes=2 --multi_gpu --num_machines=1 --gpu_ids=0,1 "train_network.py" --pretrained_model_name_or_path=[huggingface_path or base model path to use] --network_module=networks.lora --save_model_as=safetensors --caption_extension=".txt" --seed="42" --training_comment=[some comment] --output_name=[output_model_name] --train_data_dir=./training/img --output_dir=./training/model --logging_dir=./training/logs --network_alpha=[LINEAR_ALPHA] --network_dim=[LINEAR_RANK] --network_args "conv_rank=[CONV_RANK]" "conv_alpha=[CONV_ALPHA]" --resolution=%RESOLUTION% --train_batch_size=%BATCH_SIZE% --learning_rate=%LEARNING_RATE% --unet_lr=%UNET_LR% --text_encoder_lr=%TE_LR% --max_train_steps=%TRAINING_STEP% --lr_warmup_steps=%WARMUP_STEP% --save_every_n_epochs=1 --lr_scheduler=%LR_SCHEDULER% --lr_scheduler_num_cycles=%LR_CYCLES% --optimizer_type=%OPTIMIZER% --optimizer_args %OPTIMIZER_ARGS% --max_grad_norm=1.0 --noise_offset=%NOISE_OFFSET% --mixed_precision=%PRECISION% --save_precision=%PRECISION% --enable_bucket --bucket_no_upscale --random_crop --bucket_reso_steps=%BUCKET_RESO_STEPS% --max_token_length=225 --shuffle_caption --xformers --gradient_checkpointing --persistent_data_loader_workers

If you want to do a full fine-tune of the model, use "fine_tune.py" instead of "train_network.py".
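
For example, the accelerate part of the launch stays the same and only the script and its own arguments change (fine_tune.py has a different argument set, which python fine_tune.py -h will list):

accelerate launch --num_processes=2 --multi_gpu --num_machines=1 --gpu_ids=0,1 fine_tune.py [fine_tune.py arguments]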

@Charmandrigo

What is the setup for two machines on the same network? I am failing to get that part set up. My second machine seems to be right, but on the main one I have no idea what to put for the IP and port, because when I run a training it says the port is already in use (by the kohya UI itself running on the main machine).

@BootsofLagrangian
Contributor

@Charmandrigo
Sorry, I only have experience with single-machine training, but I think accelerate supports multi-machine training.
If you run accelerate config, you will find options for multi-machine training with DDP.
And for now, kohya's sd-scripts supports only DDP, not ZeRO or FSDP.
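
For reference, a two-machine DDP launch is normally expressed with accelerate's multi-node flags. A rough sketch (the IP, port, and GPU counts below are placeholders; --num_processes is the total across both machines, and the port should be one nothing else is listening on, so not the one the kohya UI already occupies):

On the main machine (rank 0):
accelerate launch --multi_gpu --num_machines=2 --machine_rank=0 --num_processes=4 --main_process_ip=192.168.1.10 --main_process_port=29501 "train_network.py" args...

On the second machine, run the same command with --machine_rank=1.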
