Multiple GPU Lora training is not working. #838

Closed
leonary opened this issue Sep 28, 2023 · 1 comment

Comments


leonary commented Sep 28, 2023

I have successfully configured two A40 graphics cards for LoRA training. During training, both cards are utilized, but the training speed does not improve significantly: the time required is almost the same as with a single card, and the number of epochs increases from 1 to 2. Furthermore, the results achieved with two cards (the capability shown by the trained LoRA) are even worse than those obtained with a single card.
I would like to know whether it is possible to accelerate LoRA training using multiple cards, and if so, what I should do. Apart from setting the Accelerate config, are there any additional steps required?


kohya-ss commented Oct 1, 2023

In multi-GPU training, a single step processes the number of images multiplied by the GPU count, so each epoch finishes in proportionally fewer steps. It is therefore recommended to set --max_train_epochs so that the total amount of training matches the single-GPU run.

As for the LoRA result, I think it may have been overfitted by the multi-GPU training.
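
To put that arithmetic in concrete terms, here is a rough sketch (the dataset size and batch size below are hypothetical values chosen only for illustration, not taken from this issue):

```python
# Effective-batch arithmetic for multi-GPU training, as described above.
# dataset_images and per_device_batch_size are hypothetical example values.
dataset_images = 1000
per_device_batch_size = 4

for num_gpus in (1, 2):
    # Each GPU processes its own batch in every step, so a single step
    # covers per_device_batch_size * num_gpus images.
    images_per_step = per_device_batch_size * num_gpus
    steps_per_epoch = dataset_images // images_per_step
    print(f"{num_gpus} GPU(s): {images_per_step} images/step, {steps_per_epoch} steps/epoch")

# With 2 GPUs an epoch finishes in half the steps, so training for the same
# number of epochs performs half the optimizer updates of the single-GPU run.
# Raising --max_train_epochs accordingly restores the same total amount of training.
```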

leonary closed this as completed Oct 9, 2023