
Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA #1416

Merged: younesbelkada merged 14 commits into huggingface:main from pacman100:smangrul/fsdp+qlora on Mar 13, 2024

@pacman100

Contributor

What does this PR do?

  1. Disable `prepare_model_for_kbit_training` and `peft_module_casting_to_bf16` when training with FSDP+QLoRA or DeepSpeed ZeRO3+QLoRA.
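The rule in item 1 amounts to a guard that skips both helpers whenever the 4-bit weights are going to be sharded. Below is a minimal, hypothetical sketch of that decision logic only: `QuantizedLoadInfo`, `should_prepare_for_kbit`, and `should_cast_peft_modules_to_bf16` are illustrative names, not TRL's API. It assumes the caller already knows whether the quantized model will be sharded by FSDP or DeepSpeed ZeRO-3, and it deliberately leaves 8-bit loading out of scope.

```python
from dataclasses import dataclass


@dataclass
class QuantizedLoadInfo:
    """Hypothetical summary of how the base model was loaded (illustration only)."""

    loaded_in_4bit: bool = False
    # Assumption: True when the 4-bit weights will be sharded by
    # FSDP or DeepSpeed ZeRO-3 (i.e. FSDP+QLoRA or ZeRO3+QLoRA).
    sharded_qlora: bool = False


def should_prepare_for_kbit(info: QuantizedLoadInfo) -> bool:
    """Call prepare_model_for_kbit_training only for non-sharded QLoRA."""
    return info.loaded_in_4bit and not info.sharded_qlora


def should_cast_peft_modules_to_bf16(info: QuantizedLoadInfo, bf16: bool) -> bool:
    """Apply peft_module_casting_to_bf16 only for bf16 training on non-sharded QLoRA."""
    return bf16 and info.loaded_in_4bit and not info.sharded_qlora


if __name__ == "__main__":
    single_gpu = QuantizedLoadInfo(loaded_in_4bit=True)
    fsdp_qlora = QuantizedLoadInfo(loaded_in_4bit=True, sharded_qlora=True)
    # Plain QLoRA still gets both helpers; sharded QLoRA skips both.
    print(should_prepare_for_kbit(single_gpu), should_prepare_for_kbit(fsdp_qlora))
    print(should_cast_peft_modules_to_bf16(fsdp_qlora, bf16=True))
```

Keeping the decision in pure functions like this makes the FSDP/ZeRO-3 exception easy to test without loading a quantized model.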

This PR should be merged after the Transformers PR huggingface/transformers#29587.

@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

pacman100 marked this pull request as ready for review on March 12, 2024 14:05
@pacman100

Contributor Author

cc @younesbelkada

@younesbelkada left a comment

Contributor


Thanks so much!

Comment thread trl/trainer/sft_trainer.py
pacman100 and others added 3 commits March 13, 2024 13:02
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
@younesbelkada

Contributor

I can't reproduce the CI failure locally or on main; will fix it in a follow-up PR!

younesbelkada merged commit 58c0888 into huggingface:main on Mar 13, 2024
kashif pushed a commit that referenced this pull request Mar 14, 2024
* don't do mp casting

* don't use `prepare_for_kbit` when using fsdp+qlora or dsz3+qlora

* changes to enable fsdp+qlora and dsz3+qlora

* revert

* Update sft_trainer.py

* quality

* fix deprecation using changes from PR #1415

* fixes

* quality

* Update trl/trainer/sft_trainer.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* quality

* relaunch tests

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
kashif pushed a commit to fe1ixxu/trl that referenced this pull request Mar 15, 2024
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025

3 participants