
Implement DoRA #1474

Merged: 21 commits merged on Feb 27, 2024

Conversation

@BenjaminBossan (Member) commented Feb 16, 2024

https://arxiv.org/abs/2402.09353

State:

  • Only Linear layer supported, not conv or embedding
  • Quantized layers not supported


@nbasyl commented Feb 19, 2024

@BenjaminBossan Hi Benjamin, may I have your email? I am the author of DoRA and would like to help with integrating DoRA into PEFT.

@BenjaminBossan (Member, Author) commented Feb 19, 2024

Thanks @nbasyl. You can contact me at benjamin@ + HF domain (to clarify: huggingface.co).

Might make sense to refactor this in the future to avoid a significant amount of code duplication.
@BenjaminBossan (Member, Author)

@nbasyl Did you reach out yet? I didn't get any mail so far.

@nbasyl commented Feb 20, 2024

@BenjaminBossan Sorry for the late reply, I was consulting with my manager regarding this. I just sent out the email, please check!

@RonanKMcGovern

Great that this is being added. Is the PR at a point where I can test it out? It seems like I just add use_dora=True in the LoRA config?

(By the way, does LoRA alpha become redundant if use_dora is True?)

@BenjaminBossan (Member, Author)

@RonanKMcGovern There are still a few smaller kinks to work out, so don't test unless you're ready to test again once the next commits roll in.

LoRA alpha should still be relevant when DoRA is being used.
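
For reference, enabling it is indeed just a flag on the config; a minimal sketch (rank and alpha are placeholder values):

from peft import LoraConfig

# use_dora=True turns LoRA into DoRA; lora_alpha still scales the low-rank update
config = LoraConfig(r=8, lora_alpha=16, use_dora=True)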

@RonanKMcGovern

Just gave this a quick spin and seems to do better on ppl than LoRA with the same r. Thanks for the nice work.

@BenjaminBossan (Member, Author)

Just gave this a quick spin and seems to do better on ppl than LoRA with the same r. Thanks for the nice work.

Thanks, sounds fantastic. If you can share anything like scripts, that would be great.

@aliencaocao

Hi @nbasyl, do you think the concept of differential LR from LoRA+ (https://arxiv.org/abs/2402.12354) can be integrated into DoRA? Both papers came out at almost exactly the same time, and they don't seem mutually exclusive.

@BenjaminBossan (Member, Author)

@aliencaocao I skimmed the paper and code and I think there is no technical limitation to combining the two. I don't know if the gains from the two approaches would be additive though.
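
For illustration, a minimal sketch of combining them: a PEFT model created with use_dora=True plus LoRA+-style parameter groups that give lora_B a larger learning rate. The `model`, learning rate, and ratio are placeholders, and this is not the official loraplus helper, just the underlying idea:

import torch

lr, lr_ratio = 2e-4, 16  # LoRA+ idea: larger LR for lora_B than for lora_A

groups = {"lora_A": [], "lora_B": [], "other": []}
for name, param in model.named_parameters():  # `model` built with use_dora=True
    if not param.requires_grad:
        continue
    if "lora_B" in name:
        groups["lora_B"].append(param)
    elif "lora_A" in name:
        groups["lora_A"].append(param)
    else:  # e.g. the DoRA magnitude vectors
        groups["other"].append(param)

optimizer = torch.optim.AdamW(
    [
        {"params": groups["lora_A"], "lr": lr},
        {"params": groups["lora_B"], "lr": lr * lr_ratio},
        {"params": groups["other"], "lr": lr},
    ]
)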

@RonanKMcGovern

https://arxiv.org/abs/2402.12354

Very nice. Is there a PR for this yet for HF?

@RonanKMcGovern

Just gave this a quick spin and seems to do better on ppl than LoRA with the same r. Thanks for the nice work.

Thanks, sounds fantastic. If you can share anything like scripts, that would be great.

I'm using the Hugging Face Trainer and this LoRA config for Qwen models:

from peft import LoraConfig, get_peft_model

# Initialize LoRA configuration
config = LoraConfig(
    r=8, 
    lora_alpha=32, 
    target_modules=[
      "q_proj",
      "k_proj",
      "v_proj",
      "o_proj",
      # "self_attn.rotary_emb.inv_freq",
      "gate_proj",
      "up_proj",
      "down_proj",
      # "input_layernorm.weight",
      # "post_attention_layernorm.weight",
      # "model.norm.weight",
      # "lm_head.weight"
    ],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    use_dora=True
)
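
To actually apply the config (a sketch; `model` is assumed to be the already loaded base model):

model = get_peft_model(model, config)
model.print_trainable_parameters()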

@BenjaminBossan (Member, Author)

Very nice. Is there a PR for this yet for HF?

The code is published; there is a function here that you can simply call on your LoRA model and optimizer class:

https://github.com/nikhil-ghosh-berkeley/loraplus/blob/c8d30388f5d1ec6d8c93ff67330e4d36937d48dc/loraplus.py#L31

So far, the PEFT repo does not contain any code related to training directly, as we want to keep it agnostic with regard to the training method. If there is a big interest though, we could think about adding this function as a helper function for convenience. But that's a different topic than this PR ;)

@BenjaminBossan changed the title from "[WIP] Try to implement DoRA" to "[WIP] Implement DoRA" on Feb 22, 2024
@BenjaminBossan (Member, Author)

Thanks @RonanKMcGovern. It would be much appreciated if you could check whether the last few changes helped with the runtime performance.

@aliencaocao

@BenjaminBossan On an RTX 3090 fine-tuning LLaVA 1.6 Mistral 7B, I get half the throughput as without DoRA but with LoRA.

@sayakpaul (Member)

And with DoRA?

@aliencaocao

What I meant was: with DoRA it is half the speed as without DoRA but with LoRA.

@sayakpaul (Member)

Oh okay. On the same rank?

@sayakpaul (Member)

If not, maybe try with a lower rank which should hopefully compensate for the throughput without disturbing the performance.

@aliencaocao

Yes, same rank.

@nbasyl commented Feb 28, 2024

Hi, one way to further reduce the GPU cost (# of trainable parameters) as well as the training time is to only finetune the magnitude component of certain modules while finetuning both the magnitude and directional component for the remaining modules. You can refer to Sec.5.6 of the paper for more details. This feature can be added in another PR to give users more flexibility.

@aliencaocao

So is double the cost to be expected here because there are double the trainable params?

@nbasyl commented Feb 28, 2024

If the same rank is utilized, the number of trainable parameters is only slightly higher, approximately 0.01%, compared to LoRA. However, the current computation overhead arises from the need to calculate the weight's norm as well as the second term in the given formulation (we can't reuse the first term (base_result) due to the dropout alignment issue).
[Image: the DoRA formulation from the paper, which decomposes the updated weight as W' = m · (W0 + BA) / ||W0 + BA||_c]
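
To make the overhead concrete, here is a toy sketch (not the exact PEFT code) of the DoRA forward pass; the names mirror this PR (lora_A, lora_B, scaling, magnitude), everything else is made up for the example:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
in_f, out_f, r, scaling = 16, 8, 4, 0.5
base = nn.Linear(in_f, out_f, bias=False)
lora_A = nn.Linear(in_f, r, bias=False)
lora_B = nn.Linear(r, out_f, bias=False)
magnitude = nn.Parameter(base.weight.norm(p=2, dim=1))  # DoRA magnitude vector m
dropout = nn.Dropout(p=0.1)

x = torch.randn(2, in_f)
x_d = dropout(x)

base_result = base(x)                          # W0 x, computed without dropout
lora_result = lora_B(lora_A(x_d)) * scaling    # BA x', computed with dropout

# Extra work compared to LoRA: the column-wise weight norm ...
lora_weight = lora_B.weight @ lora_A.weight
weight_norm = (base.weight + scaling * lora_weight).norm(p=2, dim=1).detach()
mag_scale = (magnitude / weight_norm).view(1, -1)

# ... and a second matmul with W0 on the dropped-out input, since base_result
# (computed without dropout) cannot be reused for this term.
dora_result = (mag_scale - 1) * F.linear(x_d, base.weight) + mag_scale * lora_result
output = base_result + dora_result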

@nbasyl commented Feb 28, 2024

Hi, one way to further reduce the GPU cost (# of trainable parameters) as well as the training time is to only finetune the magnitude component of certain modules while finetuning both the magnitude and directional component for the remaining modules. You can refer to Sec.5.6 of the paper for more details. This feature can be added in another PR to give users more flexibility.

@BenjaminBossan, I have already implemented this feature. Once our code is released (should be released soon), you can take a look at it and see if you can integrate this functionality with another pull request (PR). Or we can discuss via email to start working on this new PR earlier.
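
A rough sketch of what that could look like on top of this PR, assuming the magnitude parameter keeps the lora_magnitude_vector name used here (the helper and the module keywords are hypothetical):

def restrict_to_magnitude(model, module_keywords=("up_proj", "down_proj")):
    # For the selected modules, freeze the directional update (lora_A/lora_B);
    # since lora_B starts at zero, the direction stays that of W0 and only the
    # magnitude vector is trained for these modules.
    for name, param in model.named_parameters():
        if any(key in name for key in module_keywords):
            if "lora_A" in name or "lora_B" in name:
                param.requires_grad = False
            elif "lora_magnitude_vector" in name:
                param.requires_grad = True
    return model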

@BenjaminBossan (Member, Author)

On an RTX 3090 fine-tuning LLaVA 1.6 Mistral 7B, I get half the throughput as without DoRA but with LoRA.

Thanks for the feedback. On my puny 2060 with bloomz-560m and fp16, I got 15-20% slowdown during training with DoRA enabled, same rank. So there seem to be pretty big differences based on the exact model or settings. @nbasyl do you have some numbers that you could share?

one way to further reduce the GPU cost (# of trainable parameters) as well as the training time is to only finetune the magnitude component of certain modules while finetuning both the magnitude and directional component for the remaining modules. You can refer to Sec.5.6 of the paper for more details.

It would make the whole thing more complex. Are the gains big enough to be worth it?

I'll definitely take a closer look and check what else we can do to improve runtime performance while keeping the code flexible and maintainable.

@RonanKMcGovern

BTW, this merge is causing some breaking changes in packages that use PEFT's code, such as Unsloth: unslothai/unsloth#201

TypeError                                 Traceback (most recent call last)
Cell In[23], line 2
      1 # Do model patching and add fast LoRA weights
----> 2 model = FastLanguageModel.get_peft_model(
      3     model,
      4     r = 8,
      5     target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
      6                       "gate_proj", "up_proj", "down_proj",],
      7     lora_alpha = 32,
      8     lora_dropout = 0, # Dropout = 0 is currently optimized
      9     bias = "none",    # Bias = "none" is currently optimized
     10     use_gradient_checkpointing = True,
     11     random_state = 3407,
     12 )

File /usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py:1313, in FastLlamaModel.get_peft_model(model, r, target_modules, lora_alpha, lora_dropout, bias, layers_to_transform, layers_pattern, use_gradient_checkpointing, random_state, max_seq_length, use_rslora, init_lora_weights, loftq_config, **kwargs)
   1310 if not SUPPORTS_RSLORA: del arguments["use_rslora"]
   1312 lora_config = LoraConfig(**arguments)
-> 1313 model = _get_peft_model(model, lora_config)
   1315 model = FastLlamaModel.patch_peft_model(model, use_gradient_checkpointing)
   1316 return model

File /usr/local/lib/python3.10/dist-packages/peft/mapping.py:136, in get_peft_model(model, peft_config, adapter_name, mixed)
    134 if peft_config.is_prompt_learning:
    135     peft_config = _prepare_prompt_learning_config(peft_config, model_config)
--> 136 return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)

File /usr/local/lib/python3.10/dist-packages/peft/peft_model.py:1059, in PeftModelForCausalLM.__init__(self, model, peft_config, adapter_name)
   1058 def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default") -> None:
-> 1059     super().__init__(model, peft_config, adapter_name)
   1060     self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation

File /usr/local/lib/python3.10/dist-packages/peft/peft_model.py:126, in PeftModel.__init__(self, model, peft_config, adapter_name)
    124     self._peft_config = None
    125     cls = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type]
--> 126     self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
    127     self.set_additional_trainable_modules(peft_config, adapter_name)
    129 if getattr(model, "is_gradient_checkpointing", True):

File /usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py:111, in LoraModel.__init__(self, model, config, adapter_name)
    110 def __init__(self, model, config, adapter_name) -> None:
--> 111     super().__init__(model, config, adapter_name)

File /usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py:147, in BaseTuner.__init__(self, model, peft_config, adapter_name)
    144         self.peft_config.update(peft_config)
    146 self.active_adapter = adapter_name
--> 147 self.inject_adapter(self.model, adapter_name)
    149 # Copy the peft_config in the injected model.
    150 self.model.peft_config = self.peft_config

File /usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py:302, in BaseTuner.inject_adapter(self, model, adapter_name)
    300     is_target_modules_in_base_model = True
    301     parent, target, target_name = _get_submodules(model, key)
--> 302     self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)
    304 if not is_target_modules_in_base_model:
    305     raise ValueError(
    306         f"Target modules {peft_config.target_modules} not found in the base model. "
    307         f"Please check the target modules and try again."
    308     )

File /usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py:182, in LoraModel._create_and_replace(self, lora_config, adapter_name, target, target_name, parent, current_key)
    172     target.update_layer(
    173         adapter_name,
    174         r,
   (...)
    179         use_dora=lora_config.use_dora,
    180     )
    181 else:
--> 182     new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
    183     if adapter_name != self.active_adapter:
    184         # adding an additional adapter: it is not automatically trainable
    185         new_module.requires_grad_(False)

File /usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py:257, in LoraModel._create_new_module(lora_config, adapter_name, target, **kwargs)
    255 new_module = None
    256 for dispatcher in dispatchers:
--> 257     new_module = dispatcher(target, adapter_name, lora_config=lora_config, **kwargs)
    258     if new_module is not None:  # first match wins
    259         break

File /usr/local/lib/python3.10/dist-packages/peft/tuners/lora/layer.py:858, in dispatch_default(target, adapter_name, lora_config, **kwargs)
    856         kwargs["fan_in_fan_out"] = lora_config.fan_in_fan_out = False
    857     kwargs.update(lora_config.loftq_config)
--> 858     new_module = Linear(target, adapter_name, **kwargs)
    859 elif isinstance(target_base_layer, Conv1D):
    860     if not kwargs["fan_in_fan_out"]:

File /usr/local/lib/python3.10/dist-packages/peft/tuners/lora/layer.py:289, in Linear.__init__(self, base_layer, adapter_name, r, lora_alpha, lora_dropout, fan_in_fan_out, is_target_conv_1d_layer, init_lora_weights, use_rslora, use_dora, **kwargs)
    286 self.fan_in_fan_out = fan_in_fan_out
    288 self._active_adapter = adapter_name
--> 289 self.update_layer(
    290     adapter_name,
    291     r,
    292     lora_alpha=lora_alpha,
    293     lora_dropout=lora_dropout,
    294     init_lora_weights=init_lora_weights,
    295     use_rslora=use_rslora,
    296     use_dora=use_dora,
    297 )
    298 self.is_target_conv_1d_layer = is_target_conv_1d_layer

TypeError: LoraLayer_update_layer() got an unexpected keyword argument 'use_dora'

@nbasyl commented Feb 29, 2024

On an RTX 3090 fine-tuning LLaVA 1.6 Mistral 7B, I get half the throughput as without DoRA but with LoRA.

Thanks for the feedback. On my puny 2060 with bloomz-560m and fp16, I got 15-20% slowdown during training with DoRA enabled, same rank. So there seem to be pretty big differences based on the exact model or settings. @nbasyl do you have some numbers that you could share?

one way to further reduce the GPU cost (# of trainable parameters) as well as the training time is to only finetune the magnitude component of certain modules while finetuning both the magnitude and directional component for the remaining modules. You can refer to Sec.5.6 of the paper for more details.

It would make the whole thing more complex. Are the gains big enough to be worth it?

I'll definitely take a closer look and check what else we can do to improve runtime performance while keeping the code flexible and maintainable.

I have tried fine-tuning LLaMA-7B/13B on 3090/4090/V100/A100 and also only got around a 20% slowdown. I am assuming that such a drastic slowdown for @aliencaocao is probably caused by an optimization issue with DeepSpeed, which is used by the LLaVA code base.

@nbasyl commented Feb 29, 2024

On an RTX 3090 fine-tuning LLaVA 1.6 Mistral 7B, I get half the throughput as without DoRA but with LoRA.

Thanks for the feedback. On my puny 2060 with bloomz-560m and fp16, I got 15-20% slowdown during training with DoRA enabled, same rank. So there seem to be pretty big differences based on the exact model or settings. @nbasyl do you have some numbers that you could share?

one way to further reduce the GPU cost (# of trainable parameters) as well as the training time is to only finetune the magnitude component of certain modules while finetuning both the magnitude and directional component for the remaining modules. You can refer to Sec.5.6 of the paper for more details.

It would make the whole thing more complex. Are the gains big enough to be worth it?

I'll definitely take a closer look and check what else we can do to improve runtime performance while keeping the code flexible and maintainable.

The gain here is not an accuracy improvement but a reduction in the number of trainable parameters; you can refer to this table.
[Image: table from the paper showing the reduced number of trainable parameters when only the magnitude is tuned for some modules]
Besides, in my case I implemented this feature within the PEFT framework and didn't modify the code much, so I think it wouldn't be as complicated as you thought.

@peterjc123 commented Feb 29, 2024

According to my experiment, DoRA is ~2x slower than LoRA, and I am able to achieve a lower loss with DoRA. But it is a bit less worthwhile compared to training with QLoRA on a larger model, because that gives a much lower loss, the GPU memory usage is only slightly larger, and it is only 20%-30% slower.
Some numbers & settings:
GPU: 1x Nvidia 3080 (10GB)
LoRA rank: 64
LoRA target modules: All except lm_head & embeddings
LoRA dropout: 0.05
Model: Qwen 1.5 - 1.8B
Max Length: 512
Packing: True
Gradient Checkpointing: True
Batch: 2
Gradient Accumulation Steps: 8
Optim: Adam8Bit
GPU mem usage: ~6800MB
Use Flash Attention2: True
Dtype: BFloat16
LoRA timings: 2.5-2.7 s/Iter
DoRA timings: 5.3-5.5 s/Iter
4-bit QLoRA timings on Qwen 1.5-4B: 6.9-7.1 s/Iter

@BenjaminBossan (Member, Author)

@nbasyl Okay, this suggestion sounds good for the purpose of saving on the number of trainable parameters (even if the runtime would probably not change much). I'm not sure yet how to configure this so that it would work with most or all model architectures; we'd probably have to add a new config argument for that. If your code is to be released soon, we can wait for that and add it to PEFT afterwards.

When it comes to VeRA, would DVoRA help with reducing the overhead of calculating the weight norm (since W0 and BA are fixed) or is it the same cost as DoRA?

@peterjc123 Thanks for providing your settings. I tried to replicate (using Qwen1.5-0.5B for memory reasons) and got ~30% (2060) to ~40% (T4) slower training with DoRA activated.

it is a bit less worthwhile compared to training with QLoRA on a larger model

I'll investigate if we can make DoRA work with bnb. It could be a bit tricky when it comes to calculating the weight norm, let's see.

@BenjaminBossan (Member, Author)

I created a PR to support DoRA with bnb (QDoRA): #1518

I tested it on a small use case and it worked. If someone wants to give it a spin and report the results, that would be fantastic.
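
For anyone wanting to give it a spin, a sketch of a minimal QDoRA setup (assuming the branch from #1518 is installed; the model name, target modules, and hyperparameters are just placeholders):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
    use_dora=True,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()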

@152334H commented Mar 2, 2024

When launching LLaVA training (using deepspeed --include 'localhost:0,1,3,4') with use_dora=True added here, DoRA works perfectly on ZeRO3 but mysteriously deadlocks on ZeRO2.

ZeRO2

With a ZeRO 2 config, I get a deadlock:

Loading checkpoint shards: 100%|__________________________________________________________________________________| 4/4 [00:03<00:00,  1.04it/s]
Loading checkpoint shards: 100%|__________________________________________________________________________________| 4/4 [00:03<00:00,  1.09it/s]
Adding LoRA adapters...
Loading checkpoint shards: 100%|__________________________________________________________________________________| 4/4 [00:04<00:00,  1.00s/it]
Loading checkpoint shards: 100%|__________________________________________________________________________________| 4/4 [00:03<00:00,  1.15it/s]
# < --- waiting for 10 minutes here, nothing happens

ZeRO3

With a ZeRO 3 config, I mysteriously do not (??) get a deadlock:

[2024-03-02 12:41:55,286] [INFO] [partition_parameters.py:343:__exit__] finished initializing model - num_params = 687, num_elems = 7.57B
Loading checkpoint shards: 100%|__________________________________________________________________________________| 4/4 [00:07<00:00,  1.87s/it]
Loading checkpoint shards: 100%|__________________________________________________________________________________| 4/4 [00:07<00:00,  1.89s/it]
Loading checkpoint shards: 100%|__________________________________________________________________________________| 4/4 [00:07<00:00,  1.89s/it]
Loading checkpoint shards: 100%|__________________________________________________________________________________| 4/4 [00:07<00:00,  1.96s/it]
Adding LoRA adapters...
openai/clip-vit-large-patch14-336 is already loaded, `load_model` called again, skipping.
openai/clip-vit-large-patch14-336 is already loaded, `load_model` called again, skipping.
openai/clip-vit-large-patch14-336 is already loaded, `load_model` called again, skipping.
openai/clip-vit-large-patch14-336 is already loaded, `load_model` called again, skipping.
Formatting inputs...Skip in lazy mode
Formatting inputs...Skip in lazy mode
Parameter Offload: Total persistent parameters: 7418880 in 521 params
# < --- (training run proceeds afterwards here)

@peterjc123 commented Mar 3, 2024

I created a PR to support DoRA with bnb (QDoRA): #1518

I tested it on a small use case and it worked. If someone wants to give it a spin and report the results, that would be fantastic.

@BenjaminBossan

With bf16=True, your implementation OOMs with batch = 2 and the configuration I posted above. It also throws the following warning.

The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in bfloat16.

And then, I debugged your code and found out that the dtype of weight_norm in apply_dora is torch.float32. So the dtype of the output of the LoRA layer is torch.float32, too.

-> weight_norm = self._get_weight_norm(weight, lora_weight, scaling)
(Pdb) s
--Call--
> peft/src/peft/tuners/lora/layer.py(172)_get_weight_norm()
-> def _get_weight_norm(self, weight, lora_weight, scaling) -> torch.Tensor:
(Pdb) n
> peft/src/peft/tuners/lora/layer.py(174)_get_weight_norm()
-> weight = weight + scaling * lora_weight
(Pdb) p weight.dtype
torch.bfloat16
(Pdb) p scaling
0.25
(Pdb) p lora_weight.dtype
torch.bfloat16
(Pdb) n
> peft/src/peft/tuners/lora/layer.py(175)_get_weight_norm()
-> weight_norm = torch.linalg.norm(weight, dim=1)
(Pdb) p weight.dtype
torch.bfloat16
(Pdb) n
> peft/tuners/lora/layer.py(176)_get_weight_norm()
-> return weight_norm
(Pdb) p weight_norm.dtype
torch.float32

As you can see, torch.linalg.norm returns a float32 tensor regardless of the bfloat16 input (probably because AMP is in effect?). So I have to change that line to

weight_norm = torch.linalg.norm(weight, dim=1).to(weight.dtype)

After fixing this, it is still super slow, yielding about 12.96 s/it.
Update: And the sad thing is that the validation loss of the model trained with DoRA+QLoRA is actually worse than that of the model trained with the original QLoRA. I guess I'll try something like LoftQ first to get better initial values for the quantized weight matrix.

@BenjaminBossan (Member, Author)

When launching LLaVA training (using deepspeed --include 'localhost:0,1,3,4') with use_dora=True added here, DoRA works perfectly on ZeRO3 but mysteriously deadlocks on ZeRO2.

Very strange, typically PEFT has issues with ZeRO3, not ZeRO2. I don't know enough about DS to tell what could be the cause of the deadlock here.

So I have to change that line to

Thanks for investigating, I pushed this change to the PR.

After fixing this, it is still super slow, yields about 12.96s/it.

That's unfortunate, but not unexpected. The issue is that we need to have an additional dequantization step for DoRA, as we have to calculate the weight norm of the quantized weight + LoRA. I couldn't come up with a way to avoid this or somehow cache the results. If you or someone else has an idea, please let me know.
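
To spell out where the extra time goes, a small self-contained sketch (dequantize here is a stand-in for whatever bnb dequantization step is actually used, not a real API):

import torch

def dequantize(quant_weight):
    # Hypothetical stand-in: in the real QDoRA path this materializes the
    # 4-bit base weight as a dense bf16/fp16 tensor.
    return quant_weight.float()

quant_weight = torch.randn(8, 16)   # pretend this is the quantized base weight
lora_A = torch.randn(4, 16)
lora_B = torch.randn(8, 4)
scaling = 0.5

# DoRA needs the norm of (W0 + scaling * BA), so W0 must exist in dense form;
# plain (Q)LoRA never needs this during the forward pass.
W = dequantize(quant_weight)
weight_norm = (W + scaling * lora_B @ lora_A).norm(p=2, dim=1)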

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Mar 14, 2024
Add DoRA (Weight-Decomposed Low-Rank Adaptation).

https://arxiv.org/abs/2402.09353

To use this with LoRA, add use_dora=True to the LoraConfig.

Currently only supports nn.Linear layers, not other types or
quantized linear layers like bnb.
lora_weight = lora_B.weight @ lora_A.weight
magnitude = self.lora_magnitude_vector[active_adapter]
weight = self.get_base_layer().weight
weight_norm = self._get_weight_norm(weight, lora_weight, scaling)

Should this be the following? There are a few other places where we need to do the same; otherwise we get mismatching dimensions.

weight_norm = self._get_weight_norm(transpose(weight, self.fan_in_fan_out), lora_weight, scaling)

@BenjaminBossan (Member, Author)

Thanks for pointing this out. Indeed, models that use Conv1D, like GPT-2, wouldn't work right now. I created a PR to fix this: #1588.
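
For context, a minimal illustration of why the transpose matters for Conv1D-style layers (fan_in_fan_out=True), where the weight is stored as (in_features, out_features) rather than (out_features, in_features):

import torch

out_f, in_f, r = 8, 16, 4
lora_weight = torch.randn(out_f, r) @ torch.randn(r, in_f)   # shape (out_f, in_f)

linear_weight = torch.randn(out_f, in_f)   # nn.Linear layout
conv1d_weight = torch.randn(in_f, out_f)   # transformers Conv1D layout

# The norm has to be taken over weights in the (out_f, in_f) layout, so the
# Conv1D weight must be transposed before adding the LoRA delta:
ok = (conv1d_weight.T + lora_weight).norm(p=2, dim=1)
# conv1d_weight + lora_weight would raise a shape mismatch error.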
