More convenient way to initialize LoftQ #1543
Merged: BenjaminBossan merged 14 commits into huggingface:main from BenjaminBossan:loftq-more-convenient-initialization on Mar 20, 2024
Commits on Mar 7, 2024
- [WIP] More convenient way to initialize LoftQ
Related to huggingface#1532.

At the moment, using LoftQ is quite cumbersome, as shown in this example: https://github.com/huggingface/peft/tree/7e84dec20b3106bdd0a90ba8e80187f0aec835b7/examples/loftq_finetuning

Essentially, users have to:

1. Load the non-quantized model with LoftQ (which can be quite huge).
2. Modify the PEFT config.
3. Save the adapter.
4. Unwrap the base model with custom functions.
5. Save the base model with the modified weights (i.e. a whole copy of the base model).
6. Load the base model from step 5 with bnb quantization.
7. Load the adapter from step 3.

Yes, there is a helper script to do this, but it still has the disadvantages that we need to load the non-quantized model and that we have to create a completely new model checkpoint with the modified weights.

This PR aims to make this process more convenient by adding a single function, replace_lora_weights_loftq. This function takes the bnb-quantized LoRA model as input. It then goes through each module with LoRA weights, lazily loads the corresponding non-quantized weights one at a time using safetensors, computes the quantization error, and replaces the LoRA weights with LoftQ-initialized LoRA weights. This is much more convenient because lazy loading means we require very little extra memory, and we don't have to keep an extra copy of the weights.

While working on this, I still found that LoftQ initialization often did not seem to help a lot, as mentioned in huggingface#1532. I measured this by creating (1) logits with the base model, (2) logits with the quantized+LoRA model, and (3) logits with the quantized+LoRA+LoftQ model. The expectation is that (1) should be closer to (3) than to (2). This was often not the case. I therefore added the possibility to run a check each time we replace a LoRA weight with the LoftQ weights: if the check returns True, we proceed to the next weight; otherwise, we discard the change. That way, we only make the replacement with LoftQ weights if we see a real improvement.

Of course, this is only a form of greedy optimization, but it seems to work in practice. And since it's optional, users can choose not to use it.

This PR is not yet finished, since I ran into an issue with the key names from safetensors not matching. Furthermore, for now this doesn't support 8-bit quantization or the num_iter argument of LoftQ, which I'm not sure really works. However, I guess the replace_lora_weights_loftq function could be called multiple times in a row.
(commit cd597ca)
- 127bb44
- 897fd71
Commits on Mar 11, 2024
- Apply suggestions from code review (commit 2f287da)
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Commits on Mar 12, 2024
- c52624a
- fc11323
Commits on Mar 13, 2024
- bc6e0ea
Commits on Mar 19, 2024
- e0cf8d2
- 8038a6d
- 3f39640
- Target all linear layers in test (commit 0ee3e38)
  Better results, bigger margins.
- d6f3d29
Commits on Mar 20, 2024
- eea22b1
- dd3cfab