
fix: Load 8-bit quantized models for eval after fine-tuning #3606

Merged: 8 commits merged from 8bit-quant-load-error into master on Sep 15, 2023

Conversation

jeffkinnison (Contributor)

Errors

After training or fine-tuning, the best model checkpoint is loaded for evaluation. When loading an 8-bit quantized model that was fine-tuned on GPU, the following errors occur:

  1. With no handling, the call to load_state_dict here raises RuntimeError: "Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()".
  2. If error 1 is handled, load_state_dict returns a number of unexpected keys and an AssertionError is raised.

Causes

These issues can be reproduced by running tests/integration_tests/test_llm.py::test_llm_finetuning_strategies with 8-bit quantization. Both are the result of custom handling in bitsandbytes:

  1. Moving an 8-bit parameter object to GPU creates a number of metadata matrices behind the scenes. These are added to model state on the fly during the move to GPU, and thus do not exist in a version of the model that has not been put on GPU. A check during load_state_dict in bitsandbytes raises the RuntimeError.
  2. When saving a state dict, bitsandbytes adds a number of weight_format entries behind the scenes. These are metadata entries used in load_state_dict to reconstruct the quantized parameters. Since the weight_format entries are never registered in model state, they are returned in the unexpected_keys list on load. On load for eval, we assert that no unexpected keys were returned. (A minimal PyTorch-only analogue of this behavior is sketched after this list.)
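
To make cause 2 concrete, here is a minimal PyTorch-only analogue (no bitsandbytes involved; the Toy module and add_metadata name are purely illustrative): a module that registers an extra buffer on the fly produces state-dict entries that a freshly constructed module does not know about, so load_state_dict reports them as unexpected keys.

import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def add_metadata(self):
        # Stand-in for the metadata bitsandbytes attaches behind the scenes
        # (e.g. the weight_format entries described above).
        self.linear.register_buffer("weight_format", torch.tensor(0))

saver = Toy()
saver.add_metadata()
state = saver.state_dict()  # contains "linear.weight_format"

loader = Toy()  # never created the metadata entry
missing, unexpected = loader.load_state_dict(state, strict=False)
print(unexpected)  # ['linear.weight_format'] -> trips a strict "no unexpected keys" assertion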

Workaround

This update introduces a workaround that addresses both issues. For 8-bit quantized models only, at the call to load_state_dict we first move the model to GPU and back to solve issue 1, then ensure that the only unexpected keys are weight_format keys to handle issue 2. This should unblock 8-bit quantization for the time being, though we should double-check model quality.
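
As a rough sketch of the approach (illustrative only, not the exact Ludwig code; the function name, checkpoint_path argument, and the weight_format key filter follow the description above):

import torch

def load_best_checkpoint_for_eval(model: torch.nn.Module, checkpoint_path: str) -> None:
    state_dict = torch.load(checkpoint_path, map_location="cpu")

    # Issue 1: bitsandbytes only attaches the 8-bit metadata matrices once the module
    # is on GPU, and refuses to load a quantized checkpoint into a module it considers
    # non-quantized. Moving the model to GPU first avoids the RuntimeError.
    if torch.cuda.is_available():
        model.cuda()

    # Issue 2: the saved checkpoint contains extra weight_format metadata entries that
    # are never registered in module state, so they come back as unexpected keys.
    # Tolerate those, and only those.
    missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)
    only_weight_format_keys = all("weight_format" in k for k in unexpected_keys)
    assert not unexpected_keys or only_weight_format_keys, (
        f"Unexpected non-weight_format keys found in state dict: {unexpected_keys}"
    )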


github-actions bot commented Sep 13, 2023

Unit Test Results

  6 files ±0, 6 suites ±0, duration 38m 58s ⏱️ (-6m 2s)
  31 tests ±0: 26 passed ✔️ ±0, 5 skipped 💤 ±0, 0 failed ±0
  82 runs ±0: 66 passed ✔️ ±0, 16 skipped 💤 ±0, 0 failed ±0

Results for commit f037e0f. Comparison against base commit d15a0c5.

♻️ This comment has been updated with latest results.

Comment on lines 891 to 893
if torch.cuda.is_available():
    self.model.model.cuda()
    self.model.model.cpu()
Contributor

Ooof, maybe one callout here might be that the .cuda() call is unique and overridden for the Linear8bitLt layers, which internally does some stuff for the 8-bit parameters?

Collaborator

Yeah, it's not clear to me why we need to move to GPU then back to CPU like this. Comment would be great so I don't need to read the full PR description.

Contributor Author

Added a comment and removed the move to CPU. It turns out the model was on GPU all along: self.model.device reports that it is on CPU, but self.model.model.device and deeper modules in the model all report that they are on GPU.
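
(For reference, a quick and purely illustrative way to surface this kind of per-submodule device mismatch with plain PyTorch; report_devices is not part of Ludwig:)

import torch

def report_devices(model: torch.nn.Module) -> None:
    # Print the device(s) of each parameter-holding submodule; a wrapper can sit on CPU
    # while the wrapped model's parameters already live on GPU.
    for name, module in model.named_modules():
        params = list(module.parameters(recurse=False))
        if params:
            print(name or "<root>", {str(p.device) for p in params})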

only_weights_format_keys = all("weights_format" in k for k in unexpected_keys)
assert (
    unexpected_keys == [] or only_weights_format_keys
), f"Unexpected keys found in state dict: {unexpected_keys}"
Collaborator

Add something about the only_weights_format_keys to the error message.

Contributor Author

Added in b8b487d

Collaborator

Hmm, am I missing something? I still don't see anything in the assert message about it.

# to a RuntimeError in `load_state_dict`. Explicitly call `model.cuda()` to make sure the
# matrices are part of model state. This workaround is necessary because the matrices are
# deleted during the model's forward pass.
if self.device == torch.device("cuda"):
Collaborator

Check self.device.type == "cuda" as the device might be cuda:0, etc.
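
(For context, a small plain-PyTorch illustration of why the exact-equality check is fragile; this is not the PR's code:)

import torch

device = torch.device("cuda:0")

# Exact equality fails once a device index is attached...
print(device == torch.device("cuda"))  # False

# ...while comparing the device type matches any CUDA device.
print(device.type == "cuda")  # True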

tgaddair merged commit fe2f306 into master on Sep 15, 2023
16 of 17 checks passed
tgaddair deleted the 8bit-quant-load-error branch on September 15, 2023 at 04:01