Max/4bit #131

Merged
maxjeblick merged 19 commits into main from max/4bit on Jun 5, 2023

Conversation

maxjeblick (Contributor)

This PR adds 4-bit training/inference.
I have tested with EleutherAI/gpt-neox-20b, where 8-bit yields OOM. I have also compared 4-bit and 8-bit training of EleutherAI/gpt-j-6B and the corresponding chat responses.

There is an issue with merging LoRA weights back that can lead to OOM errors: #130. I'll have a look and see whether it can be fixed and added to this PR.

I'm also not sure about this line: https://github.com/h2oai/h2o-llmstudio/blob/main/llm_studio/src/utils/modeling_utils.py#L72
It's currently left as-is; I haven't found any issues with it so far.
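
For context, 4-bit loading goes through transformers' BitsAndBytesConfig; a minimal, self-contained sketch (model name and settings chosen for illustration, not the exact code in this PR):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# illustrative settings; the PR wires similar options through cfg/kwargs
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",  # one of the backbones tested above
    quantization_config=quantization_config,
    device_map="auto",
)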

maxjeblick requested a review from psinger on June 1, 2023 12:00
psinger (Collaborator) commented Jun 1, 2023

> I'm also not sure about this line: https://github.com/h2oai/h2o-llmstudio/blob/main/llm_studio/src/utils/modeling_utils.py#L72
> It's currently left as-is; I haven't found any issues with it so far.

IIRC this was needed for training and loading weights in different precisions.

pascal-pfeiffer (Collaborator) commented Jun 2, 2023

Thank you, Max! I will need to test a bit more, but it's looking great so far.

Let's also add a few words about 4-bit training to the README.md?

psinger (Collaborator) commented Jun 2, 2023

Getting errors when training in 4bit and then doing inference in float16, while loading the weights. The score is correct, though.

Might be related to my comment above.

psinger (Collaborator) commented Jun 2, 2023

One more note: inference in 4bit is extremely slow for me.

Would it maybe make sense to force 8bit in the Chat window? Or, even better, make it an option? This might be a separate issue.

psinger (Collaborator) commented Jun 2, 2023

Found this:
https://github.com/huggingface/peft/blob/main/examples/fp4_finetuning/finetune_fp4_opt_bnb_peft.py

Here they are also setting llm_int8_threshold - does it have an impact in 4bit?

I think we might also be fine setting the compute dtype to float16 instead of bfloat16.

maxjeblick (Contributor, Author) commented Jun 2, 2023

> Here they are also setting llm_int8_threshold - does it have an impact in 4bit?

Should not affect 4bit training:
https://github.com/huggingface/transformers/blob/main/src/transformers/utils/bitsandbytes.py#L133

psinger (Collaborator) commented Jun 2, 2023

Not sure whether this was luck or actually has an impact, but I got better inference speed with these settings:

elif cfg.architecture.backbone_dtype == "int4":
    kwargs["device_map"] = {"": cfg.environment._device}
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        llm_int8_has_fp16_weight=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        llm_int8_threshold=0.0,
    )
    # need to force pretrained
    cfg.architecture.pretrained = True
    kwargs["torch_dtype"] = torch.float16

psinger (Collaborator) commented Jun 5, 2023

Let's also add this to int8:

kwargs["torch_dtype"] = torch.float16

And do you think bnb_4bit_use_double_quant=True is useful?
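
A minimal sketch of how the int8 branch could look with that addition, mirroring the int4 snippet above (the exact code in modeling_utils.py may differ; cfg/kwargs names are carried over, the rest is illustrative):

elif cfg.architecture.backbone_dtype == "int8":
    kwargs["device_map"] = {"": cfg.environment._device}
    quantization_config = BitsAndBytesConfig(
        load_in_8bit=True,
    )
    # quantized weights require a pretrained checkpoint
    cfg.architecture.pretrained = True
    # proposed addition: keep the non-quantized modules in float16
    kwargs["torch_dtype"] = torch.float16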

maxjeblick (Contributor, Author)

> And do you think bnb_4bit_use_double_quant=True is useful?

From here:

> Other options include bnb_4bit_use_double_quant which uses a second quantization after the first one to save an additional 0.4 bits per parameter

I haven't tested this option yet; in theory, it should allow you to finetune even larger models. I can compare inference speed and enable it if it turns out to be similar.

maxjeblick (Contributor, Author)

For h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2 (evaluation only, i.e. training with 0 epochs), bnb_4bit_use_double_quant does not seem to have any noticeable impact, neither time-wise nor max-GPU-memory-wise.
I am not sure how, if at all, it affects training performance. Let's maybe keep it disabled for now?
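
For reference, a quick back-of-the-envelope check (assuming the 0.4 bits per parameter quoted above) shows why the memory effect is hard to notice for a 7B backbone:

# memory saved by bnb_4bit_use_double_quant at ~0.4 bits/parameter
params = 7e9                    # e.g. a 7B backbone such as falcon-7b
saved_bytes = params * 0.4 / 8  # bits -> bytes
print(f"{saved_bytes / 1024**3:.2f} GiB saved")  # ~0.33 GiB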

maxjeblick (Contributor, Author)

> Getting errors when training in 4bit and then doing inference in float16.

Is this issue still present? I started a new experiment in fp16 using the previous 4bit weights; that works.

psinger (Collaborator) commented Jun 5, 2023

> Getting errors when training in 4bit and then doing inference in float16.
>
> Is this issue still present? I started a new experiment in fp16 using the previous 4bit weights; that works.

I think after you made the changes above, it works now.

psinger (Collaborator) commented Jun 5, 2023

@maxjeblick did you try pushing to HF and reloading?

maxjeblick (Contributor, Author)

> @maxjeblick did you try pushing to HF and reloading?

I tried downloading the model from the UI and subsequently loading it from the unzipped folder, following the model card example. That worked.
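
For the record, the loading pattern I mean looks roughly like this (the path is a placeholder; the exact snippet is in the generated model card):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

local_path = "path/to/unzipped_experiment"  # placeholder for the downloaded folder

tokenizer = AutoTokenizer.from_pretrained(local_path)
model = AutoModelForCausalLM.from_pretrained(
    local_path,
    torch_dtype=torch.float16,  # illustrative dtype
    device_map="auto",
)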

psinger (Collaborator) commented Jun 5, 2023

Cool, if possible let's also try the HF push.

psinger (Collaborator) left a review comment

Thanks!

maxjeblick (Contributor, Author)

Checked the HF Hub; it also works.

maxjeblick merged commit 6b140b1 into main on Jun 5, 2023
5 checks passed
maxjeblick deleted the max/4bit branch on June 5, 2023 16:04