
DeepSpeed Zero3 and Peft LoRA fp16 issue #138

@conceptofmind

Hi all,

I am having an issue when running PEFT LoRA with DeepSpeed ZeRO-3.

Error:

ValueError: fp16 is enabled but the following parameters have dtype that is not fp16: base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_A.weight, 
base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.1.attention.query_key_value.lora_A.weight, 
base_model.model.gpt_neox.layers.1.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.2.attention.query_key_value.lora_A.weight, 
base_model.model.gpt_neox.layers.2.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.3.attention.query_key_value.lora_A.weight,
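
From the list, the mismatch appears to be limited to the LoRA adapter weights (lora_A / lora_B), which PEFT creates in fp32. DeepSpeed's sanity check can be mirrored with a small loop, assuming model is the PEFT-wrapped model from the reproduction code below:

import torch

# Mirror DeepSpeed's check: list every parameter that is not fp16.
# Assumes model is the PEFT-wrapped model built in the code below.
for name, param in model.named_parameters():
    if param.dtype != torch.float16:
        print(name, param.dtype)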

How to reproduce:

CLI:

deepspeed finetune_pythia.py --per_device_train_batch_size 1 --output_dir /home/training_scripts/pythia-1.4b --fp16 --deepspeed configs/ds_z3_config.json

Code:

from transformers import AutoTokenizer, GPTNeoXForCausalLM
from peft import get_peft_model, LoraConfig, TaskType

# target_modules is left unset, so PEFT falls back to its default mapping
# for gpt_neox ("query_key_value"), matching the parameter names in the error.
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-1.4b")
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
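
For context, the flags passed on the CLI above are standard transformers TrainingArguments, so the training side of the script reduces to the usual Trainer wiring. A simplified sketch (the dataset and collator are omitted; train_dataset is a placeholder):

from transformers import HfArgumentParser, Trainer, TrainingArguments

# Parse --per_device_train_batch_size, --output_dir, --fp16 and --deepspeed
# into TrainingArguments; the "auto" fields in the DeepSpeed config below
# are filled in from these values by the Trainer's DeepSpeed integration.
parser = HfArgumentParser(TrainingArguments)
(training_args,) = parser.parse_args_into_dataclasses()

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: real dataset omitted
)
trainer.train()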

DeepSpeed ZeRO-3 config:

{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },

    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },

    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },

    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },

    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
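
One workaround I am considering (untested) is casting the trainable adapter weights down to fp16 before training starts, so they match what DeepSpeed expects:

import torch

# Untested workaround sketch: after get_peft_model, only the LoRA adapter
# weights are trainable, so this casts exactly the fp32 parameters that
# DeepSpeed's fp16 check complains about.
for param in model.parameters():
    if param.requires_grad and param.dtype == torch.float32:
        param.data = param.data.to(torch.float16)

I am not sure this is the intended way to combine PEFT with ZeRO-3, though.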

Any help would be greatly appreciated.

Thank you,

Enrico
