
DeepSpeed Zero-3 is not compatible with low_cpu_mem_usage=True or with passing a device_map. #306

Closed
rafael-ariascalles opened this issue Apr 13, 2023 · 3 comments

Comments

@rafael-ariascalles

I am trying to train a flan-t5-xxl model following "INT8 training of large models in Colab using PEFT LoRA and bits_and_bytes", but on multiple GPUs with DeepSpeed and Accelerate.

I instantiate the model as:

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name_or_path,
    load_in_8bit=True,
    device_map={'': 0},
    torch_dtype=torch.float16
)

but I get the following error:

ValueError: DeepSpeed Zero-3 is not compatible with `low_cpu_mem_usage=True` or with passing a `device_map`.
[13:55:22] ERROR    failed (exitcode: 1) local_rank: 0 (pid: 5364) of binary: /opt/conda/envs/pytorch/bin/python3.9   

The DeepSpeed config:

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: false
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_config: {}
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp8
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Any advice, @pacman100?

I was following the advice in thread #93 (comment)
to add device_map.

@pacman100
Contributor

Hello @rafael-ariascalles, as the error suggests, DeepSpeed can't be used together with device_map or low_cpu_mem_usage. The reason is that device_map/low_cpu_mem_usage lead to naive model (pipeline) parallelism, while DeepSpeed is meant for sharded data parallelism; because of how they are implemented, the two can't be combined. INT8 + DeepSpeed also isn't supported.

You can use PEFT + Gradient Checkpointing + DeepSpeed ZeRO-3 for your use case.
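In practice that means stripping the conflicting kwargs before calling from_pretrained under ZeRO-3. A minimal sketch (the helper name zero3_compatible_kwargs and the exact set of conflicting kwargs are my own framing, derived from the error message and the explanation above):

```python
def zero3_compatible_kwargs(kwargs):
    """Drop from_pretrained kwargs that conflict with DeepSpeed ZeRO-3.

    device_map / low_cpu_mem_usage imply naive pipeline parallelism,
    and load_in_8bit (INT8) is likewise unsupported under ZeRO-3.
    """
    conflicting = {"device_map", "low_cpu_mem_usage", "load_in_8bit"}
    return {k: v for k, v in kwargs.items() if k not in conflicting}


# Under ZeRO-3 the model would then be loaded without those kwargs, e.g.:
# model = AutoModelForSeq2SeqLM.from_pretrained(
#     model_name_or_path,
#     **zero3_compatible_kwargs(dict(
#         load_in_8bit=True, device_map={"": 0}, torch_dtype=torch.float16)),
# )
```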

@rafael-ariascalles
Author

Thanks, I’ll try it that way.

@FareenaFatima

Hi, I am also getting the same error. I have also installed accelerate from pip, but nothing seems to work. Please help me, as I am a newbie.

Cell In[84], line 13
11 checkpoint = "LaMini-T5-738M"
12 Tokenizer = AutoTokenizer.from_pretrained(checkpoint)
---> 13 model = AutoModelForSeq2SeqLM.from_pretrained(
14 checkpoint, torch_dtype=torch.float16, device_map='cpu', low_cpu_mem_usage = True)

File c:\Users\user\Desktop\ChatPDF\venv\Lib\site-packages\transformers\models\auto\auto_factory.py:493, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
491 elif type(config) in cls._model_mapping.keys():
492 model_class = _get_model_class(config, cls._model_mapping)
--> 493 return model_class.from_pretrained(
494 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
495 )
496 raise ValueError(
497 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
498 f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
499 )

File c:\Users\user\Desktop\ChatPDF\venv\Lib\site-packages\transformers\modeling_utils.py:2251, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
2247 raise ValueError(
2248 "DeepSpeed Zero-3 is not compatible with low_cpu_mem_usage=True or with passing a device_map."
2249 )
2250 elif not is_accelerate_available():
-> 2251 raise ImportError(
...
2258 return_unused_kwargs=True,
2259 **kwargs,
2260 )

ImportError: Using low_cpu_mem_usage=True or a device_map requires Accelerate: pip install accelerate
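Note that this second traceback is a different failure from the issue title: transformers raises the ImportError because Accelerate isn't importable in the active virtualenv, so the fix is to run pip install accelerate in the same venv the notebook uses. A small stdlib-only check (the helper name has_accelerate is my own) to confirm before re-running the cell:

```python
import importlib.util


def has_accelerate():
    """True if the `accelerate` package is importable from this interpreter."""
    return importlib.util.find_spec("accelerate") is not None


# If this prints False, install it into the SAME venv the notebook kernel uses:
#     pip install accelerate
print(has_accelerate())
```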

3 participants