
DeepSpeed Zero-3 is not compatible with low_cpu_mem_usage=True or with passing a device_map. #306

Closed
rafael-ariascalles opened this issue Apr 13, 2023 · 3 comments

Comments

@rafael-ariascalles

I am trying to train a flan-t5-xxl model following "INT8 training of large models in Colab using PEFT LoRA and bits_and_bytes", but on multiple GPUs with DeepSpeed and Accelerate.

I instantiate the model as:

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name_or_path,
    load_in_8bit=True,
    device_map={'': 0},
    torch_dtype=torch.float16
)

but I get the following error:

ValueError: DeepSpeed Zero-3 is not compatible with `low_cpu_mem_usage=True` or with passing a `device_map`.
[13:55:22] ERROR    failed (exitcode: 1) local_rank: 0 (pid: 5364) of binary: /opt/conda/envs/pytorch/bin/python3.9   

The DeepSpeed config:

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: false
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_config: {}
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp8
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Any advice, @pacman100?

I was following the advice in thread #93 (comment)
to add device_map.

@pacman100
Contributor

Hello @rafael-ariascalles, as the error suggests, DeepSpeed can't be used together with device_map or low_cpu_mem_usage. The reason is that device_map/low_cpu_mem_usage lead to naive model (pipeline) parallelism, while DeepSpeed is meant for sharded data parallelism; because of how they are implemented, the two can't be combined. INT8 + DeepSpeed also isn't supported.

You can use PEFT + Gradient Checkpointing + DeepSpeed ZeRO-3 for your use case.
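In practice that means stripping the conflicting kwargs before calling from_pretrained under ZeRO-3. A minimal sketch (the helper name zero3_compatible_kwargs and the exact set of conflicting kwargs are my own framing, derived from the error message and the explanation above):

```python
def zero3_compatible_kwargs(kwargs):
    """Drop from_pretrained kwargs that conflict with DeepSpeed ZeRO-3.

    device_map / low_cpu_mem_usage imply naive pipeline parallelism,
    and load_in_8bit (INT8) is likewise unsupported under ZeRO-3.
    """
    conflicting = {"device_map", "low_cpu_mem_usage", "load_in_8bit"}
    return {k: v for k, v in kwargs.items() if k not in conflicting}


# Under ZeRO-3 the model would then be loaded without those kwargs, e.g.:
# model = AutoModelForSeq2SeqLM.from_pretrained(
#     model_name_or_path,
#     **zero3_compatible_kwargs(dict(
#         load_in_8bit=True, device_map={"": 0}, torch_dtype=torch.float16)),
# )
```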

@rafael-ariascalles
Author

Thanks, I’ll try it that way.

@FareenaFatima

Hi, I am also getting the same error. I have also installed accelerate from pip, but nothing seems to work. Please help me, as I am a newbie.

Cell In[84], line 13
11 checkpoint = "LaMini-T5-738M"
12 Tokenizer = AutoTokenizer.from_pretrained(checkpoint)
---> 13 model = AutoModelForSeq2SeqLM.from_pretrained(
14 checkpoint, torch_dtype=torch.float16, device_map='cpu', low_cpu_mem_usage = True)

File c:\Users\user\Desktop\ChatPDF\venv\Lib\site-packages\transformers\models\auto\auto_factory.py:493, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
491 elif type(config) in cls._model_mapping.keys():
492 model_class = _get_model_class(config, cls._model_mapping)
--> 493 return model_class.from_pretrained(
494 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
495 )
496 raise ValueError(
497 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
498 f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
499 )

File c:\Users\user\Desktop\ChatPDF\venv\Lib\site-packages\transformers\modeling_utils.py:2251, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
2247 raise ValueError(
2248 "DeepSpeed Zero-3 is not compatible with low_cpu_mem_usage=True or with passing a device_map."
2249 )
2250 elif not is_accelerate_available():
-> 2251 raise ImportError(
...
2258 return_unused_kwargs=True,
2259 **kwargs,
2260 )

ImportError: Using low_cpu_mem_usage=True or a device_map requires Accelerate: pip install accelerate
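Note that this second traceback is a different failure from the issue title: transformers raises the ImportError because Accelerate isn't importable in the active virtualenv, so the fix is to run pip install accelerate in the same venv the notebook uses. A small stdlib-only check (the helper name has_accelerate is my own) to confirm before re-running the cell:

```python
import importlib.util


def has_accelerate():
    """True if the `accelerate` package is importable from this interpreter."""
    return importlib.util.find_spec("accelerate") is not None


# If this prints False, install it into the SAME venv the notebook kernel uses:
#     pip install accelerate
print(has_accelerate())
```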

3 participants