
[BUG] ValueError: max() arg is an empty sequence using bf16 zero stage3 #2820

Closed
sujithjoseph opened this issue Feb 12, 2023 · 20 comments · Fixed by #4277
@sujithjoseph commented Feb 12, 2023

│ /opt/conda/lib/python3.7/site-packages/deepspeed/runtime/zero/stage3.py:307  │
│ in <listcomp>                                                                │
│                                                                              │
│    304 │   │   │   max([                                                     │
│    305 │   │   │   │   max(tensor.numel(),                                   │
│    306 │   │   │   │   │   tensor.ds_numel) for tensor in fp16_partitioned_g │
│ ❱  307 │   │   │   ]) for fp16_partitioned_group in self.fp16_partitioned_gr │
│    308 │   │   ])                                                            │
│    309 │   │   print_rank_0(                                                 │
│    310 │   │   │   f'Largest partitioned param numel = {largest_partitioned_ │
╰──────────────────────────────────────────────────────────────────────────────╯
ValueError: max() arg is an empty sequence
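For context, the exception is raised while computing the largest partitioned parameter numel: if any fp16_partitioned_group is empty, the inner max() has nothing to reduce. A minimal self-contained sketch of that failure mode (the group contents here are made up):

# A partitioned parameter group with no parameters in it reproduces the error.
fp16_partitioned_groups = [[]]

try:
    largest_partitioned_param_numel = max(
        max(max(t.numel(), t.ds_numel) for t in group)
        for group in fp16_partitioned_groups
    )
except ValueError as e:
    print(e)  # max() arg is an empty sequence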

To Reproduce
Steps to reproduce the behavior:
This happened while fine-tuning the FLAN-T5 11B model. Here is the full error gist: https://gist.github.com/sujithjoseph/c410514acfccc76974a8130a8afd2169

Here is the DeepSpeed config: https://gist.github.com/sujithjoseph/92bf27de6bba704b57c3b9eb7aa00365

ds_report output
https://gist.github.com/sujithjoseph/c725de5fb38bb3c20e4fb6fd55f63848

System info (please complete the following information):

  • OS: Debian GNU/Linux 10 (buster)
  • GPU count and types: 1 machine with 4x A100 (40GB each)
  • Python version 3.7

Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else? Accelerate + PEFT

deepspeed_config:

deepspeed_config_file: zero_stage3_offload_config.json
zero3_init_flag: true

Additional context

I assume that the bf16 and fp16 config sections are interchangeable:

    "bf16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    }
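(For reference, the two sections may not be fully interchangeable: the loss-scale options above belong to the fp16 section, and loss scaling is not needed for bf16, so a minimal bf16 section only sets enabled. A sketch in the Python-dict form of a DeepSpeed config; the remaining keys are placeholders:)

# Minimal bf16 section; the loss-scale knobs are fp16-specific and should be
# ignored under bf16.
ds_config = {
    "bf16": {"enabled": True},
    # ... ZeRO stage 3 / offload settings unchanged
}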

sujithjoseph added the bug (Something isn't working) and training labels on Feb 12, 2023
@sujithjoseph (Author)

The error also appears with fp16 instead of bf16 in the DeepSpeed config, and with zero3_init_flag: false in the Accelerate config as well.

@sujithjoseph (Author)

With stage 2 and no offload, I get a different error:

 /opt/conda/lib/python3.7/site-packages/deepspeed/runtime/zero/stage_1_and_2. │
│ py:323 in __init__                                                           │
│                                                                              │
│    320 │   │   │   │   self.flatten_dense_tensors_aligned(                   │
│    321 │   │   │   │   │   self.round_robin_bit16_groups[i],                 │
│    322 │   │   │   │   │   self.nccl_start_alignment_factor *                │
│ ❱  323 │   │   │   │   │   dist.get_world_size(group=self.real_dp_process_gr │
│    324 │   │   │   │   │   │   torch.cuda.current_device()))                 │
│    325 │   │   │   see_memory_usage(f"After flattening and moving param grou │
│    326 │   │   │   │   │   │   │    force=False)                             │
│                                                                              │
│ /opt/conda/lib/python3.7/site-packages/deepspeed/runtime/zero/stage_1_and_2. │
│ py:862 in flatten_dense_tensors_aligned                                      │
│                                                                              │
│    859 │                                                                     │
│    860 │   # create a flat tensor aligned at the alignment boundary          │
│    861 │   def flatten_dense_tensors_aligned(self, tensor_list, alignment):  │
│ ❱  862 │   │   return self.flatten(align_dense_tensors(tensor_list, alignmen │
│    863 │                                                                     │
│    864 │   ############### Independent Partition Gradient ################## │
│    865 │   def reduce_independent_p_g_buckets_and_remove_grads(self, param,  │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: torch.cat(): expected a non-empty list of Tensors

Could this be an issue with the dataset?
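A self-contained sketch (a frozen Linear layer standing in for a model with no trainable parameters) suggests the empty tensor list, not the dataset, is the more likely culprit:

import torch

model = torch.nn.Linear(4, 4)
for p in model.parameters():
    p.requires_grad = False  # e.g. everything frozen, as with a PEFT config using inference_mode=True

trainable = [p.flatten() for p in model.parameters() if p.requires_grad]
print(f"trainable tensors: {len(trainable)}")  # 0

try:
    torch.cat(trainable)  # roughly what flattening an empty bit16 group boils down to
except RuntimeError as e:
    print(e)  # torch.cat(): expected a non-empty list of Tensors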

@sujithjoseph (Author) commented Feb 13, 2023

Was able to sort it out using the Accelerate + DeepSpeed config below. Now dealing with an OOM issue, but I'm not sure why the previous DeepSpeed config didn't work.

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: true
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'bf16'
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
use_cpu: false

@sujithjoseph (Author)

How can we estimate the number of GPUs needed (each with 40 GB) for flan-t5-11b with CPU param/optimizer offloading? The estimate I get is 0.49GB per GPU with offload_param=cpu, offload_optimizer=cpu, zero_init=1.

Estimated memory needed for params, optim states and gradients for a:
HW: Setup with 1 node, 4 GPUs per node.
SW: Model with 11003M total params, 131M largest layer params.
per CPU | per GPU | Options
276.70GB | 0.49GB | offload_param=cpu , offload_optimizer=cpu , zero_init=1
276.70GB | 0.49GB | offload_param=cpu , offload_optimizer=cpu , zero_init=0
245.95GB | 5.61GB | offload_param=none, offload_optimizer=cpu , zero_init=1
245.95GB | 5.61GB | offload_param=none, offload_optimizer=cpu , zero_init=0
2.94GB | 46.61GB | offload_param=none, offload_optimizer=none, zero_init=1
245.95GB | 46.61GB | offload_param=none, offload_optimizer=none, zero_init=0
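The table above is the output format of DeepSpeed's ZeRO-3 memory estimator; a sketch of how to regenerate it for this setup (loading the checkpoint on CPU first, which itself needs enough host RAM):

from transformers import AutoModelForSeq2SeqLM
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

# Load on CPU just to count parameters, then print the per-CPU / per-GPU
# estimates for each offload / zero_init combination, as in the table above.
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")
estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=4, num_nodes=1)

Note these estimates cover params, gradients and optimizer states only; activation memory, which grows with batch size and sequence length, is not included.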

@sujithjoseph (Author) commented Feb 13, 2023

With batch size 1, it works without OOM and I see at most 26903MiB used per GPU. How can we estimate the number of GPUs needed for a batch size of 4 or 8 without trial and error? With batch size 2 on 8x 40GB GPUs, it runs for some time (3-4 hours) with almost all 40GB utilized and then goes OOM. How can I cap the GPU memory used?
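(A note on the capping question: since the estimator above covers params, gradients and optimizer states only, activation memory growing with batch size is the likely OOM source, and a smaller per-device batch with more gradient_accumulation_steps is usually the practical fix. PyTorch does expose a hard cap on the caching allocator; a sketch, with the caveat that allocations beyond the cap simply fail with OOM rather than being throttled:)

import torch

# Cap each visible GPU's caching allocator at ~90% of its device memory.
for device in range(torch.cuda.device_count()):
    torch.cuda.set_per_process_memory_fraction(0.9, device=device)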

@sujithjoseph (Author)

With the following DeepSpeed config:

deepspeed_config:
  gradient_accumulation_steps: 2
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
  bf16:enabled: true

and
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
set in the code, would DeepSpeed use tf32 or bf16?

@tjruwase (Contributor)

@sujithjoseph, deepspeed should use bf16. Are you observing something different?

@sujithjoseph (Author)

@tjruwase, it did work with bf16. The only question I have is: can I use max_memory to restrict the memory used by the model during fine-tuning, like the snippet below used for inference?

max_memory = {0: "25GIB", "cpu": "120GB"}

model = load_checkpoint_and_dispatch(
    model, model_id, device_map="auto", max_memory=max_memory,
    no_split_module_classes=["T5Block"],
)

@tjruwase (Contributor)

Got it. I don't have experience with those memory restriction flags, which seem to be Accelerate flags. I don't think those flags are hooked into deepspeed. Can you please pose this question on their forum? I think we can work with them to enable the desired feature.

@zhenlohuang

@sujithjoseph I faced the same issues as you mentioned above; both the stage 2 and stage 3 errors were the same as yours. Did you find any workaround for this?

@shaowei-su

Ran into the exact same error when running DeepSpeed on Ray. Following this thread.

@SupetZYK

Same error (RuntimeError: torch.cat(): expected a non-empty list of Tensors) during accelerate.prepare. How can this be solved?

@tjruwase (Contributor)

@zhenlohuang, @shaowei-su, @SupetZYK, it seems that @sujithjoseph resolved the original issue with the config in #2820 (comment).

If the workaround does not work for you, please open a new issue and share details to help us repro. Thanks!

@shaowei-su

@tjruwase I was able to run DS + stage 3 + fp16 by disabling the optimizer section in the DS config, but I found this negatively impacts model quality.

If I switch to DS + stage 2, then I get the same runtime error @SupetZYK posted above.

  File "/home/default_user/.conda/envs/user/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/home/default_user/.conda/envs/user/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1547, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/home/default_user/.conda/envs/user/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 324, in __init__
    self.flatten_dense_tensors_aligned(
  File "/home/default_user/.conda/envs/user/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 867, in flatten_dense_tensors_aligned
    return self.flatten(align_dense_tensors(tensor_list, alignment))
RuntimeError: torch.cat(): expected a non-empty list of Tensors

@tjruwase (Contributor)

@shaowei-su and @SupetZYK, it seems you are both seeing a different error from the original posting. Can you please open a new issue and share details for repro? I will close this in the meantime. Thanks!

@bestpredicts

Same error here, any update?

@seongminp

FWIW, I got this error when I accidentally put my model in inference mode: my PEFT config had inference_mode: True.
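(For anyone else hitting this with PEFT, a sketch of the relevant flag in a LoRA setup; the base checkpoint here is just an example:)

from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")

# inference_mode=True freezes the adapter weights too, leaving ZeRO with an
# empty list of trainable parameters to partition; keep it False for training.
peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(base, peft_config)
model.print_trainable_parameters()  # should report a non-zero trainable count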

@Wesley-Jzy

Same error when using loralib with ZeRO stage 2 & 3.

@tjruwase (Contributor) commented Sep 6, 2023

@bestpredicts and @Wesley-Jzy, are you able to provide repro steps?

@awan-10 (Contributor) commented Sep 8, 2023

Can people in this thread please downgrade to HF transformers 4.31.0 and try?
