AttributeError: 'torch.dtype' object has no attribute 'element_size' #30304

Closed · 4 tasks done
ashmalvayani opened this issue Apr 17, 2024 · 6 comments · Fixed by #30133

Comments

@ashmalvayani

System Info

transformers version: 4.40.0.dev
python version: 3.10
torch: 2.0.1 (CUDA 11.7)

I am fine-tuning the https://huggingface.co/CohereForAI/c4ai-command-r-v01 model with the axolotl framework. The lora.yaml config file is as follows:

base_model: CohereForAI/c4ai-command-r-v01
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: Data_Clean3.json
    ds_type: json
    type: alpaca
dataset_prepared_path: /last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0
val_set_size: 0.0
output_dir: ./outputs/c4ai_lora

sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

adapter: qlora
lora_model_dir:
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
# bf16: auto
# fp16: 
# tf32: false
bf16: false
fp16: true
tf32: true

gradient_checkpointing: false  # don't use with fsdp_activation_checkpointing
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 0
saves_per_epoch: 1
debug:
weight_decay: 0.0
# deepspeed: deepspeed_configs/zero3.json

special_tokens:
  bos_token: "<BOS_TOKEN>"
  eos_token: "<|END_OF_TURN_TOKEN|>"
  pad_token: "<PAD>"

To reproduce, install the axolotl environment and run the following:

CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" accelerate launch -m axolotl.cli.train examples/cohere-command/lora.yaml

The problem only occurs when I load the model quantized in 4-bit; in 8-bit it runs smoothly with no issues.
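
For reference, here is a minimal standalone sketch of the same 4-bit load path. The quantization and LoRA settings mirror the lora.yaml above, but axolotl's exact internals may differ, so treat this as an assumption-laden repro rather than its actual code:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit (QLoRA-style) quantization, mirroring load_in_4bit: true.
# The nf4/compute-dtype choices here are illustrative assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "CohereForAI/c4ai-command-r-v01",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# LoRA adapter mirroring the config above (r=8, alpha=16, dropout 0.0).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# On torch 2.0.x, counting the 4-bit parameters here trips the dtype
# byte-size lookup and raises the AttributeError reported above.
model.print_trainable_parameters()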

Who can help?

@pacman100 @SunMarc @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Install the axolotl environment from: https://github.com/OpenAccess-AI-Collective/axolotl
  2. Run:
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" accelerate launch -m axolotl.cli.train examples/cohere-command/lora.yaml
  3. Optionally, change the dataset as described in the axolotl documentation.

Expected behavior

Training should start, but with 4-bit QLoRA quantization it raises the error above.

@younesbelkada
Contributor

Hi @ashmalvayani
Can you share the full traceback? I am wondering whether you have the correct PEFT version, as we recently fixed this on the transformers side in #30162.
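
In case it helps, a quick way to check the installed versions in the same environment (a minimal snippet):

import peft
import transformers

# Both packages expose a standard __version__ attribute.
print("peft:", peft.__version__)
print("transformers:", transformers.__version__)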

@hiyouga
Contributor

hiyouga commented Apr 18, 2024

It seems that there is still a problem in Transformers (see huggingface/peft#1635):

nb_params = (
    quant_storage.itemsize if hasattr(quant_storage, "itemsize") else quant_storage.element_size()
)

It will be resolved once this PR is merged:
#30133
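
In the meantime, a version-robust fallback is to read the byte size through an empty tensor, since element_size() only exists on torch.Tensor and torch.dtype.itemsize was only added in torch 2.1. A sketch (dtype_byte_size is a hypothetical helper name, not the function used in the PR):

import torch

def dtype_byte_size(quant_storage: torch.dtype) -> int:
    # torch >= 2.1 exposes the byte size directly on the dtype.
    if hasattr(quant_storage, "itemsize"):
        return quant_storage.itemsize
    # Older torch (e.g. 2.0.x): element_size() is a Tensor method,
    # so materialize an empty tensor of this dtype and query it.
    return torch.tensor([], dtype=quant_storage).element_size()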

@ashmalvayani
Author

ashmalvayani commented Apr 18, 2024

Hi @ashmalvayani Can you share the full traceback? I am wondering whether you have the correct PEFT version, as we recently fixed this on the transformers side in #30162.

Please find below the complete traceback:

CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" accelerate launch -m axolotl.cli.train examples/cohere-command/lora.yaml
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `8`
                More than one GPU was found, enabling multi-GPU training.
                If this was unintended please pass in `--num_processes=1`.
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2024-04-18 14:41:38,913] [INFO] [datasets.<module>:58] [PID:1765365] PyTorch version 2.0.1 available.
[2024-04-18 14:41:38,913] [INFO] [datasets.<module>:58] [PID:1765362] PyTorch version 2.0.1 available.
[2024-04-18 14:41:38,913] [INFO] [datasets.<module>:58] [PID:1765363] PyTorch version 2.0.1 available.
[2024-04-18 14:41:38,949] [INFO] [datasets.<module>:58] [PID:1765359] PyTorch version 2.0.1 available.
[2024-04-18 14:41:38,973] [INFO] [datasets.<module>:58] [PID:1765361] PyTorch version 2.0.1 available.
[2024-04-18 14:41:38,992] [INFO] [datasets.<module>:58] [PID:1765364] PyTorch version 2.0.1 available.
[2024-04-18 14:41:38,998] [INFO] [datasets.<module>:58] [PID:1765358] PyTorch version 2.0.1 available.
[2024-04-18 14:41:39,083] [INFO] [datasets.<module>:58] [PID:1765360] PyTorch version 2.0.1 available.
[2024-04-18 14:41:40,627] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-18 14:41:40,627] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-18 14:41:40,640] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-18 14:41:40,646] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-18 14:41:40,647] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-18 14:41:40,650] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-18 14:41:40,651] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-18 14:41:40,659] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-18 14:41:43,726] [INFO] [torch.distributed.nn.jit.instantiator.<module>:21] [PID:1765362] Created a temporary directory at /tmp/tmpqvejhx60
[2024-04-18 14:41:43,727] [INFO] [torch.distributed.nn.jit.instantiator._write:76] [PID:1765362] Writing /tmp/tmpqvejhx60/_remote_module_non_scriptable.py
[2024-04-18 14:41:43,779] [INFO] [torch.distributed.nn.jit.instantiator.<module>:21] [PID:1765359] Created a temporary directory at /tmp/tmpy1t_1x1q
[2024-04-18 14:41:43,779] [INFO] [torch.distributed.nn.jit.instantiator._write:76] [PID:1765359] Writing /tmp/tmpy1t_1x1q/_remote_module_non_scriptable.py
[2024-04-18 14:41:43,780] [INFO] [torch.distributed.nn.jit.instantiator.<module>:21] [PID:1765363] Created a temporary directory at /tmp/tmpbipza1zr
[2024-04-18 14:41:43,780] [INFO] [torch.distributed.nn.jit.instantiator._write:76] [PID:1765363] Writing /tmp/tmpbipza1zr/_remote_module_non_scriptable.py
[2024-04-18 14:41:43,787] [INFO] [torch.distributed.nn.jit.instantiator.<module>:21] [PID:1765365] Created a temporary directory at /tmp/tmpxtst6jkt
[2024-04-18 14:41:43,787] [INFO] [torch.distributed.nn.jit.instantiator._write:76] [PID:1765365] Writing /tmp/tmpxtst6jkt/_remote_module_non_scriptable.py
[2024-04-18 14:41:43,814] [INFO] [torch.distributed.nn.jit.instantiator.<module>:21] [PID:1765361] Created a temporary directory at /tmp/tmply3_95zj
[2024-04-18 14:41:43,814] [INFO] [torch.distributed.nn.jit.instantiator._write:76] [PID:1765361] Writing /tmp/tmply3_95zj/_remote_module_non_scriptable.py
[2024-04-18 14:41:43,821] [INFO] [torch.distributed.nn.jit.instantiator.<module>:21] [PID:1765358] Created a temporary directory at /tmp/tmpjdkafe1p
[2024-04-18 14:41:43,822] [INFO] [torch.distributed.nn.jit.instantiator._write:76] [PID:1765358] Writing /tmp/tmpjdkafe1p/_remote_module_non_scriptable.py
[2024-04-18 14:41:43,841] [INFO] [torch.distributed.nn.jit.instantiator.<module>:21] [PID:1765364] Created a temporary directory at /tmp/tmpd20a66wt
[2024-04-18 14:41:43,841] [INFO] [torch.distributed.nn.jit.instantiator._write:76] [PID:1765364] Writing /tmp/tmpd20a66wt/_remote_module_non_scriptable.py
[2024-04-18 14:41:43,876] [INFO] [torch.distributed.nn.jit.instantiator.<module>:21] [PID:1765360] Created a temporary directory at /tmp/tmpdh55n_uz
[2024-04-18 14:41:43,877] [INFO] [torch.distributed.nn.jit.instantiator._write:76] [PID:1765360] Writing /tmp/tmpdh55n_uz/_remote_module_non_scriptable.py
[2024-04-18 14:41:44,672] [WARNING] [axolotl.utils.config.models.input.hint_trust_remote_code:275] [PID:1765363] [RANK:5] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2024-04-18 14:41:44,672] [INFO] [axolotl.utils.config.models.input.check_bf16:1026] [PID:1765363] [RANK:5] bf16 support detected, but not enabled for this configuration.
[2024-04-18 14:41:44,681] [WARNING] [axolotl.utils.config.models.input.hint_trust_remote_code:275] [PID:1765365] [RANK:7] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2024-04-18 14:41:44,681] [INFO] [axolotl.utils.config.models.input.check_bf16:1026] [PID:1765365] [RANK:7] bf16 support detected, but not enabled for this configuration.
[2024-04-18 14:41:44,682] [WARNING] [axolotl.utils.config.models.input.hint_trust_remote_code:275] [PID:1765362] [RANK:4] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2024-04-18 14:41:44,682] [INFO] [axolotl.utils.config.models.input.check_bf16:1026] [PID:1765362] [RANK:4] bf16 support detected, but not enabled for this configuration.
[2024-04-18 14:41:44,710] [WARNING] [axolotl.utils.config.models.input.hint_trust_remote_code:275] [PID:1765359] [RANK:1] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2024-04-18 14:41:44,710] [INFO] [axolotl.utils.config.models.input.check_bf16:1026] [PID:1765359] [RANK:1] bf16 support detected, but not enabled for this configuration.
[2024-04-18 14:41:44,787] [WARNING] [axolotl.utils.config.models.input.hint_trust_remote_code:275] [PID:1765364] [RANK:6] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2024-04-18 14:41:44,787] [INFO] [axolotl.utils.config.models.input.check_bf16:1026] [PID:1765364] [RANK:6] bf16 support detected, but not enabled for this configuration.
[2024-04-18 14:41:44,791] [WARNING] [axolotl.utils.config.models.input.hint_trust_remote_code:275] [PID:1765361] [RANK:3] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2024-04-18 14:41:44,791] [INFO] [axolotl.utils.config.models.input.check_bf16:1026] [PID:1765361] [RANK:3] bf16 support detected, but not enabled for this configuration.
[2024-04-18 14:41:44,802] [WARNING] [axolotl.utils.config.models.input.hint_trust_remote_code:275] [PID:1765358] [RANK:0] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2024-04-18 14:41:44,802] [INFO] [axolotl.utils.config.models.input.check_bf16:1026] [PID:1765358] [RANK:0] bf16 support detected, but not enabled for this configuration.
[2024-04-18 14:41:44,806] [WARNING] [axolotl.utils.config.models.input.hint_trust_remote_code:275] [PID:1765360] [RANK:2] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2024-04-18 14:41:44,806] [INFO] [axolotl.utils.config.models.input.check_bf16:1026] [PID:1765360] [RANK:2] bf16 support detected, but not enabled for this configuration.
[2024-04-18 14:41:44,856] [INFO] [axolotl.normalize_config:182] [PID:1765363] [RANK:5] GPU memory usage baseline: 0.000GB (+0.857GB misc)
[2024-04-18 14:41:44,868] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-18 14:41:44,870] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765363] Added key: store_based_barrier_key:1 to store for rank: 5
[2024-04-18 14:41:44,878] [INFO] [axolotl.normalize_config:182] [PID:1765365] [RANK:7] GPU memory usage baseline: 0.000GB (+0.857GB misc)
[2024-04-18 14:41:44,880] [INFO] [axolotl.normalize_config:182] [PID:1765362] [RANK:4] GPU memory usage baseline: 0.000GB (+0.857GB misc)
[2024-04-18 14:41:44,889] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-18 14:41:44,890] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765365] Added key: store_based_barrier_key:1 to store for rank: 7
[2024-04-18 14:41:44,890] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-18 14:41:44,891] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765362] Added key: store_based_barrier_key:1 to store for rank: 4
[2024-04-18 14:41:44,918] [INFO] [axolotl.normalize_config:182] [PID:1765359] [RANK:1] GPU memory usage baseline: 0.000GB (+0.857GB misc)
[2024-04-18 14:41:44,931] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-18 14:41:44,932] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765359] Added key: store_based_barrier_key:1 to store for rank: 1
[2024-04-18 14:41:44,993] [INFO] [axolotl.normalize_config:182] [PID:1765364] [RANK:6] GPU memory usage baseline: 0.000GB (+0.857GB misc)
[2024-04-18 14:41:45,005] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-18 14:41:45,006] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765364] Added key: store_based_barrier_key:1 to store for rank: 6
[2024-04-18 14:41:45,010] [INFO] [axolotl.normalize_config:182] [PID:1765358] [RANK:0] GPU memory usage baseline: 0.000GB (+0.857GB misc)
[2024-04-18 14:41:45,019] [INFO] [axolotl.normalize_config:182] [PID:1765361] [RANK:3] GPU memory usage baseline: 0.000GB (+0.857GB misc)
[2024-04-18 14:41:45,020] [INFO] [axolotl.normalize_config:182] [PID:1765360] [RANK:2] GPU memory usage baseline: 0.000GB (+0.857GB misc)
[2024-04-18 14:41:45,021] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-18 14:41:45,021] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-04-18 14:41:45,023] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765358] Added key: store_based_barrier_key:1 to store for rank: 0
[2024-04-18 14:41:45,030] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-18 14:41:45,031] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-18 14:41:45,031] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765361] Added key: store_based_barrier_key:1 to store for rank: 3
[2024-04-18 14:41:45,032] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765360] Added key: store_based_barrier_key:1 to store for rank: 2
[2024-04-18 14:41:45,033] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765360] Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
[2024-04-18 14:41:45,033] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765365] Rank 7: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
[2024-04-18 14:41:45,034] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765363] Rank 5: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
[2024-04-18 14:41:45,034] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765358] Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
[2024-04-18 14:41:45,035] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765359] Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
[2024-04-18 14:41:45,035] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765362] Rank 4: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
[2024-04-18 14:41:45,037] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765364] Rank 6: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
[2024-04-18 14:41:45,042] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765361] Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
                                 dP            dP   dP 
                                 88            88   88 
      .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 
      88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 
      88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 
      `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP 
                                                       
                                                       

****************************************
**** Axolotl Dependency Versions *****
  accelerate: 0.28.0         
        peft: 0.10.0         
transformers: 4.40.0.dev0    
         trl: 0.8.2.dev0     
       torch: 2.0.1          
bitsandbytes: 0.43.0         
****************************************
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:41:46,028] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765362] [RANK:4] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:41:46,029] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765362] [RANK:4] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:41:46,029] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765362] [RANK:4] PAD: 0 / <PAD>
[2024-04-18 14:41:46,029] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765362] [RANK:4] UNK: None / None
[2024-04-18 14:41:46,029] [INFO] [axolotl.load_tokenizer:293] [PID:1765362] [RANK:4] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:41:46,048] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765364] [RANK:6] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:41:46,048] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765364] [RANK:6] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:41:46,048] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765364] [RANK:6] PAD: 0 / <PAD>
[2024-04-18 14:41:46,048] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765364] [RANK:6] UNK: None / None
[2024-04-18 14:41:46,048] [INFO] [axolotl.load_tokenizer:293] [PID:1765364] [RANK:6] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:41:46,054] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765360] [RANK:2] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:41:46,054] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765360] [RANK:2] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:41:46,054] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765360] [RANK:2] PAD: 0 / <PAD>
[2024-04-18 14:41:46,054] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765360] [RANK:2] UNK: None / None
[2024-04-18 14:41:46,054] [INFO] [axolotl.load_tokenizer:293] [PID:1765360] [RANK:2] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:41:46,083] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765363] [RANK:5] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:41:46,083] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765363] [RANK:5] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:41:46,083] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765363] [RANK:5] PAD: 0 / <PAD>
[2024-04-18 14:41:46,083] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765365] [RANK:7] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:41:46,083] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765363] [RANK:5] UNK: None / None
[2024-04-18 14:41:46,083] [INFO] [axolotl.load_tokenizer:293] [PID:1765363] [RANK:5] No Chat template selected. Consider adding a chat template for easier inference.
[2024-04-18 14:41:46,083] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765365] [RANK:7] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:41:46,083] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765365] [RANK:7] PAD: 0 / <PAD>
[2024-04-18 14:41:46,083] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765365] [RANK:7] UNK: None / None
[2024-04-18 14:41:46,083] [INFO] [axolotl.load_tokenizer:293] [PID:1765365] [RANK:7] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:41:46,084] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765359] [RANK:1] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:41:46,084] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765359] [RANK:1] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:41:46,084] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765359] [RANK:1] PAD: 0 / <PAD>
[2024-04-18 14:41:46,084] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765359] [RANK:1] UNK: None / None
[2024-04-18 14:41:46,084] [INFO] [axolotl.load_tokenizer:293] [PID:1765359] [RANK:1] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:41:46,157] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765358] [RANK:0] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:41:46,157] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765358] [RANK:0] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:41:46,157] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765358] [RANK:0] PAD: 0 / <PAD>
[2024-04-18 14:41:46,157] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765358] [RANK:0] UNK: None / None
[2024-04-18 14:41:46,157] [INFO] [axolotl.load_tokenizer:293] [PID:1765358] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-04-18 14:41:46,160] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:1765358] [RANK:0] Loading prepared dataset from disk at /mnt/beegfs/fahad.khan/axolotl/last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0/3308b18091e3a983103cbeb4cceb82d0...
[2024-04-18 14:41:46,179] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:1765358] [RANK:0] Prepared dataset loaded from disk...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:41:46,305] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765361] [RANK:3] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:41:46,305] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765361] [RANK:3] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:41:46,305] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765361] [RANK:3] PAD: 0 / <PAD>
[2024-04-18 14:41:46,305] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765361] [RANK:3] UNK: None / None
[2024-04-18 14:41:46,305] [INFO] [axolotl.load_tokenizer:293] [PID:1765361] [RANK:3] No Chat template selected. Consider adding a chat template for easier inference.
[2024-04-18 14:41:50,947] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:1765364] [RANK:6] Loading prepared dataset from disk at /mnt/beegfs/fahad.khan/axolotl/last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0/3308b18091e3a983103cbeb4cceb82d0...
[2024-04-18 14:41:50,947] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:1765361] [RANK:3] Loading prepared dataset from disk at /mnt/beegfs/fahad.khan/axolotl/last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0/3308b18091e3a983103cbeb4cceb82d0...
[2024-04-18 14:41:50,947] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:1765362] [RANK:4] Loading prepared dataset from disk at /mnt/beegfs/fahad.khan/axolotl/last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0/3308b18091e3a983103cbeb4cceb82d0...
[2024-04-18 14:41:50,948] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:1765359] [RANK:1] Loading prepared dataset from disk at /mnt/beegfs/fahad.khan/axolotl/last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0/3308b18091e3a983103cbeb4cceb82d0...
[2024-04-18 14:41:50,948] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:1765360] [RANK:2] Loading prepared dataset from disk at /mnt/beegfs/fahad.khan/axolotl/last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0/3308b18091e3a983103cbeb4cceb82d0...
[2024-04-18 14:41:50,948] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:1765363] [RANK:5] Loading prepared dataset from disk at /mnt/beegfs/fahad.khan/axolotl/last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0/3308b18091e3a983103cbeb4cceb82d0...
[2024-04-18 14:41:50,948] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:1765365] [RANK:7] Loading prepared dataset from disk at /mnt/beegfs/fahad.khan/axolotl/last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0/3308b18091e3a983103cbeb4cceb82d0...
[2024-04-18 14:41:50,966] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:1765364] [RANK:6] Prepared dataset loaded from disk...
[2024-04-18 14:41:50,965] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:1765365] [RANK:7] Prepared dataset loaded from disk...
[2024-04-18 14:41:50,974] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:1765362] [RANK:4] Prepared dataset loaded from disk...
[2024-04-18 14:41:50,974] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:1765360] [RANK:2] Prepared dataset loaded from disk...
[2024-04-18 14:41:50,974] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:1765361] [RANK:3] Prepared dataset loaded from disk...
[2024-04-18 14:41:50,976] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:1765359] [RANK:1] Prepared dataset loaded from disk...
[2024-04-18 14:41:50,985] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:1765363] [RANK:5] Prepared dataset loaded from disk...
[2024-04-18 14:41:51,193] [DEBUG] [axolotl.log:61] [PID:1765358] [RANK:0] total_num_tokens: 74_255_688
[2024-04-18 14:41:52,742] [DEBUG] [axolotl.log:61] [PID:1765358] [RANK:0] `total_supervised_tokens: 54_401_856`
[2024-04-18 14:41:59,207] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765362] [RANK:4] packing_efficiency_estimate: 1.0 total_num_tokens per device: 9281961
[2024-04-18 14:41:59,289] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765360] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 9281961
[2024-04-18 14:41:59,394] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765365] [RANK:7] packing_efficiency_estimate: 1.0 total_num_tokens per device: 9281961
[2024-04-18 14:41:59,419] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765364] [RANK:6] packing_efficiency_estimate: 1.0 total_num_tokens per device: 9281961
[2024-04-18 14:41:59,477] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765361] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 9281961
[2024-04-18 14:41:59,533] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765358] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 9281961
[2024-04-18 14:41:59,534] [DEBUG] [axolotl.log:61] [PID:1765358] [RANK:0] data_loader_len: 8972
[2024-04-18 14:41:59,620] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765363] [RANK:5] packing_efficiency_estimate: 1.0 total_num_tokens per device: 9281961
[2024-04-18 14:41:59,667] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765359] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 9281961
[2024-04-18 14:41:59,857] [INFO] [axolotl.log:61] [PID:1765358] [RANK:0] sample_packing_eff_est across ranks: [0.7693688273429871, 0.7703332304954529, 0.7695566415786743, 0.7701941728591919, 0.7695893049240112, 0.7702759504318237, 0.7708409428596497, 0.770734429359436]
[2024-04-18 14:41:59,858] [DEBUG] [axolotl.log:61] [PID:1765358] [RANK:0] sample_packing_eff_est: 0.78
[2024-04-18 14:41:59,858] [DEBUG] [axolotl.log:61] [PID:1765358] [RANK:0] total_num_steps: 1121
[2024-04-18 14:41:59,921] [DEBUG] [axolotl.train.log:61] [PID:1765358] [RANK:0] loading tokenizer... CohereForAI/c4ai-command-r-v01
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:42:00,490] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765365] [RANK:7] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:42:00,490] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765365] [RANK:7] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:42:00,490] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765365] [RANK:7] PAD: 0 / <PAD>
[2024-04-18 14:42:00,490] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765365] [RANK:7] UNK: None / None
[2024-04-18 14:42:00,490] [INFO] [axolotl.load_tokenizer:293] [PID:1765365] [RANK:7] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:42:00,497] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765358] [RANK:0] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:42:00,497] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765358] [RANK:0] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:42:00,497] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765358] [RANK:0] PAD: 0 / <PAD>
[2024-04-18 14:42:00,497] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765358] [RANK:0] UNK: None / None
[2024-04-18 14:42:00,497] [INFO] [axolotl.load_tokenizer:293] [PID:1765358] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-04-18 14:42:00,497] [DEBUG] [axolotl.train.log:61] [PID:1765358] [RANK:0] loading model and peft_config...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:42:00,535] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765363] [RANK:5] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:42:00,535] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765363] [RANK:5] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:42:00,535] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765363] [RANK:5] PAD: 0 / <PAD>
[2024-04-18 14:42:00,535] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765363] [RANK:5] UNK: None / None
[2024-04-18 14:42:00,535] [INFO] [axolotl.load_tokenizer:293] [PID:1765363] [RANK:5] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:42:00,548] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765364] [RANK:6] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:42:00,548] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765364] [RANK:6] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:42:00,548] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765364] [RANK:6] PAD: 0 / <PAD>
[2024-04-18 14:42:00,548] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765364] [RANK:6] UNK: None / None
[2024-04-18 14:42:00,548] [INFO] [axolotl.load_tokenizer:293] [PID:1765364] [RANK:6] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:42:00,549] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765362] [RANK:4] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:42:00,550] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765362] [RANK:4] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:42:00,550] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765362] [RANK:4] PAD: 0 / <PAD>
[2024-04-18 14:42:00,550] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765362] [RANK:4] UNK: None / None
[2024-04-18 14:42:00,550] [INFO] [axolotl.load_tokenizer:293] [PID:1765362] [RANK:4] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:42:00,567] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765360] [RANK:2] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:42:00,567] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765360] [RANK:2] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:42:00,567] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765360] [RANK:2] PAD: 0 / <PAD>
[2024-04-18 14:42:00,567] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765360] [RANK:2] UNK: None / None
[2024-04-18 14:42:00,567] [INFO] [axolotl.load_tokenizer:293] [PID:1765360] [RANK:2] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:42:00,604] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765361] [RANK:3] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:42:00,604] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765361] [RANK:3] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:42:00,604] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765361] [RANK:3] PAD: 0 / <PAD>
[2024-04-18 14:42:00,604] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765361] [RANK:3] UNK: None / None
[2024-04-18 14:42:00,604] [INFO] [axolotl.load_tokenizer:293] [PID:1765361] [RANK:3] No Chat template selected. Consider adding a chat template for easier inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-18 14:42:00,644] [DEBUG] [axolotl.load_tokenizer:279] [PID:1765359] [RANK:1] EOS: 255001 / <|END_OF_TURN_TOKEN|>
[2024-04-18 14:42:00,644] [DEBUG] [axolotl.load_tokenizer:280] [PID:1765359] [RANK:1] BOS: 5 / <BOS_TOKEN>
[2024-04-18 14:42:00,644] [DEBUG] [axolotl.load_tokenizer:281] [PID:1765359] [RANK:1] PAD: 0 / <PAD>
[2024-04-18 14:42:00,644] [DEBUG] [axolotl.load_tokenizer:282] [PID:1765359] [RANK:1] UNK: None / None
[2024-04-18 14:42:00,644] [INFO] [axolotl.load_tokenizer:293] [PID:1765359] [RANK:1] No Chat template selected. Consider adding a chat template for easier inference.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 15/15 [00:18<00:00,  1.23s/it]
Loading checkpoint shards:  73%|████████████████████████████████████████████████████                   | 11/15 [00:18<00:06,  1.60s/it][2024-04-18 14:42:19,717] [INFO] [axolotl.load_model:720] [PID:1765365] [RANK:7] GPU memory usage after model load: 19.706GB (+0.171GB cache, +2.118GB misc)
[2024-04-18 14:42:19,725] [INFO] [axolotl.load_model:771] [PID:1765365] [RANK:7] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-18 14:42:19,726] [INFO] [axolotl.load_model:780] [PID:1765365] [RANK:7] converting modules to torch.float16 for flash attention
Loading checkpoint shards:  73%|████████████████████████████████████████████████████                   | 11/15 [00:18<00:06,  1.58s/it][2024-04-18 14:42:19,990] [INFO] [axolotl.load_model:825] [PID:1765365] [RANK:7] GPU memory usage after adapters: 19.745GB (+7.983GB cache, +2.118GB misc)
Loading checkpoint shards:  73%|████████████████████████████████████████████████████                   | 11/15 [00:18<00:06,  1.59s/it]Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Loading checkpoint shards:  80%|████████████████████████████████████████████████████████▊              | 12/15 [00:20<00:04,  1.48s/it][2024-04-18 14:42:21,244] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765365] [RANK:7] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
Loading checkpoint shards:  93%|██████████████████████████████████████████████████████████████████▎    | 14/15 [00:20<00:01,  1.44s/it][2024-04-18 14:42:22,177] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765365] [RANK:7] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 15/15 [00:21<00:00,  1.43s/it]
Loading checkpoint shards:  87%|█████████████████████████████████████████████████████████████▌         | 13/15 [00:21<00:02,  1.45s/it][2024-04-18 14:42:22,712] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765365] [RANK:7] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
Loading checkpoint shards:  87%|█████████████████████████████████████████████████████████████▌         | 13/15 [00:21<00:02,  1.46s/it][2024-04-18 14:42:22,749] [INFO] [axolotl.load_model:720] [PID:1765358] [RANK:0] GPU memory usage after model load: 19.706GB (+0.171GB cache, +3.243GB misc)
[2024-04-18 14:42:22,756] [INFO] [axolotl.load_model:771] [PID:1765358] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-18 14:42:22,758] [INFO] [axolotl.load_model:780] [PID:1765358] [RANK:0] converting modules to torch.float16 for flash attention
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 15/15 [00:21<00:00,  1.46s/it]
[2024-04-18 14:42:23,018] [WARNING] [axolotl.load_lora:984] [PID:1765358] [RANK:0] Exception caught during model.print_trainable_parameters(): 'torch.dtype' object has no attribute 'itemsize'
[2024-04-18 14:42:23,029] [INFO] [axolotl.load_model:825] [PID:1765358] [RANK:0] GPU memory usage after adapters: 19.745GB (+7.983GB cache, +3.243GB misc)
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-18 14:42:23,194] [INFO] [axolotl.train.log:61] [PID:1765358] [RANK:0] Pre-saving adapter config to ./outputs/c4ai_lora
[2024-04-18 14:42:23,304] [INFO] [axolotl.load_model:720] [PID:1765363] [RANK:5] GPU memory usage after model load: 19.706GB (+0.171GB cache, +2.259GB misc)
[2024-04-18 14:42:23,311] [INFO] [axolotl.load_model:771] [PID:1765363] [RANK:5] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-18 14:42:23,313] [INFO] [axolotl.load_model:780] [PID:1765363] [RANK:5] converting modules to torch.float16 for flash attention
[2024-04-18 14:42:23,339] [INFO] [axolotl.train.log:61] [PID:1765358] [RANK:0] Starting trainer...
Loading checkpoint shards:  93%|██████████████████████████████████████████████████████████████████▎    | 14/15 [00:22<00:01,  1.34s/it][2024-04-18 14:42:23,504] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765365] [RANK:7] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:23,572] [INFO] [axolotl.load_model:825] [PID:1765363] [RANK:5] GPU memory usage after adapters: 19.745GB (+7.983GB cache, +2.259GB misc)
Loading checkpoint shards:  93%|██████████████████████████████████████████████████████████████████▎    | 14/15 [00:22<00:01,  1.35s/it]Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 15/15 [00:22<00:00,  1.52s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 15/15 [00:22<00:00,  1.53s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 15/15 [00:22<00:00,  1.53s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 15/15 [00:23<00:00,  1.54s/it]
[2024-04-18 14:42:24,022] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765358] [RANK:0] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 15/15 [00:23<00:00,  1.54s/it]
[2024-04-18 14:42:24,216] [INFO] [axolotl.load_model:720] [PID:1765362] [RANK:4] GPU memory usage after model load: 19.706GB (+0.171GB cache, +2.259GB misc)
[2024-04-18 14:42:24,228] [INFO] [axolotl.load_model:771] [PID:1765362] [RANK:4] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-18 14:42:24,230] [INFO] [axolotl.load_model:780] [PID:1765362] [RANK:4] converting modules to torch.float16 for flash attention
[2024-04-18 14:42:24,342] [INFO] [axolotl.load_model:720] [PID:1765359] [RANK:1] GPU memory usage after model load: 19.706GB (+0.171GB cache, +2.259GB misc)
[2024-04-18 14:42:24,348] [INFO] [axolotl.load_model:720] [PID:1765360] [RANK:2] GPU memory usage after model load: 19.706GB (+0.171GB cache, +2.259GB misc)
[2024-04-18 14:42:24,356] [INFO] [axolotl.load_model:720] [PID:1765364] [RANK:6] GPU memory usage after model load: 19.706GB (+0.171GB cache, +2.259GB misc)
[2024-04-18 14:42:24,357] [INFO] [axolotl.load_model:771] [PID:1765360] [RANK:2] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-18 14:42:24,357] [INFO] [axolotl.load_model:771] [PID:1765359] [RANK:1] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-18 14:42:24,358] [INFO] [axolotl.load_model:780] [PID:1765360] [RANK:2] converting modules to torch.float16 for flash attention
[2024-04-18 14:42:24,359] [INFO] [axolotl.load_model:780] [PID:1765359] [RANK:1] converting modules to torch.float16 for flash attention
[2024-04-18 14:42:24,363] [INFO] [axolotl.load_model:771] [PID:1765364] [RANK:6] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-18 14:42:24,365] [INFO] [axolotl.load_model:780] [PID:1765364] [RANK:6] converting modules to torch.float16 for flash attention
[2024-04-18 14:42:24,487] [INFO] [axolotl.load_model:720] [PID:1765361] [RANK:3] GPU memory usage after model load: 19.706GB (+0.171GB cache, +2.259GB misc)
[2024-04-18 14:42:24,495] [INFO] [axolotl.load_model:771] [PID:1765361] [RANK:3] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-18 14:42:24,497] [INFO] [axolotl.load_model:780] [PID:1765361] [RANK:3] converting modules to torch.float16 for flash attention
[2024-04-18 14:42:24,514] [INFO] [axolotl.load_model:825] [PID:1765362] [RANK:4] GPU memory usage after adapters: 19.745GB (+7.983GB cache, +2.259GB misc)
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-18 14:42:24,638] [INFO] [axolotl.load_model:825] [PID:1765364] [RANK:6] GPU memory usage after adapters: 19.745GB (+7.983GB cache, +2.259GB misc)
[2024-04-18 14:42:24,639] [INFO] [axolotl.load_model:825] [PID:1765360] [RANK:2] GPU memory usage after adapters: 19.745GB (+7.983GB cache, +2.259GB misc)
[2024-04-18 14:42:24,641] [INFO] [axolotl.load_model:825] [PID:1765359] [RANK:1] GPU memory usage after adapters: 19.745GB (+7.983GB cache, +2.259GB misc)
[2024-04-18 14:42:24,715] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765358] [RANK:0] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:24,720] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765363] [RANK:5] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-18 14:42:24,791] [INFO] [axolotl.load_model:825] [PID:1765361] [RANK:3] GPU memory usage after adapters: 19.745GB (+7.983GB cache, +2.259GB misc)
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-18 14:42:25,162] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765358] [RANK:0] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:25,439] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765362] [RANK:4] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:25,469] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765363] [RANK:5] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:25,578] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765359] [RANK:1] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:25,674] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765360] [RANK:2] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:25,729] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765361] [RANK:3] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:25,744] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765364] [RANK:6] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:25,831] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765358] [RANK:0] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:25,937] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765363] [RANK:5] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:25,997] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765362] [RANK:4] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,187] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765359] [RANK:1] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,356] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765360] [RANK:2] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,374] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765362] [RANK:4] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,381] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765361] [RANK:3] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,473] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765364] [RANK:6] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,589] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765359] [RANK:1] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,667] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765363] [RANK:5] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,787] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765360] [RANK:2] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,800] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765361] [RANK:3] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,935] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765362] [RANK:4] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:26,937] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765364] [RANK:6] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:27,198] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765359] [RANK:1] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:27,457] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765361] [RANK:3] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:27,463] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765360] [RANK:2] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:27,678] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765364] [RANK:6] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
Using /home/ashmal.vayani/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ashmal.vayani/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.6514461040496826 seconds
Using /home/ashmal.vayani/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
[2024-04-18 14:42:30,051] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765365] Added key: store_based_barrier_key:2 to store for rank: 7
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ashmal.vayani/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.6153619289398193 seconds
[2024-04-18 14:42:30,817] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765358] Added key: store_based_barrier_key:2 to store for rank: 0
Using /home/ashmal.vayani/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ashmal.vayani/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.6068532466888428 seconds
Using /home/ashmal.vayani/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ashmal.vayani/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.5634384155273438 seconds
Using /home/ashmal.vayani/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ashmal.vayani/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.5859313011169434 seconds
Using /home/ashmal.vayani/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/ashmal.vayani/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ashmal.vayani/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.5404446125030518 seconds
[2024-04-18 14:42:31,718] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765363] Added key: store_based_barrier_key:2 to store for rank: 5
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.5590782165527344 seconds
[2024-04-18 14:42:31,865] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765362] Added key: store_based_barrier_key:2 to store for rank: 4
Using /home/ashmal.vayani/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ashmal.vayani/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.649749279022217 seconds
[2024-04-18 14:42:32,201] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765359] Added key: store_based_barrier_key:2 to store for rank: 1
[2024-04-18 14:42:32,414] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765360] Added key: store_based_barrier_key:2 to store for rank: 2
[2024-04-18 14:42:32,451] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765361] Added key: store_based_barrier_key:2 to store for rank: 3
[2024-04-18 14:42:32,710] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:442] [PID:1765364] Added key: store_based_barrier_key:2 to store for rank: 6
[2024-04-18 14:42:32,710] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765364] Rank 6: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes.
[2024-04-18 14:42:32,711] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765360] Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes.
[2024-04-18 14:42:32,714] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765363] Rank 5: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes.
[2024-04-18 14:42:32,714] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765365] Rank 7: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes.
[2024-04-18 14:42:32,714] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765358] Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes.
[2024-04-18 14:42:32,714] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765359] Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes.
[2024-04-18 14:42:32,717] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765361] Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes.
[2024-04-18 14:42:32,718] [INFO] [torch.distributed.distributed_c10d._store_based_barrier:476] [PID:1765362] Rank 4: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes.
[2024-04-18 14:42:36,001] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765362] [RANK:4] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,062] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765360] [RANK:2] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,066] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765361] [RANK:3] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,089] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765364] [RANK:6] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,095] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765363] [RANK:5] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,109] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765365] [RANK:7] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,161] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765359] [RANK:1] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,597] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765362] [RANK:4] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,762] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765361] [RANK:3] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,763] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765360] [RANK:2] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,830] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765364] [RANK:6] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,859] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765363] [RANK:5] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,906] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765365] [RANK:7] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:36,952] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765359] [RANK:1] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
You're using a CohereTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
  0%|                                                                                                        | 0/11503 [00:00<?, ?it/s]
(The same CohereTokenizerFast warning is emitted once per rank.)
[2024-04-18 14:42:37,518] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765358] [RANK:0] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
[2024-04-18 14:42:38,287] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:1765358] [RANK:0] packing_efficiency_estimate: 0.78 total_num_tokens per device: 9281961
Traceback (most recent call last):
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/train.py", line 163, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 2219, in _inner_training_loop
    self.current_flos += float(self.floating_point_ops(inputs))
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 3878, in floating_point_ops
    return self.model.floating_point_ops(inputs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1218, in floating_point_ops
    return 6 * self.estimate_tokens(input_dict) * self.num_parameters(exclude_embeddings=exclude_embeddings)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1165, in num_parameters
    quant_storage.itemsize if hasattr(quant_storage, "itemsize") else quant_storage.element_size()
AttributeError: 'torch.dtype' object has no attribute 'element_size'
(The identical traceback is raised on each of the remaining seven ranks.)
  0%|                                                                                                        | 0/11503 [00:03<?, ?it/s]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1765359 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1765361 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1765364 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1765365 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1765358) of binary: /home/ashmal.vayani/anaconda3/envs/axolotl/bin/python
Traceback (most recent call last):
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1048, in launch_command
    multi_gpu_launcher(args)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/commands/launch.py", line 702, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
axolotl.cli.train FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-04-18_14:42:42
  host      : 675d-4.dl-labs.ai
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 1765360)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2024-04-18_14:42:42
  host      : 675d-4.dl-labs.ai
  rank      : 4 (local_rank: 4)
  exitcode  : 1 (pid: 1765362)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2024-04-18_14:42:42
  host      : 675d-4.dl-labs.ai
  rank      : 5 (local_rank: 5)
  exitcode  : 1 (pid: 1765363)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-18_14:42:42
  host      : 675d-4.dl-labs.ai
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1765358)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

@ashmalvayani
Author

ashmalvayani commented Apr 18, 2024

Upgrading torch to 2.1.0 with this command:
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia

and rebuilding flash attention like this:

pip uninstall flash-attn
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn

fixed both the element_size issue and the bf16 error:
"triu_tril_cuda_template" not implemented for 'BFloat16'

However, I think this is a workaround rather than the actual fix. I could be wrong; can you please let me know?
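For anyone wondering why the upgrade helps: torch 2.1 added an itemsize property on torch.dtype, which is the attribute the failing check in num_parameters() looks for first. A quick sanity check (a minimal sketch, using only stock torch):

import torch

# torch >= 2.1 exposes itemsize directly on torch.dtype; on torch 2.0
# a dtype has neither itemsize nor element_size(), so the hasattr()
# fallback in transformers' num_parameters() raises AttributeError.
print(torch.__version__)
print(hasattr(torch.uint8, "itemsize"))  # True on torch >= 2.1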

@QiFengSu

transformers still seems to have this problem (see huggingface/peft#1635):

nb_params = (
    quant_storage.itemsize if hasattr(quant_storage, "itemsize") else quant_storage.element_size()
)

This can be resolved once this PR is merged: #30133

I had the same issue with the transformers line that you mentioned. I managed to fix it using the solution from the issue you highlighted, but I'm not entirely sure if it will work properly since the fix only involved setting a constant.

@ashmalvayani
Author

transformers still seems to have this problem (see huggingface/peft#1635):

nb_params = (
    quant_storage.itemsize if hasattr(quant_storage, "itemsize") else quant_storage.element_size()
)

This can be resolved once this PR is merged: #30133

I had the same issue with the transformers line that you mentioned. I managed to fix it using the solution from the issue you highlighted, but I'm not entirely sure if it will work properly since the fix only involved setting a constant.

I see there was a proposed solution of manually adding a helper function like this:

quant_storage = self.hf_quantizer.quantization_config.bnb_4bit_quant_storage
nb_params = get_dtype_size(quant_storage)
total_numel.append(param.numel() * 2 * nb_params)

However, the lines you've highlighted ("Lines 1164 to 1166") are exactly where the problem is: quant_storage is a torch.dtype, which has neither itemsize (only added to dtypes in newer torch) nor element_size() (a tensor method that torch.dtype never has), so on older torch both branches of that expression fail.
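For what it's worth, a version-agnostic helper along these lines sidesteps both missing attributes by measuring the byte size on a zero-dim tensor. This is a minimal sketch, not the actual change in #30133, and dtype_byte_size is an illustrative name:

import torch

def dtype_byte_size(dtype: torch.dtype) -> int:
    # torch >= 2.1: torch.dtype exposes itemsize directly
    if hasattr(dtype, "itemsize"):
        return dtype.itemsize
    # older torch: allocate a zero-dim tensor of this dtype and ask the
    # tensor, which does implement element_size()
    return torch.empty((), dtype=dtype).element_size()

print(dtype_byte_size(torch.uint8))  # 1, e.g. for the default 4-bit quant storage dtype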
