Hello everyone,
I am trying to fine-tune the mixtral-8x22b-instruct model, but I keep running into an OOM error.
I am using 3x A100 80GB GPUs, for a total of 240 GB of VRAM, and I am fine-tuning with QLoRA in 4-bit.
The run hits the OOM error right after the first fine-tuning step.
My dataset consists of about 2000 records, all of them quite long texts; in some cases I think a single record is around 30,000 tokens.
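For context, my understanding is that "QLoRA 4-bit" here boils down to something like the following (only a sketch of what I believe LLaMA-Factory sets up internally via bitsandbytes and PEFT; the LoRA hyperparameters below are placeholders, not my exact settings):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base weights (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # matches mixed_precision: fp16 below
    bnb_4bit_quant_storage=torch.float16,  # reportedly needed so FSDP can shard the 4-bit weights
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)

# Trainable low-rank adapters on top of the frozen 4-bit base model.
lora_config = LoraConfig(
    r=16,                 # placeholder rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

So only the LoRA adapters should be trainable; the 4-bit base weights stay frozen.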
Here is my "accelerate" configuration:
```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_forward_prefetch: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_offload_params: true  # offload may affect training speed
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1  # the number of nodes
num_processes: 3  # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
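For reference, this is the kind of check I can drop into the training script to see how much memory each GPU is actually using right before the crash (plain PyTorch, nothing LLaMA-Factory-specific; where exactly I call it is up to me):

```python
import torch

def report_gpu_memory(tag: str) -> None:
    # Print allocated/reserved/free memory for every visible GPU.
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        allocated = torch.cuda.memory_allocated(i)
        reserved = torch.cuda.memory_reserved(i)
        print(f"[{tag}] GPU {i}: "
              f"allocated={allocated / 2**30:.1f} GiB, "
              f"reserved={reserved / 2**30:.1f} GiB, "
              f"free={free / 2**30:.1f} of {total / 2**30:.1f} GiB")

report_gpu_memory("after model load")
```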
File "/workspace/LLaMA-Factory/src/train.py", line 14, in <module>
main()
File "/workspace/LLaMA-Factory/src/train.py", line 5, in main
run_exp()
File "/workspace/LLaMA-Factory/src/llamafactory/train/tuner.py", line 33, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/workspace/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 73, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1885, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2216, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3250, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2121, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 522, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 266, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 289, in apply
return user_fn(self, *args)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 319, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 266, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 364.00 MiB. GPU 0 has a total capacity of 79.15 GiB of which 50.12 MiB is free. Process 893681 has 78.98 GiB memory in use. Of the allocated memory 75.62 GiB is allocated by PyTorch, and 2.70 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
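The message itself suggests `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. As far as I understand, that only helps with fragmentation (the ~2.7 GiB reported here as reserved but unallocated), not with the overall activation footprint of 30,000-token sequences, but it is easy to try. A minimal way to set it, assuming it happens before the first CUDA allocation (the shell equivalent would be exporting the variable before `accelerate launch`):

```python
import os

# Must be set before the CUDA caching allocator is initialized,
# i.e. before any tensor is placed on a GPU.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # imported after the environment variable on purpose
```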
What am I doing wrong?
Thank you in advance for your help.