Start a RunPod container with the PyTorch 2.0.1 template and plenty of disk space.
Run the sample command on a properly formatted dataset:
python -m llamatune.train \
    --model_name meta-llama/Llama-2-13b-chat-hf \
    --data_path master_qa.json \
    --training_recipe lora \
    --batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-4 \
    --output_dir chat_llama2_13b \
    --use_auth_token xxxzzz
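(As a quick pre-flight check, something like the snippet below can confirm the dataset file at least parses. It is a hypothetical helper, not part of llamatune: the assumption that master_qa.json is a JSON array of dicts is mine, and the keys should be adjusted to whatever llamatune's data loader actually expects.)

# sanity_check.py -- hypothetical pre-flight check for master_qa.json
import json

with open("master_qa.json") as f:
    records = json.load(f)

# assumption: the file is a JSON array of example dicts
assert isinstance(records, list), "expected a JSON array of examples"
print(f"{len(records)} records loaded")
for i, rec in enumerate(records[:3]):
    # truncate long field values so the preview stays readable
    print(i, {k: str(v)[:60] for k, v in rec.items()})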
The result of the training command is:
Model ready for training!
trainable params: 250347520 || all params: 6922337280 || trainable%: 3.616517223500557
WARNING:root:Loading data...
WARNING:root:Tokenizing inputs... This may take some time...
config TrainingConfig(model_name='meta-llama/Llama-2-13b-chat-hf', data_path='master_qa.json', output_dir='chat_llama2_13b', training_recipe='lora', optim='paged_adamw_8bit', batch_size=8, gradient_accumulation_steps=4, n_epochs=3, weight_decay=0.0, learning_rate=0.0001, max_grad_norm=0.3, gradient_checkpointing=True, do_train=True, lr_scheduler_type='cosine', warmup_ratio=0.03, logging_steps=1, group_by_length=True, save_strategy='epoch', save_total_limit=3, fp16=True, tokenizer_type='llama', trust_remote_code=False, compute_dtype=torch.float16, max_tokens=4096, do_eval=True, evaluation_strategy='epoch', use_auth_token='xxxzzz', use_fast=False, bits=4, double_quant=True, quant_type='nf4', lora_r=64, lora_alpha=16, lora_dropout=0.0)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/llamatune/train.py", line 50, in <module>
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/llamatune/trainer.py", line 25, in train
    self.model_engine.train(data_module=self.data_module)
  File "/usr/local/lib/python3.10/dist-packages/llamatune/model_engines/llama_model_engine.py", line 33, in train
    trainer = Trainer(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 405, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision. if you want to fine-tune an 8-bit model, please make sure that you have installed bitsandbytes>=0.41.1.
Steps to reproduce:
python -m llamatune.train \
    --model_name meta-llama/Llama-2-13b-chat-hf \
    --data_path master_qa.json \
    --training_recipe lora \
    --batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-4 \
    --output_dir chat_llama2_13b \
    --use_auth_token xxxzzz
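A likely workaround, untested here, is to upgrade bitsandbytes to the version the error message names and re-run its bundled diagnostic. This is an assumption on my part: the config requests 4-bit NF4 rather than 8-bit, but Transformers applies the same bitsandbytes version gate to any quantized model.

pip install -U "bitsandbytes>=0.41.1"
python -m bitsandbytes    # bundled diagnostic: prints the detected version and CUDA setup
pip show transformers accelerate peft    # worth confirming these are recent as well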