Llamatune fails with your example code from its home page #82

@IridiumMaster

Steps to reproduce

  1. Start a RunPod container with the PyTorch 2.0.1 template and lots of disk space.
  2. Run your sample command on a properly formatted dataset:
    python -m llamatune.train \
      --model_name meta-llama/Llama-2-13b-chat-hf \
      --data_path master_qa.json \
      --training_recipe lora \
      --batch_size 8 \
      --gradient_accumulation_steps 4 \
      --learning_rate 1e-4 \
      --output_dir chat_llama2_13b \
      --use_auth_token xxxzzz
  3. The result is:
    Model ready for training!
    trainable params: 250347520 || all params: 6922337280 || trainable: 3.616517223500557
    WARNING:root:Loading data...
    WARNING:root:Tokenizing inputs... This may take some time...
    config TrainingConfig(model_name='meta-llama/Llama-2-13b-chat-hf', data_path='master_qa.json', output_dir='chat_llama2_13b', training_recipe='lora', optim='paged_adamw_8bit', batch_size=8, gradient_accumulation_steps=4, n_epochs=3, weight_decay=0.0, learning_rate=0.0001, max_grad_norm=0.3, gradient_checkpointing=True, do_train=True, lr_scheduler_type='cosine', warmup_ratio=0.03, logging_steps=1, group_by_length=True, save_strategy='epoch', save_total_limit=3, fp16=True, tokenizer_type='llama', trust_remote_code=False, compute_dtype=torch.float16, max_tokens=4096, do_eval=True, evaluation_strategy='epoch', use_auth_token='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', use_fast=False, bits=4, double_quant=True, quant_type='nf4', lora_r=64, lora_alpha=16, lora_dropout=0.0)
    Traceback (most recent call last):
      File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/usr/local/lib/python3.10/dist-packages/llamatune/train.py", line 50, in <module>
        trainer.train()
      File "/usr/local/lib/python3.10/dist-packages/llamatune/trainer.py", line 25, in train
        self.model_engine.train(data_module=self.data_module)
      File "/usr/local/lib/python3.10/dist-packages/llamatune/model_engines/llama_model_engine.py", line 33, in train
        trainer = Trainer(
      File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 405, in __init__
        raise ValueError(
    ValueError: The model you want to train is loaded in 8-bit precision. if you want to fine-tune an 8-bit model, please make sure that you have installed bitsandbytes>=0.41.1.
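
Since the ValueError names bitsandbytes>=0.41.1, it may help to confirm which version the container actually has. The following is a minimal sketch, not part of llamatune: the helper names `parse_version` and `bitsandbytes_status` are made up for illustration, and it assumes the package is registered under the distribution name `bitsandbytes`. It only reads installed package metadata and does not import bitsandbytes itself.

```python
from importlib import metadata

# Minimum version quoted in the ValueError above.
REQUIRED = (0, 41, 1)


def parse_version(text):
    """Turn '0.41.1' or '0.41.1.post2' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post2'
        parts.append(int(digits))
    return tuple(parts)


def bitsandbytes_status(required=REQUIRED):
    """Report whether the installed bitsandbytes meets the required version."""
    try:
        installed = metadata.version("bitsandbytes")
    except metadata.PackageNotFoundError:
        return "bitsandbytes is not installed"
    if parse_version(installed) >= required:
        return f"bitsandbytes {installed} satisfies the requirement"
    return f"bitsandbytes {installed} is older than required"


if __name__ == "__main__":
    print(bitsandbytes_status())
```

If this reports an older version or a missing package, upgrading with `pip install -U bitsandbytes` inside the container and rerunning the command from step 2 would be the obvious next thing to try; whether that alone resolves the error is not confirmed here.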
