
What does MAX_STEPS = None mean in finetune? Can it be changed to something else? #24

Closed
ZenXir opened this issue Apr 1, 2023 · 13 comments

ZenXir commented Apr 1, 2023

Why is MAX_STEPS set to None here? Can it be changed to something else?

if not args.wandb:
    os.environ["WANDB_MODE"] = "disable"
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size #2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]
ZenXir (Author) commented Apr 1, 2023

Here's the situation: I'm using a merged model as the base model for finetuning, and it reports the error below. That's what prompted the question about why MAX_STEPS is set to None.

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:22<00:00, 11.04s/it]
Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-5488fd0b86b9abc9/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 473.18it/s]
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 42.30it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-5488fd0b86b9abc9/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 45.51it/s]
trainable params: 4194304 || all params: 6889689088 || trainable%: 0.060877986603275876
Traceback (most recent call last):
  File "/mnt/e/Chinese-Vicuna/finetune.py", line 228, in <module>
    trainer = transformers.Trainer(
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 543, in __init__
    if args.max_steps > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
(Chinese-alpaca-lora) root@DESKTOP-6KDJTBC:/mnt/e/Chinese-Vicuna#

Facico (Owner) commented Apr 1, 2023

@ZenXir max_steps gets overridden further down in the code. I fixed this in a local branch yesterday but forgot to push it; you can pull the latest update.
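
For anyone hitting the same TypeError before updating: transformers.TrainingArguments expects max_steps to be an int (its default is -1, meaning "no step cap, train for num_train_epochs"), so a bare None falls over at the args.max_steps > 0 check. A minimal sketch of one way to guard it, assuming a stock TrainingArguments rather than the exact code in finetune.py:

import transformers

MAX_STEPS = None  # None is meant as "no step cap, train by epochs"
EPOCHS = 3

training_args = transformers.TrainingArguments(
    output_dir="lora-Vicuna",  # hypothetical output dir for this sketch
    num_train_epochs=EPOCHS,
    # TrainingArguments wants an int; its default -1 disables the step cap,
    # so map None -> -1 instead of passing None straight through.
    max_steps=MAX_STEPS if MAX_STEPS is not None else -1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=32,
)

When max_steps is a positive integer it overrides num_train_epochs, which is presumably why the script exposes both knobs.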

ZenXir (Author) commented Apr 1, 2023

Got it, thank you!

@ZenXir ZenXir closed this as completed Apr 1, 2023
ZenXir (Author) commented Apr 1, 2023

I'm training with finetune.py on the merged model and keep hitting the same error after several attempts.

The model merge is done in two steps:
1. Following https://github.com/ymcui/Chinese-LLaMA-Alpaca, merge the embedding-expanded model into a .pth model.
2. Convert the .pth model from step 1 into Hugging Face format with transformers:
python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /mnt/e/Chinese-LLaMA-Alpaca/model --model_size 7B --output_dir /mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf

The finetune command is:
python finetune.py --data_path sample/merge.json --output_path lora-Vicuna_Embedded/7B/ --model_path /mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf

The error is:

CUDA SETUP: Loading binary /root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:21<00:00, 10.80s/it]
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-5488fd0b86b9abc9/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.06it/s]
trainable params: 4194304 || all params: 6889689088 || trainable%: 0.060877986603275876

 If there's a warning about missing keys above, please disregard :)
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|                                                                                                                                                             | 0/16260 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/mnt/e/Chinese-Vicuna/finetune.py", line 271, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1636, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1903, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 2649, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 2681, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/peft-0.3.0.dev0-py3.9.egg/peft/peft_model.py", line 529, in forward
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/accelerate-0.17.1-py3.9.egg/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/models/llama/modeling_llama.py", line 786, in forward
    loss = loss_fct(shift_logits.view(-1, self.config.vocab_size), shift_labels.view(-1))
RuntimeError: shape '[-1, 32000]' is invalid for input of size 50953080

Facico (Owner) commented Apr 1, 2023

@ZenXir I haven't run their model myself, so you'll need to look into it a bit; in your case the conversion didn't carry over correctly.

About RuntimeError: shape '[-1, 32000]' is invalid for input of size 50953080: LLaMA's vocabulary is about 32000 tokens, while that repository's vocabulary is 49954 or so (I'm not sure whether it has been updated since). If my guess is right, you need to add model.resize_token_embeddings(len(tokenizer)) to update the model's internal embedding dimensions; give it a try.
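
A quick check of the numbers in the traceback supports this reading (the batch and sequence figures below are inferred from the defaults in the config at the top of the thread, not taken from the repo):

flat = 50953080              # elements in the flattened shift_logits tensor
print(flat % 32000)          # 9080 -> view(-1, 32000) cannot possibly work
print(flat // 49954)         # 1020 -> rows of 49954 logits fit exactly
print(4 * (256 - 1))         # 1020 -> MICRO_BATCH_SIZE * (CUTOFF_LEN - 1) after the shift

So the checkpoint's logits already use the expanded ~49954-token vocabulary while config.vocab_size still says 32000, which is exactly the mismatch that model.resize_token_embeddings(len(tokenizer)) is meant to reconcile.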

ZenXir (Author) commented Apr 1, 2023

Doing the resize_token_embeddings like this before the prepare-for-training step makes training work.
I'll let the machine run for two days and see how the trained model turns out.

vocab_size = len(tokenizer.get_vocab())
print("Tokenizer vocabulary size:", vocab_size)
model.resize_token_embeddings(vocab_size)

ZenXir (Author) commented Apr 1, 2023

@Facico By the way, for finetuning the model with the merged embeddings, the command I use is:
python finetune.py --data_path sample/merge.json --output_path lora-Vicuna_Embedded/7B/ --model_path /mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf

All other parameters are defaults, and my machine has a single RTX 4090 24G.
Are there any parameters you would recommend adjusting for training quality or speed?
Things like batch_size, test_size, epochs, and so on; quality in particular, so I can make a more direct comparison later.

Facico (Owner) commented Apr 12, 2023

Sorry, there are so many messages that some slip past me. For a direct comparison, just keep the batch size and number of epochs the same; if you want it to run faster, you can increase the micro batch size.
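
For reference, a small sketch of how those knobs interact in the config quoted at the top of the thread (my reading of the arithmetic, not something the repo states): the effective batch size is pinned by BATCH_SIZE, and MICRO_BATCH_SIZE only decides how it is split into gradient-accumulation steps, so raising it trades VRAM for speed without changing the optimization.

BATCH_SIZE = 128

# Effective batch size stays constant; only the number of forward/backward
# passes per optimizer step changes.
for micro in (4, 8, 16):                  # candidate MICRO_BATCH_SIZE values
    accum = BATCH_SIZE // micro           # GRADIENT_ACCUMULATION_STEPS
    print(micro, accum, micro * accum)    # 4 32 128 / 8 16 128 / 16 8 128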

molyswu commented Apr 18, 2023

Dual RTX 3090:

if not args.wandb:
    os.environ["WANDB_MODE"] = "disable"
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size #2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]

molyswu commented Apr 18, 2023

/root/anaconda3/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
0%| | 0/32481 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
./Chinese-Vicuna/finetune.py:271 in │
│ │
│ 268 │
│ 269 print("\n If there's a warning about missing keys above, please disregard :)") │
│ 270 │
│ ❱ 271 trainer.train(resume_from_checkpoint=args.resume_from_checkpoint) │
│ 272 │
│ 273 model.save_pretrained(OUTPUT_DIR) │
│ 274 │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1662 in train │
│ │
│ 1659 │ │ inner_training_loop = find_executable_batch_size( │
│ 1660 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1661 │ │ ) │
│ ❱ 1662 │ │ return inner_training_loop( │
│ 1663 │ │ │ args=args, │
│ 1664 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1665 │ │ │ trial=trial, │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1929 in _inner_training_loop │
│ │
│ 1926 │ │ │ │ │ with model.no_sync(): │
│ 1927 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1928 │ │ │ │ else: │
│ ❱ 1929 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1930 │ │ │ │ │
│ 1931 │ │ │ │ if ( │
│ 1932 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2699 in training_step │
│ │
│ 2696 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2697 │ │ │
│ 2698 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2699 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2700 │ │ │
│ 2701 │ │ if self.args.n_gpu > 1: │
│ 2702 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2731 in compute_loss │
│ │
│ 2728 │ │ │ labels = inputs.pop("labels") │
│ 2729 │ │ else: │
│ 2730 │ │ │ labels = None │
│ ❱ 2731 │ │ outputs = model(**inputs) │
│ 2732 │ │ # Save past state if it exists │
│ 2733 │ │ # TODO: this needs to be fixed and made cleaner later. │
│ 2734 │ │ if self.args.past_index >= 0: │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1102 in _call_impl │
│ │
│ 1099 │ │ # this function, and just call forward. │
│ 1100 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1101 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1102 │ │ │ return forward_call(*input, **kwargs) │
│ 1103 │ │ # Do not call functions when jit is used │
│ 1104 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1105 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ in forward:663 │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1102 in _call_impl │
│ │
│ 1099 │ │ # this function, and just call forward. │
│ 1100 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1101 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1102 │ │ │ return forward_call(*input, **kwargs) │
│ 1103 │ │ # Do not call functions when jit is used │
│ 1104 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1105 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:709 in │
│ forward │
│ │
│ 706 │ │ │ shift_labels = labels[..., 1:].contiguous() │
│ 707 │ │ │ # Flatten the tokens │
│ 708 │ │ │ loss_fct = CrossEntropyLoss() │
│ ❱ 709 │ │ │ shift_logits = shift_logits.view(-1, self.config.vocab_size) │
│ 710 │ │ │ shift_labels = shift_labels.view(-1) │
│ 711 │ │ │ # Enable model parallelism │
│ 712 │ │ │ shift_labels = shift_labels.to(shift_logits.device) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: shape '[-1, 32001]' is invalid for input of size 32640000
0%| | 0/32481 [00:04<?, ?it/s]

godzeo commented Apr 24, 2023

Doing the resize_token_embeddings like this before the prepare-for-training step makes training work. I'll let the machine run for two days and see how the trained model turns out.

vocab_size = len(tokenizer.get_vocab())
print("Tokenizer vocabulary size:", vocab_size)
model.resize_token_embeddings(vocab_size)

Which file and at which step do those three lines of code go? I'd like to do the same training, but I'm still new to this and couldn't figure it out.

Facico (Owner) commented May 4, 2023

@godzeo Just put it right after the model and tokenizer have been loaded.
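
In other words, something like the following (a minimal sketch assuming the usual LlamaForCausalLM / LlamaTokenizer loading pattern, not the exact lines of finetune.py); the resize sits between loading and any prepare_model_for_int8_training / get_peft_model calls:

from transformers import LlamaForCausalLM, LlamaTokenizer

MODEL_PATH = "/mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf"  # path from the commands above

model = LlamaForCausalLM.from_pretrained(MODEL_PATH)
tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)

# Right after loading, before the LoRA / int8 preparation steps:
vocab_size = len(tokenizer)
print("Tokenizer vocabulary size:", vocab_size)
model.resize_token_embeddings(vocab_size)  # aligns the embeddings and config.vocab_size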

abbhay commented May 13, 2023

Got it, thank you!

Hey, what should max_steps be set to here?
