Description
Hi,
What is the best way to run this on my high-performance laptop?
Should this work at all, and can I estimate roughly how many days or weeks it will take to finish?
Thanks in advance.
Specs:
OS: Win 11 (WSL2)
CPU: Intel Core i7 12850HX
Make: Lenovo Thinkpad P16 gen 1
Memory: 128GB DDR5-4800 (2400MHz)
GPU: Nvidia RTX A5500 16GB
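Here is my rough back-of-the-envelope check of whether a 4-bit LoRA run of a 7B model fits in 16GB of VRAM. The trainable-parameter count (54,525,952) is taken from the trainer log below; the bytes-per-parameter figures and the activation allowance are my own assumptions, so this is only an order-of-magnitude sketch:

```python
# Rough VRAM estimate for 4-bit (QLoRA-style) SFT of Mistral-7B on a 16 GB card.
# All figures are approximations, not measurements.

base_params = 7.24e9          # approximate parameter count of Mistral-7B
lora_params = 54_525_952      # "Number of trainable parameters" from the trainer log below

weights_4bit_gb = base_params * 0.5 / 1e9   # ~0.5 byte/param for NF4 weights (plus small overhead)
lora_weights_gb = lora_params * 2 / 1e9     # LoRA adapter weights in bf16
lora_grads_gb   = lora_params * 2 / 1e9     # gradients for the LoRA params only
optimizer_gb    = lora_params * 12 / 1e9    # assume ~12 bytes/param for AdamW states
activations_gb  = 2.0                       # guess for batch size 1, seq len 2048, gradient checkpointing on

total_gb = weights_4bit_gb + lora_weights_gb + lora_grads_gb + optimizer_gb + activations_gb
print(f"~{total_gb:.1f} GB estimated peak VRAM vs. 16 GB available")
# -> roughly 6-7 GB, so the run should fit on the RTX A5500.
```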
It seems the following command works on my laptop:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml --load_in_4bit=true --gradient_accumulation_steps=1024 --per_device_eval_batch_size=1 --per_device_train_batch_size=1
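As a quick sanity check on what those flags imply (the 207,865 train examples come from the trainer log further down; everything else is taken straight from the command):

```python
# Effective batch size and optimization steps per epoch for the command above.
num_examples = 207_865               # "Num examples" from the trainer log
per_device_train_batch_size = 1
gradient_accumulation_steps = 1024
num_processes = 1                    # --num_processes=1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_processes
steps_per_epoch = num_examples // effective_batch

print(effective_batch)   # 1024 -> matches "Total train batch size ... = 1,024" in the log
print(steps_per_epoch)   # 202  -> matches "Total optimization steps = 202" in the log
```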
I have now run it for roughly 1-2 hours:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml --load_in_4bit=true --gradient_accumulation_steps=1024 --per_device_eval_batch_size=1 --per_device_train_batch_size=1
INFO:root:Using nproc_per_node=1.
2023-11-27 15:41:33.914308: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-11-27 15:41:33.941565: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-27 15:41:34.582753: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-11-27 15:41:35,164] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.11/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
2023-11-27 15:41:35 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, 16-bits training: False
2023-11-27 15:41:35 - INFO - main - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='mistralai/Mistral-7B-v0.1', model_revision='main', model_code_revision=None, torch_dtype='auto', trust_remote_code=False, use_flash_attention_2=True, use_peft=True, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'], lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=True, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)
2023-11-27 15:41:35 - INFO - main - Data parameters DataArguments(chat_template=None, dataset_mixer={'HuggingFaceH4/ultrachat_200k': 1.0}, dataset_splits=['train_sft', 'test_sft'], max_train_samples=None, max_eval_samples=None, preprocessing_num_workers=12, truncation_side=None)
2023-11-27 15:41:35 - INFO - main - Training/evaluation parameters SFTConfig(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1024,
gradient_checkpointing=True,
gradient_checkpointing_kwargs={'use_reentrant': False},
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=zephyr-7b-sft-lora,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=data/zephyr-7b-sft-lora/runs/Nov27_15-41-35,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=5,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.COSINE,
max_grad_norm=1.0,
max_seq_length=2048,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
output_dir=data/zephyr-7b-sft-lora,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=data/zephyr-7b-sft-lora,
save_on_each_node=False,
save_safetensors=True,
save_steps=500,
save_strategy=IntervalStrategy.NO,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
Overwrite dataset info from restored data version if exists.
2023-11-27 15:41:38 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
2023-11-27 15:41:38 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
Found cached dataset ultrachat_200k (/root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458)
2023-11-27 15:41:38 - INFO - datasets.builder - Found cached dataset ultrachat_200k (/root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458)
Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
2023-11-27 15:41:38 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
Overwrite dataset info from restored data version if exists.
2023-11-27 15:41:40 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
2023-11-27 15:41:40 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
Found cached dataset ultrachat_200k (/root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458)
2023-11-27 15:41:40 - INFO - datasets.builder - Found cached dataset ultrachat_200k (/root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458)
Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
2023-11-27 15:41:40 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-91f7f728fecb2505.arrow
2023-11-27 15:41:40 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-91f7f728fecb2505.arrow
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-83009ff6f17d65d0.arrow
2023-11-27 15:41:40 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-83009ff6f17d65d0.arrow
2023-11-27 15:41:40 - INFO - main - Training on the following datasets and their proportions: ['train : 207865', 'test : 23110']
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/tokenizer.model
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/tokenizer.json
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/special_tokens_map.json
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/tokenizer_config.json
Loading cached processed dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-3e95fae9b410a2c7.arrow
2023-11-27 15:41:40 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-3e95fae9b410a2c7.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-84dc14e69dab5370.arrow
2023-11-27 15:41:40 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-84dc14e69dab5370.arrow
2023-11-27 15:41:40 - INFO - main - Sample 167621 of the processed training set:
........
2023-11-27 15:41:40 - INFO - main - *** Load pretrained model ***
2023-11-27 15:41:40 - INFO - main - *** Model loaded! ***
/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py:145: UserWarning: You passed a model_id to the SFTTrainer. This will automatically create an AutoModelForCausalLM or a PeftModel (if you passed a peft_config) for you.
warnings.warn(
[INFO|configuration_utils.py:717] 2023-11-27 15:41:40,964 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/config.json
[INFO|configuration_utils.py:777] 2023-11-27 15:41:40,964 >> Model config MistralConfig {
"_name_or_path": "mistralai/Mistral-7B-v0.1",
"architectures": [
"MistralForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.35.0",
"use_cache": false,
"vocab_size": 32000
}
[INFO|modeling_utils.py:3121] 2023-11-27 15:41:40,972 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/pytorch_model.bin.index.json
[INFO|modeling_utils.py:3184] 2023-11-27 15:41:40,974 >> Will use torch_dtype=torch.bfloat16 as defined in model's config object
[INFO|modeling_utils.py:1222] 2023-11-27 15:41:40,974 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:791] 2023-11-27 15:41:40,976 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"use_cache": false
}
[INFO|modeling_utils.py:3257] 2023-11-27 15:41:41,631 >> Detected 4-bit loading: activating 4-bit loading for this model
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.75s/it]
[INFO|modeling_utils.py:3950] 2023-11-27 15:41:51,332 >> All model checkpoint weights were used when initializing MistralForCausalLM.
[INFO|modeling_utils.py:3958] 2023-11-27 15:41:51,332 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at mistralai/Mistral-7B-v0.1.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
[INFO|configuration_utils.py:751] 2023-11-27 15:41:51,488 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/generation_config.json
[INFO|configuration_utils.py:791] 2023-11-27 15:41:51,488 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
[INFO|training_args.py:1784] 2023-11-27 15:41:51,646 >> PyTorch: setting up devices
/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py:247: UserWarning: You passed a tokenizer with padding_side not equal to right to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding tokenizer.padding_side = 'right' to your code.
warnings.warn(
[INFO|trainer.py:593] 2023-11-27 15:41:52,619 >> Using auto half precision backend
2023-11-27 15:41:52 - INFO - main - *** Train ***
[INFO|trainer.py:1723] 2023-11-27 15:41:53,614 >> ***** Running training *****
[INFO|trainer.py:1724] 2023-11-27 15:41:53,614 >> Num examples = 207,865
[INFO|trainer.py:1725] 2023-11-27 15:41:53,614 >> Num Epochs = 1
[INFO|trainer.py:1726] 2023-11-27 15:41:53,614 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1729] 2023-11-27 15:41:53,614 >> Total train batch size (w. parallel, distributed & accumulation) = 1,024
[INFO|trainer.py:1730] 2023-11-27 15:41:53,614 >> Gradient Accumulation steps = 1024
[INFO|trainer.py:1731] 2023-11-27 15:41:53,614 >> Total optimization steps = 202
[INFO|trainer.py:1732] 2023-11-27 15:41:53,616 >> Number of trainable parameters = 54,525,952
0%|          | 0/202 [00:00<?, ?it/s]
[WARNING|tokenization_utils_base.py:3831] 2023-11-27 15:41:54,956 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2377 > 2048). Running this sequence through the model will result in indexing errors
[WARNING|logging.py:314] 2023-11-27 15:41:55,018 >> You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
[WARNING|logging.py:329] 2023-11-27 15:41:55,763 >> The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.bfloat16.
[W reducer.cpp:1346] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
0%|▌ | 1/202 [4:36:47<927:14:16, 16607.25s/it]
{'loss': 1.1453, 'learning_rate': 1.9998790632601496e-05, 'epoch': 0.0}
0%|▌ | 1/202 [4:36:47<927:14:16, 16607.25s/it]
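Answering my own question about the runtime: projecting from the ~16,607 s/it reported for the first step (the first iteration can include one-off overhead, so this is only a ballpark):

```python
# Ballpark total runtime from the first logged iteration time.
seconds_per_step = 16_607.25    # "16607.25s/it" from the progress bar above
total_steps = 202               # "Total optimization steps = 202" from the trainer log

total_seconds = seconds_per_step * total_steps
print(f"~{total_seconds / 3600:.0f} hours (~{total_seconds / 86400:.0f} days)")
# -> roughly 930 hours, i.e. about 39 days for one epoch at this speed,
#    consistent with the 927:14:16 ETA shown for the remaining 201 steps.
```

So at this pace a single epoch would take well over a month on this laptop.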