Completed the fine-tune of 'Weyaxi/Dolphin2.1-OpenOrca-7B' using ipex-llm on an Intel GPU Max 1100.
The output directory contains the checkpoints and the adapter config file.
Made changes to the inference file and removed the training parameter from 'adapter_config.json' in order to run inference (a sketch of that edit follows these steps).
Ran the following command to perform inference: accelerate launch -m inference lora.yml --lora_model_dir="./qlora-out/"
After submitting an instruction, the issue below occurred. Note that the error reports certain quantization types as not supported on CPU, even though we are running on GPU and did the fine-tune on GPU.
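For reference, the adapter_config.json edit mentioned above was just dropping the training-only entry before PEFT loads the adapter. A minimal sketch of that step, assuming the offending key is called "training_mode" (the key name here is illustrative only; use whatever extra entry your config actually contains):

import json
from pathlib import Path

cfg_path = Path("qlora-out/adapter_config.json")
cfg = json.loads(cfg_path.read_text())

# "training_mode" is a placeholder for the training-only key that was removed;
# pop() with a default is a no-op if the key is already absent.
cfg.pop("training_mode", None)

cfg_path.write_text(json.dumps(cfg, indent=2))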
Logs:
(ft_llm) intel@imu-nex-sprx92-max1-sut:~/ritu/axolotl$ accelerate launch -m inference lora.yml --lora_model_dir="./qlora-out/"
/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
2024-05-23 11:31:52,323 - INFO - intel_extension_for_pytorch auto imported
2024-05-23 11:31:52,341 - WARNING - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
dP dP dP
88 88 88
.d8888b. dP. .dP .d8888b. 88 .d8888b. d8888P 88
88' `88 `8bd8' 88' `88 88 88' `88 88 88
88. .88 .d88b. 88. .88 88 88. .88 88 88
`88888P8 dP' `dP `88888P' dP `88888P' dP dP
/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
[2024-05-23 11:31:55,282] [INFO] [axolotl.normalize_config:169] [PID:461596] [RANK:0] GPU memory usage baseline: 0.000GB ()
[2024-05-23 11:31:55,283] [INFO] [axolotl.common.cli.load_model_and_tokenizer:49] [PID:461596] [RANK:0] loading tokenizer... Weyaxi/Dolphin2.1-OpenOrca-7B
[2024-05-23 11:31:55,704] [DEBUG] [axolotl.load_tokenizer:216] [PID:461596] [RANK:0] EOS: 2 / </s>
[2024-05-23 11:31:55,704] [DEBUG] [axolotl.load_tokenizer:217] [PID:461596] [RANK:0] BOS: 1 / <s>
[2024-05-23 11:31:55,704] [DEBUG] [axolotl.load_tokenizer:218] [PID:461596] [RANK:0] PAD: 2 / </s>
[2024-05-23 11:31:55,704] [DEBUG] [axolotl.load_tokenizer:219] [PID:461596] [RANK:0] UNK: 0 / <unk>
[2024-05-23 11:31:55,704] [INFO] [axolotl.load_tokenizer:224] [PID:461596] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-05-23 11:31:55,704] [INFO] [axolotl.common.cli.load_model_and_tokenizer:51] [PID:461596] [RANK:0] loading model and (optionally) peft_config...
/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/ipex_llm/transformers/model.py:204: FutureWarning: BigDL LLM QLoRA does not support double quant now, set to False
warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3.78it/s]
[2024-05-23 11:33:26,633] [INFO] [axolotl.load_model:665] [PID:461596] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2024-05-23 11:33:26,636] [INFO] [axolotl.load_model:677] [PID:461596] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2024-05-23 11:33:26,850] [INFO] [axolotl.load_lora:789] [PID:461596] [RANK:0] found linear modules: ['up_proj', 'down_proj', 'o_proj', 'k_proj', 'v_proj', 'q_proj', 'gate_proj']
[2024-05-23 11:33:26,851] [DEBUG] [axolotl.load_lora:808] [PID:461596] [RANK:0] Loading pretained PEFT - LoRA
trainable params: 41,943,040 || all params: 4,012,118,016 || trainable%: 1.0454089294665454
[2024-05-23 11:33:27,535] [INFO] [axolotl.load_model:714] [PID:461596] [RANK:0] GPU memory usage after adapters: 0.000GB ()
================================================================================
Give me an instruction (Ctrl + D to submit):
hello, test
========================================
<s>hello, Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/intel/ritu/axolotl/inference.py", line 41, in <module>
fire.Fire(do_cli)
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/ritu/axolotl/inference.py", line 37, in do_cli
do_inference(cfg=parsed_cfg, cli_args=parsed_cli_args)
File "/home/intel/ritu/axolotl/src/axolotl/cli/__init__.py", line 153, in do_inference
generated = model.generate(
^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/peft/peft_model.py", line 1190, in generate
outputs = self.base_model.generate(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/ipex_llm/transformers/lookup.py", line 87, in generate
return original_generate(self,
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/ipex_llm/transformers/speculative.py", line 109, in generate
return original_generate(self,
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1764, in generate
return self.sample(
^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/transformers/generation/utils.py", line 2861, in sample
outputs = self(
^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py", line 1044, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py", line 929, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py", line 654, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py", line 255, in forward
query_states = self.q_proj(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/peft/tuners/lora/layer.py", line 497, in forward
result = self.base_layer(x, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/ipex_llm/transformers/low_bit_linear.py", line 720, in forward
invalidInputError(self.qtype != NF3 and self.qtype != NF4 and self.qtype != FP8E4
File "/home/intel/miniconda3/envs/ft_llm/lib/python3.11/site-packages/ipex_llm/utils/common/log4Error.py", line 32, in invalidInputError
raise RuntimeError(errMsg)
RuntimeError: NF3, NF4, FP4 and FP8 quantization are currently not supported on CPU
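The traceback ends in ipex_llm's low_bit_linear forward, which suggests the NF4-quantized q_proj weights are being executed on CPU rather than XPU, despite the fine-tune having run on GPU. As a quick cross-check outside of axolotl, the adapter can be loaded directly with ipex-llm and moved to the XPU explicitly. This is only a minimal sketch, assuming the adapter in ./qlora-out/ was trained with NF4 (load_in_low_bit="nf4") and relying on intel_extension_for_pytorch being auto-imported as the log above shows; it is not the axolotl code path:

import torch
from transformers import AutoTokenizer
from peft import PeftModel
from ipex_llm.transformers import AutoModelForCausalLM

base = "Weyaxi/Dolphin2.1-OpenOrca-7B"
adapter = "./qlora-out/"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    load_in_low_bit="nf4",   # match the quantization used during fine-tuning
    optimize_model=False,
    torch_dtype=torch.bfloat16,
)
model = model.to("xpu")      # NF4 kernels are implemented for XPU, not CPU
model = PeftModel.from_pretrained(model, adapter)

inputs = tokenizer("hello, test", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))

If this standalone load works on XPU but the accelerate-launched script still fails, the model is most likely never being moved off the CPU inside the axolotl inference path.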
Inference file content:
#"""
#CLI to run inference on a trained model
#"""
# ritu - added
from ipex_llm import llm_patch
llm_patch(train=True)
#end
from pathlib import Path
import fire
import transformers
from axolotl.cli import (
    do_inference,
    do_inference_gradio,
    load_cfg,
    print_axolotl_text_art,
)
from axolotl.common.cli import TrainerCliArgs


def do_cli(config: Path = Path("examples/"), gradio=False, **kwargs):
    # pylint: disable=duplicate-code
    print_axolotl_text_art()
    parsed_cfg = load_cfg(config, **kwargs)
    parsed_cfg.sample_packing = False
    parser = transformers.HfArgumentParser((TrainerCliArgs))
    parsed_cli_args, _ = parser.parse_args_into_dataclasses(
        return_remaining_strings=True
    )
    parsed_cli_args.inference = True

    if gradio:
        do_inference_gradio(cfg=parsed_cfg, cli_args=parsed_cli_args)
    else:
        do_inference(cfg=parsed_cfg, cli_args=parsed_cli_args)


if __name__ == "__main__":
    fire.Fire(do_cli)
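For completeness, since do_cli exposes a gradio flag via fire, the same entry point should also serve a Gradio UI for interactive testing once the device issue is resolved, e.g.:

accelerate launch -m inference lora.yml --lora_model_dir="./qlora-out/" --gradio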