[Bug] Llama-2-7b-chat-hf-q4f16_1 model conversion reported an error: Vulkan target does not support Float16 capability. #1606
CPU: Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz, GPU: Huawei Technologies Co., Ltd. Hi1710
Please verify that your GPU supports Vulkan float16 (both hardware and Vulkan driver support). We detect the Vulkan device_properties through the Vulkan API in TVM, which works with popular hardware.
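A minimal sketch of how such a check might look from Python, assuming a TVM build with Vulkan enabled (`tvm.vulkan(0)` and `device_name` are standard TVM device APIs; `vulkaninfo` from the Vulkan SDK should also report the `shaderFloat16` feature):

```python
import tvm

# Sketch: check whether TVM sees a Vulkan device and which one it is.
# A software implementation such as lavapipe (see the warning in the log
# below) typically does not advertise float16 shader support.
dev = tvm.vulkan(0)
print("vulkan:0 exists:", dev.exist)
if dev.exist:
    print("device name:", dev.device_name)
```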
I am also getting the same kind of error. How can I resolve it?
Performing model conversion: mlc_chat convert_weight ./dist/models/Llama2-7B-Chat-q4f16_1 --quantization q4f16_1 -o dist/Llama2-7B-Chat-q4f16_1-MLC
[2024-01-15 16:31:03] INFO auto_config.py:115: Found model configuration: dist/models/Llama2-7B-Chat-q4f16_1/config.json
[2024-01-15 16:31:03] INFO auto_device.py:85: Not found device: cuda:0
[2024-01-15 16:31:04] INFO auto_device.py:85: Not found device: rocm:0
[2024-01-15 16:31:04] INFO auto_device.py:85: Not found device: metal:0
[2024-01-15 16:31:04] INFO auto_device.py:76: Found device: vulkan:0
[2024-01-15 16:31:05] INFO auto_device.py:85: Not found device: opencl:0
[2024-01-15 16:31:05] INFO auto_device.py:33: Using device: vulkan:0
[2024-01-15 16:31:05] INFO auto_weight.py:70: Finding weights in: dist/models/Llama2-7B-Chat-q4f16_1
[2024-01-15 16:31:05] INFO auto_weight.py:120: Found source weight format: huggingface-torch. Source configuration: dist/models/Llama2-7B-Chat-q4f16_1/pytorch_model.bin.index.json
[2024-01-15 16:31:05] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: dist/models/Llama2-7B-Chat-q4f16_1/model.safetensors.index.json
[2024-01-15 16:31:05] INFO auto_weight.py:106: Using source weight configuration: dist/models/Llama2-7B-Chat-q4f16_1/pytorch_model.bin.index.json. Use --source to override.
[2024-01-15 16:31:05] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use --source-format to override.
[2024-01-15 16:31:05] INFO auto_config.py:153: Found model type: llama. Use --model-type to override.
Weight conversion with arguments:
--config dist/models/Llama2-7B-Chat-q4f16_1/config.json
--quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
--model-type llama
--device vulkan:0
--source dist/models/Llama2-7B-Chat-q4f16_1/pytorch_model.bin.index.json
--source-format huggingface-torch
--output dist/Llama2-7B-Chat-q4f16_1-MLC
[2024-01-15 16:31:05] INFO llama_model.py:51: context_window_size not found in config.json. Falling back to max_position_embeddings (2048)
[2024-01-15 16:31:05] INFO llama_model.py:71: prefill_chunk_size defaults to context_window_size (2048)
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
[2024-01-15 16:31:10] INFO huggingface_loader.py:169: Loading HF parameters from: dist/models/Llama2-7B-Chat-q4f16_1/pytorch_model-00002-of-00002.bin
[2024-01-15 16:31:13] INFO group_quantization.py:227: Compiling quantize function for key: ((32000, 4096), float16, vulkan, axis=1, output_transpose=False)
0%| | 0/195 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/home/kylin/miniconda3/bin/mlc_chat", line 8, in
sys.exit(main())
^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/main.py", line 28, in main
cli.main(sys.argv[2:])
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/cli/convert_weight.py", line 87, in main
convert_weight(
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/interface/convert_weight.py", line 156, in convert_weight
_convert_args(args)
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/interface/convert_weight.py", line 107, in _convert_args
for name, param in LOADER[args.source_format](
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/loader/huggingface_loader.py", line 118, in load
q_params = self.quantize_param_map.map_func[mlc_name](param)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/quantization/group_quantization.py", line 228, in quantize_weight
quantize_func = _compile_quantize_func(_create_quantize_func())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/quantization/group_quantization.py", line 217, in _compile_quantize_func
ex = relax.build(mod, target=target)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/tvm/relax/vm_build.py", line 341, in build
return _vmlink(
^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/tvm/relax/vm_build.py", line 247, in _vmlink
lib = tvm.build(
^^^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/tvm/driver/build_module.py", line 294, in build
rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.call
File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/home/kylin/miniconda3/lib/python3.11/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm.error.InternalError: Traceback (most recent call last):
10: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)>::AssignTypedLambda<tvm::__mk_TVM23::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}>(tvm::__mk_TVM23::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tvm::runtime::TVMRetValue)
9: tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
8: tvm::codegen::Build(tvm::IRModule, tvm::Target)
7: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}>(tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tvm::runtime::TVMRetValue)
6: tvm::codegen::BuildSPIRV(tvm::IRModule, tvm::Target)
5: tvm::codegen::LowerToSPIRV[abi:cxx11](tvm::IRModule, tvm::Target)
4: tvm::codegen::CodeGenSPIRV::BuildFunction(tvm::tir::PrimFunc const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
3: tvm::codegen::spirv::IRBuilder::GetSType(tvm::runtime::DataType const&, unsigned int, unsigned int)
2: tvm::codegen::spirv::IRBuilder::DeclareType(tvm::runtime::DataType const&, unsigned int, unsigned int)
1: tvm::codegen::spirv::IRBuilder::AddCapabilityFor(tvm::runtime::DataType const&)
0: ZN3tvm7runtime6deta
File "/workspace/tvm/src/target/spirv/ir_builder.cc", line 566
InternalError: Check failed: (spirv_support.supports_float16) is false: Vulkan target does not support Float16 capability. If your device supports 16-bit float operations, please either add -supports_float16=1 to the target, or query all device parameters by adding -from_device=0.
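For reference, a hedged sketch of the two overrides the error message suggests, written as TVM target strings. This is a TVM-level illustration that assumes you are constructing the target yourself in Python; it is not a documented mlc_chat convert_weight flag:

```python
import tvm

# Option 1: declare float16 support explicitly. Only do this if the hardware
# and Vulkan driver really support 16-bit floats in shaders.
forced = tvm.target.Target("vulkan -supports_float16=1")

# Option 2: let TVM query all device parameters from Vulkan device 0,
# as the error message suggests, and inspect what the driver reports.
queried = tvm.target.Target("vulkan -from_device=0")
print("supports_float16:", queried.attrs.get("supports_float16"))
```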