
Model can't run inference for Llama3.2-1B when using -d fp16 to convert the pte #9534


Description

@WeiMa01

When running the Llama3.2-1B fp16.pte, which was produced by converting the Llama3.2-1B w/BF16 checkpoint to a pte with -d fp16, we hit the following issue:
Error log:
I 00:00:00.013206 executorch:main.cpp:69] Resetting threadpool with num threads = 6
I 00:00:00.027952 executorch:runner.cpp:67] Creating LLaMa runner: model_path=llama3_2_fp16_org.pte, tokenizer_path=../tokenizer.model
E 00:00:00.728030 executorch:XNNCompiler.cpp:635] Failed to create multiply node 266 with code: xnn_status_invalid_parameter
E 00:00:00.728090 executorch:XNNPACKBackend.cpp:106] XNNCompiler::compileModel failed: 0x1
E 00:00:00.728099 executorch:method.cpp:110] Init failed for backend XnnpackBackend: 0x1
E 00:00:00.771031 executorch:XNNCompiler.cpp:635] Failed to create multiply node 266 with code: xnn_status_invalid_parameter
E 00:00:00.771096 executorch:XNNPACKBackend.cpp:106] XNNCompiler::compileModel failed: 0x1
E 00:00:00.771104 executorch:method.cpp:110] Init failed for backend XnnpackBackend: 0x1

Convert command:
python -m examples.models.llama.export_llama --model "llama3_2" --checkpoint "/model_convert/Llama-3.2-1B/original/consolidated_00.pth" --params "/Llama-3.2-1B/original/params.json" --use_sdpa_with_kv_cache -X --xnnpack-extended-ops --output_name "llama3_2_fp16_direct_convert_runtime.pte" -kv -d fp16 --max_seq_length 256
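The exported .pte was then loaded with the ExecuTorch llama runner. A rough invocation matching the log above is sketched here (model_path and tokenizer_path are taken from the log; the llama_main binary location and the --prompt value are assumptions and may differ per build):

./cmake-out/examples/models/llama/llama_main --model_path=llama3_2_fp16_org.pte --tokenizer_path=../tokenizer.model --prompt="Once upon a time"

On a working export this starts generating tokens; with the fp16 export above it fails during method init with the XNNPACK errors shown in the log.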

cc @digantdesai @mcr229 @cbilgin

Labels

module: xnnpack (Issues related to xnnpack delegation and the code under backends/xnnpack/), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
