
Model can't run inference for Llama3.2-1B when using -d fp16 to convert the pte #9534


Description

@WeiMa01

When running the Llama3.2-1B fp16.pte, which was produced by converting the Llama3.2-1B w/BF16 checkpoint to a pte with -d fp16, we hit the following issue:
Error log:
I 00:00:00.013206 executorch:main.cpp:69] Resetting threadpool with num threads = 6
I 00:00:00.027952 executorch:runner.cpp:67] Creating LLaMa runner: model_path=llama3_2_fp16_org.pte, tokenizer_path=../tokenizer.model
E 00:00:00.728030 executorch:XNNCompiler.cpp:635] Failed to create multiply node 266 with code: xnn_status_invalid_parameter
E 00:00:00.728090 executorch:XNNPACKBackend.cpp:106] XNNCompiler::compileModel failed: 0x1
E 00:00:00.728099 executorch:method.cpp:110] Init failed for backend XnnpackBackend: 0x1
E 00:00:00.771031 executorch:XNNCompiler.cpp:635] Failed to create multiply node 266 with code: xnn_status_invalid_parameter
E 00:00:00.771096 executorch:XNNPACKBackend.cpp:106] XNNCompiler::compileModel failed: 0x1
E 00:00:00.771104 executorch:method.cpp:110] Init failed for backend XnnpackBackend: 0x1

Convert command:
python -m examples.models.llama.export_llama --model "llama3_2" --checkpoint "/model_convert/Llama-3.2-1B/original/consolidated_00.pth" --params "/Llama-3.2-1B/original/params.json" --use_sdpa_with_kv_cache -X --xnnpack-extended-ops --output_name "llama3_2_fp16_direct_convert_runtime.pte" -kv -d fp16 --max_seq_length 256
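The exported .pte was then loaded with the ExecuTorch llama runner. A rough invocation matching the log above is sketched here (model_path and tokenizer_path are taken from the log; the llama_main binary location and the --prompt value are assumptions and may differ per build):

./cmake-out/examples/models/llama/llama_main --model_path=llama3_2_fp16_org.pte --tokenizer_path=../tokenizer.model --prompt="Once upon a time"

On a working export this starts generating tokens; with the fp16 export above it fails during method init with the XNNPACK errors shown in the log.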

cc @digantdesai @mcr229 @cbilgin

Labels

module: xnnpack (Issues related to xnnpack delegation and the code under backends/xnnpack/), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
