This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Errors when quantizing the model after building from source #1105

@choochtech

Description


Hi,

First of all, thanks for this fantastic work. Great job!

After building intel-extension-for-transformers from source, we get errors when quantizing the model.


After the initial Llama 2 7B model is converted to ne_llama_f32.bin (FP32), the library fails to quantize it, producing the error below.

quant_model: failed to quantize model from 'runtime_outs/ne_llama_f32.bin'
Traceback (most recent call last):
  File "/home/ubuntu/run.py", line 9, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config, trust_remote_code=True)
  File "/opt/conda/envs/llm_hakan/lib/python3.9/site-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 265, in from_pretrained
    model.init(
  File "/opt/conda/envs/llm_hakan/lib/python3.9/site-packages/intel_extension_for_transformers/llm/runtime/graph/__init__.py", line 124, in init
    assert os.path.exists(quant_bin), "Fail to quantize model"
AssertionError: Fail to quantize model
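
For context, our run.py is roughly the sketch below (the model id, the WeightOnlyQuantConfig settings, and the prompt are placeholders; only the from_pretrained call at line 9 matches the traceback):

# run.py -- minimal sketch, assuming the WeightOnlyQuantConfig weight-only
# quantization API of intel_extension_for_transformers; model id, quantization
# settings and prompt are placeholders, not taken from the traceback.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig

model_name = "meta-llama/Llama-2-7b-hf"  # assumed Llama 2 7B checkpoint
woq_config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")  # assumed settings

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids

# Conversion to runtime_outs/ne_llama_f32.bin succeeds; the subsequent
# quantization step fails and the assertion in graph/__init__.py fires here.
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config, trust_remote_code=True)

outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))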

What are we missing? Do we need to compile another package as well?

Thanks
