This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Errors when quantizing the model after building from source #1105

@choochtech

Description


Hi,

First of all, thanks for this fantastic work. Great job!

After building intel-extension-for-transformers from source, we get errors when quantizing the model.


After the initial Llama 2 7B model is converted to ne_llama_f32.bin (FP32), the library fails to quantize it, producing the error below.

quant_model: failed to quantize model from 'runtime_outs/ne_llama_f32.bin'
Traceback (most recent call last):
  File "/home/ubuntu/run.py", line 9, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config, trust_remote_code=True)
  File "/opt/conda/envs/llm_hakan/lib/python3.9/site-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 265, in from_pretrained
    model.init(
  File "/opt/conda/envs/llm_hakan/lib/python3.9/site-packages/intel_extension_for_transformers/llm/runtime/graph/__init__.py", line 124, in init
    assert os.path.exists(quant_bin), "Fail to quantize model"
AssertionError: Fail to quantize model
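
For context, our run.py is roughly the sketch below (the model id, the WeightOnlyQuantConfig settings, and the prompt are placeholders; only the from_pretrained call at line 9 matches the traceback):

# run.py -- minimal sketch, assuming the WeightOnlyQuantConfig weight-only
# quantization API of intel_extension_for_transformers; model id, quantization
# settings and prompt are placeholders, not taken from the traceback.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig

model_name = "meta-llama/Llama-2-7b-hf"  # assumed Llama 2 7B checkpoint
woq_config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")  # assumed settings

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids

# Conversion to runtime_outs/ne_llama_f32.bin succeeds; the subsequent
# quantization step fails and the assertion in graph/__init__.py fires here.
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config, trust_remote_code=True)

outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))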

What are we missing? Do we need to compile another package as well?

Thanks
