bug: inference of fine-tuned model is not working #143

@KayMKM

Description

I followed this document to fine-tune a model and deploy inference: https://github.com/microsoft/vscode-ai-toolkit/blob/main/doc/finetune.md

But the inference endpoint throws an error:

2025-01-09T06:46:43.857955385Z Traceback (most recent call last):
2025-01-09T06:46:43.857984860Z   File "/mount/inference/utils.py", line 55, in load_model
2025-01-09T06:46:43.875525329Z     model = AutoModelForCausalLM.from_pretrained(
2025-01-09T06:46:43.875556908Z             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.875562920Z   File "/opt/conda/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
2025-01-09T06:46:43.875625156Z     return model_class.from_pretrained(
2025-01-09T06:46:43.875632990Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.875660385Z   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4000, in from_pretrained
2025-01-09T06:46:43.876266321Z     dispatch_model(model, **device_map_kwargs)
2025-01-09T06:46:43.876299283Z   File "/opt/conda/lib/python3.11/site-packages/accelerate/big_modeling.py", line 498, in dispatch_model
2025-01-09T06:46:43.876418477Z     model.to(device)
2025-01-09T06:46:43.876484914Z   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2849, in to
2025-01-09T06:46:43.876775446Z     raise ValueError(
2025-01-09T06:46:43.876788781Z ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
2025-01-09T06:46:43.876792589Z 
2025-01-09T06:46:43.876795624Z During handling of the above exception, another exception occurred:
2025-01-09T06:46:43.876797928Z 
2025-01-09T06:46:43.876801214Z Traceback (most recent call last):
2025-01-09T06:46:43.876829868Z   File "/mount/inference/./gradio_chat.py", line 42, in <module>
2025-01-09T06:46:43.895541690Z     model = load_model(model_name, torch_dtype, quant_type)
2025-01-09T06:46:43.895572127Z             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.895577717Z   File "/mount/inference/utils.py", line 70, in load_model
2025-01-09T06:46:43.908149431Z     raise RuntimeError(f"Error loading model: {e}")
2025-01-09T06:46:43.908180659Z RuntimeError: Error loading model: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

I tried upgrading the transformers package to version 4.47.1, but that did not fix it.
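
For reference, the error is raised because `.to(device)` is called on a model that bitsandbytes/accelerate has already placed on the correct devices during loading. Below is a minimal sketch of how a 4-bit checkpoint is usually loaded without tripping that check; the checkpoint path and quantization settings are assumptions for illustration, not the exact values used in utils.py:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical path to the fine-tuned checkpoint.
model_name = "./models/qlora-finetuned"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Let accelerate place the quantized weights via device_map and use the model as-is;
# calling model.to(device) afterwards is what raises the ValueError shown above.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)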
