bug: inference of fine-tuned model is not working #143

@KayMKM

Description

I followed this document to fine-tune a model and deploy inference: https://github.com/microsoft/vscode-ai-toolkit/blob/main/doc/finetune.md

But the inference endpoint throws an error:

2025-01-09T06:46:43.857955385Z Traceback (most recent call last):
2025-01-09T06:46:43.857984860Z   File "/mount/inference/utils.py", line 55, in load_model
2025-01-09T06:46:43.875525329Z     model = AutoModelForCausalLM.from_pretrained(
2025-01-09T06:46:43.875556908Z             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.875562920Z   File "/opt/conda/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
2025-01-09T06:46:43.875625156Z     return model_class.from_pretrained(
2025-01-09T06:46:43.875632990Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.875660385Z   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4000, in from_pretrained
2025-01-09T06:46:43.876266321Z     dispatch_model(model, **device_map_kwargs)
2025-01-09T06:46:43.876299283Z   File "/opt/conda/lib/python3.11/site-packages/accelerate/big_modeling.py", line 498, in dispatch_model
2025-01-09T06:46:43.876418477Z     model.to(device)
2025-01-09T06:46:43.876484914Z   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2849, in to
2025-01-09T06:46:43.876775446Z     raise ValueError(
2025-01-09T06:46:43.876788781Z ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
2025-01-09T06:46:43.876792589Z 
2025-01-09T06:46:43.876795624Z During handling of the above exception, another exception occurred:
2025-01-09T06:46:43.876797928Z 
2025-01-09T06:46:43.876801214Z Traceback (most recent call last):
2025-01-09T06:46:43.876829868Z   File "/mount/inference/./gradio_chat.py", line 42, in <module>
2025-01-09T06:46:43.895541690Z     model = load_model(model_name, torch_dtype, quant_type)
2025-01-09T06:46:43.895572127Z             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.895577717Z   File "/mount/inference/utils.py", line 70, in load_model
2025-01-09T06:46:43.908149431Z     raise RuntimeError(f"Error loading model: {e}")
2025-01-09T06:46:43.908180659Z RuntimeError: Error loading model: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

I tried upgrading the transformers package to version 4.47.1, but that did not fix it.
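
For reference, the error is raised because `.to(device)` is called on a model that bitsandbytes/accelerate has already placed on the correct devices during loading. Below is a minimal sketch of how a 4-bit checkpoint is usually loaded without tripping that check; the checkpoint path and quantization settings are assumptions for illustration, not the exact values used in utils.py:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical path to the fine-tuned checkpoint.
model_name = "./models/qlora-finetuned"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Let accelerate place the quantized weights via device_map and use the model as-is;
# calling model.to(device) afterwards is what raises the ValueError shown above.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)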
