I followed this document to fine-tune a model and deploy inference: https://github.com/microsoft/vscode-ai-toolkit/blob/main/doc/finetune.md
However, the inference endpoint raises an error:
2025-01-09T06:46:43.857955385Z Traceback (most recent call last):
2025-01-09T06:46:43.857984860Z File "/mount/inference/utils.py", line 55, in load_model
2025-01-09T06:46:43.875525329Z model = AutoModelForCausalLM.from_pretrained(
2025-01-09T06:46:43.875556908Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.875562920Z File "/opt/conda/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
2025-01-09T06:46:43.875625156Z return model_class.from_pretrained(
2025-01-09T06:46:43.875632990Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.875660385Z File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4000, in from_pretrained
2025-01-09T06:46:43.876266321Z dispatch_model(model, **device_map_kwargs)
2025-01-09T06:46:43.876299283Z File "/opt/conda/lib/python3.11/site-packages/accelerate/big_modeling.py", line 498, in dispatch_model
2025-01-09T06:46:43.876418477Z model.to(device)
2025-01-09T06:46:43.876484914Z File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2849, in to
2025-01-09T06:46:43.876775446Z raise ValueError(
2025-01-09T06:46:43.876788781Z ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
2025-01-09T06:46:43.876792589Z
2025-01-09T06:46:43.876795624Z During handling of the above exception, another exception occurred:
2025-01-09T06:46:43.876797928Z
2025-01-09T06:46:43.876801214Z Traceback (most recent call last):
2025-01-09T06:46:43.876829868Z File "/mount/inference/./gradio_chat.py", line 42, in <module>
2025-01-09T06:46:43.895541690Z model = load_model(model_name, torch_dtype, quant_type)
2025-01-09T06:46:43.895572127Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.895577717Z File "/mount/inference/utils.py", line 70, in load_model
2025-01-09T06:46:43.908149431Z raise RuntimeError(f"Error loading model: {e}")
2025-01-09T06:46:43.908180659Z RuntimeError: Error loading model: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
I tried upgrading the transformers package to 4.47.1, but it did not help.
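For reference, this is roughly what I understand the error message to be asking for: when a bitsandbytes quantization config is used, device placement should be left to `device_map` rather than a later `.to(device)` call. This is only a minimal sketch; the model path and dtype below are placeholders, not the actual values from /mount/inference/utils.py.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder; the real path comes from the fine-tuning output in utils.py.
model_name = "path/to/finetuned-model"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# With a bitsandbytes-quantized model, `device_map` handles placement;
# calling `.to(device)` afterwards is what raises the ValueError above.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)
```

If the explicit `model.to(device)` in the inference utilities is the cause, removing it for the quantized case might be the fix, but I may be misreading where the call happens.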