Custom model: RuntimeError: weight shared.weight does not exist #541
Comments
I got a similar error when loading WizardCoder with the quantize flag; without quantize everything works fine: RuntimeError: weight transformer.h.0.attn.c_attn.qweight does not exist |
Same with a LoRA-merged Falcon. |
Happened to me as well; I "fixed it" by reverting to the 0.8 version of the Docker container, so it seems to be specific to version 0.9. |
@ckanaar thanks for the advice. It works for me too. |
Hi @PitchboyDev - following up on this for deploying LoRA-merged Falcon models on TGI. How did you manage to deploy the model by downgrading TGI to
However, when I use
Any insight into how you solved this? Thanks! |
@rohan-pradhan which version of Falcon are you using? We used the 7B version, and downgrading to version 0.8 did the trick. |
@PitchboyDev - yes, we are using Falcon 7B too! |
It may be a problem related to safetensors and torch shared tensors: https://huggingface.co/docs/safetensors/torch_shared_tensors I had a similar error when trying to manually save my model to safetensors using the save_file method:
Now I save it with save_model, and TGI gives me this kind of error:
EDIT: Solved it for me by generating the safetensors with the transformers
|
Thanks for sharing your solution! |
Hello, same issue here: we are trying to run our custom model with TGI (https://huggingface.co/cmarkea/bloomz-560m-sft-chat).
Our weights use the format "transformer.word_embeddings.weight", not "word_embeddings.weight" as the error suggests, so it looks like the base_model_prefix is not configured properly. Would it be possible to set base_model_prefix="transformer" by default for BloomModel, as is done for BloomPreTrainedModel? Or could a CLI argument be added to specify the weight prefix? Looking forward to testing the latest versions' features 🚀 Thanks! |
System Info
Information
Tasks
Reproduction
When launching TGI on a custom model derived from
lmsys/fastchat-t5-3b-v1.0
with the following command:
docker run --rm --network none --gpus 0 -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-generation-inference:latest --model-id /data/fastchat-t5-3b-v1.0
I got the following error message:
RuntimeError: weight shared.weight does not exist
Expected behavior
I'd like to run TGI on my custom model on my RTX-3090 GPU.