What's the proper way to run a downloaded model with GPU? #990
Unanswered · dany-nonstop asked this question in Q&A
I used the

```
GALLERIES=[{"name":"model-gallery","url":"github:go-skynet/model-gallery/index.yaml"},{"url":"github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]
```

environment parameter to enable model galleries, then sent `{"id": "thebloke__wizard-vicuna-13b-ggml__wizard-vicuna-13b.ggmlv3.q4_0.bin"}` to the API's `/models/apply` endpoint to download a model locally. The download was successful; I got these files:

- `thebloke__wizard-vicuna-13b-ggml__wizard-vicuna-13b.ggmlv3.q4_0.bin.yaml`
- `vicuna-chat.tmpl`
- `vicuna-completion.tmpl`
- `wizard-vicuna-13B.ggmlv3.q4_0.bin`
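For context, the steps look roughly like this (a sketch of my setup: the image tag, volume path, and port 8080 are specific to my machine and may differ in yours):

```bash
# Start LocalAI with the galleries enabled (image name and paths are assumptions)
docker run -p 8080:8080 -v $PWD/models:/models \
  -e GALLERIES='[{"name":"model-gallery","url":"github:go-skynet/model-gallery/index.yaml"},{"url":"github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]' \
  quay.io/go-skynet/local-ai:latest

# Ask the API to download the model from the gallery
curl -X POST http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"id": "thebloke__wizard-vicuna-13b-ggml__wizard-vicuna-13b.ggmlv3.q4_0.bin"}'
```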
When calling `/v1/completion`, I can only call the model by the `wizard-vicuna-13B.ggmlv3.q4_0.bin` file name, not by the model name in the gallery, `thebloke__wizard-vicuna-13b-ggml__wizard-vicuna-13b.ggmlv3.q4_0.bin`.
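Concretely (assuming the OpenAI-compatible `/v1/completions` route on port 8080; the prompt is just an example):

```bash
# Works: model referenced by the downloaded file name
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "wizard-vicuna-13B.ggmlv3.q4_0.bin", "prompt": "Hello"}'

# Does not work for me: model referenced by the gallery name
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "thebloke__wizard-vicuna-13b-ggml__wizard-vicuna-13b.ggmlv3.q4_0.bin", "prompt": "Hello"}'
```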
I checked CPU load with `top` and GPU load with `nvidia-smi` inside the docker container, and inference appears to run on the CPU only. I edited `thebloke__wizard-vicuna-13b-ggml__wizard-vicuna-13b.ggmlv3.q4_0.bin.yaml` and added a line `gpu_layers: 1000`, but it seems the file is ignored if I'm using the actual model name `wizard-vicuna-13B.ggmlv3.q4_0.bin`.
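For reference, the edited YAML looks roughly like this (sketched from memory; apart from the `gpu_layers` line I added, the fields are whatever the gallery generated, so treat the exact values as illustrative):

```yaml
# thebloke__wizard-vicuna-13b-ggml__wizard-vicuna-13b.ggmlv3.q4_0.bin.yaml
name: thebloke__wizard-vicuna-13b-ggml__wizard-vicuna-13b.ggmlv3.q4_0.bin
parameters:
  model: wizard-vicuna-13B.ggmlv3.q4_0.bin
template:
  chat: vicuna-chat
  completion: vicuna-completion
gpu_layers: 1000   # the line I added, hoping to offload layers to the GPU
```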
My questions: what is the proper way to use a model's definition YAML file? And how can I enable GPU for inference?

Many thanks in advance!